This builds upon the existing report at https://bugzilla.kernel.org/show_bug.cgi?id=195039 (where I've commented in the past).

I have a PM951 1TB drive (the uncommon larger variant of the more usual 0.5TB drive) in a Dell XPS 9550. On kernel 4.19.0 the filesystem goes read-only within an hour. Disabling APST solves the problem.

I had been commenting over on the Ubuntu Launchpad:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816 (my new report)
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 (the original report by someone else, solved, for the more common 0.5TB drive)

`apport` suggests I need to post the bug here instead. The bugzilla link in the first line above is for the PM951 NVMe SAMSUNG 512GB; I have the 1TB equivalent.

If I disable APST in GRUB then I get no read-only failures. If I use a less aggressive APST setting (I tried nvme_core.default_ps_max_latency_us=250, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816) then the machine lasts a little longer before suffering the same read-only fate. I believe that a quirk needs to be added for my NVMe drive.

Details copied over:

$ sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S2FZNYAG801690       PM951 NVMe SAMSUNG 1024GB                1         314.10 GB / 1.02 TB        512 B + 0 B      BXV76D0Q

$ uname -a
Linux ian-XPS-15-9550 4.19.0-041900-generic #201810221809 SMP Mon Oct 22 22:11:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -rd
Description:	Linux Mint 19 Tara
Release:	19

I am very happy to add more info; please let me know what you want.

Regards,
Ian.
$ sudo nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid       : 0x144d
ssvid     : 0x144d
sn        : S2FZNYAG801690
mn        : PM951 NVMe SAMSUNG 1024GB
fr        : BXV76D0Q
rab       : 2
ieee      : 002538
cmic      : 0
mdts      : 5
cntlid    : 1
ver       : 0
rtd3r     : 0
rtd3e     : 0
oaes      : 0
ctratt    : 0
oacs      : 0x17
acl       : 7
aerl      : 3
frmw      : 0x6
lpa       : 0
elpe      : 63
npss      : 4
avscc     : 0x1
apsta     : 0x1
wctemp    : 0
cctemp    : 0
mtfa      : 0
hmpre     : 0
hmmin     : 0
tnvmcap   : 0
unvmcap   : 0
rpmbs     : 0
edstt     : 35
dsto      : 0
fwug      : 0
kas       : 0
hctma     : 0
mntmt     : 0
mxtmt     : 0
sanicap   : 0
hmminds   : 0
hmmaxd    : 0
sqes      : 0x66
cqes      : 0x44
maxcmd    : 0
nn        : 1
oncs      : 0x1f
fuses     : 0
fna       : 0
vwc       : 0x1
awun      : 255
awupf     : 0
nvscc     : 1
acwu      : 0
sgls      : 0
subnqn    :
ioccsz    : 0
iorcsz    : 0
icdoff    : 0
ctrattr   : 0
msdbd     : 0
ps    0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-
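To put numbers on the latency option mentioned earlier, here is a rough Python sketch that parses the `ps` lines above. It then applies what I understand to be the 4.19-era selection rule in the kernel's nvme_configure_apst():

- A limit of 0 turns APST off.
- Otherwise, only non-operational states whose exlat does not exceed nvme_core.default_ps_max_latency_us become idle targets.
- The idle time before entering a target is (enlat + exlat) / 20 ms, rounded up, which is about 50 times the round-trip latency.

That rule, the skip_deepest flag (standing in for the kernel's NO_DEEPEST_PS quirk) and all function names are my assumptions and illustrations, not anything confirmed in this bug.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches power-state lines printed by `nvme id-ctrl`, e.g.
# "ps    3 : mp:0.0700W non-operational enlat:500 exlat:5000 ..."
PS_LINE = re.compile(
    r"ps\s+(\d+)\s*:\s*mp:\S+\s+(non-operational|operational)\s+"
    r"enlat:(\d+)\s+exlat:(\d+)"
)


@dataclass
class PowerState:
    index: int
    non_operational: bool
    enlat_us: int
    exlat_us: int


def parse_power_states(id_ctrl_text: str) -> list[PowerState]:
    """Extract the power-state descriptors from `nvme id-ctrl` text."""
    return [
        PowerState(int(m[1]), m[2] == "non-operational", int(m[3]), int(m[4]))
        for m in PS_LINE.finditer(id_ctrl_text)
    ]


def apst_targets(states, max_latency_us, skip_deepest=False):
    """Return {state index: idle ms before entering it} for eligible states.

    Assumed rule (see lead-in): 0 disables APST; only non-operational
    states with exlat <= max_latency_us qualify; idle time is
    ceil((enlat + exlat) / 20) ms, capped at 24 bits.
    skip_deepest is a hypothetical stand-in for the NO_DEEPEST_PS quirk.
    """
    if max_latency_us == 0:
        return {}
    deepest = max((s.index for s in states), default=-1)
    targets = {}
    for s in states:
        if skip_deepest and s.index == deepest:
            continue
        if not s.non_operational or s.exlat_us > max_latency_us:
            continue
        idle_ms = -(-(s.enlat_us + s.exlat_us) // 20)  # ceiling division
        targets[s.index] = min(idle_ms, (1 << 24) - 1)
    return targets


# The ps lines from the id-ctrl output above.
PS_TABLE = """\
ps    0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0
ps    1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1 rwt:1 rwl:1
ps    2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2 rwt:2 rwl:2
ps    3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
ps    4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
"""

if __name__ == "__main__":
    states = parse_power_states(PS_TABLE)
    # 100000 us is, as far as I know, nvme_core's built-in default.
    for limit in (100000, 250, 0):
        print(limit, apst_targets(states, limit))
    # prints:
    # 100000 {3: 275, 4: 1200}
    # 250 {}
    # 0 {}
```

With this drive's table, the sketch suggests the default limit makes PS3 (after about 275 ms idle) and PS4 (after about 1.2 s idle) eligible. A limit of 250 us would leave no eligible non-operational state at all. That conclusion is only as good as the assumed rule.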
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-041900-generic root=/dev/mapper/mint--vg-root ro quiet splash nvme_core.default_ps_max_latency_us=0 vt.handoff=1
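A small Python sketch of the check this command line implies: it scans a kernel command line for nvme_core.default_ps_max_latency_us. It treats a value of 0 as "APST disabled", which is my understanding of how nvme_core interprets it.

The function names are hypothetical. Two further points are assumptions:

- Dashes and underscores in parameter names are treated as equivalent.
- If the parameter is repeated, the last occurrence applies.

This only shows what was requested at boot; the drive's live APSTE state still has to be read back with `nvme get-feature -f 0x0c`.

```python
PARAM = "nvme_core.default_ps_max_latency_us"


def requested_apst_latency(cmdline: str):
    """Return the APST latency limit (us) requested on the kernel command
    line, or None if the parameter is absent (nvme_core then uses its
    built-in default). If repeated, the last occurrence is returned."""
    value = None
    for token in cmdline.split():
        key, sep, arg = token.partition("=")
        # Assumption: the kernel treats '-' and '_' alike in names.
        if sep and key.replace("-", "_") == PARAM:
            value = int(arg)
    return value


def apst_disabled_at_boot(cmdline: str) -> bool:
    """A limit of 0 is assumed to switch APST off entirely."""
    return requested_apst_latency(cmdline) == 0


if __name__ == "__main__":
    # Command line copied from this report; on a live system use
    # open("/proc/cmdline").read() instead.
    sample = ("BOOT_IMAGE=/vmlinuz-4.19.0-041900-generic "
              "root=/dev/mapper/mint--vg-root ro quiet splash "
              "nvme_core.default_ps_max_latency_us=0 vt.handoff=1")
    print(requested_apst_latency(sample), apst_disabled_at_boot(sample))
    # prints: 0 True
```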
Historically I had to stay on 4.9.91:

- Going beyond 4.9.91 caused other problems (e.g. regressions with my Intel WiFi).
- Going beyond the 4.9 series had other issues, generally failing to boot.

I recently had to reinstall my home folder, because Dropbox now requires whole-disk encryption rather than an encrypted home directory. I've taken the opportunity to upgrade everything afresh.

Hoping this isn't a pain,
Ian.
*Very* annoyingly, I've just had my first read-only failure while (as best I know) APST was disabled. I booted fresh this morning. I didn't confirm that APST was disabled on this boot, but I have no reason to believe it wasn't. I have a script that checks for me; on this boot it shows:

$ more get_apste
sudo nvme get-feature -f 0x0c -H /dev/nvme0 | grep APSTE
$ ./get_apste
Autonomous Power State Transition Enable (APSTE): Disabled

Here's a snippet of journalctl at the point of failure; I see no relevant logs. I was using the machine maybe 10 minutes before this, and it had been on (lightly used) since the morning. I spotted that the machine had gone read-only at 17:47 and did a hard reboot (5 seconds on the power key):

Nov 29 17:23:28 ian-XPS-15-9550 org.x.reader.Daemon[1296]: UnregisterDocument URI 'file:///home/ian/workspace/clients/Hiring/2018_11_29%20Robin%20Cole%20CV%208-9-2018.pdf'
Nov 29 17:26:25 ian-XPS-15-9550 NetworkManager[994]: <info> [1543512385.6068] manager: NetworkManager state is now CONNECTED_GLOBAL
Nov 29 17:26:25 ian-XPS-15-9550 dbus-daemon[938]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.11' (uid=0 pid=994 comm="/usr/sbin/NetworkManager --no-daemon ")
Nov 29 17:26:25 ian-XPS-15-9550 systemd[1]: Starting Network Manager Script Dispatcher Service...
Nov 29 17:26:25 ian-XPS-15-9550 dbus-daemon[938]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Nov 29 17:26:25 ian-XPS-15-9550 systemd[1]: Started Network Manager Script Dispatcher Service.
Nov 29 17:26:25 ian-XPS-15-9550 nm-dispatcher[10640]: req:1 'connectivity-change': new request (1 scripts)
Nov 29 17:26:25 ian-XPS-15-9550 nm-dispatcher[10640]: req:1 'connectivity-change': start running ordered scripts...
-- Reboot --
Nov 29 17:47:43 ian-XPS-15-9550 kernel: microcode: microcode updated early to revision 0xc6, date = 2018-04-17
Nov 29 17:47:43 ian-XPS-15-9550 kernel: Linux version 4.19.0-041900-generic (kernel@tangerine) (gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)) #201810221809 SMP Mon Oct 22 22:11:45 UTC 2018
Nov 29 17:47:43 ian-XPS-15-9550 kernel: Command line: BOOT_IMAGE=/vmlinuz-4.19.0-041900-generic root=/dev/mapper/mint--vg-root ro quiet splash nvme_core.default_ps_max_latency_us=0 vt.handoff=1

This is the first read-only failure I've had since disabling APST. I'm now strongly considering reverting to 4.9.91. Any ideas would be very happily received.
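The `get_apste` script above relies on nvme-cli's human-readable (-H) output. As a cross-check, here is a Python sketch that decodes the raw values instead. It is based on my reading of the NVMe specification for Feature 0Ch (Autonomous Power State Transition):

- Bit 0 of the returned Dword 0 is APSTE.
- The 256-byte APST data structure holds 32 little-endian 64-bit entries, one per power state.
- In each entry, bits 7:3 are the Idle Transition Power State and bits 31:8 the Idle Time Prior to Transition in milliseconds.

The function names and the example table are hypothetical. How to obtain the raw dword and table bytes from the drive is deliberately left out, as I haven't verified the relevant nvme-cli options.

```python
import struct


def apste_enabled(dword0: int) -> bool:
    """Bit 0 of the Get Features (FID 0x0c) result is APSTE."""
    return bool(dword0 & 0x1)


def decode_apst_table(table: bytes):
    """Decode the 32 little-endian 64-bit APST entries.

    Returns (source state, target state, idle ms) for every non-zero
    entry: after idle ms spent idle in the source state, the controller
    may move to the target (ITPS = bits 7:3, ITPT = bits 31:8).
    """
    if len(table) != 256:
        raise ValueError("APST table must be 256 bytes (32 x 64-bit entries)")
    result = []
    for source, (entry,) in enumerate(struct.iter_unpack("<Q", table)):
        if entry == 0:
            continue  # no transition configured for this state
        target = (entry >> 3) & 0x1F
        idle_ms = (entry >> 8) & 0xFFFFFF
        result.append((source, target, idle_ms))
    return result


def encode_entry(target: int, idle_ms: int) -> int:
    """Build one entry with the same (assumed) bit layout, for examples."""
    return (idle_ms << 8) | (target << 3)


if __name__ == "__main__":
    # Hand-built example, NOT read from this drive: states 0-2 drop to
    # PS3 after 275 ms idle, PS3 drops to PS4 after 1200 ms idle.
    entries = [encode_entry(3, 275)] * 3 + [encode_entry(4, 1200)] + [0] * 28
    table = struct.pack("<32Q", *entries)
    print(apste_enabled(0x0))  # prints: False
    for row in decode_apst_table(table):
        print(row)
    # prints (0, 3, 275), (1, 3, 275), (2, 3, 275), (3, 4, 1200)
```

If APSTE reads as 0 but the table still holds non-zero entries, that would be consistent with APST being switched off rather than the table being cleared.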
After that read-only failure I did a clean boot. The same machine, with no other modifications, has been running fine for 2 days (no sleeping, so 48+ hours on). It is possible that while filing this bug and the Launchpad bug I ran a command that changed the APST setting. I'd be surprised if I did, but I can't rule it out, and the read-only failure coincided with the bug filing.
I'll note that I've now returned to 4.9.91, which was my last known-good kernel from a couple of months back.

Following up on #5, I had a second read-only failure despite having APST disabled. This time I'd upgraded to 4.19.7, up from 4.19.0, where I had the previous read-only filesystem failure with APST disabled.

I've been using 4.9.91 for a week; it seems to be stable and I get no read-only failures. Maybe this is a quirk of my less common (but still Dell's standard) 1TB PM951. Perhaps this post will help someone else in the future.

Cheers,
Ian.
A BIOS upgrade, detailed in https://bugzilla.kernel.org/show_bug.cgi?id=195039, seems to have solved this issue. Once I'm sure it is solved, I'll close this issue.