This builds on the existing report https://bugzilla.kernel.org/show_bug.cgi?id=195039 (where I've commented in the past).
I have a PM951 1TB drive (the less common, larger variant of the usual 0.5TB drive) in a Dell XPS 9550. On kernel 4.19.0 the filesystem goes read-only within an hour. Disabling APST solves the problem.
I had been commenting over on the Ubuntu Launchpad:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816 (my new report)
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 (the original report by someone else, solved, for the more common 0.5TB drive)
`apport` suggests I need to post the bug here instead. The bugzilla link in the first line above is for the PM951 NVMe SAMSUNG 512GB; I have the 1TB equivalent.
If I disable APST in GRUB then I get no read-only failures. If I use a reduced power-saving APST option (I tried nvme_core.default_ps_max_latency_us=250, see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816) then the machine lives a little longer before suffering the same read-only fate. I believe that a quirk needs to be added for my NVMe drive.
Details copied over:
$ sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S2FZNYAG801690       PM951 NVMe SAMSUNG 1024GB                1         314.10 GB / 1.02 TB        512 B + 0 B      BXV76D0Q
$ uname -a
Linux ian-XPS-15-9550 4.19.0-041900-generic #201810221809 SMP Mon Oct 22 22:11:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -rd
Description: Linux Mint 19 Tara
I am very happy to add more info - please let me know what you want.
$ sudo nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S2FZNYAG801690
mn : PM951 NVMe SAMSUNG 1024GB
fr : BXV76D0Q
rab : 2
ieee : 002538
cmic : 0
mdts : 5
cntlid : 1
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
ctratt : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x6
lpa : 0
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 0
cctemp : 0
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
edstt : 35
dsto : 0
fwug : 0
kas : 0
hctma : 0
mntmt : 0
mxtmt : 0
sanicap : 0
hmminds : 0
hmmaxd : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
ioccsz : 0
iorcsz : 0
icdoff : 0
ctrattr : 0
msdbd : 0
ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1
rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2
rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
rwt:4 rwl:4 idle_power:- active_power:-
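As a rough illustration of why the latency cap matters for this drive, the sketch below parses the power-state lines from the id-ctrl output above and lists which states would remain eligible for APST under a given cap. The eligibility rule used here (only non-operational states whose enlat + exlat fits within the cap) is an assumption about how the driver filters states, not something confirmed in this report, and the function names are hypothetical.

```python
import re

# First line of each power state, copied from the id-ctrl output above
# (the rwt/rwl continuation lines are omitted because they are not needed).
ID_CTRL_PS = """\
ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0
ps 1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1
ps 2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
"""

PS_RE = re.compile(
    r"ps\s+(\d+)\s*:\s*mp:\S+\s+(operational|non-operational)"
    r"\s+enlat:(\d+)\s+exlat:(\d+)"
)


def parse_power_states(text):
    """Return one dict per power state found in id-ctrl style text."""
    states = []
    for m in PS_RE.finditer(text):
        states.append({
            "ps": int(m.group(1)),
            "non_operational": m.group(2) == "non-operational",
            "enlat_us": int(m.group(3)),
            "exlat_us": int(m.group(4)),
        })
    return states


def apst_candidates(states, max_latency_us):
    """List states that pass the ASSUMED rule: non-operational and
    entry + exit latency no greater than the cap (in microseconds)."""
    return [
        s["ps"]
        for s in states
        if s["non_operational"]
        and s["enlat_us"] + s["exlat_us"] <= max_latency_us
    ]


if __name__ == "__main__":
    states = parse_power_states(ID_CTRL_PS)
    for cap in (0, 250, 5500, 100000):
        print(cap, apst_candidates(states, cap))
```

Under that assumed rule, a cap of 250 leaves no non-operational state eligible on this drive, because ps 3 already needs 500 + 5000 = 5500 microseconds for entry plus exit.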
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-041900-generic root=/dev/mapper/mint--vg-root ro quiet splash nvme_core.default_ps_max_latency_us=0 vt.handoff=1
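To confirm which latency cap a given boot actually received, the command line can be checked programmatically. This is a minimal sketch; the helper name is hypothetical, and keeping the last occurrence when the parameter appears more than once is an assumption.

```python
KEY = "nvme_core.default_ps_max_latency_us="


def ps_max_latency_from_cmdline(cmdline):
    """Return the latency cap given on the kernel command line as an int,
    or None if the parameter is absent. The last occurrence is kept."""
    value = None
    for token in cmdline.split():
        if token.startswith(KEY):
            value = int(token[len(KEY):])
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    with open("/proc/cmdline") as f:
        print(ps_max_latency_from_cmdline(f.read()))
```

For the command line shown above this returns 0, which is the value I set to disable APST.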
Historically I had to stay on 4.9.91. Going >91 caused other problems (e.g. regressions with my Intel WiFi). Going > 4.9 had other issues, generally not booting. Having had to reinstall my home folder (due to Dropbox's requirement to move from encrypted home to whole-disk-encryption) I've taken the opportunity to upgrade everything afresh.
Hoping this isn't a pain, Ian.
*Very* annoyingly I've had my first read-only failure just now, whilst (as best I know) APST was disabled. This morning I had a fresh boot and whilst I didn't confirm that APST was disabled, I have no reason to believe it wasn't. I have a script that checks for me; on this boot it shows:
$ more get_apste
sudo nvme get-feature -f 0x0c -H /dev/nvme0 | grep APSTE
Autonomous Power State Transition Enable (APSTE): Disabled
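The same check could be scripted so that the APST state is recorded on every boot rather than only when I remember to look. A minimal sketch follows: it wraps the `nvme get-feature` command from the script above, the function names are hypothetical, and the "Enabled" wording for the opposite state is an assumption, since only "Disabled" appears in my output.

```python
import re
import subprocess

APSTE_RE = re.compile(r"\(APSTE\)\s*:\s*(Enabled|Disabled)")


def apste_enabled(get_feature_output):
    """Return True/False for the APSTE line in the human-readable
    get-feature output, or None if no such line is found."""
    m = APSTE_RE.search(get_feature_output)
    if m is None:
        return None
    return m.group(1) == "Enabled"


def read_apste(device="/dev/nvme0"):
    """Run the same command as the get_apste script (requires sudo)."""
    result = subprocess.run(
        ["sudo", "nvme", "get-feature", "-f", "0x0c", "-H", device],
        capture_output=True, text=True, check=True,
    )
    return apste_enabled(result.stdout)


if __name__ == "__main__":
    print(read_apste())
```

Logging the result at boot would have removed the doubt about whether APST was really off before the failure.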
Here's a snippet of journalctl at the point of failure - I see no relevant logs. I was using the machine maybe 10 minutes prior to this, and it had been on (lightly used) since the morning. I spotted that the machine had gone read-only at 17:47 and did a hard reboot (5 seconds on the power key):
Nov 29 17:23:28 ian-XPS-15-9550 org.x.reader.Daemon: UnregisterDocument URI 'file:///home/ian/workspace/clients/Hiring/2018_11_29%20Robin%20Cole%20CV%208-9-2018.pdf'
Nov 29 17:26:25 ian-XPS-15-9550 NetworkManager: <info> [1543512385.6068] manager: NetworkManager state is now CONNECTED_GLOBAL
Nov 29 17:26:25 ian-XPS-15-9550 dbus-daemon: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.11' (uid=0 pid=994 comm="/usr/sbin/NetworkManager --no-daemon ")
Nov 29 17:26:25 ian-XPS-15-9550 systemd: Starting Network Manager Script Dispatcher Service...
Nov 29 17:26:25 ian-XPS-15-9550 dbus-daemon: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Nov 29 17:26:25 ian-XPS-15-9550 systemd: Started Network Manager Script Dispatcher Service.
Nov 29 17:26:25 ian-XPS-15-9550 nm-dispatcher: req:1 'connectivity-change': new request (1 scripts)
Nov 29 17:26:25 ian-XPS-15-9550 nm-dispatcher: req:1 'connectivity-change': start running ordered scripts...
-- Reboot --
Nov 29 17:47:43 ian-XPS-15-9550 kernel: microcode: microcode updated early to revision 0xc6, date = 2018-04-17
Nov 29 17:47:43 ian-XPS-15-9550 kernel: Linux version 4.19.0-041900-generic (kernel@tangerine) (gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)) #201810221809 SMP Mon Oct 22 22:11:45 UTC 2018
Nov 29 17:47:43 ian-XPS-15-9550 kernel: Command line: BOOT_IMAGE=/vmlinuz-4.19.0-041900-generic root=/dev/mapper/mint--vg-root ro quiet splash nvme_core.default_ps_max_latency_us=0 vt.handoff=1
This is the first read-only failure I've had since I disabled APST. I'm now strongly considering reverting to 4.9.91. Any ideas would be very happily received.
Having had that read-only failure I clean booted, and the same machine, with no other modifications, has been running fine for 2 days (no sleeping, so 48+ hours on).
It is possible that, whilst trying to file this and the Launchpad bug, I ran a command that changed the APST setting. I'd be surprised if I did, but I can't rule it out, and the read-only failure coincided with the bug filing.
I'll note that I've now returned to 4.9.91 which was my last-good kernel from a couple of months back.
Following up on #5, I had a second read-only failure despite having APST disabled. In this case I'd upgraded to 4.19.7 (up from 4.19.0, where I had the previous read-only filesystem failure with APST disabled).
I've been using 4.9.91 for a week. It seems to be stable and I get no read-only failures. Maybe this is a quirk of my less-common (but still Dell's standard) 1TB PM951.
Perhaps this post will help someone else in the future. Cheers, Ian.
A BIOS upgrade, detailed in https://bugzilla.kernel.org/show_bug.cgi?id=195039, seems to have solved this issue. Once I know that this is solved I'll close this issue.