Bug 214439

Summary: Kernel Panic during boot - with unsupported namespace (0x6) nvme SSD
Product: IO/Storage Reporter: Stig Nielsen (stig)
Component: NVMeAssignee: IO/NVME Virtual Default Assignee (io_nvme)
Status: RESOLVED IMPLEMENTED    
Severity: normal CC: kbusch
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.13.13-200 Subsystem:
Regression: No Bisected commit-id:
Attachments: Screenshot
Screenshot runlevel 3

Description Stig Nielsen 2021-09-16 19:49:34 UTC
Created attachment 298845 [details]
Screenshot

This problem is observed on Fedora 34 kernel 5.13.13-200. If booting into earlier 5.9.14-200 or removing the nvme SSD, there's no issue. 

From the drive side, it appears that the host is sending an invalid CNS command (0x6) 

>>>> FAIL Boot (kernel 5.13.13-200) log messages
$ journalctl -k -b -1 |grep nvme -i -A2
Sep 14 16:15:37 S65 kernel: nvme nvme0: pci function 0000:07:00.0
Sep 14 16:15:37 S65 kernel: nvme nvme0: Shutdown timeout set to 60 seconds
Sep 14 16:15:37 S65 kernel: input: PC Speaker as /devices/platform/pcspkr/input/input4
Sep 14 16:15:37 S65 kernel: dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.4)

$ journalctl -b -1 |grep nvme -i -A2 -B2
Sep 14 16:15:37 S65 kernel: i2c i2c-0: 6/8 memory slots populated (from DMI)
Sep 14 16:15:37 S65 kernel: i2c i2c-0: Systems with more than 4 memory slots not supported yet, not instantiating SPD
Sep 14 16:15:37 S65 kernel: nvme nvme0: pci function 0000:07:00.0
Sep 14 16:15:37 S65 kernel: nvme nvme0: Shutdown timeout set to 60 seconds
Sep 14 16:15:37 S65 kernel: input: PC Speaker as /devices/platform/pcspkr/input/input4
Sep 14 16:15:37 S65 kernel: dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.4)
--
Sep 14 16:15:46 S65 smartd[3643]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Sep 14 16:15:46 S65 smartd[3643]: Device: /dev/sda [SAT], --capabilites is set, mail notification may not work.
Sep 14 16:15:46 S65 smartd[3643]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Sep 14 16:15:47 S65 avahi-daemon[3612]: Server startup complete. Host name is S65.local. Local service cookie is 2460285571.
Sep 14 16:15:47 S65 systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.



>>>> OK Boot (kernel 5.4.14-200) log messages
$ journalctl -k -b -0 |grep nvme -i -A2
Sep 14 16:18:04 S65 kernel: nvme nvme0: pci function 0000:07:00.0
Sep 14 16:18:04 S65 kernel: nvme nvme0: Shutdown timeout set to 60 seconds
Sep 14 16:18:04 S65 kernel: nvme nvme0: 32/0/0 default/read/poll queues
Sep 14 16:18:04 S65 kernel: input: PC Speaker as /devices/platform/pcspkr/input/input4
Sep 14 16:18:04 S65 kernel: i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)

$ journalctl -b -0 |grep nvme -i -A2 -B2
Sep 14 16:18:01 S65 kernel: fbcon: Taking over console
Sep 14 16:18:01 S65 kernel: Console: switching to colour frame buffer device 128x48
Sep 14 16:18:04 S65 kernel: nvme nvme0: pci function 0000:07:00.0
Sep 14 16:18:04 S65 kernel: nvme nvme0: Shutdown timeout set to 60 seconds
Sep 14 16:18:04 S65 kernel: nvme nvme0: 32/0/0 default/read/poll queues
Sep 14 16:18:04 S65 systemd-udevd[924]: Using default interface naming scheme 'v247'.
Sep 14 16:18:04 S65 systemd-udevd[924]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
--
Sep 14 16:18:17 S65 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=modprobe@drm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 14 16:18:17 S65 smartd[4405]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Sep 14 16:18:17 S65 smartd[4405]: Device: /dev/nvme0, opened
Sep 14 16:18:17 S65 smartd[4405]: Device: /dev/nvme0, NGD-HN2530-160T1-C, S/N:NC000000, FW:1D438199
Sep 14 16:18:17 S65 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=smartd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 14 16:18:17 S65 smartd[4405]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Sep 14 16:18:17 S65 systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Sep 14 16:18:17 S65 smartd[4405]: Device: /dev/sda [SAT], --capabilites is set, mail notification may not work.
Sep 14 16:18:17 S65 smartd[4405]: Device: /dev/nvme0, --capabilites is set, mail notification may not work.
Sep 14 16:18:17 S65 smartd[4405]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Sep 14 16:18:17 S65 polkitd[4383]: Started polkitd version 0.117
Sep 14 16:18:18 S65 polkitd[4383]: Loading rules from directory /etc/polkit-1/rules.d


Additional (maybe irrelevant) information
This system is configured for virtualization but not yet used... (IOMMU is confirmed enabled)
$ sudo cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet intel_iommu=on rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
Comment 1 Keith Busch 2021-09-16 19:56:54 UTC
CNS 6 is not invalid, though. It's defined by spec, and it's the only defined way a driver can discover length limits for non-r/w commands. There's also no way for the driver to know if the target supports the optional CNS 6 until it tries it. If the target doesn't support it, the correct response is to just return "Invalid Field" response code and the driver will happily carry on.

So what's really happening here? An MCE occurrence due to an optional command attempt sounds like your drive is horribly broken. What model are you using?
Comment 2 Stig Nielsen 2021-09-17 02:58:01 UTC
Thanks for the help. I can switch back and forth to previous kernel and all works fine, including the drive so it's not broken. It may be that the host sends the CNS command continuously as the "activity" led keeps flashing, so some activity keeps going.

To test that an "Invalid Field" is returned from the drive, the following command was asserted to check for correct response (running kernel 5.4.14-200) Although I'm not sure about the correct syntax here, so any suggestions greatly appreciated....

$ sudo nvme admin-passthru -o 6 -4 6 -n 1 /dev/nvme0 -r -l 1
NVMe status: INVALID_FIELD: A reserved coded value or an unsupported value in a defined field(0x2)

$ sudo nvme admin-passthru -o 6 -4 1 -n 1 /dev/nvme0 -r -l 1
NVMe command result:00000000
Comment 3 Keith Busch 2021-09-17 03:12:59 UTC
Thanks for sending those nvme commands, that was the very next thing I was going to ask for. :) Your syntax looks correct to me, and your drive is producing the expected response to the unsupported CNS 6 request, so I am starting to believe your observation is not the CNS 6 identification.

In fact, after reviewing the git commit history, we've been using Identify CNS 6 since 5.13.0, so you're working 5.14.14-200 is issuing this same command sequence. There must something unrelated to that that is breaking your boot. Since your screen shot contains very little information, I think it would be most useful to bisect. Is that something you can do?
Comment 4 Keith Busch 2021-09-17 03:16:49 UTC
Eh, sorry, you wrote 5.4.14, not 5.14.14... 5.4 definitely doesn't send that command.

Right now, I don't see any path in the driver that would issue that identification repeatedly.

Are you able to revert the commit that introduced this command? If you're building from git source, it should be commit 5befc7c26e5a98cd49789fb1beb52c62bd472dba.
Comment 5 Keith Busch 2021-09-17 03:22:41 UTC
And one last comment... On your command line syntax, the 'nvme admin-passthru' parameter value for '-l' should be 4096 since that's the number of bytes this command is supposed to transfer. I suspect that difference won't matter here, though.
Comment 6 Stig Nielsen 2021-09-17 06:06:34 UTC
Created attachment 298861 [details]
Screenshot runlevel 3
Comment 7 Stig Nielsen 2021-09-17 06:08:25 UTC
Thanks for your help. Yes -l 4096 doesn't make a difference. 

I tried to boot into runlevel 3, and were actually able to login and assert "dmesg -w".
Last output before the mce: CPUs not responding..etc... is 
nvme nvme0: I/O 18 QID 0 timeout, disable controller (see picture)

I just updated the kernels from repositories, but I'll try to see if I can build from git. Thanks again
Comment 8 Keith Busch 2021-09-18 03:04:12 UTC
The new screen shot indicates a command during initialization times out, and the MCE follows shortly after. It's not clear at this point if this timeout is the new identify command, but no matter what command it is, the drive is not producing a response for it. If we assume it is the recently added identification command, something about the drive is different during boot compared to when you manually submitted the command from user space later.

I've tried to synthesize this error condition, and it looks to me that the driver is handling it correctly. I've no idea what could be triggering this.
Comment 9 Stig Nielsen 2022-08-12 17:06:12 UTC
The problem has been resolved. The SSD vendor identified the problem and changed the namespace transport mechanism in their firmware. 
Closing this issue