Bug 201463

Summary: 4.17.x kernels cause "nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x810" on MacBook Air
Product: IO/Storage
Reporter: Lonni J Friedman (netllama)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: NEW
Severity: blocking
CC: anothersname, florian, netllama
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.17.x, 4.18.x, 5.x
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: system inventory, including dmesg with failure

Description Lonni J Friedman 2018-10-18 00:49:57 UTC
Created attachment 279085: system inventory, including dmesg with failure

Ever since Fedora released 4.17.x kernels, my MacBook Air (7,1 with NVMe *not* SATA) hangs and crashes during boot when attempting to use the nvme disk controller, with messages such as:

[   32.742056] llamamac kernel: nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x810
[   32.767211] llamamac kernel: nvme nvme0: detected Apple NVMe controller, set queue depth=2 to work around controller resets
[   48.368123] llamamac kernel: nvme nvme0: Device not ready; aborting reset
[   48.368181] llamamac kernel: nvme nvme0: Removing after probe failure status: -19
[  132.401562] llamamac dracut-initqueue[330]: Warning: dracut-initqueue timeout - starting timeout script
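
For readers of the log above, the two register values in the first message can be decoded bit by bit. This is a minimal Python sketch, not kernel code. The bit names are assumptions taken from the NVMe Controller Status (CSTS) register layout and the PCI Status register layout and should be checked against the specifications; the helper names `decode_csts` and `decode_pci_status` are hypothetical and exist only for this illustration.

```python
# Illustrative only: bit names are assumptions based on the NVMe CSTS and
# PCI Status register layouts, not taken from this bug report.
CSTS_FLAGS = {
    0x01: "RDY (controller ready)",
    0x02: "CFS (controller fatal status)",
    0x10: "NSSRO (NVM subsystem reset occurred)",
    0x20: "PP (processing paused)",
}

PCI_STATUS_FLAGS = {
    0x0008: "interrupt status",
    0x0010: "capabilities list present",
    0x0020: "66 MHz capable",
    0x0080: "fast back-to-back capable",
    0x0100: "master data parity error",
    0x0800: "signaled target abort",
    0x1000: "received target abort",
    0x2000: "received master abort",
    0x4000: "signaled system error",
    0x8000: "detected parity error",
}


def decode_flags(value, table):
    """Return the names of all single-bit flags in `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_csts(value):
    """Decode the single-bit fields of an NVMe CSTS value (hypothetical helper)."""
    return decode_flags(value, CSTS_FLAGS)


def decode_pci_status(value):
    """Decode the single-bit fields of a PCI Status value (hypothetical helper)."""
    return decode_flags(value, PCI_STATUS_FLAGS)


if __name__ == "__main__":
    print("CSTS=0x3 ->", decode_csts(0x3))
    print("PCI_STATUS=0x810 ->", decode_pci_status(0x810))
```

Under these assumptions, CSTS=0x3 would mean the controller reports both "ready" and "fatal status", and PCI_STATUS=0x810 would mean "capabilities list present" plus "signaled target abort".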

I've tried booting with nvme_core.default_ps_max_latency_us=5500, but that had no effect.  I've also tested 4.18.x kernels, and the failure is unchanged.
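
When testing a boot parameter like this one, it can help to confirm that the parameter actually reached the kernel command line. The following Python sketch is only an illustration: `find_param` is a hypothetical helper, it splits on whitespace and so ignores quoted values, and it assumes the running command line is readable at /proc/cmdline.

```python
PARAM = "nvme_core.default_ps_max_latency_us"


def find_param(cmdline, name=PARAM):
    """Return the last value given for `name` on a kernel command line, or None.

    Hypothetical helper: simple whitespace splitting, no quoting support.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # a later occurrence replaces an earlier one
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as fh:
            print(PARAM, "=", find_param(fh.read()))
    except OSError as exc:
        print("could not read /proc/cmdline:", exc)
```

If the helper returns None, the parameter was not on the command line for that boot, and a "no effect" result would say nothing about the setting itself.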

Vanilla kernel.org kernels of the same versions are reportedly affected in the same way.  This bug is *NOT* present on 4.16.x (and older) kernels.  It was definitely introduced in 4.17.x.

Comment 1 Anthony Name 2018-10-28 12:50:03 UTC
I'm seeing the same problem since the 4.17.x kernels, using a Toshiba XG3 thnsn51t02du7 NVMe drive.

Kernel 4.15 works without issue.

I'm running Fedora on an x64 platform. I cannot post logs because the NVMe drive is the only drive in the system, and when it fails the logs cannot be saved.

I've had to set the system aside until there is a fix, so it is now effectively a dev system and I'm happy to run tests on it.

Comment 2 Lonni J Friedman 2018-10-28 18:39:35 UTC
@Anthony, you should be able to get the logs off the system with a USB stick.  Eventually the boot process will time out and drop you to a primitive shell, where you can inspect the in-memory logs and mount a USB stick to copy them over.  That's what I was able to do.

Comment 3 Anthony Name 2019-05-17 10:49:38 UTC
@Lonni

Did this ever get fixed?

I had to pull the NVMe drive from the box because I needed the box for something else, and it would be useful if I could now put the drive back in.

Comment 4 Lonni J Friedman 2019-05-17 14:03:38 UTC
Nope, not fixed.  Sadly, this hardware has been abandoned by the Linux community.