Bug 220015
Summary: | [BISECTED] NVME re-read ANA log page patch causes boot hang in 6.15.0-rc2 | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Todd Brandt (todd.e.brandt) |
Component: | NVMe | Assignee: | IO/NVME Virtual Default Assignee (io_nvme) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | hare, linuxnet111, tr.ml |
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 6.15.0-rc2 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | 62baf70c327444338c34703c71aa8cc8e4189bd6 |
Bug Depends on: | |||
Bug Blocks: | 178231 | ||
Attachments: | nvme-boot-error-console.txt |
Description
Todd Brandt
2025-04-15 21:49:30 UTC
The key piece of console error info is this:

[ 35.326061] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x2010
[ 35.382840] nvme0n1: I/O Cmd(0x2) @ LBA 0, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 35.391169] I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 35.400504] nvme nvme0: Failed to get ANA log: -4
[ 35.456208] nvme nvme0: 8/0/0 default/read/poll queues
[ 35.462684] nvme nvme0: Ignoring bogus Namespace Identifiers
[ 35.498123] DMAR: DRHD: handling fault status reg 2
[ 35.503428] DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0x0 [fault reason 0x06] PTE Read access is not set
[ 35.515010] DMAR: Dump dmar1 table entries for IOVA 0x0
[ 35.520640] DMAR: root entry: 0x0000000105038001
[ 35.520641] DMAR: context entry: hi 0x0000000000000a02, low 0x0000000105037001
[ 35.533282] DMAR: pte level: 4, pte value: 0x0000000101754003
[ 35.539427] DMAR: pte level: 3, pte value: 0x0000000000000000
[ 35.545577] DMAR: page table not present at level 2

OK, I found the following proposed fix on LKML, trying it now:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b502ac07483b..eb6ea8acb3cc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4300,7 +4300,7 @@ static void nvme_scan_work(struct work_struct *work)
 	if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
 		nvme_queue_scan(ctrl);
 #ifdef CONFIG_NVME_MULTIPATH
-	else
+	else if (ctrl->ana_log_buf)
 		/* Re-read the ANA log page to not miss updates */
 		queue_work(nvme_wq, &ctrl->ana_work);
 #endif

This patch seems to fix things, so once it's available in upstream I'll close this issue.

*** Bug 220007 has been marked as a duplicate of this bug. ***

Can confirm it happens on a Lenovo ThinkPad L14 as well.

RockT: does that above patch fix it? It's fixed it on multiple machines here. I have 3 that boot crashed and were not working, and one HP Spectre that couldn't suspend because nvme refused to suspend. It seems this issue was pretty broad in its effects. The patch fixed it on all 4.

I'm using the Ubuntu mainline kernel. I was able to patch the source but could not rebuild. All the documentation I found seems outdated. Sorry :(
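
For readers who want to see what the one-line change in the diff above does, here is a minimal user-space sketch of the branch in nvme_scan_work() before and after the patch. It is not kernel code. The struct ctrl_model type, the scan_followup enum, and both function names are hypothetical and exist only for this illustration. The sketch also assumes that a NULL ana_log_buf means no ANA log buffer was set up for the controller, which is what the added condition suggests but which this report does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pieces of struct nvme_ctrl state
 * that the patched branch looks at. Not the kernel's definition. */
struct ctrl_model {
    bool ns_changed;   /* models NVME_AER_NOTICE_NS_CHANGED in ctrl->events */
    void *ana_log_buf; /* assumed NULL when no ANA log buffer exists */
};

/* What the scan work would do after the namespace scan finishes. */
enum scan_followup {
    FOLLOWUP_NONE,     /* nothing further is queued */
    FOLLOWUP_RESCAN,   /* models nvme_queue_scan(ctrl) */
    FOLLOWUP_ANA_WORK  /* models queue_work(nvme_wq, &ctrl->ana_work) */
};

/* Before the patch: a bare "else" queues the ANA work for every
 * controller that has no pending namespace-changed notice. */
static enum scan_followup followup_unpatched(const struct ctrl_model *c)
{
    if (c->ns_changed)
        return FOLLOWUP_RESCAN;
    return FOLLOWUP_ANA_WORK;
}

/* After the patch: the ANA work is queued only when a log buffer exists. */
static enum scan_followup followup_patched(const struct ctrl_model *c)
{
    if (c->ns_changed)
        return FOLLOWUP_RESCAN;
    else if (c->ana_log_buf)
        return FOLLOWUP_ANA_WORK;
    return FOLLOWUP_NONE;
}
```

In this model, a controller with ns_changed == false and ana_log_buf == NULL gets FOLLOWUP_ANA_WORK from the unpatched function and FOLLOWUP_NONE from the patched one. That case would match the "Failed to get ANA log: -4" line in the console output, under the assumption stated above.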