Bug 220007

Summary: I/O error, dev nvme0n1
Product: IO/Storage
Reporter: doru iorgulescu (linuxnet111)
Component: NVMe
Assignee: IO/NVME Virtual Default Assignee (io_nvme)
Status: RESOLVED DISTRO_KERNEL
Severity: normal
CC: kbusch, walter.moeller
Priority: P3
Hardware: All
OS: Linux
Kernel Version: 6.15.0-rc2
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Screen, lspci-v

Description doru iorgulescu 2025-04-13 05:23:43 UTC
Created attachment 307957
Screen

Linux kernel 6.15.0-rc2 does not boot; it reports:
I/O error, dev nvme0n1
Comment 1 doru iorgulescu 2025-04-13 08:14:39 UTC
The SATA controller:
00:17.0 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 31)
	Subsystem: Dell Device 0708
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 134
	Memory at dd130000 (32-bit, non-prefetchable) [size=8K]
	Memory at dd134000 (32-bit, non-prefetchable) [size=256]
	I/O ports at f090 [size=8]
	I/O ports at f080 [size=4]
	I/O ports at f060 [size=32]
	Memory at dd133000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] SATA HBA v1.0
	Kernel driver in use: ahci
	Kernel modules: ahci
Comment 2 Artem S. Tashkinov 2025-04-14 16:49:26 UTC
Could you please bisect?

https://docs.kernel.org/admin-guide/bug-bisect.html
Comment 3 Keith Busch 2025-04-14 16:53:37 UTC
Just FYI, the "SATA controller" in the lspci output is not related to an nvme device. The nvme devices show up directly on lspci, not behind a SATA controller.
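
To check this on a given machine, the `lspci -v` output can be filtered by PCI class name. Below is a minimal Python sketch, assuming the output has been saved as text and that NVMe controllers carry the class name `Non-Volatile memory controller` as lspci prints it. The function name `nvme_controllers` is made up for this illustration.

```python
import re

# Header line of one lspci entry:
#   "<bus:dev.fn> <class name>: <device description>"
# An optional "dddd:" PCI domain prefix (as printed by `lspci -D`) is allowed.
# Indented detail lines (Subsystem, Flags, ...) do not match this pattern.
ENTRY = re.compile(
    r"^(?P<slot>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<cls>[^:]+): (?P<desc>.*)$"
)

# Assumption: this is the class name lspci uses for NVMe controllers.
NVME_CLASS = "Non-Volatile memory controller"


def nvme_controllers(lspci_text):
    """Return (slot, description) pairs for every NVMe-class entry."""
    found = []
    for line in lspci_text.splitlines():
        m = ENTRY.match(line)
        if m and m.group("cls") == NVME_CLASS:
            found.append((m.group("slot"), m.group("desc")))
    return found
```

Given the text of `lspci -v`, this returns one (slot, description) pair per NVMe controller. An empty list would mean no NVMe controller was enumerated on the PCI bus; a SATA/RAID controller entry is skipped because its class name differs.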
Comment 4 Artem S. Tashkinov 2025-04-15 08:28:13 UTC
*** Bug 220014 has been marked as a duplicate of this bug. ***
Comment 5 doru iorgulescu 2025-04-15 09:40:06 UTC
I have attached lspci-v.
I don't find nvme!
Comment 6 doru iorgulescu 2025-04-15 09:42:02 UTC
Created attachment 307965
lspci-v
Comment 7 doru iorgulescu 2025-04-15 09:50:56 UTC
The problem device:
3e:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd PM963 2.5" NVMe PCIe SSD
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at dd400000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at c000 [size=256]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, IntMsgNum 0
        Capabilities: [b0] MSI-X: Enable+ Count=9 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158] Power Budgeting <?>
        Capabilities: [168] Secondary PCI Express
        Capabilities: [188] Latency Tolerance Reporting
        Capabilities: [190] L1 PM Substates
        Kernel driver in use: nvme
        Kernel modules: nvme
Comment 8 doru iorgulescu 2025-04-15 11:54:33 UTC
I think that the commit is the problem
Merge tag 'block-6.15-20250411' of git://git.kernel.dk/linux
Comment 9 Keith Busch 2025-04-15 16:52:11 UTC
(In reply to doru iorgulescu from comment #8)
> I think that the commit is the problem
> Merge tag 'block-6.15-20250411' of git://git.kernel.dk/linux

You bisected to a merge commit? That doesn't seem right, but it's in the right subsystem at least.

So this worked in 4.16-rc1 but not rc2? There is a pending fix that was mentioned for this device here:

https://lore.kernel.org/linux-nvme/94478dce-05e3-4cc8-8ef2-467e00109575@grimberg.me/T/#t

The failure described there doesn't really match yours, but it's worth a shot.
Comment 10 Keith Busch 2025-04-15 18:33:01 UTC
(In reply to Keith Busch from comment #9)
> So this worked in 4.16-rc1 but not rc2?

Oops, I meant 6.15-rc1!
Comment 11 doru iorgulescu 2025-04-16 05:37:39 UTC
Thank You
The patch:
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b502ac07483b..eb6ea8acb3cc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4300,7 +4300,7 @@ static void nvme_scan_work(struct work_struct *work)
 	if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
 		nvme_queue_scan(ctrl);
 #ifdef CONFIG_NVME_MULTIPATH
-	else
+	else if (ctrl->ana_log_buf)
 		/* Re-read the ANA log page to not miss updates */
 		queue_work(nvme_wq, &ctrl->ana_work);
 #endif

I will try it!
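
The effect of the one-line change can be modelled outside the kernel. The sketch below is a toy Python model, not kernel code: `Ctrl`, `followup_before` and `followup_after` are hypothetical names, and it assumes CONFIG_NVME_MULTIPATH is enabled so that the `#ifdef` branch is compiled in. It only mirrors the branch shown in the hunk, that is, which follow-up work gets queued at that point of nvme_scan_work().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ctrl:
    """Toy stand-in for struct nvme_ctrl; only the two things the hunk tests."""
    ns_changed: bool              # models NVME_AER_NOTICE_NS_CHANGED set in ctrl->events
    ana_log_buf: Optional[bytes]  # None models a NULL ctrl->ana_log_buf


def followup_before(ctrl: Ctrl) -> Optional[str]:
    """Branch as it was before the patch: plain `else`."""
    if ctrl.ns_changed:
        return "scan"        # nvme_queue_scan(ctrl)
    return "ana_work"        # queue_work(nvme_wq, &ctrl->ana_work)


def followup_after(ctrl: Ctrl) -> Optional[str]:
    """Branch with the patch: `else if (ctrl->ana_log_buf)`."""
    if ctrl.ns_changed:
        return "scan"
    if ctrl.ana_log_buf is not None:
        return "ana_work"
    return None              # nothing is queued for a controller without an ANA log buffer
```

In this model the two versions differ in exactly one case: no namespace-changed notice pending and no ANA log buffer. The old branch still queues `ana_work` there, while the patched branch queues nothing.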
Comment 12 doru iorgulescu 2025-04-16 10:18:38 UTC
It is OK on Linux kernel 6.15.0-rc3 and linux-next-20250416.
Thank you very much!
Comment 13 Artem S. Tashkinov 2025-04-16 11:44:17 UTC

*** This bug has been marked as a duplicate of bug 220015 ***
Comment 14 doru iorgulescu 2025-04-19 06:52:31 UTC
Today it is RESOLVED:

Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linux

- NVMe pull via Christoph:
      - fix scan failure for non-ANA multipath controllers (Hannes
        Reinecke)
      - fix multipath sysfs links creation for some cases (Hannes
        Reinecke)
      - PCIe endpoint fixes (Damien Le Moal)
      - use NULL instead of 0 in the auth code (Damien Le Moal)
Thank you very much!