216536 – Kingston KC3000 PCIe 4.0 SSD - Abort status: 0x371

Bug 216536 - Kingston KC3000 PCIe 4.0 SSD - Abort status: 0x371

Summary: Kingston KC3000 PCIe 4.0 SSD - Abort status: 0x371

Status:	NEW

Alias:	None

Product:	IO/Storage
Classification:	Unclassified
Component:	NVMe (show other bugs)
Hardware:	All Linux

Importance:	P1 normal
Assignee:	IO/NVME Virtual Default Assignee

URL:
Keywords:

Depends on:
Blocks:

Reported:	2022-09-28 12:20 UTC by hastegeldhastewas
Modified:	2022-10-20 17:30 UTC (History)
CC List:	1 user (show)

See Also:
Kernel Version:	6.0.0-rc7-1-mainline
Subsystem:
Regression:	No
Bisected commit-id:

Attachments
tar.xz containing outputs of different commands (35.09 KB, application/x-xz) 2022-09-28 12:21 UTC, hastegeldhastewas	Details
dmesg output (97.52 KB, text/plain) 2022-09-28 12:22 UTC, hastegeldhastewas	Details
dmesg \| grep nvme (1.94 KB, text/plain) 2022-09-28 12:22 UTC, hastegeldhastewas	Details
fdisk -l (748 bytes, text/plain) 2022-09-28 12:22 UTC, hastegeldhastewas	Details
journalctl -b (155.76 KB, text/plain) 2022-09-28 12:22 UTC, hastegeldhastewas	Details
lspci -vv (59.14 KB, text/plain) 2022-09-28 12:23 UTC, hastegeldhastewas	Details
smartctl -a (277 bytes, text/plain) 2022-09-28 12:23 UTC, hastegeldhastewas	Details
proposed fix (654 bytes, patch) 2022-09-28 14:42 UTC, Keith Busch	Details \| Diff
kernel with nvme patch and pci=nomsi parameter (3.20 MB, image/jpeg) 2022-10-01 13:13 UTC, hastegeldhastewas	Details
after ssd replacement 3 outputs of "journalctl -b" from 3 different boots (35.16 KB, application/x-xz) 2022-10-06 15:35 UTC, hastegeldhastewas	Details
trace prints on completion (1.30 KB, patch) 2022-10-06 17:14 UTC, Keith Busch	Details \| Diff
Screenshots and export from kingston ssd manager (420.61 KB, application/x-xz) 2022-10-06 18:16 UTC, hastegeldhastewas	Details
Screenshot from Crystaldiskinfo (153.53 KB, image/png) 2022-10-06 18:16 UTC, hastegeldhastewas	Details
Screenshot from AS SSD Benchmark (63.12 KB, image/png) 2022-10-06 18:17 UTC, hastegeldhastewas	Details
nvmetrace kernel "journalctl -b" (163.05 KB, text/plain) 2022-10-06 23:33 UTC, hastegeldhastewas	Details
nvmetrace kernel "zcat /proc/config.gz" (251.39 KB, text/plain) 2022-10-06 23:34 UTC, hastegeldhastewas	Details
nvmetrace kernel "cat /sys/kernel/debug/tracing/trace" (152.06 KB, text/plain) 2022-10-06 23:35 UTC, hastegeldhastewas	Details
All the quirks (1.30 KB, patch) 2022-10-07 02:40 UTC, Keith Busch	Details \| Diff
maxquirk kernel "dmesg" (96.38 KB, text/plain) 2022-10-07 23:55 UTC, hastegeldhastewas	Details
maxquirk kernel "journalctl -b" (157.42 KB, text/plain) 2022-10-07 23:55 UTC, hastegeldhastewas	Details
other computer, same kernel, same ssd, but it works (35.55 KB, application/x-xz) 2022-10-19 13:32 UTC, hastegeldhastewas	Details
other computer, same kernel, same ssd, but it works + tracing patch (56.81 KB, application/x-xz) 2022-10-19 13:34 UTC, hastegeldhastewas	Details
disable ro for kingston (602 bytes, patch) 2022-10-19 15:12 UTC, Keith Busch	Details \| Diff
kernel without relaxed ordering fails to boot (3.12 MB, image/jpeg) 2022-10-19 19:26 UTC, hastegeldhastewas	Details
logs_for_four_kernels_some_with_no_relaxed_ordering (64.80 KB, application/x-xz) 2022-10-20 11:46 UTC, hastegeldhastewas	Details
output of lspci -nn (2.98 KB, text/plain) 2022-10-20 11:58 UTC, hastegeldhastewas	Details
logs_kernels_with_no_relaxed_ordering quirk not working (44.20 KB, application/x-xz) 2022-10-20 13:47 UTC, hastegeldhastewas	Details
logs_setpci_last_try_linux-6.1.0-rc1-noro (54.91 KB, application/gzip) 2022-10-20 15:20 UTC, hastegeldhastewas	Details
Add an attachment (proposed patch, testcase, etc.)

Description hastegeldhastewas 2022-09-28 12:20:43 UTC

Hello,

I installed a "Kingston KC3000 PCIe 4.0 SSD" of 4 Terabyte size in my new Ryzen 6000 Laptop "Lenovo Yoga Slim 7 Pro X".

The original SSD works fine, but the new Kingston one results in a delayed boot and produces errors in the kernel log. Each operation on the ssd is causing a delay. For example I need to wait for some time until "fdisk -l" responds back.

"smartctl -a" doesn't work at all:
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Interrupted system call


An excerpt of the most relevant kernel logs "dmesg | grep nvme":
[    0.815616] nvme 0000:02:00.0: platform quirk: setting simple suspend
[    0.815645] nvme nvme0: pci function 0000:02:00.0
[    1.281525] nvme nvme0: Shutdown timeout set to 10 seconds
[    1.285707] nvme nvme0: 16/0/0 default/read/poll queues
[   32.150395] nvme nvme0: invalid id 0 completed on queue 7
[   32.150404] nvme nvme0: I/O 512 (Read) QID 7 timeout, aborting
[   32.150651] nvme nvme0: Abort status: 0x0
[   62.870472] nvme nvme0: I/O 512 QID 7 timeout, reset controller
[  124.310467] nvme nvme0: invalid id 0 completed on queue 0
[  124.310490] nvme nvme0: invalid id 0 completed on queue 0
[  124.310505] nvme nvme0: invalid id 0 completed on queue 0
[  124.310520] nvme nvme0: invalid id 0 completed on queue 0
[  124.310534] nvme nvme0: invalid id 0 completed on queue 0
[  124.310548] nvme nvme0: invalid id 0 completed on queue 0
[  124.310562] nvme nvme0: invalid id 0 completed on queue 0
[  124.310576] nvme nvme0: invalid id 0 completed on queue 0
[  124.310590] nvme nvme0: invalid id 0 completed on queue 0
[  124.310605] nvme nvme0: invalid id 0 completed on queue 0
[  124.310619] nvme nvme0: invalid id 0 completed on queue 0
[  124.310634] nvme nvme0: invalid id 0 completed on queue 0
[  124.310648] nvme nvme0: invalid id 0 completed on queue 0
[  124.310662] nvme nvme0: invalid id 0 completed on queue 0
[  124.310676] nvme nvme0: invalid id 0 completed on queue 0
[  124.310690] nvme nvme0: invalid id 0 completed on queue 0
[  124.310695] nvme nvme0: I/O 8 QID 0 timeout, reset controller
[  124.327412] nvme0n1: Read(0x2) @ LBA 8001573376, 8 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
[  124.327464] I/O error, dev nvme0n1, sector 8001573376 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[  124.359883] nvme nvme0: Shutdown timeout set to 10 seconds
[  124.362027] nvme nvme0: 16/0/0 default/read/poll queues
[  144.913853] nvme nvme0: invalid id 0 completed on queue 2
[  155.030534] nvme nvme0: I/O 704 (Read) QID 2 timeout, aborting
[  185.750213] nvme nvme0: I/O 704 QID 2 timeout, reset controller
[  185.750274] nvme nvme0: invalid id 0 completed on queue 0
[  185.787269] nvme0n1: Read(0x2) @ LBA 8001573376, 8 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
[  185.787363] I/O error, dev nvme0n1, sector 8001573376 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[  185.787471] nvme nvme0: Abort status: 0x371
[  185.802599] nvme nvme0: Shutdown timeout set to 10 seconds
[  185.804537] nvme nvme0: 16/0/0 default/read/poll queues
[  267.670245] nvme nvme0: request 0x0 genctr mismatch (got 0x0 expected 0x1)
[  267.670332] nvme nvme0: invalid id 0 completed on queue 0
[  267.670343] nvme nvme0: I/O 24 QID 0 timeout, reset controller
[  267.710387] nvme0: Identify(0x6), Host Aborted Command (sct 0x3 / sc 0x71)
[  267.726065] nvme nvme0: Shutdown timeout set to 10 seconds
[  267.729259] nvme nvme0: 16/0/0 default/read/poll queues
[  806.976986] nvme nvme0: invalid id 0 completed on queue 7
[  806.977004] nvme nvme0: I/O 513 (Read) QID 7 timeout, aborting
[  806.977236] nvme nvme0: Abort status: 0x0

The attachment is a .tar.xz containing command outputs of:
"smartctl -a ", "fdisk -l", "dmesg | grep nvme", "lspci -vv", "dmesg", "journalctl -b"

My main question:
Is it possible to determine if this is a kernel issue or is it a hardware issue and can it be solved?

Comment 1 hastegeldhastewas 2022-09-28 12:21:26 UTC

Created attachment 301876 [details]
tar.xz containing outputs of different commands

The attachment is a .tar.xz containing command outputs of:
"smartctl -a ", "fdisk -l", "dmesg | grep nvme", "lspci -vv", "dmesg", "journalctl -b"

Comment 2 hastegeldhastewas 2022-09-28 12:22:04 UTC

Created attachment 301877 [details]
dmesg output

Comment 3 hastegeldhastewas 2022-09-28 12:22:25 UTC

Created attachment 301878 [details]
dmesg | grep nvme

dmesg | grep nvme

Comment 4 hastegeldhastewas 2022-09-28 12:22:43 UTC

Created attachment 301879 [details]
fdisk -l

Comment 5 hastegeldhastewas 2022-09-28 12:22:57 UTC

Created attachment 301880 [details]
journalctl -b

Comment 6 hastegeldhastewas 2022-09-28 12:23:11 UTC

Created attachment 301881 [details]
lspci -vv

Comment 7 hastegeldhastewas 2022-09-28 12:23:25 UTC

Created attachment 301882 [details]
smartctl -a

Comment 8 Keith Busch 2022-09-28 13:33:37 UTC

It looks like your device is out of compliance with respect to host command id's. I can send a patch later today that will turn off the generation sequence for this model and see if that fixes your observations.

Comment 9 hastegeldhastewas 2022-09-28 13:55:43 UTC

Thank you for your response.
I standby and make myself mentally ready to compile a kernel.

What consequences I have to expect if your workaround works? 
Making the ssd normally usable again is the goal, but what about other features like the power management?

Comment 10 Keith Busch 2022-09-28 14:02:22 UTC

There are no other side effects from disabling the sequencer. It's just a defensive mechanism in the driver to detect double completions. It's a relatively new driver feature because we didn't believe it was necessary until it actually happened in the wild. The consequences of a double competition are so awful that we were practically compelled to detect it. In general, though, it's unnecessary paranoia for most devices, so it's harmless to turn it off.

Comment 11 Keith Busch 2022-09-28 14:42:24 UTC

Created attachment 301885 [details]
proposed fix

Comment 12 hastegeldhastewas 2022-09-28 19:10:53 UTC

I compiled a new kernel from a freshly pulled linux git repository. 
Some of the messages regarding nvme disappeared, but "gdisk /dev/nvme0n1" and "fdsik -l" are still stuck aswell as "smartctl -a /dev/nvme0n1".
I also have the problem that my whole system freezes after some time.

The kernel log gives the following messages after boot:

archusb ~# journalctl -b | grep nvme
Sep 28 20:59:03 archusb kernel: Linux version 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty (user@headquarter) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0) #1 SMP PREEMPT_DYNAMIC Wed Sep 28 16:54:32 CEST 2022
Sep 28 20:59:03 archusb kernel: Command line: initrd=\initramfs-linux-nvmefix.img root=UUID=6d6c3551-a53c-425b-ab29-9cf52e1943f7 rw
Sep 28 20:59:03 archusb kernel: Kernel command line: initrd=\initramfs-linux-nvmefix.img root=UUID=6d6c3551-a53c-425b-ab29-9cf52e1943f7 rw
Sep 28 20:59:03 archusb kernel: usb usb1: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb2: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb3: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb4: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb5: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb6: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb7: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb8: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb9: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: usb usb10: Manufacturer: Linux 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty xhci-hcd
Sep 28 20:59:03 archusb kernel: nvme 0000:02:00.0: platform quirk: setting simple suspend
Sep 28 20:59:03 archusb kernel: nvme nvme0: pci function 0000:02:00.0
Sep 28 20:59:03 archusb kernel: nvme nvme0: Shutdown timeout set to 10 seconds
Sep 28 20:59:03 archusb kernel: nvme nvme0: 16/0/0 default/read/poll queues



And the following messages after I executed "gdisk /dev/nvme0n1" (3 times in a row, last time my system freezed):
Sep 28 20:59:35 archusb kernel: nvme nvme0: invalid id 0 completed on queue 1
Sep 28 20:59:35 archusb kernel: nvme nvme0: I/O 384 (Read) QID 1 timeout, aborting
Sep 28 20:59:35 archusb kernel: nvme nvme0: Abort status: 0x0
Sep 28 21:00:04 archusb systemd-udevd[375]: nvme0n1: Worker [425] processing SEQNUM=2739 is taking a long time
Sep 28 21:00:06 archusb kernel: nvme nvme0: I/O 384 QID 1 timeout, reset controller
Sep 28 21:00:06 archusb kernel: nvme nvme0: invalid id 0 completed on queue 15
Sep 28 21:00:06 archusb kernel: nvme0n1: Read(0x2) @ LBA 8001573376, 8 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
Sep 28 21:00:06 archusb kernel: I/O error, dev nvme0n1, sector 8001573376 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
Sep 28 21:00:06 archusb kernel: nvme0n1: Read(0x2) @ LBA 0, 32 blocks, Host Aborted Command (sct 0x3 / sc 0x71)
Sep 28 21:00:06 archusb kernel: I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 2
Sep 28 21:00:06 archusb kernel: nvme nvme0: Shutdown timeout set to 10 seconds
Sep 28 21:00:06 archusb kernel: nvme nvme0: 16/0/0 default/read/poll queues
Sep 28 21:01:46 archusb kernel: nvme nvme0: I/O 0 (Read) QID 13 timeout, aborting
Sep 28 21:01:46 archusb kernel: nvme nvme0: Abort status: 0x0
Sep 28 21:02:24 archusb kernel: nvme nvme0: invalid id 0 completed on queue 12
Sep 28 21:02:24 archusb kernel: nvme nvme0: I/O 960 (Read) QID 12 timeout, aborting
Sep 28 21:02:24 archusb kernel: nvme nvme0: Abort status: 0x0
Sep 28 21:02:55 archusb kernel: nvme nvme0: I/O 960 QID 12 timeout, reset controller
Sep 28 21:03:56 archusb kernel: nvme nvme0: I/O 4 QID 0 timeout, reset controller
Sep 28 21:03:56 archusb kernel:  videobuf2_common tpm_crb snd_soc_acpi aesni_intel typec_ucsi vfat sparse_keymap tpm_tis snd_timer fat crypto_simd videodev snd tpm_tis_core platform_profile ecdh_generic cryptd rapl pcspkr typec mc cec crc16 snd_pci_acp3x soundcore rfkill ccp i2c_piix4 k10temp roles tpm i2c_hid_acpi wmi mac_hid i2c_hid rng_core amd_pmc acpi_cpufreq video acpi_tad fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq uas usb_storage usbhid serio_raw atkbd nvme libps2 vivaldi_fmap nvme_core i8042 xhci_pci crc32c_intel xhci_pci_renesas serio
Sep 28 21:03:56 archusb kernel: CPU: 12 PID: 92 Comm: ksoftirqd/12 Not tainted 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty #1 7ecd7809f8f37e0153b2dde865e868f3fbcc257c
Sep 28 21:03:56 archusb kernel:  nvme_del_cq_end+0x27/0x40 [nvme a5ac8cd8b1b0917064a5d222a5652fc5a9ad93f3]
Sep 28 21:04:24 archusb kernel:  videobuf2_common tpm_crb snd_soc_acpi aesni_intel typec_ucsi vfat sparse_keymap tpm_tis snd_timer fat crypto_simd videodev snd tpm_tis_core platform_profile ecdh_generic cryptd rapl pcspkr typec mc cec crc16 snd_pci_acp3x soundcore rfkill ccp i2c_piix4 k10temp roles tpm i2c_hid_acpi wmi mac_hid i2c_hid rng_core amd_pmc acpi_cpufreq video acpi_tad fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq uas usb_storage usbhid serio_raw atkbd nvme libps2 vivaldi_fmap nvme_core i8042 xhci_pci crc32c_intel xhci_pci_renesas serio
Sep 28 21:04:24 archusb kernel: CPU: 12 PID: 92 Comm: ksoftirqd/12 Tainted: G        W          6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty #1 7ecd7809f8f37e0153b2dde865e868f3fbcc257c
Sep 28 21:04:52 archusb kernel:  videobuf2_common tpm_crb snd_soc_acpi aesni_intel typec_ucsi vfat sparse_keymap tpm_tis snd_timer fat crypto_simd videodev snd tpm_tis_core platform_profile ecdh_generic cryptd rapl pcspkr typec mc cec crc16 snd_pci_acp3x soundcore rfkill ccp i2c_piix4 k10temp roles tpm i2c_hid_acpi wmi mac_hid i2c_hid rng_core amd_pmc acpi_cpufreq video acpi_tad fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq uas usb_storage usbhid serio_raw atkbd nvme libps2 vivaldi_fmap nvme_core i8042 xhci_pci crc32c_intel xhci_pci_renesas serio
Sep 28 21:04:52 archusb kernel: CPU: 12 PID: 92 Comm: ksoftirqd/12 Tainted: G        W    L     6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty #1 7ecd7809f8f37e0153b2dde865e868f3fbcc257c
Sep 28 21:04:52 archusb kernel: RIP: 0010:nvme_pci_complete_rq+0x4/0x90 [nvme]
Sep 28 21:04:56 archusb kernel: CPU: 12 PID: 92 Comm: ksoftirqd/12 Tainted: G        W    L     6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty #1 7ecd7809f8f37e0153b2dde865e868f3fbcc257c
Sep 28 21:04:56 archusb kernel: RIP: 0010:nvme_error_status+0x1c/0x150 [nvme_core]
Sep 28 21:04:56 archusb kernel:  nvme_complete_rq+0x46/0x2e0 [nvme_core 270b20fd7b0b171d3fb7775cf55fe067099f26f1]
Sep 28 21:05:08 archusb kernel:       Tainted: G        W    L     6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty #1
Sep 28 21:05:24 archusb kernel:  videobuf2_common tpm_crb snd_soc_acpi aesni_intel typec_ucsi vfat sparse_keymap tpm_tis snd_timer fat crypto_simd videodev snd tpm_tis_core platform_profile ecdh_generic cryptd rapl pcspkr typec mc cec crc16 snd_pci_acp3x soundcore rfkill ccp i2c_piix4 k10temp roles tpm i2c_hid_acpi wmi mac_hid i2c_hid rng_core amd_pmc acpi_cpufreq video acpi_tad fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq uas usb_storage usbhid serio_raw atkbd nvme libps2 vivaldi_fmap nvme_core i8042 xhci_pci crc32c_intel xhci_pci_renesas serio
Sep 28 21:05:24 archusb kernel: CPU: 12 PID: 92 Comm: ksoftirqd/12 Tainted: G        W    L     6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty #1 7ecd7809f8f37e0153b2dde865e868f3fbcc257c
Sep 28 21:05:52 archusb kernel:  videobuf2_common tpm_crb snd_soc_acpi aesni_intel typec_ucsi vfat sparse_keymap tpm_tis snd_timer fat crypto_simd videodev snd tpm_tis_core platform_profile ecdh_generic cryptd rapl pcspkr typec mc cec crc16 snd_pci_acp3x soundcore rfkill ccp i2c_piix4 k10temp roles tpm i2c_hid_acpi wmi mac_hid i2c_hid rng_core amd_pmc acpi_cpufreq video acpi_tad fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq uas usb_storage usbhid serio_raw atkbd nvme libps2 vivaldi_fmap nvme_core i8042 xhci_pci crc32c_intel xhci_pci_renesas serio
Sep 28 21:05:52 archusb kernel: CPU: 12 PID: 92 Comm: ksoftirqd/12 Tainted: G        W    L     6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty #1 7ecd7809f8f37e0153b2dde865e868f3fbcc257c


It clearly shows some qid timing out, whatever this means.

Comment 13 Keith Busch 2022-09-28 21:03:15 UTC

Could you send output of 'lspci -xs 02:00.0'?

Comment 14 hastegeldhastewas 2022-09-29 07:28:51 UTC

Here it is:

archusb ~# lspci -xs 02:00.0
02:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5013 (rev 01)
00: 46 26 13 50 06 04 10 00 01 02 08 01 10 00 00 00
10: 04 00 70 80 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 46 26 13 50
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00

Comment 15 Keith Busch 2022-09-29 15:17:48 UTC

Thanks, that confirms the quirk used the correct identifiers. So it's not a matter of sequencing breaking your device, but something seems to have gone very wrong. I'll send a debug patch because your dmesg output doesn't make immediate sense:

[   32.150363] could not locate request for tag 0x0
[   32.150395] nvme nvme0: invalid id 0 completed on queue 7


Assuming we're actually using the queue 7's tagset, which is shown to be an active queue in your setup, tag 0 should always be valid.

Comment 16 Keith Busch 2022-09-29 16:00:21 UTC

In the meantime, could you run with kernel param 'nvme.io_queue_depth=2'?

Comment 17 hastegeldhastewas 2022-09-29 16:17:38 UTC

I booted up the kernel with your parameter. Then I tried to create a simple gpt partition with gdisk. The command hanged, but then gave "The operation has completed successfully.". Opening gdisk a second time gave "Caution: invalid main GPT header, but valid backup; regenerating main header from backup!".

The following logs were generated during the process:
archusb ~# journalctl -b | grep nvme
Sep 29 18:09:43 archusb kernel: Linux version 6.0.0-rc7nvmefix-00068-g49c13ed0316d-dirty (user@headquarter) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0) #1 SMP PREEMPT_DYNAMIC Wed Sep 28 16:54:32 CEST 2022
Sep 29 18:09:43 archusb kernel: Command line: initrd=\initramfs-linux-nvmefix.img root=UUID=6d6c3551-a53c-425b-ab29-9cf52e1943f7 rw nvme.io_queue_depth=2
Sep 29 18:09:43 archusb kernel: Kernel command line: initrd=\initramfs-linux-nvmefix.img root=UUID=6d6c3551-a53c-425b-ab29-9cf52e1943f7 rw nvme.io_queue_depth=2
[...usbstuff..]
Sep 29 18:09:43 archusb kernel: nvme 0000:02:00.0: platform quirk: setting simple suspend
Sep 29 18:09:43 archusb kernel: nvme nvme0: pci function 0000:02:00.0
Sep 29 18:09:43 archusb kernel: nvme nvme0: Shutdown timeout set to 10 seconds
Sep 29 18:09:43 archusb kernel: nvme nvme0: 16/0/0 default/read/poll queues
Sep 29 16:10:16 archusb kernel: nvme nvme0: I/O 0 (Read) QID 16 timeout, aborting
Sep 29 16:10:16 archusb kernel: nvme nvme0: Abort status: 0x0
Sep 29 16:10:47 archusb kernel: nvme nvme0: I/O 0 (Read) QID 10 timeout, aborting
Sep 29 16:10:47 archusb kernel: nvme nvme0: Abort status: 0x0
Sep 29 16:12:14 archusb kernel: nvme nvme0: I/O 0 (Read) QID 10 timeout, aborting
Sep 29 16:12:14 archusb kernel: nvme nvme0: Abort status: 0x0
Sep 29 16:12:14 archusb kernel:  nvme0n1: p1
Sep 29 16:12:46 archusb kernel: nvme nvme0: I/O 0 QID 5 timeout, completion polled
Sep 29 16:12:46 archusb kernel:  nvme0n1: p1
Sep 29 16:12:46 archusb systemd-homed[551]: block device /sys/devices/pci0000:00/0000:00:02.4/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p1 has been removed.
Sep 29 16:13:32 archusb kernel: nvme nvme0: I/O 0 (Read) QID 8 timeout, aborting
Sep 29 16:13:32 archusb kernel: nvme nvme0: Abort status: 0x0
Sep 29 16:13:32 archusb systemd-homed[551]: block device /sys/devices/pci0000:00/0000:00:02.4/0000:02:00.0/nvme/nvme0/nvme0n1/nvme0n1p1 has been removed.

Comment 18 Keith Busch 2022-09-30 00:14:59 UTC

This is not looking like a functional nvme controller. Was this device working in a different environment?

Could we test with one more kernel parameter in addition to the previous request?: "pci=nomsi". Your system will probably be slower with that, but it will simplify the nvme queue and interrupt setup.

Comment 19 hastegeldhastewas 2022-10-01 13:12:11 UTC

Okay,

I already mentioned that the drive didn't appear in Windows 10 drive selection.
Now I tried with Windows 11 and it appeared and I could install Windows, but unfortunately it won't boot. During boot I get bluescreens.

Because of your findings and the fact that I couldn't make the SSD work on windows, I also conclude the ssd is defective.
I decided to order the exact same ssd (because I am crazy) and send the current one back as defective.

I will report back when the new SSD is here and if it works.

The parameter pci=nomsi lets my kernel drop into a recovery shell.
I attach a photo of how it appears to me. 
I think my USB Stick is indirectly attached via pci and it can't find the UUID of the partition anymore.

Comment 20 hastegeldhastewas 2022-10-01 13:13:10 UTC

Created attachment 301925 [details]
kernel with nvme patch and pci=nomsi parameter

Comment 21 hastegeldhastewas 2022-10-06 15:33:24 UTC

I received my identical replacement SSD (Kingston KC3000 4TB). Unfortunately it still doesn't work and shows the exactly same behavior.
Sometimes my "fdisk -l" command isn't stuck when I freshly booted. But it will reproducable be stuck some time later.

I will provide three log files for three different boots. The first one contains a stack trace which might be of interest:

ct 06 12:49:19 archusb kernel: Hardware name: LENOVO 82TL/LNVNB161216, BIOS JVCN30WW 07/25/2022
Oct 06 12:49:19 archusb kernel: Workqueue: kblockd blk_mq_timeout_work
Oct 06 12:49:19 archusb kernel: RIP: 0010:blk_mq_put_rq_ref+0x4f/0x60
Oct 06 12:49:19 archusb kernel: Code: 2b 90 00 cc 8b 83 98 00 00 00 83 c0 7f 83 f8 7f 76 18 f0 ff 8b 98 00 00 00 74 06 5b c3 cc cc cc cc 48 89 df 5b e9 41 c0 ff ff <0f> 0b eb e4 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 0f 1f 00 0f
Oct 06 12:49:19 archusb kernel: RSP: 0018:ffffa73001f4fd78 EFLAGS: 00010293
Oct 06 12:49:19 archusb kernel: RAX: 0000000000000071 RBX: ffff98e8c9418000 RCX: 0000000000000000
Oct 06 12:49:19 archusb kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff98e8c9418000
Oct 06 12:49:19 archusb kernel: RBP: ffff98e8ca2103b0 R08: 0000000000000000 R09: 0000000000000008
Oct 06 12:49:19 archusb kernel: R10: ffffa73001f4fc94 R11: 000000000000001c R12: ffff98e8c0c4fc00
Oct 06 12:49:19 archusb kernel: R13: 0000000000000001 R14: 0000000000000004 R15: 0000000000000004
Oct 06 12:49:19 archusb kernel: FS:  0000000000000000(0000) GS:ffff98ef22140000(0000) knlGS:0000000000000000
Oct 06 12:49:19 archusb kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 06 12:49:19 archusb kernel: CR2: 000055b20aaf20c8 CR3: 000000071f610000 CR4: 0000000000750ee0
Oct 06 12:49:19 archusb kernel: PKRU: 55555554
Oct 06 12:49:19 archusb kernel: Call Trace:
Oct 06 12:49:19 archusb kernel:  <TASK>
Oct 06 12:49:19 archusb kernel:  bt_iter+0x50/0xa0
Oct 06 12:49:19 archusb kernel:  blk_mq_queue_tag_busy_iter+0x327/0x5e0
Oct 06 12:49:19 archusb kernel:  ? __blk_mq_complete_request_remote+0x20/0x20
Oct 06 12:49:19 archusb kernel:  ? finish_task_switch.isra.0+0x90/0x2d0
Oct 06 12:49:19 archusb kernel:  ? __blk_mq_complete_request_remote+0x20/0x20
Oct 06 12:49:19 archusb kernel:  blk_mq_timeout_work+0x69/0x150
Oct 06 12:49:19 archusb kernel:  process_one_work+0x1c7/0x380
Oct 06 12:49:19 archusb kernel:  worker_thread+0x51/0x390
Oct 06 12:49:19 archusb kernel:  ? rescuer_thread+0x3b0/0x3b0
Oct 06 12:49:19 archusb kernel:  kthread+0xde/0x110
Oct 06 12:49:19 archusb kernel:  ? kthread_complete_and_exit+0x20/0x20
Oct 06 12:49:19 archusb kernel:  ret_from_fork+0x22/0x30
Oct 06 12:49:19 archusb kernel:  </TASK>
Oct 06 12:49:19 archusb kernel: ---[ end trace 0000000000000000 ]---
Oct 06 12:49:19 archusb kernel: nvme nvme0: Shutdown timeout set to 10 seconds
Oct 06 12:49:19 archusb kernel: nvme nvme0: 16/0/0 default/read/poll queues


I now think it is a kernel issue again. How can I be of most help to further trace the problem? First I do is updating from rc7 to the 6.0 kernel.

Comment 22 hastegeldhastewas 2022-10-06 15:35:40 UTC

Created attachment 302944 [details]
after ssd replacement 3 outputs of "journalctl -b" from 3 different boots

Comment 23 Keith Busch 2022-10-06 17:14:34 UTC

Created attachment 302946 [details]
trace prints on completion

Long shot question: do you happen to have access to pcie protocol analyzer, like a LeCroy or Xgig?

Attaching a patch. If you could add that, and recompile the kernel, but change your .config so that NVMe is built-in (option 'y') instead of a module (option 'n'):

  CONFIG_NVME_CORE=y
  CONFIG_BLK_DEV_NVME=y

Then add to the kernel command line:

  trace-event=nvme:nvme_setup_cmd,nvme:nvme_complete_rq,nvme:nvme_sq

The trace-event commandline won't work if nvme is built as a module, so the built-in step is important.

After booting, run 'cat /sys/kernel/debug/tracing/trace' and attach the results.

Comment 24 hastegeldhastewas 2022-10-06 18:15:57 UTC

> do you happen to have access to pcie protocol analyzer, like a LeCroy or
> Xgig?

I do not. I am sorry. I know what the difference between a spinlock and a semaphore is, but I didn't ascend to such a high level yet.

---

I installed Windows 11 on the new SSD. It worked, but setting up the OS was much of a struggle.
There were multiple bluescreens and freezes during the initial setup process. This is the same behavior as with the first KC3000 SSD.
Now it seems to "mostly" work on windows, with which I mean I still suffer from occasional bluescreens, but could execute some programs.

I used "AS SSD Benchmark", "Kingston SSD Manager" and "CrystalDiskInfo". The tools seem to report no errors if I read them correctly. There is no firmware update for the ssd available. I will send the information to the Kingston support e-mail aswell.

I wonder why the normal SSD for my laptop works fine, while the Kingston one does not.
Also other people are using this SSD without any problems at least on windows. Might this be a Kingston Firmware, Ryzen 6000 or Lenovo Firmware problem?

I report back when I have news. I will own the SSD for another week before I need to send it back aswell. Until then I will compile the kernel according to your instructions.

Comment 25 hastegeldhastewas 2022-10-06 18:16:31 UTC

Created attachment 302947 [details]
Screenshots and export from kingston ssd manager

Comment 26 hastegeldhastewas 2022-10-06 18:16:50 UTC

Created attachment 302948 [details]
Screenshot from Crystaldiskinfo

Comment 27 hastegeldhastewas 2022-10-06 18:17:09 UTC

Created attachment 302949 [details]
Screenshot from AS SSD Benchmark

Comment 28 hastegeldhastewas 2022-10-06 23:33:19 UTC

I downloaded the 6.0 mainline kernel sources as tarball. Then applied both patches provided in this issue. The .config is again based on the default Arch Linux config from /proc/config.gz . The two settings in the .config for including the nvme modules into the kernel have been set. 

I compiled the new kernel and reconfigured my bootloader.

#cat /boot/loader/entries/linux-nvmetrace.conf
title    Arch Linux 6.0 Nvmetrace
linux    /vmlinuz-linux-nvmetrace
initrd   /initramfs-linux-nvmetrace.img
options  root=UUID=6d6c3551-a53c-425b-ab29-9cf52e1943f7 rw trace-event=nvme:nvme_setup_cmd,nvme:nvme_complete_rq,nvme:nvme_sq

Next I shortly upload the desired output.

Comment 29 hastegeldhastewas 2022-10-06 23:33:46 UTC

Created attachment 302951 [details]
nvmetrace kernel "journalctl -b"

Comment 30 hastegeldhastewas 2022-10-06 23:34:13 UTC

Created attachment 302952 [details]
nvmetrace kernel "zcat /proc/config.gz"

Comment 31 hastegeldhastewas 2022-10-06 23:35:03 UTC

Created attachment 302953 [details]
nvmetrace kernel "cat /sys/kernel/debug/tracing/trace"

Comment 32 hastegeldhastewas 2022-10-06 23:42:02 UTC

The compiled kernel is slow and also runs into a full system freeze after a while. The "fdisk -l" command was stuck as I generated the files so the issue was already clearly present. 

Once I saw a message indicating a kernel panic as the freeze happened, but couldn't see why.

Comment 33 Keith Busch 2022-10-07 02:27:48 UTC

(In reply to hastegeldhastewas from comment #24)
> > do you happen to have access to pcie protocol analyzer, like a LeCroy or
> > Xgig?
> 
> I do not. I am sorry. I know what the difference between a spinlock and a
> semaphore is, but I didn't ascend to such a high level yet.

That type of equipment would actually be going much lower level: snooping signals off copper wires :) You usually only find them in hardware labs, and I only ask about getting such traces if I strongly suspect a protocol violation (this is clearly in that arena).

Lacking wire traces, we have kernel traces, and that's often good enough. From the kernel trace you provided (thank you!), your device is definitely busted. The relevant snippets are as follows:

  1.093817: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=11, cmdid=128, nsid=1

The above is the first command in your trace dispatched to qid=11.

  32.151747: nvme_handle_cqe: nvme0: qid=11 sq_id=11 head=1 tail=1 cmdid:0(0)

And above is the first completion on qid=11. Two things to note here:

  1. The cqe's cmdid is 0, but the only outstanding command id on that queue is 128. The device posted a completion with the wrong cmdid.

  2. The timestamp is 30 seconds late. This is clearly a timeout polling loop that discovered the malformed CQE. The device failed to deliver an MSI for the posted CQE.

I haven't seen a controller this broken since the first Apple ones. I'll send a new patch with the same quirks that one is using.

Comment 34 Keith Busch 2022-10-07 02:40:25 UTC

Created attachment 302954 [details]
All the quirks

This will enforce the simplest possible queue configuration that nvme can do. When using this patch, there will be exactly one IO queue, one possible command ID, and one MSI vector.

Apply this atop a pristine 6.0+ kernel (please back-out all the previously requested patches), and you can also remove all the kernel cmdline options I previously requested.

Comment 35 hastegeldhastewas 2022-10-07 11:09:47 UTC

I will use "make defconfig" to create a kernel and apply the requested patch.

As I know the KC3000 uses the Phison E18 (PS5018-E18-41) controller. This one is also used on most if not all of the fastest SSDs available right now.
Am I assuming correctly that the encountered violation is something Phison has to fix and not Kingston?

Comment 36 hastegeldhastewas 2022-10-07 23:54:49 UTC

The defconfig kernel didn't work. After googling I found out this is expected on real hardware.
I continued compiling the kernel with the quirks based on the official unmodified Arch Linux config again. Then applied only the quirk patch.

Unfortunately commands like gdisk still hang. The logs show that the kernel applies the quirks as expected. One timeout message is present.

Comment 37 hastegeldhastewas 2022-10-07 23:55:17 UTC

Created attachment 302957 [details]
maxquirk kernel "dmesg"

Comment 38 hastegeldhastewas 2022-10-07 23:55:35 UTC

Created attachment 302958 [details]
maxquirk kernel "journalctl -b"

Comment 39 hastegeldhastewas 2022-10-19 13:30:25 UTC

Hello Keith and maybe the others,

I once again would like to politely ask for your expertise. I found someone in the internet, claiming the KC3000 works in his linux setup. 
After reading that, I organized another computer with an m.2 slot and the kingston KC3000 SSD works perfectly, although only via PCIe 3.0 . Nothing lags and there are no error-entries in the log.

In my understanding the PCIe lanes are basically wires which are connected directly from the cpu to the nvme ssd. I have a hard time to believe the laptop firmware has something todo with them.
But at this point in time all the arrows point at it. Do you have a technical explanation for that behavior?

I will attach all the logs and also a trace based on your tracing kernel patch.
Please note that in the logs nvme1 is the Kingston one, while nvme0 is the one of my friend. I even could execute smartctl without any problems and included an output aswell.

Comment 40 hastegeldhastewas 2022-10-19 13:32:37 UTC

Created attachment 303041 [details]
other computer, same kernel, same ssd, but it works

outputs of:
dmesg
journalctl_-b
lspci_-vv
smartctl_-a

Comment 41 hastegeldhastewas 2022-10-19 13:34:20 UTC

Created attachment 303042 [details]
other computer, same kernel, same ssd, but it works + tracing patch


this kernel has the tracing patch

outputs of:
dmesg
journalctl_-b
lspci_-vv
smartctl_-a
sys_kernel_debug_tracing_trace

Comment 42 Keith Busch 2022-10-19 14:12:20 UTC

Maybe disable pcie relaxed ordering. Some devices do this completely wrong. The "working" one has this disabled. If there's no bios option to turn it off, then we'll need yet another kernel patch to quirk it off.

Comment 43 hastegeldhastewas 2022-10-19 14:33:15 UTC

The bios is somehow limited. I don't know if there are hidden options and how to access them. But I could not find the setting you mentioned. So I need another quirk patch.

From a technical point of view:
1. Do you see any chance that the firmware of my laptop causes the problem? 
2. Is the PCIe connection terminated at the Phison E18 controller? If yes: Assuming kingston left the E18 unmodified, will the problem I encountered be present for all Phison E18 ssd's (which are a lot)?

Comment 44 Keith Busch 2022-10-19 15:12:33 UTC

Created attachment 303043 [details]
disable ro for kingston

In order to verify it successfully did what we want, you can run:

 # lspci -vv -s 04:00.0 | grep RlxOrd

If it shows "RlxOrd-", then it was successful in disabling the feature. If it says "RlxOrd+", then it was not.

Comment 45 Keith Busch 2022-10-19 15:17:07 UTC

(In reply to hastegeldhastewas from comment #43)
> 1. Do you see any chance that the firmware of my laptop causes the problem? 

I don't know, possibly. I think we'd need packet level tracing to know where to point the finger (device vs. platform), which is not a reasonable thing to obtain.

> 2. Is the PCIe connection terminated at the Phison E18 controller? If yes:
> Assuming kingston left the E18 unmodified, will the problem I encountered be
> present for all Phison E18 ssd's (which are a lot)?

My understanding is that these 3rd party controllers are provided with development kits for vendors to customize as they see fit, so they're not all equal. And though some models may have the same Phison NVMe controller, I have no idea if they're all sharing the same PCIe IP core.

Comment 46 hastegeldhastewas 2022-10-19 19:26:19 UTC

Thanks for your answers.

I applied the patch on the current mainline kernel 6.1-rc1 , but it unfortunately doesn't boot.

It can't find the root partition by its UUID. I will attach a photo. My keyboard isn't working in the emergency shell. The patch seems to mess up the pci connection to the usb controller. The desired partition is on the USB-stick I boot from.

I used the same steps to compile the kernel as always and copied the bootloader file with the definately correct UUID. If you can fix this easily I will happily compile another kernel.

If not I would say we stop the experiment. Whatever is going wrong seems to most likely not be related to linux. I of course am very thankful you still took the time to help me.

Comment 47 hastegeldhastewas 2022-10-19 19:26:56 UTC

Created attachment 303044 [details]
kernel without relaxed ordering fails to boot

Comment 48 Keith Busch 2022-10-19 19:46:18 UTC

Well that's weird. So boot succeeded without the pci quirk patch, but breaks with it? That really doesn't make sense right now; the only thing it does is toggle an optional PCIe config register on the nvme device and shouldn't affect anything to do with the root device.

Comment 49 hastegeldhastewas 2022-10-19 20:02:03 UTC

Okay, then I most likeli messed something up. I will try again.

Comment 50 hastegeldhastewas 2022-10-20 11:45:36 UTC

Hello Keith, 

I now managed to boot 4 differnet kernels (so the UUID issue was something I did wrong). Two of them are without modification and two got the no relaxed ordering patch.

linux-6.0.2
linux-6.0.2-noro
linux-6.1.0-rc1
linux-6.1.0-rc1-noro

The "RlxOrd-" quirk doesn't seem to be working.
I tried to verify by using "lspci -vv | grep RlxOrd" and I can only see "RlxOrd+" and no minus. The ssd seems to be "02:00.0" and also shows "RlxOrd+".

I am currently investigating if I can see the reason or if something I did wrong is the reason.

Comment 51 hastegeldhastewas 2022-10-20 11:46:05 UTC

Created attachment 303048 [details]
logs_for_four_kernels_some_with_no_relaxed_ordering

Comment 52 hastegeldhastewas 2022-10-20 11:58:52 UTC

Created attachment 303049 [details]
output of lspci -nn

Vendor and device id seem to be correct. I will compile another kernel.

Comment 53 hastegeldhastewas 2022-10-20 12:15:08 UTC

Two new kernels are compiling. One of them uses the no relaxed reordering patch applied to 6.1-rc1, and the other one uses the same patch and kernel but I changed the class to PCI_CLASS_STORAGE_EXPRESS.

Like so:

/mnt/ssd/dl_unrar/linux-6.1-rc1-noro-otherclass> cat drivers/pci/quirks.c| grep -n1 5013
4331-
4332:DECLARE_PCI_FIXUP_CLASS_EARLY(0x2646, 0x5013, PCI_CLASS_STORAGE_EXPRESS, 8,
4333-			      quirk_relaxedordering_disable);

/mnt/ssd/dl_unrar/linux-6.1-rc1-noro> cat drivers/pci/quirks.c| grep -n1 5013
4331-
4332:DECLARE_PCI_FIXUP_CLASS_EARLY(0x2646, 0x5013, PCI_CLASS_NOT_DEFINED, 8,
4333-			      quirk_relaxedordering_disable);

Comment 54 hastegeldhastewas 2022-10-20 13:47:56 UTC

Created attachment 303054 [details]
logs_kernels_with_no_relaxed_ordering quirk not working

No, both kernels do not seem to apply the quirk and still have the error. See the outputs/logs attached.

There is a line saying something about an nvme quirk for 02:00.0:
Oct 20 13:27:55 archusb kernel: nvme 0000:02:00.0: platform quirk: setting simple suspend

Only idea left for me is that one quirk overwrites the other.

Comment 55 Keith Busch 2022-10-20 14:47:09 UTC

Huh, I guess I don't know how pci early quirks work. I'll mess with this on emulation and see what I'm doing wrong.

You can manually turn it off at the command line after boot to see what happens. If your drive's BDf is 02:00.0, the command looks like:

  # setpci -s 02:00.0 CAP_EXP+8.w=0:10

Verify the RlxOrd bit in lspci is set to '-' after to make sure it worked. If the above doesn't toggle it as expected ... well, I'd be stumped ... again.

Comment 56 Keith Busch 2022-10-20 15:16:22 UTC

(In reply to Keith Busch from comment #55)
> Huh, I guess I don't know how pci early quirks work. I'll mess with this on
> emulation and see what I'm doing wrong.

Ah, I see this quirk only works for root ports. This needs an entirely different identifer in that case. Instead of the '0x2646, 0x5013' combo, it needs 'PCI_VENDOR_ID_AMD, 0x14ba'.

Comment 57 hastegeldhastewas 2022-10-20 15:20:04 UTC

Created attachment 303056 [details]
logs_setpci_last_try_linux-6.1.0-rc1-noro

Hey Keith,

I tried the setpci command as soon as I logged in. Applying the setting worked, but unfortunately the errors persisted. Damn, that could have been it. One last log tarball is attached to this message.

To get my money back I will now bring the ssd back to the post office. So I can't perform any more tests and the issue can be closed.

I thank you again very much for your endurance. I felt very well taken care of.
Maybe we read each other again during my next issue for my next ssd.

Comment 58 hastegeldhastewas 2022-10-20 15:22:20 UTC

Okay I will set PCI_VENDOR_ID_AMD as my last last try now.

like this:
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x14ba, PCI_CLASS_STORAGE_EXPRESS, 8, quirk_relaxedordering_disable);

Comment 59 hastegeldhastewas 2022-10-20 15:28:58 UTC

Two kernels are in the making:

/mnt/ssd/dl_unrar/linux-6.1-rc1-noro-otherids> cat drivers/pci/quirks.c| grep -n1 0x14ba
4331-
4332:DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x14ba, PCI_CLASS_NOT_DEFINED, 8,
4333-			      quirk_relaxedordering_disable);

/mnt/ssd/dl_unrar/linux-6.1-rc1-noro-otherids-otherclass> cat drivers/pci/quirks.c| grep -n1 0x14ba
4331-
4332:DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x14ba, PCI_CLASS_STORAGE_EXPRESS, 8,
4333-			      quirk_relaxedordering_disable);

Comment 60 hastegeldhastewas 2022-10-20 17:30:26 UTC

Both kernels didn't succeed in applying the quirk. I only see "RlxdOrd+" for both new kernels.

Damn. I will bring back the haunted ssd now.
Thank you again.

Note You need to log in before you can comment on or make changes to this bug.