Bug 217863 - Lexar NM790 SSDs are not recognized anymore after 6.1.50 LTS
Summary: Lexar NM790 SSDs are not recognized anymore after 6.1.50 LTS
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: NVMe
Hardware: AMD Linux
Importance: P3 high
Assignee: IO/NVME Virtual Default Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-02 14:01 UTC by Claudio Sampaio
Modified: 2023-12-01 04:59 UTC
CC List: 11 users

See Also:
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
lexar patch (976 bytes, patch)
2023-09-07 09:04 UTC, Thomas Mann
proposed new patch (1.62 KB, patch)
2023-09-08 16:11 UTC, Felix Yan

Description Claudio Sampaio 2023-09-02 14:01:18 UTC
I bought a new 4 TB Lexar NM790 and I was using kernel 6.3.13 at the time. It wasn't recognized, with these messages in dmesg:

[ 358.950147] nvme nvme0: pci function 0000:06:00.0
[ 358.958327] nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0

My other NVMe appears correctly in the nvme list though.


So I tried using other kernels I had installed at the time: 6.3.7, 6.4.10, 6.5.0rc6, 6.5.0, 6.5.1 and none of these recognized the disk.
I installed the 6.1.50 LTS kernel from the Arch repositories (I can compile my own too if this would be an issue) and then the device was correctly recognized:

[    4.654613] nvme 0000:06:00.0: platform quirk: setting simple suspend
[    4.654632] nvme nvme0: pci function 0000:06:00.0
[    4.667290] nvme nvme0: allocated 40 MiB host memory buffer.
[    4.709473] nvme nvme0: 16/0/0 default/read/poll queues

And then it appears alongside the other nvme:
[15:58] [6836] [patola@risadinha patola]% sudo nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme1n1          /dev/ng1n1            2K36292CEKD9         XPG GAMMIX S11 Pro                       0x1          1.39  TB /   2.05  TB    512   B +  0 B   42B4S9NA
/dev/nvme0n1          /dev/ng0n1            NF9755R000057P2202   Lexar SSD NM790 4TB                      0x1          4.10  TB /   4.10  TB    512   B +  0 B   12237   

And I was able to read and write from it, pvcreate and so on, so it's working. But I can't use a higher kernel version so apparently this is a regression.

There are other people with the same NVMe model (although different capacities) reporting the same issue on this reddit thread: https://www.reddit.com/r/archlinux/comments/15xbxeo/nvme_device_not_ready_aborting_initialisation/

I am not sure, but I think this issue might've been introduced after this patch: https://bugzilla.kernel.org/show_bug.cgi?id=215742
Comment 1 Claudio Sampaio 2023-09-02 14:15:12 UTC
Sorry, I forgot to mention the PCI ID of my device: 1d97:1602
Comment 2 Claudio Sampaio 2023-09-02 19:04:29 UTC
Adding the two lines

    { PCI_DEVICE(0x1d97, 0x1602), /* Lexar NM790 */
        .driver_data = NVME_QUIRK_BOGUS_NID, },

in file drivers/nvme/host/pci.c made my NVMe work correctly. Compiled a new 6.5.1 kernel and everything works.
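
For readers unfamiliar with how such an ID-table entry takes effect, here is a minimal userspace sketch of the idea: a table keyed by PCI vendor and device ID, where the matching entry supplies the quirk flags. The names `id_entry`, `id_table` and `lookup_quirks`, and the bit value chosen for `QUIRK_BOGUS_NID`, are illustrative assumptions rather than the kernel's actual definitions; only the ID pair 1d97:1602 comes from this report.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit only; NOT the value the kernel uses. */
#define QUIRK_BOGUS_NID (1ul << 0)

/* Hypothetical, simplified stand-in for a PCI device ID table entry. */
struct id_entry {
    uint16_t vendor;
    uint16_t device;
    unsigned long driver_data; /* quirk flags attached to this device */
};

static const struct id_entry id_table[] = {
    { 0x1d97, 0x1602, QUIRK_BOGUS_NID }, /* Lexar NM790 (ID from this report) */
};

/* Return the quirk flags for a vendor/device pair, or 0 if it is not listed. */
static inline unsigned long lookup_quirks(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(id_table) / sizeof(id_table[0]); i++) {
        if (id_table[i].vendor == vendor && id_table[i].device == device)
            return id_table[i].driver_data;
    }
    return 0;
}
```

In this sketch a device that is absent from the table simply gets no quirks, which is why adding an entry changes how that one model is handled.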
Comment 3 Mario Limonciello (AMD) 2023-09-05 16:52:06 UTC
As you already identified a solution; can you format this as a proper patch and send it to the linux-nvme mailing list?
Comment 4 Mario Limonciello (AMD) 2023-09-05 16:54:21 UTC
Sorry; I see there is a related conversation for this.
https://lore.kernel.org/linux-nvme/CA+4wXKAe3==K1CoSvmRpbmPcR9QK+EuzDB-cNteNB34Cgur_0w@mail.gmail.com/T/#ma4e199c887a1b0d1feacb5331a3a6ad9dabeb91f

Can you please reproduce this with 6.5.1 as requested?
Comment 5 Thomas Mann 2023-09-07 09:04:06 UTC
Created attachment 305061
lexar patch

I can reproduce the bug too. Using this patch fixes the problem for me. The patch is from: https://lore.kernel.org/lkml/7cd693dd-a6d7-4aab-aef0-76a8366ceee6@archlinux.org/

The dmesg output:
[    6.547090] nvme nvme0: pci function 0000:3d:00.0
[    6.559676] nvme nvme0: [PATCH] nvme core got timeout 0
[    6.559683] nvme nvme0: [PATCH] nvme_wait_ready now wait for 2, previously 0
[    6.563149] nvme nvme0: allocated 40 MiB host memory buffer.
Comment 6 Thomas Mann 2023-09-07 09:10:38 UTC
I can reproduce the bug too, with 6.5.2.
Comment 7 Keith Busch 2023-09-07 12:54:20 UTC
(In reply to Thomas Mann from comment #5)
> 
> i can reproduce the bug too. using this patch fixes the problem for me. the
> patch is from:
> https://lore.kernel.org/lkml/7cd693dd-a6d7-4aab-aef0-76a8366ceee6@archlinux.org/

This patch makes sense from the observation. The device is not providing an appropriate time to ready, so making it larger sounds like the right direction. It'll probably need to be a new quirk, though.
Comment 8 Felix Yan 2023-09-08 16:11:23 UTC
Created attachment 305072
proposed new patch

Hi,

I think I have found a better solution. Please try my new attached patch if you are interested :)
Comment 9 Thomas Mann 2023-09-10 00:31:58 UTC
(In reply to Felix Yan from comment #8)
> Created attachment 305072
> proposed new patch
> 
> Hi,
> 
> I think I have found a better solution. Please try my new attached patch if
> you are interested :)

Hi,

The patch fixes the bug; at least, I haven't been able to reproduce the error so far.

[    6.547047] nvme nvme0: pci function 0000:3d:00.0
[    6.556641] nvme nvme0: Ignoring bogus CRTO (0), falling back to NVME_CAP_TIMEOUT (255)
[    6.562276] nvme nvme0: allocated 40 MiB host memory buffer.
[    6.572543] nvme nvme0: 8/0/0 default/read/poll queues
[    6.616338]  nvme0n1: p1 p2
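
The "Ignoring bogus CRTO" line above hints at the fallback logic: when the controller reports a CRTO value smaller than the CAP.TO timeout, the CAP.TO value is used instead. One way to express that is to take the larger of the two, as in this minimal userspace sketch. It assumes the NVMe register layout (CAP.TO in bits 31:24 of CAP, CRTO.CRWMT in bits 15:0 of CRTO, both in 500 ms units). The function names are hypothetical, and this is not the code of the attached or the merged patch.

```c
#include <assert.h>
#include <stdint.h>

/* CAP.TO: bits 31:24 of the 64-bit CAP register, in 500 ms units. */
static inline uint32_t cap_timeout_units(uint64_t cap)
{
    return (uint32_t)((cap >> 24) & 0xff);
}

/* CRTO.CRWMT: bits 15:0 of the 32-bit CRTO register, in 500 ms units. */
static inline uint32_t crto_crwmt_units(uint32_t crto)
{
    return crto & 0xffff;
}

/*
 * Hypothetical helper: choose how long to wait for the controller to
 * become ready. A CRTO value below CAP.TO is treated as bogus and the
 * CAP.TO value is used instead.
 */
static inline uint32_t ready_timeout_units(uint64_t cap, uint32_t crto)
{
    uint32_t cap_to = cap_timeout_units(cap);
    uint32_t crwmt = crto_crwmt_units(crto);

    return crwmt < cap_to ? cap_to : crwmt;
}

/* Convert 500 ms units to milliseconds; the caller must avoid overflow. */
static inline uint32_t units_to_ms(uint32_t units)
{
    assert(units <= UINT32_MAX / 500u);
    return units * 500u;
}
```

With the values from the log (a CRTO of 0 and a CAP.TO of 255), `ready_timeout_units` yields 255 units, which `units_to_ms` converts to 127500 ms.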
Comment 10 Andreas Hauser 2023-09-15 00:32:55 UTC
I can confirm this new patch fixes the problem for me on Arch Linux with kernel linux-next.

Thanks a lot!

Best
Andreas
Comment 11 Damian Pala 2023-09-22 08:32:15 UTC
Hi!

I had two problems with my Lexar NM790:

1. NVMe "Device not ready" error at boot
2. A freeze when waking from sleep mode, in about 75% of wake-ups. A hard reset was needed.

This new patch: https://bugzilla.kernel.org/attachment.cgi?id=305072&action=diff
solves both problems completely. I have an Asus ExpertBook B9400 running Kubuntu 23.04 with kernel 6.2.0-31.

Thanks guys!
Comment 12 Keith Busch 2023-09-22 14:41:36 UTC
Upstream and stable have applied an appropriate patch to address this bz, so it should be fixed at the next release tag.
Comment 13 Ruediger Reifen 2023-09-22 18:23:51 UTC
Hi,
another confirmation here. I tested with two 4TB Lexar NM790 drives (0x1d97, 0x1602) on a build based on kernel 6.2.16.
Issues were:
1. (almost every boot) nvme nvme1: Device not ready; aborting initialisation, CSTS=0x0
2. (occasionally) nvme nvme0: missing or invalid SUBNQN field.

After applying the patch from Felix
 https://bugzilla.kernel.org/attachment.cgi?id=305072&action=diff
no issues seen anymore.

I'm wondering if the invalid SUBNQN field is addressed by the patch too.

Thanks
Comment 14 Tripmag 2023-09-23 11:24:59 UTC
I have the same problem with a 4TB Lexar NM790
I only use Linux occasionally and I'm basically lost in it... :D
Could someone explain to me how to proceed and apply the patch for a new installation of Ubuntu 22.04?
Thanks
Comment 15 Andreas Hauser 2023-09-23 12:21:31 UTC
@Tripmag then you are probably on kernel version 5.15.x. This patch fixes an initialization problem with the SSD's controller that became visible after kernel version 6.1.x.

You would need a kernel new enough (> ?) to support the drive and old enough (<= 6.1.x).
With kernels newer than 6.1.x you would need to build a custom kernel with this patch until your Linux distribution catches up.

In your case it is probably best to take this to your distribution and ask people there what is best for you. Someone might just have a fitting repo with a kernel or suggest an upgrade to a newer version of the distro.
Comment 16 Thomas Mann 2023-09-25 08:59:58 UTC
6.5.5 fixes the bug; I haven't been able to reproduce it so far.
Comment 17 Tripmag 2023-09-25 17:31:12 UTC
thank you, I will try to go this way again

I also thought about a fresh installation on a 1TB disk, then an update + patch and cloning to a 4TB Lexar
but I've never done a kernel patch and it seems out of my league...

Maybe I'll try installing 23.10 on the 1TB disk, updating the kernel to 6.5.5, and cloning to the 4TB drive.
Comment 18 Tripmag 2023-09-25 20:53:53 UTC
OK: installing 23.10 on the 1TB disk, updating the kernel to 6.5.5, and cloning to the 4TB drive is my working variant for now, and an opportunity to test this newer version.
Comment 19 Damian Pala 2023-09-30 14:12:36 UTC
For me it is safer to use a kernel tested with my distro version. I am using Kubuntu, so I will probably need to patch and build the kernel myself until my distro receives at least version 6.5.5. If someone needs to go the same way, here are tutorials on how to patch and build the kernel:
https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel
https://wiki.ubuntu.com/Kernel/Dev/KernelGitGuide

Patch using git:
https://www.specbee.com/blogs/how-create-and-apply-patch-git-diff-and-git-apply-commands-your-drupal-website

I am using Secure Boot, so signing the built kernel is also crucial:
https://gloveboxes.github.io/Ubuntu-for-Azure-Developers/docs/signing-kernel-for-secure-boot.html

I had some problems building kernel 6.2.0 with Rust, so I built it without Rust using these commands:
$ cd debian/build/build-generic
$ make bindeb-pkg ARCH=x86 CROSS_COMPILE=x86_64-linux-gnu- HOSTCC=x86_64-linux-gnu-gcc-12 CC=x86_64-linux-gnu-gcc-12 KERNELVERSION=6.2.0-31-generic CONFIG_DEBUG_SECTION_MISMATCH=y KBUILD_BUILD_VERSION="31" LOCALVERSION= localver-extra= CFLAGS_MODULE="-DPKG_ABI=31" PYTHON=python3 O=/home/haz/lunar/debian/build/build-generic -j8 olddefconfig 

Good Luck!
Comment 20 Tomasz 2023-10-12 16:09:04 UTC
Sorry for commenting on a closed bug. I have a 2TB Lexar NM790 which only works with kernel 6.1. With a live CD running kernel 6.5.7 it is effectively invisible during boot. With earlier versions (6.2-6.5.x) I had the same issue as in this bug report; now I don't even get that message. The only thing I see is:

nvme 0000:02:00.0: platform quirk: setting simple suspend

lsblk still shows no nvme devices. How do I report this?
Comment 21 Claudio Sampaio 2023-10-12 18:43:12 UTC
(In reply to Tomasz from comment #20)
> Sorry for commenting on a closed bug, I have a 2TB Lexar NM790 which only
> works with kernel 6.1, with a livecd with kernel 6.5.7 it's effectively
> invisible during boot, with earlier versions (6.2-6.5.x) I had the same
> issue as the bug report, now I don't even have that. The only thing I see is:
> 
> nvme 0000:02:00.0: platform quirk: setting simple suspend
> 
> lsblk still shows no nvme devices. How do I report this?

Try using kernel 6.6.0-rc5. It has the fix, I'm using it right now and it also handles my AMD CPU and GPU better.
Comment 22 Yijun Zhao 2023-12-01 04:54:44 UTC
It seems all SSDs using the MaxIO MAP1602A controller are affected.
I use an Acer Predator GM7 M.2 4TB SSD and have the same issue. The Acer GM7 and Lexar NM790 both use the MaxIO MAP1602A.
