Bug 216049

Summary: Unable to boot system, won't mount because of missing partitions because of "nvme nvme1: globally duplicate IDs for nsid 1"
Product: IO/Storage
Reporter: Jerome C (me)
Component: NVMe
Assignee: IO/NVME Virtual Default Assignee (io_nvme)
Status: RESOLVED INVALID
Severity: normal
CC: cegolf, freestuff0002, kbusch, miroslav, nikkikom, phulvios, tiagodfer
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.18.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  Set bogus namespace quirk for PCI Phison E12 NVMe controller - based on kernel 5.18.3
  Bogus NID quirk for Netac NV3000

Description Jerome C 2022-05-30 09:51:05 UTC
I have 2 NVMe (PCIe) SSDs of the same model and firmware, both installed in the system and booted at the same time (my guess as to why they have the same ID).


The filesystem is BTRFS (my root), configured as RAID 1. By default, BTRFS will not mount if any of the block devices that are part of the filesystem is missing.


Since Linux 5.18, there has been a change ( https://elixir.bootlin.com/linux/v5.18/source/drivers/nvme/host/core.c#L3806 ) that checks whether any of a namespace's NVMe identifiers already exists globally and, if so, no longer registers that block device or its partitions. The identifiers checked are the UUID (the NVMe one, not the filesystem UUID; my devices don't support it), the NGUID (zero on both devices; the code does not compare NGUID or EUI64 when they are null or zero) and the EUI64 (set on both devices, and identical).
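
For readers following the logic, here is a minimal user-space sketch of the kind of check described above. It is not the kernel code: `struct ns_ids`, `id_is_set`, `ids_conflict` and `globally_duplicate` are hypothetical names made up for illustration. It only mirrors the behaviour described in this report: an identifier that is all zeroes is skipped, and an identifier that is set and matches an already registered namespace counts as a duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the identifiers a namespace reports. */
struct ns_ids {
	uint8_t eui64[8];
	uint8_t nguid[16];
	uint8_t uuid[16];
};

/* True if any byte is non-zero, i.e. the device actually set this ID. */
bool id_is_set(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return true;
	return false;
}

/* Two namespaces conflict if any identifier that is set is identical. */
bool ids_conflict(const struct ns_ids *a, const struct ns_ids *b)
{
	assert(a != NULL && b != NULL);
	if (id_is_set(a->uuid, sizeof(a->uuid)) &&
	    memcmp(a->uuid, b->uuid, sizeof(a->uuid)) == 0)
		return true;
	if (id_is_set(a->nguid, sizeof(a->nguid)) &&
	    memcmp(a->nguid, b->nguid, sizeof(a->nguid)) == 0)
		return true;
	if (id_is_set(a->eui64, sizeof(a->eui64)) &&
	    memcmp(a->eui64, b->eui64, sizeof(a->eui64)) == 0)
		return true;
	return false;
}

/* Check a new namespace against every namespace registered so far. */
bool globally_duplicate(const struct ns_ids *seen, size_t count,
			const struct ns_ids *candidate)
{
	for (size_t i = 0; i < count; i++)
		if (ids_conflict(candidate, &seen[i]))
			return true;
	return false;
}
```

In this model, two drives with an all-zero NGUID and the same non-zero EUI64 (the situation reported here) are flagged, while two drives whose identifiers are all zero are not.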


I've searched and found no answer, and I don't entirely understand the nvme command line tool (it is not documented enough).


My thinking at the moment is to change the EUI64, but I don't understand how to do that with the nvme command line tool without reformatting my device.

Is this the intended effect? 


The previous kernel I used was 5.17.9 from Arch Linux.
Comment 1 Keith Busch 2022-05-30 14:26:46 UTC
The change to prevent duplicates was on purpose. Duplicates break udev's ability to create reliable by-id symlinks, which can cause data corruption.

The EUI/NGUID/UUID from the nvme controller are supposed to be globally unique and are set by the vendor. If different namespaces report the same EUI64, then your vendor has messed it up.

Except for perhaps vendor specific tools, this is not a user controllable identifier.

If your vendor can't fix it, then we would have to quirk the driver for your vendor:device to ignore the bogus values.
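
A rough sketch of what "ignoring the bogus values" could amount to, assuming the quirk simply leaves the namespace identifiers zeroed so that a duplicate check which skips all-zero identifiers never sees them. `SKETCH_QUIRK_BOGUS_NID`, its bit value, `struct ns_ids` and `load_ns_ids` are hypothetical names for this illustration; the real driver's handling may differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_QUIRK_BOGUS_NID (1UL << 0) /* hypothetical bit value */

/* Hypothetical stand-in for the identifiers a namespace reports. */
struct ns_ids {
	uint8_t eui64[8];
	uint8_t nguid[16];
};

/*
 * Fill *out from what the device reported, unless the quirk says the
 * reported identifiers cannot be trusted. In that case *out stays zeroed.
 */
void load_ns_ids(unsigned long quirks, const struct ns_ids *reported,
		 struct ns_ids *out)
{
	assert(reported != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (quirks & SKETCH_QUIRK_BOGUS_NID)
		return; /* ignore the vendor-supplied values */
	*out = *reported;
}
```

The trade-off in this model is that a quirked device has no identifier at all to contribute, so anything that relies on those identifiers has to fall back to other information.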
Comment 2 Jerome C 2022-05-31 08:54:45 UTC
PNY's firmware update tools only support Microsoft Windows, which I don't use.

I've contacted PNY and am waiting for a response, but for now I've added this quirk

{
	.vid = 0x1987,
	.mn = "PNY CS3030 1000GB SSD",
	.quirks = NVME_QUIRK_BOGUS_NID,
}

which is working for me.
Comment 3 Keith Busch 2022-05-31 14:09:14 UTC
Sounds good. Let's post a patch to the mailing list if you don't get a good response from Phison.

I think we'd have to make the quirk a little less specific, though. I'm assuming the same controller and firmware with this issue run on more than just the 1000GB capacity model.
Comment 4 Jerome C 2022-05-31 18:40:40 UTC
I also have a laptop with the same model SSD (its EUI64 is different from my desktop's). The code I said worked earlier was a draft copied from my laptop, which uses a different firmware revision. I then wrote the patch on my desktop from scratch, not realizing that the model number shown on my laptop ( "PNY CS3030 1000GB SSD" ) is not the same as on my desktop ( "PNY CS3030 1TB SSD" ).

The code that actually worked looks like this

{
	.vid = 0x1987,
	.mn = "PNY CS3030 1TB SSD",
	.quirks = NVME_QUIRK_BOGUS_NID,
}

Based on what you said earlier, I've changed the code to this

{
	.vid = 0x1987,
	.fr = "CS303225",
	.quirks = NVME_QUIRK_BOGUS_NID,
}

I've tested it and it works for me on Linux 5.18.1
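
The model-name mix-up matters because the NVMe Identify Controller data holds the model number and firmware revision as fixed-width, space-padded fields (40 and 8 bytes), and an entry like the ones above only applies when every field it names matches. Below is a simplified user-space sketch of that kind of matching, under the assumption that an unset field in an entry matches anything. It is not the code from core.c, and `quirk_entry`, `field_matches`, `quirk_matches` and `pad_field` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MN_LEN 40 /* model number field width, space padded */
#define FR_LEN 8  /* firmware revision field width, space padded */

/* Hypothetical quirk entry: NULL strings mean "do not care". */
struct quirk_entry {
	uint16_t vid;
	const char *mn;
	const char *fr;
	unsigned long quirks;
};

/* Compare a fixed-width, space-padded field against a C string. */
bool field_matches(const char *field, size_t field_len, const char *pattern)
{
	size_t n;

	if (pattern == NULL)
		return true;
	n = strlen(pattern);
	if (n > field_len || memcmp(field, pattern, n) != 0)
		return false;
	for (; n < field_len; n++)
		if (field[n] != ' ')
			return false; /* field has extra text after the pattern */
	return true;
}

/* An entry applies only if the vendor ID and every named field match. */
bool quirk_matches(const struct quirk_entry *q, uint16_t vid,
		   const char *mn, const char *fr)
{
	assert(q != NULL && mn != NULL && fr != NULL);
	return q->vid == vid &&
	       field_matches(mn, MN_LEN, q->mn) &&
	       field_matches(fr, FR_LEN, q->fr);
}

/* Helper for the sketch: build a space-padded field from a C string. */
void pad_field(char *dst, size_t len, const char *src)
{
	size_t n = strlen(src);

	if (n > len)
		n = len;
	memset(dst, ' ', len);
	memcpy(dst, src, n);
}
```

With this model, an entry naming "PNY CS3030 1000GB SSD" does not match a drive reporting "PNY CS3030 1TB SSD", whereas an entry that names only the firmware revision "CS303225" matches any model from that vendor running that firmware.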
Comment 5 Keith Busch 2022-05-31 18:49:15 UTC
I'm okay with getting the quirk applied in-tree for the current merge window since you confirm it is successful.

Unless the vendor confirms the issue is specific to a firmware revision, though, I think we ought to use the common PCIe vid:did table in drivers/nvme/host/pci.c to set the quirk bit rather than use the more complex quirk matching logic.
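
To illustrate the difference, a vid:did table needs nothing but the two PCI IDs to decide which quirk bits apply. The following is a self-contained sketch of such a lookup, not the actual table from pci.c. `pci_quirk`, `sketch_table`, `lookup_pci_quirks` and the bit value of `SKETCH_QUIRK_BOGUS_NID` are hypothetical, and the IDs are simply those reported by users elsewhere in this thread; the sketch makes no claim about which of them are in the upstream table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUIRK_BOGUS_NID (1UL << 0) /* hypothetical bit value */

/* Hypothetical table entry: PCI vendor ID, device ID, quirk bits. */
struct pci_quirk {
	uint16_t vendor;
	uint16_t device;
	unsigned long quirks;
};

/* IDs as reported in this thread, used here only as sample data. */
const struct pci_quirk sketch_table[] = {
	{ 0x1987, 0x5016, SKETCH_QUIRK_BOGUS_NID }, /* Phison E16 */
	{ 0x1c5c, 0x174a, SKETCH_QUIRK_BOGUS_NID }, /* SK Hynix Gold P31 */
	{ 0x1f40, 0x1202, SKETCH_QUIRK_BOGUS_NID }, /* Netac NV3000 */
	{ 0x1e4b, 0x1602, SKETCH_QUIRK_BOGUS_NID }, /* MAXIO, GXF-2TB */
};

const size_t sketch_table_len = sizeof(sketch_table) / sizeof(sketch_table[0]);

/* Return the quirk bits for a vendor:device pair, or 0 if it is not listed. */
unsigned long lookup_pci_quirks(const struct pci_quirk *table, size_t count,
				uint16_t vendor, uint16_t device)
{
	assert(table != NULL);
	for (size_t i = 0; i < count; i++)
		if (table[i].vendor == vendor && table[i].device == device)
			return table[i].quirks;
	return 0;
}
```

Unlike matching on model or firmware strings, this lookup applies to every drive built on the listed controller, whatever its capacity or firmware revision.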
Comment 6 David 2022-06-11 04:38:00 UTC
Hello,

I have the same issue Jerome described. I was unsure whether I should create a new bug report or post here. Please let me know if you want me to open a new report.

I currently have two SSDs, both SK Hynix Gold P31 1TB. I am still able to boot into my system; however, my second SSD does not show up at all. I get the same warning, "nvme nvme1: globally duplicate ids for nsid 1". I am relatively new to Linux and not the most knowledgeable, so I am unsure how to properly add a quirk. I didn't want to mess up my system, so I just reverted to kernel 5.17.9, but I thought I should report that I am having the same issue too.

Please do let me know if I should open a separate bug report. I am using openSUSE Tumbleweed. Thank you
Comment 7 Keith Busch 2022-06-11 05:07:22 UTC
No need to open a new bugzilla. There are already several for this issue :)

If you can provide the pci vendor and device ids for your nvme devices, we can get a fix applied for the next release.
Comment 8 Jerome C 2022-06-11 17:56:47 UTC
Created attachment 301157
Set bogus namespace quirk for PCI Phison E12 NVMe controller - based on kernel 5.18.3

Which reminds me, PNY still hasn't responded.

This is the patch I made. It sets the quirk via "pci.c" instead of "core.c".

I've tested this and it works for me on kernel 5.18.3
Comment 9 David 2022-06-11 18:49:05 UTC
vid = 0x1c5c
vendor = SK Hynix
mn = SHGP31-1000GM-2                         
fr = 41060C20 (if you need)

This is the info for both of my SSDs. It is for the 1TB model, the SK Hynix Gold P31 1TB SSD to be specific.

Please let me know if you need any other info. Appreciate the help! Thanks
Comment 10 David 2022-06-11 19:11:20 UTC
Thank you for the attachment Jerome. I will look into trying to get it to work. 

Sorry Keith, I forgot to include the device id. It is 0x174a.

Thanks to both of you :)
Comment 11 Keith Busch 2022-06-13 14:02:03 UTC
Do you want me to send the patches for both devices, or would anyone else like to take the honors?
Comment 12 Chris Egolf 2022-07-04 21:11:29 UTC
I've run into the same issue as in the description, but with a slightly different NVMe model.  I'm not sure if this is the best place to report this, but it looks like these changes are getting into recent kernels.

My devices are:

Device:	04:00.0
Class:	Non-Volatile memory controller [0108]
Vendor:	Phison Electronics Corporation [1987]
Device:	E16 PCIe4 NVMe Controller [5016]
SVendor:	Phison Electronics Corporation [1987]
SDevice:	E16 PCIe4 NVMe Controller [5016]
Rev:	01
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme1n1          /dev/ng1n1            19091050000661       PCIe SSD                                 1         500.11  GB / 500.11  GB    512   B +  0 B   EGFM11.1
/dev/nvme0n1          /dev/ng0n1            19091050000936       PCIe SSD                                 1         500.11  GB / 500.11  GB    512   B +  0 B   EGFM11.1


I applied a patch found on ArchLinux forums to a generic 5.18.9 kernel, and it fixed the problem for me.

The link to the patch is https://bugs.archlinux.org/task/74916
Comment 13 Keith Busch 2022-07-05 03:02:07 UTC
(In reply to Chris Egolf from comment #12)
> I've run into the same issue as in the description, but with a slightly
> different NVMe model.  I'm not sure if this is the best place to report
> this, but it looks like these changes are getting into recent kernels.

The fastest way to get this into a stable build is to send the patch to the official mailing list, linux-nvme@lists.infradead.org.
Comment 14 Tiago 2022-11-09 00:53:26 UTC
I've found the same bug using two Netac NV7000 NVMe SSDs. I wrote a patch and sent it to the mailing list, but my email was blocked.
Here's the commit: https://github.com/torvalds/linux/commit/3943eaeb9ac167ff6f8d099459ab158eb1fc8bdb
Comment 15 Tiago 2022-11-09 00:56:26 UTC
(In reply to Tiago from comment #14)
> I've found the same bug using two Netac NV7000 NVMe SSDs. I wrote a patch
> and sent it to the mailing list, but my email was blocked.
> Here's the commit:
> https://github.com/torvalds/linux/commit/3943eaeb9ac167ff6f8d099459ab158eb1fc8bdb

Sent the email again in plain text.
Comment 16 Elmer Miroslav Mosher Golovin 2023-03-08 13:43:12 UTC
Created attachment 303900
Bogus NID quirk for Netac NV3000

I am experiencing the same issue. Two NVMe SSDs, Netac NV3000.

Product page: https://www.netac.com/product/NV3000--120.html

I've already tried contacting their support for a firmware update/editor to no avail.

Device ID is as follows:

Netac Technology Co.,Ltd Device [1f40:1202]

Please find attached a sample patch.
Comment 17 Elmer Miroslav Mosher Golovin 2023-03-09 20:59:29 UTC
(In reply to Elmer Miroslav Mosher Golovin from comment #16)
> Created attachment 303900 [details]
> Bogus NID quirk for Netac NV3000
> 
> I am experiencing the same issue. Two NVMe SSDs, Netac NV3000.
> 
> Product page: https://www.netac.com/product/NV3000--120.html
> 
> I've already tried contacting their support for a firmware update/editor to
> no avail.
> 
> Device ID is as follows:
> 
> Netac Technology Co.,Ltd Device [1f40:1202]
> 
> Please find attached a sample patch.

Fixed. See:

https://lore.kernel.org/linux-nvme/20230308161929.18446-1-miroslav@mishamosher.com/T/
Comment 18 Nikki Chumakov 2023-08-24 22:33:38 UTC
Would you please consider the patch below?
I have a Minisforum UM790 Pro with an AMD 7940HS SoC and two Fanxiang S880 2TB SSDs.
This patch helped the kernel recognize the second SSD.

diff -Nuar linux-6.2.0/drivers/nvme/host/pci.c linux-6.2.0-ssd/drivers/nvme/host/pci.c
--- linux-6.2.0/drivers/nvme/host/pci.c	2023-08-25 00:43:41.000000000 +0300
+++ linux-6.2.0-ssd/drivers/nvme/host/pci.c	2023-08-25 00:58:41.744030582 +0300
@@ -3498,6 +3498,8 @@
 				NVME_QUIRK_IGNORE_DEV_SUBNQN, },
 	{ PCI_DEVICE(0x10ec, 0x5763), /* TEAMGROUP T-FORCE CARDEA ZERO Z330 SSD */
 		.driver_data = NVME_QUIRK_BOGUS_NID, },
+	{ PCI_VDEVICE(AMD, 0xb000), /* AMD RAID Bottom Device */
+		.driver_data = NVME_QUIRK_BOGUS_NID, },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061),
 		.driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065),
Comment 19 Fulvio 2023-09-29 12:21:43 UTC
Hi, I have the same problem...

My computer is a MinisForum UM790 Pro, like the user above, and I'm using Ubuntu Mate 22.03.

The NVMe drives are two "Gudga GXF Series" with 2TB each.

How can I get a patch? Sorry if this is not the correct forum for this issue.

I've gathered some data below to detail my case. Thank you very much.


uname -r
6.2.0-33-generic


lspci -nn -d ::0108

01:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. Device [1e4b:1602] (rev 01)
04:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. Device [1e4b:1602] (rev 01)

lspci | grep MAXIO
01:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. Device 1602 (rev 01)
04:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. Device 1602 (rev 01)


sudo  dmesg | grep nvme
[    1.013240] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    1.013242] nvme 0000:04:00.0: platform quirk: setting simple suspend
[    1.013275] nvme nvme0: pci function 0000:01:00.0
[    1.013280] nvme nvme1: pci function 0000:04:00.0
[    1.018146] nvme nvme1: missing or invalid SUBNQN field.
[    1.022367] nvme nvme1: allocated 32 MiB host memory buffer.
[    1.022607] nvme nvme0: missing or invalid SUBNQN field.
[    1.026650] nvme nvme0: allocated 32 MiB host memory buffer.
[    1.028912] nvme nvme1: 8/0/0 default/read/poll queues
[    1.031664]  nvme1n1: p1 p2 p3 p4 p5
[    1.033383] nvme nvme0: 8/0/0 default/read/poll queues
[    1.035260] nvme nvme0: globally duplicate IDs for nsid 1
[    1.035270] nvme nvme0: VID:DID 1e4b:1602 model:GXF-2TB firmware:SN11590
[    5.109701] EXT4-fs (nvme1n1p5): mounted filesystem d5f43ef0-2ce2-499f-9585-1ee6aa5a6ad8 with ordered data mode. Quota mode: none.
[    5.392166] EXT4-fs (nvme1n1p5): re-mounted d5f43ef0-2ce2-499f-9585-1ee6aa5a6ad8. Quota mode: none.
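
The "VID:DID" line in the log above already carries the vendor and device IDs that a PCI quirk entry needs. As a small convenience, here is a sketch of pulling them out of such a line. `parse_vid_did` is a hypothetical helper, and it assumes the exact "VID:DID xxxx:yyyy" layout shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the hexadecimal vendor and device IDs from a kernel log line
 * containing "VID:DID xxxx:yyyy". Returns false if the line has no such
 * field or the IDs cannot be parsed.
 */
bool parse_vid_did(const char *line, unsigned int *vid, unsigned int *did)
{
	const char *p;

	assert(line != NULL && vid != NULL && did != NULL);
	p = strstr(line, "VID:DID ");
	if (p == NULL)
		return false;
	return sscanf(p, "VID:DID %4x:%4x", vid, did) == 2;
}
```

Fed the line logged for nvme0 above, this yields vendor 0x1e4b and device 0x1602, the same pair that `lspci -nn` prints in brackets.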

ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 13 set 28 21:48 nvme-eui.0000000000000000d0d0d0d0d0d0d0d0 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-eui.0000000000000000d0d0d0d0d0d0d0d0-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-eui.0000000000000000d0d0d0d0d0d0d0d0-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-eui.0000000000000000d0d0d0d0d0d0d0d0-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-eui.0000000000000000d0d0d0d0d0d0d0d0-part4 -> ../../nvme1n1p4
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-eui.0000000000000000d0d0d0d0d0d0d0d0-part5 -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 13 set 28 21:48 nvme-GXF-2TB_0009386001049 -> ../../nvme1n1
lrwxrwxrwx 1 root root 13 set 28 21:48 nvme-GXF-2TB_0009386001049_1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049_1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049_1-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049_1-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049_1-part4 -> ../../nvme1n1p4
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049_1-part5 -> ../../nvme1n1p5
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049-part4 -> ../../nvme1n1p4
lrwxrwxrwx 1 root root 15 set 28 21:48 nvme-GXF-2TB_0009386001049-part5 -> ../../nvme1n1p5
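
The listing above also shows why duplicate identifiers are a problem for by-id symlinks, as noted in comment 1. The sketch below assumes the link name is "nvme-eui." followed by the lowercase hex encoding of the identifier, which is what the listing suggests; `format_by_id_name` is a hypothetical helper, not part of udev or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "nvme-eui.<hex bytes>" into out. Returns 0 on success, or -1 if
 * the output buffer is too small for the name and its terminating NUL.
 */
int format_by_id_name(const uint8_t *id, size_t id_len,
		      char *out, size_t out_len)
{
	static const char prefix[] = "nvme-eui.";
	const size_t prefix_len = sizeof(prefix) - 1;

	assert(id != NULL && out != NULL);
	if (out_len < prefix_len + 2 * id_len + 1)
		return -1;
	memcpy(out, prefix, prefix_len);
	for (size_t i = 0; i < id_len; i++)
		snprintf(out + prefix_len + 2 * i, 3, "%02x", (unsigned int)id[i]);
	out[prefix_len + 2 * id_len] = '\0';
	return 0;
}
```

Two drives that report the same identifier produce the same name under this scheme, so only one of them can own the link.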

lsblk -a
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0     4K  1 loop /snap/bare/5
loop1         7:1    0  63,4M  1 loop /snap/core20/1974
loop2         7:2    0  73,9M  1 loop /snap/core22/858
loop3         7:3    0 237,2M  1 loop /snap/firefox/2987
loop4         7:4    0 349,7M  1 loop /snap/gnome-3-38-2004/143
loop5         7:5    0 485,5M  1 loop /snap/gnome-42-2204/120
loop6         7:6    0  91,7M  1 loop /snap/gtk-common-themes/1535
loop7         7:7    0  53,3M  1 loop /snap/snapd/19457
loop8         7:8    0   452K  1 loop /snap/snapd-desktop-integration/83
loop9         7:9    0    16K  1 loop /snap/software-boutique/57
loop10        7:10   0  13,5M  1 loop /snap/ubuntu-mate-welcome/720
loop11        7:11   0     0B  0 loop 
nvme1n1     259:0    0   1,9T  0 disk 
├─nvme1n1p1 259:1    0   100M  0 part /boot/efi
├─nvme1n1p2 259:2    0    16M  0 part 
├─nvme1n1p3 259:3    0   1,5T  0 part 
├─nvme1n1p4 259:4    0   625M  0 part 
└─nvme1n1p5 259:5    0 365,8G  0 part /var/snap/firefox/common/host-hunspell