Same nguid with latest firmware version:

[~]# nvme id-ns /dev/nvme2 -n 1|grep -E "nguid|i64"
nguid : 0100000001000000e4d25c3a83874bf0
eui64 : 0000000000000000
root@canas4[~]# nvme id-ns /dev/nvme0 -n 1|grep -E "nguid|i64"
nguid : 0100000001000000e4d25c3a83874bf0
eui64 : 0000000000000000

System doesn't like this:

# dmesg -T |grep -i nvme0
[Tue Oct 3 15:06:40 2023] nvme nvme0: pci function 0000:03:00.0
[Tue Oct 3 15:06:40 2023] nvme nvme0: failed to register the CMB
[Tue Oct 3 15:06:40 2023] nvme nvme0: 48/0/0 default/read/poll queues
[Tue Oct 3 15:06:40 2023] nvme nvme0: VID:DID 8086:0a54 model:INTEL SSDPD2KS019T7 firmware:QDAA0130
[Tue Oct 3 15:06:40 2023] nvme nvme0: ignoring nsid 1 because of duplicate IDs

One of the mirrored disk pairs is "lost" now.
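For anyone wanting to check a larger box for the same collision, here is a minimal sketch that scans captured `nvme id-ns` output for NGUIDs reported by more than one device. The function name `find_duplicate_nguids` and the `captured` dictionary are illustrative assumptions; the sample text is copied from the output above, and collecting that text from the real devices is left to the reader.

```python
import re
from collections import defaultdict

# Matches a line such as "nguid : 0100000001000000e4d25c3a83874bf0".
NGUID_RE = re.compile(r"^\s*nguid\s*:\s*([0-9a-fA-F]{32})\s*$", re.MULTILINE)


def find_duplicate_nguids(id_ns_output):
    """Map each NGUID reported by more than one device to those devices.

    id_ns_output: dict of device name -> captured text of `nvme id-ns`.
    (Hypothetical helper, not part of nvme-cli.)
    """
    by_nguid = defaultdict(list)
    for device, text in id_ns_output.items():
        match = NGUID_RE.search(text)
        if match is None:
            continue
        nguid = match.group(1).lower()
        # An all-zero NGUID means the field is not reported, so skip it.
        if int(nguid, 16) == 0:
            continue
        by_nguid[nguid].append(device)
    return {n: sorted(devs) for n, devs in by_nguid.items() if len(devs) > 1}


# Sample text taken from the two drives shown in this report.
captured = {
    "/dev/nvme0": "nguid : 0100000001000000e4d25c3a83874bf0\neui64 : 0000000000000000\n",
    "/dev/nvme2": "nguid : 0100000001000000e4d25c3a83874bf0\neui64 : 0000000000000000\n",
}
duplicates = find_duplicate_nguids(captured)
print(duplicates)
```

Any NGUID that shows up as a key in the result is shared by the listed devices, which is the condition the kernel complains about in the dmesg output.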
Not sure if you need this part now as the id is noted previously:

[~]# lspci -nn -d ::0108|grep 0a54
03:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]
42:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]

Not sure if I've missed anything.
Not sure if anything else is needed. However, I think the only change should be to add "NVME_QUIRK_BOGUS_NID" for this device in drivers/nvme/host/pci.c, based on what I've read so far.
Is this with a recent kernel? The default behavior now should already handle this.
Looks like a kernel 6.1.50 from the TrueNAS peeps. I originally submitted a bug with them ( https://www.truenas.com/community/threads/bluefin-to-cobia-rc1-drive-now-fails-with-duplicate-ids.113205/) and they seemed to think the best course of action would be to check/fix with upstream first. However, I did add a note yesterday ( https://www.truenas.com/community/threads/bluefin-to-cobia-rc1-drive-now-fails-with-duplicate-ids.113205/post-784010) asking them to validate that they have applied a patch from 6.1.40 from July, commit ac522fc6c3165fd0daa2f8da7e07d5f800586daa, that will

> Relax our check for them for so that it doesn't reject the probe on
> single-ported PCIe devices, but prints a big warning instead.

The current upstream pci.c code does not seem to list "NVME_QUIRK_BOGUS_NID" for this device, however. Basically, I'm not sure who or what is to blame ATM, other than that I randomly "lose" one of two drives in an array on reboot due to a duplicate GUID :(
Sorry for the odd email response... I will attempt to remember to use the bug submission page instead to help avoid confusion.
Oh, that version has the "fix" I mentioned, so it must mean your controller is claiming "CMIC", or multi-controller capabilities. I'll apply a kernel quirk for the provided device ID.
Is there a way for me to validate the CMIC attribute?
'nvme id-ctrl /dev/nvme0 | grep cmic'. Counting bits from zero, a value with bit 1 (mask 0x2) set means multi-controller. The other possibility is nmic, which you can check with 'nvme id-ns /dev/nvme0n1 | grep nmic'. Any value with bit 0 (mask 0x1) set is claiming the namespace is multi capable (shareable).
[~]# nvme id-ctrl /dev/nvme0 | grep cmic
cmic : 0x3
[~]# nvme id-ns /dev/nvme0n1 |grep nmic
nmic : 0x1
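To make the two values easier to read, here is a small sketch that decodes them. The bit positions are counted from zero and follow the CMIC and NMIC field layouts in the NVMe Identify data structures; the constant names and the `claims_multipath` helper are illustrative assumptions, not kernel or nvme-cli identifiers.

```python
# Bit masks for the Identify Controller CMIC field and the Identify
# Namespace NMIC field, with bits counted from zero.
CMIC_MULTI_PORT = 1 << 0  # subsystem may contain more than one port
CMIC_MULTI_CTRL = 1 << 1  # subsystem may contain more than one controller
NMIC_SHARED = 1 << 0      # namespace may be attached to several controllers


def claims_multipath(cmic, nmic):
    """Decode which multi-path capabilities the two fields claim.

    (Hypothetical helper written for this thread.)
    """
    return {
        "multi_port": bool(cmic & CMIC_MULTI_PORT),
        "multi_controller": bool(cmic & CMIC_MULTI_CTRL),
        "shared_namespace": bool(nmic & NMIC_SHARED),
    }


# Values reported by the drive above: cmic 0x3, nmic 0x1.
result = claims_multipath(0x3, 0x1)
print(result)
```

With cmic 0x3 and nmic 0x1 all three flags come out set, so the drive claims both multi-controller capability and a shareable namespace.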
Yah, that's doubly confirming it. I'm a bit surprised since that is a pretty old model. Something must have happened with whatever batch you have; the identifiers had been reliably unique as far as I remember. Unfortunately the quirk mechanism works at device ID granularity, so it will cover every drive with this ID, not just the affected batch. I'll just post it out to the mailing list.
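As a rough illustration of what "device ID granularity" implies, the sketch below looks up quirks by the vendor:device pair that `lspci -nn` prints. The `QUIRKS` table and `quirks_for` helper are a toy stand-in invented for this example; they are not the kernel's actual PCI ID table or its matching code.

```python
import re

# Hypothetical quirk table keyed by (vendor ID, device ID). A toy model only.
QUIRKS = {(0x8086, 0x0A54): {"BOGUS_NID"}}

# Matches the "[vvvv:dddd]" vendor:device pair printed by `lspci -nn`.
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def quirks_for(lspci_line):
    """Return the quirk names that apply to one `lspci -nn` output line."""
    match = ID_RE.search(lspci_line)
    if match is None:
        return set()
    key = (int(match.group(1), 16), int(match.group(2), 16))
    return QUIRKS.get(key, set())


# The two controllers from the lspci output earlier in this report.
lines = [
    "03:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe "
    "Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]",
    "42:00.0 Non-Volatile memory controller [0108]: Intel Corporation NVMe "
    "Datacenter SSD [3DNAND, Beta Rock Controller] [8086:0a54]",
]
for line in lines:
    print(line.split()[0], sorted(quirks_for(line)))
```

Because the lookup key is only the vendor:device pair, both drives match, and so would any other drive with that ID whose identifiers are perfectly fine.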
This is applied for the next 6.6-rc.
Will/can it also be put into the LT 6.1 kernel? Thanks
I'll keep an eye out for the stable release notice after rc7 is posted. If it works like it has in the past, the stable bot should auto apply the quirk patch to all the LTS trees sometime next week.
Seems to all be good now. Closing. Thanks for the help! Much appreciated.