Bug 215099 - NVMe driver could be more verbose in situations when it cannot handle devices with exact same serial numbers
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: NVMe
Hardware: All Linux
Importance: P1 normal
Assignee: IO/NVME Virtual Default Assignee
Reported: 2021-11-22 13:02 UTC by Adam Kłobukowski
Modified: 2021-11-23 18:25 UTC

Kernel Version: 5.10.0
Regression: No

Description Adam Kłobukowski 2021-11-22 13:02:42 UTC
I have two NVMe devices with the same serial number. The kernel refuses to work with one of them, and dmesg shows the following message:


> nvme nvme0: Duplicate cntlid 0 with nvme1, rejecting


After I change the SN of one of the devices (and reboot), the message no longer appears and the device is accessible.


This message is misleading and has nothing to do with the root cause of the issue I had.
Comment 1 Keith Busch 2021-11-22 14:32:53 UTC
It sounds like your device does not report a subsystem NQN, in which case the driver falls back to the legacy way of constructing one from the serial number, as per the spec. Since your devices have the same serial number, they will appear to be in the same subsystem. Two controllers reported to be in the same subsystem must not have the same controller ID, and that check is the first place where the driver detects a problem.

I'd need to double check the spec to see if such legacy subsystems allow multi-controller in this manner. If they do, then the error message is appropriate.
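
To illustrate the fallback described in this comment, here is a minimal Python sketch of how a host might build a legacy subsystem NQN from Identify Controller fields. The function name `legacy_subnqn`, the vendor IDs, the serial numbers and the model string are all hypothetical. The string layout (prefix, 4 hex digits each for the vendor and subsystem vendor ID, 20-character serial, 40-character model) is an assumption based on the spec's description, not the kernel's exact code.

```python
# Hypothetical sketch: derive a subsystem NQN for a device that reports none.
# The layout below is an assumption for illustration, not the driver's code.

LEGACY_PREFIX = "nqn.2014-08.org.nvmexpress:"


def legacy_subnqn(vid: int, ssvid: int, serial: str, model: str) -> str:
    """Build a legacy NQN from vendor IDs, serial number and model number."""
    sn = serial.ljust(20)[:20]   # serial number field: 20 chars, space padded
    mn = model.ljust(40)[:40]    # model number field: 40 chars, space padded
    return f"{LEGACY_PREFIX}{vid:04x}{ssvid:04x}{sn}{mn}"


# Two drives of the same model that share a serial number (made-up values).
nqn_a = legacy_subnqn(0x1234, 0x1234, "SN0001", "Example NVMe SSD")
nqn_b = legacy_subnqn(0x1234, 0x1234, "SN0001", "Example NVMe SSD")
# The same drive after its serial number has been changed.
nqn_c = legacy_subnqn(0x1234, 0x1234, "SN0002", "Example NVMe SSD")

same_subsystem = nqn_a == nqn_b          # identical inputs, identical NQN
distinct_after_change = nqn_a != nqn_c   # a new serial gives a new NQN
```

With identical vendor IDs, serial and model, both drives end up with the same NQN and therefore look like members of one subsystem. Changing the serial on one of them yields a distinct NQN.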
Comment 2 Keith Busch 2021-11-22 17:26:59 UTC
The spec doesn't seem to disallow it. The exact text is:

"
The method shown in Figure 137 should be used by the host to construct an NVM Subsystem NQN for older NVM subsystems that do not provide an NQN in the Identify Controller data structure. The mechanism used by the vendor to assign Serial Number and Model Number values to ensure uniqueness is outside the scope of this specification.

An NVM subsystem may contain multiple controllers. All controllers contained in the NVM subsystem share the same NVM subsystem unique identifier. The Controller ID (CNTLID) value returned in the Identify Controller data structure may be used to uniquely identify a controller within an NVM subsystem. The Controller ID value when combined with the NVM subsystem identifier forms a globally unique value that identifies the controller.
"

Based on this text, the existing error message is appropriate.
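
The check that produces the dmesg line can be sketched as follows. This is a hypothetical Python model, not the driver's code: controllers are grouped by subsystem NQN, and a controller presenting a CNTLID already in use within the same subsystem is rejected. The class name, method name and NQN strings are made up for illustration; the message text mirrors the one quoted in the description.

```python
# Hypothetical model of the uniqueness rule: (subsystem NQN, CNTLID) must be
# unique. Names and NQN strings are illustrative assumptions.


class DuplicateCntlid(Exception):
    """Raised when a CNTLID is already taken within a subsystem."""


class SubsystemRegistry:
    def __init__(self):
        # Maps subsystem NQN -> {cntlid: controller name}
        self._subsystems = {}

    def add_controller(self, subnqn, cntlid, name):
        ctrls = self._subsystems.setdefault(subnqn, {})
        if cntlid in ctrls:
            raise DuplicateCntlid(
                f"nvme {name}: Duplicate cntlid {cntlid} "
                f"with {ctrls[cntlid]}, rejecting"
            )
        ctrls[cntlid] = name


registry = SubsystemRegistry()
registry.add_controller("nqn.example:A", 0, "nvme1")

# Same derived NQN and same CNTLID: rejected.
try:
    registry.add_controller("nqn.example:A", 0, "nvme0")
    message = None
except DuplicateCntlid as exc:
    message = str(exc)

# A different NQN (for example after a serial number change): accepted.
registry.add_controller("nqn.example:B", 0, "nvme0")
```

In this model, the rejection says nothing about how the two controllers came to share a subsystem NQN, which matches the reporter's point that the message does not mention the serial number.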
Comment 3 Adam Kłobukowski 2021-11-23 07:56:49 UTC
The behaviour changed after I changed the SN on one of the drives, which suggests that the kernel tries to construct a legacy NQN and, because it conflicts with the other drive's NQN, falls back to cntlid to identify the drive. But there is no message saying that such a fallback is taking place, so maybe the message is not misleading, but incomplete.
