Bug 205679 - not able to recognize NVME's partition
Summary: not able to recognize NVME's partition
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-27 05:28 UTC by Yuking
Modified: 2024-05-04 05:58 UTC
CC List: 7 users

See Also:
Kernel Version: 5.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
add seagate drive to quirk list (847 bytes, patch)
2021-02-11 12:55 UTC, Julian Einwag

Description Yuking 2019-11-27 05:28:45 UTC
With kernel 5.4 (rc1 up to now), the partitions on my NVMe SSD are not recognized. The following messages are from kernel 5.3.13:
[    0.619656] nvme nvme0: missing or invalid SUBNQN field.
[    0.626661] nvme nvme0: allocated 64 MiB host memory buffer.
[    0.657955] nvme nvme0: 15/0/0 default/read/poll queues
[    0.664427] nvme nvme0: nvme_report_ns_ids: Identify Descriptors failed
[    0.666374] nvme nvme0: nvme_report_ns_ids: Identify Descriptors failed
[    0.667598]  nvme0n1: p1 p2 p3 p4 p5 p6

With 5.4, the first five lines are present, but the last line is missing.
Comment 1 Ingo Brunberg 2019-11-29 09:42:45 UTC
This definitely is a regression in kernel 5.4. "nvme list" does not even show my SSD. The output with kernel 5.3.13 is:

Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     HBSE18454500029      HP SSD EX900 120GB                       1         120,03  GB / 120,03  GB    512   B +  0 B   R0802B0

NVME part of "lspci -vvv" (5.3.13):

04:00.0 Non-Volatile memory controller: Silicon Motion, Inc. Device 2263 (rev 03) (prog-if 02 [NVM Express])
	Subsystem: Silicon Motion, Inc. Device 2263
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at df000000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00002100
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [158 v1] #19
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=44us
	Kernel driver in use: nvme
Comment 2 Yuking 2019-11-29 12:30:16 UTC
Still not working with 5.4.1.
Comment 3 Ingo Brunberg 2019-12-01 09:36:57 UTC
Yuking, what is your hardware? Please post the output of "nvme list" and "lspci".
Comment 4 Yuking 2019-12-01 11:52:08 UTC
# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     201812090458         Asgard AN2 250NVMe-M.2/80                1         250.06  GB / 250.06  GB    512   B +  0 B   R0629A0

# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
01:00.0 Non-Volatile memory controller: Silicon Motion, Inc. Device 2263 (rev 03)
20:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57ad
21:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a3
21:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a3
21:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a4
21:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a4
21:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a4
26:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
27:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
2a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
2a:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
2a:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
2d:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a0 (rev c1)
2e:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 14a1
2f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 (rev c1)
2f:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 HDMI Audio [Radeon VII]
30:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
31:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
31:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
31:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
32:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
33:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
Comment 5 Yuking 2019-12-01 17:34:48 UTC
01:00.0 Non-Volatile memory controller: Silicon Motion, Inc. Device 2263 (rev 03) (prog-if 02 [NVM Express])
	Subsystem: Silicon Motion, Inc. Device 2263
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 44
	NUMA node: 0
	Region 0: Memory at fcf00000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS+
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00002100
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [158 v1] Secondary PCI Express <?>
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 1048576ns
		Max no snoop latency: 1048576ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=32768ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: nvme
Comment 6 Ingo Brunberg 2019-12-01 19:20:18 UTC
So, one can guess that all SSDs with a Silicon Motion 2263 controller are affected, since that is what we have in common.

Now we must get a kernel developer to have a look.
Comment 7 Keith Busch 2019-12-02 15:30:35 UTC
There should be a controller handle at /dev/nvme0 even if the namespace isn't present. Could you confirm that it is there and, if so, post the output from:

  # nvme id-ns /dev/nvme0 -n 1
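
As an illustration of the check requested above, here is a minimal sketch that separates NVMe controller character devices (such as nvme0) from namespace block devices (such as nvme0n1) and lists controllers that have no namespace node. The function name and the name patterns are assumptions made for this example; it also assumes the common non-multipath naming in which nvmeXnY belongs to controller nvmeX.

```python
import os
import re

# Controller character device, e.g. "nvme0".
CTRL_RE = re.compile(r"^nvme(\d+)$")
# Namespace block device, e.g. "nvme0n1" (partitions like nvme0n1p1 do not match).
NS_RE = re.compile(r"^nvme(\d+)n(\d+)$")


def controllers_without_namespaces(dev_names):
    """Return sorted controller names that have no namespace block device."""
    controllers = set()
    with_ns = set()
    for name in dev_names:
        m = CTRL_RE.match(name)
        if m:
            controllers.add(int(m.group(1)))
            continue
        m = NS_RE.match(name)
        if m:
            with_ns.add(int(m.group(1)))
    return ["nvme%d" % i for i in sorted(controllers - with_ns)]


if __name__ == "__main__":
    # Lists the NVMe controllers in /dev that expose no namespace node.
    print(controllers_without_namespaces(os.listdir("/dev")))
```

A controller that shows up in this list matches the symptom in this report: the driver bound to the device, but no namespace was created.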
Comment 8 Keith Busch 2019-12-02 15:37:03 UTC
Nvm, bug introduced with 538af88ea7d9de241e6b6f006e9049c4d96723bb. Will get an appropriate fix posted ASAP.
Comment 9 Ingo Brunberg 2019-12-02 15:48:25 UTC
Thanks, sounds like you found the culprit. Anyway, here is the output of
 # nvme id-ns /dev/nvme0 -n 1

NVME Identify Namespace 1:
nsze    : 0xdf94bb0
ncap    : 0xdf94bb0
nuse    : 0xdf94bb0
nsfeat  : 0
nlbaf   : 0
flbas   : 0
mc      : 0
dpc     : 0
dps     : 0
nmic    : 0
rescap  : 0
fpi     : 0x80
dlfeat  : 0
nawun   : 0
nawupf  : 0
nacwu   : 0
nabsn   : 0
nabo    : 0
nabspf  : 0
noiob   : 0
nvmcap  : 0
nsattr	: 0
nvmsetid: 0
anagrpid: 0
endgid  : 0
nguid   : 00000000000000000000000000000000
eui64   : 0100000000000000
lbaf  0 : ms:0   lbads:9  rp:0 (in use)
Comment 10 Keith Busch 2019-12-02 16:10:30 UTC
(In reply to Ingo Brunberg from comment #9)
> nguid   : 00000000000000000000000000000000
> eui64   : 0100000000000000

Hmm... We can fix the immediate issue easily enough, but those are broken identifications! You will get by with a device like this if you've just one in the system, but we'll observe other problems if more than one is present.

Anyway, fix is posted here: http://lists.infradead.org/pipermail/linux-nvme/2019-December/028243.html
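
To make the concern about broken identifiers concrete, here is a minimal sketch that pulls the nguid and eui64 fields out of "nvme id-ns" style text and reports identifier values shared by more than one namespace. The helper names are invented for this example, and the second drive in the usage below is hypothetical: it assumes another unit of the same model would report the same values.

```python
def parse_ns_ids(id_ns_text):
    """Extract the nguid and eui64 fields from `nvme id-ns` style text."""
    ids = {}
    for line in id_ns_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("nguid", "eui64"):
            ids[key] = value.strip().lower()
    return ids


def is_unset(hex_id):
    """An identifier made only of zeros carries no information."""
    return hex_id != "" and set(hex_id) == {"0"}


def colliding_ids(devices):
    """Map each non-zero (field, value) pair seen on several devices to those devices."""
    seen = {}
    for dev, ids in devices.items():
        for key, value in ids.items():
            if not is_unset(value):
                seen.setdefault((key, value), []).append(dev)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


if __name__ == "__main__":
    # Values quoted from comment 9; "nvme1n1" is a hypothetical second drive.
    sample = (
        "nguid   : 00000000000000000000000000000000\n"
        "eui64   : 0100000000000000\n"
    )
    ids = parse_ns_ids(sample)
    print(colliding_ids({"nvme0n1": ids, "nvme1n1": dict(ids)}))
```

With a single drive nothing collides, which is consistent with the remark that one such device gets by; the trouble would start once two namespaces claim the same eui64.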
Comment 11 Ingo Brunberg 2019-12-02 16:18:06 UTC
I can confirm the fix works. And yes, I will not buy a second one of those.
Comment 12 Yuking 2019-12-04 14:04:16 UTC
THX. I will test this patch after returning home.
Comment 13 Yuking 2019-12-06 14:59:35 UTC
It works again, thanks.
Comment 14 Islam Bahnasy 2019-12-09 13:16:43 UTC
I'm confirming that this issue is affecting me as well and I'm unable to boot my system.
When will the fix be merged into the kernel? It has been taking a long time.

Thanks in advance!
Comment 15 Keith Busch 2019-12-09 15:22:23 UTC
> When will the fix be merged into the kernel? It has been taking a long time.

Fix is staged for mainline 5.5-rc2. It is also marked for 5.4-stable.
Comment 16 Islam Bahnasy 2019-12-13 21:12:55 UTC
Why wasn't the fix included in 5.4.3?
Comment 17 Keith Busch 2019-12-13 21:14:48 UTC
(In reply to Islam Bahnasy from comment #16)
> Why wasn't the fix included in 5.4.3?

Why are you expecting it to land there already?
Comment 18 Islam Bahnasy 2019-12-13 21:16:04 UTC
"It is also marked for 5.4-stable", so I was hoping that it would be in the next release.
Comment 19 Keith Busch 2019-12-13 21:20:10 UTC
(In reply to Islam Bahnasy from comment #18)
> "It is also marked for 5.4-stable", so I was hoping that it would be in the
> next release.

Per the kernel stable rules, fixes have to land upstream first. This one is scheduled to merge before 5.5-rc2 is tagged, unless Linus has a problem with anything else bundled in the current pull.
Comment 21 Keith Busch 2019-12-16 16:07:51 UTC
And as predicted, the fix was applied to 5.5-rc2, and 5.4 stable-queue automatically picked it up: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=0cc1d6a16a76d7ee6a77eabc0b043dcec31be7b5

The 5.4 branch will rebase with those patches in its queue for the next tagged release.
Comment 22 Julian Einwag 2021-02-09 15:32:29 UTC
I'm currently experiencing the same issue with 5.4.96 and also 5.10.14.
Our drives are Seagate Nytro XM1441 Datacenter. Kernel 5.3.13 is working.

Output from lspci -vv:

04:00.0 Non-Volatile memory controller: Seagate Technology PLC Nytro Flash Storage (prog-if 02 [NVM Express])
        Subsystem: Seagate Technology PLC Nytro XM1440
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 122
        NUMA node: 0
        Region 0: Memory at 91c00000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at 91c10000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM not supported
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x2 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Via message, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp+, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=19 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [178 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [2b8 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Kernel driver in use: nvme
        Kernel modules: nvme
Comment 23 Keith Busch 2021-02-09 17:50:20 UTC
It doesn't seem likely that it's the same issue since we verified the fix. What are the nvme logs in the dmesg?
Comment 24 Julian Einwag 2021-02-09 18:35:57 UTC
(In reply to Keith Busch from comment #23)
> It doesn't seem likely that it's the same issue since we verified the fix.
> What are the nvme logs in the dmesg?

[   10.785605] nvme nvme0: pci function 0000:04:00.0
[   10.832752] megaraid_sas 0000:03:00.0: NVMe passthru support : No
[   10.876787] nvme nvme1: pci function 0000:81:00.0
[   13.198614] nvme nvme0: missing or invalid SUBNQN field.
[   13.198658] nvme nvme1: missing or invalid SUBNQN field.
[   13.206896] nvme nvme0: Shutdown timeout set to 20 seconds
[   13.215035] nvme nvme1: Shutdown timeout set to 20 seconds
[   13.225407] nvme nvme0: 16/0/0 default/read/poll queues
[   13.233602] nvme nvme1: 16/0/0 default/read/poll queues
[   13.239627] nvme nvme0: Identify Descriptors failed (8194)
[   13.246315] nvme nvme1: Identify Descriptors failed (8194)

The character devices /dev/nvme0 and /dev/nvme1 do exist, but the namespace block devices are missing.
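
The number in parentheses in the "Identify Descriptors failed (8194)" lines can be decoded by hand. The sketch below assumes it is the raw NVMe completion status as the driver holds it (the status field with the phase bit already dropped), laid out as: bits 0-7 status code, bits 8-10 status code type, bits 11-12 command retry delay, bit 13 More, bit 14 Do Not Retry. The function name is made up for this example.

```python
def decode_nvme_status(status):
    """Split a driver-reported NVMe status value into its fields.

    Assumed layout (completion status with the phase bit already dropped):
    bits 0-7 SC, bits 8-10 SCT, bits 11-12 CRD, bit 13 More, bit 14 DNR.
    """
    return {
        "sc": status & 0xFF,
        "sct": (status >> 8) & 0x7,
        "crd": (status >> 11) & 0x3,
        "more": bool(status & 0x2000),
        "dnr": bool(status & 0x4000),
    }


if __name__ == "__main__":
    # 8194 is the value from the dmesg lines above.
    print(hex(8194), decode_nvme_status(8194))
```

Under that assumption 8194 is 0x2002: status code 0x02 in the generic status type, which the NVMe specification names Invalid Field in Command, with the More bit set and Do Not Retry clear. That would mean the drive rejects the Identify command for the namespace descriptor list outright.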
Comment 25 Keith Busch 2021-02-09 18:58:07 UTC
Okay, you'll need to set the NVME_QUIRK_NO_NS_DESC_LIST for your VID:DID. Do you want to send the patch? I can do it for you if you prefer.
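
For anyone who needs to look up the VID:DID pair mentioned here, this is a minimal sketch that reads it from sysfs for a given PCI address, such as the one printed in the "pci function" dmesg line. It relies on the standard "vendor" and "device" attribute files under /sys/bus/pci/devices; the function name and the sysfs_root parameter are assumptions added so the helper can be tried against a test directory.

```python
import os
import sys


def read_pci_id(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return 'vvvv:dddd' for a PCI function such as '0000:04:00.0'."""
    def read_hex(name):
        # The attribute files contain text like "0x1c5f\n".
        with open(os.path.join(sysfs_root, bdf, name)) as f:
            return int(f.read().strip(), 16)
    return "%04x:%04x" % (read_hex("vendor"), read_hex("device"))


if __name__ == "__main__":
    # Usage: python3 pci_id.py 0000:04:00.0
    for bdf in sys.argv[1:]:
        print(bdf, read_pci_id(bdf))
```

The two hex values are what goes into the PCI_DEVICE(vendor, device) entry of a quirk patch. "lspci -nn" shows the same pair in square brackets.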
Comment 26 Julian Einwag 2021-02-11 12:55:10 UTC
Created attachment 295229
add seagate drive to quirk list
Comment 27 Julian Einwag 2021-02-11 12:55:53 UTC
I've attached a patch against 5.4.96 which works for me.
Comment 28 Keith Busch 2021-02-11 15:57:02 UTC
Thanks for confirming, the patch looks good. Would you be able to send this as a proper patch to the kernel list, linux-nvme@lists.infradead.org? You might need to merge up to the current nvme-5.12 branch to ensure it cleanly applies upstream.
Comment 29 KimChou 2024-05-04 05:58:56 UTC
Thank you for your great work! My drive, a Memblaze Pblaze5, has exactly the same issue: /dev/nvme0 exists, but lsblk and fdisk show nothing.
I tried kernel versions 6.8.4 and 5.15.85 with no luck. Only kernel 4.19.0 recognizes the block device correctly.
I found this report by googling. I then applied the patch above with the Pblaze5's VID:DID, rebuilt kernel 6.8.4, and it worked like a charm!
The system now detects the NVMe drive, and the "Identify Descriptors failed" message is gone as well.

[    0.921653] nvme nvme0: pci function 0000:01:00.0
[    0.967765] nvme nvme0: missing or invalid SUBNQN field.
[    0.970283] nvme nvme0: 4/0/0 default/read/poll queues
[    0.971530]  nvme0n1: p1 p2 p3

Here is my patch applied:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3375,6 +3375,8 @@
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x1c5f, 0x0540),	/* Memblaze Pblaze4 adapter */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
+	{ PCI_DEVICE(0x1c5f, 0x0555),	/* Memblaze Pblaze5 adapter */
+		.driver_data = NVME_QUIRK_NO_NS_DESC_LIST, },
 	{ PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */

I'm not a developer, let alone a kernel developer. This is the first time I have built a Linux kernel myself (it felt great though ;P). Is there any chance for the above patch to be included in the Linux kernel source? That would prevent others with the same hardware from running into the same issue. What should I do?
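
For readers unfamiliar with the table touched by the diff above, here is a small model of the idea: each PCI vendor/device pair maps to quirk flags, and the driver consults those flags before deciding how to probe the device. This is only a sketch. The flag bit positions are illustrative and are not the kernel's real values, the function names are invented, and the behaviour of NVME_QUIRK_NO_NS_DESC_LIST (skipping the namespace descriptor list query) is inferred from its name and from the disappearance of the "Identify Descriptors failed" message.

```python
from enum import IntFlag


class Quirk(IntFlag):
    # Bit positions are illustrative only, not the kernel's actual values.
    DELAY_BEFORE_CHK_RDY = 1 << 0
    NO_NS_DESC_LIST = 1 << 1


# (vendor, device) -> quirks, mirroring the entries visible in the diff above.
QUIRK_TABLE = {
    (0x1C5F, 0x0540): Quirk.DELAY_BEFORE_CHK_RDY,  # Memblaze Pblaze4 adapter
    (0x1C5F, 0x0555): Quirk.NO_NS_DESC_LIST,       # Memblaze Pblaze5 adapter
    (0x144D, 0xA821): Quirk.DELAY_BEFORE_CHK_RDY,  # Samsung PM1725
    (0x144D, 0xA822): Quirk.DELAY_BEFORE_CHK_RDY,  # Samsung PM1725a
}


def quirks_for(vendor, device):
    """Look up quirks for a PCI ID; unknown devices get none."""
    return QUIRK_TABLE.get((vendor, device), Quirk(0))


def should_query_ns_descriptors(vendor, device):
    """Assumed behaviour: skip the descriptor list when the quirk is set."""
    return not (quirks_for(vendor, device) & Quirk.NO_NS_DESC_LIST)


if __name__ == "__main__":
    for vid, did in sorted(QUIRK_TABLE):
        print("%04x:%04x" % (vid, did), should_query_ns_descriptors(vid, did))
```

In this model only the Pblaze5 entry turns the descriptor query off. Every other device, listed or not, is still probed the normal way, which is why a per-device quirk is a narrow fix.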
