Bug 216462
Summary: | Huawei Mate Book D16 NVMe SSD not detected (lost) after resume from suspend | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Nikolai (nickel) |
Component: | NVMe | Assignee: | IO/NVME Virtual Default Assignee (io_nvme) |
Status: | NEW --- | ||
Severity: | high | CC: | toliakpurple |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 5.19.7 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmidecode after UEFI firmware update to latest 1.20 version; ACPI DSDT table for huawei D16; lspci -vvnn log for stock SSD (issue reproducible); lspci -vvnn log for non-stock Samsung SSD (issue non-reproducible); full dmesg log mentioned in topic starting message | ||
Description
Nikolai
2022-09-08 09:13:13 UTC
(In reply to Nikolai from comment #0)
> while was successfully working before.

When was that "before"? 5.18.y? Or an earlier 5.19 version?

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> (In reply to Nikolai from comment #0)
> > while was successfully working before.
>
> when was that "before"? 5.18.y? Or an earlier 5.19 version?

"Before" is literally "before suspending". The issue is also reproducible with 5.15.63 on the same laptop. 5.10.139 is unable to boot on this machine (I suspect due to the Intel Iris driver, but I didn't try to confirm that).

Actually, I found out that replacing the stock laptop SSD ("PCIe-8 SSD 512GB") with a "Samsung SSD 970 EVO Plus 250GB" eliminates the issue. That may point to either an SSD firmware issue or an ACPI PM compatibility issue with the new hardware. I will add more technical descriptions and logs for both configurations soon. If you need some specific diagnostics, please feel free to request them.

(In reply to Nikolai from comment #2)
> "before" is literally "before suspending".

ha, sorry, yeah, obviously.

> is also reproducible for 5.15.63

Thx for clarifying. In that case it's likely not a regression and not something for my todo list.

Initially the issue was discovered with laptop UEFI firmware version 1.09, but updating it to v1.20 (the latest at the moment) didn't solve the problem.

SSD reproducing the issue (a similar one is reviewed in [1]):
SN: YMA1512JA2202320B4
Model: PCIe-8 SSD 512GB
FW Rev: YM00D216

SSD without the issue:
SN: S4EUNM0R104410T
Model: Samsung SSD 970 EVO Plus 250GB
FW Rev: 2B2QEXM7

[1] https://fadvices.com/huawei-matebook-16s-review-intel-core-i9-in-sheeps-clothing-tech-reviews/

Created attachment 301779 [details]
dmidecode after UEFI firmware update to latest 1.20 version
Created attachment 301780 [details]
ACPI DSDT table for huawei D16
Created attachment 301781 [details]
lspci -vvnn log for stock SSD (issue reproducible)
Created attachment 301782 [details]
lspci -vvnn log for non-stock Samsung SSD (issue non-reproducible)
Created attachment 301783 [details]
full dmesg log mentioned in topic starting message
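
For anyone comparing drives the way the reporter did (model and firmware revision of the stock SSD versus the Samsung one), the following is a minimal sketch that reads the same identifiers from sysfs. It assumes the usual Linux layout where each NVMe controller appears under `/sys/class/nvme/<name>/` with `model`, `serial` and `firmware_rev` attributes; the function name `list_nvme_controllers` and the `sysfs_root` parameter are hypothetical and exist only for this illustration. Running it before suspend and again after resume should show whether the controller has disappeared from the list.

```python
from pathlib import Path


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return one dict per NVMe controller found under sysfs_root.

    Assumption: each controller directory exposes `model`, `serial`
    and `firmware_rev` text attributes (standard Linux sysfs layout).
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        # No NVMe class directory: no controllers, or not a Linux system.
        return []
    controllers = []
    for ctrl in sorted(root.iterdir()):
        info = {"name": ctrl.name}
        for attr in ("model", "serial", "firmware_rev"):
            try:
                info[attr] = (ctrl / attr).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this system.
                info[attr] = None
        controllers.append(info)
    return controllers


if __name__ == "__main__":
    # Print one line per controller that is currently visible.
    for c in list_nvme_controllers():
        print(c["name"], c["model"], c["firmware_rev"], c["serial"])
```

An empty result after resume, where the same call listed `nvme0` before suspend, would match the "SSD lost after resume" symptom described in this report.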
Hello. I have run into the same issue on a Huawei MateBook D16 with, as far as I can see from attachment [1], an NVMe drive from the same vendor. It seems that not every NVMe drive supports the D3cold state. My kernel is 6.4.2.

I have created a dirty workaround patch [2] that just forcefully disables the D3cold state for the specified NVMe device. It works for me. Could you recompile the kernel and check it?

By the way (maybe a bit off-topic in the context of this issue), there is `/sys/bus/pci/devices/0000:01:00.0/d3cold_allowed`. It can be set to zero; however, the kernel function that suspends PCI devices, `pci_set_power_state` (`drivers/pci/pci.c`), does not check it [3]. What about adding a `pci_dev_check_d3cold` check inside the `pci_set_power_state` function? That would allow disabling the D3cold state from userspace without recompiling the kernel.

[1] https://bugzilla.kernel.org/attachment.cgi?id=301781
[2] https://gist.github.com/Toliak/86340b839b45f2c6fa4337ba6d8e971b#the-solution-part
[3] https://gist.github.com/Toliak/86340b839b45f2c6fa4337ba6d8e971b#meanwhile-why-d3cold_allowed-is-not-working
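
For reference, here is a minimal sketch of reading and writing the `d3cold_allowed` attribute mentioned in the comment above. The helper names (`d3cold_path`, `get_d3cold_allowed`, `set_d3cold_allowed`) and the `sysfs_root` parameter are hypothetical and only serve the illustration; the PCI address `0000:01:00.0` is the one quoted in the comment and will differ between systems. Writing the attribute normally requires root, and, as the comment points out, setting it to `0` may not actually keep the device out of D3cold during suspend on the kernels discussed here.

```python
from pathlib import Path

# Default sysfs location of PCI devices on Linux.
PCI_DEVICES = "/sys/bus/pci/devices"


def d3cold_path(bdf, sysfs_root=PCI_DEVICES):
    """Build the path to the d3cold_allowed attribute of a PCI device.

    `bdf` is the PCI address in domain:bus:device.function form,
    for example "0000:01:00.0".
    """
    return Path(sysfs_root) / bdf / "d3cold_allowed"


def get_d3cold_allowed(bdf, sysfs_root=PCI_DEVICES):
    """Return True if the attribute currently reads 1."""
    return d3cold_path(bdf, sysfs_root).read_text().strip() == "1"


def set_d3cold_allowed(bdf, allowed, sysfs_root=PCI_DEVICES):
    """Write 1 or 0 to the attribute (needs root on a real system)."""
    d3cold_path(bdf, sysfs_root).write_text("1" if allowed else "0")


if __name__ == "__main__":
    bdf = "0000:01:00.0"  # address quoted in the comment; adjust as needed
    if d3cold_path(bdf).exists():
        print(bdf, "d3cold_allowed =", get_d3cold_allowed(bdf))
```

Calling `set_d3cold_allowed("0000:01:00.0", False)` is the programmatic equivalent of writing zero to the file by hand; whether it has any effect on the suspend path is exactly the open question raised in [3].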