Huawei MateBook D16 (Intel i5) laptop resumes from suspend with the NVMe drive undetected, while was successfully working before.

ACPI: EC: interrupt blocked
[11168.523511] ACPI: EC: interrupt unblocked
[11168.527546] nvme 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[11168.596318] i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_69.0.3.bin version 69.0
[11168.596321] i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9
[11168.596599] nvme 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[11168.596652] nvme nvme0: Removing after probe failure status: -19
[11168.596662] nvme0n1: detected capacity change from 1000215216 to 0
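For triage, the two kernel messages that mark this failure can be picked out of a dmesg dump mechanically. Below is a minimal Python sketch. The helper name `find_resume_failures` is hypothetical, and the regular expressions are written only against the message formats quoted above, so other kernel versions may word them differently. The status -19 in the removal message is -ENODEV.

```python
import re
from typing import Iterable

# Patterns written against the message formats quoted in this report only.
_D3COLD_FAIL = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to change power state from D3cold to D0"
)
_NVME_REMOVED = re.compile(
    r"(?P<ctrl>nvme\d+): Removing after probe failure status: (?P<status>-?\d+)"
)


def find_resume_failures(lines: Iterable[str]) -> dict:
    """Hypothetical helper: collect PCI addresses that failed the D3cold->D0
    transition and NVMe controllers removed after a probe failure."""
    failed_bdfs = []  # unique PCI addresses, in order of first appearance
    removed = []      # (controller name, status code) pairs
    for line in lines:
        m = _D3COLD_FAIL.search(line)
        if m and m.group("bdf") not in failed_bdfs:
            failed_bdfs.append(m.group("bdf"))
        m = _NVME_REMOVED.search(line)
        if m:
            removed.append((m.group("ctrl"), int(m.group("status"))))
    return {"d3cold_failures": failed_bdfs, "removed": removed}
```

For example, save the log with `dmesg > dmesg.txt` after a failed resume and pass the file's lines to the function.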
(In reply to Nikolai from comment #0)
> while was successfully working before.

when was that "before"? 5.18.y? Or an earlier 5.19 version?
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> (In reply to Nikolai from comment #0)
> > while was successfully working before.
>
> when was that "before"? 5.18.y? Or an earlier 5.19 version?

"before" is literally "before suspending". The issue is also reproducible for 5.15.63 on the same laptop. 5.10.139 fails to boot on this machine (I suspect the Intel Iris graphics driver, but I didn't try to confirm that).

Actually, I found that replacing the stock laptop SSD ("PCIe-8 SSD 512GB") with a "Samsung SSD 970 EVO Plus 250GB" eliminates the issue. This may point to either an SSD firmware issue or an ACPI power-management compatibility issue with the new hardware. I will add more technical details and logs for both configurations soon. If you need any specific diagnostics, feel free to ask.
(In reply to Nikolai from comment #2)
> "before" is literally "before suspending".

ha, sorry, yeah, obviously.

> is also reproducible for 5.15.63

Thanks for clarifying. In that case it's likely not a regression and not something for my todo list.
The issue was initially discovered with laptop UEFI firmware version 1.09, but updating to v1.20 (the latest at the moment) didn't solve the problem.

SSD that reproduces the issue (a similar one is reviewed in [1]):
SN: YMA1512JA2202320B4
Model: PCIe-8 SSD 512GB
FW Rev: YM00D216

SSD without the issue:
SN: S4EUNM0R104410T
Model: Samsung SSD 970 EVO Plus 250GB
FW Rev: 2B2QEXM7

[1] https://fadvices.com/huawei-matebook-16s-review-intel-core-i9-in-sheeps-clothing-tech-reviews/
Created attachment 301779: dmidecode output after updating the UEFI firmware to the latest version (1.20)
Created attachment 301780: ACPI DSDT table for the Huawei D16
Created attachment 301781: lspci -vvnn log for stock SSD (issue reproducible)
Created attachment 301782: lspci -vvnn log for non-stock Samsung SSD (issue non-reproducible)
Created attachment 301783: full dmesg log referenced in the opening message
Hello. I have run into the same issue on a Huawei MateBook D16 with, as far as I can see from attachment [1], an NVMe drive from the same vendor. It seems that not every NVMe drive supports D3cold. The kernel is 6.4.2.

I have created a dirty workaround patch [2] that simply forces the D3cold state off for the specified NVMe drive. It works for me. Could you recompile the kernel and check it?

By the way (maybe a bit off-topic for this issue), there is `/sys/bus/pci/devices/0000:01:00.0/d3cold_allowed`. It can be set to zero; however, the kernel function that suspends PCI devices, `pci_set_power_state` (`drivers/pci/pci.c`), does not check it [3]. What about adding a `pci_dev_check_d3cold` check inside `pci_set_power_state`? That would allow disabling the D3cold state from userspace without recompiling the kernel.

[1] https://bugzilla.kernel.org/attachment.cgi?id=301781
[2] https://gist.github.com/Toliak/86340b839b45f2c6fa4337ba6d8e971b#the-solution-part
[3] https://gist.github.com/Toliak/86340b839b45f2c6fa4337ba6d8e971b#meanwhile-why-d3cold_allowed-is-not-working
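The `d3cold_allowed` attribute mentioned above is an ordinary sysfs file, so toggling it needs no special tooling. Below is a minimal Python sketch, assuming root privileges and the PCI address `0000:01:00.0` seen in the logs in this thread; the helper names are hypothetical. Per the comment above, `pci_set_power_state` reportedly does not consult this flag, so writing 0 may not by itself prevent the resume failure.

```python
from pathlib import Path


def d3cold_allowed_path(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> Path:
    """Hypothetical helper: path of the d3cold_allowed attribute for a PCI device."""
    return Path(sysfs_root) / bdf / "d3cold_allowed"


def set_d3cold_allowed(path: Path, allowed: bool) -> None:
    """Write 1 (allow) or 0 (disallow); on a real sysfs this requires root."""
    path.write_text("1\n" if allowed else "0\n")


def get_d3cold_allowed(path: Path) -> bool:
    """Read the attribute back; sysfs reports it as 0 or 1."""
    return path.read_text().strip() == "1"
```

On the affected laptop this would be `set_d3cold_allowed(d3cold_allowed_path("0000:01:00.0"), False)`, run as root before suspending. The setting lives only in sysfs, so it does not persist across reboots.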