Bug 214035
| Summary: | acpi_turn_off_unused_power_resources() may take down necessary hardware | | |
|---|---|---|---|
| Product: | ACPI | Reporter: | Sam Edwards (CFSworks) |
| Component: | Power-Other | Assignee: | Rafael J. Wysocki (rjw) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | antdev66, bczhc0, bjorn, c.sawczuk, ilari.nieminen, mastag, rjw, rui.zhang, t.widmo |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | 5.14.0 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Attachments:

- acpidump
- dmesg output up to when the NVMe driver is running (with acpi_turn_off_unused_power_resources disabled)
Description
Sam Edwards, 2021-08-10 23:01:27 UTC:

https://git.kernel.org/linus/7e4fdeafa61f ("ACPI: power: Turn off unused power resources unconditionally")

---

Is this a regression caused by 7e4fdeafa61f, i.e., did this device work correctly in v5.12 and break in v5.13? If so, this is a much higher-priority problem. Could you attach the complete dmesg log and the output of acpidump?

---

Created attachment 298279 [details]
acpidump

---

Created attachment 298281 [details]
dmesg output up to when the NVMe driver is running (with acpi_turn_off_unused_power_resources disabled)
---

I do know that this worked fine on a 5.12.x kernel and that the issue appeared when I attempted to boot 5.13.0. I can check whether 7e4fdeafa61f itself introduced the problem (by trying a boot with 7e4fdeafa61f and with 7e4fdeafa61f^) if desired. I'll spend some time configuring a more minimal kernel that I can kexec, to test patches and do any additional debugging steps.

---

The improper device power-off DOES happen with 7e4fdeafa61f, but NOT with 4b9ee772eaa8 (7e4fdea's parent commit). So it is a regression introduced by 7e4fdeafa61f, although the resource_in_use=0 is not new.

---

I don't know whether this is related, but with kernel 5.13.x my (Asus) notebook sometimes fails to power off and I have to do it manually. Also, while it is locked up after the last kernel log message, the notebook overheats a lot. I will try recompiling the kernel without this patch to check whether that resolves it.

---

This issue has now made it into the 5.14.0 release.

---

Hi there, I'm also affected by this bug. Using kernel 5.11, my NVMe drive was detected properly. Now, using 5.13 or 5.14, I'm getting:

```
pci 0000:02:00.0: CLS mismatch (64 != 1020), using 64 bytes
nvme 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
```

---

Hello, a small update. I've replaced the SSSTC NVMe SSD with another one from WD, and the same thing happens. So, for the record, the first (internal) one is now from WD and the second one is the Samsung EVO 970 SSD. It seems the NVMe SSD type/brand is not to blame here: the drive just doesn't initialize properly when it's in the first slot, regardless of brand. A rather large regression. If you don't have a second SSD installed, you could try moving it to the optional slot which is right above it. Alternatively, enable the Intel VMD Rapid Storage chipset (you don't have to create a RAID set), but that will require you to reinstall Windows with the floppy driver from Intel.
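[Editor's aside, not part of the original thread] The "can't change power state" lines quoted above are the failure signature of this bug. A small, hypothetical helper for spotting them in a saved kernel log (the sample log file and its path are made up for illustration):

```shell
# Write a small sample log; in practice this would come from `dmesg > /tmp/dmesg-sample.txt`.
cat > /tmp/dmesg-sample.txt <<'EOF'
pci 0000:02:00.0: CLS mismatch (64 != 1020), using 64 bytes
nvme 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)
nvme nvme0: pci function 0000:02:00.0
EOF

# Match both spellings seen in this thread: "can't change power state"
# (older kernels) and "Unable to change power state" (newer kernels).
grep -E "can't change power state|Unable to change power state" /tmp/dmesg-sample.txt
```

With the sample log above, the grep prints only the failing nvme line.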
---

(In reply to Sam Edwards from comment #7)
> This issue has now made it into the 5.14.0 release

There are several improvements after commit 7e4fdeafa61f, and they all shipped before 5.14:

```
6381195ad7d0 ACPI: power: Rework turning off unused power resources
9b7ff25d129d ACPI: power: Refine turning off unused power resources
29038ae2ae56 Revert "Revert "ACPI: scan: Turn off unused power resources during initialization""
5db91e9cb5b3 Revert "ACPI: scan: Turn off unused power resources during initialization"
7e4fdeafa61f ACPI: power: Turn off unused power resources unconditionally
```

So do you mean that the problem still occurs in the 5.14 final release?

---

> So do you mean that the problem still occurs in the 5.14 final release?
Yes, precisely that.
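[Editor's aside, not part of the original thread] Whether a given fix commit is actually contained in a release can be checked in the kernel tree with `git merge-base --is-ancestor <commit> <tag>`. A self-contained toy demo of that check (throwaway repository, not the kernel tree; all names are illustrative):

```shell
# Build a throwaway repo: one "fix" commit, then a later commit, tagged v5.14.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "power: demo fix"
fix=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "later work"
git -C "$repo" tag v5.14

# Exit status 0 means the fix is an ancestor of (i.e. contained in) v5.14.
if git -C "$repo" merge-base --is-ancestor "$fix" v5.14; then
    echo "fix is in v5.14"
fi
```

In an actual kernel checkout the equivalent check would be, e.g., `git merge-base --is-ancestor 6381195ad7d0 v5.14`.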
---

Trying to make sense of the conversation here: I have NVMe hardware that appears to be affected by this issue. There appear to be no kernel command-line arguments that can be used to work around it, and it's still present in the latest release of the kernel? Would it not be better to revert the problematic change? It's been almost a quarter of a year at this point. I'd be happy to help if possible.

---

So can you test 5.16-rc2, please? This has been reworked in 5.15 and 5.16-rc.

---

Sorry to reply so late, but for me the problem resolved itself after upgrading to kernel 5.15. Both my NVMe drives are now detected at all times. No need to enable the Intel VMD Rapid Storage controller any longer.

---

I can confirm that with 5.15.5 I am now able to boot.

---

Same here: I had this problem on 5.13.0, and on 5.15.4 the SSD is detected correctly.

---

Good to know. Bug closed.

---

I'm experiencing exactly the same problem again now, with kernel 6.1.11. My laptop has two NVMe SSD slots; when I plug my SSD into slot 1, it works fine, but if I use slot 2, the SSD is shut down:

```
nvme 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
```

Also, when I plug the SSD into slot 2, not only is the SSD shut down, my Nvidia GPU is shut down too. This makes me unable to use the Nvidia GPU until I remove the SSD from slot 2. journalctl log for the Nvidia GPU:

```
nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
```

I have tried many kernel versions: 5.13 has this problem, but 5.15 doesn't. I then encountered the problem again with kernel 5.18, and now with the near-latest 6.1.11 it's the same thing.
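[Editor's aside, not part of the original thread] For anyone still hitting this: one way to see which PCI functions are stuck in a low-power state is to read each device's `power_state` attribute in sysfs (on kernels without that attribute, `power/runtime_status` gives similar information). The helper below is an illustrative sketch, not something from the thread; it takes a base directory argument so it can also be exercised against a copy of the tree:

```shell
# Print "<pci-address> <power-state>" for every device under the given
# directory that exposes a power_state attribute. Defaults to the real
# sysfs PCI device tree.
list_pci_power_states() {
    base="${1:-/sys/bus/pci/devices}"
    for dev in "$base"/*; do
        [ -r "$dev/power_state" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/power_state")"
    done
}
```

On an affected machine, the NVMe function (0000:02:00.0 in the logs above) would be expected to show D3cold while its driver reports it inaccessible.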