Bug 217705
| Summary: | kernel 6.4.x power management issues on kaby lake CPU | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Michal Hlavac (miso) |
| Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | bjorn, kernel, mike, miso, paul.grandperrin, sven.koehler, tiwai |
| Priority: | P3 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | 8ee39ec479147e29af704639f8e55fce246ed2d9 |
Description
Michal Hlavac
2023-07-25 14:44:23 UTC
Downstream issue #: https://bugzilla.suse.com/show_bug.cgi?id=1213617

On a Dell XPS 15 9560 (Intel 7700hq), it affects my SSD. The system effectively doesn't boot anymore.
The message I get is:
> nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device
> inaccessible
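For anyone checking whether their own logs show the same failure, here is a minimal Python sketch that scans log text for the message quoted above. The message wording comes from this report; the function name `find_d3cold_failures` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches the power-state failure quoted in this report. The PCI address
# shape (domain:bus:device.function) is assumed from the quoted example.
D3COLD_RE = re.compile(
    r"(?P<driver>\S+) (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to change power state from D3cold to D0"
)

def find_d3cold_failures(log_text):
    """Return (driver, pci_address) pairs for each matching log entry."""
    return [(m.group("driver"), m.group("addr"))
            for m in D3COLD_RE.finditer(log_text)]

sample = ("nvme 0000:04:00.0: Unable to change power state "
          "from D3cold to D0, device inaccessible")
print(find_d3cold_failures(sample))  # [('nvme', '0000:04:00.0')]
```

The text passed in could be the saved output of `dmesg` or `journalctl -k`.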
Exact same issue as Sven Köhler, on same hardware, I can't boot anymore on latest kernel. It was working on 6.1.43 and doesn't work on 6.1.51. I'll try to narrow it down some more if I have time.

I tried a few kernels:
6.1.43: working
6.1.45: working
6.1.46: nvme inaccessible
6.1.47: nvme inaccessible
6.1.51: nvme inaccessible

So it seems 6.1.46 is the culprit. I was using NixOS prebuilt kernels until now as my machine is really slow to compile kernels, but I'll try to bisect to the exact commit.

Having the same issue as Sven on the 6.4.x branch (same hardware), I managed to boot using the workaround suggested by the kernel error, i.e. adding
> nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
to the kernel command line.
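A small Python sketch for confirming that both parameters actually reached the running kernel. It only inspects a command-line string such as the contents of `/proc/cmdline`; the helper names are hypothetical. Later comments in this thread report that the workaround did not hold up, so this verifies the setting, not its effectiveness.

```python
# Parameters quoted in the comment above.
WORKAROUND_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")

def missing_workaround_params(cmdline):
    """Return the workaround parameters absent from a kernel command line."""
    present = set(cmdline.split())
    return [p for p in WORKAROUND_PARAMS if p not in present]

def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()

# Example with a made-up command line; on a real system use read_cmdline().
example = "BOOT_IMAGE=/vmlinuz root=/dev/nvme0n1p2 pcie_aspm=off"
print(missing_workaround_params(example))
# ['nvme_core.default_ps_max_latency_us=0']
```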
Thanks Maximilien! Oh what a relief! I was afraid I wouldn't be able to use kernel updates for some time. What do you mean they were suggested in the kernel error though? I don't see suggestions anywhere, is it in 6.4 only?

Ok so the workaround only works until the computer goes to sleep. Then the SSD becomes inaccessible again..

.. actually, it doesn't work :( Even without suspends, after a while, the nvme becomes inaccessible. Good bye kernel updates, 6.1.45 will be my last version

(In reply to Paul Grandperrin from comment #4)
> I tried a few kernels:
> 6.1.45: working
> 6.1.46: nvme inaccessible
>
> So it seems 6.1.46 is the culprit.

Would you feel comfortable bisecting it? Also, my experience with the kernel bugzilla is often that none of the kernel developers respond, sadly.

I have been bisecting it non stop since... I think I only need to test one or two commits before I find it. When I have the commit, what should be the next steps? Try the latest kernel with this commit reverted to validate it's this one? Then, how can I help as much as possible to get a kernel dev to fix it? I know C a little bit, but I'm nowhere near able to work on the kernel myself. What's the proper way to communicate with the kernel devs? Email?

8ee39ec479147e29af704639f8e55fce246ed2d9 is the first bad commit
commit 8ee39ec479147e29af704639f8e55fce246ed2d9
Author: Ricky WU <ricky_wu@realtek.com>
Date: Tue Jul 25 09:10:54 2023 +0000

misc: rtsx: judge ASPM Mode to set PETXCFG Reg

commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.

ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0 to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG always set to HIGH during the initialization.

Cc: stable@vger.kernel.org
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Link: https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drivers/misc/cardreader/rts5227.c | 2 +-
drivers/misc/cardreader/rts5228.c | 18 ------------------
drivers/misc/cardreader/rts5249.c | 3 +--
drivers/misc/cardreader/rts5260.c | 18 ------------------
drivers/misc/cardreader/rts5261.c | 18 ------------------
drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
6 files changed, 6 insertions(+), 58 deletions(-)

This is kind of weird. This patch touches ASPM things (which makes sense) but only in the cardreader drivers. Is it possible that a bug in a cardreader driver impacts other components, like the NVMe? I'm building 6.1.51 with this commit reverted to check that.

I'm currently writing from my patched 6.1.51 kernel, so I can confirm this commit is to blame. Should I contact Ricky WU directly?

Good news, blacklisting rtsx_pci and rtsx_pci_sdmmc solves the issue. The card reader won't work anymore, but that's better than not booting.

I sent an email to the appropriate mailing list and developers (I hope). https://lore.kernel.org/stable/5DHV0S.D0F751ZF65JA1@gmail.com/T/#u

Wow, that was surprising! Thank you Paul!

I'm glad I found this report. Trying to install Manjaro on my old Dell 5520 laptop, it was showing no SSD. lsblk shows the nvme0n1 device with 4 partitions (a standard Windows installation), but fdisk/cfdisk were unable to operate on the drive at all. I did all the usual checks (booting UEFI mode, secure boot disabled, Intel RAID disabled), but it still wasn't showing up. I found the error in journalctl: "Unable to change power state from D3cold to D0, device inaccessible" and it brought me here. Based on reading the thread, the simplest workaround seemed to be going into BIOS and disabling the SD Card reader. Did that and boom, Manjaro installer works! (6.5.3-1-MANJARO kernel, FWIW.)
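Since the comments above point at the rtsx_pci and rtsx_pci_sdmmc modules, a quick way to see whether they are loaded is to look at `/proc/modules`. This is a minimal Python sketch; the function name is hypothetical, and the sample text below is made up to mimic the usual `/proc/modules` layout (module name in the first column).

```python
# Module names taken from the blacklisting comment in this thread.
CARD_READER_MODULES = ("rtsx_pci", "rtsx_pci_sdmmc")

def loaded_card_reader_modules(proc_modules_text):
    """Return which of the card reader modules appear in /proc/modules text."""
    loaded = {line.split()[0]
              for line in proc_modules_text.splitlines() if line.strip()}
    return [m for m in CARD_READER_MODULES if m in loaded]

# Illustrative sample; on a real system read the file /proc/modules instead.
sample_modules = (
    "rtsx_pci_sdmmc 32768 0 - Live 0x0000000000000000\n"
    "rtsx_pci 114688 1 rtsx_pci_sdmmc, Live 0x0000000000000000\n"
    "nvme 49152 3 - Live 0x0000000000000000\n"
)
print(loaded_card_reader_modules(sample_modules))
# ['rtsx_pci', 'rtsx_pci_sdmmc']
```

An empty result after blacklisting the modules or disabling the reader in BIOS suggests the driver is no longer in play.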
This should get fixed any time soon.

Apparently the fix is 0e4cac557531 ("misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe"), which is included in v6.6. https://git.kernel.org/linus/0e4cac557531
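To check a running kernel against the v6.6 mainline release mentioned above, a release string such as the output of `uname -r` can be compared numerically. This is a rough Python sketch with a hypothetical helper name; it deliberately ignores stable backports, because this thread does not say which stable releases carry the fix.

```python
import re

def kernel_at_least(release, major, minor):
    """Return True if a release string like '6.5.3-1-MANJARO' is >= major.minor."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)

# Versions mentioned in this thread, plus a 6.6 example.
for rel in ("6.1.51", "6.5.3-1-MANJARO", "6.6.0"):
    print(rel, kernel_at_least(rel, 6, 6))
# 6.1.51 False
# 6.5.3-1-MANJARO False
# 6.6.0 True
```

A `False` result only means the kernel predates mainline v6.6; it may still contain a backported fix.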