Bug 199747
Summary: | resume from hibernate results in kernel panic (bisected) | ||
---|---|---|---|
Product: | Power Management | Reporter: | Marek Lotke (lollul) |
Component: | Hibernation/Suspend | Assignee: | Rafael J. Wysocki (rjw) |
Status: | RESOLVED DOCUMENTED | ||
Severity: | normal | CC: | bjorn, lenb, mika.westerberg |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.16.7 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
PCI / PM: Clear state_saved in more cases in pci_pm_freeze()
Clear state in poweroff only if runtime resumed PCI / PM: set state_saved when skipping noirq device suspend Slightly modified revert of the problematic commit Slightly modified revert of the problematic commit dmesg output after resume from hibernation with att 276023 lspci -vv output PCI / PM: Do not set state_saved too early on system suspend PCI / PM: Always resume runtime-suspended devices during hibernation |
Description
Marek Lotke
2018-05-17 09:12:05 UTC
Created attachment 276013 [details]
PCI / PM: Clear state_saved in more cases in pci_pm_freeze()
Can you check if this patch helps, please?
Sorry to report, but the patch did not help. (I've compiled twice just to be sure). I wish I could tell more about the kernel panic at resume, but all I get is a blank screen and the caps lock blinking once the hibernation image is read completly... I do remember though that I had similar trouble with resume from hibernate on this laptop prior to 4.16. Thank you for looking into this! Created attachment 276017 [details]
Clear state in poweroff only if runtime resumed
Here is another patch to try. I'm not entirely sure if this has anything to do with the problem at hand but I think we may need to do the same in poweroff phase as well.
->poweroff doesn't matter here, because the resume code doesn't see what it does and that's when the crash happens. Created attachment 276019 [details]
PCI / PM: set state_saved when skipping noirq device suspend
OK, so let's try this one.
Sorry, neither 276017 nor 276019 worked :( Can you attach output of 'sudo lspci -vv' and full dmesg after the system has been booted up (e.g from a normal boot)? (In reply to Marek Lotke from comment #6) > Sorry, neither 276017 nor 276019 worked :( So the patch from comment #5 is almost a revert of the problematic commit. Please try to comment out the pci_dev->state_saved = true; line added by it in pci_pm_freeze_noirq() and see if that works. Created attachment 276021 [details]
Slightly modified revert of the problematic commit
IOW, try to apply something like this and see if it works (it should).
Created attachment 276023 [details]
Slightly modified revert of the problematic commit
Better yet, please test this one instead and (if it works) please attach a dmesg output after 1 hibernate-resume cycle.
Created attachment 276027 [details]
dmesg output after resume from hibernation with att 276023
Created attachment 276029 [details]
lspci -vv output
With the patch from attachment 276023 [details] I can resume from hibernation! Thank you!
Well, that just means that it works for you with the Mika's patch reverted which we've known already. I have one more thing for you to test, though. Created attachment 276035 [details]
PCI / PM: Do not set state_saved too early on system suspend
If suspend-to-RAM (or suspend-to-idle) works for you on this system, please test it with the attached patch.
Mika, after your commit, pci_pm_default_resume_early() in pci_pm_restore_noirq() will restore the config space of intel-lpss and that seems to be sufficient to trigger the crash. Would it be possible to try to reproduce this locally in the lab? With 276035 I can still do suspend-to-RAM - it was always working. But resume from hibernate is still broken. This is 4.16.7 + 276035 patch. Should I apply 276035 together with 276023? (In reply to Marek Lotke from comment #17) > With 276035 I can still do suspend-to-RAM - it was always working. Good, thanks! > But resume from hibernate is still broken. This is 4.16.7 + 276035 patch. > Should I apply 276035 together with 276023? No, thanks. I have a machine with the intel-lpss device here, but it resumes from hibernation correctly after the Mika's commit, so the issue appears to be specific to your system. I will attach some additional debug patches to collect more information from it, stay tuned. Created attachment 276037 [details]
PCI / PM: Always resume runtime-suspended devices during hibernation
Please test this patch on top of the failing kernel (without any previous patches applied) and report back.
No luck. 276037 on top of 4.16.7 does not resume from hibernate. Sorry about that. The patch from comment #19 basically restores the state from before commit c4b65157aeef (PCI / PM: Take SMART_SUSPEND driver flag into account) that introduced the bug fixed by the Mika's commit. So, it looks like the bug introduced by commit c4b65157aeef incidentally made resume from hibernation kind of work on your system and now the Mika's fix (which is correct) broke it again. Did resume from hibernation work for you before 4.15? I've just tried Arch's lts - 4.14.40-1 - resume from hibernation does not work. I'm guessing you know this already - as a workaround I can remove intel_lpss_pci, intel_lpss modules before hibernation and then resume from hibernation works. I just have to modprobe them back to get the touchpad working. Another workaround is adding intel_lpss_pci to initrd - found it here: https://bbs.archlinux.org/viewtopic.php?id=231881 (In reply to Rafael J. Wysocki from comment #16) > Mika, after your commit, pci_pm_default_resume_early() in > pci_pm_restore_noirq() will restore the config space of intel-lpss and that > seems to be sufficient to trigger the crash. Would it be possible to try to > reproduce this locally in the lab? I already tried yesterday on KBL system with LPSS but I was not able to reproduce this. Unfortunately that system does not have working touchpanel which may be needed to trigger crash when there is actual I2C traffic. I'll try today with another KBL system. (In reply to Marek Lotke from comment #23) > Another workaround is adding intel_lpss_pci to initrd - found it here: > https://bbs.archlinux.org/viewtopic.php?id=231881 Yes, please do that. The device needs to be initialized properly by the restore kernel before control is passed back to the image one and I'm not even sure if that can be avoided. (In reply to Mika Westerberg from comment #24) > (In reply to Rafael J. Wysocki from comment #16) > > Mika, after your commit, pci_pm_default_resume_early() in > > pci_pm_restore_noirq() will restore the config space of intel-lpss and that > > seems to be sufficient to trigger the crash. Would it be possible to try > to > > reproduce this locally in the lab? > > I already tried yesterday on KBL system with LPSS but I was not able to > reproduce this. Unfortunately that system does not have working touchpanel > which may be needed to trigger crash when there is actual I2C traffic. I'll > try today with another KBL system. It looks like you need to leave the device uninitialized on restore before jumping back to the image kernel to reproduce this. Since the same callbacks are used for hibernation restore and system-wide resume, that is rather expected (or at least not surprising). @Marek: For now I'll mark this as "documented" and we'll try to reproduce the failure locally and improve the hibernation handling in LPSS, but it may just require proper initialization after S4 wakeup. > It looks like you need to leave the device uninitialized on restore before
> jumping back to the image kernel to reproduce this. Since the same
> callbacks are used for hibernation restore and system-wide resume, that is
> rather expected (or at least not surprising).
I figured if I build intel-lpss* as modules and put "resume=/path/to/disk" to the kernel command line the non-hibernation kernel should leave the LPSS devices uninitialized (because image restoration happens before udev gets to run and thus it cannot load additional modules) and once the hibernation image is restored, it should trigger the crash but unfortunately I still can't reproduce it here.
OK, it must be strictly system-specific, then. But yes, it may be necessary to initialize devices after S4 wakeup and before loading the hibernate image in general. |