Kernel Bug Tracker – Bug 30492
kernel sometimes hangs on hibernation
Last modified: 2012-01-18 08:32:45 UTC
While testing for bug #30482, and also earlier with 2.6.37 and 2.6.38, I found that the kernel occasionally hangs on hibernation.
My ThinkPad T42 indicates that the hibernation process is in progress by blinking the moon LED. I also switched to a tty. But then it just sits there and blinks for minutes without any apparent progress.
Rafael, I think I mentioned this to you somewhere already, and you hinted that it might be an issue with freeing enough memory for hibernation. Please advise on how to proceed. It happens quite rarely, but it seems to happen more often when I have more applications open prior to suspending. Bisection is out of the question for me, because I seem to recall having this issue since switching to Radeon KMS.
This is all with in-kernel suspend via the hibernate script 1.99. I am using the following wrapper script for some of my own stuff:
shambhala:/etc> cat acpi/hibernate-extra.sh
# To be safe, write out all pending changes right at the start
# Try to free as many LowMem pages as possible
# Ext4 apparently puts dir entries into LowMem as well
# And with too few LowMem pages, hibernation with
# Radeon DRM KMS does not work.
#echo 3 > /proc/sys/vm/drop_caches
# Alternatively build a smaller image, see LKML:
# Re: does hibernate to disk try hard enough to free memory? (23.2.2011)
echo 710000000 > /sys/power/image_size
# Put Network Manager to sleep
# see /usr/lib/pm-utils/sleep.d/55NetworkManager
dbus-send --print-reply --system \
# Stop ifplugd
# Save system time to the hardware clock
# Stop uptimed so that it writes its records
# To be safe, write out all pending changes here once more
#echo 1 > /sys/power/tuxonice/do_hibernate
# Start uptimed again. In doing so it writes the records again
# Write the records right away
# Set the hard disk parameters again
# Set system time from the hardware clock again
# Wake up Network Manager
dbus-send --print-reply --system \
# Start ifplugd
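For reference, the image_size value written by the script is a byte count. The following is only a rough sketch for putting that number in relation to installed RAM. The function name image_size_report is made up for illustration, and the two file arguments default to the usual /sys/power/image_size and /proc/meminfo locations so that the helper can also be pointed at copies of those files.

```shell
# Hypothetical helper (not part of the hibernate script): print the
# configured hibernation image size limit in MiB and as a percentage
# of MemTotal.
image_size_report() {
    image_file=${1:-/sys/power/image_size}   # byte count, as written by the script
    meminfo_file=${2:-/proc/meminfo}         # MemTotal is reported in kB
    bytes=$(cat "$image_file")
    total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo_file")
    echo "image_size: $(( bytes / 1048576 )) MiB, $(( bytes / 1024 * 100 / total_kb ))% of MemTotal"
}
```

Calling image_size_report without arguments before hibernating reports the limit currently in effect on the running system.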
Is the problem reproducible at any reasonable rate?
Well, I thought not, because it could take 10 to 12 days for it to happen. But now it just happened again, with 2.6.38-rc7-g212e349. It seems to trigger more easily when I keep more applications open prior to hibernation. I worked around it by closing all, or all but one, of my applications prior to initiating hibernation. I could stop using this workaround in order to reproduce the bug more easily, or even start some more applications...
But I think I will first test your patch from bug #30482, unless you direct me otherwise.
Please do. The issues are independent of each other anyway.
Created attachment 50202 [details]
PM/Hibernate: Try to avoid crashing the kernel unnecessarily
Please check if this patch makes a difference.
If it helps, please see if you're able to trigger the WARN_ON().
Trying with patch now. Thanks
I have not had any hang so far with this patch and the one from bug #30482 applied. But since I had an image size of 700000000 set manually before and still had this hang occasionally, I think the second patch, the one from this bug report, helps.
I did not see any warning:
shambhala:/> zgrep "WARN" /var/log/syslog* | grep kernel | grep -v thinkpad_acpi
/var/log/syslog.1:Mar 8 23:07:39 shambhala kernel: WARNING: at drivers/ata/libata-core.c:6130 ata_host_detach+0xf6/0x100()
Do I have to grep differently for that?
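One way to narrow the search is sketched below. It assumes that the WARN_ON() added by the patch lives under kernel/power/ (the patch touches kernel/power/snapshot.c) and that the kernel logs it in the same "WARNING: at <file>:<line>" format as the libata line above. The function name pm_warnings is made up for illustration.

```shell
# Hypothetical filter: keep only kernel warnings whose source file is
# under kernel/power/. Reads log text on stdin.
pm_warnings() {
    grep -E 'kernel: WARNING: at kernel/power/'
}
```

It could be used as `zcat -f /var/log/syslog* | pm_warnings`, where zcat -f passes uncompressed files through unchanged. No output would mean that no such warning was logged.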
I will try harder, with more applications open and more memory pressure, when I am back in Germany, but so far all seems fine.
Kernel hung again even with both patches applied. This time 2.6.38 plus those two patches:
martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> patch -p1 < ../0002-try-to-avoid-crashing-the-kernel-unnecessarily-bug-30492.patch
patching file kernel/power/snapshot.c
Reversed (or previously applied) patch detected! Assume -R? [n] ^C
martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> patch -p1 < ../0001-refine-autoestimation-of-image-size-bug-30482.patch
patching file kernel/power/snapshot.c
Reversed (or previously applied) patch detected! Assume -R? [n] c^C
martin@shambhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> cat /sys/power/image_size
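The "Reversed (or previously applied) patch detected!" prompts above indicate that both patches are already in the tree. A non-interactive way to make the same check is sketched below, using GNU patch's --dry-run together with -R. The function name patch_status is made up for illustration, and the sketch assumes it is run from the top of the kernel source tree with -p1 style patches.

```shell
# Hypothetical helper: report whether a -p1 patch is already applied to
# the tree in the current directory. --dry-run means no file is modified.
patch_status() {
    patchfile=$1
    if patch -p1 -R -s -f --dry-run < "$patchfile" >/dev/null 2>&1; then
        echo "applied"        # the patch reverses cleanly
    elif patch -p1 -s -f --dry-run < "$patchfile" >/dev/null 2>&1; then
        echo "not applied"    # the patch applies cleanly
    else
        echo "unknown"        # neither direction applies cleanly
    fi
}
```

For example, `patch_status ../0001-refine-autoestimation-of-image-size-bug-30482.patch` avoids having to abort the interactive prompt with ^C.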
I had quite a lot of stuff open, including Iceweasel, KMail, Kontact.
It seems that your patch does not yet (completely) fix this issue.
The last entries in syslog are:
Mar 17 00:06:00 shambhala kernel: e1000: eth0 NIC Link is Down
Mar 17 00:06:01 shambhala NetworkManager: <info> (eth0): carrier now OFF (device state 8, deferring action for 4
Mar 17 00:06:01 shambhala kernel: usb 3-2: USB disconnect, address 2
Mar 17 00:06:05 shambhala NetworkManager: <info> (eth0): device state change: 8 -> 2 (reason 40)
Mar 17 00:06:05 shambhala NetworkManager: <info> (eth0): deactivating device (reason: 40).
Mar 17 00:06:06 shambhala NetworkManager: <info> (eth0): canceled DHCP transaction, DHCP client pid 4669
Mar 17 00:06:17 shambhala kernel: PM: Marking nosave pages: 000000000009f000 - 0000000000100000
Mar 17 00:06:17 shambhala kernel: PM: Basic memory bitmaps created
Mar 17 00:06:18 shambhala kernel: PM: Syncing filesystems ... done.
I have been testing with that patch for quite a while. It hung on hibernation again. Thus the patch does not seem to fix the issue, at least not completely. Or there are several causes for hangs on hibernation and it fixed one, but not the other.
Created attachment 56672 [details]
PM / Hibernate: Alternative method of avoiding crashes
Please test this patch instead. It should fail hibernation if there's
a problem that would crash the kernel in free_unnecessary_pages().
If that happens, please attach the output of dmesg.
Created attachment 56682 [details]
PM / Hibernate: Alternative method of avoiding crashes (v2)
Sorry, please test this one instead.
Created attachment 56772 [details]
PM / Hibernate: Alternative method of avoiding crashes (v3)
The previous patch was broken, sorry about that. Please try this one.
Rafael, I have not had any hibernation-related crashes recently, possibly because I closed applications in order for the suspend to succeed. Shall I test this one together with the one from bug 34102?
Yes, please. Anyway, you're the only person known to me that could reproduce
this problem, so please keep this patch on top of your kernel just in case
the problem triggers.
I am now compiling a kernel with that patch.
I had a hang at preallocation when I accidentally hibernated the machine about an hour ago. I wanted to change the brightness, but hit the wrong key. Hopefully it triggers another time.
This was while hibernating with two KDE 4 sessions running, instead of one, which so far has always worked okay.
No, I think it wasn't a hang, because the disk drive was actually still spinning. I think it would have come back after a longer time, because I have now tested with higher reserved_size values and it always came back (from a failed hibernation, but I will detail this in bug 34102).
With the patch from comment #13 I think you'll rather see a failing hibernation
than a hang. Anyway, please report if you see any of them. :-)
It's great that the kernel bugzilla is back.
Can you please verify if the problem still exists in the latest upstream kernel, with & without the patch in comment #13?
Thanks for this reminder as well.
I didn't have it again on the ThinkPad T42 I talked about. But since I got that shiny new T520, I have not used the T42 for memory-heavy workloads anymore. Thus I am not sure whether this bug has been resolved or whether I am just not triggering it anymore.
Since I do not intend to use memory-heavy workloads on the T42 for now, I will just close this one as well. Should I ever get this hang again, I can always reopen it.