Summary: IBM X41 loses time after Suspend2Disk unless "nohz=off" and "highres=off"
Product: Power Management
Reporter: Philipp Matthias Hahn (pmhahn)
Severity: normal
CC: acpi-bugzilla, bunk, lann.martin, Murphy.Gebert, rjw, rui.zhang, tglx
Description Philipp Matthias Hahn 2007-11-06 00:49:28 UTC
Distribution: Debian/etch
Hardware Environment: IBM X41 tablet PC

Problem Description:
I have an IBM X41 tablet PC running 2.6.23 with the following problem: Suspend2Ram works fine, but when I do a Suspend2Disk (echo disk > /sys/power/state), the notebook hangs showing the following lines:

  Stopping tasks ... done.
  Shrinking memory... done (49361 pages freed)
  Freed 197444 kbytes in 0.64 seconds (308.50 MB/s)
  Suspending console(s)

When I repeatedly press some keys or move the TrackPoint, swsusp continues and the notebook powers off. On reboot, the notebook resumes but hangs showing the following screen:

  Stopping tasks ... done.
  Loading image data pages (62483 pages) ... done
  Read 249932 kbytes in 18.63 seconds (13.41 MB/s)
  Suspending console(s)

Generating interrupts by pressing keys or moving the TrackPoint makes the resume continue. But after that, the notebook loses time: xterms don't get updated until I press a key, the beeper beeps until I press a key, etc.

DMESG: http://corellon.svs.informatik.uni-oldenburg.de/~pmhahn/x41.dmesg
CONFIG: http://corellon.svs.informatik.uni-oldenburg.de/~pmhahn/x41.config

Steps to reproduce:
  echo shutdown > /sys/power/disk
  echo disk > /sys/power/state

Adding "nohz=off" solved the problem of the timer losing time after resume. Adding "highres=off" solved the problem of needing key presses. Doing "echo platform > /sys/power/disk" instead didn't change anything.

Once, ACPI completely broke down after a suspend-resume cycle, spewing messages without end (error during execution of method \...THM, error evaluation operand ...). I had to remove the battery and pull the power plug to reboot it.
Comment 1 Rafael J. Wysocki 2007-11-07 04:43:13 UTC
Thomas, please have a look. Thanks!
Comment 2 Thomas Gleixner 2007-11-13 06:31:42 UTC
> Adding "nohz=off" solved the problem "timer loosing time after resume".
> Adding "highres=off" solved the problem of needing key presses.

That means with nohz=off AND highres=off the box works as expected, right? What happens if you only add highres=off?

tglx
Comment 3 Philipp Matthias Hahn 2007-11-20 23:52:33 UTC
When using only "highres=off", I need to press keys on suspend. After resume, I found the following BUG() multiple times in dmesg:

BUG: soft lockup detected on CPU#0!
 [<c0104d3a>] show_trace_log_lvl+0x1a/0x2f
 [<c0105795>] show_trace+0x12/0x14
 [<c01057ac>] dump_stack+0x15/0x17
 [<c0145b3d>] softlockup_tick+0x95/0xb9
 [<c0123aba>] run_local_timers+0x12/0x14
 [<c0123af9>] update_process_times+0x3d/0x62
 [<c013443d>] tick_nohz_handler+0x7f/0xe2
 [<c013340a>] tick_do_broadcast+0x2b/0x4b
 [<c0133847>] tick_handle_oneshot_broadcast+0x54/0x9d
 [<c0106c22>] timer_interrupt+0x44/0x4e
 [<c0145dd2>] handle_IRQ_event+0x21/0x48
 [<c014739b>] handle_edge_irq+0xd0/0x12d
 [<c0105ecb>] do_IRQ+0x8c/0xb5
 =======================

The system was upgraded to 22.214.171.124.
Comment 4 Thomas Gleixner 2007-11-21 07:07:49 UTC
> When using only "highres=off", I need to press keys on supend.
> After resume, I found the following BUG()s in dmegs multiple times:
> BUG: soft lockup detected on CPU#0!
...
> The system was upgraded to 126.96.36.199.

188.8.131.52-rc1 has a fix for this:
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/stable-review/patch-184.108.40.206-rc1.bz2

Thanks,
tglx
Comment 5 Philipp Matthias Hahn 2007-11-27 01:58:10 UTC
The system was updated to 220.127.116.11, using only "highres=off". The "soft lockup detected" BUG reports are gone. On suspend, I still need to press keys.

The timer DOES seem to lose time again: after 3 suspend2disk-resume cycles, the PC speaker sometimes hangs (beeps until I press a key), and xterms+bash delay their output until I either press a key or ping the notebook from a remote host. An ssh login takes 10 seconds. I didn't notice that behaviour until the 3rd cycle, but now it is very prominent.

Since it might be related: I normally run ntpd, but I disabled the time daemon this time so that it would not re-adjust any timer. I also use cpufreqd.

/proc/timer_list:
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 1187728479943 nsecs

cpu: 0
 clock 0:
  .index:      0
  .resolution: 4000250 nsecs
  .get_time:   ktime_get_real
  .offset:     0 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 4000250 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <d3dffed4>, hrtimer_wakeup, S:01
 # expires at 1187728212017 nsecs [in 18446744073709283690 nsecs]
 #1: <d3dffed4>, it_real_fn, S:01
 # expires at 1187744296769 nsecs [in 15816826 nsecs]
 #2: <d3dffed4>, it_real_fn, S:01
 # expires at 1187983884308 nsecs [in 255404365 nsecs]
 #3: <d3dffed4>, it_real_fn, S:01
 # expires at 1188026958310 nsecs [in 298478367 nsecs]
 #4: <d3dffed4>, hrtimer_wakeup, S:01
 # expires at 1188724250671 nsecs [in 995770728 nsecs]
 #5: <d3dffed4>, it_real_fn, S:01
 # expires at 1195169345028 nsecs [in 7440865085 nsecs]
 #6: <d3dffed4>, hrtimer_wakeup, S:01
 # expires at 1200113730295 nsecs [in 12385250352 nsecs]
  .expires_next   : 9223372036854775807 nsecs
  .hres_active    : 0
  .nr_events      : 0
  .nohz_mode      : 1
  .idle_tick      : 1185366051500 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 221341
  .idle_calls     : 264847
  .idle_sleeps    : 74277
  .idle_entrytime : 1187569023452 nsecs
  .idle_sleeptime : 683915180593 nsecs
  .last_jiffies   : 221892
  .next_jiffies   : 221893
  .idle_expires   : 1185368000000 nsecs
jiffies: 221931

Tick Device: mode:     1
Clock Event Device: pit
 max_delta_ns:   27461866
 min_delta_ns:   12571
 mult:           5124677
 shift:          32
 mode:           3
 next_event:     984092000000 nsecs
 set_next_event: pit_next_event
 set_mode:       init_pit_timer
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000001
tick_broadcast_oneshot_mask: 00000000

Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   1345544175
 min_delta_ns:   2406
 mult:           26776373
 shift:          32
 mode:           3
 next_event:     1187730199250 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  tick_nohz_handler

/proc/interrupts:
           CPU0
  0:     133613   IO-APIC-edge      timer
  1:        677   IO-APIC-edge      i8042
  5:         46   IO-APIC-edge      serial
  8:          7   IO-APIC-edge      rtc
  9:       6433   IO-APIC-fasteoi   acpi
 12:       1392   IO-APIC-edge      i8042
 14:      53233   IO-APIC-edge      libata
 15:          0   IO-APIC-edge      libata
 17:     101129   IO-APIC-fasteoi   yenta, uhci_hcd:usb2, eth0, i915@pci:0000:00:02.0
 18:          0   IO-APIC-fasteoi   Intel ICH6 Modem
 19:          8   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5
 20:          0   IO-APIC-fasteoi   uhci_hcd:usb3, sdhci:slot0
 21:        210   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0   IO-APIC-fasteoi   Intel ICH6
NMI:          0
LOC:      19363
ERR:          0
MIS:          0

/sys/devices/system/clocksource/clocksource0/available_clocksource:
acpi_pm pit jiffies tsc

/sys/devices/system/clocksource/clocksource0/current_clocksource:
acpi_pm
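One value in the /proc/timer_list dump above looks odd at first glance: timer #0 (hrtimer_wakeup) is listed as expiring "[in 18446744073709283690 nsecs]". Assuming the kernel printed (expires - now) as an unsigned 64-bit integer, this is a small negative delta that wrapped around, i.e. the timer was already overdue when the list was read. The sketch below only redoes that arithmetic with the numbers from the dump; the helper names are illustrative, not kernel functions.

```python
# Sketch: interpret the "[in N nsecs]" field of /proc/timer_list, ASSUMING
# the kernel printed (expires - now) as an unsigned 64-bit integer.

U64 = 1 << 64


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value %= U64
    return value - U64 if value >= (1 << 63) else value


def delta_ns(expires_ns: int, now_ns: int) -> int:
    """Signed distance from 'now' to a timer's expiry (negative = overdue)."""
    return expires_ns - now_ns


# Numbers copied from the dump in this comment.
now = 1187728479943                   # "now at ... nsecs"
expires_0 = 1187728212017             # timer #0 (hrtimer_wakeup) "expires at"
printed_in_0 = 18446744073709283690   # timer #0 "[in ... nsecs]"

print(as_signed64(printed_in_0))  # -267926
print(delta_ns(expires_0, now))   # -267926
```

Both routes give -267926 ns, so under that assumption timer #0 should have fired roughly 0.27 ms before the snapshot, while the other timers (e.g. #1 at +15816826 ns) are still in the future.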
Comment 6 Thomas Gleixner 2008-01-02 09:07:55 UTC
Does it always take 3 suspend2disk-resume cycles ?
Comment 7 Lann Martin 2008-01-18 11:02:27 UTC
Needing to press keys on suspend seems to happen with TuxOnIce as well. Since it appears to be a kernel issue, the corresponding TuxOnIce bug has been resolved: http://bugzilla.tuxonice.net/show_bug.cgi?id=350
Comment 8 John P. Weiss 2008-06-02 08:24:10 UTC
I may be a random user, but I'd like to chime in: ^_^

I've seen this very bug on a ThinkPad X40 since at least kernel v2.6.21. Both suspend and resume just "stop" at the points described above and do nothing until //something// generates interrupts (mouse or keyboard). Similarly, I see delayed output on the VT console until there's some interrupt. Adding "nohz=off" and "highres=off" to the boot options (or recompiling with both disabled) fixes the problem.

So the problem goes back a few versions, occurs on more than one type of hardware (okay, maybe the only difference is the touchscreen, but it's something), and is due to misbehavior of NO_HZ and/or the high-resolution timers.

P.S. I'm a developer by trade myself and suspend/resume my TP-X40 twice a day, 5 days a week, on my daily commute. So if you want me to "instrument" my kernel (however it is that you guys do that) and send you logs or other debug output, I'm happy to do so. (I just might not do so instantaneously ... I do have a 4+ hr. commute to NYC every day.)
Comment 9 John P. Weiss 2008-09-04 09:54:12 UTC
I think I've found a way to fix this problem, at least for a ThinkPad X40. ^_^

After I wrote comment #8, I played around a bit with the kernel options. I found that I could use NO_HZ and the high-resolution timers ... IF I added "clocksource=acpi_pm" to the kernel boot command line. I've been running my ThinkPad X40 this way for over 2 months now, suspending and resuming twice a day, Mon-Fri, without problems. People with other ThinkPads who see the same problem may want to give this fix a try.
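To confirm after a reboot that the workaround is actually in effect, one can compare the boot parameters with the clocksource the kernel is using. The following is a hypothetical helper, not something from this thread; it assumes only the standard /proc/cmdline file and the current_clocksource sysfs file already quoted in comment 5.

```python
# Hypothetical helper: report the timer-related boot parameters discussed in
# this bug (nohz, highres, clocksource) and the clocksource currently in use.

CMDLINE = "/proc/cmdline"
CURRENT_CS = "/sys/devices/system/clocksource/clocksource0/current_clocksource"


def boot_params(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def read_text(path: str):
    """Return the stripped contents of a file, or None if it is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    params = boot_params(read_text(CMDLINE) or "")
    for name in ("nohz", "highres", "clocksource"):
        print(f"{name}: {params.get(name, '(not set)')}")
    print("current clocksource:", read_text(CURRENT_CS))
```

With the workaround from this comment, one would expect "clocksource: acpi_pm" and "current clocksource: acpi_pm", while nohz and highres stay unset.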
Comment 10 Zhang Rui 2008-11-19 23:09:36 UTC
Could you please try the latest upstream kernel and see if the problem still exists? A couple of timer/clock fixes were merged recently.
Comment 11 Zhang Rui 2009-03-18 19:13:37 UTC
No response from the bug reporter for more than 6 months. John, please reopen this bug if it's reproducible with the latest kernel release.