Bug 9314
Summary: | IBM X41 loses time after Suspend2Disk unless "nohz=off" and "highres=off" | ||
---|---|---|---|
Product: | Power Management | Reporter: | Philipp Matthias Hahn (pmhahn) |
Component: | Hibernation/Suspend | Assignee: | power-management_other |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | acpi-bugzilla, bunk, lann.martin, Murphy.Gebert, rjw, rui.zhang, tglx |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.23.8 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216 |
Description
Philipp Matthias Hahn
2007-11-06 00:49:28 UTC
Thomas, please have a look. Thanks!

> Adding "nohz=off" solved the problem "timer losing time after resume".
> Adding "highres=off" solved the problem of needing key presses.
That means with nohz=off AND highres=off the box works as expected, right ?
What happens if you only add highres=off ?
tglx
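
For anyone retesting the combinations asked about above: "nohz=off" and "highres=off" are kernel boot parameters, so it helps to confirm which ones the running kernel was actually booted with before reporting results. A minimal sketch, assuming the command line is read from /proc/cmdline; the helper name timer_workarounds is hypothetical and not part of any kernel tooling:

```python
def timer_workarounds(cmdline: str) -> dict:
    """Report which of the two timer workarounds appear in a kernel command line.

    `cmdline` is the text of /proc/cmdline (space-separated key=value tokens).
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags such as "ro" have no value; record them as None.
        params[key] = value if sep else None
    return {
        "nohz_off": params.get("nohz") == "off",
        "highres_off": params.get("highres") == "off",
    }


# Example with a made-up command line (the root device is illustrative only).
print(timer_workarounds("root=/dev/sda1 ro highres=off"))
```

On a live system the argument would be open("/proc/cmdline").read(). Stating the result alongside each test run avoids ambiguity about whether one or both options were in effect.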
When using only "highres=off", I need to press keys on suspend.
After resume, I found the following BUG()s in dmesg multiple times:

BUG: soft lockup detected on CPU#0!
 [<c0104d3a>] show_trace_log_lvl+0x1a/0x2f
 [<c0105795>] show_trace+0x12/0x14
 [<c01057ac>] dump_stack+0x15/0x17
 [<c0145b3d>] softlockup_tick+0x95/0xb9
 [<c0123aba>] run_local_timers+0x12/0x14
 [<c0123af9>] update_process_times+0x3d/0x62
 [<c013443d>] tick_nohz_handler+0x7f/0xe2
 [<c013340a>] tick_do_broadcast+0x2b/0x4b
 [<c0133847>] tick_handle_oneshot_broadcast+0x54/0x9d
 [<c0106c22>] timer_interrupt+0x44/0x4e
 [<c0145dd2>] handle_IRQ_event+0x21/0x48
 [<c014739b>] handle_edge_irq+0xd0/0x12d
 [<c0105ecb>] do_IRQ+0x8c/0xb5
 =======================

The system was upgraded to 2.6.23.8.

> ------- Comment #3 from pmhahn@pmhahn.de 2007-11-20 23:52 -------
> When using only "highres=off", I need to press keys on suspend.
> After resume, I found the following BUG()s in dmesg multiple times:
> BUG: soft lockup detected on CPU#0!
...
> The system was upgraded to 2.6.23.8.

2.6.23.9-rc1 has a fix for this:
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.23.9-rc1.bz2

Thanks,
tglx

The system was updated to 2.6.23.9, using only "highres=off".
The "soft lockup detected" BUG reports are gone.
On suspend, I still need to press keys.
The timer DOES seem to lose time again: after 3 suspend2disk-resume cycles, the pcspeaker beeper sometimes hangs (beeps until I press a key), and xterms+bash delay their output until I either press a key or do a ping from a remote host. ssh login takes 10 seconds. I didn't notice that behaviour until the 3rd cycle, but now it is very prominent.
Since it might be related: I normally run ntpd, but I disabled the time daemon this time so that it would not re-adjust any timer. I also use cpufreqd.

/proc/timer_list:

Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 1187728479943 nsecs

cpu: 0
 clock 0:
  .index: 0
  .resolution: 4000250 nsecs
  .get_time: ktime_get_real
  .offset: 0 nsecs
 active timers:
 clock 1:
  .index: 1
  .resolution: 4000250 nsecs
  .get_time: ktime_get
  .offset: 0 nsecs
 active timers:
  #0: <d3dffed4>, hrtimer_wakeup, S:01
  # expires at 1187728212017 nsecs [in 18446744073709283690 nsecs]
  #1: <d3dffed4>, it_real_fn, S:01
  # expires at 1187744296769 nsecs [in 15816826 nsecs]
  #2: <d3dffed4>, it_real_fn, S:01
  # expires at 1187983884308 nsecs [in 255404365 nsecs]
  #3: <d3dffed4>, it_real_fn, S:01
  # expires at 1188026958310 nsecs [in 298478367 nsecs]
  #4: <d3dffed4>, hrtimer_wakeup, S:01
  # expires at 1188724250671 nsecs [in 995770728 nsecs]
  #5: <d3dffed4>, it_real_fn, S:01
  # expires at 1195169345028 nsecs [in 7440865085 nsecs]
  #6: <d3dffed4>, hrtimer_wakeup, S:01
  # expires at 1200113730295 nsecs [in 12385250352 nsecs]
  .expires_next : 9223372036854775807 nsecs
  .hres_active : 0
  .nr_events : 0
  .nohz_mode : 1
  .idle_tick : 1185366051500 nsecs
  .tick_stopped : 0
  .idle_jiffies : 221341
  .idle_calls : 264847
  .idle_sleeps : 74277
  .idle_entrytime : 1187569023452 nsecs
  .idle_sleeptime : 683915180593 nsecs
  .last_jiffies : 221892
  .next_jiffies : 221893
  .idle_expires : 1185368000000 nsecs
jiffies: 221931

Tick Device: mode: 1
Clock Event Device: pit
 max_delta_ns: 27461866
 min_delta_ns: 12571
 mult: 5124677
 shift: 32
 mode: 3
 next_event: 984092000000 nsecs
 set_next_event: pit_next_event
 set_mode: init_pit_timer
 event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000001
tick_broadcast_oneshot_mask: 00000000

Tick Device: mode: 1
Clock Event Device: lapic
 max_delta_ns: 1345544175
 min_delta_ns: 2406
 mult: 26776373
 shift: 32
 mode: 3
 next_event: 1187730199250 nsecs
 set_next_event: lapic_next_event
 set_mode: lapic_timer_setup
 event_handler: tick_nohz_handler

/proc/interrupts:

           CPU0
  0:     133613   IO-APIC-edge      timer
  1:        677   IO-APIC-edge      i8042
  5:         46   IO-APIC-edge      serial
  8:          7   IO-APIC-edge      rtc
  9:       6433   IO-APIC-fasteoi   acpi
 12:       1392   IO-APIC-edge      i8042
 14:      53233   IO-APIC-edge      libata
 15:          0   IO-APIC-edge      libata
 17:     101129   IO-APIC-fasteoi   yenta, uhci_hcd:usb2, eth0, i915@pci:0000:00:02.0
 18:          0   IO-APIC-fasteoi   Intel ICH6 Modem
 19:          8   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5
 20:          0   IO-APIC-fasteoi   uhci_hcd:usb3, sdhci:slot0
 21:        210   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0   IO-APIC-fasteoi   Intel ICH6
NMI:          0
LOC:      19363
ERR:          0
MIS:          0

/sys/devices/system/clocksource/clocksource0/available_clocksource:
acpi_pm pit jiffies tsc

/sys/devices/system/clocksource/clocksource0/current_clocksource:
acpi_pm

Does it always take 3 suspend2disk-resume cycles ?

Needing to press keys on suspend seems to be happening in TuxOnIce as well. Since it seems to be a kernel issue, this bug has been resolved:
http://bugzilla.tuxonice.net/show_bug.cgi?id=350

I may be a random user, but I'd like to chime in: ^_^

I've seen this very bug, on a ThinkPad X40, since at least kernel v2.6.21. Both suspend and resume just "stop" at the points described above, and do nothing until //something// generates interrupts (mouse or keyboard). Similarly, I see a delay in output on the VT console until there's some interrupt. Adding "nohz=off and highres=off" to the options (or recompiling with both off) fixes the problem.

So, the problem existed going back a few versions, occurs on more than one type of hardware (okay, maybe the only difference is the touchscreen, but it's something) and is due to misbehavior with NO_HZ and/or the hires-timer.

P.S. - I'm a developer by trade, myself, and suspend/resume my TP-X40 twice a day, 5 days a week, on my daily commute. So, if you wanted me to "instrument" my kernel (however it is that you guys do that) and send you logs or other debug output, I'm happy to do so. (I just might not do so instantaneously ... I do have a 4+ hr. commute to NYC every day.)

I think I've found the way to fix this problem, at least for a ThinkPad X40.
^_^ After I wrote comment #8, I played around a bit with the kernel options. I found that I could use NO_HZ and the hires-timer ... IF I added "clocksource=acpi_pm" to the kernel boot command line. I've been running my ThinkPad X40 this way for over 2 months now, suspending and resuming twice a day, Mon-Fri, without problems.

People with other ThinkPads who are experiencing this same problem may want to give this fix a try.

Could you please try the latest upstream kernel and see if the problem still exists? There are a couple of timer/clock fixes merged in recently.

No response from the bug reporter for more than 6 months.
John, please reopen this bug if it's reproducible in the latest kernel release.
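
A note on reading the /proc/timer_list dump posted earlier in this report: timer #0 is listed as expiring "in 18446744073709283690 nsecs". That value equals 2^64 minus 267926, which is consistent with a negative (already overdue) delta having been printed as an unsigned 64-bit number. This is an interpretation based on the arithmetic, not something confirmed in the thread. A minimal sketch of the check follows; the helper name signed_ns is hypothetical:

```python
def signed_ns(raw: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    raw &= (1 << 64) - 1
    return raw - (1 << 64) if raw >= (1 << 63) else raw


# Values copied from the timer_list dump in this report.
now = 1187728479943                # "now at ... nsecs"
expires = 1187728212017            # timer #0 "expires at ... nsecs"
printed = 18446744073709283690     # timer #0 "[in ... nsecs]"

print(signed_ns(printed))  # -267926
print(expires - now)       # -267926
```

Both lines give the same result, so timer #0 was about 268 microseconds past its expiry when the dump was taken. Separately, the ".expires_next" value 9223372036854775807 is 2^63 - 1, the largest signed 64-bit value.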