Bug 9314 - IBM X41 loses time after Suspend2Disk unless "nohz=off" and "highres=off"
Status: REJECTED INSUFFICIENT_DATA
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All Linux
Importance: P1 normal
Assigned To: power-management_other
Depends on:
Blocks: 7216
Reported: 2007-11-06 00:49 UTC by Philipp Matthias Hahn
Modified: 2009-03-18 19:13 UTC (History)

See Also:
Kernel Version: 2.6.23.8
Tree: Mainline
Regression: ---



Description Philipp Matthias Hahn 2007-11-06 00:49:28 UTC
Distribution: Debian/etch
Hardware Environment: IBM X41 tablet PC
Problem Description:
I have an IBM X41 tablet pc running 2.6.23 with the following problem:
Suspend2Ram works fine, but when I do a Suspend2Disk (echo disk >
/sys/power/state), the notebook hangs showing the following lines:
        Stopping tasks ... done.
        Shrinking memory... done (49361 pages freed)
        Freed 197444 kbytes in 0.64 seconds (308.50 MB/s)
        Suspending console(s)

When I repeatedly press some keys or move the TrackPoint, the SwSusp
continues and the notebook powers off.
On reboot, the notebook resumes but hangs showing the following screen:
        Stopping tasks ... done.
        Loading image data pages (62483 pages) ... done
        Read 249932 kbytes in 18.63 seconds (13.41 MB/s)
        Suspending console(s)

Generating interrupts by pressing keys or moving the TrackPoint makes
the resume continue. But after that, the notebook loses time, xterms
don't get updated until I press a key, the beeper beeps until I press a
key, etc.

DMESG: http://corellon.svs.informatik.uni-oldenburg.de/~pmhahn/x41.dmesg
CONFIG: http://corellon.svs.informatik.uni-oldenburg.de/~pmhahn/x41.config

Steps to reproduce:
echo shutdown > /sys/power/disk
echo disk > /sys/power/state


Adding "nohz=off" solved the problem of the timer losing time after resume.
Adding "highres=off" solved the problem of needing key presses.
Doing "echo platform > /sys/power/disk" instead didn't change anything.
Once, ACPI completely barfed after a suspend-resume cycle, spewing messages
without end (error during execution of method \...THM, error evaluation
operand ...). I had to remove the battery and pull the power plug to
reboot it.
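
When comparing runs with and without the two workarounds, it helps to confirm which options the running kernel was actually booted with. A minimal sketch, assuming the boot options are readable from /proc/cmdline; the function names are illustrative and not part of any existing tool:

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def timer_workarounds_active(cmdline: str) -> dict:
    """Report whether the two workarounds from this report are present."""
    params = parse_cmdline(cmdline)
    return {
        "nohz": params.get("nohz") == "off",
        "highres": params.get("highres") == "off",
    }


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(timer_workarounds_active(f.read()))
    except OSError as exc:
        # Not running on Linux, or /proc is unavailable.
        print(f"could not read /proc/cmdline: {exc}")
```

Recording this output next to each test result makes it easier to tell the "nohz=off only", "highres=off only" and "both" cases apart later.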
Comment 1 Rafael J. Wysocki 2007-11-07 04:43:13 UTC
Thomas, please have a look.

Thanks!
Comment 2 Thomas Gleixner 2007-11-13 06:31:42 UTC
> Adding "nohz=off" solved the problem "timer loosing time after resume".
> Adding "highres=off" solved the problem of needing key presses.

That means with nohz=off AND highres=off the box works as expected, right ?

What happens if you only add highres=off ?

    tglx
Comment 3 Philipp Matthias Hahn 2007-11-20 23:52:33 UTC
When using only "highres=off", I need to press keys on suspend.
After resume, I found the following BUG()s in dmesg multiple times:
BUG: soft lockup detected on CPU#0!
 [<c0104d3a>] show_trace_log_lvl+0x1a/0x2f
 [<c0105795>] show_trace+0x12/0x14
 [<c01057ac>] dump_stack+0x15/0x17
 [<c0145b3d>] softlockup_tick+0x95/0xb9
 [<c0123aba>] run_local_timers+0x12/0x14
 [<c0123af9>] update_process_times+0x3d/0x62
 [<c013443d>] tick_nohz_handler+0x7f/0xe2
 [<c013340a>] tick_do_broadcast+0x2b/0x4b
 [<c0133847>] tick_handle_oneshot_broadcast+0x54/0x9d
 [<c0106c22>] timer_interrupt+0x44/0x4e
 [<c0145dd2>] handle_IRQ_event+0x21/0x48
 [<c014739b>] handle_edge_irq+0xd0/0x12d
 [<c0105ecb>] do_IRQ+0x8c/0xb5
 =======================
The system was upgraded to 2.6.23.8.
Comment 4 Thomas Gleixner 2007-11-21 07:07:49 UTC
> ------- Comment #3 from pmhahn@pmhahn.de  2007-11-20 23:52 -------
> When using only "highres=off", I need to press keys on supend.
> After resume, I found the following BUG()s in dmegs multiple times:
> BUG: soft lockup detected on CPU#0!
...
> The system was upgraded to 2.6.23.8.

2.6.23.9-rc1 has a fix for this:
ftp://ftp.kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.23.9-rc1.bz2

Thanks,
	tglx

Comment 5 Philipp Matthias Hahn 2007-11-27 01:58:10 UTC
The system was updated to 2.6.23.9, using only "highres=off"

The "soft lockup detected"-BUG-reports are gone.
On suspend, I still need to press keys.

The timer DOES seem to lose time again: after 3 suspend2disk-resume cycles, the pcspeaker beeper sometimes hangs (beeps until I press a key), and xterms+bash delay their output until I either press a key or do a ping from a remote host. An ssh login takes 10 seconds.
I didn't notice that behaviour until the 3rd cycle, but now it is very prominent.

Since it might be related: I normally run ntpd, but I disabled the time daemon this time to not make it re-adjust any timer.
I also use cpufreqd.

/proc/timer_list:
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 1187728479943 nsecs

cpu: 0
 clock 0:
  .index:      0
  .resolution: 4000250 nsecs
  .get_time:   ktime_get_real
  .offset:     0 nsecs
active timers:
 clock 1:
  .index:      1
  .resolution: 4000250 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <d3dffed4>, hrtimer_wakeup, S:01
 # expires at 1187728212017 nsecs [in 18446744073709283690 nsecs]
 #1: <d3dffed4>, it_real_fn, S:01
 # expires at 1187744296769 nsecs [in 15816826 nsecs]
 #2: <d3dffed4>, it_real_fn, S:01
 # expires at 1187983884308 nsecs [in 255404365 nsecs]
 #3: <d3dffed4>, it_real_fn, S:01
 # expires at 1188026958310 nsecs [in 298478367 nsecs]
 #4: <d3dffed4>, hrtimer_wakeup, S:01
 # expires at 1188724250671 nsecs [in 995770728 nsecs]
 #5: <d3dffed4>, it_real_fn, S:01
 # expires at 1195169345028 nsecs [in 7440865085 nsecs]
 #6: <d3dffed4>, hrtimer_wakeup, S:01
 # expires at 1200113730295 nsecs [in 12385250352 nsecs]
  .expires_next   : 9223372036854775807 nsecs
  .hres_active    : 0
  .nr_events      : 0
  .nohz_mode      : 1
  .idle_tick      : 1185366051500 nsecs
  .tick_stopped   : 0
  .idle_jiffies   : 221341
  .idle_calls     : 264847
  .idle_sleeps    : 74277
  .idle_entrytime : 1187569023452 nsecs
  .idle_sleeptime : 683915180593 nsecs
  .last_jiffies   : 221892
  .next_jiffies   : 221893
  .idle_expires   : 1185368000000 nsecs
jiffies: 221931

Tick Device: mode:     1
Clock Event Device: pit
 max_delta_ns:   27461866
 min_delta_ns:   12571
 mult:           5124677
 shift:          32
 mode:           3
 next_event:     984092000000 nsecs
 set_next_event: pit_next_event
 set_mode:       init_pit_timer
 event_handler:  tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000001
tick_broadcast_oneshot_mask: 00000000


Tick Device: mode:     1
Clock Event Device: lapic
 max_delta_ns:   1345544175
 min_delta_ns:   2406
 mult:           26776373
 shift:          32
 mode:           3
 next_event:     1187730199250 nsecs
 set_next_event: lapic_next_event
 set_mode:       lapic_timer_setup
 event_handler:  tick_nohz_handler
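
The first entry in the active timer list above shows an implausible remaining time (18446744073709283690 nsecs). One possible reading, assuming the delta is a signed 64-bit value that was printed with an unsigned format, is that timer #0 was already overdue when the snapshot was taken. A minimal sketch of that arithmetic, using only numbers from the dump (the helper name `as_signed64` is illustrative):

```python
U64 = 1 << 64


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit number as two's-complement signed."""
    value %= U64
    return value - U64 if value >= (1 << 63) else value


# Values copied from the /proc/timer_list dump.
now_ns = 1187728479943          # "now at ... nsecs"
expires_ns = 1187728212017      # timer #0 "expires at"
printed_delta = 18446744073709283690  # timer #0 "[in ... nsecs]"

delta = as_signed64(printed_delta)
print(delta)  # -267926

# The reinterpreted delta matches expires - now exactly.
assert delta == expires_ns - now_ns

# .expires_next in the dump is the largest signed 64-bit value.
assert 9223372036854775807 == (1 << 63) - 1
```

If that reading is right, timer #0 was about 268 microseconds overdue at the time of the snapshot.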

/proc/interrupts:
           CPU0
  0:     133613   IO-APIC-edge      timer
  1:        677   IO-APIC-edge      i8042
  5:         46   IO-APIC-edge      serial
  8:          7   IO-APIC-edge      rtc
  9:       6433   IO-APIC-fasteoi   acpi
 12:       1392   IO-APIC-edge      i8042
 14:      53233   IO-APIC-edge      libata
 15:          0   IO-APIC-edge      libata
 17:     101129   IO-APIC-fasteoi   yenta, uhci_hcd:usb2, eth0, i915@pci:0000:00:02.0
 18:          0   IO-APIC-fasteoi   Intel ICH6 Modem
 19:          8   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5
 20:          0   IO-APIC-fasteoi   uhci_hcd:usb3, sdhci:slot0
 21:        210   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0   IO-APIC-fasteoi   Intel ICH6
NMI:          0
LOC:      19363
ERR:          0
MIS:          0

/sys/devices/system/clocksource/clocksource0/available_clocksource:
acpi_pm pit jiffies tsc

/sys/devices/system/clocksource/clocksource0/current_clocksource:
acpi_pm
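
As a sanity check on the dump, the mult, max_delta_ns and min_delta_ns values of the pit Clock Event Device can be reproduced from the shift value. This is a sketch under stated assumptions, not the kernel's code: it assumes the standard 1193182 Hz PIT input clock and tick limits of 0x7FFF and 0xF, and it ignores any clamping the kernel may apply. The function names are illustrative.

```python
NSEC_PER_SEC = 1_000_000_000
PIT_HZ = 1_193_182  # assumption: standard i8253 PIT input clock


def calc_mult(freq_hz: int, shift: int) -> int:
    """Scaled ticks-per-nanosecond factor: (freq << shift) / 1e9."""
    return (freq_hz << shift) // NSEC_PER_SEC


def delta_to_ns(ticks: int, mult: int, shift: int) -> int:
    """Convert a tick count back to nanoseconds using mult/shift."""
    return (ticks << shift) // mult


shift = 32                       # from the dump
mult = calc_mult(PIT_HZ, shift)
print(mult)                              # 5124677, matches "mult"
print(delta_to_ns(0x7FFF, mult, shift))  # 27461866, matches "max_delta_ns"
print(delta_to_ns(0xF, mult, shift))     # 12571, matches "min_delta_ns"
```

That the three numbers match suggests the pit device itself was configured consistently in this snapshot.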
Comment 6 Thomas Gleixner 2008-01-02 09:07:55 UTC
Does it always take 3 suspend2disk-resume cycles ?
Comment 7 Lann Martin 2008-01-18 11:02:27 UTC
Needing to press keys on suspend seems to be happening in TuxOnIce as well. Since it seems to be a kernel issue, the corresponding TuxOnIce bug has been resolved:

http://bugzilla.tuxonice.net/show_bug.cgi?id=350
Comment 8 John P. Weiss 2008-06-02 08:24:10 UTC
I may be a random user, but I'd like to chime in:  ^_^

I've seen this very bug, on a ThinkPad X40, since at least kernel v2.6.21.  Both suspend and resume just "stop" at the points described above, and do nothing until //something// generates interrupts (mouse or keyboard).  Similarly, I see a delay in output on the VT console until there's some interrupt.

Adding "nohz=off" and "highres=off" to the options (or recompiling with both off) fixes the problem.

So, the problem existed going back a few versions, occurs on more than one type of hardware (okay, maybe the only difference is the touchscreen, but it's something) and is due to misbehavior with NO_HZ and/or the hires-timer.



P.S. - I'm a developer by trade, myself, and suspend/resume my TP-X40 twice a day, 5 days a week, on my daily commute.  So, if you wanted me to "instrument" my kernel (however it is that you guys do that) and send you logs or other debug output, I'm happy to do so.  (I just might not do so instantaneously ... I do have a 4+ hr. commute to NYC every day.)
Comment 9 John P. Weiss 2008-09-04 09:54:12 UTC
I think I've found the way to fix this problem, at least for a ThinkPad X40.  ^_^

After I wrote comment #8, I played around a bit with the kernel options.  I found that I could use NO_HZ and the hires-timer ... IF I added "clocksource=acpi_pm" to the kernel boot commandline.

I've been running my ThinkPad X40 this way for over 2 months now, suspending and resuming twice a day, Mon-Fri, without problems.

People with other ThinkPads who are experiencing this same problem may want to give this fix a try.
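
Before trying this, it may be worth checking which clocksource the kernel picked on its own. A minimal sketch that reads the two sysfs files quoted in comment 5; the function names and the `wanted` default are illustrative:

```python
from pathlib import Path

# Path taken from the sysfs listing in comment 5.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = CLOCKSOURCE_DIR):
    """Return (current, [available, ...]) as reported by sysfs."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def needs_override(current: str, wanted: str = "acpi_pm") -> bool:
    """True if the running clocksource differs from the one suggested here."""
    return current != wanted


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
        print(f"current={current} available={available}")
        if needs_override(current):
            print('consider booting with "clocksource=acpi_pm"')
    except OSError as exc:
        # sysfs clocksource entries are not present on this system.
        print(f"could not read clocksource info: {exc}")
```

If the current clocksource is already acpi_pm, adding the boot option is unlikely to change anything.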
Comment 10 Zhang Rui 2008-11-19 23:09:36 UTC
Could you please try the latest upstream kernel and see if the problem still exists?
A couple of timer/clock fixes were merged recently.
Comment 11 Zhang Rui 2009-03-18 19:13:37 UTC
No response from the bug reporter for more than 6 months.

John, please reopen this bug if it's reproducible in the latest kernel release.
