Most recent kernel where this bug did *NOT* occur: 2.6.20.x
Distribution: Debian/testing
Hardware Environment: Dual Xeon with SMP kernel
Software Environment:
Problem Description:
dmesg reports

NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 02
Clocksource tsc unstable (delta = 4686825172 ns)
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
NOHZ: local_softirq_pending 08
BUG: soft lockup detected on CPU#0!
 [<c013c69a>] softlockup_tick+0x90/0xb5
 [<c0123564>] update_process_times+0x28/0x5e
 [<c01328fe>] tick_sched_timer+0x48/0x9a
 [<c012eff1>] hrtimer_interrupt+0x13f/0x1c5
 [<c010e75d>] smp_apic_timer_interrupt+0x55/0x85
 [<c011461c>] __wake_up_common+0x39/0x59
 [<c01048d0>] apic_timer_interrupt+0x28/0x30
 [<c02db6d0>] rt_check_expire+0xf8/0x160
 [<c02db5d8>] rt_check_expire+0x0/0x160
 [<c01227b7>] run_timer_softirq+0x11e/0x17a
 [<c011ece9>] it_real_fn+0x0/0x17
 [<c011ecfb>] it_real_fn+0x12/0x17
 [<c011f8c2>] __do_softirq+0x74/0xd9
 [<c011f794>] ksoftirqd+0x0/0xba
 [<c0106624>] do_softirq+0x5f/0xa8
 [<c011f807>] ksoftirqd+0x73/0xba
 [<c012bd02>] kthread+0xae/0xd3
 [<c012bc54>] kthread+0x0/0xd3
 [<c0104a53>] kernel_thread_helper+0x7/0x14
 =======================
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22
NOHZ: local_softirq_pending 22

Steps to reproduce:
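
For anyone reading the NOHZ lines: the number after local_softirq_pending appears to be a hexadecimal bitmask of softirqs still pending when the CPU tried to stop its tick. Below is a minimal Python sketch for decoding it. It assumes the value is hex and that the bit order matches the softirq enum of kernels from that era (HI, TIMER, NET_TX, NET_RX, BLOCK, TASKLET); the names SOFTIRQ_NAMES and decode_softirq_pending are made up for this illustration and do not come from the kernel.

```python
# Assumed softirq bit order for 2.6.21-era kernels (bit 0 first).
# This list is an assumption of the sketch; check include/linux/interrupt.h
# of the kernel you are actually running.
SOFTIRQ_NAMES = ["HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "TASKLET"]


def decode_softirq_pending(mask_hex):
    """Return the names of the softirqs whose bits are set in a hex mask string."""
    mask = int(mask_hex, 16)
    names = [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]
    # Report any bits beyond the assumed table instead of silently dropping them.
    unknown = mask & ~((1 << len(SOFTIRQ_NAMES)) - 1)
    if unknown:
        names.append("unknown(0x%x)" % unknown)
    return names


if __name__ == "__main__":
    # The three values that show up in the dmesg output of this report.
    for value in ("02", "08", "22"):
        print(value, decode_softirq_pending(value))
```

Under those assumptions, 02 would be the timer softirq, 08 network receive, and 22 timer plus tasklet.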
Got something similar on 2.6.21.1
Distribution: Gentoo 2007.0
Hardware: Centrino Duo, Intel T2250 @1.73GHz
dmesg:

...
Clocksource tsc unstable (delta = 3040409348024 ns)
...
BUG: soft lockup detected on CPU#1!
 [<c01461c2>] softlockup_tick+0x90/0xbf
 [<c012b977>] update_process_times+0x28/0x5e
 [<c013a4c0>] tick_periodic+0x22/0x71
 [<c013a526>] tick_handle_periodic+0x17/0x71
 [<c01159be>] smp_apic_timer_interrupt+0x4f/0x7f
 [<c022f1a2>] acpi_hw_register_write+0x11b/0x14b
 [<c01049fc>] apic_timer_interrupt+0x28/0x30
 [<c02417b8>] acpi_processor_idle+0x20f/0x3d3
 [<c0102386>] cpu_idle+0x84/0xdb
 =======================
...
I got a similar problem with 2.6.21.1 (and 2.6.22-rc3)

Most recent kernel where this bug did *NOT* occur: 2.6.20.x
Distribution: Debian/unstable
Hardware Environment: Thinkpad R60e, CPU: Intel(R) Celeron(R) M CPU 420 @ 1.60GHz
Description: Every other minute the machine (at least keyboard and mouse) freezes.

From the kern.log (with 2.6.21.1, 2.6.22-rc3 doesn't write anything along these lines):

May 26 13:26:42 nerys kernel: Linux version 2.6.21-1-686 (Debian 2.6.21-3) (waldi@debian.org) (gcc version 4.1.3 20070518 (prerelease) (Debian 4.1.2-8)) #1 SMP Fri May 25 13:06:47 UTC 2007
[..]
May 26 13:26:42 nerys kernel: Kernel command line: root=/dev/mapper/crypt-root ro vga=791
[..]
May 26 13:26:42 nerys kernel: Clocksource tsc unstable (delta = -292814672 ns)
[..]
May 26 13:30:38 nerys kernel: BUG: soft lockup detected on CPU#0!
May 26 13:30:38 nerys kernel: [<c014aad3>] softlockup_tick+0xa6/0xb5
May 26 13:30:38 nerys kernel: [<c012a05b>] update_process_times+0x3b/0x5e
May 26 13:30:38 nerys kernel: [<c0138d60>] tick_sched_timer+0x78/0xbb
May 26 13:30:38 nerys kernel: [<c01358e0>] hrtimer_interrupt+0x131/0x1bd
May 26 13:30:38 nerys kernel: [<c0138ce8>] tick_sched_timer+0x0/0xbb
May 26 13:30:38 nerys kernel: [<c0114bbd>] smp_apic_timer_interrupt+0x6c/0x7d
May 26 13:30:38 nerys kernel: [<c01f7e1a>] acpi_hw_register_write+0x11b/0x14b
May 26 13:30:38 nerys kernel: [<c010481c>] apic_timer_interrupt+0x28/0x30
May 26 13:30:38 nerys kernel: [<e0040967>] acpi_processor_idle+0x235/0x40a [pro
May 26 13:30:38 nerys kernel: [<c01023b5>] cpu_idle+0xb5/0xd6
May 26 13:30:38 nerys kernel: [<c0345a6b>] start_kernel+0x475/0x47d
May 26 13:30:38 nerys kernel: [<c03451b8>] unknown_bootoption+0x0/0x202
May 26 13:30:38 nerys kernel: =======================

Cf. also http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=426738
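
All three reports so far contain a "Clocksource tsc unstable" line with a delta in nanoseconds, and the raw numbers are hard to compare by eye. The following is a small Python sketch that pulls those lines out of a log and converts the delta to seconds. The regular expression is written against the message format as quoted in this thread; the names UNSTABLE_RE and unstable_deltas are hypothetical helpers for this illustration only.

```python
import re

# Matches the message as it is quoted in this thread, e.g.
# "Clocksource tsc unstable (delta = 4686825172 ns)". The delta may be negative.
UNSTABLE_RE = re.compile(r"Clocksource (\w+) unstable \(delta = (-?\d+) ns\)")


def unstable_deltas(log_text):
    """Return (clocksource name, delta in seconds) for every matching log line."""
    return [
        (match.group(1), int(match.group(2)) / 1e9)
        for match in UNSTABLE_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # The three deltas reported in this thread, in order of appearance.
    sample = """\
Clocksource tsc unstable (delta = 4686825172 ns)
Clocksource tsc unstable (delta = 3040409348024 ns)
May 26 13:26:42 nerys kernel: Clocksource tsc unstable (delta = -292814672 ns)
"""
    for name, seconds in unstable_deltas(sample):
        print("%s: %.3f s" % (name, seconds))
```

Read this way, the reported deltas range from a fraction of a second (negative on the Thinkpad) up to roughly 50 minutes on the Centrino Duo.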
Booting with clocksource=acpi_pm seems to circumvent my problem (I got the idea from Bug 8582).
Short update: clocksource=acpi_pm kills the resume after a suspend2ram. clocksource=pit works (i.e. no soft lockups and working resume). The situation is still the same with 2.6.22-rc4.
Hmm, seems the TSC is buggy. But acpi_pm should work. Can you try whether 2.6.22-rc4-mm2 works for you with acpi_pm?
Sounds like a nice challenge for a Sunday afternoon (I've never used any kernel patches before) :-)

Ok, here's what I've done:
* downloaded and unpacked linux-2.6.22-rc4.tar.bz2
* downloaded and uncompressed 2.6.22-rc4-mm2.bz2
* patched the former with the latter
* ran "make oldconfig" against the config of the latest Debian 2.6.22-rc4-686 kernel image
* built the kernel (with Debian's kernel-package; shouldn't change the kernel but makes un/installing easier)

Results:
* booting without any clocksource= parameter:
  - according to /sys/devices/system/clocksource/clocksource0/current_clocksource tsc is used
  - no freezes/lockups
  - suspend to ram (with uswsusp) and resume work
  - only odd thing: powertop says "< CPU was 100% busy; no C-states were entered >"
* booting with clocksource=acpi_pm:
  - the same: no lockups, resume works, same powertop output
* booting with clocksource=pit:
  - the same

I'll happily send further information or do other tests if you have any questions!
Gregor, did we fix all this in 2.6.22? Thanks.
Thanks for coming back to this issue. I'm now running the Debian kernel 2.6.22-1-686 (2.6.22-2) which is based on the 2.6.22.1 release.

* If I boot without any clocksource= parameter I get (according to /sys/devices/system/clocksource/clocksource0/current_clocksource) hpet. The system freezes every other minute, there are no messages in /var/log/kern.log and resume after suspend to RAM does not work.
* clocksource=tsc: the laptop doesn't get very far in booting, it hangs somewhere between detecting USB hubs and detecting the SATA controller (tried three times).
* clocksource=pit: no lockups, resume after s2ram works, no strange powertop outputs anymore
* clocksource=acpi_pm: no lockups, resume after s2ram doesn't work
* clocksource=jiffies: boots, no lockups on the console but X doesn't come up?! (tried twice). Then weird stuff with suspend to ram/disk happened :-/

If you need any information/logs/output or want me to test something specific just tell me!
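
For others who want to report which clocksource their machine ended up with, here is a minimal Python sketch that reads the sysfs directory mentioned above. The current_clocksource file is the one quoted in this thread; the sibling available_clocksource file is assumed to sit in the same directory, and the function name read_clocksources and its base parameter are invented for this example.

```python
from pathlib import Path

# Directory quoted in this thread. available_clocksource is assumed to be
# present next to current_clocksource on the running kernel.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current clocksource, list of available clocksources)."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError as err:
        # Not on Linux, or sysfs is not mounted at the usual place.
        print("could not read clocksource info:", err)
    else:
        print("current:  ", current)
        print("available:", ", ".join(available))
```

Attaching this output together with the clocksource= value from the kernel command line makes the individual test results easier to compare.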
Gregor, any news on this?
Thanks for reminding me of this issue. And I have good news: I just installed a 2.6.23 kernel (the package linux-image-2.6.23-1-686, version 2.6.23-1~experimental.1~snapshot.9723 from the Debian kernel team's repository) and I don't see any problems anymore (booting without a clocksource parameter and getting hpet). Great. As far as I'm concerned, this bug can be closed. Thanks for your perseverance!
Gregor, Thanks. I'm closing it. tglx