Latest working kernel version: 2.6.25.2-5.fc10.x86_64 (but the bug may be triggered now by fedora-devel userspace changes) Earliest failing kernel version: N/A one-time Distribution: fedora-devel Hardware Environment: AMD X2 on CK804 Software Environment: totem-2.23.2-3.fc9.x86_64 Problem Description: BUG: soft lockup - CPU#0 stuck for 130s! [totem:29381] CPU 0: Pid: 29381, comm: totem Tainted: G M 2.6.25.3-18.fc9.x86_64 #1 RIP: 0010:[<ffffffff8103da1c>] [<ffffffff8103da1c>] run_timer_softirq+0x17e/0x200 RSP: 0018:ffffffff814c1ea0 EFLAGS: 00000282 RAX: ffff81000e889fd8 RBX: ffffffff814c1f00 RCX: ffffffff8837125f RDX: ffffffff814c1ec0 RSI: 0000000000007344 RDI: ffff81007e0d6a88 RBP: ffffffff814c1e20 R08: ffff81007e0d6ac0 R09: 0000000000000000 R10: 00002d6fdb26bbc9 R11: ffffffff814c1e20 R12: ffffffff8100cb16 R13: ffffffff814c1e20 R14: ffff81007e0d6a88 R15: ffff81007e0d6a00 FS: 00000000451e0950(0063) GS:ffffffff813f6000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f631a173000 CR3: 000000004175b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <IRQ> [<ffffffff81039520>] ? __do_softirq+0x66/0xd3 [<ffffffff8100d06c>] ? call_softirq+0x1c/0x28 [<ffffffff8100e210>] ? do_softirq+0x34/0x72 [<ffffffff81039304>] ? irq_exit+0x3f/0x80 [<ffffffff8101bf7a>] ? smp_apic_timer_interrupt+0x8a/0xa3 [<ffffffff8100cb16>] ? apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff81292430>] ? _spin_unlock_irqrestore+0x8/0xa [<ffffffff81080eae>] ? release_pages+0x16a/0x1de [<ffffffff81091757>] ? free_pages_and_swap_cache+0x7b/0x9b [<ffffffff8108785d>] ? zap_page_range+0xd7/0xf0 [<ffffffff81085f7a>] ? sys_madvise+0x3e6/0x4dc [<ffffffff8106d66b>] ? audit_syscall_entry+0x126/0x15a [<ffffffff8106d33c>] ? audit_syscall_exit+0x331/0x353 [<ffffffff81013073>] ? syscall_trace_enter+0xb5/0xb9 [<ffffffff8100c052>] ? tracesys+0xd5/0xda
Created attachment 16205 [details] kernel log
Impressive. Did it _really_ take that long? Because I have a sneaking suspicion that the warning is errant - we see so many of them. otoh, madvise _could_ take a long time if say the whole world was swapped out or something. Did the application eventually recover and work OK?
The logs are not cooked (ok I removed a few irrelevant userspace fedora-devel segfaults) The application didn't recover. I rebooted the system. Totem was playing a 4G video file at the time, and its recovery powers are rather limited when it looses video sync. I didn't clock the 130s but they seem realistic (before rebooting I switched to the console and after a while I did notice X was getting unstuck)
Nicolas, any update on this one ?
I seem not to hit it anymore with current fedora-devel kernels. But I've done no heavy video streaming lately
I have got similar lockup: BUG: soft lockup - CPU#0 stuck for 177s! [ksoftirqd/0:4] Modules linked in: nvidia(P) CPU 0: Modules linked in: nvidia(P) Pid: 4, comm: ksoftirqd/0 Tainted: P 2.6.26.3 #1 RIP: 0010:[<ffffffff80242087>] [<ffffffff80242087>] run_timer_softirq+0x106/0x20b RSP: 0018:ffffffff8093cf28 EFLAGS: 00000216 RAX: ffffffff8093cf28 RBX: ffffffff809ca110 RCX: ffffffff8093cf28 RDX: ffffffff8091285c RSI: 0000000000000001 RDI: ffffffff809c9f00 RBP: ffffffff8093cea0 R08: 0000000000000000 R09: 0000000000000000 R10: ffff81012fc7bfd8 R11: ffffffff8086cf40 R12: ffffffff8020ceb6 R13: ffffffff80968200 R14: ffffffff8093cea0 R15: ffffffff809c9f00 FS: 0000000000000000(0000) GS:ffffffff808ba000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00007ff7ce112000 CR3: 0000000124452000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <IRQ> [<ffffffff8024205b>] ? run_timer_softirq+0xda/0x20b [<ffffffff8023d412>] ? __do_softirq+0x6b/0xeb [<ffffffff8020d40c>] ? call_softirq+0x1c/0x28 <EOI> [<ffffffff8023d911>] ? ksoftirqd+0x0/0x9c [<ffffffff8020fdfc>] ? do_softirq+0x4a/0x9a [<ffffffff8023d911>] ? ksoftirqd+0x0/0x9c [<ffffffff8023d952>] ? ksoftirqd+0x41/0x9c [<ffffffff8024c9b1>] ? kthread+0x7a/0xae [<ffffffff8020d098>] ? child_rip+0xa/0x12 [<ffffffff8024c937>] ? kthread+0x0/0xae [<ffffffff8020d08e>] ? child_rip+0x0/0x12
That's not a fedora problem. It's a bug in mainline :( A fix for this is available against 2.6.27-rc5: http://bugzilla.kernel.org/attachment.cgi?id=17644&action=view Full details at: http://bugzilla.kernel.org/show_bug.cgi?id=11418 It would be great if you could test the patch on your machine.
Bug is fixed for me since 2.6.26.6 and 2.6.27.