Bug 10753
Summary: | BUG: soft lockup - CPU#0 stuck for 130s! | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Nicolas Mailhot (Nicolas.Mailhot) |
Component: | Other | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | akpm, bunk |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25.3-18.fc9.x86_64 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | kernel log |
Description
Nicolas Mailhot
2008-05-19 14:55:00 UTC
Created attachment 16205
kernel log
Impressive. Did it _really_ take that long? I ask because I have a sneaking suspicion that the warning is errant: we see so many of them. OTOH, madvise _could_ take a long time if, say, the whole world was swapped out. Did the application eventually recover and work OK?

The logs are not cooked (OK, I removed a few irrelevant userspace fedora-devel segfaults). The application didn't recover; I rebooted the system. Totem was playing a 4G video file at the time, and its recovery powers are rather limited when it loses video sync. I didn't clock the 130s, but it seems realistic: before rebooting I switched to the console, and after a while I did notice X getting unstuck.

Nicolas, any update on this one?

I don't seem to hit it anymore with current fedora-devel kernels, but I haven't done any heavy video streaming lately.

I have hit a similar lockup:

    BUG: soft lockup - CPU#0 stuck for 177s! [ksoftirqd/0:4]
    Modules linked in: nvidia(P)
    CPU 0:
    Modules linked in: nvidia(P)
    Pid: 4, comm: ksoftirqd/0 Tainted: P 2.6.26.3 #1
    RIP: 0010:[<ffffffff80242087>] [<ffffffff80242087>] run_timer_softirq+0x106/0x20b
    RSP: 0018:ffffffff8093cf28 EFLAGS: 00000216
    RAX: ffffffff8093cf28 RBX: ffffffff809ca110 RCX: ffffffff8093cf28
    RDX: ffffffff8091285c RSI: 0000000000000001 RDI: ffffffff809c9f00
    RBP: ffffffff8093cea0 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff81012fc7bfd8 R11: ffffffff8086cf40 R12: ffffffff8020ceb6
    R13: ffffffff80968200 R14: ffffffff8093cea0 R15: ffffffff809c9f00
    FS: 0000000000000000(0000) GS:ffffffff808ba000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00007ff7ce112000 CR3: 0000000124452000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    <IRQ> [<ffffffff8024205b>] ? run_timer_softirq+0xda/0x20b
    [<ffffffff8023d412>] ? __do_softirq+0x6b/0xeb
    [<ffffffff8020d40c>] ? call_softirq+0x1c/0x28
    <EOI> [<ffffffff8023d911>] ? ksoftirqd+0x0/0x9c
    [<ffffffff8020fdfc>] ? do_softirq+0x4a/0x9a
    [<ffffffff8023d911>] ? ksoftirqd+0x0/0x9c
    [<ffffffff8023d952>] ? ksoftirqd+0x41/0x9c
    [<ffffffff8024c9b1>] ? kthread+0x7a/0xae
    [<ffffffff8020d098>] ? child_rip+0xa/0x12
    [<ffffffff8024c937>] ? kthread+0x0/0xae
    [<ffffffff8020d08e>] ? child_rip+0x0/0x12

That's not a Fedora problem; it's a bug in mainline :(

A fix for this is available against 2.6.27-rc5: http://bugzilla.kernel.org/attachment.cgi?id=17644&action=view
Full details at: http://bugzilla.kernel.org/show_bug.cgi?id=11418
It would be great if you could test the patch on your machine.

The bug is fixed for me since 2.6.26.6 and 2.6.27.
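When comparing lockup reports like the two in this bug, it can help to pull out the CPU, the stall duration, the stuck task and the faulting symbol mechanically. The helper below is a minimal sketch for that purpose and is not part of the original report. The names `SoftLockup`, `parse_soft_lockup` and `parse_symbol` are hypothetical. The regular expressions assume only the message wording quoted above (`BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]` and `symbol+0xOFF/0xSIZE`), so other kernel versions may need a different pattern.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical triage helper (not from the original report). The pattern is
# derived only from the two lockup lines quoted in this bug; other kernel
# versions may word the message differently.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"(?:\s+\[(?P<comm>[^:\]]+):(?P<pid>\d+)\])?"  # optional "[comm:pid]" suffix
)

# Matches "symbol+0xOFFSET/0xSIZE" as printed in the RIP line and call trace.
SYMBOL_RE = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class SoftLockup(NamedTuple):
    cpu: int
    seconds: int
    comm: Optional[str]  # None when the "[comm:pid]" suffix is absent
    pid: Optional[int]


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return CPU, stall duration and (if present) task from a lockup line."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    pid = m.group("pid")
    return SoftLockup(
        cpu=int(m.group("cpu")),
        seconds=int(m.group("secs")),
        comm=m.group("comm"),
        pid=int(pid) if pid is not None else None,
    )


def parse_symbol(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (function, offset, function size) for the first symbol on a line."""
    m = SYMBOL_RE.search(line)
    if m is None:
        return None
    return m.group("func"), int(m.group("off"), 16), int(m.group("size"), 16)


if __name__ == "__main__":
    # prints SoftLockup(cpu=0, seconds=177, comm='ksoftirqd/0', pid=4)
    print(parse_soft_lockup("BUG: soft lockup - CPU#0 stuck for 177s! [ksoftirqd/0:4]"))
    # prints ('run_timer_softirq', 262, 523)
    print(parse_symbol("RIP: 0010:[<ffffffff80242087>] [<ffffffff80242087>] run_timer_softirq+0x106/0x20b"))
```

Applied to the RIP line of the second report, the helper shows that execution was 0x106 bytes into the 0x20b-byte `run_timer_softirq`. That matches the timer-softirq path in the call trace. The original 2.6.25 summary line has no `[comm:pid]` suffix, so `comm` and `pid` come back as `None` for it.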