Bug 42382 - Soft-lockup during cpu-hotplug in VFS callpaths
Status: CLOSED DUPLICATE of bug 42402
Product: Power Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: power-management_other
Reported: 2011-09-05 10:10 UTC by Srivatsa S. Bhat
Modified: 2012-06-05 03:07 UTC
Kernel Version: 3.0.1, 3.0.3
Regression: Yes

Attachments
Soft-lockup_log (57.35 KB, text/plain)
2011-09-05 10:10 UTC, Srivatsa S. Bhat

Description Srivatsa S. Bhat 2011-09-05 10:10:47 UTC
Created attachment 71732
Soft-lockup_log

While running CPU hotplug stress tests with a kernel compilation running
in the background, soft-lockups are detected on multiple CPUs.
Sometimes this also leads to hard lockups and a kernel panic.
All the soft-lockups seem to occur in vfsmount_lock_local_lock_cpu() or
other VFS call paths.


[37108.410813] BUG: soft lockup - CPU#5 stuck for 22s! [cc1:29669]
<snip>
[37108.694781] Call Trace:
[37108.697306]  [<ffffffff81199e70>] ? vfsmount_lock_local_lock_cpu+0x70/0x70
[37108.704258]  [<ffffffff81187cb5>] path_init+0x315/0x400
[37108.709558]  [<ffffffff8127c398>] ? __raw_spin_lock_init+0x38/0x70
[37108.715812]  [<ffffffff8118961c>] path_openat+0x8c/0x3f0
[37108.721203]  [<ffffffff81012129>] ? sched_clock+0x9/0x10
[37108.726597]  [<ffffffff8109416d>] ? sched_clock_cpu+0xcd/0x110
[37108.732508]  [<ffffffff810a178d>] ? trace_hardirqs_off+0xd/0x10
[37108.738498]  [<ffffffff8109421f>] ? local_clock+0x6f/0x80
[37108.743970]  [<ffffffff81189a99>] do_filp_open+0x49/0xa0
[37108.749362]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
[37108.754665]  [<ffffffff8152584b>] ? _raw_spin_unlock+0x2b/0x40
[37108.760575]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
[37108.765875]  [<ffffffff81179607>] do_sys_open+0x107/0x1e0
[37108.771352]  [<ffffffff810d610f>] ? audit_syscall_entry+0x1bf/0x1f0
[37108.777695]  [<ffffffff81179720>] sys_open+0x20/0x30
[37108.782741]  [<ffffffff8152e202>] system_call_fastpath+0x16/0x1b
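
To check how many CPUs report a soft lockup in a long log such as the attached one, the report lines can be picked out mechanically. The following is only a rough sketch: the function name find_soft_lockups is made up for illustration, and the pattern assumes the message format shown in the excerpt above.

```python
import re

# Matches report lines of the form seen in this bug, for example:
# [37108.410813] BUG: soft lockup - CPU#5 stuck for 22s! [cc1:29669]
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft-lockup report line found in log_text."""
    reports = []
    for line in log_text.splitlines():
        match = SOFT_LOCKUP_RE.search(line)
        if match is None:
            continue  # call-trace lines and unrelated messages are skipped
        reports.append({
            "timestamp": float(match.group("ts")),
            "cpu": int(match.group("cpu")),
            "stuck_seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return reports
```

Collecting the "cpu" values of the returned entries into a set shows which CPUs were affected.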

Hardware: Dual socket quad-core hyper-threaded Intel x86 machine
Scenario:
(a) Stressful cpu hotplug tests + kernel compilation

(b) IRQ balancing was disabled and all IRQs were routed to CPU 0
    (except the ones that couldn't be routed).

(c) Lockdep was enabled during kernel configuration.
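
The exact commands used for step (b) are not recorded in this report. One possible way to route IRQs to CPU 0 is to write a CPU bitmask to /proc/irq/<N>/smp_affinity, as in the sketch below. The function name route_irqs_to_cpu0 is hypothetical, the sketch assumes that the irqbalance daemon has already been stopped by other means, and it must run as root.

```python
import os


def route_irqs_to_cpu0(proc_irq="/proc/irq"):
    """Try to pin every IRQ to CPU 0; return (routed, failed) IRQ lists."""
    routed, failed = [], []
    # Only numeric entries are IRQs; default_smp_affinity and others are skipped.
    irqs = sorted((e for e in os.listdir(proc_irq) if e.isdigit()), key=int)
    for irq in irqs:
        path = os.path.join(proc_irq, irq, "smp_affinity")
        try:
            with open(path, "w") as f:
                f.write("1")  # hexadecimal bitmask: bit 0 set means CPU 0 only
            routed.append(int(irq))
        except OSError:
            # Some IRQs reject a new affinity; record them and carry on.
            failed.append(int(irq))
    return routed, failed
```

The second returned list corresponds to "the ones that couldn't be routed" in step (b).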

Steps (b) and (c) were done to dig deeper into the issue. However, the same
issue was also observed with step (a) alone.
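
The actual hotplug test scripts are not attached to this report. As a rough illustration of what the hotplug part of step (a) could look like, the sketch below repeatedly offlines and onlines randomly chosen CPUs through the sysfs "online" files. All function names are made up for this example, it must run as root, and the kernel compilation would be started separately (for instance a parallel make in another shell).

```python
import os
import random
import time

SYSFS_CPU = "/sys/devices/system/cpu"


def hotpluggable_cpus(sysfs_cpu=SYSFS_CPU):
    """Return the CPU numbers that expose an 'online' control file."""
    cpus = []
    for entry in os.listdir(sysfs_cpu):
        # Accept cpu0, cpu1, ...; skip cpufreq, cpuidle and similar entries.
        if entry.startswith("cpu") and entry[3:].isdigit():
            if os.path.exists(os.path.join(sysfs_cpu, entry, "online")):
                cpus.append(int(entry[3:]))
    return sorted(cpus)


def set_online(cpu, online, sysfs_cpu=SYSFS_CPU):
    """Write 1 (online) or 0 (offline) to the CPU's control file."""
    with open(os.path.join(sysfs_cpu, "cpu%d" % cpu, "online"), "w") as f:
        f.write("1" if online else "0")


def hotplug_stress(iterations, delay=0.1, sysfs_cpu=SYSFS_CPU):
    """Offline and re-online a random hotpluggable CPU, 'iterations' times."""
    cpus = hotpluggable_cpus(sysfs_cpu)
    if not cpus:
        return cpus  # nothing can be hotplugged on this machine
    for _ in range(iterations):
        cpu = random.choice(cpus)
        set_online(cpu, False, sysfs_cpu)
        time.sleep(delay)
        set_online(cpu, True, sysfs_cpu)
        time.sleep(delay)
    return cpus
```

CPUs without an "online" file (often cpu0 on x86) are never touched, so the machine always keeps at least one CPU running.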

There definitely seems to be a race condition here, because the issue is
hit only some time after starting the tests, and the time it takes to hit
it increases as the number of debug print statements increases. In some
cases (especially when the number of debug print statements was quite
high), the stress on the machine had to be increased in order to hit the
issue within a reasonable time. In my tests, about 2 to 2.5 hours at most
was sufficient to hit this bug.

Comment 1 Srivatsa S. Bhat 2011-09-06 08:07:52 UTC

*** This bug has been marked as a duplicate of bug 42402 ***
