Bug 28722 - kworker: scheduling while atomic
Summary: kworker: scheduling while atomic
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 blocking
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-09 21:07 UTC by Nicos Gollan
Modified: 2011-03-15 15:42 UTC

See Also:
Kernel Version: 2.6.37
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Console screenshot (2.6.36) (699.55 KB, image/jpeg)
2011-02-09 21:10 UTC, Nicos Gollan
dmesg (2.6.35, works) (56.05 KB, text/plain)
2011-02-09 21:17 UTC, Nicos Gollan
dmesg (2.6.37, crashed) (40.02 KB, text/plain)
2011-02-09 21:18 UTC, Nicos Gollan
dmesg from the latest freeze (64.08 KB, text/plain)
2011-03-05 11:22 UTC, Nicos Gollan

Description Nicos Gollan 2011-02-09 21:07:41 UTC
Kernels starting with 2.6.36 crash periodically with:

BUG: scheduling while atomic: kworker/0:1/0/0x10000100

It's always in kworker. I cannot determine a usage pattern that provokes it, but all builds of 2.6.36 and 2.6.37 so far show the problem, while 2.6.35 seems to run perfectly stably. I'm using Debian builds of the kernels, so I have no idea which upstream versions they are based on (changelog and distribution versioning are worthless for that).

I'm attaching a screenshot of the last console output. This is on a 6-core system, and I guess there are 3-4 more screens of that, but there is no scrolling and no logging.
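
For anyone comparing several of these messages, here is a minimal Python sketch that splits the BUG line into its parts. It assumes the message has the shape `<comm>/<pid>/0x<preempt_count>`, and it assumes the low byte of the count is the preemption depth and the next byte is the softirq count. That bit layout depends on kernel version and architecture, so treat the decoded fields as a guess; the function name `parse_sched_while_atomic` is made up for this example.

```python
import re

# Assumed message shape: "BUG: scheduling while atomic: <comm>/<pid>/0x<count>".
# The task name may itself contain '/', so the greedy first group keeps
# everything up to the last "/<pid>/0x<hex>" tail.
PATTERN = re.compile(
    r"BUG: scheduling while atomic: (.+)/(\d+)/0x([0-9a-fA-F]+)"
)


def parse_sched_while_atomic(line):
    """Return the fields of a 'scheduling while atomic' line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    comm, pid, count_hex = match.groups()
    count = int(count_hex, 16)
    return {
        "comm": comm,
        "pid": int(pid),
        "preempt_count": count,
        # Assumed layout: bits 0-7 preemption depth, bits 8-15 softirq count.
        "preempt_depth": count & 0xFF,
        "softirq_count": (count >> 8) & 0xFF,
        # Everything above bit 15 is left undecoded on purpose.
        "upper_bits": count >> 16,
    }


if __name__ == "__main__":
    line = "BUG: scheduling while atomic: kworker/0:1/0/0x10000100"
    print(parse_sched_while_atomic(line))
```

For the line quoted in this report, the task name is `kworker/0:1`, the pid field is 0, and the raw count is 0x10000100. Under the assumed layout the preemption depth is 0 and the softirq field is 1; the remaining upper bits are reported as-is rather than interpreted.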
Comment 1 Nicos Gollan 2011-02-09 21:10:25 UTC
Created attachment 47022
Console screenshot (2.6.36)
Comment 2 Nicos Gollan 2011-02-09 21:17:51 UTC
Created attachment 47032
dmesg (2.6.35, works)
Comment 3 Nicos Gollan 2011-02-09 21:18:25 UTC
Created attachment 47042
dmesg (2.6.37, crashed)
Comment 4 Andrew Morton 2011-02-09 21:31:16 UTC
Thanks.  I'll recategorise this under x86_64.
Comment 5 Nicos Gollan 2011-03-03 09:49:24 UTC
Small update: I've had a surprisingly crash-free time so far with 2.6.38-rc6 from Debian packages, which may be due to the EHCI fix. The other difference is that there are no VirtualBox modules, because kbuild for .38 is missing on Debian.
Comment 6 Nicos Gollan 2011-03-05 10:55:20 UTC
Spoke too soon: 2.6.38-rc6 still freezes.
Comment 7 Nicos Gollan 2011-03-05 11:22:11 UTC
Created attachment 50092
dmesg from the latest freeze

Note the last bunch of messages. Is it possible that the system timer is deteriorating until it just stops?
Comment 8 Nicos Gollan 2011-03-07 09:34:39 UTC
I've been looking through the changelog for 2.6.36, and the HPET code apparently got a major overhaul in that release. Since the problem started with that kernel version, I now strongly suspect that the issue is in there, maybe in conjunction with a platform quirk.

I'd really like some feedback on this one, since testing is nondeterministic (the system sometimes runs for days, sometimes it freezes after an hour) and carries a risk of data loss (I've had my KDE configuration, address book, and other things on an XFS partition wiped twice now).
Comment 9 Nicos Gollan 2011-03-15 15:42:43 UTC
I'm definitely not getting the "scheduling while atomic" freezes anymore, and since I disabled use of the HPET (using the ACPI timer as clocksource), the system hasn't frozen once in a few days' worth of (non-contiguous) uptime.

So I'm marking this one solved.
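
For anyone wanting to check whether the same workaround is active on their machine, here is a minimal Python sketch that reads the clocksource state from sysfs. The comment above does not say how the HPET was disabled; passing `clocksource=acpi_pm` on the kernel command line is one documented way and is only an assumption here. The function names are made up for this example.

```python
from pathlib import Path

# Standard sysfs directory for the system clocksource on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def summarize_clocksource(current, available):
    """Summarize the raw contents of the two sysfs files.

    `current` is the text of current_clocksource and `available` the
    text of available_clocksource (a space-separated list of names).
    """
    name = current.strip()
    return {
        "current": name,
        "available": available.split(),
        "hpet_in_use": name == "hpet",
    }


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Read both sysfs files under `base` and summarize them."""
    return summarize_clocksource(
        (base / "current_clocksource").read_text(),
        (base / "available_clocksource").read_text(),
    )


if __name__ == "__main__":
    # Only meaningful on a Linux system that exposes this directory.
    if CLOCKSOURCE_DIR.exists():
        print(read_clocksource())
```

If `hpet_in_use` comes back false and `current` reads `acpi_pm`, the system is running on the ACPI PM timer, which matches the configuration described above.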
