Created attachment 79561 [details]
BUG at kernel/sched/core.c 1654
invalid opcode: 0000 [#1] SMP
Line 1654 is in try_to_wakeup_local: BUG_ON(rq != this_rq());
bisected to 3.6 workqueue merge:
workqueue: use @pool instead of @gcwq or @cpu where applicable
Given that the description says mechanical changes only, no function changes, maybe an alignment and field ordering change?
I don't however have an easy to trigger reproducer, tho time constraints of playing an incoming internet mp3 stream along with doing a (gentoo) system update to hit 100% usage on all 6 cores (bulldozer fx6100 CPU) seems to help, so it's not a 100% certainty that all my bisect goods are actually good and the first bad commit /might/ therefore be a few short of that. But it's close and I THINK that's it. Everything from there on thru 3.6-rc4 at least is bad tho.
The result is almost always a dead kernel, but I did happen to get one that still let me do magic-srq along the way and thus got the BUG log. But I didn't have SystemMap installed, so if it would have helped... Anyway, it's attached.
I'm setting KCFLAGS. I've not YET tested with generic KCFLAGS but will do so. But 3.5 was fine with the same KCFLAGS, so there's still something funny going on with that commit, even if it turns out that generic KCFLAGS doesn't trigger the problem. Anyway, here they are (not /too/ ricer):
KCFLAGS="-march=native -pipe -O2 -frename-registers -fweb -fmerge-all-constants -fgcse-sm -fgcse-las -fgcse-after-reload -ftree-vectorize"
AMD "bulldozer" fx-6100 (x86_64 mode) w/ 16 gig RAM, Radeon hd6770 "Turks" 0x6758 w/ 1 gig VRAM, KMS and the native freedomware drivers, ga-990fxa-ud3 mobo, onboard Realtek ALC889 (alsa hda driver), onboard Realtek RTL8111/8168B ethernet (8169 driver), monolithic kernel (no modules), gpt and reiserfs.
Gentoo/~amd64, gcc-4.7.1, glibc-2.16.0, xorg-server-1.13.0, xf86-video-ati-6.14.6-r1, mesa-9_pre20120831-r1, kde-4.9.1.
I'll attach a kernel config momentarily, then try a generic KCFLAGS build and confirm the problem (or not) with it, tho without a 100% trigger reproducer it might take hours or days to confirm if it's clean.
Created attachment 79571 [details]
Created attachment 79591 [details]
Is any CPU hotplug operation going on the system? If so, can you please try the attache patch and see whether it changes the situation?
Created attachment 79631 [details]
And if that doesn't work, please try the attached patch on top of any 3.6-rcX. If the patch works, it'll print out a debug message starting with "XXX" instead of triggering BUG_ON(). It doesn't really fix the issue but should help debugging it. Please trigger the problem with the patch applied and report the kernel log.
Re: comment #2, no deliberate CPU hotplugging going on at all. It's a single-socket bulldozer fx6100 so physical hotplugging would be kinda difficult! =:^)
Cpufreq's active, 1400-3600 megahertz range (the top's slightly overclocked from the rated 3300, but it's stable and only exceeds the 95W power rating for occasional fractions of a second at full load), conservative, and I have all the C1E etc power saving options on in the BIOS, but unless the kernel does some hotplugging trickery with its PM and I've no evidence of that, no hotplugging at all.
Meanwhile, just had a crash with the same BUGON, running 3.6-rc5, no custom KCFLAGS at all, so I've confirmed it's not my custom KCFLAGS, now. I wish I had a more reliable trigger; that one took long enough I was beginning to wonder if it WAS my KCFLAGS.
Re: comment #3, I was trying the unmodified KCFLAGS so haven't applied the suggested patch yet. I'll try it next. Hopefully it doesn't take so long to trigger this time. (That's why I hadn't reported it before now too. Having to wait two days plus to be reasonably sure it's a bisect good doesn't help when you're staring at 12 bisect steps! Fortunately most of the bisect bads triggered sooner than this one but I had a long run of bisect goods that took over a week!)
With the patch from #3 applied, the kernel shouldn't crash or hang (well, if it works as intended, that is) but will produce kernel message, so please check dmesg from time to time.
(In reply to comment #5)
> With the patch from #3 applied, the kernel shouldn't crash or hang (well, if
> works as intended, that is) but will produce kernel message, so please check
> dmesg from time to time.
Thanks. I was wondering about that. Fortunately I have a log tail displayed on my system monitor panel, and when I took a look after letting the system run a few hours, three new entries of interest. =:^) (The consistently 4-digit-max PIDs are because I have kernel/pid_max=10000 set. Hasn't been a problem that I know of so far, but then again, I'm not sure what pid-starvation would look like, so...)
[13130.471637] XXX workqueue: foreign wakeup bug: current = 3688 (kworker/0:2) on cpu 3/08, target = 3608 (kworker/0:1) on cpu 0/01, cpu=3
[14706.114935] XXX workqueue: foreign wakeup bug: current = 3608 (kworker/0:1) on cpu 1/02, target = 887 (kworker/0:2) on cpu 0/01, cpu=1
[17105.617686] XXX workqueue: foreign wakeup bug: current = 4336 (kworker/0:0) on cpu 4/10, target = 698 (kworker/0:1) on cpu 0/01, cpu=4
Those were spread over a bit more than an hour, several hours ago. There's no harm in letting the system continue to run, I hope?
Just got another one!
[36863.225461] XXX workqueue: foreign wakeup bug: current = 8563 (kworker/0:1) on cpu 4/10, target = 3700 (kworker/0:0) on cpu 0/01, cpu=4
Hmmm, somebody seems to be mucking with ->cpus_allowed of kworkers. I have no idea why the recent changes affect this in any way yet. Let me cook up another debug patch.
Created attachment 79871 [details]
Can you please test this patch instead and post full kernel log after the debug message is printed? I don't think this is something broken in workqueue. It seems like one of its users is migrating kworker thread to another CPU.
(In reply to comment #8)
> Created an attachment (id=79871) [details]
> Can you please test this patch instead and post full kernel log after the
> message is printed? I don't think this is something broken in workqueue. It
> seems like one of its users is migrating kworker thread to another CPU.
I think the check in wq_worker_sleeping() is late. if the worker whose ->cpus_allowed is changed by a work item does not go to sleep until it processes tens of items, the code will blame to wrong work item. so some similar test is also needed in process_one_work().
Created attachment 79901 [details]
dmesg from local-wakeup-dbg-1
Re: comment #8, here's the full dmesg with the second dbg patch applied, after the bug triggered, as requested.
Lai, we already have the check at the beginning of process_one_work(), so I think the debug patch is enough.
Duncan, we need to map the function address 0xffffffff81405d40 to symbol. Can you please attach the System.map file or redo the test with CONFIG_KALLSYMS[_ALL] = y? It's usually a good idea to have that turned on along with CONFIG_FRAME_POINTER when doing any kind of kernel debugging.
Created attachment 80061 [details]
System.map (gzipped) matching attachment #79901 [details] (dmesg from local-wakeup-dbg-1)
Re: Comment #11:
> Duncan, we need to map the function address 0xffffffff81405d40
> to symbol. Can you please attach the System.map file
I've never known what to do with System.map so originally had it turned off. But luckily I had an idea it might come in useful with this bug and turned on installing it a few builds ago.
Here's the one that goes with the latest dmesg (Comment #10). Gzipped so bugzy will take it.
That address appears to be do_dbs_timer.
(In reply to comment #12)
> Re: Comment #11:
> > Duncan, we need to map the function address 0xffffffff81405d40
> > to symbol.
> That address appears to be do_dbs_timer.
drivers/cpufreq/cpufreq.conservative.c 443 (on that kernel)
I mentioned cpufreq/conservative earlier. Maybe between the hours reading the git logs and the hours reading lwn, this stuff's beginning to sink in a bit, despite no claims here to even know C, let alone at the kernel level? Unless Lai is correct and that's not even pointing at the culprit.
I found the bug. It seems to have been there for quite some time. Dunno why it's showing up now. The culprit is powernow-k8 cpufreq driver which temporarily migrates the kworker thread to cpufreq_policy->cpu. Working on the fix.
Created attachment 80071 [details]
Can you please apply the attached patch along with the debug patch and see whether the problem goes away? Please enable FRAME_POINTER and KALLSYMS.
Cc'ing Rafael. Rafael, powernow-k8 was migrating workqueue worker temporarily to another CPU which isn't allowed. It triggers BUG_ON() in scheduler if it contends on fidvid_mutex and goes to sleep. I created a patch to update it to use a separate work item to punt to the target cpu in comment #15. One question tho. cpufreq guarantees pol->cpu is up across ->target() call, right?
What exactly do you mean by "up"?
To rephrase: Does cpufreq guarantee that CPU_DOWN doesn't finish on pol->cpu while the callback is in progress?
The execution of ->target() is under lock_policy_rwsem_write() and cpufreq_cpu_callback() also calls lock_policy_rwsem_write() before __cpufreq_remove_dev(), so I think the answer is "yes". [lock_policy_rwsem_write() appears to be always called before __cpufreq_remove_dev(), for that matter.]
Alrighty, thanks, Rafael. Duncan, how's the patch behaving?
(In reply to comment #20)
> Duncan, how's the patch behaving?
Looking good so far.
No further instances triggered, but I've only been running it a few hours and sometimes it'd take a couple days to trigger. But so far, so good. =:^)
How is it going? I'm fairly sure the fix is correct (that is, if it isn't broken from cpufreq's POV) and I'd like to push it to Linus before -rc6.
I'd say push it.
Without a 100% reproducer other than time, we don't have a 100% certainty yet, but it's certainly not making things noticeably worse here either, so push it and get as much broader testing as possible before 3.6.0. =:^)
Created attachment 80501 [details]
Duncan, can you please test this one instead? Linus wants to push this version before 3.6 release so it would be great if you can verify this soonish. If possible, please update us as test progresses (e.g. after surviving the first few hours).
(In reply to comment #24)
> Created an attachment (id=80501) [details]
> Duncan, can you please test this one instead? Linus wants to push this
> before 3.6 release so it would be great if you can verify this soonish. If
> possible, please update us as test progresses (e.g. after surviving the first
> few hours).
Thanks for posting the new patch here. I read the thread as CCed, but lost track of which one I was supposed to test. With the post here I know specifically what to test. =:^)
To be clear, this applies with local-wakeup-dbg-1.patch still applied, correct? I'll assume so. And I'm not clear what necessary commit it builds on (re: the thread), but I'll pull before applying so will have the latest mainline.
I'm about to go out but will apply this and leave the system running. When I get back I'll post an update.
Meanwhile, the don't-muck-with-kworker patch that this obsoletes continues to work without issue. Solid as it gets. So we have a backup, if this one fails.
Yes, including local-wakeup-dbg-1.patch can't hurt. So, 3.6-rcX + local-wakeup-dbg-1.patch + powernowk8-fix.patch.
Thanks a lot.
Looking good so far, uptime saying 4:30, on powernowk8-fix.patch. =:^)
Still seems fine, even after running a task that burned enough cycles to trigger CPU frequency changes, for awhile. So again, not 100% sure yet, but maybe 98%, so the odds are pretty good.
Pull request sent. Thanks a lot for the detailed reporting and testing. :)
Resolving as CODE_FIX.
A patch referencing this bug report has been merged in Linux v3.6-rc7:
Author: Tejun Heo <firstname.lastname@example.org>
Date: Tue Sep 18 14:24:59 2012 -0700
cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
A patch referencing a commit referencing this bug report has been merged in Linux v3.7-rc3:
Author: Andreas Herrmann <email@example.com>
Date: Tue Oct 23 00:55:10 2012 +0200
cpufreq / powernow-k8: Remove usage of smp_processor_id() in preemptible code