Subject : 2.6.37-rc1: BUG: scheduling while atomic (kvm_init) Submitter : "Thomas Meyer" <thomas@m3y3r.de> Date : 2010-11-02 19:06 Message-ID : 50AA24C585B14210A44AB599605C7CBD@hugo References : http://marc.info/?l=linux-kernel&m=128872479921559&w=2 This entry is being used for tracking a regression from 2.6.36. Please don't close it until the problem is fixed in the mainline.
Was this fixed by 88b2a9a3d98a19496 "ipv6: fix missing in6_ifa_put in addrconf"? The commit doesn't describe what the effect is and I wasn't able to find the thread where it was reported. *grumble* *grumble*
Crap. I attached that comment to the wrong bug. Sorry. Ignore me.
Created attachment 38432 [details] dmesg kernel 2.6.37-rc3-00311-g0f639a3 still shows the "scheduling while atomic" bug.
On Friday, December 03, 2010, Thomas Meyer wrote: > Am Freitag, den 03.12.2010, 00:41 +0100 schrieb Rafael J. Wysocki: > > This message has been generated automatically as a part of a summary report > > of recent regressions. > > > > The BUG still happens: > > [ 0.502550] BUG: scheduling while atomic: swapper/1/0x00000002 > [ 0.502574] Modules linked in: > [ 0.502577] Pid: 1, comm: swapper Not tainted 2.6.37-rc4-00153-g59e57c6 > #104 > [ 0.502579] Call Trace: > [ 0.502581] [<ffffffff81052552>] __schedule_bug+0x5a/0x5e > [ 0.502585] [<ffffffff816f0492>] schedule+0xd4/0x592 > [ 0.502588] [<ffffffff8105010e>] ? resched_task+0x68/0x6d > [ 0.502591] [<ffffffff8105009c>] ? test_tsk_need_resched+0x12/0x1c > [ 0.502595] [<ffffffff816f0b91>] schedule_timeout+0x31/0xde > [ 0.502598] [<ffffffff81057654>] ? wake_up_process+0x10/0x12 > [ 0.502602] [<ffffffff8106c73c>] ? wake_up_worker+0x1f/0x21 > [ 0.502605] [<ffffffff8106c795>] ? insert_work+0x57/0x5f > [ 0.502608] [<ffffffff816f1e5e>] ? _raw_spin_unlock_irqrestore+0x20/0x2b > [ 0.502611] [<ffffffff816f02ac>] wait_for_common+0x98/0x10e > [ 0.502614] [<ffffffff81057628>] ? default_wake_function+0x0/0xf > [ 0.502618] [<ffffffff816f03bc>] wait_for_completion+0x18/0x1a > [ 0.502621] [<ffffffff8106c160>] call_usermodehelper_exec+0x82/0xd1 > [ 0.502624] [<ffffffff812c7053>] kobject_uevent_env+0x3e6/0x422 > [ 0.502628] [<ffffffff812c709a>] kobject_uevent+0xb/0xd > [ 0.502632] [<ffffffff81390748>] device_add+0x360/0x4f6 > [ 0.502635] [<ffffffff813908f7>] device_register+0x19/0x1e > [ 0.502639] [<ffffffff81390988>] device_create_vargs+0x8c/0xc4 > [ 0.502642] [<ffffffff813909ec>] device_create+0x2c/0x2e > [ 0.502646] [<ffffffff813587df>] misc_register+0xba/0x113 > [ 0.502649] [<ffffffff81001e83>] kvm_init+0x1d1/0x2dc > [ 0.502652] [<ffffffff81d39ab1>] ? pci_iommu_init+0x0/0x41 > [ 0.502655] [<ffffffff81d37505>] ? svm_init+0x0/0x1e > [ 0.502657] [<ffffffff81d37521>] svm_init+0x1c/0x1e > [ 0.502660] [<ffffffff810002f8>] do_one_initcall+0x7a/0x12a > [ 0.502663] [<ffffffff81d34cfb>] kernel_init+0x133/0x1bd > [ 0.502667] [<ffffffff81026794>] kernel_thread_helper+0x4/0x10 > [ 0.502670] [<ffffffff81d34bc8>] ? kernel_init+0x0/0x1bd > [ 0.502673] [<ffffffff81026790>] ? kernel_thread_helper+0x0/0x10 > [ 0.502754] initcall svm_init+0x0/0x1e returned with preemption imbalance
Created attachment 39132 [details] debugging patch Could you try run the attached patch? It adds a printk() for debugging.
Created attachment 39162 [details] dmesg with debug patch "preempt count got messed up in hardware setup"
Can you sprinkle more checks before the one that fails? Lots of stuff before the first check. Please add one at the very beginning of kvm_init(), to be sure we are called with a clean preempt_count().
int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, struct module *module) { int r; int cpu; char *msg = NULL; if (!msg && preempt_count()) msg = "start of kvm_init"; r = kvm_arch_init(opaque); if (r) goto out_fail; if (!msg && preempt_count()) msg = "after kvm_arch_init"; [...] [ 0.570820] preempt count got messed up in after kvm_arch_init [ 0.570924] initcall svm_init+0x0/0x1e returned with preemption imbalance
Created attachment 39282 [details] get_cpu() in kvm_timer_init() Fix preempt count imbalance in kvm_timer_init()
Well done. :) The bug was added in this patch. commit c285545f813d7b0ce989fd34e42ad1fe785dc65d Author: Zachary Amsden <zamsden@redhat.com> Date: Sat Sep 18 14:38:15 2010 -1000 KVM: x86: TSC catchup mode Negate the effects of AN TYM spell while kvm thread is preempted by tracking conversion factor to the highest TSC rate and catching the TSC up when it has fallen behind the kernel view of time. Note that once triggered, we don't turn off catchup mode. A slightly more clever version of this is possible, which only does catchup when TSC rate drops, and which specifically targets only CPUs with broken TSC, but since these all are considered unstable_tsc(), this patch covers all necessary cases. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> You should send just send this as a proper patch with a signed-off-by line and all the rest. CC Zachary and the people from scripts/get_maintainer.pl
First-Bad-Commit: c285545f813d7b0ce989fd34e42ad1fe785dc65d Patch: https://bugzilla.kernel.org/attachment.cgi?id=39282
Fixed by commit 3e26f23 KVM: Fix preemption counter leak in kvm_timer_init() .