Bug 22812

Summary: kernel oops on 2.6.37-rc1
Product: Platform Specific/Hardware Reporter: Andrew (atswartz)
Component: x86-64Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, error27, florian, maciej.rutecki, rjw, rric, t.dekker
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.37-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 21782    
Attachments: fix

Description Andrew 2010-11-12 16:05:04 UTC
Not sure if this is the correct component.  Here are the errors:

[    7.034377] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/483
[    7.034385] caller is setup_APIC_eilvt+0x155/0x180
[    7.034389] Pid: 483, comm: modprobe Not tainted 2.6.37-rc1-20101110+ #1
[    7.034392] Call Trace:
[    7.034400]  [<ffffffff812a2b72>] debug_smp_processor_id+0xd2/0xf0
[    7.034404]  [<ffffffff8101e985>] setup_APIC_eilvt+0x155/0x180
[    7.034413]  [<ffffffffa002e168>] op_amd_init+0x88/0x2b0 [oprofile]
[    7.034420]  [<ffffffffa0043000>] ? oprofile_init+0x0/0x42 [oprofile]
[    7.034425]  [<ffffffffa0043315>] op_nmi_init+0x249/0x2af [oprofile]
[    7.034431]  [<ffffffffa00430b4>] oprofile_arch_init+0x11/0x29 [oprofile]
[    7.034437]  [<ffffffffa0043010>] oprofile_init+0x10/0x42 [oprofile]
[    7.034441]  [<ffffffff810001e3>] do_one_initcall+0x43/0x170
[    7.034445]  [<ffffffff8108a52a>] sys_init_module+0xba/0x200
[    7.034449]  [<ffffffff8100285b>] system_call_fastpath+0x16/0x1b
[    7.034453] [Firmware Bug]: cpu 3, try to setup vector 0x10400, but vector 0xf9 was already reserved by another core, APIC500=0x10000
[    7.034456] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/483
[    7.034463] caller is op_amd_init+0x239/0x2b0 [oprofile]
[    7.034466] Pid: 483, comm: modprobe Not tainted 2.6.37-rc1-20101110+ #1
[    7.034467] Call Trace:
[    7.034470]  [<ffffffff812a2b72>] debug_smp_processor_id+0xd2/0xf0
[    7.034477]  [<ffffffffa002e319>] op_amd_init+0x239/0x2b0 [oprofile]
[    7.034483]  [<ffffffffa0043000>] ? oprofile_init+0x0/0x42 [oprofile]
[    7.034489]  [<ffffffffa0043315>] op_nmi_init+0x249/0x2af [oprofile]
[    7.034495]  [<ffffffffa00430b4>] oprofile_arch_init+0x11/0x29 [oprofile]
[    7.034501]  [<ffffffffa0043010>] oprofile_init+0x10/0x42 [oprofile]
[    7.034505]  [<ffffffff810001e3>] do_one_initcall+0x43/0x170
[    7.034508]  [<ffffffff8108a52a>] sys_init_module+0xba/0x200
[    7.034511]  [<ffffffff8100285b>] system_call_fastpath+0x16/0x1b
[    7.034514] [Firmware Bug]: cpu 3, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
[    7.034533] [Firmware Bug]: workaround enabled for IBS LVT offset
[ 1440.418082] INFO: task configure:26558 blocked for more than 120 seconds.
[ 1440.418088] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1440.418094] configure     D ffffffff81604480     0 26558  22866 0x00000000
[ 1440.418104]  ffff8800cdfc1c78 0000000000000086 0000000000000000 0000000000000000
[ 1440.418113]  ffff88012b675800 ffff8800cdfc0000 ffff8800cdfc0000 ffff880121b2db40
[ 1440.418122]  ffff8800cdfc0010 ffff880121b2def0 ffff8800cdfc1fd8 ffff8800cdfc1fd8
[ 1440.418132] Call Trace:
[ 1440.418148]  [<ffffffff815c44ee>] ? sub_preempt_count+0xe/0xd0
[ 1440.418156]  [<ffffffff815be905>] schedule_timeout+0x215/0x310
[ 1440.418162]  [<ffffffff815bdaa1>] ? schedule+0x61/0xa20
[ 1440.418173]  [<ffffffff8103ef8a>] ? resched_task+0x4a/0x80
[ 1440.418179]  [<ffffffff8103ccc1>] ? get_parent_ip+0x11/0x50
[ 1440.418184]  [<ffffffff8103ccc1>] ? get_parent_ip+0x11/0x50
[ 1440.418193]  [<ffffffff815bd8aa>] wait_for_common+0xba/0x170
[ 1440.418201]  [<ffffffff81047520>] ? default_wake_function+0x0/0x20
[ 1440.418207]  [<ffffffff815bda3d>] wait_for_completion+0x1d/0x20
[ 1440.418213]  [<ffffffff810970f3>] stop_one_cpu+0x63/0x80
[ 1440.418219]  [<ffffffff81046c90>] ? migration_cpu_stop+0x0/0x30
[ 1440.418225]  [<ffffffff8103d2a4>] sched_exec+0xe4/0xf0
[ 1440.418232]  [<ffffffff811303f6>] do_execve+0xd6/0x320
[ 1440.418240]  [<ffffffff8100a925>] sys_execve+0x45/0x70
[ 1440.418248]  [<ffffffff81002c7c>] stub_execve+0x6c/0xc0
[ 1560.418069] INFO: task configure:26558 blocked for more than 120 seconds.
[ 1560.418075] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1560.418081] configure     D ffffffff81604480     0 26558  22866 0x00000000
[ 1560.418091]  ffff8800cdfc1c78 0000000000000086 0000000000000000 0000000000000000
[ 1560.418099]  ffff88012b675800 ffff8800cdfc0000 ffff8800cdfc0000 ffff880121b2db40
[ 1560.418106]  ffff8800cdfc0010 ffff880121b2def0 ffff8800cdfc1fd8 ffff8800cdfc1fd8
[ 1560.418114] Call Trace:
[ 1560.418129]  [<ffffffff815c44ee>] ? sub_preempt_count+0xe/0xd0
[ 1560.418137]  [<ffffffff815be905>] schedule_timeout+0x215/0x310
[ 1560.418144]  [<ffffffff815bdaa1>] ? schedule+0x61/0xa20
[ 1560.418153]  [<ffffffff8103ef8a>] ? resched_task+0x4a/0x80
[ 1560.418159]  [<ffffffff8103ccc1>] ? get_parent_ip+0x11/0x50
[ 1560.418164]  [<ffffffff8103ccc1>] ? get_parent_ip+0x11/0x50
[ 1560.418173]  [<ffffffff815bd8aa>] wait_for_common+0xba/0x170
[ 1560.418181]  [<ffffffff81047520>] ? default_wake_function+0x0/0x20
[ 1560.418187]  [<ffffffff815bda3d>] wait_for_completion+0x1d/0x20
[ 1560.418193]  [<ffffffff810970f3>] stop_one_cpu+0x63/0x80
[ 1560.418199]  [<ffffffff81046c90>] ? migration_cpu_stop+0x0/0x30
[ 1560.418207]  [<ffffffff8103d2a4>] sched_exec+0xe4/0xf0
[ 1560.418214]  [<ffffffff811303f6>] do_execve+0xd6/0x320
[ 1560.418222]  [<ffffffff8100a925>] sys_execve+0x45/0x70
[ 1560.418230]  [<ffffffff81002c7c>] stub_execve+0x6c/0xc0
[ 1680.418070] INFO: task configure:26558 blocked for more than 120 seconds.
[ 1680.418076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1680.418082] configure     D ffffffff81604480     0 26558  22866 0x00000000
[ 1680.418092]  ffff8800cdfc1c78 0000000000000086 0000000000000000 0000000000000000
[ 1680.418100]  ffff88012b675800 ffff8800cdfc0000 ffff8800cdfc0000 ffff880121b2db40
[ 1680.418108]  ffff8800cdfc0010 ffff880121b2def0 ffff8800cdfc1fd8 ffff8800cdfc1fd8
[ 1680.418115] Call Trace:
[ 1680.418131]  [<ffffffff815c44ee>] ? sub_preempt_count+0xe/0xd0
[ 1680.418139]  [<ffffffff815be905>] schedule_timeout+0x215/0x310
[ 1680.418145]  [<ffffffff815bdaa1>] ? schedule+0x61/0xa20
[ 1680.418152]  [<ffffffff8103ef8a>] ? resched_task+0x4a/0x80
[ 1680.418158]  [<ffffffff8103ccc1>] ? get_parent_ip+0x11/0x50
[ 1680.418163]  [<ffffffff8103ccc1>] ? get_parent_ip+0x11/0x50
[ 1680.418172]  [<ffffffff815bd8aa>] wait_for_common+0xba/0x170
[ 1680.418180]  [<ffffffff81047520>] ? default_wake_function+0x0/0x20
[ 1680.418186]  [<ffffffff815bda3d>] wait_for_completion+0x1d/0x20
[ 1680.418192]  [<ffffffff810970f3>] stop_one_cpu+0x63/0x80
[ 1680.418199]  [<ffffffff81046c90>] ? migration_cpu_stop+0x0/0x30
[ 1680.418207]  [<ffffffff8103d2a4>] sched_exec+0xe4/0xf0
[ 1680.418214]  [<ffffffff811303f6>] do_execve+0xd6/0x320
[ 1680.418222]  [<ffffffff8100a925>] sys_execve+0x45/0x70
[ 1680.418230]  [<ffffffff81002c7c>] stub_execve+0x6c/0xc0
Comment 1 Rafael J. Wysocki 2010-11-12 20:16:45 UTC
On Friday, November 12, 2010, Andrew Morton wrote:
> On Fri, 12 Nov 2010 16:05:06 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=22812
> 
> post-2.6.36 smp_processor_id() spew, caused by
> 
> commit a68c439b1966c91f0ef474e2bf275d6792312726
> Author:     Robert Richter <robert.richter@amd.com>
> AuthorDate: Wed Oct 6 12:27:53 2010 +0200
> Commit:     Ingo Molnar <mingo@elte.hu>
> CommitDate: Wed Oct 20 04:42:13 2010 +0200
> 
>     apic, x86: Check if EILVT APIC registers are available (AMD only)
Comment 2 Rafael J. Wysocki 2010-11-12 20:18:04 UTC
First-Bad-Commit : a68c439b1966c91f0ef474e2bf275d6792312726
Comment 3 Andrew 2010-11-12 20:43:22 UTC
So, I should reverse this patch to see if the issue is cleared up?
Comment 4 Dan Carpenter 2010-11-14 18:58:17 UTC
It's not really a bug in a68c439b1966c91f0ef474e2bf275d6792312726, it's a bug in oprofile.  ibs_eilvt_valid() shouldn't be called with preempt enabled.  Robert Richter is the maintainer for that code.

Email him and CC lkml.

(I think the email from Andrew which Rafael quoted was a private email because I can't find it in my LKML inbox).
Comment 5 Andrew Morton 2010-12-21 23:43:42 UTC
Created attachment 41192 [details]
fix
Comment 6 Andrew Morton 2010-12-21 23:44:16 UTC
maybe a fix, at least.  Unobvious.
Comment 7 Florian Mickler 2010-12-22 07:23:10 UTC
Nice. 

Patch: https://bugzilla.kernel.org/attachment.cgi?id=41192
Comment 8 Thomas Dekker 2010-12-30 21:23:25 UTC
Don't you mean preempt_disable()?
Comment 9 Robert Richter 2011-01-04 11:23:06 UTC
Fix upstream with commit c7c25802b39c443b3745cfa973dc49a97a3491f8.