Most recent kernel where this bug did not occur: 2.6.22.12 Distribution: openSUSE 10.3 Hardware Environment: Athlon XP 2200+ Software Environment: openSUSE 10.3 Problem Description: instant kernel GPF. Steps to reproduce:load module aoe (vanilla), type 'mkfs.ext3 /dev/etherd/e0.0', and the kernel GPFs. Will be attaching jpg of screen - sorry, no other way to capture GPF for me.
Created attachment 13812 [details] photo of screen
Reply-To: akpm@linux-foundation.org (switched to email - please respond via emailed reply-to-all, not via the bugzilla web interface) On Sat, 1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9482 > > Summary: kernel GPF in 2.6.24 (g09f345da) > Product: Other > Version: 2.5 > KernelVersion: 2.6.24rc3 g09f345da > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: other_other@kernel-bugs.osdl.org > ReportedBy: jnelson-kernel-bugzilla@jamponi.net > > > Most recent kernel where this bug did not occur: 2.6.22.12 > Distribution: openSUSE 10.3 > Hardware Environment: Athlon XP 2200+ > Software Environment: openSUSE 10.3 > Problem Description: instant kernel GPF. > > Steps to reproduce:load module aoe (vanilla), type 'mkfs.ext3 > /dev/etherd/e0.0', and the kernel GPFs. > > Will be attaching jpg of screen - sorry, no other way to capture GPF for me. > http://bugzilla.kernel.org/attachment.cgi?id=13812&action=view Damn that's odd. General Protection Fault in __set_page_dirty->__percpu_counter_add(). No sign of AOE in the trace. I assume that it is repeatable and that it doesn't occur with mkfs on regular local disk drives?
Reply-To: jnelson@jamponi.net On 12/1/07, Andrew Morton <akpm@linux-foundation.org> wrote: > (switched to email - please respond via emailed reply-to-all, not via the > bugzilla web interface) > > On Sat, 1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org > wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=9482 > > > > Summary: kernel GPF in 2.6.24 (g09f345da) > > Product: Other > > Version: 2.5 > > KernelVersion: 2.6.24rc3 g09f345da > > Platform: All > > OS/Version: Linux > > Tree: Mainline > > Status: NEW > > Severity: normal > > Priority: P1 > > Component: Other > > AssignedTo: other_other@kernel-bugs.osdl.org > > ReportedBy: jnelson-kernel-bugzilla@jamponi.net > > > > > > Most recent kernel where this bug did not occur: 2.6.22.12 > > Distribution: openSUSE 10.3 > > Hardware Environment: Athlon XP 2200+ > > Software Environment: openSUSE 10.3 > > Problem Description: instant kernel GPF. > > > > Steps to reproduce:load module aoe (vanilla), type 'mkfs.ext3 > > /dev/etherd/e0.0', and the kernel GPFs. > > > > Will be attaching jpg of screen - sorry, no other way to capture GPF for > me. > > > > http://bugzilla.kernel.org/attachment.cgi?id=13812&action=view > > Damn that's odd. General Protection Fault in > __set_page_dirty->__percpu_counter_add(). No sign of AOE in the trace. > > I assume that it is repeatable and that it doesn't occur with mkfs on > regular local disk drives? This works fine: mkfs.ext3 /dev/system/test #### a local LVM2-based logical volume mkfs.ext3 /dev/etherd/e0.0 ## Boom.
Reply-To: jnelson@jamponi.net I upgraded to a more recent commit, 92d499d991ec4f5cbd00d6f33967eab9d3ee8d6c and rebuilt. Now I get lots and lots of AoE errors, over and over again, but no GPF. aoe: can't get a frame. This shouldn't happen. It's possible the previous kernel had been patched, so I made really REALLY sure that this was a 100% clean build using the above commit. I can build using more or less any other commit or tag upon request but it's not quick.
Reply-To: ecashin@coraid.com On Sat, Dec 01, 2007 at 12:23:02PM -0800, Andrew Morton wrote: > (switched to email - please respond via emailed reply-to-all, not via the > bugzilla web interface) > > On Sat, 1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org > wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=9482 ... > Damn that's odd. General Protection Fault in > __set_page_dirty->__percpu_counter_add(). No sign of AOE in the trace. > > I assume that it is repeatable and that it doesn't occur with mkfs on > regular local disk drives? I am encountering this same problem during testing of some patches I would like to send to the LKML, applied to 2.6.24-rc3, and I can trip this problem with just, echo > /dev/etherd/e7.0 ... at which point I get the trace below. (I had added a couple of checks for 0xffffffffffffffff pointers to __percpu_counter_add.) I haven't been able to check the unpatched aoe driver, but it looks the same. Unable to handle kernel paging request at ffffffffffffffff RIP: [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d PGD 203067 PUD 204067 PMD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: aoe Pid: 2860, comm: bash Not tainted 2.6.24-rc3-47dbg #5 RIP: 0010:[<ffffffff8036d597>] [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d RSP: 0018:ffff81007a0fbaa8 EFLAGS: 00010092 RAX: ffffffffffffffff RBX: ffff81007fcc48e0 RCX: 0000000000000010 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff81007ace7240 RBP: ffff81007a0fbac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee R10: ffff81007a0fbaa8 R11: ffff810077dd99d8 R12: ffff81007ace7240 R13: 0000000000000000 R14: 0000000000000200 R15: ffff810078473bb0 FS: 00002ba601c5cdb0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffffffffff CR3: 0000000077c31000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 2860, threadinfo ffff81007a0fa000, task ffff81007bf48040) Stack: ffff81007a0fbac8 ffff81007fcc48e0 ffff81007c81c380 0000000000000000 ffff81007a0fbaf8 ffffffff802ae682 000010007a0fbae8 ffff810078473bb0 0000000000000200 ffff81007fcc48e0 ffff81007a0fbb18 ffffffff802ae75c Call Trace: [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac [<ffffffff802ae988>] block_write_end+0x4f/0x5b [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648 [<ffffffff8023a752>] current_fs_time+0x22/0x29 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320 [<ffffffff80264cce>] unlock_page+0x2d/0x31 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d [<ffffffff8028e40f>] do_sync_write+0xe2/0x126 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38 [<ffffffff8058e705>] do_page_fault+0x3f8/0x7bb [<ffffffff8028cae8>] fd_install+0x5f/0x68 [<ffffffff8028eb98>] vfs_write+0xae/0x137 [<ffffffff8028f102>] sys_write+0x47/0x70 [<ffffffff8020b7ae>] system_call+0x7e/0x83 Code: 4c 8b 2c d0 49 63 45 00 48 8d 1c 30 48 63 c1 48 39 c3 7d 0a RIP [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d RSP <ffff81007a0fbaa8> CR2: ffffffffffffffff BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():0, irqs_disabled():1 INFO: lockdep is turned off. Call Trace: [<ffffffff802518cb>] debug_show_held_locks+0x1b/0x24 [<ffffffff8022f352>] __might_sleep+0xc7/0xc9 [<ffffffff8024c4d0>] down_read+0x1d/0x4a [<ffffffff80237bd8>] exit_mm+0x34/0xf7 [<ffffffff8023933e>] do_exit+0x247/0x75b [<ffffffff8058e9d4>] do_page_fault+0x6c7/0x7bb [<ffffffff8058a81a>] thread_return+0x42/0x86 [<ffffffff88002698>] :aoe:aoeblk_make_request+0x1e8/0x1f5 [<ffffffff8058cb6d>] error_exit+0x0/0x9a [<ffffffff802ae5ee>] __set_page_dirty+0x48/0x121 [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac [<ffffffff802ae988>] block_write_end+0x4f/0x5b [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648 [<ffffffff8023a752>] current_fs_time+0x22/0x29 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320 [<ffffffff80264cce>] unlock_page+0x2d/0x31 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d [<ffffffff8028e40f>] do_sync_write+0xe2/0x126 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38 [<ffffffff8058e705>] do_page_fault+0x3f8/0x7bb [<ffffffff8028cae8>] fd_install+0x5f/0x68 [<ffffffff8028eb98>] vfs_write+0xae/0x137 [<ffffffff8028f102>] sys_write+0x47/0x70 [<ffffffff8020b7ae>] system_call+0x7e/0x83
Reply-To: akpm@linux-foundation.org On Mon, 3 Dec 2007 11:21:37 -0500 "Ed L. Cashin" <ecashin@coraid.com> wrote: > On Sat, Dec 01, 2007 at 12:23:02PM -0800, Andrew Morton wrote: > > (switched to email - please respond via emailed reply-to-all, not via the > > bugzilla web interface) > > > > On Sat, 1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org > wrote: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=9482 > ... > > Damn that's odd. General Protection Fault in > > __set_page_dirty->__percpu_counter_add(). No sign of AOE in the trace. > > > > I assume that it is repeatable and that it doesn't occur with mkfs on > > regular local disk drives? > > I am encountering this same problem during testing of some patches I > would like to send to the LKML, applied to 2.6.24-rc3, and I can trip > this problem with just, > > echo > /dev/etherd/e7.0 > > ... at which point I get the trace below. (I had added a couple of > checks for 0xffffffffffffffff pointers to __percpu_counter_add.) I > haven't been able to check the unpatched aoe driver, but it looks the > same. > > Unable to handle kernel paging request at ffffffffffffffff RIP: > [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d > PGD 203067 PUD 204067 PMD 0 > Oops: 0000 [1] SMP > CPU 0 > Modules linked in: aoe > Pid: 2860, comm: bash Not tainted 2.6.24-rc3-47dbg #5 > RIP: 0010:[<ffffffff8036d597>] [<ffffffff8036d597>] > __percpu_counter_add+0x24/0x6d > RSP: 0018:ffff81007a0fbaa8 EFLAGS: 00010092 > RAX: ffffffffffffffff RBX: ffff81007fcc48e0 RCX: 0000000000000010 > RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff81007ace7240 > RBP: ffff81007a0fbac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee > R10: ffff81007a0fbaa8 R11: ffff810077dd99d8 R12: ffff81007ace7240 > R13: 0000000000000000 R14: 0000000000000200 R15: ffff810078473bb0 > FS: 00002ba601c5cdb0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: ffffffffffffffff CR3: 0000000077c31000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process bash (pid: 2860, threadinfo ffff81007a0fa000, task ffff81007bf48040) > Stack: ffff81007a0fbac8 ffff81007fcc48e0 ffff81007c81c380 0000000000000000 > ffff81007a0fbaf8 ffffffff802ae682 000010007a0fbae8 ffff810078473bb0 > 0000000000000200 ffff81007fcc48e0 ffff81007a0fbb18 ffffffff802ae75c > Call Trace: > [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121 > [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99 > [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac > [<ffffffff802ae988>] block_write_end+0x4f/0x5b > [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38 > [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648 > [<ffffffff8023a752>] current_fs_time+0x22/0x29 > [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2 > [<ffffffff80266c84>] filemap_fault+0x1c4/0x320 > [<ffffffff80264cce>] unlock_page+0x2d/0x31 > [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d > [<ffffffff8028e40f>] do_sync_write+0xe2/0x126 > [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38 > [<ffffffff8058e705>] do_page_fault+0x3f8/0x7bb > [<ffffffff8028cae8>] fd_install+0x5f/0x68 > [<ffffffff8028eb98>] vfs_write+0xae/0x137 > [<ffffffff8028f102>] sys_write+0x47/0x70 > [<ffffffff8020b7ae>] system_call+0x7e/0x83 Strange. It _looks_ like we've somehow caused smp_processor_id() to return a not-possible CPU number. This code: void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch) { s64 count; s32 *pcount; int cpu = get_cpu(); pcount = per_cpu_ptr(fbc->counters, cpu); grabs a null pointer out of fbc->counters and then does the __percpu_disguise() thing on it, thus getting an address of ~0. Which I think implies that something in AOE is scribbling on task_struct, thread_info or a machine register (%fs). Quite an achievement if so... Could you debug this a bit please? Find out which CPU number __percpu_counter_add() is using, for a start? I'd do: in __percpu_counter_add(): int foo; EXPORT_SYMBOL(foo); __percpu_counter_add() { int cpu = get_cpu(); if (foo) printk("cpu:%d\n", cpu); ... } in aoe: extern int foo; { ... foo = 1; } Alternatively, just do if (!cpu_possible(cpu)) printk(...) in __percpu_counter_add(). Then you can proceed to work through the various operations which smp_processor_id() does and find out where it went wrong: print out %fs, mainly. If the cpu number is valid then perhaps something scribbled on the cpu's per-cpu memory.
Reply-To: ecashin@coraid.com On Mon, Dec 03, 2007 at 11:34:59AM -0800, Andrew Morton wrote: ... > Strange. It _looks_ like we've somehow caused smp_processor_id() to return > a not-possible CPU number. ... > Could you debug this a bit please? Find out which CPU number > __percpu_counter_add() is using, for a start? I'd do: ... > Alternatively, just do > > if (!cpu_possible(cpu)) > printk(...) > > in __percpu_counter_add(). Then you can proceed to work through the > various operations which smp_processor_id() does and find out where it went > wrong: print out %fs, mainly. > > If the cpu number is valid then perhaps something scribbled on the cpu's > per-cpu memory. I'll keep looking at this, but at a glance it looks like the cpu number is valid, because I don't trip a BUG_ON when I make the change below (the badval variable is noise, sorry). --- lx/lib/percpu_counter.c.20071130 2007-12-03 15:43:19.000000000 -0500 +++ lx/lib/percpu_counter.c 2007-12-03 15:47:38.000000000 -0500 @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_ s64 count; s32 *pcount; int cpu = get_cpu(); + u64 badval = 0xffffffffffffffffULL; + BUG_ON(!cpu_possible(cpu)); pcount = per_cpu_ptr(fbc->counters, cpu); count = *pcount + amount; if (count >= batch || count <= -batch) { The trace is, Unable to handle kernel paging request at ffffffffffffffff RIP: [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f PGD 203067 PUD 204067 PMD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: aoe Pid: 2777, comm: bash Not tainted 2.6.24-rc3-47dbg #9 RIP: 0010:[<ffffffff8036d602>] [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f RSP: 0018:ffff810078d19aa8 EFLAGS: 00010086 RAX: ffffffffffffffff RBX: ffff81007fc71950 RCX: 0000000000000010 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff810078d9a250 RBP: ffff810078d19ac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee R10: ffff810078d19aa8 R11: ffff810078cb59d8 R12: ffff81007c81c380 R13: ffff810078d9a250 R14: 0000000000000200 R15: ffff81007805f830 FS: 00002b341db94db0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffffffffff CR3: 000000007b415000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 2777, threadinfo ffff810078d18000, task ffff810078d160c0) Stack: ffff810078d19ac8 ffff81007fc71950 ffff81007c81c380 0000000000000000 ffff810078d19af8 ffffffff802ae682 0000100078d19ae8 ffff81007805f830 0000000000000200 ffff81007fc71950 ffff810078d19b18 ffffffff802ae75c Call Trace: [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac [<ffffffff802ae988>] block_write_end+0x4f/0x5b [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648 [<ffffffff8023a752>] current_fs_time+0x22/0x29 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320 [<ffffffff80264cce>] unlock_page+0x2d/0x31 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d [<ffffffff8028e40f>] do_sync_write+0xe2/0x126 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38 [<ffffffff8058e715>] do_page_fault+0x3f8/0x7bb [<ffffffff8028cae8>] fd_install+0x5f/0x68 [<ffffffff8028eb98>] vfs_write+0xae/0x137 [<ffffffff8028f102>] sys_write+0x47/0x70 [<ffffffff8020b7ae>] system_call+0x7e/0x83 Code: 4c 8b 24 d0 49 63 04 24 48 8d 1c 30 48 63 c1 48 39 c3 7d 0a RIP [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f RSP <ffff810078d19aa8> CR2: ffffffffffffffff BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():0, irqs_disabled():1 INFO: lockdep is turned off. Call Trace: [<ffffffff802518cb>] debug_show_held_locks+0x1b/0x24 [<ffffffff8022f352>] __might_sleep+0xc7/0xc9 [<ffffffff8024c4d0>] down_read+0x1d/0x4a [<ffffffff80237bd8>] exit_mm+0x34/0xf7 [<ffffffff8023933e>] do_exit+0x247/0x75b [<ffffffff8058e9e4>] do_page_fault+0x6c7/0x7bb [<ffffffff8058a82a>] thread_return+0x42/0x86 [<ffffffff88002544>] :aoe:aoeblk_make_request+0x1c3/0x1d0 [<ffffffff8058cb7d>] error_exit+0x0/0x9a [<ffffffff802ae5ee>] __set_page_dirty+0x48/0x121 [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac [<ffffffff802ae988>] block_write_end+0x4f/0x5b [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648 [<ffffffff8023a752>] current_fs_time+0x22/0x29 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320 [<ffffffff80264cce>] unlock_page+0x2d/0x31 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d [<ffffffff8028e40f>] do_sync_write+0xe2/0x126 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38 [<ffffffff8058e715>] do_page_fault+0x3f8/0x7bb [<ffffffff8028cae8>] fd_install+0x5f/0x68 [<ffffffff8028eb98>] vfs_write+0xae/0x137 [<ffffffff8028f102>] sys_write+0x47/0x70 [<ffffffff8020b7ae>] system_call+0x7e/0x83
Reply-To: ecashin@coraid.com On Mon, Dec 03, 2007 at 04:00:05PM -0500, Ed L. Cashin wrote: ... > I'll keep looking at this, but at a glance it looks like the cpu > number is valid, because I don't trip a BUG_ON when I make the change > below (the badval variable is noise, sorry). > > --- lx/lib/percpu_counter.c.20071130 2007-12-03 15:43:19.000000000 -0500 > +++ lx/lib/percpu_counter.c 2007-12-03 15:47:38.000000000 -0500 > @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_ > s64 count; > s32 *pcount; > int cpu = get_cpu(); > + u64 badval = 0xffffffffffffffffULL; > > + BUG_ON(!cpu_possible(cpu)); > pcount = per_cpu_ptr(fbc->counters, cpu); > count = *pcount + amount; > if (count >= batch || count <= -batch) { It appears that the fbc->counters pointer is NULL. I added the line, BUG_ON(!fbc->counters); ... (on line 39 in my percpu_counter.c), and it results in the trace below. It looks like when it's NULL, percpu_ptr passes it to __percpu_disguise, which makes it all ones and then tries to dereference 0xffffffffffffffff to access to the "ptrs" member of the struct percpu_data. ------------[ cut here ]------------ kernel BUG at lib/percpu_counter.c:39! invalid opcode: 0000 [1] SMP CPU 0 Modules linked in: aoe Pid: 3354, comm: bash Not tainted 2.6.24-rc3-47dbg #10 RIP: 0010:[<ffffffff8036d5f7>] [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f RSP: 0018:ffff810075031aa8 EFLAGS: 00010046 RAX: 00000000ffffffff RBX: ffff81007fd19bd8 RCX: 0000000000000000 RDX: 0000000000000010 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffff810075031ac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee R10: ffff810075031aa8 R11: ffff8100750318e8 R12: ffff81007c81c380 R13: ffff810073ce8250 R14: 0000000000000200 R15: ffff8100755016b0 FS: 00002b3e5c052db0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002b7f44fb64e0 CR3: 000000007c4b1000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 3354, threadinfo ffff810075030000, task ffff81007b4da040) Stack: ffff810075031ac8 ffff81007fd19bd8 ffff81007c81c380 0000000000000000 ffff810075031af8 ffffffff802ae682 0000100075031ae8 ffff8100755016b0 0000000000000200 ffff81007fd19bd8 ffff810075031b18 ffffffff802ae75c Call Trace: [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac [<ffffffff802ae988>] block_write_end+0x4f/0x5b [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648 [<ffffffff8023a752>] current_fs_time+0x22/0x29 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320 [<ffffffff80264cce>] unlock_page+0x2d/0x31 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d [<ffffffff8028e40f>] do_sync_write+0xe2/0x126 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38 [<ffffffff8058e725>] do_page_fault+0x3f8/0x7bb [<ffffffff8028cae8>] fd_install+0x5f/0x68 [<ffffffff8028eb98>] vfs_write+0xae/0x137 [<ffffffff8028f102>] sys_write+0x47/0x70 [<ffffffff8020b7ae>] system_call+0x7e/0x83 Code: 0f 0b eb fe 0f a3 3d 7e 08 4f 00 19 c0 85 c0 75 04 0f 0b eb RIP [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f RSP <ffff810075031aa8> BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():0, irqs_disabled():1 INFO: lockdep is turned off. Call Trace: [<ffffffff802518cb>] debug_show_held_locks+0x1b/0x24 [<ffffffff8022f352>] __might_sleep+0xc7/0xc9 [<ffffffff8024c4d0>] down_read+0x1d/0x4a [<ffffffff80237bd8>] exit_mm+0x34/0xf7 [<ffffffff8023933e>] do_exit+0x247/0x75b [<ffffffff8020d01e>] kernel_math_error+0x0/0x7e [<ffffffff8058d35b>] do_trap+0x101/0x110 [<ffffffff8020d4a6>] do_invalid_op+0x91/0x9a [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f [<ffffffff88002544>] :aoe:aoeblk_make_request+0x1c3/0x1d0 [<ffffffff8058aaeb>] io_schedule+0x28/0x34 [<ffffffff8058cb8d>] error_exit+0x0/0x9a [<ffffffff802ae5ee>] __set_page_dirty+0x48/0x121 [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac [<ffffffff802ae988>] block_write_end+0x4f/0x5b [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648 [<ffffffff8023a752>] current_fs_time+0x22/0x29 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320 [<ffffffff80264cce>] unlock_page+0x2d/0x31 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d [<ffffffff8028e40f>] do_sync_write+0xe2/0x126 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38 [<ffffffff8058e725>] do_page_fault+0x3f8/0x7bb [<ffffffff8028cae8>] fd_install+0x5f/0x68 [<ffffffff8028eb98>] vfs_write+0xae/0x137 [<ffffffff8028f102>] sys_write+0x47/0x70 [<ffffffff8020b7ae>] system_call+0x7e/0x83
Reply-To: akpm@linux-foundation.org On Mon, 3 Dec 2007 16:38:37 -0500 "Ed L. Cashin" <ecashin@coraid.com> wrote: > > --- lx/lib/percpu_counter.c.20071130 2007-12-03 15:43:19.000000000 > -0500 > > +++ lx/lib/percpu_counter.c 2007-12-03 15:47:38.000000000 -0500 > > @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_ > > s64 count; > > s32 *pcount; > > int cpu = get_cpu(); > > + u64 badval = 0xffffffffffffffffULL; > > > > + BUG_ON(!cpu_possible(cpu)); > > pcount = per_cpu_ptr(fbc->counters, cpu); > > count = *pcount + amount; > > if (count >= batch || count <= -batch) { > > It appears that the fbc->counters pointer is NULL. Does this fix? --- a/drivers/block/aoe/aoeblk.c~a +++ a/drivers/block/aoe/aoeblk.c @@ -6,6 +6,7 @@ #include <linux/hdreg.h> #include <linux/blkdev.h> +#include <linux/backing-dev.h> #include <linux/fs.h> #include <linux/ioctl.h> #include <linux/genhd.h> @@ -228,6 +229,7 @@ aoeblk_gdalloc(void *vp) spin_lock_irqsave(&d->lock, flags); blk_queue_make_request(&d->blkq, aoeblk_make_request); + bdi_init(&d->blkq.backing_dev_info); gd->major = AOE_MAJOR; gd->first_minor = d->sysminor * AOE_PARTITIONS; gd->fops = &aoe_bdops; _ <wonders whether blk_queue_make_request() should be running bdi_init()?>
Reply-To: akpm@linux-foundation.org On Mon, 3 Dec 2007 14:47:22 -0800 Andrew Morton <akpm@linux-foundation.org> wrote: > Does this fix? Slightly more elaborate version - handle errors - don't do illegal things under spinlock - clean up error unwinding --- a/drivers/block/aoe/aoeblk.c~aoe-properly-initialise-the-request_queues-backing_dev_info +++ a/drivers/block/aoe/aoeblk.c @@ -6,6 +6,7 @@ #include <linux/hdreg.h> #include <linux/blkdev.h> +#include <linux/backing-dev.h> #include <linux/fs.h> #include <linux/ioctl.h> #include <linux/genhd.h> @@ -210,25 +211,20 @@ aoeblk_gdalloc(void *vp) if (gd == NULL) { printk(KERN_ERR "aoe: cannot allocate disk structure for %ld.%ld\n", d->aoemajor, d->aoeminor); - spin_lock_irqsave(&d->lock, flags); - d->flags &= ~DEVFL_GDALLOC; - spin_unlock_irqrestore(&d->lock, flags); - return; + goto err; } d->bufpool = mempool_create_slab_pool(MIN_BUFS, buf_pool_cache); if (d->bufpool == NULL) { printk(KERN_ERR "aoe: cannot allocate bufpool for %ld.%ld\n", d->aoemajor, d->aoeminor); - put_disk(gd); - spin_lock_irqsave(&d->lock, flags); - d->flags &= ~DEVFL_GDALLOC; - spin_unlock_irqrestore(&d->lock, flags); - return; + goto err_disk; } - spin_lock_irqsave(&d->lock, flags); blk_queue_make_request(&d->blkq, aoeblk_make_request); + if (bdi_init(&d->blkq.backing_dev_info)) + goto err_mempool; + spin_lock_irqsave(&d->lock, flags); gd->major = AOE_MAJOR; gd->first_minor = d->sysminor * AOE_PARTITIONS; gd->fops = &aoe_bdops; @@ -246,6 +242,16 @@ aoeblk_gdalloc(void *vp) add_disk(gd); aoedisk_add_sysfs(d); + return; + +err_mempool: + mempool_destroy(d->bufpool); +err_disk: + put_disk(gd); +err: + spin_lock_irqsave(&d->lock, flags); + d->flags &= ~DEVFL_GDALLOC; + spin_unlock_irqrestore(&d->lock, flags); } void _ It was done lackadaisically and needs checking.
Reply-To: ecashin@coraid.com On Mon, Dec 03, 2007 at 02:47:22PM -0800, Andrew Morton wrote: > On Mon, 3 Dec 2007 16:38:37 -0500 > "Ed L. Cashin" <ecashin@coraid.com> wrote: ... > > It appears that the fbc->counters pointer is NULL. > > Does this fix? > > --- a/drivers/block/aoe/aoeblk.c~a > +++ a/drivers/block/aoe/aoeblk.c > @@ -6,6 +6,7 @@ > > #include <linux/hdreg.h> > #include <linux/blkdev.h> > +#include <linux/backing-dev.h> > #include <linux/fs.h> > #include <linux/ioctl.h> > #include <linux/genhd.h> > @@ -228,6 +229,7 @@ aoeblk_gdalloc(void *vp) > > spin_lock_irqsave(&d->lock, flags); > blk_queue_make_request(&d->blkq, aoeblk_make_request); > + bdi_init(&d->blkq.backing_dev_info); > gd->major = AOE_MAJOR; > gd->first_minor = d->sysminor * AOE_PARTITIONS; > gd->fops = &aoe_bdops; > _ > > > <wonders whether blk_queue_make_request() should be running bdi_init()?> No, the behavior doesn't change with this patch applied. Meanwhile I have started a git bisect, and hopefully that will turn up a specific patch before I hit an unbootable kernel or get my machine in a state where it won't boot.
Reply-To: ecashin@coraid.com On Mon, Dec 03, 2007 at 03:13:49PM -0800, Andrew Morton wrote: > On Mon, 3 Dec 2007 14:47:22 -0800 > Andrew Morton <akpm@linux-foundation.org> wrote: > > > Does this fix? > > Slightly more elaborate version Yes, this patch does eliminate the problem. Without it, no write can complete, and with it I have seen many writes complete without any trouble. Thank you for looking into this. I will look more closely at this patch tomorrow.
Ping?
Reply-To: jnelson@jamponi.net I can check to see if the patch fixes things tomorrow.
Reply-To: jnelson@jamponi.net I can confirm that 2.6.24rc4 with the (second) patch works fine.
Reply-To: akpm@linux-foundation.org On Sat, 8 Dec 2007 16:59:30 -0600 "Jon Nelson" <jnelson@jamponi.net> wrote: > I can confirm that 2.6.24rc4 with the (second) patch works fine. OK, thanks. We haven't heard back from Ed yet. I'll sit on this for a few more days. From: Andrew Morton <akpm@linux-foundation.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: <stable@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- drivers/block/aoe/aoeblk.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff -puN drivers/block/aoe/aoeblk.c~aoe-properly-initialise-the-request_queues-backing_dev_info drivers/block/aoe/aoeblk.c --- a/drivers/block/aoe/aoeblk.c~aoe-properly-initialise-the-request_queues-backing_dev_info +++ a/drivers/block/aoe/aoeblk.c @@ -6,6 +6,7 @@ #include <linux/hdreg.h> #include <linux/blkdev.h> +#include <linux/backing-dev.h> #include <linux/fs.h> #include <linux/ioctl.h> #include <linux/genhd.h> @@ -210,25 +211,20 @@ aoeblk_gdalloc(void *vp) if (gd == NULL) { printk(KERN_ERR "aoe: cannot allocate disk structure for %ld.%ld\n", d->aoemajor, d->aoeminor); - spin_lock_irqsave(&d->lock, flags); - d->flags &= ~DEVFL_GDALLOC; - spin_unlock_irqrestore(&d->lock, flags); - return; + goto err; } d->bufpool = mempool_create_slab_pool(MIN_BUFS, buf_pool_cache); if (d->bufpool == NULL) { printk(KERN_ERR "aoe: cannot allocate bufpool for %ld.%ld\n", d->aoemajor, d->aoeminor); - put_disk(gd); - spin_lock_irqsave(&d->lock, flags); - d->flags &= ~DEVFL_GDALLOC; - spin_unlock_irqrestore(&d->lock, flags); - return; + goto err_disk; } - spin_lock_irqsave(&d->lock, flags); blk_queue_make_request(&d->blkq, aoeblk_make_request); + if (bdi_init(&d->blkq.backing_dev_info)) + goto err_mempool; + spin_lock_irqsave(&d->lock, flags); gd->major = AOE_MAJOR; gd->first_minor = d->sysminor * AOE_PARTITIONS; gd->fops = &aoe_bdops; @@ -246,6 +242,16 @@ aoeblk_gdalloc(void *vp) add_disk(gd); aoedisk_add_sysfs(d); + return; + +err_mempool: + mempool_destroy(d->bufpool); +err_disk: + put_disk(gd); +err: + spin_lock_irqsave(&d->lock, flags); + d->flags &= ~DEVFL_GDALLOC; + spin_unlock_irqrestore(&d->lock, flags); } void _
Reply-To: ecashin@coraid.com On Sat, Dec 08, 2007 at 07:50:15PM -0800, Andrew Morton wrote: > On Sat, 8 Dec 2007 16:59:30 -0600 "Jon Nelson" <jnelson@jamponi.net> wrote: > > > I can confirm that 2.6.24rc4 with the (second) patch works fine. > > OK, thanks. > > We haven't heard back from Ed yet. I'll sit on this for a few more days. Sorry, yes, the second patch works well for me in testing. I had some initial concerns that were unfounded.
Fixed by: commit 43cbe2cbdd5320f1ac785c6f016923609831effe Author: Andrew Morton <akpm@linux-foundation.org> Date: Mon Dec 10 15:49:13 2007 -0800 aoe: properly initialise the request_queue's backing_dev_info http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=43cbe2cbdd5320f1ac785c6f016923609831effe