Bug 9482 - kernel GPF in 2.6.24 (g09f345da)
kernel GPF in 2.6.24 (g09f345da)
Status: CLOSED CODE_FIX
Product: Other
Classification: Unclassified
Component: Other
All Linux
: P1 normal
Assigned To: Rafael J. Wysocki
:
Depends on:
Blocks: 9243
  Show dependency treegraph
 
Reported: 2007-12-01 11:54 UTC by Jon Nelson
Modified: 2007-12-17 15:23 UTC (History)
0 users

See Also:
Kernel Version: 2.6.24rc3 g09f345da
Tree: Mainline
Regression: Yes


Attachments
photo of screen (543.32 KB, image/jpeg)
2007-12-01 11:54 UTC, Jon Nelson
Details

Description Jon Nelson 2007-12-01 11:54:10 UTC
Most recent kernel where this bug did not occur: 2.6.22.12
Distribution: openSUSE 10.3
Hardware Environment: Athlon XP 2200+
Software Environment: openSUSE 10.3
Problem Description: instant kernel GPF.

Steps to reproduce:load module aoe (vanilla), type 'mkfs.ext3 /dev/etherd/e0.0', and the kernel GPFs.

Will be attaching jpg of screen - sorry, no other way to capture GPF for me.
Comment 1 Jon Nelson 2007-12-01 11:54:57 UTC
Created attachment 13812 [details]
photo of screen
Comment 2 Anonymous Emailer 2007-12-01 12:23:32 UTC
Reply-To: akpm@linux-foundation.org

(switched to email - please respond via emailed reply-to-all, not via the
bugzilla web interface)

On Sat,  1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9482
> 
>            Summary: kernel GPF in 2.6.24 (g09f345da)
>            Product: Other
>            Version: 2.5
>      KernelVersion: 2.6.24rc3 g09f345da
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: other_other@kernel-bugs.osdl.org
>         ReportedBy: jnelson-kernel-bugzilla@jamponi.net
> 
> 
> Most recent kernel where this bug did not occur: 2.6.22.12
> Distribution: openSUSE 10.3
> Hardware Environment: Athlon XP 2200+
> Software Environment: openSUSE 10.3
> Problem Description: instant kernel GPF.
> 
> Steps to reproduce:load module aoe (vanilla), type 'mkfs.ext3
> /dev/etherd/e0.0', and the kernel GPFs.
> 
> Will be attaching jpg of screen - sorry, no other way to capture GPF for me.
> 

http://bugzilla.kernel.org/attachment.cgi?id=13812&action=view

Damn that's odd.  General Protection Fault in
__set_page_dirty->__percpu_counter_add().  No sign of AOE in the trace.

I assume that it is repeatable and that it doesn't occur with mkfs on
regular local disk drives?

Comment 3 Anonymous Emailer 2007-12-01 13:05:40 UTC
Reply-To: jnelson@jamponi.net

On 12/1/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> (switched to email - please respond via emailed reply-to-all, not via the
> bugzilla web interface)
>
> On Sat,  1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=9482
> >
> >            Summary: kernel GPF in 2.6.24 (g09f345da)
> >            Product: Other
> >            Version: 2.5
> >      KernelVersion: 2.6.24rc3 g09f345da
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >         AssignedTo: other_other@kernel-bugs.osdl.org
> >         ReportedBy: jnelson-kernel-bugzilla@jamponi.net
> >
> >
> > Most recent kernel where this bug did not occur: 2.6.22.12
> > Distribution: openSUSE 10.3
> > Hardware Environment: Athlon XP 2200+
> > Software Environment: openSUSE 10.3
> > Problem Description: instant kernel GPF.
> >
> > Steps to reproduce:load module aoe (vanilla), type 'mkfs.ext3
> > /dev/etherd/e0.0', and the kernel GPFs.
> >
> > Will be attaching jpg of screen - sorry, no other way to capture GPF for me.
> >
>
> http://bugzilla.kernel.org/attachment.cgi?id=13812&action=view
>
> Damn that's odd.  General Protection Fault in
> __set_page_dirty->__percpu_counter_add().  No sign of AOE in the trace.
>
> I assume that it is repeatable and that it doesn't occur with mkfs on
> regular local disk drives?


This works fine:

mkfs.ext3 /dev/system/test #### a local LVM2-based logical volume

mkfs.ext3 /dev/etherd/e0.0 ## Boom.




Comment 4 Anonymous Emailer 2007-12-02 08:56:56 UTC
Reply-To: jnelson@jamponi.net

I upgraded to a more recent commit,
92d499d991ec4f5cbd00d6f33967eab9d3ee8d6c
and rebuilt.

Now I get lots and lots of AoE errors, over and over again, but no GPF.

aoe: can't get a frame. This shouldn't happen.

It's possible the previous kernel had been patched, so I made really
REALLY sure that this was a 100% clean build using the above commit. I
can build using more or less any other commit or tag upon request but
it's not quick.

Comment 5 Anonymous Emailer 2007-12-03 08:45:34 UTC
Reply-To: ecashin@coraid.com

On Sat, Dec 01, 2007 at 12:23:02PM -0800, Andrew Morton wrote:
> (switched to email - please respond via emailed reply-to-all, not via the
> bugzilla web interface)
> 
> On Sat,  1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=9482
...
> Damn that's odd.  General Protection Fault in
> __set_page_dirty->__percpu_counter_add().  No sign of AOE in the trace.
> 
> I assume that it is repeatable and that it doesn't occur with mkfs on
> regular local disk drives?

I am encountering this same problem during testing of some patches I
would like to send to the LKML, applied to 2.6.24-rc3, and I can trip
this problem with just,

  echo > /dev/etherd/e7.0

... at which point I get the trace below.  (I had added a couple of
checks for 0xffffffffffffffff pointers to __percpu_counter_add.)  I
haven't been able to check the unpatched aoe driver, but it looks the
same.

Unable to handle kernel paging request at ffffffffffffffff RIP: 
 [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d
PGD 203067 PUD 204067 PMD 0 
Oops: 0000 [1] SMP 
CPU 0 
Modules linked in: aoe
Pid: 2860, comm: bash Not tainted 2.6.24-rc3-47dbg #5
RIP: 0010:[<ffffffff8036d597>]  [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d
RSP: 0018:ffff81007a0fbaa8  EFLAGS: 00010092
RAX: ffffffffffffffff RBX: ffff81007fcc48e0 RCX: 0000000000000010
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff81007ace7240
RBP: ffff81007a0fbac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee
R10: ffff81007a0fbaa8 R11: ffff810077dd99d8 R12: ffff81007ace7240
R13: 0000000000000000 R14: 0000000000000200 R15: ffff810078473bb0
FS:  00002ba601c5cdb0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffffffffff CR3: 0000000077c31000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 2860, threadinfo ffff81007a0fa000, task ffff81007bf48040)
Stack:  ffff81007a0fbac8 ffff81007fcc48e0 ffff81007c81c380 0000000000000000
 ffff81007a0fbaf8 ffffffff802ae682 000010007a0fbae8 ffff810078473bb0
 0000000000000200 ffff81007fcc48e0 ffff81007a0fbb18 ffffffff802ae75c
Call Trace:
 [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121
 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99
 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac
 [<ffffffff802ae988>] block_write_end+0x4f/0x5b
 [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38
 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648
 [<ffffffff8023a752>] current_fs_time+0x22/0x29
 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2
 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320
 [<ffffffff80264cce>] unlock_page+0x2d/0x31
 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d
 [<ffffffff8028e40f>] do_sync_write+0xe2/0x126
 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38
 [<ffffffff8058e705>] do_page_fault+0x3f8/0x7bb
 [<ffffffff8028cae8>] fd_install+0x5f/0x68
 [<ffffffff8028eb98>] vfs_write+0xae/0x137
 [<ffffffff8028f102>] sys_write+0x47/0x70
 [<ffffffff8020b7ae>] system_call+0x7e/0x83


Code: 4c 8b 2c d0 49 63 45 00 48 8d 1c 30 48 63 c1 48 39 c3 7d 0a 
RIP  [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d
 RSP <ffff81007a0fbaa8>
CR2: ffffffffffffffff
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
INFO: lockdep is turned off.

Call Trace:
 [<ffffffff802518cb>] debug_show_held_locks+0x1b/0x24
 [<ffffffff8022f352>] __might_sleep+0xc7/0xc9
 [<ffffffff8024c4d0>] down_read+0x1d/0x4a
 [<ffffffff80237bd8>] exit_mm+0x34/0xf7
 [<ffffffff8023933e>] do_exit+0x247/0x75b
 [<ffffffff8058e9d4>] do_page_fault+0x6c7/0x7bb
 [<ffffffff8058a81a>] thread_return+0x42/0x86
 [<ffffffff88002698>] :aoe:aoeblk_make_request+0x1e8/0x1f5
 [<ffffffff8058cb6d>] error_exit+0x0/0x9a
 [<ffffffff802ae5ee>] __set_page_dirty+0x48/0x121
 [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d
 [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121
 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99
 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac
 [<ffffffff802ae988>] block_write_end+0x4f/0x5b
 [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38
 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648
 [<ffffffff8023a752>] current_fs_time+0x22/0x29
 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2
 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320
 [<ffffffff80264cce>] unlock_page+0x2d/0x31
 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d
 [<ffffffff8028e40f>] do_sync_write+0xe2/0x126
 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38
 [<ffffffff8058e705>] do_page_fault+0x3f8/0x7bb
 [<ffffffff8028cae8>] fd_install+0x5f/0x68
 [<ffffffff8028eb98>] vfs_write+0xae/0x137
 [<ffffffff8028f102>] sys_write+0x47/0x70
 [<ffffffff8020b7ae>] system_call+0x7e/0x83



Comment 6 Anonymous Emailer 2007-12-03 11:35:08 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 3 Dec 2007 11:21:37 -0500
"Ed L. Cashin" <ecashin@coraid.com> wrote:

> On Sat, Dec 01, 2007 at 12:23:02PM -0800, Andrew Morton wrote:
> > (switched to email - please respond via emailed reply-to-all, not via the
> > bugzilla web interface)
> > 
> > On Sat,  1 Dec 2007 11:54:11 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9482
> ...
> > Damn that's odd.  General Protection Fault in
> > __set_page_dirty->__percpu_counter_add().  No sign of AOE in the trace.
> > 
> > I assume that it is repeatable and that it doesn't occur with mkfs on
> > regular local disk drives?
> 
> I am encountering this same problem during testing of some patches I
> would like to send to the LKML, applied to 2.6.24-rc3, and I can trip
> this problem with just,
> 
>   echo > /dev/etherd/e7.0
> 
> ... at which point I get the trace below.  (I had added a couple of
> checks for 0xffffffffffffffff pointers to __percpu_counter_add.)  I
> haven't been able to check the unpatched aoe driver, but it looks the
> same.
> 
> Unable to handle kernel paging request at ffffffffffffffff RIP: 
>  [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d
> PGD 203067 PUD 204067 PMD 0 
> Oops: 0000 [1] SMP 
> CPU 0 
> Modules linked in: aoe
> Pid: 2860, comm: bash Not tainted 2.6.24-rc3-47dbg #5
> RIP: 0010:[<ffffffff8036d597>]  [<ffffffff8036d597>] __percpu_counter_add+0x24/0x6d
> RSP: 0018:ffff81007a0fbaa8  EFLAGS: 00010092
> RAX: ffffffffffffffff RBX: ffff81007fcc48e0 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff81007ace7240
> RBP: ffff81007a0fbac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee
> R10: ffff81007a0fbaa8 R11: ffff810077dd99d8 R12: ffff81007ace7240
> R13: 0000000000000000 R14: 0000000000000200 R15: ffff810078473bb0
> FS:  00002ba601c5cdb0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffffffffffffffff CR3: 0000000077c31000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process bash (pid: 2860, threadinfo ffff81007a0fa000, task ffff81007bf48040)
> Stack:  ffff81007a0fbac8 ffff81007fcc48e0 ffff81007c81c380 0000000000000000
>  ffff81007a0fbaf8 ffffffff802ae682 000010007a0fbae8 ffff810078473bb0
>  0000000000000200 ffff81007fcc48e0 ffff81007a0fbb18 ffffffff802ae75c
> Call Trace:
>  [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121
>  [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99
>  [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac
>  [<ffffffff802ae988>] block_write_end+0x4f/0x5b
>  [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38
>  [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648
>  [<ffffffff8023a752>] current_fs_time+0x22/0x29
>  [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2
>  [<ffffffff80266c84>] filemap_fault+0x1c4/0x320
>  [<ffffffff80264cce>] unlock_page+0x2d/0x31
>  [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d
>  [<ffffffff8028e40f>] do_sync_write+0xe2/0x126
>  [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38
>  [<ffffffff8058e705>] do_page_fault+0x3f8/0x7bb
>  [<ffffffff8028cae8>] fd_install+0x5f/0x68
>  [<ffffffff8028eb98>] vfs_write+0xae/0x137
>  [<ffffffff8028f102>] sys_write+0x47/0x70
>  [<ffffffff8020b7ae>] system_call+0x7e/0x83

Strange.  It _looks_ like we've somehow caused smp_processor_id() to return
a not-possible CPU number.  This code:

void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count;
	s32 *pcount;
	int cpu = get_cpu();

	pcount = per_cpu_ptr(fbc->counters, cpu);

grabs a null pointer out of fbc->counters and then does the
__percpu_disguise() thing on it, thus getting an address of ~0.

Which I think implies that something in AOE is scribbling on task_struct,
thread_info or a machine register (%fs).  Quite an achievement if so...

Could you debug this a bit please?  Find out which CPU number
__percpu_counter_add() is using, for a start?  I'd do:

in __percpu_counter_add():

int foo;
EXPORT_SYMBOL(foo);

__percpu_counter_add()
{
	int cpu = get_cpu();

	if (foo)
		printk("cpu:%d\n", cpu);
	...
}

in aoe:

extern int foo;

{
	...
	foo = 1;
}



Alternatively, just do

	if (!cpu_possible(cpu))
		printk(...)

in __percpu_counter_add().  Then you can proceed to work through the
various operations which smp_processor_id() does and find out where it went
wrong: print out %fs, mainly.

If the cpu number is valid then perhaps something scribbled on the cpu's
per-cpu memory.

Comment 7 Anonymous Emailer 2007-12-03 13:00:47 UTC
Reply-To: ecashin@coraid.com

On Mon, Dec 03, 2007 at 11:34:59AM -0800, Andrew Morton wrote:
...
> Strange.  It _looks_ like we've somehow caused smp_processor_id() to return
> a not-possible CPU number.
...
> Could you debug this a bit please?  Find out which CPU number
> __percpu_counter_add() is using, for a start?  I'd do:
...
> Alternatively, just do
> 
> 	if (!cpu_possible(cpu))
> 		printk(...)
> 
> in __percpu_counter_add().  Then you can proceed to work through the
> various operations which smp_processor_id() does and find out where it went
> wrong: print out %fs, mainly.
> 
> If the cpu number is valid then perhaps something scribbled on the cpu's
> per-cpu memory.

I'll keep looking at this, but at a glance it looks like the cpu
number is valid, because I don't trip a BUG_ON when I make the change
below (the badval variable is noise, sorry).

  --- lx/lib/percpu_counter.c.20071130    2007-12-03 15:43:19.000000000 -0500
  +++ lx/lib/percpu_counter.c     2007-12-03 15:47:38.000000000 -0500
  @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_
          s64 count;
          s32 *pcount;
          int cpu = get_cpu();
  +       u64 badval = 0xffffffffffffffffULL;
   
  +       BUG_ON(!cpu_possible(cpu));
          pcount = per_cpu_ptr(fbc->counters, cpu);
          count = *pcount + amount;
          if (count >= batch || count <= -batch) {

The trace is,

Unable to handle kernel paging request at ffffffffffffffff RIP: 
 [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f
PGD 203067 PUD 204067 PMD 0 
Oops: 0000 [1] SMP 
CPU 0 
Modules linked in: aoe
Pid: 2777, comm: bash Not tainted 2.6.24-rc3-47dbg #9
RIP: 0010:[<ffffffff8036d602>]  [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f
RSP: 0018:ffff810078d19aa8  EFLAGS: 00010086
RAX: ffffffffffffffff RBX: ffff81007fc71950 RCX: 0000000000000010
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff810078d9a250
RBP: ffff810078d19ac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee
R10: ffff810078d19aa8 R11: ffff810078cb59d8 R12: ffff81007c81c380
R13: ffff810078d9a250 R14: 0000000000000200 R15: ffff81007805f830
FS:  00002b341db94db0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffffffffff CR3: 000000007b415000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 2777, threadinfo ffff810078d18000, task ffff810078d160c0)
Stack:  ffff810078d19ac8 ffff81007fc71950 ffff81007c81c380 0000000000000000
 ffff810078d19af8 ffffffff802ae682 0000100078d19ae8 ffff81007805f830
 0000000000000200 ffff81007fc71950 ffff810078d19b18 ffffffff802ae75c
Call Trace:
 [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121
 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99
 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac
 [<ffffffff802ae988>] block_write_end+0x4f/0x5b
 [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38
 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648
 [<ffffffff8023a752>] current_fs_time+0x22/0x29
 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2
 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320
 [<ffffffff80264cce>] unlock_page+0x2d/0x31
 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d
 [<ffffffff8028e40f>] do_sync_write+0xe2/0x126
 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38
 [<ffffffff8058e715>] do_page_fault+0x3f8/0x7bb
 [<ffffffff8028cae8>] fd_install+0x5f/0x68
 [<ffffffff8028eb98>] vfs_write+0xae/0x137
 [<ffffffff8028f102>] sys_write+0x47/0x70
 [<ffffffff8020b7ae>] system_call+0x7e/0x83


Code: 4c 8b 24 d0 49 63 04 24 48 8d 1c 30 48 63 c1 48 39 c3 7d 0a 
RIP  [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f
 RSP <ffff810078d19aa8>
CR2: ffffffffffffffff
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
INFO: lockdep is turned off.

Call Trace:
 [<ffffffff802518cb>] debug_show_held_locks+0x1b/0x24
 [<ffffffff8022f352>] __might_sleep+0xc7/0xc9
 [<ffffffff8024c4d0>] down_read+0x1d/0x4a
 [<ffffffff80237bd8>] exit_mm+0x34/0xf7
 [<ffffffff8023933e>] do_exit+0x247/0x75b
 [<ffffffff8058e9e4>] do_page_fault+0x6c7/0x7bb
 [<ffffffff8058a82a>] thread_return+0x42/0x86
 [<ffffffff88002544>] :aoe:aoeblk_make_request+0x1c3/0x1d0
 [<ffffffff8058cb7d>] error_exit+0x0/0x9a
 [<ffffffff802ae5ee>] __set_page_dirty+0x48/0x121
 [<ffffffff8036d602>] __percpu_counter_add+0x35/0x7f
 [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121
 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99
 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac
 [<ffffffff802ae988>] block_write_end+0x4f/0x5b
 [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38
 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648
 [<ffffffff8023a752>] current_fs_time+0x22/0x29
 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2
 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320
 [<ffffffff80264cce>] unlock_page+0x2d/0x31
 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d
 [<ffffffff8028e40f>] do_sync_write+0xe2/0x126
 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38
 [<ffffffff8058e715>] do_page_fault+0x3f8/0x7bb
 [<ffffffff8028cae8>] fd_install+0x5f/0x68
 [<ffffffff8028eb98>] vfs_write+0xae/0x137
 [<ffffffff8028f102>] sys_write+0x47/0x70
 [<ffffffff8020b7ae>] system_call+0x7e/0x83



Comment 8 Anonymous Emailer 2007-12-03 13:40:40 UTC
Reply-To: ecashin@coraid.com

On Mon, Dec 03, 2007 at 04:00:05PM -0500, Ed L. Cashin wrote:
...
> I'll keep looking at this, but at a glance it looks like the cpu
> number is valid, because I don't trip a BUG_ON when I make the change
> below (the badval variable is noise, sorry).
> 
>   --- lx/lib/percpu_counter.c.20071130    2007-12-03 15:43:19.000000000 -0500
>   +++ lx/lib/percpu_counter.c     2007-12-03 15:47:38.000000000 -0500
>   @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_
>           s64 count;
>           s32 *pcount;
>           int cpu = get_cpu();
>   +       u64 badval = 0xffffffffffffffffULL;
>    
>   +       BUG_ON(!cpu_possible(cpu));
>           pcount = per_cpu_ptr(fbc->counters, cpu);
>           count = *pcount + amount;
>           if (count >= batch || count <= -batch) {

It appears that the fbc->counters pointer is NULL.  I added the line,

	BUG_ON(!fbc->counters);

... (on line 39 in my percpu_counter.c), and it results in the trace
below.  It looks like when it's NULL, percpu_ptr passes it to
__percpu_disguise, which makes it all ones and then tries to
dereference 0xffffffffffffffff to access to the "ptrs" member of the
struct percpu_data.

------------[ cut here ]------------
kernel BUG at lib/percpu_counter.c:39!
invalid opcode: 0000 [1] SMP 
CPU 0 
Modules linked in: aoe
Pid: 3354, comm: bash Not tainted 2.6.24-rc3-47dbg #10
RIP: 0010:[<ffffffff8036d5f7>]  [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f
RSP: 0018:ffff810075031aa8  EFLAGS: 00010046
RAX: 00000000ffffffff RBX: ffff81007fd19bd8 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff810075031ac8 R08: ffff81007cc077b0 R09: ffffffff802ae5ee
R10: ffff810075031aa8 R11: ffff8100750318e8 R12: ffff81007c81c380
R13: ffff810073ce8250 R14: 0000000000000200 R15: ffff8100755016b0
FS:  00002b3e5c052db0(0000) GS:ffffffff8078b000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b7f44fb64e0 CR3: 000000007c4b1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 3354, threadinfo ffff810075030000, task ffff81007b4da040)
Stack:  ffff810075031ac8 ffff81007fd19bd8 ffff81007c81c380 0000000000000000
 ffff810075031af8 ffffffff802ae682 0000100075031ae8 ffff8100755016b0
 0000000000000200 ffff81007fd19bd8 ffff810075031b18 ffffffff802ae75c
Call Trace:
 [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121
 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99
 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac
 [<ffffffff802ae988>] block_write_end+0x4f/0x5b
 [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38
 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648
 [<ffffffff8023a752>] current_fs_time+0x22/0x29
 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2
 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320
 [<ffffffff80264cce>] unlock_page+0x2d/0x31
 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d
 [<ffffffff8028e40f>] do_sync_write+0xe2/0x126
 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38
 [<ffffffff8058e725>] do_page_fault+0x3f8/0x7bb
 [<ffffffff8028cae8>] fd_install+0x5f/0x68
 [<ffffffff8028eb98>] vfs_write+0xae/0x137
 [<ffffffff8028f102>] sys_write+0x47/0x70
 [<ffffffff8020b7ae>] system_call+0x7e/0x83


Code: 0f 0b eb fe 0f a3 3d 7e 08 4f 00 19 c0 85 c0 75 04 0f 0b eb 
RIP  [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f
 RSP <ffff810075031aa8>
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1
INFO: lockdep is turned off.

Call Trace:
 [<ffffffff802518cb>] debug_show_held_locks+0x1b/0x24
 [<ffffffff8022f352>] __might_sleep+0xc7/0xc9
 [<ffffffff8024c4d0>] down_read+0x1d/0x4a
 [<ffffffff80237bd8>] exit_mm+0x34/0xf7
 [<ffffffff8023933e>] do_exit+0x247/0x75b
 [<ffffffff8020d01e>] kernel_math_error+0x0/0x7e
 [<ffffffff8058d35b>] do_trap+0x101/0x110
 [<ffffffff8020d4a6>] do_invalid_op+0x91/0x9a
 [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f
 [<ffffffff88002544>] :aoe:aoeblk_make_request+0x1c3/0x1d0
 [<ffffffff8058aaeb>] io_schedule+0x28/0x34
 [<ffffffff8058cb8d>] error_exit+0x0/0x9a
 [<ffffffff802ae5ee>] __set_page_dirty+0x48/0x121
 [<ffffffff8036d5f7>] __percpu_counter_add+0x2a/0x8f
 [<ffffffff802ae682>] __set_page_dirty+0xdc/0x121
 [<ffffffff802ae75c>] mark_buffer_dirty+0x95/0x99
 [<ffffffff802ae7d2>] __block_commit_write+0x72/0xac
 [<ffffffff802ae988>] block_write_end+0x4f/0x5b
 [<ffffffff802b243f>] blkdev_write_end+0x1b/0x38
 [<ffffffff80265d96>] generic_file_buffered_write+0x1c0/0x648
 [<ffffffff8023a752>] current_fs_time+0x22/0x29
 [<ffffffff80266576>] __generic_file_aio_write_nolock+0x358/0x3c2
 [<ffffffff80266c84>] filemap_fault+0x1c4/0x320
 [<ffffffff80264cce>] unlock_page+0x2d/0x31
 [<ffffffff802666dd>] generic_file_aio_write_nolock+0x3b/0x8d
 [<ffffffff8028e40f>] do_sync_write+0xe2/0x126
 [<ffffffff802497d0>] autoremove_wake_function+0x0/0x38
 [<ffffffff8058e725>] do_page_fault+0x3f8/0x7bb
 [<ffffffff8028cae8>] fd_install+0x5f/0x68
 [<ffffffff8028eb98>] vfs_write+0xae/0x137
 [<ffffffff8028f102>] sys_write+0x47/0x70
 [<ffffffff8020b7ae>] system_call+0x7e/0x83



Comment 9 Anonymous Emailer 2007-12-03 14:48:04 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 3 Dec 2007 16:38:37 -0500
"Ed L. Cashin" <ecashin@coraid.com> wrote:

> >   --- lx/lib/percpu_counter.c.20071130    2007-12-03 15:43:19.000000000 -0500
> >   +++ lx/lib/percpu_counter.c     2007-12-03 15:47:38.000000000 -0500
> >   @@ -33,7 +33,9 @@ void __percpu_counter_add(struct percpu_
> >           s64 count;
> >           s32 *pcount;
> >           int cpu = get_cpu();
> >   +       u64 badval = 0xffffffffffffffffULL;
> >    
> >   +       BUG_ON(!cpu_possible(cpu));
> >           pcount = per_cpu_ptr(fbc->counters, cpu);
> >           count = *pcount + amount;
> >           if (count >= batch || count <= -batch) {
> 
> It appears that the fbc->counters pointer is NULL.

Does this fix?

--- a/drivers/block/aoe/aoeblk.c~a
+++ a/drivers/block/aoe/aoeblk.c
@@ -6,6 +6,7 @@
 
 #include <linux/hdreg.h>
 #include <linux/blkdev.h>
+#include <linux/backing-dev.h>
 #include <linux/fs.h>
 #include <linux/ioctl.h>
 #include <linux/genhd.h>
@@ -228,6 +229,7 @@ aoeblk_gdalloc(void *vp)
 
 	spin_lock_irqsave(&d->lock, flags);
 	blk_queue_make_request(&d->blkq, aoeblk_make_request);
+	bdi_init(&d->blkq.backing_dev_info);
 	gd->major = AOE_MAJOR;
 	gd->first_minor = d->sysminor * AOE_PARTITIONS;
 	gd->fops = &aoe_bdops;
_


<wonders whether blk_queue_make_request() should be running bdi_init()?>

Comment 10 Anonymous Emailer 2007-12-03 15:14:30 UTC
Reply-To: akpm@linux-foundation.org

On Mon, 3 Dec 2007 14:47:22 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> Does this fix?

Slightly more elaborate version

- handle errors

- don't do illegal things under spinlock

- clean up error unwinding

--- a/drivers/block/aoe/aoeblk.c~aoe-properly-initialise-the-request_queues-backing_dev_info
+++ a/drivers/block/aoe/aoeblk.c
@@ -6,6 +6,7 @@
 
 #include <linux/hdreg.h>
 #include <linux/blkdev.h>
+#include <linux/backing-dev.h>
 #include <linux/fs.h>
 #include <linux/ioctl.h>
 #include <linux/genhd.h>
@@ -210,25 +211,20 @@ aoeblk_gdalloc(void *vp)
 	if (gd == NULL) {
 		printk(KERN_ERR "aoe: cannot allocate disk structure for %ld.%ld\n",
 			d->aoemajor, d->aoeminor);
-		spin_lock_irqsave(&d->lock, flags);
-		d->flags &= ~DEVFL_GDALLOC;
-		spin_unlock_irqrestore(&d->lock, flags);
-		return;
+		goto err;
 	}
 
 	d->bufpool = mempool_create_slab_pool(MIN_BUFS, buf_pool_cache);
 	if (d->bufpool == NULL) {
 		printk(KERN_ERR "aoe: cannot allocate bufpool for %ld.%ld\n",
 			d->aoemajor, d->aoeminor);
-		put_disk(gd);
-		spin_lock_irqsave(&d->lock, flags);
-		d->flags &= ~DEVFL_GDALLOC;
-		spin_unlock_irqrestore(&d->lock, flags);
-		return;
+		goto err_disk;
 	}
 
-	spin_lock_irqsave(&d->lock, flags);
 	blk_queue_make_request(&d->blkq, aoeblk_make_request);
+	if (bdi_init(&d->blkq.backing_dev_info))
+		goto err_mempool;
+	spin_lock_irqsave(&d->lock, flags);
 	gd->major = AOE_MAJOR;
 	gd->first_minor = d->sysminor * AOE_PARTITIONS;
 	gd->fops = &aoe_bdops;
@@ -246,6 +242,16 @@ aoeblk_gdalloc(void *vp)
 
 	add_disk(gd);
 	aoedisk_add_sysfs(d);
+	return;
+
+err_mempool:
+	mempool_destroy(d->bufpool);
+err_disk:
+	put_disk(gd);
+err:
+	spin_lock_irqsave(&d->lock, flags);
+	d->flags &= ~DEVFL_GDALLOC;
+	spin_unlock_irqrestore(&d->lock, flags);
 }
 
 void
_


It was done lackadaisically and needs checking.


Comment 11 Anonymous Emailer 2007-12-03 15:25:37 UTC
Reply-To: ecashin@coraid.com

On Mon, Dec 03, 2007 at 02:47:22PM -0800, Andrew Morton wrote:
> On Mon, 3 Dec 2007 16:38:37 -0500
> "Ed L. Cashin" <ecashin@coraid.com> wrote:
...
> > It appears that the fbc->counters pointer is NULL.
> 
> Does this fix?
> 
> --- a/drivers/block/aoe/aoeblk.c~a
> +++ a/drivers/block/aoe/aoeblk.c
> @@ -6,6 +6,7 @@
>  
>  #include <linux/hdreg.h>
>  #include <linux/blkdev.h>
> +#include <linux/backing-dev.h>
>  #include <linux/fs.h>
>  #include <linux/ioctl.h>
>  #include <linux/genhd.h>
> @@ -228,6 +229,7 @@ aoeblk_gdalloc(void *vp)
>  
>  	spin_lock_irqsave(&d->lock, flags);
>  	blk_queue_make_request(&d->blkq, aoeblk_make_request);
> +	bdi_init(&d->blkq.backing_dev_info);
>  	gd->major = AOE_MAJOR;
>  	gd->first_minor = d->sysminor * AOE_PARTITIONS;
>  	gd->fops = &aoe_bdops;
> _
> 
> 
> <wonders whether blk_queue_make_request() should be running bdi_init()?>

No, the behavior doesn't change with this patch applied.

Meanwhile I have started a git bisect, and hopefully that will turn up
a specific patch before I hit an unbootable kernel or get my machine
in a state where it won't boot.

Comment 12 Anonymous Emailer 2007-12-03 15:45:37 UTC
Reply-To: ecashin@coraid.com

On Mon, Dec 03, 2007 at 03:13:49PM -0800, Andrew Morton wrote:
> On Mon, 3 Dec 2007 14:47:22 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > Does this fix?
> 
> Slightly more elaborate version

Yes, this patch does eliminate the problem.  Without it, no write can
complete, and with it I have seen many writes complete without any
trouble.

Thank you for looking into this.  I will look more closely at this
patch tomorrow.

Comment 13 Rafael J. Wysocki 2007-12-07 17:42:43 UTC
Ping?
Comment 14 Anonymous Emailer 2007-12-07 17:56:28 UTC
Reply-To: jnelson@jamponi.net

I can check to see if the patch fixes things tomorrow.


Comment 15 Anonymous Emailer 2007-12-08 15:00:07 UTC
Reply-To: jnelson@jamponi.net

I can confirm that 2.6.24rc4 with the (second) patch works fine.
Comment 16 Anonymous Emailer 2007-12-08 19:50:25 UTC
Reply-To: akpm@linux-foundation.org

On Sat, 8 Dec 2007 16:59:30 -0600 "Jon Nelson" <jnelson@jamponi.net> wrote:

> I can confirm that 2.6.24rc4 with the (second) patch works fine.

OK, thanks.

We haven't heard back from Ed yet.   I'll sit on this for a few more days.


From: Andrew Morton <akpm@linux-foundation.org>

Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/block/aoe/aoeblk.c |   26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff -puN drivers/block/aoe/aoeblk.c~aoe-properly-initialise-the-request_queues-backing_dev_info drivers/block/aoe/aoeblk.c
--- a/drivers/block/aoe/aoeblk.c~aoe-properly-initialise-the-request_queues-backing_dev_info
+++ a/drivers/block/aoe/aoeblk.c
@@ -6,6 +6,7 @@
 
 #include <linux/hdreg.h>
 #include <linux/blkdev.h>
+#include <linux/backing-dev.h>
 #include <linux/fs.h>
 #include <linux/ioctl.h>
 #include <linux/genhd.h>
@@ -210,25 +211,20 @@ aoeblk_gdalloc(void *vp)
 	if (gd == NULL) {
 		printk(KERN_ERR "aoe: cannot allocate disk structure for %ld.%ld\n",
 			d->aoemajor, d->aoeminor);
-		spin_lock_irqsave(&d->lock, flags);
-		d->flags &= ~DEVFL_GDALLOC;
-		spin_unlock_irqrestore(&d->lock, flags);
-		return;
+		goto err;
 	}
 
 	d->bufpool = mempool_create_slab_pool(MIN_BUFS, buf_pool_cache);
 	if (d->bufpool == NULL) {
 		printk(KERN_ERR "aoe: cannot allocate bufpool for %ld.%ld\n",
 			d->aoemajor, d->aoeminor);
-		put_disk(gd);
-		spin_lock_irqsave(&d->lock, flags);
-		d->flags &= ~DEVFL_GDALLOC;
-		spin_unlock_irqrestore(&d->lock, flags);
-		return;
+		goto err_disk;
 	}
 
-	spin_lock_irqsave(&d->lock, flags);
 	blk_queue_make_request(&d->blkq, aoeblk_make_request);
+	if (bdi_init(&d->blkq.backing_dev_info))
+		goto err_mempool;
+	spin_lock_irqsave(&d->lock, flags);
 	gd->major = AOE_MAJOR;
 	gd->first_minor = d->sysminor * AOE_PARTITIONS;
 	gd->fops = &aoe_bdops;
@@ -246,6 +242,16 @@ aoeblk_gdalloc(void *vp)
 
 	add_disk(gd);
 	aoedisk_add_sysfs(d);
+	return;
+
+err_mempool:
+	mempool_destroy(d->bufpool);
+err_disk:
+	put_disk(gd);
+err:
+	spin_lock_irqsave(&d->lock, flags);
+	d->flags &= ~DEVFL_GDALLOC;
+	spin_unlock_irqrestore(&d->lock, flags);
 }
 
 void
_

Comment 17 Anonymous Emailer 2007-12-10 07:25:28 UTC
Reply-To: ecashin@coraid.com

On Sat, Dec 08, 2007 at 07:50:15PM -0800, Andrew Morton wrote:
> On Sat, 8 Dec 2007 16:59:30 -0600 "Jon Nelson" <jnelson@jamponi.net> wrote:
> 
> > I can confirm that 2.6.24rc4 with the (second) patch works fine.
> 
> OK, thanks.
> 
> We haven't heard back from Ed yet.   I'll sit on this for a few more days.

Sorry, yes, the second patch works well for me in testing.  I had some
initial concerns that were unfounded.

Comment 18 Rafael J. Wysocki 2007-12-17 15:23:28 UTC
Fixed by:

commit 43cbe2cbdd5320f1ac785c6f016923609831effe
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon Dec 10 15:49:13 2007 -0800

    aoe: properly initialise the request_queue's backing_dev_info

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=43cbe2cbdd5320f1ac785c6f016923609831effe

Note You need to log in before you can comment on or make changes to this bug.