Bug 11046 - Kernel bug in mm/bootmem.c on Sparc machines
Summary: Kernel bug in mm/bootmem.c on Sparc machines
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 blocking
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-06 13:02 UTC by F. Förster
Modified: 2010-01-19 19:31 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.25.10
Subsystem:
Regression: No
Bisected commit-id:



Description F. Förster 2008-07-06 13:02:28 UTC
Latest working kernel version:
Earliest failing kernel version:
Distribution: 
Hardware Environment: Sparc Blade B100s (UltraSPARC IIe, 650 MHz)
Software Environment: 
Problem Description: Kernel Bug

Steps to reproduce: Create the default kernel config for Sparc. Change the config so that the kernel can do rarpd, and build the network driver (Cassini) into the kernel. Then set the kernel to be loaded over the net via tftpboot. I did this on my other Sparc machines and it worked there.

Here is the BUG:

[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.11.5 2003/11/12 10:40'
[    0.000000] PROMLIB: Root node compatible: 
[    0.000000] Linux version 2.6.25.10 (root@sparc1) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Sun Jul 6 21:05:42 CEST 2008
[    0.000000] console [earlyprom0] enabled
[    0.000000] ARCH: SUN4U
[    0.000000] Ethernet address: 00:03:ba:7a:f3:d6
[    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel... done.
[    0.000000] kernel BUG at mm/bootmem.c:125!
[    0.000000]               \|/ ____ \|/
[    0.000000]               "@'/ .. \`@"
[    0.000000]               /_| \__/ |_\
[    0.000000]                  \__U_/
[    0.000000] swapper(0): Kernel bad sw trap 5 [#1]
[    0.000000] TSTATE: 0000000080f01604 TPC: 00000000007ae2c4 TNPC: 00000000007ae2c8 Y: 00000000    Not tainted
[    0.000000] TPC: <reserve_bootmem_core+0x38/0xd8>
[    0.000000] g0: 0000000000000000 g1: 0000000000000001 g2: 000000000075ac00 g3: 000000000075afa8
[    0.000000] g4: 00000000007563c0 g5: 0000000000000000 g6: 00000000007523c0 g7: 0000000000000000
[    0.000000] o0: 0000000000000032 o1: 00000000007044e0 o2: 000000000000007d o3: fffff80000438000
[    0.000000] o4: 0000000000000000 o5: 000000000075ef80 sp: 00000000007557c1 ret_pc: 00000000007ae2bc
[    0.000000] RPC: <reserve_bootmem_core+0x30/0xd8>
[    0.000000] l0: 000000000075ef50 l1: 0000000000000030 l2: 0000000000000000 l3: 0000000000000010
[    0.000000] l4: 0000000000000000 l5: 0000000000000010 l6: 0000000000000000 l7: 0000000000000000
[    0.000000] i0: 000000000081c5c0 i1: 0000000000438000 i2: 0000000000000000 i3: 0000000000000000
[    0.000000] i4: 0000000000000000 i5: 0000000000438000 i6: 0000000000755881 i7: 00000000007ab7c8
[    0.000000] I7: <paging_init+0xd50/0xee8>
[    0.000000] Caller[00000000007ab7c8]: paging_init+0xd50/0xee8
[    0.000000] Caller[00000000007a3da0]: setup_arch+0x2d8/0x2e0
[    0.000000] Caller[00000000007a0810]: start_kernel+0x7c/0x300
[    0.000000] Caller[00000000006860f0]: auxio_probe+0x0/0xd0
[    0.000000] Caller[0000000000000000]: 0x8
[    0.000000] Instruction DUMP: 11001c11  7ff1ee23  901220e0 <91d02005> 8406401a  b5307033  8200801a  8330700d  80a04003 
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Press Stop-A (L1-A) to return to the boot prom


Any help would be nice. If you need further information, please tell me.
This is my first bug report and I hope I did it right.
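
For readers following the oops above: the word in angle brackets in the Instruction DUMP line, 91d02005, appears to be the instruction at the trap PC. The sketch below decodes it using the SPARC V9 Tcc (trap on condition code) field layout. The struct and function names are hypothetical and are not taken from the kernel. Under that layout the word decodes as an unconditional trap with immediate trap number 5, which is consistent with the "Kernel bad sw trap 5" message printed after BUG().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of a SPARC V9 Tcc instruction (hypothetical helper type). */
struct tcc_fields {
    unsigned cond;      /* bits 28..25; 8 means "always" (the `ta` form) */
    unsigned rs1;       /* bits 18..14; 0 is %g0 */
    unsigned immediate; /* bit 13; 1 when the trap number is an immediate */
    unsigned sw_trap;   /* bits 6..0; meaningful only when immediate == 1 */
};

/* Returns 1 and fills *out if insn is a Tcc instruction, 0 otherwise. */
int decode_tcc(uint32_t insn, struct tcc_fields *out)
{
    assert(out != NULL);

    /* Tcc lives in format 3: op (bits 31..30) == 2, op3 (bits 24..19) == 0x3a. */
    if ((insn >> 30) != 2u || ((insn >> 19) & 0x3fu) != 0x3au)
        return 0;

    out->cond      = (insn >> 25) & 0xfu;
    out->rs1       = (insn >> 14) & 0x1fu;
    out->immediate = (insn >> 13) & 0x1u;
    out->sw_trap   = insn & 0x7fu;
    return 1;
}
```

Calling decode_tcc(0x91d02005, &f) fills f with cond 8, rs1 0, immediate 1 and sw_trap 5, that is, `ta 5`. The next word in the dump, 8406401a, is not a Tcc instruction and is rejected.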
Comment 1 Anonymous Emailer 2008-07-06 13:21:21 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sun,  6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11046
> 
>            Summary: Kernel bug in mm/bootmem.c on Sparc machines
>            Product: Memory Management
>            Version: 2.5
>      KernelVersion: 2.6.25.10
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: blocking
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@osdl.org
>         ReportedBy: lomp0101@gmx.net
> 
> 
> Latest working kernel version:
> Earliest failing kernel version:
> Distribution: 
> Hardware Environment: Sparc Blade B100s (UltraSPARC IIe, 650 MHz)
> Software Environment: 
> Problem Description: Kernel Bug
> 
> Steps to reproduce: Create the default kernel config for Sparc. Change the
> config so that the kernel can do rarpd, and build the network driver
> (Cassini) into the kernel. Then set the kernel to be loaded over the net via
> tftpboot. I did this on my other Sparc machines and it worked there.
> 
> Here is the BUG:
> 
> [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.11.5 2003/11/12 10:40'
> [    0.000000] PROMLIB: Root node compatible: 
> [    0.000000] Linux version 2.6.25.10 (root@sparc1) (gcc version 4.1.2
> 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Sun Jul 6 21:05:42 CEST 2008
> [    0.000000] console [earlyprom0] enabled
> [    0.000000] ARCH: SUN4U
> [    0.000000] Ethernet address: 00:03:ba:7a:f3:d6
> [    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
> [    0.000000] Remapping the kernel... done.
> [    0.000000] kernel BUG at mm/bootmem.c:125!
> [    0.000000]               \|/ ____ \|/
> [    0.000000]               "@'/ .. \`@"
> [    0.000000]               /_| \__/ |_\
> [    0.000000]                  \__U_/
> [    0.000000] swapper(0): Kernel bad sw trap 5 [#1]
> [    0.000000] TSTATE: 0000000080f01604 TPC: 00000000007ae2c4 TNPC:
> 00000000007ae2c8 Y: 00000000    Not tainted
> [    0.000000] TPC: <reserve_bootmem_core+0x38/0xd8>
> [    0.000000] g0: 0000000000000000 g1: 0000000000000001 g2: 000000000075ac00
> g3: 000000000075afa8
> [    0.000000] g4: 00000000007563c0 g5: 0000000000000000 g6: 00000000007523c0
> g7: 0000000000000000
> [    0.000000] o0: 0000000000000032 o1: 00000000007044e0 o2: 000000000000007d
> o3: fffff80000438000
> [    0.000000] o4: 0000000000000000 o5: 000000000075ef80 sp: 00000000007557c1
> ret_pc: 00000000007ae2bc
> [    0.000000] RPC: <reserve_bootmem_core+0x30/0xd8>
> [    0.000000] l0: 000000000075ef50 l1: 0000000000000030 l2: 0000000000000000
> l3: 0000000000000010
> [    0.000000] l4: 0000000000000000 l5: 0000000000000010 l6: 0000000000000000
> l7: 0000000000000000
> [    0.000000] i0: 000000000081c5c0 i1: 0000000000438000 i2: 0000000000000000
> i3: 0000000000000000
> [    0.000000] i4: 0000000000000000 i5: 0000000000438000 i6: 0000000000755881
> i7: 00000000007ab7c8
> [    0.000000] I7: <paging_init+0xd50/0xee8>
> [    0.000000] Caller[00000000007ab7c8]: paging_init+0xd50/0xee8
> [    0.000000] Caller[00000000007a3da0]: setup_arch+0x2d8/0x2e0
> [    0.000000] Caller[00000000007a0810]: start_kernel+0x7c/0x300
> [    0.000000] Caller[00000000006860f0]: auxio_probe+0x0/0xd0
> [    0.000000] Caller[0000000000000000]: 0x8
> [    0.000000] Instruction DUMP: 11001c11  7ff1ee23  901220e0 <91d02005>
> 8406401a  b5307033  8200801a  8330700d  80a04003 
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.000000] Press Stop-A (L1-A) to return to the boot prom
> 
> 
> Any help would be nice. If you need further information, please tell me.
> This is my first bug report and I hope I did it right.
> 
Comment 2 David S. Miller 2008-07-23 20:25:45 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Sun, 6 Jul 2008 13:20:49 -0700

> On Sun,  6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=11046
 ...
> > Here is the BUG:
> > 
> > [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.11.5 2003/11/12 10:40'
> > [    0.000000] PROMLIB: Root node compatible: 
> > [    0.000000] Linux version 2.6.25.10 (root@sparc1) (gcc version 4.1.2
> > 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Sun Jul 6 21:05:42 CEST
> 2008
> > [    0.000000] console [earlyprom0] enabled
> > [    0.000000] ARCH: SUN4U
> > [    0.000000] Ethernet address: 00:03:ba:7a:f3:d6
> > [    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
> > [    0.000000] Remapping the kernel... done.
> > [    0.000000] kernel BUG at mm/bootmem.c:125!

This can only happen if you attach a zero-sized initrd to the kernel.

I see platforms like x86 sometimes have explicit checks for a zero
size to guard reserve_bootmem() and similar calls, but if that's what
callers are all going to do doesn't it make better sense for
reserve_bootmem_core() to just return instead of BUG on a zero size
argument?
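
The trade-off raised above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the page map, MODEL_PAGES, model_reserve() and model_reserve_initrd() are hypothetical names, and the model tracks whole pages rather than the byte addresses the real bootmem code takes. It shows the callee treating a zero-length request as a no-op, so a caller in the style of an initrd setup path needs no size check of its own.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGES 64  /* hypothetical size of the modelled memory node */

static unsigned char page_reserved[MODEL_PAGES];

/*
 * Tolerant reserve: a zero-length request returns immediately instead of
 * being treated as a fatal error. Returns the number of pages that were
 * newly marked as reserved.
 */
size_t model_reserve(size_t first_page, size_t npages)
{
    size_t i, marked = 0;

    if (npages == 0)
        return 0;  /* nothing to reserve, nothing to crash on */

    /* The model simply asserts that the range fits in the map. */
    assert(first_page < MODEL_PAGES && npages <= MODEL_PAGES - first_page);

    for (i = first_page; i < first_page + npages; i++) {
        if (!page_reserved[i]) {
            page_reserved[i] = 1;
            marked++;
        }
    }
    return marked;
}

/*
 * Caller in the style of an initrd setup path: it forwards whatever size
 * it was given, including zero, without an "if (size)" guard.
 */
size_t model_reserve_initrd(size_t first_page, size_t initrd_pages)
{
    return model_reserve(first_page, initrd_pages);
}
```

With this shape, model_reserve_initrd(4, 0) leaves the map untouched, while model_reserve_initrd(4, 3) marks pages 4 to 6. The alternative is to repeat the zero check at every call site where an empty range can legitimately occur.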
Comment 3 Anonymous Emailer 2008-07-23 20:38:42 UTC
Reply-To: akpm@linux-foundation.org

On Wed, 23 Jul 2008 20:25:33 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Sun, 6 Jul 2008 13:20:49 -0700
> 
> > On Sun,  6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11046
>  ...
> > > Here is the BUG:
> > > 
> > > [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.11.5 2003/11/12 10:40'
> > > [    0.000000] PROMLIB: Root node compatible: 
> > > [    0.000000] Linux version 2.6.25.10 (root@sparc1) (gcc version 4.1.2
> > > 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Sun Jul 6 21:05:42 CEST
> 2008
> > > [    0.000000] console [earlyprom0] enabled
> > > [    0.000000] ARCH: SUN4U
> > > [    0.000000] Ethernet address: 00:03:ba:7a:f3:d6
> > > [    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
> > > [    0.000000] Remapping the kernel... done.
> > > [    0.000000] kernel BUG at mm/bootmem.c:125!
> 
> This can only happen if you attach a zero-sized initrd to the kernel.
> 
> I see platforms like x86 sometimes have explicit checks for a zero
> size to guard reserve_bootmem() and similar calls, but if that's what
> callers are all going to do doesn't it make better sense for
> reserve_bootmem_core() to just return instead of BUG on a zero size
> argument?

Sounds logical.

Johannes just rewrote the bootmem code, but from a quick read it
appears that this behaviour has been retained.

So if we're going to change it in 2.6.26, we'll need a separate patch.
Comment 4 David S. Miller 2008-07-23 20:42:49 UTC
From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 23 Jul 2008 20:38:36 -0700

> So if we're going to change it in 2.6.26, we'll need a separate patch.

Here is the 2.6.26 version:

bootmem: Allow zero length reserve and free.

It's either this or all the call sites explicitly check
when such a case is possible and sometimes expected.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 8d9f60e..e540f7a 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -153,7 +153,8 @@ static void __init reserve_bootmem_core(bootmem_data_t *bdata,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out of range */
 	if (addr + size < bdata->node_boot_start ||
@@ -187,7 +188,8 @@ static void __init free_bootmem_core(bootmem_data_t *bdata, unsigned long addr,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out range */
 	if (addr + size < bdata->node_boot_start ||
Comment 5 Johannes Weiner 2008-07-24 05:10:25 UTC
Hi,

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 23 Jul 2008 20:25:33 -0700 (PDT) David Miller <davem@davemloft.net>
> wrote:
>
>> From: Andrew Morton <akpm@linux-foundation.org>
>> Date: Sun, 6 Jul 2008 13:20:49 -0700
>> 
>> > On Sun,  6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
>> wrote:
>> > 
>> > > http://bugzilla.kernel.org/show_bug.cgi?id=11046
>>  ...
>> > > Here is the BUG:
>> > > 
>> > > [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.11.5 2003/11/12 10:40'
>> > > [    0.000000] PROMLIB: Root node compatible: 
>> > > [    0.000000] Linux version 2.6.25.10 (root@sparc1) (gcc version 4.1.2
>> > > 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Sun Jul 6 21:05:42 CEST
>> 2008
>> > > [    0.000000] console [earlyprom0] enabled
>> > > [    0.000000] ARCH: SUN4U
>> > > [    0.000000] Ethernet address: 00:03:ba:7a:f3:d6
>> > > [    0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
>> > > [    0.000000] Remapping the kernel... done.
>> > > [    0.000000] kernel BUG at mm/bootmem.c:125!
>> 
>> This can only happen if you attach a zero-sized initrd to the kernel.
>> 
>> I see platforms like x86 sometimes have explicit checks for a zero
>> size to guard reserve_bootmem() and similar calls, but if that's what
>> callers are all going to do doesn't it make better sense for
>> reserve_bootmem_core() to just return instead of BUG on a zero size
>> argument?
>
> Sounds logical.
>
> Johannes just rewrote the bootmem code, but from a quick read it
> appears that this behaviour has been retained.

In the new version, zero sized ranges are okay for reservation and
freeing.  It still bugs on allocation, though.

> So if we're going to change it in 2.6.26, we'll need a separate patch.

	Hannes
Comment 6 Anonymous Emailer 2008-07-24 11:37:54 UTC
Reply-To: akpm@linux-foundation.org

On Thu, 24 Jul 2008 14:09:38 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:

> Hi,
> 
> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > On Wed, 23 Jul 2008 20:25:33 -0700 (PDT) David Miller <davem@davemloft.net>
> wrote:
> >
> >> From: Andrew Morton <akpm@linux-foundation.org>
> >> Date: Sun, 6 Jul 2008 13:20:49 -0700
> >> 
> >> > On Sun,  6 Jul 2008 13:02:28 -0700 (PDT)
> bugme-daemon@bugzilla.kernel.org wrote:
> >> > 
> >> > > http://bugzilla.kernel.org/show_bug.cgi?id=11046
> >>  ...
> >> > > Here is the BUG:
> >> > > 
> >> > > [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.11.5 2003/11/12
> 10:40'
> >> > > [    0.000000] PROMLIB: Root node compatible: 
> >> > > [    0.000000] Linux version 2.6.25.10 (root@sparc1) (gcc version
> 4.1.2
> >> > > 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Sun Jul 6 21:05:42
> CEST 2008
> >> > > [    0.000000] console [earlyprom0] enabled
> >> > > [    0.000000] ARCH: SUN4U
> >> > > [    0.000000] Ethernet address: 00:03:ba:7a:f3:d6
> >> > > [    0.000000] Kernel: Using 2 locked TLB entries for main kernel
> image.
> >> > > [    0.000000] Remapping the kernel... done.
> >> > > [    0.000000] kernel BUG at mm/bootmem.c:125!
> >> 
> >> This can only happen if you attach a zero-sized initrd to the kernel.
> >> 
> >> I see platforms like x86 sometimes have explicit checks for a zero
> >> size to guard reserve_bootmem() and similar calls, but if that's what
> >> callers are all going to do doesn't it make better sense for
> >> reserve_bootmem_core() to just return instead of BUG on a zero size
> >> argument?
> >
> > Sounds logical.
> >
> > Johannes just rewrote the bootmem code, but from a quick read it
> > appears that this behaviour has been retained.
> 
> In the new version, zero sized ranges are okay for reservation and
> freeing.  It still bugs on allocation, though.
> 

Interesting.  So from Dave's patch (which changes only
reserve_bootmem_core() and free_bootmem_core()), it sounds like we
have already fixed 2.6.27?

In which case David's 2.6.26 patch is a "minimal backport".
Comment 7 Johannes Weiner 2008-07-24 14:33:04 UTC
Hi,

David Miller <davem@davemloft.net> writes:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 23 Jul 2008 20:38:36 -0700
>
>> So if we're going to change it in 2.6.26, we'll need a separate patch.
>
> Here is the 2.6.26 version:
>
> bootmem: Allow zero length reserve and free.
>
> It's either this or all the call sites explicitly check
> when such a case is possible and sometimes expected.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index 8d9f60e..e540f7a 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -153,7 +153,8 @@ static void __init reserve_bootmem_core(bootmem_data_t
> *bdata,
>       unsigned long sidx, eidx;
>       unsigned long i;
>  
> -     BUG_ON(!size);
> +     if (!size)
> +             return;
>  
>       /* out of range */
>       if (addr + size < bdata->node_boot_start ||
> @@ -187,7 +188,8 @@ static void __init free_bootmem_core(bootmem_data_t
> *bdata, unsigned long addr,
>       unsigned long sidx, eidx;
>       unsigned long i;
>  
> -     BUG_ON(!size);
> +     if (!size)
> +             return;
>  
>       /* out range */
>       if (addr + size < bdata->node_boot_start ||

Sorry, Dave, I missed that before: there is still the BUG_ON() in
can_reserve_bootmem_core(), which should just return 0 instead.

Other than that, yes, Andrew, this introduces the same behaviour as the
bootmem rewrite.

	Hannes
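
A sketch of why the check function should report success for an empty range. It assumes a convention where the check returns 0 when the range may be reserved and a negative errno on conflict. That convention, along with MODEL_EXCLUSIVE (a stand-in for an exclusive-reservation flag) and the function names, is an assumption made for this userspace model and is not copied from mm/bootmem.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGES     64  /* hypothetical size of the modelled node */
#define MODEL_EXCLUSIVE 1   /* hypothetical exclusive-reservation flag */

static unsigned char page_busy[MODEL_PAGES];

/*
 * Check step: 0 if the range may be reserved, -EBUSY if an exclusive
 * request overlaps a page that is already reserved.
 */
int model_can_reserve(size_t first_page, size_t npages, int flags)
{
    size_t i;

    if (npages == 0)
        return 0;  /* an empty range cannot conflict with anything */

    assert(first_page < MODEL_PAGES && npages <= MODEL_PAGES - first_page);

    for (i = first_page; i < first_page + npages; i++)
        if (page_busy[i] && (flags & MODEL_EXCLUSIVE))
            return -EBUSY;
    return 0;
}

/* Check, then commit. A zero-length request succeeds and changes nothing. */
int model_reserve_checked(size_t first_page, size_t npages, int flags)
{
    size_t i;
    int ret = model_can_reserve(first_page, npages, flags);

    if (ret < 0)
        return ret;
    for (i = first_page; i < first_page + npages; i++)
        page_busy[i] = 1;
    return 0;
}
```

If the check step kept a fatal assertion on zero size, the early return in the reserve step would never be reached for callers that go through the check first, which is the gap pointed out above.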
Comment 8 David S. Miller 2008-07-24 14:59:03 UTC
From: Johannes Weiner <hannes@saeurebad.de>
Date: Thu, 24 Jul 2008 23:32:06 +0200

> Sorry, Dave, I missed that before: there is still the BUG_ON() in
> can_reserve_bootmem_core(), which should just return 0 instead.
> 
> Other than that, yes, Andrew, this introduces the same behaviour as the
> bootmem rewrite.

Thanks, here is an updated version of the patch:

bootmem: Allow zero length reserve and free.

It's either this or all the call sites explicitly check
when such a case is possible and sometimes expected.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 8d9f60e..5e3fab8 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -117,7 +117,8 @@ static int __init can_reserve_bootmem_core(bootmem_data_t *bdata,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return 0;
 
 	/* out of range, don't hold other */
 	if (addr + size < bdata->node_boot_start ||
@@ -153,7 +154,8 @@ static void __init reserve_bootmem_core(bootmem_data_t *bdata,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out of range */
 	if (addr + size < bdata->node_boot_start ||
@@ -187,7 +189,8 @@ static void __init free_bootmem_core(bootmem_data_t *bdata, unsigned long addr,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out range */
 	if (addr + size < bdata->node_boot_start ||
