Bug 11046
| Summary: | Kernel bug in mm/bootmem.c on Sparc machines | | |
|---|---|---|---|
| Product: | Memory Management | Reporter: | F. Förster (lomp0101) |
| Component: | Other | Assignee: | Andrew Morton (akpm) |
| Status: | CLOSED PATCH_ALREADY_AVAILABLE | | |
| Severity: | blocking | CC: | alan |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.25.10 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
F. Förster
2008-07-06 13:02:28 UTC
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Sun, 6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11046
>
> Summary: Kernel bug in mm/bootmem.c on Sparc machines
> Product: Memory Management
> Version: 2.5
> KernelVersion: 2.6.25.10
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: blocking
> Priority: P1
> Component: Other
> AssignedTo: akpm@osdl.org
> ReportedBy: lomp0101@gmx.net
>
>
> Latest working kernel version:
> Earliest failing kernel version:
> Distribution:
> Hardware Environment: Sparc Blade B100s (UltraSPARC IIe, 650 MHz)
> Software Environment:
> Problem Description: Kernel Bug
>
> Steps to reproduce: Create the default kernel config for Sparc. Change this
> config so that the kernel can do rarpd, and build the network driver (Cassini)
> into the kernel. Then set the kernel up to be loaded over the net via tftpboot.
> I did this on my other Sparc machines and it worked there.
>
> Here is the BUG:
>
> [ 0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.11.5 2003/11/12 10:40'
> [ 0.000000] PROMLIB: Root node compatible:
> [ 0.000000] Linux version 2.6.25.10 (root@sparc1) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #5 SMP Sun Jul 6 21:05:42 CEST 2008
> [ 0.000000] console [earlyprom0] enabled
> [ 0.000000] ARCH: SUN4U
> [ 0.000000] Ethernet address: 00:03:ba:7a:f3:d6
> [ 0.000000] Kernel: Using 2 locked TLB entries for main kernel image.
> [ 0.000000] Remapping the kernel... done.
> [ 0.000000] kernel BUG at mm/bootmem.c:125!
> [ 0.000000] \|/ ____ \|/
> [ 0.000000] "@'/ .. \`@"
> [ 0.000000] /_| \__/ |_\
> [ 0.000000] \__U_/
> [ 0.000000] swapper(0): Kernel bad sw trap 5 [#1]
> [ 0.000000] TSTATE: 0000000080f01604 TPC: 00000000007ae2c4 TNPC: 00000000007ae2c8 Y: 00000000 Not tainted
> [ 0.000000] TPC: <reserve_bootmem_core+0x38/0xd8>
> [ 0.000000] g0: 0000000000000000 g1: 0000000000000001 g2: 000000000075ac00 g3: 000000000075afa8
> [ 0.000000] g4: 00000000007563c0 g5: 0000000000000000 g6: 00000000007523c0 g7: 0000000000000000
> [ 0.000000] o0: 0000000000000032 o1: 00000000007044e0 o2: 000000000000007d o3: fffff80000438000
> [ 0.000000] o4: 0000000000000000 o5: 000000000075ef80 sp: 00000000007557c1 ret_pc: 00000000007ae2bc
> [ 0.000000] RPC: <reserve_bootmem_core+0x30/0xd8>
> [ 0.000000] l0: 000000000075ef50 l1: 0000000000000030 l2: 0000000000000000 l3: 0000000000000010
> [ 0.000000] l4: 0000000000000000 l5: 0000000000000010 l6: 0000000000000000 l7: 0000000000000000
> [ 0.000000] i0: 000000000081c5c0 i1: 0000000000438000 i2: 0000000000000000 i3: 0000000000000000
> [ 0.000000] i4: 0000000000000000 i5: 0000000000438000 i6: 0000000000755881 i7: 00000000007ab7c8
> [ 0.000000] I7: <paging_init+0xd50/0xee8>
> [ 0.000000] Caller[00000000007ab7c8]: paging_init+0xd50/0xee8
> [ 0.000000] Caller[00000000007a3da0]: setup_arch+0x2d8/0x2e0
> [ 0.000000] Caller[00000000007a0810]: start_kernel+0x7c/0x300
> [ 0.000000] Caller[00000000006860f0]: auxio_probe+0x0/0xd0
> [ 0.000000] Caller[0000000000000000]: 0x8
> [ 0.000000] Instruction DUMP: 11001c11 7ff1ee23 901220e0 <91d02005> 8406401a b5307033 8200801a 8330700d 80a04003
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] Press Stop-A (L1-A) to return to the boot prom
>
>
> Any help would be nice. If you need further information please tell me.
> This is my first bug report and I hope I did it right.
From: Andrew Morton <akpm@linux-foundation.org>
Date: Sun, 6 Jul 2008 13:20:49 -0700

> On Sun, 6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=11046
> ...
> > Here is the BUG:
> >
> > [ 0.000000] kernel BUG at mm/bootmem.c:125!

This can only happen if you attach a zero-sized initrd to the kernel.

I see platforms like x86 sometimes have explicit checks for a zero
size to guard reserve_bootmem() and similar calls, but if that's what
callers are all going to do doesn't it make better sense for
reserve_bootmem_core() to just return instead of BUG on a zero size
argument?

Reply-To: akpm@linux-foundation.org

On Wed, 23 Jul 2008 20:25:33 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Sun, 6 Jul 2008 13:20:49 -0700
>
> > On Sun, 6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
> >
> > > http://bugzilla.kernel.org/show_bug.cgi?id=11046
> ...
> > > Here is the BUG:
> > >
> > > [ 0.000000] kernel BUG at mm/bootmem.c:125!
>
> This can only happen if you attach a zero-sized initrd to the kernel.
>
> I see platforms like x86 sometimes have explicit checks for a zero
> size to guard reserve_bootmem() and similar calls, but if that's what
> callers are all going to do doesn't it make better sense for
> reserve_bootmem_core() to just return instead of BUG on a zero size
> argument?

Sounds logical.

Johannes just rewrote the bootmem code, but from a quick read it
appears that this behaviour has been retained.

So if we're going to change it in 2.6.26, we'll need a separate patch.

From: Andrew Morton <akpm@linux-foundation.org>
Date: Wed, 23 Jul 2008 20:38:36 -0700

> So if we're going to change it in 2.6.26, we'll need a separate patch.

Here is the 2.6.26 version:

bootmem: Allow zero length reserve and free.

It's either this or all the call sites explicitly check
when such a case is possible and sometimes expected.

Signed-off-by: David S. Miller <davem@davemloft.net>

```diff
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 8d9f60e..e540f7a 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -153,7 +153,8 @@ static void __init reserve_bootmem_core(bootmem_data_t *bdata,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out of range */
 	if (addr + size < bdata->node_boot_start ||
@@ -187,7 +188,8 @@ static void __init free_bootmem_core(bootmem_data_t *bdata, unsigned long addr,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out range */
 	if (addr + size < bdata->node_boot_start ||
```

Hi,

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 23 Jul 2008 20:25:33 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
>
>> From: Andrew Morton <akpm@linux-foundation.org>
>> Date: Sun, 6 Jul 2008 13:20:49 -0700
>>
>> > On Sun, 6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
>> >
>> > > http://bugzilla.kernel.org/show_bug.cgi?id=11046
>> ...
>> > > Here is the BUG:
>> > >
>> > > [ 0.000000] kernel BUG at mm/bootmem.c:125!
>>
>> This can only happen if you attach a zero-sized initrd to the kernel.
>>
>> I see platforms like x86 sometimes have explicit checks for a zero
>> size to guard reserve_bootmem() and similar calls, but if that's what
>> callers are all going to do doesn't it make better sense for
>> reserve_bootmem_core() to just return instead of BUG on a zero size
>> argument?
>
> Sounds logical.
>
> Johannes just rewrote the bootmem code, but from a quick read it
> appears that this behaviour has been retained.

In the new version, zero sized ranges are okay for reservation and
freeing. It still bugs on allocation, though.

> So if we're going to change it in 2.6.26, we'll need a separate patch.

Hannes

Reply-To: akpm@linux-foundation.org

On Thu, 24 Jul 2008 14:09:38 +0200 Johannes Weiner <hannes@saeurebad.de> wrote:

> Hi,
>
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Wed, 23 Jul 2008 20:25:33 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> >
> >> From: Andrew Morton <akpm@linux-foundation.org>
> >> Date: Sun, 6 Jul 2008 13:20:49 -0700
> >>
> >> > On Sun, 6 Jul 2008 13:02:28 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
> >> >
> >> > > http://bugzilla.kernel.org/show_bug.cgi?id=11046
> >> ...
> >> > > Here is the BUG:
> >> > >
> >> > > [ 0.000000] kernel BUG at mm/bootmem.c:125!
> >>
> >> This can only happen if you attach a zero-sized initrd to the kernel.
> >>
> >> I see platforms like x86 sometimes have explicit checks for a zero
> >> size to guard reserve_bootmem() and similar calls, but if that's what
> >> callers are all going to do doesn't it make better sense for
> >> reserve_bootmem_core() to just return instead of BUG on a zero size
> >> argument?
> >
> > Sounds logical.
> >
> > Johannes just rewrote the bootmem code, but from a quick read it
> > appears that this behaviour has been retained.
>
> In the new version, zero sized ranges are okay for reservation and
> freeing. It still bugs on allocation, though.
>

Interesting. So, judging from Dave's patch (which changes only
reserve_bootmem_core() and free_bootmem_core()), it sounds like 2.6.27
is already fixed? In that case David's 2.6.26 patch is a "minimal
backport".

Hi,

David Miller <davem@davemloft.net> writes:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Wed, 23 Jul 2008 20:38:36 -0700
>
>> So if we're going to change it in 2.6.26, we'll need a separate patch.
>
> Here is the 2.6.26 version:
>
> bootmem: Allow zero length reserve and free.
>
> It's either this or all the call sites explicitly check
> when such a case is possible and sometimes expected.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index 8d9f60e..e540f7a 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -153,7 +153,8 @@ static void __init reserve_bootmem_core(bootmem_data_t *bdata,
>  	unsigned long sidx, eidx;
>  	unsigned long i;
>  
> -	BUG_ON(!size);
> +	if (!size)
> +		return;
>  
>  	/* out of range */
>  	if (addr + size < bdata->node_boot_start ||
> @@ -187,7 +188,8 @@ static void __init free_bootmem_core(bootmem_data_t *bdata, unsigned long addr,
>  	unsigned long sidx, eidx;
>  	unsigned long i;
>  
> -	BUG_ON(!size);
> +	if (!size)
> +		return;
>  
>  	/* out range */
>  	if (addr + size < bdata->node_boot_start ||

Sorry, Dave, I missed that before: there is still the BUG_ON() in
can_reserve_bootmem_core(), which should just return 0 instead.

Other than that, yes, Andrew, this introduces the same behaviour as the
bootmem rewrite.

Hannes

From: Johannes Weiner <hannes@saeurebad.de>
Date: Thu, 24 Jul 2008 23:32:06 +0200

> Sorry, Dave, I missed that before: there is still the BUG_ON() in
> can_reserve_bootmem_core(), which should just return 0 instead.
>
> Other than that, yes, Andrew, this introduces the same behaviour as the
> bootmem rewrite.

Thanks, here is an updated version of the patch:

bootmem: Allow zero length reserve and free.

It's either this or all the call sites explicitly check
when such a case is possible and sometimes expected.

Signed-off-by: David S. Miller <davem@davemloft.net>

```diff
diff --git a/mm/bootmem.c b/mm/bootmem.c
index 8d9f60e..5e3fab8 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -117,7 +117,8 @@ static int __init can_reserve_bootmem_core(bootmem_data_t *bdata,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return 0;
 
 	/* out of range, don't hold other */
 	if (addr + size < bdata->node_boot_start ||
@@ -153,7 +154,8 @@ static void __init reserve_bootmem_core(bootmem_data_t *bdata,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out of range */
 	if (addr + size < bdata->node_boot_start ||
@@ -187,7 +189,8 @@ static void __init free_bootmem_core(bootmem_data_t *bdata, unsigned long addr,
 	unsigned long sidx, eidx;
 	unsigned long i;
 
-	BUG_ON(!size);
+	if (!size)
+		return;
 
 	/* out range */
 	if (addr + size < bdata->node_boot_start ||
```