Bug 8161

Summary:	Crash after "Allocating PCI resources" -- used to work in 2.6.21-rc2
Product:	Other	Reporter:	Vladimir Brik (no.hope)
Component:	Other	Assignee:	other_other
Status:	CLOSED CODE_FIX
Severity:	normal	CC:	andi-bz, bunk
Priority:	P2
Hardware:	i386
OS:	Linux
Kernel Version:	2.6.21-rc3	Subsystem:
Regression:	---	Bisected commit-id:
Attachments:	2.6.21-rc2 dmesg, disas 0xc010676a (original build)

Description Vladimir Brik 2007-03-09 11:42:25 UTC

Most recent kernel where this bug did *NOT* occur: 2.6.21-rc2
Distribution: Gentoo
Hardware Environment: Soekris net4521 embedded board, i486 AMD Elan, 64MB RAM
Software Environment: Diskless boot using NFS
Problem Description: Apparent kernel crash on boot

2.6.21-rc3 crashes on boot:

  Booting 'Diskless Gentoo'

root (nd)
 Filesystem type is tftp, using whole disk
kernel /skr/boot/bzImage ip=dhcp root=/dev/nfs nfsroot=172.16.0.1:/diskless/skr console=ttyS0,19200n81 earlyprintk=ttyS0,19200n81
   [Linux-bzImage, setup=0x1400, size=0xf6e70]

Linux version 2.6.21-rc3 (root@aces) (gcc version 4.1.1 (Gentoo 4.1.1)) #1 Fri Mar 9 19:13:30 Local time zone must be set--see zic manua
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start: 0000000000000000 size: 000000000009fc00 end: 000000000009fc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 000000000009fc00 size: 0000000000000400 end: 00000000000a0000 type: 2
copy_e820_map() start: 00000000000f0000 size: 0000000000010000 end: 0000000000100000 type: 2
copy_e820_map() start: 0000000000100000 size: 0000000003f00000 end: 0000000004000000 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 00000000fff00000 size: 0000000000100000 end: 0000000100000000 type: 2
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000004000000 (usable)
 BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
64MB LOWMEM available.
Zone PFN ranges:
  DMA             0 ->     4096
  Normal       4096 ->    16384
early_node_map[1] active PFN ranges
    0:        0 ->    16384
DMI not present or invalid.
Allocating PCI resources starting at 10000000 (gap: 04000000:fbf00000)
Int 6: CR2 00000000  err 00000000  EIP c010676a  CS c02e0060  flags 00010046
Stack: 00000000 c02a93a0 c02cdbd0 ffffffff c02daea9 c02dae40 c02c8865 c024a000


lspci:
00:00.0 Host bridge: Advanced Micro Devices [AMD] ELanSC520 Microcontroller
00:11.0 CardBus bridge: Texas Instruments PCI1420
00:11.1 CardBus bridge: Texas Instruments PCI1420
00:12.0 Ethernet controller: National Semiconductor Corporation DP83815 (MacPhyter) Ethernet Controller
00:13.0 Ethernet controller: National Semiconductor Corporation DP83815 (MacPhyter) Ethernet Controller


The same .config works fine with 2.6.21-rc2.

Comment 1 Anonymous Emailer 2007-03-09 22:07:23 UTC

Reply-To: akpm@linux-foundation.org

> On Fri, 9 Mar 2007 11:42:32 -0800 bugme-daemon@bugzilla.kernel.org wrote:
> Allocating PCI resources starting at 10000000 (gap: 04000000:fbf00000)
> Int 6: CR2 00000000  err 00000000  EIP c010676a  CS c02e0060  flags 00010046
> Stack: 00000000 c02a93a0 c02cdbd0 ffffffff c02daea9 c02dae40 c02c8865 c024a000
> 
> The same .config works fine with 2.6.21-rc2.

int6?  How weird.  Maybe your hardware has always been trying to do that,
but we made some change which enables interrupts too early, and/or we're
doing something in a different order.


Can you please send the 2.6.21-rc2 dmesg?

Adrian, another regression to track please.

Comment 2 Anonymous Emailer 2007-03-10 04:07:04 UTC

Reply-To: ak@muc.de

> > Allocating PCI resources starting at 10000000 (gap: 04000000:fbf00000)
> > Int 6: CR2 00000000  err 00000000  EIP c010676a  CS c02e0060  flags 00010046
> > Stack: 00000000 c02a93a0 c02cdbd0 ffffffff c02daea9 c02dae40 c02c8865 c024a000
> > The same .config works fine with 2.6.21-rc2.
> 
> int6?  How weird.  Maybe your hardware has always been trying to do that,
> but we made some change which enables interrupts too early, and/or we're
> doing something in a different order.

External hardware cannot cause such a low-numbered exception on x86.
Int 6 is the invalid-opcode exception (#UD). Probably a BUG() (which executes ud2) or a stray function pointer of some sort.

gdb vmlinux
disas 0xc010676a

would be good.

-Andi

Comment 3 Vladimir Brik 2007-03-10 10:08:09 UTC

Created attachment 10682
2.6.21-rc2 dmesg

dmesg from working 2.6.21-rc2

Comment 4 Vladimir Brik 2007-03-10 10:13:50 UTC

Created attachment 10683
disas 0xc010676a (original build)

Output of disas 0xc010676a on the vmlinux from the original post

Comment 5 Vladimir Brik 2007-03-10 10:18:59 UTC

I then enabled CONFIG_DEBUG_KERNEL, CONFIG_FRAME_POINTER and CONFIG_DEBUG_INFO and
recompiled. The EIP changed, and gdb's disas of the new address gave me this:
0xc0107787 <sched_clock+0>:     push   %ebp
0xc0107788 <sched_clock+1>:     mov    %esp,%ebp
0xc010778a <sched_clock+3>:     push   %esi
0xc010778b <sched_clock+4>:     push   %ebx
0xc010778c <sched_clock+5>:     cmpl   $0x0,0xc02fbca8
0xc0107793 <sched_clock+12>:    je     0xc01077ba <sched_clock+51>
0xc0107795 <sched_clock+14>:    imul   $0x989680,0xc02b9804,%ecx
0xc010779f <sched_clock+24>:    mov    $0x989680,%eax
0xc01077a4 <sched_clock+29>:    mull   0xc02b9800
0xc01077aa <sched_clock+35>:    lea    (%ecx,%edx,1),%edx
0xc01077ad <sched_clock+38>:    add    $0xd964b800,%eax
0xc01077b2 <sched_clock+43>:    adc    $0xff6769c5,%edx
0xc01077b8 <sched_clock+49>:    jmp    0xc01077d3 <sched_clock+76>
0xc01077ba <sched_clock+51>:    rdtsc
0xc01077bc <sched_clock+53>:    mov    0xc02d4334,%ecx
0xc01077c2 <sched_clock+59>:    mov    %edx,%esi
0xc01077c4 <sched_clock+61>:    imul   %ecx,%esi
0xc01077c7 <sched_clock+64>:    mul    %ecx
0xc01077c9 <sched_clock+66>:    lea    (%esi,%edx,1),%edx
0xc01077cc <sched_clock+69>:    shrd   $0xa,%edx,%eax
0xc01077d0 <sched_clock+73>:    shr    $0xa,%edx
0xc01077d3 <sched_clock+76>:    pop    %ebx
0xc01077d4 <sched_clock+77>:    pop    %esi
0xc01077d5 <sched_clock+78>:    pop    %ebp
0xc01077d6 <sched_clock+79>:    ret

Then I disabled CONFIG_DEBUG_KERNEL and got this:
Allocating PCI resources starting at 10000000 (gap: 04000000:fbf00000)
Int 6: CR2 00000000  err 00000000  EIP c01075a2  CS c02f0060  flags 00010046
Stack: 00000000 c02ae3a0 c02da61d ffffffff c02e7aa9 c02e7a40 c02d286a c024d000

(gdb) disassemble 0xc01075a2
No function contains specified address.

Comment 6 Andi Kleen 2007-03-13 08:12:45 UTC

The kernel tried to use RDTSC on your CPU, which probably doesn't support it.
This should already be fixed in my patch queue.

Cannot close the bug because I don't own it.

Comment 7 Diego Calleja 2007-03-13 08:34:29 UTC

Andi, you should be able to "accept" it and then close it.

Comment 8 Vladimir Brik 2007-03-13 08:37:13 UTC

RDTSC is indeed unsupported by the Elan.

Comment 9 Adrian Bunk 2007-03-18 13:22:50 UTC

Fixed in Linus' tree (and will therefore be fixed in 2.6.21-rc5).