Bug 5758 - PANIC: early exception rip 10 error ffffffff805e1e9d cr2 0 on boot with (multi) dual core Opteron
Status: CLOSED CODE_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: i386 Linux
Importance: P2 high
Assignee: Andi Kleen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-19 03:02 UTC by Mark Williamson
Modified: 2005-12-20 11:08 UTC (History)

See Also:
Kernel Version: 2.6.15-rc6
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Config file for the referenced kernel (24.29 KB, application/octet-stream)
2005-12-19 03:03 UTC, Mark Williamson
System map file for the referenced kernel (922.32 KB, text/plain)
2005-12-19 03:06 UTC, Mark Williamson
Check for bad SRATs not covering all memory. (1.83 KB, patch)
2005-12-20 05:52 UTC, Andi Kleen

Description Mark Williamson 2005-12-19 03:02:04 UTC
Most recent kernel where this bug did not occur:
2.6.14

Distribution:
RHEL AS4/U2

Hardware Environment:
8CPU Dual Core Opteron; http://www.iwill.net/product_2.asp?p_id=90&sp=Y

Software Environment:
RHEL AS4/U2, but a vanilla 2.6.15-rc6 kernel from kernel.org

Problem Description:

Captured via serial with earlyprintk:

Bootdata ok (command line is ro root=LABEL=/ selinux=0 earlyprintk=ttyS0,57600
console=tty0)
Linux version 2.6.15-rc6 (root@f02) (gcc version 3.4.4 20050721 (Red Hat
3.4.4-2)) #2 SMP Mon Dec 19 10:44:53 GMT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000c0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
 BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
 BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000840000000 (usable)
kernel direct mapping tables upto ffff810840000000 @ 8000-2a000
SRAT: PXM 0 -> APIC 16 -> Node 0
SRAT: PXM 0 -> APIC 17 -> Node 0
SRAT: PXM 1 -> APIC 18 -> Node 1
SRAT: PXM 1 -> APIC 19 -> Node 1
SRAT: PXM 2 -> APIC 20 -> Node 2
SRAT: PXM 2 -> APIC 21 -> Node 2
SRAT: PXM 3 -> APIC 22 -> Node 3
SRAT: PXM 3 -> APIC 23 -> Node 3
SRAT: PXM 4 -> APIC 24 -> Node 4
SRAT: PXM 4 -> APIC 25 -> Node 4
SRAT: PXM 5 -> APIC 26 -> Node 5
SRAT: PXM 5 -> APIC 27 -> Node 5
SRAT: PXM 6 -> APIC 28 -> Node 6
SRAT: PXM 6 -> APIC 29 -> Node 6
SRAT: PXM 7 -> APIC 30 -> Node 7
SRAT: PXM 7 -> APIC 31 -> Node 7
SRAT: Node 0 PXM 0 0-0
SRAT: Node 1 PXM 1 0-0
SRAT: Node 2 PXM 2 0-0
SRAT: Node 3 PXM 3 0-0
SRAT: Node 4 PXM 4 0-0
SRAT: Node 5 PXM 5 0-0
SRAT: Node 6 PXM 6 0-0
SRAT: Node 7 PXM 7 0-0
SRAT: Node 0 PXM 0 0-9fc00
Bootmem setup node 0 0000000000000000-000000000009fc00
PANIC: early exception rip 10 error ffffffff805e1e9d cr2 0
PANIC: early exception rip ffffffff8011910c error 0 cr2 ffffffffff5fd023



Steps to reproduce:
Boot kernel :)
Comment 1 Mark Williamson 2005-12-19 03:03:58 UTC
Created attachment 6856
Config file for the referenced kernel
Comment 2 Mark Williamson 2005-12-19 03:06:50 UTC
Created attachment 6857
System map file for the referenced kernel
Comment 3 Andi Kleen 2005-12-19 05:06:36 UTC
Your SRAT table is broken. It reports all nodes as having zero length. Update
the BIOS. But it should work with numa=noacpi, right?

I will add a test for that particular breakage.
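
The zero-length ranges are visible in the "SRAT: Node N PXM P start-end" lines of the boot log in the description. As a rough illustration only, the following Python sketch parses lines of that shape and counts the empty ranges. The function name and the regular expression are assumptions made for this example; they are not taken from the kernel.

```python
import re

# Matches memory-affinity lines such as "SRAT: Node 0 PXM 0 0-9fc00"
# as they appear in the boot log of this report (addresses are hex).
NODE_LINE = re.compile(r"SRAT: Node (\d+) PXM (\d+) ([0-9a-f]+)-([0-9a-f]+)$")

def parse_srat_ranges(log_text):
    """Hypothetical helper: return (node, pxm, start, end) tuples."""
    ranges = []
    for line in log_text.splitlines():
        m = NODE_LINE.search(line.strip())
        if m:
            node, pxm = int(m.group(1)), int(m.group(2))
            start, end = int(m.group(3), 16), int(m.group(4), 16)
            ranges.append((node, pxm, start, end))
    return ranges

# Lines copied from the boot log in the description.
LOG = """\
SRAT: PXM 7 -> APIC 31 -> Node 7
SRAT: Node 0 PXM 0 0-0
SRAT: Node 1 PXM 1 0-0
SRAT: Node 2 PXM 2 0-0
SRAT: Node 3 PXM 3 0-0
SRAT: Node 4 PXM 4 0-0
SRAT: Node 5 PXM 5 0-0
SRAT: Node 6 PXM 6 0-0
SRAT: Node 7 PXM 7 0-0
SRAT: Node 0 PXM 0 0-9fc00
"""

ranges = parse_srat_ranges(LOG)
empty = [r for r in ranges if r[3] - r[2] == 0]
print(f"{len(empty)} of {len(ranges)} ranges are zero length")
# prints "8 of 9 ranges are zero length"
```

The only non-empty range is the final 0-9fc00 entry for node 0, which is the range the kernel then tries to set up bootmem on just before the panic.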
Comment 4 Mark Williamson 2005-12-19 05:37:27 UTC
Just tried it with numa=noacpi, and yes you are correct, it does work. Thanks
for your guidance on this.


For reference, the BIOS version is 

H8501180 06/28/2005 

It is the latest one here (referred to as "Version: V180"):

http://www.iwill.net/product_2s.asp?p_id=90&tp=BIOS

as of time of bug posting.


N.B. Even though the date on the website says 2005/7/5 for V180, running
strings on the ROM:

http://www.iwill.net/product_imgs/90/H8501V180.zip

[root@f02 tmp]# strings  H8501180.ROM | grep  "/05"
06/28/05

gives the date seen in the bootup screen, hence the date on the website is
misleading/incorrect.
Comment 5 Andi Kleen 2005-12-20 05:52:33 UTC
Created attachment 6864
Check for bad SRATs not covering all memory.

With this patch it should work without numa=noacpi. It checks whether the
SRAT covers all memory and rejects the SRAT if it does not.

Regarding the BIOS - yes, Iwill is not very good with them. We actually had
a long conversation with them about this, but it went nowhere in the end.
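
The idea of the check can be sketched in a few lines of Python. This is not the attached patch: the function name, the overlap helper, and the 1MB slack are assumptions chosen for illustration. The input values are the usable e820 regions and the SRAT ranges from the boot log in the description.

```python
MB = 1 << 20

# Usable regions from the BIOS-e820 map in the description: (start, end).
E820_USABLE = [
    (0x0000000000000000, 0x000000000009fc00),
    (0x0000000000100000, 0x00000000bfff0000),
    (0x0000000100000000, 0x0000000840000000),
]

# Memory ranges the SRAT reported: eight empty ranges plus 0-9fc00.
SRAT_RANGES = [(0x0, 0x0)] * 8 + [(0x0, 0x9fc00)]

def overlap(a, b):
    """Bytes shared by the half-open ranges a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def srat_covers_memory(srat_ranges, usable, slack=MB):
    """Hypothetical check: accept the SRAT only if its ranges cover all
    usable RAM to within `slack` bytes (an assumed tolerance).
    Assumes the SRAT ranges do not overlap one another."""
    total = sum(end - start for start, end in usable)
    covered = sum(overlap(s, u) for s in srat_ranges for u in usable)
    return total - covered < slack, covered // MB, total // MB

ok, covered_mb, total_mb = srat_covers_memory(SRAT_RANGES, E820_USABLE)
print(f"SRAT covers {covered_mb}MB of {total_mb}MB, usable: {ok}")
# prints "SRAT covers 0MB of 32767MB, usable: False"
```

With this machine's data the sketch arrives at the same 0MB and 32767MB figures that the patched kernel prints in comment 7, so the table would be rejected.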
Comment 6 Andi Kleen 2005-12-20 05:55:03 UTC
Mark, FYI. Looks like the official Iwill 8 way BIOS is still broken,
but now in a different way than the older ones.


 
Comment 7 Mark Williamson 2005-12-20 07:51:54 UTC
Patch applies and works. Thanks again:


...SNIP....
SRAT: PXM 7 -> APIC 31 -> Node 7
SRAT: Node 0 PXM 0 0-0
SRAT: Node 1 PXM 1 0-0
SRAT: Node 2 PXM 2 0-0
SRAT: Node 3 PXM 3 0-0
SRAT: Node 4 PXM 4 0-0
SRAT: Node 5 PXM 5 0-0
SRAT: Node 6 PXM 6 0-0
SRAT: Node 7 PXM 7 0-0
SRAT: Node 0 PXM 0 0-9fc00
SRAT: PXMs only cover 0MB of your 32767MB e820 RAM. Not used.
SRAT: SRAT not used.
Scanning NUMA topology in Northbridge 24
Number of nodes 8
Node 0 MemBase 0000000000000000 Limit 0000000140000000
Node 1 MemBase 0000000140000000 Limit 0000000240000000
Node 2 MemBase 0000000240000000 Limit 0000000340000000
Node 3 MemBase 0000000340000000 Limit 0000000440000000
Node 4 MemBase 0000000440000000 Limit 0000000540000000
Node 5 MemBase 0000000540000000 Limit 0000000640000000
Node 6 MemBase 0000000640000000 Limit 0000000740000000
Node 7 MemBase 0000000740000000 Limit 0000000840000000
Using 30 for the hash shift.
Using node hash shift of 30
Bootmem setup node 0 0000000000000000-0000000140000000
Bootmem setup node 1 0000000140000000-0000000240000000
Bootmem setup node 2 0000000240000000-0000000340000000
Bootmem setup node 3 0000000340000000-0000000440000000
Bootmem setup node 4 0000000440000000-0000000540000000
Bootmem setup node 5 0000000540000000-0000000640000000
Bootmem setup node 6 0000000640000000-0000000740000000
Bootmem setup node 7 0000000740000000-0000000840000000
...SNIP....
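
The "node hash shift of 30" line suggests that physical addresses are mapped to nodes at 1GB granularity, that is, by indexing a table with the address shifted right by 30 bits. The Python sketch below illustrates that lookup with the MemBase/Limit values from the log above. The function names and the dictionary-based table are assumptions for this example and do not reproduce the kernel's implementation.

```python
SHIFT = 30  # from "Using node hash shift of 30" in the log above

# (MemBase, Limit) per node, copied from the Northbridge scan above.
NODES = [
    (0x0000000000000000, 0x0000000140000000),
    (0x0000000140000000, 0x0000000240000000),
    (0x0000000240000000, 0x0000000340000000),
    (0x0000000340000000, 0x0000000440000000),
    (0x0000000440000000, 0x0000000540000000),
    (0x0000000540000000, 0x0000000640000000),
    (0x0000000640000000, 0x0000000740000000),
    (0x0000000740000000, 0x0000000840000000),
]

def build_node_map(nodes, shift):
    """Hypothetical helper: map each (1 << shift)-byte chunk to its node."""
    node_map = {}
    for nid, (base, limit) in enumerate(nodes):
        # The map is only exact if every boundary is chunk-aligned.
        assert base % (1 << shift) == 0 and limit % (1 << shift) == 0
        for chunk in range(base >> shift, limit >> shift):
            node_map[chunk] = nid
    return node_map

def addr_to_node(node_map, addr, shift):
    """Look up the node that owns a physical address."""
    return node_map[addr >> shift]

node_map = build_node_map(NODES, SHIFT)
print(len(node_map))                                # 33 chunks of 1GB
print(addr_to_node(node_map, 0x13fffffff, SHIFT))   # 0
print(addr_to_node(node_map, 0x140000000, SHIFT))   # 1
```

Node 0 spans five 1GB chunks and the other seven nodes span four each, which accounts for the full range up to 0000000840000000.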
Comment 8 Mark Langsdorf 2005-12-20 11:08:03 UTC
I'll bring it up with the relevant teams.
