Most recent kernel where this bug did not occur: 2.6.14
Distribution: RHEL AS4/U2
Hardware Environment: 8CPU Dual Core Opteron; http://www.iwill.net/product_2.asp?p_id=90&sp=Y
Software Environment: RHEL AS4/U2, but a vanilla 2.6.15-rc6 kernel from kernel.org
Problem Description: Captured via serial with earlyprintk:
Bootdata ok (command line is ro root=LABEL=/ selinux=0 earlyprintk=ttyS0,57600 console=tty0)
Linux version 2.6.15-rc6 (root@f02) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #2 SMP Mon Dec 19 10:44:53 GMT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000c0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
 BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
 BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000840000000 (usable)
kernel direct mapping tables upto ffff810840000000 @ 8000-2a000
SRAT: PXM 0 -> APIC 16 -> Node 0
SRAT: PXM 0 -> APIC 17 -> Node 0
SRAT: PXM 1 -> APIC 18 -> Node 1
SRAT: PXM 1 -> APIC 19 -> Node 1
SRAT: PXM 2 -> APIC 20 -> Node 2
SRAT: PXM 2 -> APIC 21 -> Node 2
SRAT: PXM 3 -> APIC 22 -> Node 3
SRAT: PXM 3 -> APIC 23 -> Node 3
SRAT: PXM 4 -> APIC 24 -> Node 4
SRAT: PXM 4 -> APIC 25 -> Node 4
SRAT: PXM 5 -> APIC 26 -> Node 5
SRAT: PXM 5 -> APIC 27 -> Node 5
SRAT: PXM 6 -> APIC 28 -> Node 6
SRAT: PXM 6 -> APIC 29 -> Node 6
SRAT: PXM 7 -> APIC 30 -> Node 7
SRAT: PXM 7 -> APIC 31 -> Node 7
SRAT: Node 0 PXM 0 0-0
SRAT: Node 1 PXM 1 0-0
SRAT: Node 2 PXM 2 0-0
SRAT: Node 3 PXM 3 0-0
SRAT: Node 4 PXM 4 0-0
SRAT: Node 5 PXM 5 0-0
SRAT: Node 6 PXM 6 0-0
SRAT: Node 7 PXM 7 0-0
SRAT: Node 0 PXM 0 0-9fc00
Bootmem setup node 0 0000000000000000-000000000009fc00
PANIC: early exception rip 10 error ffffffff805e1e9d cr2 0
PANIC: early exception rip ffffffff8011910c error 0 cr2 ffffffffff5fd023
Steps to reproduce: Boot kernel :)
Created attachment 6856: Config file for the referenced kernel
Created attachment 6857: System map file for the referenced kernel
Your SRAT table is broken. It reports all nodes as having zero length. Update the BIOS. But it should work with numa=noacpi, right? I will add a test for that particular breakage.
Just tried it with numa=noacpi, and yes, you are correct, it does work. Thanks for your guidance on this.
For reference, the BIOS version is H8501180 06/28/2005. As of the time of bug posting, it is the latest one here (referred to as "Version: V180"): http://www.iwill.net/product_2s.asp?p_id=90&tp=BIOS
N.B.: even though the date on the website says 2005/7/5 for V180, string-dumping the ROM from http://www.iwill.net/product_imgs/90/H8501V180.zip:
[root@f02 tmp]# strings H8501180.ROM | grep "/05"
06/28/05
gives the date seen on the bootup screen, so the date on the website is misleading/incorrect.
Created attachment 6864: Check for bad SRATs not covering all memory.
With this patch it should work without numa=noacpi. It checks whether the SRAT covers all memory and rejects it if not.
Regarding the BIOS: yes, Iwill is not very good with them. We actually had a long conversation with them about this, but it went nowhere in the end.
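The idea behind that check can be illustrated outside the kernel. The following is a minimal Python sketch, not the attached patch: it sums the usable e820 ranges and the SRAT node ranges taken from the boot log in the original report, counts both in whole 4 KiB pages, and rejects the SRAT when it covers less memory than e820 reports. The names `pages`, `to_mb` and `srat_covers_memory` are made up for illustration, and the sketch assumes node ranges contain no holes, which a real check would have to account for.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86-64

# Usable (start, end) byte ranges from the BIOS-e820 map in the report.
E820_USABLE = [
    (0x0000000000000000, 0x000000000009FC00),
    (0x0000000000100000, 0x00000000BFFF0000),
    (0x0000000100000000, 0x0000000840000000),
]

# Final per-node ranges from the "SRAT: Node N PXM N start-end" lines:
# node 0 ends up as 0-9fc00, nodes 1-7 stay at 0-0.
SRAT_NODES = [(0x0, 0x9FC00)] + [(0x0, 0x0)] * 7


def pages(start, end):
    """Number of whole pages fully contained in [start, end)."""
    first = (start + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT  # round up
    last = end >> PAGE_SHIFT                               # round down
    return max(0, last - first)


def to_mb(npages):
    """Convert a page count to whole megabytes (rounded down)."""
    return npages >> (20 - PAGE_SHIFT)


def srat_covers_memory(e820_usable, srat_nodes):
    """Return (ok, srat_mb, e820_mb); ok is False if the SRAT falls short.

    Simplification: holes inside a node range are not subtracted.
    """
    e820_pages = sum(pages(s, e) for s, e in e820_usable)
    srat_pages = sum(pages(s, e) for s, e in srat_nodes)
    return srat_pages >= e820_pages, to_mb(srat_pages), to_mb(e820_pages)


ok, srat_mb, e820_mb = srat_covers_memory(E820_USABLE, SRAT_NODES)
if not ok:
    print(f"SRAT rejected: covers {srat_mb} MB of {e820_mb} MB")
```

With the values from the report, the SRAT describes less than 1 MB out of roughly 32 GB of RAM, so the sketch rejects it.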
Mark, FYI. Looks like the official Iwill 8 way BIOS is still broken, but now in a different way than the older ones.
Patch applies and works. Thanks again:
...SNIP....
SRAT: PXM 7 -> APIC 31 -> Node 7
SRAT: Node 0 PXM 0 0-0
SRAT: Node 1 PXM 1 0-0
SRAT: Node 2 PXM 2 0-0
SRAT: Node 3 PXM 3 0-0
SRAT: Node 4 PXM 4 0-0
SRAT: Node 5 PXM 5 0-0
SRAT: Node 6 PXM 6 0-0
SRAT: Node 7 PXM 7 0-0
SRAT: Node 0 PXM 0 0-9fc00
SRAT: PXMs only cover 0MB of your 32767MB e820 RAM. Not used.
SRAT: SRAT not used.
Scanning NUMA topology in Northbridge 24
Number of nodes 8
Node 0 MemBase 0000000000000000 Limit 0000000140000000
Node 1 MemBase 0000000140000000 Limit 0000000240000000
Node 2 MemBase 0000000240000000 Limit 0000000340000000
Node 3 MemBase 0000000340000000 Limit 0000000440000000
Node 4 MemBase 0000000440000000 Limit 0000000540000000
Node 5 MemBase 0000000540000000 Limit 0000000640000000
Node 6 MemBase 0000000640000000 Limit 0000000740000000
Node 7 MemBase 0000000740000000 Limit 0000000840000000
Using 30 for the hash shift.
Using node hash shift of 30
Bootmem setup node 0 0000000000000000-0000000140000000
Bootmem setup node 1 0000000140000000-0000000240000000
Bootmem setup node 2 0000000240000000-0000000340000000
Bootmem setup node 3 0000000340000000-0000000440000000
Bootmem setup node 4 0000000440000000-0000000540000000
Bootmem setup node 5 0000000540000000-0000000640000000
Bootmem setup node 6 0000000640000000-0000000740000000
Bootmem setup node 7 0000000740000000-0000000840000000
...SNIP....
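As a sanity check on the fallback topology in this log, the northbridge node ranges can be parsed and compared against the top of the e820 map. This is a small Python sketch for reading the log only; the helper names `parse_nodes` and `is_contiguous` are made up for illustration and are not part of any kernel tooling.

```python
import re

# "Node N MemBase ... Limit ..." lines copied from the boot log above.
LOG = """\
Node 0 MemBase 0000000000000000 Limit 0000000140000000
Node 1 MemBase 0000000140000000 Limit 0000000240000000
Node 2 MemBase 0000000240000000 Limit 0000000340000000
Node 3 MemBase 0000000340000000 Limit 0000000440000000
Node 4 MemBase 0000000440000000 Limit 0000000540000000
Node 5 MemBase 0000000540000000 Limit 0000000640000000
Node 6 MemBase 0000000640000000 Limit 0000000740000000
Node 7 MemBase 0000000740000000 Limit 0000000840000000
"""

NODE_RE = re.compile(r"Node (\d+) MemBase ([0-9a-f]+) Limit ([0-9a-f]+)")


def parse_nodes(log):
    """Return a list of (node, base, limit) tuples parsed from the log text."""
    return [(int(n), int(base, 16), int(limit, 16))
            for n, base, limit in NODE_RE.findall(log)]


def is_contiguous(nodes):
    """True if each node starts exactly where the previous one ends."""
    return all(prev[2] == cur[1] for prev, cur in zip(nodes, nodes[1:]))


nodes = parse_nodes(LOG)
sizes_gib = [(limit - base) >> 30 for _, base, limit in nodes]
print("address span per node (GiB):", sizes_gib)
print("contiguous:", is_contiguous(nodes))
print("top of last node matches e820 top:", nodes[-1][2] == 0x840000000)
```

Node 0 spans 5 GiB of address space and the other seven span 4 GiB each. The 33 GiB total includes the gap below 4 GiB between 0xc0000000 and 0x100000000, which is consistent with the roughly 32 GB of RAM reported by e820.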
I'll bring it up with the relevant teams.