Bug 12940 - CONFIG_DMAR/CONFIG_DMAR_DEFAULT_ON makes the kernel unbootable
Summary: CONFIG_DMAR/CONFIG_DMAR_DEFAULT_ON makes the kernel unbootable
Status: CLOSED OBSOLETE
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 normal
Assignee: David Woodhouse
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-25 18:37 UTC by Ed Martin
Modified: 2012-05-30 14:57 UTC

See Also:
Kernel Version: 2.6.29
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Problem .config (70.15 KB, text/plain)
2009-03-25 18:42 UTC, Ed Martin
The boot messages with a normal boot (22.17 KB, application/octet-stream)
2009-07-11 14:32 UTC, Ed Martin
The same thing, but with iommu=pt added to the boot line (25.80 KB, application/octet-stream)
2009-07-11 14:35 UTC, Ed Martin
I changed the fixup to trigger on 0x4003 and got this log (22.28 KB, application/octet-stream)
2009-07-11 17:15 UTC, Ed Martin
I removed the ATI and disabled the onboard VGA in the BIOS (38.88 KB, application/octet-stream)
2009-07-11 18:20 UTC, Ed Martin

Description Ed Martin 2009-03-25 18:37:48 UTC
When I enable the options CONFIG_DMAR/CONFIG_DMAR_DEFAULT_ON, my kernel fails to boot. It just hangs at:

hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 comparators, 64-bit 14.318180 MHz counter

Right before that, I see messages saying that the IOMMU was remapping things. When I encountered this, I recompiled the kernel with only CONFIG_DMAR/CONFIG_DMAR_DEFAULT_ON disabled, and my kernel worked fine.


My hardware:
Tyan i5400PW, 2x E5410 CPUs (latest BIOS)
8GB RAM
ATI Radeon HD 3870
3ware 9650se
Comment 1 Ed Martin 2009-03-25 18:42:39 UTC
Created attachment 20678
Problem .config
Comment 2 David Woodhouse 2009-07-05 00:01:56 UTC
Can you make sure legacy keyboard/mouse emulation is turned off in the BIOS, so it isn't trying to do anything with USB?

Often, this type of bug happens because your BIOS is written by idiots, doesn't bother to tell us which bits of memory it needs to DMA to, then locks up in SMM mode if the DMA gets stopped (which it does, when we turn on the IOMMU).

Did you see any messages about DMA faults? They might look something like this:

DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [00:1a.1] fault addr ec000 
DMAR:[fault reason 06] PTE Read access is not set
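
For anyone checking a saved boot log for such faults, here is a minimal Python sketch. It assumes the fault lines look like the three sample lines above; the exact wording may differ between kernel versions, and the function name `find_dma_faults` is made up for illustration.

```python
import re

# Patterns modelled on the sample lines quoted above (an assumption, not a
# guarantee of what every kernel version prints).
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device "
    r"\[(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr (?P<addr>[0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def find_dma_faults(log_text):
    """Return one dict per DMAR fault found in the given boot log text."""
    faults = []
    current = None  # fault still waiting for its "fault reason" line
    for line in log_text.splitlines():
        req = REQUEST_RE.search(line)
        if req:
            current = {
                "op": req.group("op"),
                "device": req.group("dev"),
                "addr": int(req.group("addr"), 16),
                "reason": None,
            }
            faults.append(current)
            continue
        reason = REASON_RE.search(line)
        if reason and current is not None:
            current["reason"] = reason.group("text").strip()
            current = None
    return faults


sample = """\
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [00:1a.1] fault addr ec000
DMAR:[fault reason 06] PTE Read access is not set
"""

for fault in find_dma_faults(sample):
    print(fault["device"], hex(fault["addr"]), fault["reason"])
```

The device field is the bus:slot.function of the device whose DMA was blocked, so it can be matched against lspci output.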

Please could you also test 2.6.31-rc2 and let me know if it's still happening?

Thanks.
Comment 3 Ed Martin 2009-07-07 08:44:48 UTC
I tried 2.6.31-rc2 and I had the same problem. I searched my BIOS and could not find anything about keyboard/mouse emulation.

So here is everything I could get; hopefully this helps. The error happens early in boot, and I have to turn off my frame buffer to see it at all. I can only see one screenful of text/errors, because the USB turns off and I can't use a keyboard. Here is what I could see:

DMAR: base address with 38
DMAR: DRHD base <something>
DMAR: DRHD base <something>
DMAR: DRHD base <something>
DMAR: RMRR base <something>
DMAR: NO ATSR
setting identity map
PCI-DMA Virtualization
multi-level page table translation for DMAR

And that is all I see (with a few of the IOMMU remapping messages as well). After that the system just stops; no panic message is displayed on the screen. I did not see anything like those faults, but the text goes by really fast and I can't scroll, so I can't tell.
Comment 4 David Woodhouse 2009-07-08 07:42:03 UTC
Hm, confused. Can you try with 'iommu=pt'? Can you also #define DEBUG at the top of drivers/pci/intel-iommu.c and add 'debug' to your kernel command line too?

I don't suppose you have any way of hooking up a serial console to capture the messages reliably? Or a digital camera...
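
Once a boot log has been captured, one quick sanity check is whether the requested options actually reached the kernel. The sketch below assumes the log contains the usual "Kernel command line:" line; the helper names and the sample `root=`/`console=` values are made up for illustration.

```python
def boot_options_from_log(log_text):
    """Return the options on the first 'Kernel command line:' line, or []."""
    marker = "Kernel command line:"
    for line in log_text.splitlines():
        if marker in line:
            return line.split(marker, 1)[1].split()
    return []


def missing_options(log_text, wanted=("iommu=pt", "debug")):
    """Return the options from `wanted` that the logged command line lacks."""
    present = boot_options_from_log(log_text)
    return [opt for opt in wanted if opt not in present]


# Hypothetical log excerpt: iommu=pt is present, debug is not.
sample_log = "Kernel command line: root=/dev/sda1 console=ttyS0,115200 iommu=pt\n"
print(missing_options(sample_log))
```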
Comment 5 Ed Martin 2009-07-09 06:57:38 UTC
Well, I tried with iommu=pt and added the debug stuff as well, and I honestly don't see anything different: just the IOMMU remapping messages, and then it stops at the hpet0 lines. The debug option did not seem to change anything, but I think that is just because I can't see it, since the screen can't hold much text. So I bought a serial adapter so I can get a serial console up to my other USB-only computer. When I get that, I will post the full log (by Monday, I think).
Comment 6 Ed Martin 2009-07-11 14:32:17 UTC
Created attachment 22309
The boot messages with a normal boot
Comment 7 Ed Martin 2009-07-11 14:35:27 UTC
Created attachment 22310
The same thing, but with iommu=pt added to the boot line
Comment 8 Ed Martin 2009-07-11 14:37:24 UTC
That is what I get over the serial console. I don't see much besides an early backtrace, which looks to me like it thinks something is slow.
Comment 9 David Woodhouse 2009-07-11 16:53:59 UTC
Can you show me the output of 'lspci -vtnn'? And just for fun, can you edit the quirk at the bottom of drivers/pci/intel-iommu.c (the very last line) so that it triggers on your hardware?
Comment 10 Ed Martin 2009-07-11 16:58:00 UTC
This is the lspci output:


-[0000:00]-+-00.0  Intel Corporation 5400 Chipset Memory Controller Hub [8086:4003]
           +-01.0-[0000:01]----00.0  3ware Inc 9650SE SATA-II RAID [13c1:1004]
           +-05.0-[0000:05]--+-00.0  ATI Technologies Inc Radeon HD 3870 [1002:9501]
           |                 \-00.1  ATI Technologies Inc Radeon HD 3870 Audio device [1002:aa18]
           +-09.0-[0000:09-10]--+-00.0-[0000:0a-0f]--+-00.0-[0000:0b]--
           |                    |                    \-02.0-[0000:0f]--+-00.0  Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) [8086:1096]
           |                    |                                      \-00.1  Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) [8086:1096]
           |                    \-00.3-[0000:10]----09.0  Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter [168c:0013]
           +-10.0  Intel Corporation 5400 Chipset FSB Registers [8086:4030]
           +-10.1  Intel Corporation 5400 Chipset FSB Registers [8086:4030]
           +-10.2  Intel Corporation 5400 Chipset FSB Registers [8086:4030]
           +-10.3  Intel Corporation 5400 Chipset FSB Registers [8086:4030]
           +-10.4  Intel Corporation 5400 Chipset FSB Registers [8086:4030]
           +-11.0  Intel Corporation 5400 Chipset CE/SF Registers [8086:4031]
           +-15.0  Intel Corporation 5400 Chipset FBD Registers [8086:4035]
           +-15.1  Intel Corporation 5400 Chipset FBD Registers [8086:4035]
           +-16.0  Intel Corporation 5400 Chipset FBD Registers [8086:4036]
           +-16.1  Intel Corporation 5400 Chipset FBD Registers [8086:4036]
           +-1b.0  Intel Corporation 631xESB/632xESB High Definition Audio Controller [8086:269a]
           +-1c.0-[0000:1f]--
           +-1d.0  Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 [8086:2688]
           +-1d.1  Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 [8086:2689]
           +-1d.2  Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 [8086:268a]
           +-1d.3  Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 [8086:268b]
           +-1d.7  Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller [8086:268c]
           +-1e.0-[0000:20]--+-04.0  Internext Compression Inc iTVC16 (CX23416) MPEG-2 Encoder [4444:0016]
           |                 +-05.0  XGI Technology Inc. (eXtreme Graphics Innovation) Volari Z7/Z9/Z9s [18ca:0020]
           |                 \-06.0  VIA Technologies, Inc. VT6306 Fire II IEEE 1394 OHCI Link Layer Controller [1106:3044]
           +-1f.0  Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller [8086:2670]
           +-1f.1  Intel Corporation 631xESB/632xESB IDE Controller [8086:269e]
           +-1f.2  Intel Corporation 631xESB/632xESB SATA AHCI Controller [8086:2681]
           \-1f.3  Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller [8086:269b]
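
The bracketed pairs at the end of each line are the PCI vendor:device IDs; the host bridge at 00.0 shows 8086:4003, which is the 0x4003 the fixup is keyed on in comment 11. A minimal Python sketch for pulling those IDs out of `lspci -vtnn` output follows. The line layout is assumed from the listing above and the function name `parse_lspci_tree` is made up for illustration.

```python
import re

# slot (e.g. "00.0"), whitespace, description, then "[vvvv:dddd]" at end of line.
# Bus ranges such as "[0000:01]" are not followed by whitespace, so they are skipped.
DEVICE_RE = re.compile(
    r"([0-9a-f]{2}\.[0-7])\s+(.+?)\s+\[([0-9a-f]{4}):([0-9a-f]{4})\]\s*$"
)


def parse_lspci_tree(tree_text):
    """Yield (slot, vendor_id, device_id, description) for each device line."""
    for line in tree_text.splitlines():
        m = DEVICE_RE.search(line)
        if m:
            slot, desc, vendor, device = m.groups()
            yield slot, int(vendor, 16), int(device, 16), desc


# Three lines copied from the listing above; the last one is an empty bridge.
sample = """\
-[0000:00]-+-00.0  Intel Corporation 5400 Chipset Memory Controller Hub [8086:4003]
           +-01.0-[0000:01]----00.0  3ware Inc 9650SE SATA-II RAID [13c1:1004]
           +-1c.0-[0000:1f]--
"""

for slot, vendor, device, desc in parse_lspci_tree(sample):
    print(f"{slot} {vendor:04x}:{device:04x} {desc}")
```

Note that the slot reported for a device behind a bridge is its slot on the secondary bus, so the bus number has to be read from the bridge's bracketed range.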
Comment 11 Ed Martin 2009-07-11 17:15:24 UTC
Created attachment 22311
I changed the fixup to trigger on 0x4003 and got this log
Comment 12 David Woodhouse 2009-07-11 17:34:37 UTC
Not that then; it was a long shot but I wanted to eliminate it.

I'm guessing that the 'P' or 'PC' that you see at the end of the log is the first couple of characters of its attempt to print "PCI-DMA: Intel(R) Virtualization Technology for Directed I/O", which it does immediately after turning the IOMMU on?

I strongly suspect that the BIOS is crashing in SMM mode. Can you boot with 'nmi_watchdog=lapic', and also can you try removing the graphics card(s)?
Comment 13 Ed Martin 2009-07-11 18:20:48 UTC
Created attachment 22313
I removed the ATI and disabled the onboard VGA in the BIOS

Alright, I tried with the video hardware disabled and it still stopped at the same spot. I removed the ATI card, but I still have an onboard VGA that can't be removed. I disabled it in the BIOS, but I don't know what effect that had.
