Booting 64bit 2.6.31-rc6 on a hp xw4600 results in an infinite loop of

DMAR:[DMA READ] Request device [ff:1f.7] fault addr fffffffffffff000
DMAR:[fault reason 255] Unknown

even with iommu=soft (2.6.30 could boot with iommu=soft, but would result in the same error with hardware iommu enabled).

2.6.30 reports this about the DMAR ACPI tables:

ACPI: DMAR 00000000defc247f 00158 (v01 COMPAQ BEARLX38 00000001 00000000)
DMAR:Host address width 36
DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
DMAR:RMRR base: 0x00000000defd0000 end: 0x00000000defd0fff
DMAR:RMRR base: 0x00000000defd1000 end: 0x00000000defd1fff
DMAR:RMRR base: 0x00000000defd2000 end: 0x00000000defd2fff
DMAR:RMRR base: 0x00000000defd3000 end: 0x00000000defd3fff
DMAR:RMRR base: 0x00000000defd4000 end: 0x00000000defd4fff
DMAR:RMRR base: 0x00000000defd5000 end: 0x00000000defd5fff
DMAR:RMRR base: 0x00000000defd6000 end: 0x00000000defd6fff
DMAR:RMRR base: 0x00000000defd7000 end: 0x00000000defd7fff
Not all IO-APIC's listed under remapping hardware

Device ff:1f.7 doesn't actually exist, lspci -n says

00:00.0 0600: 8086:29e0
00:01.0 0604: 8086:29e1
00:1a.0 0c03: 8086:2937 (rev 02)
00:1a.1 0c03: 8086:2938 (rev 02)
00:1a.2 0c03: 8086:2939 (rev 02)
00:1a.7 0c03: 8086:293c (rev 02)
00:1b.0 0403: 8086:293e (rev 02)
00:1c.0 0604: 8086:2940 (rev 02)
00:1c.4 0604: 8086:2948 (rev 02)
00:1c.5 0604: 8086:294a (rev 02)
00:1d.0 0c03: 8086:2934 (rev 02)
00:1d.1 0c03: 8086:2935 (rev 02)
00:1d.2 0c03: 8086:2936 (rev 02)
00:1d.7 0c03: 8086:293a (rev 02)
00:1e.0 0604: 8086:244e (rev 92)
00:1f.0 0601: 8086:2916 (rev 02)
00:1f.2 0106: 8086:2922 (rev 02)
01:00.0 0300: 1002:7187
01:00.1 0380: 1002:71a7
10:0b.0 0c00: 11c1:5811 (rev 61)
3f:00.0 0200: 14e4:167b (rev 02)
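For readers decoding the fault line: a PCI requester ID is 16 bits wide (8-bit bus, 5-bit device, 3-bit function), so ff:1f.7 is what an all-ones ID looks like when printed. The sketch below shows that decoding and checks the result against the lspci addresses from the report. It assumes the ID was read as 0xFFFF, which the report does not state directly; the function name and the set name are made up for illustration.

```python
def decode_source_id(sid: int) -> str:
    """Format a 16-bit PCI requester ID as bus:dev.fn (hypothetical helper)."""
    bus = (sid >> 8) & 0xFF   # upper 8 bits
    dev = (sid >> 3) & 0x1F   # next 5 bits
    fn = sid & 0x7            # lowest 3 bits
    return f"{bus:02x}:{dev:02x}.{fn:x}"


# Device addresses copied from the lspci -n output in the report.
LSPCI_DEVICES = {
    "00:00.0", "00:01.0", "00:1a.0", "00:1a.1", "00:1a.2", "00:1a.7",
    "00:1b.0", "00:1c.0", "00:1c.4", "00:1c.5", "00:1d.0", "00:1d.1",
    "00:1d.2", "00:1d.7", "00:1e.0", "00:1f.0", "00:1f.2", "01:00.0",
    "01:00.1", "10:0b.0", "3f:00.0",
}

# Assumption: the fault record was read back as all-ones.
faulting = decode_source_id(0xFFFF)
page_addr = 0xFFFF_FFFF_FFFF_FFFF & ~0xFFF  # all-ones address, page-aligned

print(faulting)                   # ff:1f.7
print(hex(page_addr))             # 0xfffffffffffff000
print(faulting in LSPCI_DEVICES)  # False
```

Fault reason 255 is 0xff as well, so all three values in the log are consistent with all-ones data rather than a real device misbehaving.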
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Tue, 18 Aug 2009 14:54:22 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14003
>
> Summary: Infinite loop on bootup while handling DMAR

That's a box-killing post-2.6.30 regression.

> Product: Drivers
> Version: 2.5
> Kernel Version: 2.6.31-rc6
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: PCI
> AssignedTo: drivers_pci@kernel-bugs.osdl.org
> ReportedBy: bero@arklinux.org
> Regression: Yes
>
>
> Booting 64bit 2.6.31-rc6 on a hp xw4600 results in an infinite loop of
>
> DMAR:[DMA READ] Request device [ff:1f.7] fault addr fffffffffffff000
> DMAR:[fault reason 255] Unknown
>
> even with iommu=soft (2.6.30 could boot with iommu=soft, but would result in
> the same error with hardware iommu enabled).
>
> 2.6.30 reports this about the DMAR ACPI tables:
>
> ACPI: DMAR 00000000defc247f 00158 (v01 COMPAQ BEARLX38 00000001
> 00000000)
> DMAR:Host address width 36
> DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
> DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
> DMAR:RMRR base: 0x00000000defd0000 end: 0x00000000defd0fff
> DMAR:RMRR base: 0x00000000defd1000 end: 0x00000000defd1fff
> DMAR:RMRR base: 0x00000000defd2000 end: 0x00000000defd2fff
> DMAR:RMRR base: 0x00000000defd3000 end: 0x00000000defd3fff
> DMAR:RMRR base: 0x00000000defd4000 end: 0x00000000defd4fff
> DMAR:RMRR base: 0x00000000defd5000 end: 0x00000000defd5fff
> DMAR:RMRR base: 0x00000000defd6000 end: 0x00000000defd6fff
> DMAR:RMRR base: 0x00000000defd7000 end: 0x00000000defd7fff
> Not all IO-APIC's listed under remapping hardware
>
> Device ff:1f.7 doesn't actually exist, lspci -n says
>
> 00:00.0 0600: 8086:29e0
> 00:01.0 0604: 8086:29e1
> 00:1a.0 0c03: 8086:2937 (rev 02)
> 00:1a.1 0c03: 8086:2938 (rev 02)
> 00:1a.2 0c03: 8086:2939 (rev 02)
> 00:1a.7 0c03: 8086:293c (rev 02)
> 00:1b.0 0403: 8086:293e (rev 02)
> 00:1c.0 0604: 8086:2940 (rev 02)
> 00:1c.4 0604: 8086:2948 (rev 02)
> 00:1c.5 0604: 8086:294a (rev 02)
> 00:1d.0 0c03: 8086:2934 (rev 02)
> 00:1d.1 0c03: 8086:2935 (rev 02)
> 00:1d.2 0c03: 8086:2936 (rev 02)
> 00:1d.7 0c03: 8086:293a (rev 02)
> 00:1e.0 0604: 8086:244e (rev 92)
> 00:1f.0 0601: 8086:2916 (rev 02)
> 00:1f.2 0106: 8086:2922 (rev 02)
> 01:00.0 0300: 1002:7187
> 01:00.1 0380: 1002:71a7
> 10:0b.0 0c00: 11c1:5811 (rev 61)
> 3f:00.0 0200: 14e4:167b (rev 02)
>
Looking at the log from 2.6.30, it seems that the iommu wasn't being used. I don't think this is a regression -- it's just another BIOS written by incompetents and shipped without any QA. There should be a workaround for this in http://git.infradead.org/iommu-2.6.git/commit/0815565a but it can't _work_ -- it'll just disable the IOMMU after bitching a bit.
On Wed, 2009-08-19 at 14:26 -0700, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Tue, 18 Aug 2009 14:54:22 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=14003
> >
> > Summary: Infinite loop on bootup while handling DMAR
>
> That's a box-killing post-2.6.30 regression.
>
> > Booting 64bit 2.6.31-rc6 on a hp xw4600 results in an infinite loop of
> >
> > DMAR:[DMA READ] Request device [ff:1f.7] fault addr fffffffffffff000
> > DMAR:[fault reason 255] Unknown

I think David has a fix queued up for this already. Please check
http://git.infradead.org/iommu-2.6.git/commit/0815565a

thanks,
suresh
On Wed, 2009-08-19 at 14:26 -0700, Andrew Morton wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=14003
> > Summary: Infinite loop on bootup while handling DMAR
>
> That's a box-killing post-2.6.30 regression.

It's a BIOS bug -- the user's BIOS is written by idiots who obviously shipped it without any QA whatsoever.

As far as I can tell, 2.6.30 aborted early because of a _different_ BIOS bug, but now we cope with that particular bug and we fall over the next bug. Or just come across them in a different order.

The IOMMU on this board can _never_ have worked. Just disable it. Or use a board with open source firmware available, and this kind of crap won't happen. (At least if it does, you'll be able to fix it).
Confirmed that the patch fixes it (as far as "fixing" it by turning off the iommu goes). BIOS developers should be forced to use real operating systems...
Reply-To: paravoid@debian.org

David Woodhouse wrote:
> On Wed, 2009-08-19 at 14:26 -0700, Andrew Morton wrote:
>>> http://bugzilla.kernel.org/show_bug.cgi?id=14003
>>> Summary: Infinite loop on bootup while handling DMAR
>> That's a box-killing post-2.6.30 regression.
>
> It's a BIOS bug -- the user's BIOS is written by idiots who obviously
> shipped it without any QA whatsoever.

Matt may be wondering why he was addressed, since the bugzilla entry mentions only a bug in HP's BIOS.

I'm experiencing the same bug on a newly bought Dell Optiplex 760 with BIOS version A03, as explained in my mail in lkml, <4A89CB52.4030008@debian.org>, subject "[regression, bisected] fails to boot on Dell Optiplex 760 with VT-d enabled".

Thanks,
Faidon
No, using a real operating system isn't relevant. This would break VT-d under Windows too -- I don't think there's anything the OS can do other than just ignore the broken DMAR tables and run without the IOMMU. So the offending BIOS teams obviously didn't bother to test this AT ALL.
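The "ignore the broken DMAR tables and run without the IOMMU" fallback can be sketched roughly as below. This is only an illustration of the idea, not the code from the commit linked earlier in the thread. It assumes the failure mode is a remapping unit whose registers read back as all-ones, which would match the ff:1f.7 / fault reason 255 values in the report but is not confirmed anywhere in this thread. The function names and the register values are hypothetical; only the two DRHD base addresses come from the 2.6.30 log.

```python
ALL_ONES_64 = 0xFFFF_FFFF_FFFF_FFFF


def drhd_looks_broken(cap: int, ecap: int) -> bool:
    """True if both 64-bit register reads came back as all-ones (assumed symptom)."""
    return cap == ALL_ONES_64 and ecap == ALL_ONES_64


def choose_iommu_mode(units):
    """units: iterable of (base, cap, ecap) tuples, one per DRHD entry.

    Returns "enabled" if every unit passes the sanity check, otherwise a
    message saying which unit caused remapping to be turned off.
    """
    for base, cap, ecap in units:
        if drhd_looks_broken(cap, ecap):
            return f"disabled: DRHD at {base:#x} returns all-ones"
    return "enabled"


# DRHD bases are from the report; the register contents are invented.
units = [
    (0xFED90000, ALL_ONES_64, ALL_ONES_64),
    (0xFED93000, ALL_ONES_64, ALL_ONES_64),
]
print(choose_iommu_mode(units))
```

The point of checking before enabling anything is that the decision is made once at setup time, so the machine complains and boots without remapping instead of looping on fault reports.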
Closing, since this is not a kernel bug.