Bug 14003 - Infinite loop on bootup while handling DMAR
Status: CLOSED DOCUMENTED
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assigned To: drivers_pci@kernel-bugs.osdl.org
Depends on:
Blocks: 13615
Reported: 2009-08-18 14:54 UTC by Bernhard Rosenkraenzer
Modified: 2009-08-20 14:35 UTC (History)

See Also:
Kernel Version: 2.6.31-rc6
Tree: Mainline
Regression: Yes



Description Bernhard Rosenkraenzer 2009-08-18 14:54:20 UTC
Booting 64bit 2.6.31-rc6 on a hp xw4600 results in an infinite loop of

DMAR:[DMA READ] Request device [ff:1f.7] fault addr fffffffffffff000
DMAR:[fault reason 255] Unknown

even with iommu=soft (2.6.30 could boot with iommu=soft, but would result in the same error with hardware iommu enabled).

2.6.30 reports this about the DMAR ACPI tables:

ACPI: DMAR 00000000defc247f 00158 (v01 COMPAQ BEARLX38 00000001      00000000)
DMAR:Host address width 36
DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
DMAR:RMRR base: 0x00000000defd0000 end: 0x00000000defd0fff
DMAR:RMRR base: 0x00000000defd1000 end: 0x00000000defd1fff
DMAR:RMRR base: 0x00000000defd2000 end: 0x00000000defd2fff
DMAR:RMRR base: 0x00000000defd3000 end: 0x00000000defd3fff
DMAR:RMRR base: 0x00000000defd4000 end: 0x00000000defd4fff
DMAR:RMRR base: 0x00000000defd5000 end: 0x00000000defd5fff
DMAR:RMRR base: 0x00000000defd6000 end: 0x00000000defd6fff
DMAR:RMRR base: 0x00000000defd7000 end: 0x00000000defd7fff
Not all IO-APIC's listed under remapping hardware

Device ff:1f.7 doesn't actually exist, lspci -n says

00:00.0 0600: 8086:29e0
00:01.0 0604: 8086:29e1
00:1a.0 0c03: 8086:2937 (rev 02)
00:1a.1 0c03: 8086:2938 (rev 02)
00:1a.2 0c03: 8086:2939 (rev 02)
00:1a.7 0c03: 8086:293c (rev 02)
00:1b.0 0403: 8086:293e (rev 02)
00:1c.0 0604: 8086:2940 (rev 02)
00:1c.4 0604: 8086:2948 (rev 02)
00:1c.5 0604: 8086:294a (rev 02)
00:1d.0 0c03: 8086:2934 (rev 02)
00:1d.1 0c03: 8086:2935 (rev 02)
00:1d.2 0c03: 8086:2936 (rev 02)
00:1d.7 0c03: 8086:293a (rev 02)
00:1e.0 0604: 8086:244e (rev 92)
00:1f.0 0601: 8086:2916 (rev 02)
00:1f.2 0106: 8086:2922 (rev 02)
01:00.0 0300: 1002:7187
01:00.1 0380: 1002:71a7
10:0b.0 0c00: 11c1:5811 (rev 61)
3f:00.0 0200: 14e4:167b (rev 02)
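
One possible reading of the fault line above, offered as a sketch rather than a diagnosis: every field in it is what an all-ones register read would decode to. A VT-d source ID is 16 bits (bus in the high byte, then a 5-bit device number and a 3-bit function number), the fault reason is 8 bits, and the fault address is reported page-aligned. The helper names below are made up for illustration and are not kernel functions.

```python
ALL_ONES_64 = (1 << 64) - 1   # what a 64-bit read of unbacked MMIO typically returns
PAGE_MASK = ~0xFFF            # 4 KiB pages: clear the low 12 bits


def decode_source_id(source_id: int) -> str:
    """Format a 16-bit PCI requester ID as bus:device.function."""
    bus = (source_id >> 8) & 0xFF
    dev = (source_id >> 3) & 0x1F
    fn = source_id & 0x7
    return f"{bus:02x}:{dev:02x}.{fn:x}"


def page_align(addr: int) -> int:
    """Drop the in-page offset, keeping the result within 64 bits."""
    return addr & PAGE_MASK & ALL_ONES_64


if __name__ == "__main__":
    print(decode_source_id(ALL_ONES_64 & 0xFFFF))   # source ID field, all ones
    print(format(page_align(ALL_ONES_64), "x"))     # fault address field, all ones
    print(ALL_ONES_64 & 0xFF)                       # fault reason field, all ones
```

If the three printed values match the device, address and reason in the log, that is consistent with the fault registers reading back as all ones, which would also fit a device ID that lspci does not list.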
Comment 1 Andrew Morton 2009-08-19 21:26:42 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 18 Aug 2009 14:54:22 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14003
> 
>            Summary: Infinite loop on bootup while handling DMAR

That's a box-killing post-2.6.30 regression.

>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.31-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PCI
>         AssignedTo: drivers_pci@kernel-bugs.osdl.org
>         ReportedBy: bero@arklinux.org
>         Regression: Yes
Comment 2 David Woodhouse 2009-08-19 22:01:57 UTC
Looking at the log from 2.6.30, it seems that the iommu wasn't being used. I
don't think this is a regression -- it's just another BIOS written by
incompetents and shipped without any QA.

There should be a workaround for this in
http://git.infradead.org/iommu-2.6.git/commit/0815565a
but it can't _work_ -- it'll just disable the IOMMU after bitching a bit.
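
The linked commit is not reproduced here. As a rough sketch of the kind of sanity check that can catch this class of firmware bug, assume that a remapping unit whose capability and extended capability registers both read back as all ones is not really present at the advertised address. `drhd_looks_usable` and its parameters are hypothetical names, not kernel API, and the register values would have to come from real MMIO reads.

```python
ALL_ONES_64 = 0xFFFF_FFFF_FFFF_FFFF


def drhd_looks_usable(base: int, cap: int, ecap: int) -> bool:
    """Return False when a DRHD entry from the DMAR table looks bogus.

    base: register base address taken from the DMAR table entry
    cap:  value read from the unit's capability register
    ecap: value read from the unit's extended capability register
    """
    if base == 0:
        # A zero base address cannot be a real remapping unit.
        return False
    if cap == ALL_ONES_64 and ecap == ALL_ONES_64:
        # Nothing is decoding this address; the reads returned all ones.
        return False
    return True


if __name__ == "__main__":
    # Base addresses taken from the DMAR log in the description;
    # the register values are invented for the example.
    for base, cap, ecap in [
        (0xFED90000, ALL_ONES_64, ALL_ONES_64),
        (0xFED93000, 0x1234, 0x5678),
    ]:
        verdict = "usable" if drhd_looks_usable(base, cap, ecap) else "disable IOMMU"
        print(f"DRHD at {base:#x}: {verdict}")
```

A check like this cannot make the IOMMU work on such a board. It only lets the OS notice the broken table early and carry on without remapping.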
Comment 3 Suresh B Siddha 2009-08-20 01:16:25 UTC
On Wed, 2009-08-19 at 14:26 -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Tue, 18 Aug 2009 14:54:22 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=14003
> > 
> >            Summary: Infinite loop on bootup while handling DMAR
> 
> That's a box-killing post-2.6.30 regression.
> 
> > 
> > Booting 64bit 2.6.31-rc6 on a hp xw4600 results in an infinite loop of
> > 
> > DMAR:[DMA READ] Request device [ff:1f.7] fault addr fffffffffffff000
> > DMAR:[fault reason 255] Unknown

I think David has a fix queued up for this already. Please check
http://git.infradead.org/iommu-2.6.git/commit/0815565a

thanks,
suresh
Comment 4 David Woodhouse 2009-08-20 07:53:09 UTC
On Wed, 2009-08-19 at 14:26 -0700, Andrew Morton wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=14003
> >            Summary: Infinite loop on bootup while handling DMAR
> 
> That's a box-killing post-2.6.30 regression.

It's a BIOS bug -- the user's BIOS is written by idiots who obviously
shipped it without any QA whatsoever.

As far as I can tell, 2.6.30 aborted early because of a _different_ BIOS
bug, but now we cope with that particular bug and we fall over the next
bug. Or just come across them in a different order.

The IOMMU on this board can _never_ have worked. Just disable it.

Or use a board with open source firmware available, and this kind of
crap won't happen. (At least if it does, you'll be able to fix it).
Comment 5 Bernhard Rosenkraenzer 2009-08-20 08:16:23 UTC
Confirmed that the patch fixes it (as far as "fixing" it by turning off the iommu goes).
BIOS developers should be forced to use real operating systems...
Comment 6 Anonymous Emailer 2009-08-20 08:41:28 UTC
Reply-To: paravoid@debian.org

David Woodhouse wrote:
> On Wed, 2009-08-19 at 14:26 -0700, Andrew Morton wrote:
>>> http://bugzilla.kernel.org/show_bug.cgi?id=14003
>>>            Summary: Infinite loop on bootup while handling DMAR
>> That's a box-killing post-2.6.30 regression.
> 
> It's a BIOS bug -- the user's BIOS is written by idiots who obviously
> shipped it without any QA whatsoever.

Matt may be wondering why he was addressed, since the bugzilla entry
mentions only a bug in HP's BIOS.

I'm experiencing the same bug on a newly bought Dell Optiplex 760 with
BIOS version A03, as explained in my mail in lkml,
<4A89CB52.4030008@debian.org>, subject "[regression, bisected] fails to
boot on Dell Optiplex 760 with VT-d enabled".

Thanks,
Faidon
Comment 7 David Woodhouse 2009-08-20 08:46:00 UTC
No, using a real operating system isn't relevant. This would break VT-d under Windows too -- I don't think there's anything the OS can do other than just ignore the broken DMAR tables and run without the IOMMU.

So the offending BIOS teams obviously didn't bother to test this AT ALL.
Comment 8 Rafael J. Wysocki 2009-08-20 14:35:19 UTC
Closing, since this is not a kernel bug.
