Bug 202723 - DMAR: DRHD: handling fault status reg 2
Summary: DMAR: DRHD: handling fault status reg 2
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: IOMMU
Hardware: Intel Linux
Importance: P1 normal
Assignee: drivers_iommu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-02 09:21 UTC by marcomom
Modified: 2019-08-21 09:32 UTC

See Also:
Kernel Version: 4.20
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
.config (218.88 KB, application/x-kdeuser2)
2019-03-02 09:21 UTC, marcomom

Description marcomom 2019-03-02 09:21:12 UTC
Created attachment 281459
.config

Hi,

I am running an Intel(R) Core(TM) i7-4790K CPU with intel_iommu=on in order to run virtual machines with NVIDIA GPU passthrough.

Since kernel 4.20, dmesg has been flooded with messages like these:

DMAR: DRHD: handling fault status reg 2
[38612.128348] DMAR: [DMA Write] Request device [00:02.0] fault addr a3f5c000 [fault reason 05] PTE Write access is not set
[38640.215367] DMAR: DRHD: handling fault status reg 2
[38640.215378] DMAR: [DMA Write] Request device [00:02.0] fault addr b2dfc000 [fault reason 05] PTE Write access is not set
[38699.725691] DMAR: DRHD: handling fault status reg 2
[38699.725695] DMAR: [DMA Write] Request device [00:02.0] fault addr a3f5c000 [fault reason 05] PTE Write access is not set
[38731.439607] DMAR: DRHD: handling fault status reg 2

This is a regression, as it does not happen with kernel 4.19. It seems to occur when using video players such as VLC and Kodi.
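
To quantify the flood when comparing kernels, the fault lines quoted above can be tallied per device and fault reason. The following is a minimal sketch, not part of the original report: the regular expression is an assumption derived only from the message format shown in this log, and the function name `summarize_dmar_faults` is hypothetical. Other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern for the DMAR fault lines quoted in this report (assumed format;
# other kernel versions may print these messages differently).
FAULT_RE = re.compile(
    r"DMAR: \[(?P<kind>DMA (?:Read|Write))\] Request device "
    r"\[(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+) "
    r"\[fault reason (?P<reason>\d+)\] (?P<text>.*)"
)


def summarize_dmar_faults(log_text):
    """Count DMAR faults per (device, request kind, reason, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            key = (
                match["device"],
                match["kind"],
                int(match["reason"]),
                match["text"].strip(),
            )
            counts[key] += 1
    return counts


# Lines copied from the log in this report.
SAMPLE = """\
[38612.128348] DMAR: [DMA Write] Request device [00:02.0] fault addr a3f5c000 [fault reason 05] PTE Write access is not set
[38640.215367] DMAR: DRHD: handling fault status reg 2
[38640.215378] DMAR: [DMA Write] Request device [00:02.0] fault addr b2dfc000 [fault reason 05] PTE Write access is not set
"""

if __name__ == "__main__":
    for (device, kind, reason, text), n in summarize_dmar_faults(SAMPLE).items():
        print(f"{n}x {device} {kind} reason {reason}: {text}")
```

Feeding the saved output of `dmesg` from a 4.19 boot and a 4.20 boot through the same function gives a direct count to compare.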

# lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:01.2 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x4 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.2 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 3 (rev d0)
00:1c.3 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 (rev d0)
00:1c.4 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 5 (rev d0)
00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation Z97 Chipset LPC Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
02:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
04:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller (rev 10)
05:00.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 41)
07:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 12)


It does not block the system completely, but slowdowns and stuttering occur. I have tried every 4.20 release up to the current 4.20.12-4. Only reverting to 4.19 fixes the problem.

Feel free to ask for any additional information.

Thank you
Comment 1 marcomom 2019-07-10 04:57:13 UTC
This still happens with the recent kernel 5.1.16.
Comment 2 Paul Menzel 2019-08-21 09:25:59 UTC
Debian enabled CONFIG_INTEL_IOMMU_DEFAULT_ON=y in Linux 5.2.9 [1], and now I am having problems on the Dell Latitude E7250.

```
[   41.975448] DMAR: DRHD: handling fault status reg 3
[   41.975456] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1003000 [fault reason 23] Unknown
[   41.987006] DMAR: DRHD: handling fault status reg 3
[   41.987013] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1006000 [fault reason 23] Unknown
[   42.157270] DMAR: DRHD: handling fault status reg 3
[   42.157275] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1003000 [fault reason 23] Unknown
[   42.886618] DMAR: DRHD: handling fault status reg 3
[   47.549330] dmar_fault: 11 callbacks suppressed
[   47.549333] DMAR: DRHD: handling fault status reg 3
[   47.549343] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1006000 [fault reason 23] Unknown
[   48.699784] DMAR: DRHD: handling fault status reg 3
[   48.699799] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1003000 [fault reason 23] Unknown
[   48.880717] DMAR: DRHD: handling fault status reg 3
[   48.880733] DMAR: [DMA Write] Request device [00:02.0] fault addr fffec1006000 [fault reason 23] Unknown
[   50.033196] DMAR: DRHD: handling fault status reg 3
[   57.985477] i915 0000:00:02.0: GPU HANG: ecode 8:1:0xfffffffe, in gnome-shell [938], hang on rcs0
[   57.985479] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   57.985479] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   57.985479] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   57.985480] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   57.985480] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   57.986489] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
```
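
The status value differs between the two logs: `reg 2` in the original report and `reg 3` here. The sketch below decodes such a value into named bits. It rests on two assumptions that should be checked against the kernel tree in use: that the kernel prints the register in hexadecimal, and that the bit positions match the `DMA_FSTS_*` definitions in the kernel's Intel IOMMU header as listed. The function name `decode_fault_status` is hypothetical.

```python
# Bit positions assumed from the kernel's DMA_FSTS_* definitions;
# verify against the intel-iommu header of the kernel you are running.
FSTS_BITS = {
    0: "PFO (Primary Fault Overflow)",
    1: "PPF (Primary Pending Fault)",
    4: "IQE (Invalidation Queue Error)",
    5: "ICE (Invalidation Completion Error)",
    6: "ITE (Invalidation Time-out Error)",
    7: "PRO (Page Request Overflow)",
}


def decode_fault_status(text):
    """Decode the value printed after 'handling fault status reg'.

    The value is assumed to be hexadecimal, without a 0x prefix.
    """
    value = int(text, 16)
    return [name for bit, name in sorted(FSTS_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    for reg in ("2", "3"):
        print(f"reg {reg}: {', '.join(decode_fault_status(reg))}")
```

Under these assumptions, `reg 2` means a primary fault is pending, and `reg 3` additionally indicates that the fault recording registers overflowed.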

Booting with `intel_iommu=igfx_off` fixes this [2].
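
To confirm which `intel_iommu=` options a given boot actually used, the kernel command line can be inspected. This is a minimal sketch with a hypothetical helper name, `intel_iommu_options`. It assumes that multiple values may be combined with commas, as in `intel_iommu=on,igfx_off`, and it only reports what is on the command line, not what the driver did with it.

```python
def intel_iommu_options(cmdline):
    """Return the values of every intel_iommu= parameter on a command line.

    Values are assumed to be comma-separated, e.g. intel_iommu=on,igfx_off.
    """
    options = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            values = token.split("=", 1)[1]
            options.extend(v for v in values.split(",") if v)
    return options


if __name__ == "__main__":
    # Illustrative command line; on a live system, pass the contents of
    # /proc/cmdline instead.
    example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on,igfx_off quiet"
    opts = intel_iommu_options(example)
    print("intel_iommu options:", opts)
    print("igfx_off present:", "igfx_off" in opts)
```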

This seems to be a common problem [3][4].


[1]: https://bugs.debian.org/934309
[2]: https://bugs.kali.org/view.php?id=5644
[3]: https://bbs.archlinux.org/viewtopic.php?id=230362
[4]: https://bugs.freedesktop.org/show_bug.cgi?id=103076
Comment 3 Paul Menzel 2019-08-21 09:32:52 UTC
As the other tickets are convoluted, I created a new issue at https://bugs.freedesktop.org [1].

[1]: https://bugs.freedesktop.org/show_bug.cgi?id=111451
