Bug 82761
| Summary: | DMAR:[fault reason 06] PTE Read access is not set | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Ansa89 (ansalonistefano) |
| Component: | Network | Assignee: | drivers_network (drivers_network) |
| Status: | REOPENED --- | | |
| Severity: | normal | CC: | alan, alex.williamson, szg00000, v |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.16.2 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Ansa89
2014-08-19 12:18:21 UTC
Does it work on 3.17-rc1? Are all of the 8169 NICs on bus 05 up and running? Please provide lspci -vv info for 04:00.0.

1) I would prefer to stay on a stable kernel if possible (which commits of 3.17-rc1 would be relevant for this bug?).
2) Yes, all of the 8169 NICs are up and running.
3) lspci -vvs 04:00.0

    04:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: f7800000-f78fffff
        Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
            PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Capabilities: [78] Power Management version 3
            Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
            Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Express (v1) PCI/PCI-X Bridge, MSI 00
            DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ BrConfRtry-
                MaxPayload 128 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
            LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <2us, L1 <2us
                ClockPM- Surprise- LLActRep- BwNot-
            LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [c0] Subsystem: Micro-Star International Co., Ltd. Device 7758
        Capabilities: [100 v1] Virtual Channel
            Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
            Arb: Fixed- WRR32- WRR64- WRR128-
            Ctrl: ArbSelect=Fixed
            Status: InProgress-
            VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                Status: NegoPending- InProgress-

(In reply to Ansa89 from comment #2)
> 1) I would prefer to stay on a stable kernel if possible (which commits of
> 3.17-rc1 would be relevant for this bug?).

579305f iommu/vt-d: Update to use PCI DMA aliases
e17f9ff iommu/vt-d: Use iommu_group_get_for_dev()
104a1c1 iommu/core: Create central IOMMU group lookup/creation interface

I will try 3.17-rc1 (hoping it's stable enough for a home server).

Tested with 3.17-rc1: the errors are still there, but the spam rate seems lower than with 3.16.1 (with 3.16.1 the errors are repeated many times and the count grows fast; with 3.17-rc1 the same errors are repeated fewer times and the count seems to grow more slowly).
After ~10 minutes:

dmesg | grep -i dmar

    ACPI: DMAR 0x00000000C8EA83F0 0000B8 (v01 INTEL SNB 00000001 INTL 00000001)
    dmar: Host address width 36
    dmar: DRHD base: 0x000000fed90000 flags: 0x0
    dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c0000020e60262 ecap f0101a
    dmar: DRHD base: 0x000000fed91000 flags: 0x1
    dmar: IOMMU 1: reg_base_addr fed91000 ver 1:0 cap c9008020660262 ecap f0105a
    dmar: RMRR base: 0x000000c8d17000 end: 0x000000c8d24fff
    dmar: RMRR base: 0x000000cb800000 end: 0x000000cf9fffff
    DMAR: No ATSR found
    [drm] DMAR active, disabling use of stolen memory
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set
    dmar: DRHD: handling fault status reg 3
    dmar: DMAR:[DMA Read] Request device [05:00.0] fault addr ff3f4000
    DMAR:[fault reason 06] PTE Read access is not set

In the end, the bug does not seem to be fixed in 3.17-rc1.

Ok, then it's probably not a result of the PCIe-to-PCI bridge since 05:00.0 is the correct requester ID for all the devices behind the bridge. Unfortunately that means that the problem may not be fixable. We're only seeing reads to a single address, which may mean the NIC is using that read to synchronize transaction ordering, e.g. using a DMA read to flush a DMA write from the device. If the NIC driver has visibility of this address, then it could attempt to do a coherent mapping for the device(s) to avoid the fault. If it doesn't, then these NICs may simply be incompatible with the IOMMU.

Are these 3 separate NICs plugged into PCI slots on the motherboard or is this a single triple-port card with embedded PCIe-to-PCI bridge?

You might be able to run the IOMMU in passthrough mode with iommu=pt r8169.use_dac=1, but note the warning in modinfo "use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot." Unfortunately if you don't enable use_dac, then intel_iommu will ignore the passthrough option for these devices.
Also note that this problem has nothing to do with Virtualization/KVM. Drivers/Network or perhaps Drivers/PCI would be a more appropriate classification.

I'm guessing this might be the motherboard here: MSI ZH77A-G43

Since you're apparently trying to use VT-d on this system for KVM and therefore presumably device assignment, I'll note that you will never be able to successfully assign the conventional PCI devices separately between guests or between host and guests. The IOMMU does not have the granularity to create separate IOMMU domains per PCI slot in this topology. Also, some (all?) Realtek NICs have some strange backdoors to PCI configuration space that make them poor targets for PCI device assignment:
http://git.qemu.org/?p=qemu.git;a=commit;h=4cb47d281a995cb49e4652cb26bafb3ab2d9bd28

(In reply to Alex Williamson from comment #6)
> Are these 3 separate NICs plugged into PCI slots on the motherboard or is
> this a single triple-port card with embedded PCIe-to-PCI bridge?

They are 3 separate NICs plugged into 3 separate PCI slots.

> You might be able to run the IOMMU in passthrough mode with iommu=pt
> r8169.use_dac=1, but note the warning in modinfo "use_dac:Enable PCI DAC.
> Unsafe on 32 bit PCI slot." Unfortunately if you don't enable use_dac, then
> intel_iommu will ignore the passthrough option for these devices.

I tried using "intel_iommu=pt", but it didn't work (it resulted in VT-d being disabled). With "intel_iommu=on iommu=pt", however, the errors remain (probably because I didn't add "r8169.use_dac=1"). I'm on a 64 bit system, but I think that has nothing to do with "32 bit PCI slot".

> Also note that this problem has nothing to do with Virtualization/KVM.
> Drivers/Network or perhaps Drivers/PCI would be a more appropriate
> classification.

I searched for an "IOMMU" section, but it doesn't exist. I will probably change the classification to "Drivers/PCI".
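The boot options discussed above ("intel_iommu=on", "iommu=pt", "r8169.use_dac=1") are easy to get subtly wrong, as the "intel_iommu=pt" attempt shows. The following is only a minimal sketch for checking which of them are present on the running kernel's command line; the function names and the SUGGESTED table are made up for this illustration, and it says nothing about whether the options actually cure the faults or whether use_dac is safe on a given slot.

```python
from typing import Dict, List

# Options named in this thread. Assumption: this exact set is what one wants
# to test; the thread does not confirm that it resolves the DMAR faults.
SUGGESTED = {"intel_iommu": "on", "iommu": "pt", "r8169.use_dac": "1"}


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into key/value pairs (bare flags map to "")."""
    params: Dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def missing_options(cmdline: str) -> List[str]:
    """Return the suggested options that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in SUGGESTED.items() if params.get(k) != v]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as fh:
        print(missing_options(fh.read()) or "all suggested options present")
```

For example, a command line containing only "intel_iommu=on iommu=pt" would be reported as missing "r8169.use_dac=1", which matches the situation described in the reply above.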
(In reply to Alex Williamson from comment #7)
> I'm guessing this might be the motherboard here: MSI ZH77A-G43

Yes, that is my motherboard.

> Since you're apparently trying to use VT-d on this system for KVM and
> therefore presumably device assignment, I'll note that you will never be
> able to successfully assign the conventional PCI devices separately between
> guests or between host and guests. The IOMMU does not have the granularity
> to create separate IOMMU domains per PCI slot in this topology. Also, some
> (all?) Realtek NICs have some strange backdoors to PCI configuration space
> that make them poor targets for PCI device assignment:

Yes, I'm trying to do device assignment, but not with those NICs: I want to pass only the nVidia PCIe VGA card to the guest, while all NICs (and the integrated VGA card) remain available to the host. It would be nice if there were a way to disable the IOMMU for these NICs (or something like that).

SIDE NOTE: the qemu commit talks about RTL8168, but I have real RTL8169 devices (the only RTL8168 device is the integrated NIC, and for that device I'm using the r8168 driver from Realtek, compiled by hand).

If you are using an out of tree driver, then please take the bug up with the supplier. If you can duplicate it with the in-tree driver then please re-open the bug.

The problem is related to the 00:05.0 device (Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller), which actually uses the in-tree r8169 driver. The out of tree r8168 driver is used by the 00:03.0 device (Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller), which has nothing to do with the issue. For testing purposes I also tried using only the in-tree r8169 driver for all devices, but the problem persists.

The problem also persists with Linux 3.16.2.
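Since the comparison between kernels in this report rests on how fast the fault count grows and on the observation that every fault is a read of one address by one requester, a small tally script can make that comparison less impressionistic. This is a rough sketch that assumes the message layout shown in the dmesg excerpt earlier in this report; the function name and regular expression are illustrative, and other kernel versions may format these messages differently.

```python
import re
import sys
from collections import Counter

# Matches the fault message as it appears in the dmesg excerpt in this report.
# \s+ also spans line breaks, because the kernel prints the fault reason on
# the line after the fault address.
FAULT_RE = re.compile(
    r"DMAR:\[(?P<kind>DMA (?:Read|Write))\]\s+Request device\s+"
    r"\[(?P<dev>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\]\s+"
    r"fault addr\s+(?P<addr>[0-9a-f]+)\s+"
    r"DMAR:\[fault reason\s+(?P<reason>\d+)\]"
)


def tally_faults(log_text: str) -> Counter:
    """Count faults per (access kind, requester, address, reason code)."""
    counts: Counter = Counter()
    for m in FAULT_RE.finditer(log_text):
        key = (m.group("kind"), m.group("dev"), m.group("addr"), int(m.group("reason")))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Intended use: dmesg | python3 this_script.py
    for (kind, dev, addr, reason), n in tally_faults(sys.stdin.read()).most_common():
        print(f"{n:6d}  {kind}  device {dev}  addr {addr}  reason {reason:02d}")
```

Running it at a fixed interval after boot on each kernel gives comparable counts, and a single output row would confirm that all faults come from the same requester ID and address.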