Created attachment 284511
dmesg from -rc5 showing IOMMU faults for 07:00.1
The rework of the IOMMU code in 5.3 seems to reveal a hardware problem with the HighPoint RocketRAID 642L:
07:00.0 RAID bus controller: HighPoint Technologies, Inc. RocketRAID 642L SATA-III Controller (2 eSATA ports + 2 internal SATA ports) (rev 01)
Although the device is single-function, in the 5.3 release candidates it fails with the following error:
[ 15.580858] scsi host6: ahci
[ 15.581250] DMAR: DRHD: handling fault status reg 2
[ 15.585050] scsi host7: ahci
[ 15.590041] DMAR: [DMA Write] Request device [07:00.1] fault addr fffe0000 [fault reason 02] Present bit in context entry is clear
[ 15.594238] scsi host8: ahci
[ 15.610215] scsi host9: ahci
[ 15.614111] ata7: SATA max UDMA/133 abar m2048@0xfb110000 port 0xfb110100 irq 37
[ 15.622174] ata8: SATA max UDMA/133 abar m2048@0xfb110000 port 0xfb110180 irq 37
[ 15.630243] ata9: SATA max UDMA/133 abar m2048@0xfb110000 port 0xfb110200 irq 37
[ 15.638312] ata10: SATA max UDMA/133 abar m2048@0xfb110000 port 0xfb110280 irq 37
[ 15.646632] sata_sil24 0000:06:00.0: version 1.1
[ 15.648229] scsi host10: sata_sil24
[ 15.652935] scsi host11: sata_sil24
[ 15.653544] DMAR: DRHD: handling fault status reg 102
[ 15.657427] ata11: SATA max UDMA/100 host m128@0xfb284000 port 0xfb280000 irq 17
[ 15.662851] DMAR: [DMA Write] Request device [07:00.1] fault addr fffa0000 [fault reason 02] Present bit in context entry is clear
[ 15.670974] ata12: SATA max UDMA/100 host m128@0xfb284000 port 0xfb282000 irq 17
[ 15.663541] DMAR: [DMA Write] Request device [07:00.1] fault addr fffe0000 [fault reason 02] Present bit in context entry is clear
[ 15.683347] DMAR: DRHD: handling fault status reg 300
Trial and error confirms that the device needs an IOMMU entry for both 07:00.0 and 07:00.1 to work correctly. I've created a hack that sets the IOMMU entry for 07:00.1 to the same value as the entry for 07:00.0, and things work correctly with that patch.
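The behaviour described above can be modelled with a small toy in Python. This is only an illustrative sketch, not kernel code: `ContextTable`, `map_device`, and `translate` are hypothetical names, the domain number 1 is an arbitrary placeholder, and the real VT-d context table is a hardware structure rather than a dictionary. It shows why a DMA tagged with requester ID 07:00.1 faults when only 07:00.0 has an entry, and why copying the entry makes the fault go away.

```python
def devfn(slot, func):
    """Pack a PCI slot (5 bits) and function (3 bits) into a devfn byte."""
    return ((slot & 0x1F) << 3) | (func & 0x07)


class ContextTable:
    """Toy stand-in for the IOMMU context table (hypothetical model)."""

    def __init__(self):
        self.entries = {}  # (bus, devfn) -> domain id

    def map_device(self, bus, dfn, domain):
        self.entries[(bus, dfn)] = domain

    def translate(self, bus, dfn):
        # A missing entry plays the role of
        # "Present bit in context entry is clear".
        if (bus, dfn) not in self.entries:
            raise LookupError(
                "fault reason 02 for %02x:%02x.%d" % (bus, dfn >> 3, dfn & 7)
            )
        return self.entries[(bus, dfn)]


table = ContextTable()
table.map_device(0x07, devfn(0, 0), domain=1)  # only 07:00.0 is set up

try:
    table.translate(0x07, devfn(0, 1))  # device issues DMA as 07:00.1
    faulted = False
    fault_message = ""
except LookupError as exc:
    faulted = True
    fault_message = str(exc)

# The hack: give the alias the same entry as the real function.
table.map_device(0x07, devfn(0, 1), table.translate(0x07, devfn(0, 0)))
alias_domain = table.translate(0x07, devfn(0, 1))
```

In this model the first lookup for 07:00.1 fails, and after the entry is copied the same lookup resolves to the domain of 07:00.0.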
Created attachment 284513
dmesg from -rc1 without faults
Created attachment 284515
Created attachment 284517
hack to copy IOMMU PTE 07:00.0 to 07:00.1
Created attachment 284519
dmesg from -rc5 with hack
This commit seems to remove alias handling from map/unmap, but I cannot find where else aliases would be handled.
Author: Lu Baolu <email@example.com>
Date: Tue Jul 9 13:22:45 2019 +0800
iommu/vt-d: Avoid duplicated pci dma alias consideration
As we have abandoned the home-made lazy domain allocation
and delegated the DMA domain life cycle up to the default
domain mechanism defined in the generic iommu layer, we
needn't consider pci alias anymore when mapping/unmapping
the context entries. Without this fix, we see kernel NULL
pointer dereference during pci device hot-plug test.
Cc: Ashok Raj <firstname.lastname@example.org>
Cc: Jacob Pan <email@example.com>
Cc: Kevin Tian <firstname.lastname@example.org>
Fixes: fa954e6831789 ("iommu/vt-d: Delegate the dma domain to upper layer")
Signed-off-by: Lu Baolu <email@example.com>
Reported-and-tested-by: Xu Pengfei <firstname.lastname@example.org>
Signed-off-by: Joerg Roedel <email@example.com>
Thanks for reporting.
How does this device connect to the system? The kernel messages show that the IOMMU didn't probe 07:00.0 and 07:00.1 during boot, and there is no message showing that these functions were hot-added at run time.
It is present at boot. I see it in the dmesg I uploaded, including the message about enabling the DMA quirk:
[ 8.991237] pci 0000:07:00.0: [1103:0642] type 00 class 0x010400
[ 8.993608] pci 0000:07:00.0: reg 0x10: [io 0xc050-0xc057]
[ 8.999898] pci 0000:07:00.0: reg 0x14: [io 0xc040-0xc043]
[ 9.003591] pci 0000:07:00.0: reg 0x18: [io 0xc030-0xc037]
[ 9.009880] pci 0000:07:00.0: reg 0x1c: [io 0xc020-0xc023]
[ 9.023593] pci 0000:07:00.0: reg 0x20: [io 0xc000-0xc01f]
[ 9.029883] pci 0000:07:00.0: reg 0x24: [mem 0xfb110000-0xfb1107ff]
[ 9.033591] pci 0000:07:00.0: reg 0x30: [mem 0xfb100000-0xfb10ffff pref]
[ 9.041060] pci 0000:07:00.0: Enabling fixed DMA alias to 00.1
[ 9.043646] pci 0000:07:00.0: PME# supported from D3hot
Oh, my bad.
07:00.0 was probed during boot.
[ 10.565034] pci 0000:07:00.0: Adding to iommu group 19
So how about 07:00.1?
There is no such device; see my initial description. The device uses an incorrect requester ID for some requests.
See also https://github.com/torvalds/linux/blob/master/drivers/pci/quirks.c#L3876 - there are a bunch of other pci_add_dma_alias() calls in that file.
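To make the requester ID point concrete, here is a rough Python sketch of the idea behind a DMA alias. It is a model only: `PciDevice`, `add_dma_alias`, and `requester_ids` are hypothetical names chosen for illustration and are not the kernel's API. The one hard fact used is the standard PCI encoding, where a requester ID is the bus number in the upper 8 bits followed by a 5-bit slot and a 3-bit function.

```python
def requester_id(bus, slot, func):
    """Combine bus, slot and function into a 16-bit PCI requester ID."""
    return (bus << 8) | ((slot & 0x1F) << 3) | (func & 0x07)


def format_rid(rid):
    """Render a requester ID in the bus:slot.func form used by dmesg."""
    return "%02x:%02x.%d" % (rid >> 8, (rid >> 3) & 0x1F, rid & 0x07)


class PciDevice:
    """Hypothetical model of a device plus the DMA aliases a quirk adds."""

    def __init__(self, bus, slot, func):
        self.bus, self.slot, self.func = bus, slot, func
        self.aliases = set()  # extra (slot, func) pairs on the same bus

    def add_dma_alias(self, slot, func):
        self.aliases.add((slot, func))

    def requester_ids(self):
        # Every ID listed here must be accepted by the IOMMU for this device.
        own = requester_id(self.bus, self.slot, self.func)
        extra = sorted(requester_id(self.bus, s, f) for s, f in self.aliases)
        return [own] + extra


hba = PciDevice(0x07, 0, 0)
hba.add_dma_alias(0, 1)  # mirrors "Enabling fixed DMA alias to 00.1"
rids = [format_rid(r) for r in hba.requester_ids()]
```

Under these assumptions `rids` lists both 07:00.0 and 07:00.1, which is the set of requester IDs the IOMMU setup has to cover even though only function 0 exists as a PCI device.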
Any update? I think that disks going missing when attached to Marvell-based adapters could be a ship-stopper for releasing 5.3.
I am working on this and will come up with a fix soon.