Bug 220031
| Summary: | Which config will cause "idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4" | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | shangsong (shangsong2) |
| Component: | Other | Assignee: | drivers_other |
| Status: | NEW | | |
| Severity: | normal | CC: | baolu.lu, dave.jiang, dave.jiang, tiwai, vcgomes, vkoul |
| Priority: | P3 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | config file of SLES and ubuntu; dmesg log; config and dmesg file of kernel-6.14.3-2.g493ad7; lspci_-vvv; config and dmesg file about comment 16 change | | |
Created attachment 308001 [details]
dmesg log
Created attachment 308003 [details]
config and dmesg file of kernel-6.14.3-2.g493ad7
To enable kernel pasid, you need to have at least IOMMU legacy mode turned on. "intel_iommu=on" in the kernel commandline. To enable user pasid as well, you need to be in IOMMU scalable mode. "intel_iommu=on,sm_on". The above is not exactly an error. It just means you are running without IOMMU support and capabilities are limited for the DMA device.

(In reply to Dave Jiang from comment #3)
> To enable kernel pasid, you need to have at least IOMMU legacy mode turned
> on. "intel_iommu=on" in the kernel commandline. To enable user pasid as
> well, you need to be in IOMMU scalable mode. "intel_iommu=on,sm_on". The
> above is not exactly an error. It just means you are running without IOMMU
> support and capabilities are limited for the DMA device.

Hi Dave,
According to the dmesg_kernel-6.14.3-2.g493ad7 log, the boot parameter "intel_iommu=on,sm_on" has been set, and "no5lvl" has additionally been set for SVA.

[ 0.000000] [ T0] Linux version 6.14.3-2.g493ad77-default (geeko@buildhost) (gcc (SUSE Linux) 14.2.1 20250220 [revision 9ffecde121af883b60bbe60d00425036bc873048], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.43.1.20241209-5) #1 SMP PREEMPT_DYNAMIC Mon Apr 21 06:23:20 UTC 2025 (493ad77)
[ 0.000000] [ T0] Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.3-2.g493ad77-default root=UUID=b85cfef7-216b-4167-9ed2-2eeaec961951 console=tty0 console=ttyS0 ignore_loglevel debug BOOT_IMAGE=(hd0)/boot/x86_64/loader/linux mitigations=auto intel_iommu=on,sm_on no5lvl quiet security=selinux selinux=1 enforcing=1

Hi Dave,
The failure is not seen on RHEL/Ubuntu with the same boot parameters (intel_iommu=on,sm_on no5lvl), so I think the difference may come from the kernel config. Could you help confirm whether that is correct, or investigate which config/setting causes the different behavior?

Can you provide your 'lspci -vvv' for the failed machine? I'm also assuming that VT-d is enabled via BIOS on that platform?

I'll check with Lu, Baolu as well WRT changes with upstream IOMMU.
I'm actually covering upstream for Vinicius regarding the idxd driver. He's the owner but he's on a 4-week sabbatical at the moment.

(In reply to shangsong from comment #5)
> Hi Dave,
> The failure is not seen on RHEL/Ubuntu with the same boot parameters
> (intel_iommu=on,sm_on no5lvl), so I think the difference may come from the
> kernel config. Could you help confirm whether that is correct, or
> investigate which config/setting causes the different behavior?

I don't suppose you know which kernel version started showing this behavior?

Created attachment 308008 [details]
lspci_-vvv
(In reply to Dave Jiang from comment #8)
> (In reply to shangsong from comment #5)
> > Hi Dave,
> > The failure is not seen on RHEL/Ubuntu with the same boot parameters
> > (intel_iommu=on,sm_on no5lvl), so I think the difference may come from the
> > kernel config. Could you help confirm whether that is correct, or
> > investigate which config/setting causes the different behavior?
>
> I don't suppose you know which kernel version started showing this behavior?

I compiled upstream kernel v6.14 on both SLES and Ubuntu 24.04. The Ubuntu + v6.14 dmesg does not contain the failure, but SLES shows it with the same boot parameters.
I suppose the difference is in the kernel config, but I do not know which option affects this.

(In reply to Dave Jiang from comment #6)
> Can you provide your 'lspci -vvv' for the failed machine? I'm also assuming
> that VT-d is enabled via BIOS on that platform?

Please check the attached "lspci_-vvv".

(In reply to shangsong from comment #10)
> I compiled upstream kernel v6.14 on both SLES and Ubuntu 24.04. The Ubuntu +
> v6.14 dmesg does not contain the failure, but SLES shows it with the same
> boot parameters.
> I suppose the difference is in the kernel config, but I do not know which
> option affects this.

Can you attach your kernel config file? I'll take a look.

(In reply to Dave Jiang from comment #12)
> (In reply to shangsong from comment #10)
> > I compiled upstream kernel v6.14 on both SLES and Ubuntu 24.04. The Ubuntu +
> > v6.14 dmesg does not contain the failure, but SLES shows it with the same
> > boot parameters.
> > I suppose the difference is in the kernel config, but I do not know which
> > option affects this.
>
> Can you attach your kernel config file? I'll take a look.

Please check the attachment "config file of SLES and ubuntu".

I see that CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y is set in the SLES config but not in the Ubuntu one. Can you see if that is the difference?

(In reply to Dave Jiang from comment #14)
> I see that CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y is set in the SLES config but
> not in the Ubuntu one. Can you see if that is the difference?

After changing the SLES config to the following, the failure message disappears (same as on Ubuntu):

CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set

# dmesg |grep -i idxd
[ 252.447172] [ T1758] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[ 252.573623] [ T1758] idxd 0000:6a:01.0: DBG: attach device pasid 1, domain type 11, ret is 0
[ 252.869690] [ T1758] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[ 253.148534] [ T1758] idxd 0000:74:02.0: enabling device (0140 -> 0142)
[ 253.789340] [ T1758] idxd 0000:74:02.0: DBG: attach device pasid 2, domain type 11, ret is 0
[ 254.160091] [ T1758] idxd 0000:74:02.0: Intel(R) Accelerator Device (v100)
[ 254.390963] [ T409] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[ 254.615063] [ T409] idxd 0000:e7:01.0: DBG: attach device pasid 3, domain type 11, ret is 0
[ 254.789783] [ T409] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[ 254.951992] [ T409] idxd 0000:f1:02.0: enabling device (0140 -> 0142)
[ 255.161344] [ T409] idxd 0000:f1:02.0: DBG: attach device pasid 4, domain type 11, ret is 0
[ 255.315794] [ T409] idxd 0000:f1:02.0: Intel(R) Accelerator Device (v100)
[ 266.856727] [ T2660] idxd: crypto: iaa_crypto now ENABLED

Can you please try below change?

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cb0b993bebb4..63c9c97ccf69 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4385,6 +4385,7 @@ static struct iommu_domain identity_domain = {
 		.attach_dev	= identity_domain_attach_dev,
 		.set_dev_pasid	= identity_domain_set_dev_pasid,
 	},
+	.owner = &intel_iommu_ops,
 };
 
 const struct iommu_ops intel_iommu_ops = {

(In reply to Lu Baolu from comment #16)
> Can you please try below change?
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cb0b993bebb4..63c9c97ccf69 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4385,6 +4385,7 @@ static struct iommu_domain identity_domain = {
>  		.attach_dev	= identity_domain_attach_dev,
>  		.set_dev_pasid	= identity_domain_set_dev_pasid,
>  	},
> +	.owner = &intel_iommu_ops,
>  };
>
>  const struct iommu_ops intel_iommu_ops = {

The failure also disappears with this change and the config below:

# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y

# dmesg |grep -i idxd
[ 238.086488] [ T10] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[ 238.472576] [ T10] idxd 0000:6a:01.0: DBG: attach device pasid 1, domain type 4, ret is 0
[ 238.497791] [ T10] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[ 238.501686] [ T2341] idxd 0000:74:02.0: enabling device (0140 -> 0142)
[ 238.645659] [ T2341] idxd 0000:74:02.0: DBG: attach device pasid 2, domain type 4, ret is 0
[ 238.687496] [ T2341] idxd 0000:74:02.0: Intel(R) Accelerator Device (v100)
[ 238.688152] [ T409] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[ 238.721393] [ T409] idxd 0000:e7:01.0: DBG: attach device pasid 3, domain type 4, ret is 0
[ 238.846712] [ T409] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[ 239.041390] [ T409] idxd 0000:f1:02.0: enabling device (0140 -> 0142)
[ 239.244029] [ T409] idxd 0000:f1:02.0: DBG: attach device pasid 4, domain type 4, ret is 0
[ 239.382179] [ T409] idxd 0000:f1:02.0: Intel(R) Accelerator Device (v100)
[ 258.564422] [ T2278] (udev-worker)[2278]: Inserted module 'idxd'
[ 258.564423] [ T2318] (udev-worker)[2318]: Inserted module 'idxd'
[ 258.564514] [ T2315] (udev-worker)[2315]: Inserted module 'idxd'
[ 258.564527] [ T2275] (udev-worker)[2275]: Inserted module 'idxd'
[ 279.035631] [ T2778] idxd: crypto: iaa_crypto now ENABLED

Created attachment 308009 [details]
config and dmesg file about comment 16 change

A fix patch has been posted here:
https://lore.kernel.org/linux-iommu/20250422075422.2084548-1-baolu.lu@linux.intel.com/
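
The config comparison that narrowed this down can be approximated with a short script. This is only a minimal sketch, not a tool used in this bug: the function names `parse_config` and `diff_options` are hypothetical, and the two inline snippets simply repeat the option lines quoted in the comments above rather than the full attached config files.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; explicitly unset options map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


def diff_options(a, b, prefix="CONFIG_IOMMU_"):
    """Return {option: (value_in_a, value_in_b)} for differing options.

    Only options starting with `prefix` are compared. An option missing
    from a file is treated the same as "is not set" (None).
    """
    names = {n for n in set(a) | set(b) if n.startswith(prefix)}
    return {
        n: (a.get(n), b.get(n))
        for n in sorted(names)
        if a.get(n) != b.get(n)
    }


# Option lines as quoted in this report (not the full config files).
passthrough_cfg = (
    "# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set\n"
    "CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y\n"
)
lazy_cfg = (
    "CONFIG_IOMMU_DEFAULT_DMA_LAZY=y\n"
    "# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set\n"
)

differences = diff_options(parse_config(passthrough_cfg), parse_config(lazy_cfg))
for name, (left, right) in differences.items():
    print(f"{name}: {left!r} vs {right!r}")
```

To run it against real files, read each config with `open(path).read()` and pass the text to `parse_config`. Restricting the comparison to the `CONFIG_IOMMU_` prefix keeps the output short; widening the prefix to `CONFIG_` would list every difference between the two distributions.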
Created attachment 307983 [details]
config file of SLES and ubuntu

After compiling kernel v6.14 on SLES 15, some idxd driver failures appear in dmesg. The failure message does not appear on Ubuntu 24 with kernel v6.14. Please help figure out which config option causes this failure.

Failure message:
# dmesg|grep -i idxd
[ 75.442782] [ T1732] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[ 75.481254] [ T1732] idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4
[ 75.481300] [ T1732] idxd 0000:6a:01.0: No in-kernel DMA with PASID. -22
[ 75.570500] [ T1732] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[ 75.570676] [ T1732] idxd 0000:74:02.0: enabling device (0140 -> 0142)
[ 75.612132] [ T1732] idxd 0000:74:02.0: failed to attach device pasid 1, domain type 4
[ 75.612149] [ T1732] idxd 0000:74:02.0: No in-kernel DMA with PASID. -22
[ 75.744284] [ T1732] idxd 0000:74:02.0: Intel(R) Accelerator Device (v100)
[ 75.744879] [ T1738] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[ 75.777447] [ T1738] idxd 0000:e7:01.0: failed to attach device pasid 1, domain type 4
[ 75.777472] [ T1738] idxd 0000:e7:01.0: No in-kernel DMA with PASID. -22
[ 76.348407] [ T1738] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[ 76.348545] [ T1738] idxd 0000:f1:02.0: enabling device (0140 -> 0142)
[ 76.392729] [ T1738] idxd 0000:f1:02.0: failed to attach device pasid 1, domain type 4
[ 76.392749] [ T1738] idxd 0000:f1:02.0: No in-kernel DMA with PASID. -22
[ 76.485552] [ T1738] idxd 0000:f1:02.0: Intel(R) Accelerator Device (v100)
[ 77.231379] [ T3739] idxd: crypto: iaa_crypto now ENABLED
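
For triage on other machines, the failure signature in the log above can be extracted mechanically. The following is a rough sketch under stated assumptions: the `scan` and `decode_domain_type` helpers are hypothetical, the regular expressions only match the two message formats shown in this report, and the domain-type flag bits are my reading of `include/linux/iommu.h` and should be checked against the kernel tree in use.

```python
import errno
import re

# Domain type flag bits (assumption: taken from include/linux/iommu.h;
# verify against the kernel tree you are running).
DOMAIN_FLAGS = {1: "PAGING", 2: "DMA_API", 4: "PT", 8: "DMA_FQ"}

ATTACH_FAIL = re.compile(
    r"idxd (\S+): failed to attach device pasid (\d+), domain type (\d+)"
)
NO_PASID = re.compile(r"idxd (\S+): No in-kernel DMA with PASID\. (-?\d+)")


def decode_domain_type(value):
    """Split a numeric domain type into the names of the flag bits it sets."""
    return [name for bit, name in DOMAIN_FLAGS.items() if value & bit]


def scan(log):
    """Return {device: details} for devices whose PASID attach failed."""
    failures = {}
    for line in log.splitlines():
        m = ATTACH_FAIL.search(line)
        if m:
            failures[m.group(1)] = {
                "pasid": int(m.group(2)),
                "domain": decode_domain_type(int(m.group(3))),
            }
            continue
        m = NO_PASID.search(line)
        if m and m.group(1) in failures:
            # The driver prints a negative errno; map it to its symbolic name.
            code = -int(m.group(2))
            failures[m.group(1)]["errno"] = errno.errorcode.get(code, str(code))
    return failures


# Three lines copied from the failure log in this report.
sample = """\
[ 75.481254] [ T1732] idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4
[ 75.481300] [ T1732] idxd 0000:6a:01.0: No in-kernel DMA with PASID. -22
[ 75.570500] [ T1732] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
"""

for device, details in scan(sample).items():
    print(device, details)
```

Under these assumptions, domain type 4 decodes to the passthrough (identity) bit and the trailing -22 to `EINVAL`, while the value 11 seen with `CONFIG_IOMMU_DEFAULT_DMA_LAZY=y` decodes to `PAGING`, `DMA_API` and `DMA_FQ`.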