Bug 220031

Summary: Which config will cause "idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4"
Product: Drivers
Reporter: shangsong (shangsong2)
Component: Other
Assignee: drivers_other
Status: NEW
Severity: normal
CC: baolu.lu, dave.jiang, dave.jiang, tiwai, vcgomes, vkoul
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: config file of SLES and ubuntu
dmesg log
config and dmesg file of kernel-6.14.3-2.g493ad7
lspci_-vvv
config and dmesg file about comment 16 change

Description shangsong 2025-04-18 08:17:43 UTC
Created attachment 307983
config file of SLES and ubuntu

After compiling kernel v6.14 on SLES 15, idxd driver failures appear in dmesg, but the same messages do not appear with kernel v6.14 on Ubuntu 24+. Please help figure out which config option causes this failure.

Failure message:
# dmesg|grep -i idxd
[   75.442782] [  T1732] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[   75.481254] [  T1732] idxd 0000:6a:01.0: failed to attach device pasid 1, domain type 4
[   75.481300] [  T1732] idxd 0000:6a:01.0: No in-kernel DMA with PASID. -22
[   75.570500] [  T1732] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[   75.570676] [  T1732] idxd 0000:74:02.0: enabling device (0140 -> 0142)
[   75.612132] [  T1732] idxd 0000:74:02.0: failed to attach device pasid 1, domain type 4
[   75.612149] [  T1732] idxd 0000:74:02.0: No in-kernel DMA with PASID. -22
[   75.744284] [  T1732] idxd 0000:74:02.0: Intel(R) Accelerator Device (v100)
[   75.744879] [  T1738] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[   75.777447] [  T1738] idxd 0000:e7:01.0: failed to attach device pasid 1, domain type 4
[   75.777472] [  T1738] idxd 0000:e7:01.0: No in-kernel DMA with PASID. -22
[   76.348407] [  T1738] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[   76.348545] [  T1738] idxd 0000:f1:02.0: enabling device (0140 -> 0142)
[   76.392729] [  T1738] idxd 0000:f1:02.0: failed to attach device pasid 1, domain type 4
[   76.392749] [  T1738] idxd 0000:f1:02.0: No in-kernel DMA with PASID. -22
[   76.485552] [  T1738] idxd 0000:f1:02.0: Intel(R) Accelerator Device (v100)
[   77.231379] [  T3739] idxd: crypto: iaa_crypto now ENABLED
Comment 1 shangsong 2025-04-21 03:11:40 UTC
Created attachment 308001
dmesg log
Comment 2 shangsong 2025-04-21 09:09:20 UTC
Created attachment 308003
config and dmesg file of kernel-6.14.3-2.g493ad7
Comment 3 Dave Jiang 2025-04-21 15:12:37 UTC
To enable kernel PASID, you need to have at least IOMMU legacy mode turned on: "intel_iommu=on" on the kernel command line. To enable user PASID as well, you need to be in IOMMU scalable mode: "intel_iommu=on,sm_on". The above is not exactly an error. It just means you are running without IOMMU support, so the DMA device's capabilities are limited.
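A minimal sketch for checking which mode a running system's command line requests, assuming the text is read from /proc/cmdline. `parse_intel_iommu()` and `struct iommu_cmdline` are hypothetical helpers, not kernel code. The sketch only recognizes the `on`, `off`, `sm_on` and `sm_off` values (later values override earlier ones), and it does not account for Kconfig defaults such as CONFIG_INTEL_IOMMU_DEFAULT_ON or whether VT-d is enabled in the BIOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* What the intel_iommu= option on a command line asks for (sketch only). */
struct iommu_cmdline {
	bool iommu_on;  /* "on": IOMMU enabled */
	bool scalable;  /* "sm_on": scalable mode requested */
};

static bool is_sep(char c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/* Apply one comma-separated value; later values override earlier ones. */
static void apply_value(struct iommu_cmdline *res, const char *v, size_t len)
{
	if (len == 2 && strncmp(v, "on", len) == 0)
		res->iommu_on = true;
	else if (len == 3 && strncmp(v, "off", len) == 0)
		res->iommu_on = false;
	else if (len == 5 && strncmp(v, "sm_on", len) == 0)
		res->scalable = true;
	else if (len == 6 && strncmp(v, "sm_off", len) == 0)
		res->scalable = false;
	/* Anything else (e.g. igfx_off) is ignored by this sketch. */
}

/* Hypothetical helper: scan a command line such as /proc/cmdline. */
static struct iommu_cmdline parse_intel_iommu(const char *cmdline)
{
	static const char key[] = "intel_iommu=";
	const size_t klen = sizeof(key) - 1;
	struct iommu_cmdline res = { false, false };
	const char *p = cmdline;

	assert(cmdline != NULL);
	while (*p) {
		/* Skip separators, then find the end of the current token. */
		while (is_sep(*p))
			p++;
		const char *end = p;
		while (*end && !is_sep(*end))
			end++;

		if ((size_t)(end - p) > klen && strncmp(p, key, klen) == 0) {
			/* Walk the comma-separated values after "intel_iommu=". */
			const char *v = p + klen;
			while (v < end) {
				const char *comma = memchr(v, ',', (size_t)(end - v));
				const char *vend = comma ? comma : end;

				apply_value(&res, v, (size_t)(vend - v));
				v = comma ? comma + 1 : end;
			}
		}
		p = end;
	}
	return res;
}
```

With the command line from the SLES dmesg log, this reports both `iommu_on` and `scalable` as true, so the command line alone does not explain the failure.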
Comment 4 shangsong 2025-04-22 01:37:20 UTC
(In reply to Dave Jiang from comment #3)
> To enable kernel PASID, you need to have at least IOMMU legacy mode turned
> on: "intel_iommu=on" on the kernel command line. To enable user PASID as
> well, you need to be in IOMMU scalable mode: "intel_iommu=on,sm_on". The
> above is not exactly an error. It just means you are running without IOMMU
> support, so the DMA device's capabilities are limited.

Hi Dave,
From the dmesg_kernel-6.14.3-2.g493ad7 log, the boot parameter "intel_iommu=on,sm_on" is set, and "no5lvl" is additionally set for SVA.

[    0.000000] [      T0] Linux version 6.14.3-2.g493ad77-default (geeko@buildhost) (gcc (SUSE Linux) 14.2.1 20250220 [revision 9ffecde121af883b60bbe60d00425036bc873048], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.43.1.20241209-5) #1 SMP PREEMPT_DYNAMIC Mon Apr 21 06:23:20 UTC 2025 (493ad77)
[    0.000000] [      T0] Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.3-2.g493ad77-default root=UUID=b85cfef7-216b-4167-9ed2-2eeaec961951 console=tty0 console=ttyS0 ignore_loglevel debug BOOT_IMAGE=(hd0)/boot/x86_64/loader/linux mitigations=auto intel_iommu=on,sm_on no5lvl quiet security=selinux selinux=1 enforcing=1
Comment 5 shangsong 2025-04-22 01:43:20 UTC
Hi Dave,
The failure is not seen on RHEL/Ubuntu with the same boot parameters (intel_iommu=on,sm_on no5lvl), so I think the difference may come from the kernel config. Could you help confirm whether my thought is correct, or investigate which config/setting causes the different behavior?
Comment 6 Dave Jiang 2025-04-22 01:44:48 UTC
Can you provide your 'lspci -vvv' for the failed machine? I'm also assuming that VT-d is enabled via BIOS on that platform?
Comment 7 Dave Jiang 2025-04-22 01:46:21 UTC
I'll check with Lu Baolu as well regarding upstream IOMMU changes. I'm actually covering the upstream idxd driver for Vinicius; he's the owner, but he's on a 4-week sabbatical at the moment.
Comment 8 Dave Jiang 2025-04-22 01:47:18 UTC
(In reply to shangsong from comment #5)
> Hi Dave,
> The failure is not seen on RHEL/Ubuntu with the same boot parameters
> (intel_iommu=on,sm_on no5lvl), so I think the difference may come from
> the kernel config. Could you help confirm whether my thought is correct,
> or investigate which config/setting causes the different behavior?

I don't suppose you know which kernel version started showing this behavior?
Comment 9 shangsong 2025-04-22 01:49:25 UTC
Created attachment 308008
lspci_-vvv
Comment 10 shangsong 2025-04-22 01:53:34 UTC
(In reply to Dave Jiang from comment #8)
> (In reply to shangsong from comment #5)
> > Hi Dave,
> > The failure is not seen on RHEL/Ubuntu with the same boot parameters
> > (intel_iommu=on,sm_on no5lvl), so I think the difference may come from
> > the kernel config. Could you help confirm whether my thought is correct,
> > or investigate which config/setting causes the different behavior?
> 
> I don't suppose you know which kernel version started showing this behavior?

I tried compiling upstream kernel v6.14 on both SLES and Ubuntu 24.04. With Ubuntu + v6.14, dmesg does not contain the failure, but on SLES it does, with the same boot parameters.
I suppose the difference is in the kernel config, but I do not know which config option causes it.
Comment 11 shangsong 2025-04-22 01:55:00 UTC
(In reply to Dave Jiang from comment #6)
> Can you provide your 'lspci -vvv' for the failed machine? I'm also assuming
> that VT-d is enabled via BIOS on that platform?

Please check attached "lspci_-vvv".
Comment 12 Dave Jiang 2025-04-22 01:55:52 UTC
(In reply to shangsong from comment #10)
> I tried compiling upstream kernel v6.14 on both SLES and Ubuntu 24.04.
> With Ubuntu + v6.14, dmesg does not contain the failure, but on SLES it
> does, with the same boot parameters.
> I suppose the difference is in the kernel config, but I do not know which
> config option causes it.

Can you attach your kernel config file? I'll take a look.
Comment 13 shangsong 2025-04-22 01:59:50 UTC
(In reply to Dave Jiang from comment #12)
> (In reply to shangsong from comment #10)
> > I tried compiling upstream kernel v6.14 on both SLES and Ubuntu 24.04.
> > With Ubuntu + v6.14, dmesg does not contain the failure, but on SLES it
> > does, with the same boot parameters.
> > I suppose the difference is in the kernel config, but I do not know which
> > config option causes it.
> 
> Can you attach your kernel config file? I'll take a look.

Please check attachment "config file of SLES and ubuntu"
Comment 14 Dave Jiang 2025-04-22 02:16:45 UTC
I see that CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y is set in the SLES config but not in the Ubuntu one. Can you check whether that is the difference?
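To compare the two configs on this one point without diffing whole files, a sketch like the following reports which entry of the "IOMMU default domain type" choice (CONFIG_IOMMU_DEFAULT_DMA_STRICT, CONFIG_IOMMU_DEFAULT_DMA_LAZY, CONFIG_IOMMU_DEFAULT_PASSTHROUGH) is set to y. `iommu_default_from_config()` and the enum are hypothetical, and the .config text is assumed to already be loaded into memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum iommu_default {
	IOMMU_DEF_UNKNOWN,
	IOMMU_DEF_DMA_STRICT,
	IOMMU_DEF_DMA_LAZY,
	IOMMU_DEF_PASSTHROUGH,
};

/* True if the line [line, line + len) is exactly `want`. */
static bool line_is(const char *line, size_t len, const char *want)
{
	return strlen(want) == len && strncmp(line, want, len) == 0;
}

/* Hypothetical helper: scan .config text line by line for the choice entry. */
static enum iommu_default iommu_default_from_config(const char *cfg)
{
	enum iommu_default res = IOMMU_DEF_UNKNOWN;
	const char *p = cfg;

	assert(cfg != NULL);
	while (*p) {
		const char *nl = strchr(p, '\n');
		size_t len = nl ? (size_t)(nl - p) : strlen(p);

		/* "# CONFIG_... is not set" lines never match, by design. */
		if (line_is(p, len, "CONFIG_IOMMU_DEFAULT_DMA_STRICT=y"))
			res = IOMMU_DEF_DMA_STRICT;
		else if (line_is(p, len, "CONFIG_IOMMU_DEFAULT_DMA_LAZY=y"))
			res = IOMMU_DEF_DMA_LAZY;
		else if (line_is(p, len, "CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y"))
			res = IOMMU_DEF_PASSTHROUGH;

		p = nl ? nl + 1 : p + len;
	}
	return res;
}
```

Without rebuilding, the built-in default can also be overridden at boot with `iommu.passthrough=0` or `iommu.passthrough=1`, which may be a quicker way to test the theory.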
Comment 15 shangsong 2025-04-22 03:31:44 UTC
(In reply to Dave Jiang from comment #14)
> I see that CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y is set in the SLES config but
> not in the Ubuntu one. Can you check whether that is the difference?

After modifying the SLES config as follows, the failure message disappears (same as on Ubuntu):
     CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
     # CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set

 # dmesg |grep -i idxd
[  252.447172] [   T1758] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[  252.573623] [   T1758] idxd 0000:6a:01.0: DBG: attach device pasid 1, domain type 11, ret is 0
[  252.869690] [   T1758] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[  253.148534] [   T1758] idxd 0000:74:02.0: enabling device (0140 -> 0142)
[  253.789340] [   T1758] idxd 0000:74:02.0: DBG: attach device pasid 2, domain type 11, ret is 0
[  254.160091] [   T1758] idxd 0000:74:02.0: Intel(R) Accelerator Device (v100)
[  254.390963] [    T409] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[  254.615063] [    T409] idxd 0000:e7:01.0: DBG: attach device pasid 3, domain type 11, ret is 0
[  254.789783] [    T409] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[  254.951992] [    T409] idxd 0000:f1:02.0: enabling device (0140 -> 0142)
[  255.161344] [    T409] idxd 0000:f1:02.0: DBG: attach device pasid 4, domain type 11, ret is 0
[  255.315794] [    T409] idxd 0000:f1:02.0: Intel(R) Accelerator Device (v100)
[  266.856727] [   T2660] idxd: crypto: iaa_crypto now ENABLED
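The failing and passing logs differ in the domain type used for the PASID attach: 4 with the passthrough default, 11 with the lazy DMA default. Decoding these numbers with the domain flag bits from include/linux/iommu.h (transcribed by hand below, so please double-check them against your own tree) gives 4 = IOMMU_DOMAIN_IDENTITY and 11 = IOMMU_DOMAIN_DMA_FQ. The helper names are hypothetical, and the "DBG:" lines appear to come from a local debug print rather than upstream code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Domain flag bits, transcribed from include/linux/iommu.h (verify locally). */
#define DOM_PAGING  (1U << 0)  /* supports iommu_map/unmap */
#define DOM_DMA_API (1U << 1)  /* used by the DMA API */
#define DOM_PT      (1U << 2)  /* identity mapped (passthrough) */
#define DOM_DMA_FQ  (1U << 3)  /* DMA API with flush queue (lazy) */
#define DOM_SVA     (1U << 4)  /* shared process address space */

/* Map a numeric domain type to the kernel's IOMMU_DOMAIN_* name. */
static const char *domain_type_name(unsigned int type)
{
	switch (type) {
	case 0:                                   return "IOMMU_DOMAIN_BLOCKED";
	case DOM_PT:                              return "IOMMU_DOMAIN_IDENTITY";
	case DOM_PAGING:                          return "IOMMU_DOMAIN_UNMANAGED";
	case DOM_PAGING | DOM_DMA_API:            return "IOMMU_DOMAIN_DMA";
	case DOM_PAGING | DOM_DMA_API | DOM_DMA_FQ: return "IOMMU_DOMAIN_DMA_FQ";
	case DOM_SVA:                             return "IOMMU_DOMAIN_SVA";
	default:                                  return "unknown";
	}
}

/* Extract the number after "domain type " from a dmesg line; -1 if absent. */
static long domain_type_from_log(const char *line)
{
	static const char marker[] = "domain type ";
	const char *p;

	assert(line != NULL);
	p = strstr(line, marker);
	if (p == NULL)
		return -1;
	return strtol(p + sizeof(marker) - 1, NULL, 10);
}
```

Feeding the failing line from the original report through `domain_type_from_log()` and then `domain_type_name()` yields "IOMMU_DOMAIN_IDENTITY", which points at the identity (passthrough) domain as the one being rejected.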
Comment 16 Lu Baolu 2025-04-22 06:26:41 UTC
Can you please try below change?

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cb0b993bebb4..63c9c97ccf69 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4385,6 +4385,7 @@ static struct iommu_domain identity_domain = {
                .attach_dev     = identity_domain_attach_dev,
                .set_dev_pasid  = identity_domain_set_dev_pasid,
        },
+       .owner = &intel_iommu_ops,
 };
 
 const struct iommu_ops intel_iommu_ops = {
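A minimal userspace model of why the one-line patch above helps. It assumes the 6.14 core PASID attach path rejects a domain whose `owner` is not the device's `iommu_ops` with -EINVAL, which would match the `-22` in the original log. Because `identity_domain` is a statically defined object, nothing sets its `owner` unless the initializer does; DMA domains allocated through the core presumably get `owner` filled in at allocation time, which would explain why only the passthrough (type 4) case fails. The structs and `model_attach_dev_pasid()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Trimmed stand-ins for the kernel structures (not the real layouts). */
struct iommu_ops {
	const char *name;
};

struct iommu_domain {
	unsigned int type;
	const struct iommu_ops *owner;  /* driver that owns this domain */
};

/*
 * Model of the assumed ownership check: a domain may only be attached to a
 * device PASID if it belongs to the same IOMMU driver as the device.
 */
static int model_attach_dev_pasid(const struct iommu_ops *dev_ops,
				  const struct iommu_domain *domain)
{
	assert(dev_ops != NULL && domain != NULL);
	if (domain->owner != dev_ops)
		return -EINVAL;
	return 0;
}

static const struct iommu_ops model_intel_iommu_ops = { "intel-iommu" };

/* Before the patch: static identity domain (type 4) with no .owner. */
static const struct iommu_domain identity_before = { 4, NULL };

/* After the patch: .owner = &intel_iommu_ops in the initializer. */
static const struct iommu_domain identity_after = { 4, &model_intel_iommu_ops };
```

In this model, attaching `identity_before` returns -EINVAL while `identity_after` succeeds, mirroring the before/after dmesg output in comments 15 and 17.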
Comment 17 shangsong 2025-04-22 07:16:29 UTC
(In reply to Lu Baolu from comment #16)
> Can you please try below change?
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cb0b993bebb4..63c9c97ccf69 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4385,6 +4385,7 @@ static struct iommu_domain identity_domain = {
>                 .attach_dev     = identity_domain_attach_dev,
>                 .set_dev_pasid  = identity_domain_set_dev_pasid,
>         },
> +       .owner = &intel_iommu_ops,
>  };
>  
>  const struct iommu_ops intel_iommu_ops = {


The failure also disappears with the change applied and the config below:
# CONFIG_IOMMU_DEFAULT_DMA_LAZY is not set
CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y

# dmesg |grep -i idxd
[  238.086488] [     T10] idxd 0000:6a:01.0: enabling device (0144 -> 0146)
[  238.472576] [     T10] idxd 0000:6a:01.0: DBG: attach device pasid 1, domain type 4, ret is 0
[  238.497791] [     T10] idxd 0000:6a:01.0: Intel(R) Accelerator Device (v100)
[  238.501686] [   T2341] idxd 0000:74:02.0: enabling device (0140 -> 0142)
[  238.645659] [   T2341] idxd 0000:74:02.0: DBG: attach device pasid 2, domain type 4, ret is 0
[  238.687496] [   T2341] idxd 0000:74:02.0: Intel(R) Accelerator Device (v100)
[  238.688152] [    T409] idxd 0000:e7:01.0: enabling device (0144 -> 0146)
[  238.721393] [    T409] idxd 0000:e7:01.0: DBG: attach device pasid 3, domain type 4, ret is 0
[  238.846712] [    T409] idxd 0000:e7:01.0: Intel(R) Accelerator Device (v100)
[  239.041390] [    T409] idxd 0000:f1:02.0: enabling device (0140 -> 0142)
[  239.244029] [    T409] idxd 0000:f1:02.0: DBG: attach device pasid 4, domain type 4, ret is 0
[  239.382179] [    T409] idxd 0000:f1:02.0: Intel(R) Accelerator Device (v100)
[  258.564422] [   T2278] (udev-worker)[2278]: Inserted module 'idxd'
[  258.564423] [   T2318] (udev-worker)[2318]: Inserted module 'idxd'
[  258.564514] [   T2315] (udev-worker)[2315]: Inserted module 'idxd'
[  258.564527] [   T2275] (udev-worker)[2275]: Inserted module 'idxd'
[  279.035631] [   T2778] idxd: crypto: iaa_crypto now ENABLED
Comment 18 shangsong 2025-04-22 07:17:20 UTC
Created attachment 308009
config and dmesg file about comment 16 change
Comment 19 Lu Baolu 2025-04-22 07:55:02 UTC
A fix patch has been posted here:

https://lore.kernel.org/linux-iommu/20250422075422.2084548-1-baolu.lu@linux.intel.com/