Bug 217218

Summary: Trying to boot Linux version 6.2.2 kernel with Marvell SATA controller 88SE9235
Product: IO/Storage
Reporter: jason_a69
Component: Serial ATA
Assignee: Tejun Heo (tj)
Status: ASSIGNED
Severity: normal
CC: regressions
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 6.2.2
Subsystem:
Regression: Yes
Bisected commit-id:

Description jason_a69 2023-03-20 11:05:54 UTC
The machine will not boot: the controller appears to lock up and reset itself, after which only 2 of the 4 disks connected to it are detected.

The main errors I am getting are:

dmar_fault 8 callbacks suppressed
DMAR : DRHD: handling fault status req 2
DMAR : [DMA Write NO_PASID] Request device [07.00.1] fault addr 0xfffe0000 [fault reason 0x82] Present bit in context entry is clear

Kernel version 5.15.91 works fine. I also tried 6.0.0, which failed in the same way.

Looking in the change log for 6.0
https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.0

There are quite a few iommu changes. As a result I changed /etc/default/grub from

GRUB_CMDLINE_LINUX="iommu=soft intel_iommu=on"

to

GRUB_CMDLINE_LINUX="iommu=soft intel_iommu=on iommu.forcedac=1"

which did not help.
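
To rule out the possibility that the edited GRUB_CMDLINE_LINUX never reached the booted kernel, the running command line can be compared with the expected parameters. A minimal sketch in Python, assuming the running command line is available in /proc/cmdline; the helper names parse_cmdline and missing_params are made up for illustration, and quoted values containing spaces are not handled:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Parameters without '=' map to None; if a name repeats, the last one wins.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# The parameters from the second GRUB_CMDLINE_LINUX line in this report.
expected = parse_cmdline("iommu=soft intel_iommu=on iommu.forcedac=1")


def missing_params(running_cmdline):
    """Return expected parameters that are absent or differ in the running kernel."""
    running = parse_cmdline(running_cmdline)
    return {
        name: value
        for name, value in expected.items()
        if name not in running or running[name] != value
    }
```

On the affected machine this could be called as missing_params(open("/proc/cmdline").read()); an empty dict would mean all three parameters reached the kernel.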

If I do lspci on a kernel that boots I get 

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port Desktop SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 USB controller: Etron Technology, Inc. EJ188/EJ198 USB 3.0 Host Controller
03:00.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
04:01.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
04:02.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
07:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 10)
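
The requester named in the DMAR fault can be cross-checked against this listing. The sketch below (Python) extracts the bus:device.function from the fault line and tests whether lspci reports that function. The regular expression is an assumption based only on the single fault line quoted in this report, and the function names are made up for illustration:

```python
import re

# Assumed layout, derived only from the fault line quoted in this report.
FAULT_RE = re.compile(
    r"Request device \[(?P<bus>[0-9a-fA-F]{2})[.:](?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])\]"
    r".*?fault addr (?P<addr>0x[0-9a-fA-F]+)"
    r".*?\[fault reason (?P<reason>0x[0-9a-fA-F]+)\]"
)


def parse_dmar_fault(line):
    """Return the requester address, fault address and reason, or None if no match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "bdf": f"{m['bus']}:{m['dev']}.{m['fn']}".lower(),
        "addr": int(m["addr"], 16),
        "reason": int(m["reason"], 16),
    }


def lspci_functions(lspci_text):
    """Return the set of bus:dev.fn addresses that start each lspci line."""
    pattern = r"^([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s"
    return {m.group(1).lower() for m in re.finditer(pattern, lspci_text, re.MULTILINE)}


fault = parse_dmar_fault(
    "DMAR : [DMA Write NO_PASID] Request device [07.00.1] fault addr 0xfffe0000 "
    "[fault reason 0x82] Present bit in context entry is clear"
)
listing = (
    "06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)\n"
    "07:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 10)\n"
)
requester_listed = fault["bdf"] in lspci_functions(listing)
```

With the lines from this report, the fault names function 07:00.1, whereas lspci lists only 07:00.0, so requester_listed is False.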

I have had a look at the kernel parameters which are here

https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html?highlight=iommu

I have tried a few different parameters. The only one that worked was
intel_iommu=off

Using that option would mess up my VMs so I would rather not do that.

I am sure I am just missing a kernel parameter.
Comment 1 Artem S. Tashkinov 2023-03-20 13:50:45 UTC
Your best bet will be to bisect:

https://docs.kernel.org/admin-guide/bug-bisect.html
Comment 2 Artem S. Tashkinov 2023-03-20 13:51:34 UTC
In a perfect world no kernel parameters are needed, so it's a regression which needs to be addressed/fixed.
Comment 3 jason_a69 2023-03-21 06:57:10 UTC
Ok, right, that was an interesting learning curve :)

Here is my BISECT_LOG

git bisect start
# bad: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect bad 45eb8ae5370d5df1ee8236f45df3f29103ba6e12
# good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
git bisect good dc7089468610f429e9264420c43d5a3625fd5d8b
# bad: [e8d018dd0257f744ca50a729e3d042cf2ec9da65] Linux 6.3-rc3
git bisect bad e8d018dd0257f744ca50a729e3d042cf2ec9da65
# bad: [0180290abb5ce5c870f84a00ffeda5802f641dce] Merge tag 'topic/nouveau-misc-2022-07-13-1' of git://anongit.freedesktop.org/drm/drm into drm-next
git bisect bad 0180290abb5ce5c870f84a00ffeda5802f641dce
# good: [5191290407668028179f2544a11ae9b57f0bcf07] Merge tag 'for-5.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good 5191290407668028179f2544a11ae9b57f0bcf07
# good: [b7da9c6b01cbd3e50e7611d608d46628ba8addde] Merge branch 'lan95xx-no-polling'
git bisect good b7da9c6b01cbd3e50e7611d608d46628ba8addde
# bad: [cc3c470ae4ad758b8ddad825ab199f7eaa8b0a9e] Merge tag 'arm-drivers-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad cc3c470ae4ad758b8ddad825ab199f7eaa8b0a9e
# good: [e908305fb262588471958f560eb3c6c18cc683a1] Merge tag 'checkpatch-new-alloc-check-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
git bisect good e908305fb262588471958f560eb3c6c18cc683a1
# good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
# bad: [c011dd537ffe47462051930413fed07dbdc80313] Merge tag 'arm-soc-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad c011dd537ffe47462051930413fed07dbdc80313
# good: [f7a344468105ef8c54086dfdc800e6f5a8417d3e] ASoC: max98090: Move check for invalid values before casting in max98090_put_enab_tlv()
git bisect good f7a344468105ef8c54086dfdc800e6f5a8417d3e
# good: [fbe86daca0ba878b04fa241b85e26e54d17d4229] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good fbe86daca0ba878b04fa241b85e26e54d17d4229
# good: [709c8632597c3276cd21324b0256628f1a7fd4df] xfs: rework deferred attribute operation setup
git bisect good 709c8632597c3276cd21324b0256628f1a7fd4df
# bad: [babf0bb978e3c9fce6c4eba6b744c8754fd43d8e] Merge tag 'xfs-5.19-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad babf0bb978e3c9fce6c4eba6b744c8754fd43d8e
# bad: [8b728edc5be161799434cc17e1279db2f8eabe29] Merge tag 'fs_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad 8b728edc5be161799434cc17e1279db2f8eabe29
# bad: [3f70356edf5611c28a68d8d5a9c2b442c9eb81e6] swiotlb: merge swiotlb-xen initialization into swiotlb
git bisect bad 3f70356edf5611c28a68d8d5a9c2b442c9eb81e6
# good: [f39f8d0eb081407e470396fd4cc376c526d13066] MIPS/octeon: use swiotlb_init instead of open coding it
git bisect good f39f8d0eb081407e470396fd4cc376c526d13066
# bad: [c6af2aa9ffc9763826607bc2664ef3ea4475ed18] swiotlb: make the swiotlb_init interface more useful
git bisect bad c6af2aa9ffc9763826607bc2664ef3ea4475ed18
# bad: [a3e230926708125205ffd06d3dc2175a8263ae7e] x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled
git bisect bad a3e230926708125205ffd06d3dc2175a8263ae7e
# bad: [78013eaadf696d2105982abb4018fbae394ca08f] x86: remove the IOMMU table infrastructure
git bisect bad 78013eaadf696d2105982abb4018fbae394ca08f
# first bad commit: [78013eaadf696d2105982abb4018fbae394ca08f] x86: remove the IOMMU table infrastructure

How can I help further?
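
For anyone who wants to check a log like the one above mechanically, here is a small sketch in Python that counts the good/bad verdicts and pulls out the "first bad commit" line. The function name summarize_bisect_log is made up for illustration, and the patterns assume the log layout shown in this comment:

```python
import re


def summarize_bisect_log(text):
    """Count good/bad verdicts and return the 'first bad commit' entry, if any."""
    verdicts = {"good": 0, "bad": 0}
    first_bad = None
    for line in text.splitlines():
        # Verdict commands look like "git bisect good <hash>" or "git bisect bad <hash>".
        m = re.match(r"git bisect (good|bad)\b", line)
        if m:
            verdicts[m.group(1)] += 1
            continue
        # The final comment line names the culprit with its full 40-character hash.
        m = re.match(r"# first bad commit: \[([0-9a-f]{40})\] (.*)", line)
        if m:
            first_bad = (m.group(1), m.group(2))
    return verdicts, first_bad


# The last few lines of the log in this comment.
sample = """\
# good: [f39f8d0eb081407e470396fd4cc376c526d13066] MIPS/octeon: use swiotlb_init instead of open coding it
git bisect good f39f8d0eb081407e470396fd4cc376c526d13066
# bad: [78013eaadf696d2105982abb4018fbae394ca08f] x86: remove the IOMMU table infrastructure
git bisect bad 78013eaadf696d2105982abb4018fbae394ca08f
# first bad commit: [78013eaadf696d2105982abb4018fbae394ca08f] x86: remove the IOMMU table infrastructure
"""
verdicts, first_bad = summarize_bisect_log(sample)
```

The same function can be given the whole file, for example the output of git bisect log saved to disk.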
Comment 4 jason_a69 2023-03-21 09:49:47 UTC
I went back to branch v6.2 and tried to revert change 78013eaadf696d2105982abb4018fbae394ca08f

The revert leaves these unmerged files, whose conflicts I am trying to resolve:

U	arch/x86/include/asm/dma-mapping.h
U	arch/x86/kernel/pci-dma.c
U	drivers/iommu/amd/iommu.c
U	drivers/iommu/intel/dmar.c
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-21 13:58:34 UTC
Thx for bisecting. Forwarded your report to the developer that should handle this:

https://lore.kernel.org/all/a79ea7f5-6a41-a6c9-cfec-ba01aa2a3cfa@leemhuis.info/
Comment 6 jason_a69 2023-03-21 15:37:47 UTC
Forward to Christoph Hellwig?
Comment 7 Artem S. Tashkinov 2023-03-23 19:19:42 UTC
From https://lore.kernel.org/all/20230322135406.GB16587@lst.de/


> Hi Jason,
> 
> I'm a little unresponsive right now as I'm dealing with the fallout
> of a strike tomorrow that is disrupting my travel.  So for now,
> just a quick idea off my mind:
> 
>   1) is CONFIG_GART_IOMMU enabled in your kernel
>   2) if so can you disable it and see if that makes the problem go away?

Please reply in the mailing list. I'm not sure if the developer is subscribed to this bug report.

Thanks a ton for your git bisect work.
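
The first question in the quoted mail can be answered from the kernel's build configuration. A minimal sketch in Python, assuming the configuration text uses the usual "CONFIG_X=y" / "# CONFIG_X is not set" lines; the function name config_option_state is made up for illustration, and where the configuration file lives (often /boot/config-<release>, or /proc/config.gz when enabled) depends on the distribution:

```python
import re


def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...), 'not set', or None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "not set"
        m = re.fullmatch(rf"{re.escape(option)}=(.*)", line)
        if m:
            return m.group(1)
    return None


# Hypothetical config fragments, for illustration only.
enabled = config_option_state("CONFIG_X86_64=y\nCONFIG_GART_IOMMU=y\n", "CONFIG_GART_IOMMU")
disabled = config_option_state("# CONFIG_GART_IOMMU is not set\n", "CONFIG_GART_IOMMU")
```

Here enabled is "y" and disabled is "not set"; a result of None would mean the option does not appear in that configuration at all.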
Comment 8 jason_a69 2023-03-24 13:35:51 UTC
I have emailed the developer with an update. He suggested a fix, which unfortunately did not work. I am waiting for him to get back to me.

No problem re: bisect. I am happy to help out if and when I can.