Bug 219765 - Regression: VFIO NVIDIA GPU Passthrough Fails in Linux 6.13.0 (GPU Reset & Audio Controller Disappears)
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All
OS: Linux
Importance: P3 high
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-02-09 07:21 UTC by Joel Mathew Thomas
Modified: 2025-04-10 18:21 UTC
CC List: 4 users

See Also:
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: 665745f274870c921020f610e2c99a3b1613519b


Attachments
dmesg logs for the kernel in which gpu passthrough works (98.07 KB, text/plain)
2025-02-09 07:21 UTC, Joel Mathew Thomas
Details
dmesg logs for the kernel in which gpu passthrough does not work (3.88 KB, text/plain)
2025-02-09 07:22 UTC, Joel Mathew Thomas
Details
dmesg logs for the kernel in which gpu passthrough works (2.99 KB, text/plain)
2025-02-09 07:31 UTC, Joel Mathew Thomas
Details
dmesg logs for kernel after using pcie_port_pm=off kernel parameter on kernel 6.13.2-arch1-1 (92.68 KB, text/plain)
2025-02-09 19:04 UTC, Joel Mathew Thomas
Details
git bisect log (3.15 KB, text/plain)
2025-02-11 18:05 UTC, Joel Mathew Thomas
Details
Patch containing the reverts of commits that caused the regression. (21.80 KB, patch)
2025-02-11 18:05 UTC, Joel Mathew Thomas
Details | Diff
dmesg logs when the pcie_port_pm=off parameter is not set. (3.37 KB, text/plain)
2025-02-12 10:38 UTC, Joel Mathew Thomas
Details
dmesg logs when the pcie_port_pm=off parameter is set. (2.96 KB, text/plain)
2025-02-12 10:38 UTC, Joel Mathew Thomas
Details
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts (23.90 KB, text/plain)
2025-02-12 11:54 UTC, Joel Mathew Thomas
Details
lspci_working_config_6.13.2-arch1-1 (23.26 KB, text/plain)
2025-02-12 11:54 UTC, Joel Mathew Thomas
Details
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts (96.98 KB, text/plain)
2025-02-12 12:05 UTC, Joel Mathew Thomas
Details
lspci_non_working_config_6.13.2-arch1-1 (93.42 KB, text/plain)
2025-02-12 12:06 UTC, Joel Mathew Thomas
Details
lspci_non_working_config_6.13.2-arch1-1 before vm starts (96.95 KB, text/plain)
2025-02-12 13:49 UTC, Joel Mathew Thomas
Details
Disable bwctrl during reset (5.01 KB, application/mbox)
2025-02-12 14:44 UTC, Ilpo Järvinen
Details
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668 (96.92 KB, text/plain)
2025-02-12 17:05 UTC, Joel Mathew Thomas
Details
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668 (96.96 KB, text/plain)
2025-02-12 17:06 UTC, Joel Mathew Thomas
Details
lspci_non_working_config_after_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf (93.50 KB, text/plain)
2025-02-12 17:07 UTC, Joel Mathew Thomas
Details
lspci_working_config_after_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668 (96.96 KB, text/plain)
2025-02-12 17:09 UTC, Joel Mathew Thomas
Details
lspci_non_working_config_before_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf (96.92 KB, text/plain)
2025-02-12 17:11 UTC, Joel Mathew Thomas
Details
lspci_before_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts (96.93 KB, text/plain)
2025-02-12 17:11 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts (93.52 KB, text/plain)
2025-02-12 17:12 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc2-00034-gfebbc555cf0f_no_reverts_patch_applied (3.83 KB, text/plain)
2025-02-12 17:17 UTC, Joel Mathew Thomas
Details
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out (96.93 KB, text/plain)
2025-02-13 15:08 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out (96.98 KB, text/plain)
2025-02-13 15:08 UTC, Joel Mathew Thomas
Details
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE (96.93 KB, text/plain)
2025-02-13 15:09 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE (96.98 KB, text/plain)
2025-02-13 15:10 UTC, Joel Mathew Thomas
Details
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE (96.93 KB, text/plain)
2025-02-13 15:11 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE (93.41 KB, text/plain)
2025-02-13 15:12 UTC, Joel Mathew Thomas
Details
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt (96.93 KB, text/plain)
2025-02-14 19:09 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt (93.52 KB, text/plain)
2025-02-14 19:10 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt (24.69 KB, text/plain)
2025-02-14 19:10 UTC, Joel Mathew Thomas
Details
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch (96.93 KB, text/plain)
2025-02-14 19:11 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch (96.98 KB, text/plain)
2025-02-14 19:11 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.log (21.32 KB, text/plain)
2025-02-14 19:11 UTC, Joel Mathew Thomas
Details
series patch file : regix_lnkctl2move.patch (3.39 KB, patch)
2025-02-14 19:15 UTC, Joel Mathew Thomas
Details | Diff
Improved disable BW notifications during reset patch (7.04 KB, application/mbox)
2025-02-17 12:37 UTC, Ilpo Järvinen
Details
lspci_before_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch (96.93 KB, text/plain)
2025-02-17 14:32 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch (96.98 KB, text/plain)
2025-02-17 14:32 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc3-dirty-0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch (20.93 KB, text/plain)
2025-02-17 14:32 UTC, Joel Mathew Thomas
Details
lspci_before_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch (96.93 KB, text/plain)
2025-02-17 17:50 UTC, Joel Mathew Thomas
Details
lspci_after_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch.txt (96.98 KB, text/plain)
2025-02-17 17:50 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc3-dirty_reset_regix_lnk2ctl_patch (22.41 KB, text/plain)
2025-02-17 17:51 UTC, Joel Mathew Thomas
Details
exact patch applied : reset_regix_lnkctl2move.patch (total 3 patches) (8.19 KB, patch)
2025-02-17 17:52 UTC, Joel Mathew Thomas
Details | Diff
Increase a few waits that may be related (1.25 KB, patch)
2025-03-04 14:42 UTC, Ilpo Järvinen
Details | Diff
acpidump (1.44 MB, text/plain)
2025-03-04 18:26 UTC, Joel Mathew Thomas
Details
reset method values (199 bytes, text/plain)
2025-03-04 18:27 UTC, Joel Mathew Thomas
Details
dmesg_log_6.13.5-arch1-1-pcie_port_pm_off_before_vm_boot (88.16 KB, text/plain)
2025-03-04 18:27 UTC, Joel Mathew Thomas
Details
lspci_6.13.5-arch1-1_pcie_port_pm_off_before_vm_boot (6.00 KB, text/plain)
2025-03-04 18:27 UTC, Joel Mathew Thomas
Details
dmesg_log_6.13.5-arch1-1-pcie_port_pm_off_after_vm_boot_working (107.75 KB, text/plain)
2025-03-04 18:28 UTC, Joel Mathew Thomas
Details
lspci_6.13.5-arch1-1_pcie_port_pm_off_after_vm_boot_working (6.00 KB, text/plain)
2025-03-04 18:28 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f_before_vm_boot (86.83 KB, text/plain)
2025-03-04 18:28 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f_before_vm_boot (6.02 KB, text/plain)
2025-03-04 18:29 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f_after_vm_boot_non_working (89.43 KB, text/plain)
2025-03-04 18:29 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f_after_vm_boot_non_working (2.90 KB, text/plain)
2025-03-04 18:29 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot (87.03 KB, text/plain)
2025-03-04 18:30 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot (6.02 KB, text/plain)
2025-03-04 18:30 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_after_vm_boot_non_working (90.01 KB, text/plain)
2025-03-04 18:31 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_after_vm_boot_non_working (2.90 KB, text/plain)
2025-03-04 18:31 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot (87.54 KB, text/plain)
2025-03-04 18:31 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot (6.02 KB, text/plain)
2025-03-04 18:31 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_before_vm_boot (87.03 KB, text/plain)
2025-03-04 18:35 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_before_vm_boot (6.02 KB, text/plain)
2025-03-04 18:35 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_after_vm_boot_non_working (90.01 KB, text/plain)
2025-03-04 18:35 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_after_vm_boot_non_working (2.90 KB, text/plain)
2025-03-04 18:36 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_wait_before_vm_boot (87.54 KB, text/plain)
2025-03-04 18:36 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_wait_before_vm_boot (6.02 KB, text/plain)
2025-03-04 18:37 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_wait_after_vm_boot_non_working (90.14 KB, text/plain)
2025-03-04 18:38 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_after_wait_vm_boot_non_working (2.90 KB, text/plain)
2025-03-04 18:38 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00039-g848e07631744-dirty_before_vm_boot_stacktrace (117.65 KB, text/plain)
2025-03-06 18:10 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00039-g848e07631744-dirty_before_vm_boot_stacktrace (6.02 KB, text/plain)
2025-03-06 18:10 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc5-00039-g848e07631744-dirty_after_vm_boot_non_working_stacktrace (132.93 KB, text/plain)
2025-03-06 18:11 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc5-00039-g848e07631744-dirty_after_vm_boot_non_working_stacktrace (2.90 KB, text/plain)
2025-03-06 18:11 UTC, Joel Mathew Thomas
Details
Hotplug fix (3.24 KB, patch)
2025-03-10 17:10 UTC, Ilpo Järvinen
Details | Diff
dmesg_log_6.14.0-rc6-dirty_before_vm_boot_patch_reset_hotplugfix (90.97 KB, text/plain)
2025-03-10 21:20 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-dirty_before_vm_boot_patch_reset_hotplugfix (6.02 KB, text/plain)
2025-03-10 21:20 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_hotplugfix (95.68 KB, text/plain)
2025-03-10 21:21 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_hotplugfix (6.02 KB, text/plain)
2025-03-10 21:21 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc6-dirty_before_vm_boot_patch_reset_without_atomic_and (91.60 KB, text/plain)
2025-03-10 21:21 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-dirty_before_vm_boot_patch_reset_without_atomic_and (6.02 KB, text/plain)
2025-03-10 21:22 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_without_atomic_and (96.31 KB, text/plain)
2025-03-10 21:22 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_without_atomic_and (6.02 KB, text/plain)
2025-03-10 21:22 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc6-dirty_before_vm_boot_no_reset_without_atomic_and (91.00 KB, text/plain)
2025-03-10 21:23 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-dirty_before_vm_boot_no_reset_without_atomic_and (6.02 KB, text/plain)
2025-03-10 21:23 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc6-dirty_after_vm_boot_working_no_reset_without_atomic_and (95.51 KB, text/plain)
2025-03-10 21:23 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-dirty_after_vm_boot_working_no_reset_without_atomic_and (6.02 KB, text/plain)
2025-03-10 21:24 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_before_vm_boot (94.61 KB, text/plain)
2025-03-13 16:55 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_before_vm_boot (6.02 KB, text/plain)
2025-03-13 16:55 UTC, Joel Mathew Thomas
Details
dmesg_log_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_after_vm_boot_working (99.50 KB, text/plain)
2025-03-13 16:55 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_after_vm_boot_working (6.02 KB, text/plain)
2025-03-13 16:55 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_beforevmboot (91.32 KB, text/plain)
2025-03-15 19:06 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_beforevmboot (6.02 KB, text/plain)
2025-03-15 19:06 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_aftervmboot_working (95.96 KB, text/plain)
2025-03-15 19:06 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_aftervmboot_working (6.02 KB, text/plain)
2025-03-15 19:06 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_beforevmboot (91.24 KB, text/plain)
2025-03-15 22:22 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_beforevmboot (6.02 KB, text/plain)
2025-03-15 22:22 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_aftervmboot_working (95.88 KB, text/plain)
2025-03-15 22:23 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_aftervmboot_working (6.02 KB, text/plain)
2025-03-15 22:23 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-ga1da676633d4_beforevmboot (85.67 KB, text/plain)
2025-04-03 21:08 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-ga1da676633d4_beforevmboot (96.93 KB, text/plain)
2025-04-03 21:09 UTC, Joel Mathew Thomas
Details
dmesg_6.14.0-ga1da676633d4_aftervmboot_working (87.59 KB, text/plain)
2025-04-03 21:09 UTC, Joel Mathew Thomas
Details
lspci_6.14.0-ga1da676633d4_aftervmboot_working (96.98 KB, text/plain)
2025-04-03 21:09 UTC, Joel Mathew Thomas
Details
dmesg_6.15.0-rc1-gab59a8605604-dirty_before_vmboot (86.26 KB, text/plain)
2025-04-10 18:20 UTC, Joel Mathew Thomas
Details
lspci_6.15.0-rc1-gab59a8605604-dirty_before_vmboot (96.93 KB, text/plain)
2025-04-10 18:20 UTC, Joel Mathew Thomas
Details
dmesg_6.15.0-rc1-gab59a8605604-dirty_after_vmboot_working (88.91 KB, text/plain)
2025-04-10 18:20 UTC, Joel Mathew Thomas
Details
lspci_6.15.0-rc1-gab59a8605604-dirty_after_vmboot_working (96.98 KB, text/plain)
2025-04-10 18:21 UTC, Joel Mathew Thomas
Details

Description Joel Mathew Thomas 2025-02-09 07:21:36 UTC
Created attachment 307599 [details]
dmesg logs for the kernel in which gpu passthrough works

After upgrading from Linux 6.12.10 to Linux 6.13.0, VFIO GPU passthrough fails for an NVIDIA GPU (AD107). The GPU is not passed through to the VM, and its audio device (01:00.1) disappears from Virt-Manager. This issue does not occur in Linux 6.12.10.

I have attached the logs.
Comment 1 Joel Mathew Thomas 2025-02-09 07:22:30 UTC
Created attachment 307600 [details]
dmesg logs for the kernel in which gpu passthrough does not work
Comment 2 Joel Mathew Thomas 2025-02-09 07:31:52 UTC
Created attachment 307601 [details]
dmesg logs for the kernel in which gpu passthrough works
Comment 3 Joel Mathew Thomas 2025-02-09 19:03:23 UTC
I was able to work around the issue by setting the pcie_port_pm=off kernel parameter.

It works properly, except for poor power management.

I will attach logs below.
Comment 4 Joel Mathew Thomas 2025-02-09 19:04:12 UTC
Created attachment 307603 [details]
dmesg logs for kernel after using pcie_port_pm=off kernel parameter on kernel 6.13.2-arch1-1
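For reference, the workaround from comment 3 is applied by appending `pcie_port_pm=off` to the kernel command line (e.g. via `GRUB_CMDLINE_LINux` in `/etc/default/grub` under GRUB; bootloader details vary by distro). A minimal sketch with a small helper (the helper name is mine) for checking whether the parameter is active; on a live system you would pass `"$(cat /proc/cmdline)"`:

```shell
# Hypothetical helper: report whether a kernel command line contains the
# pcie_port_pm=off workaround. Pass "$(cat /proc/cmdline)" on a live system.
has_port_pm_off() {
    case " $1 " in
        *" pcie_port_pm=off "*) echo yes ;;
        *) echo no ;;
    esac
}

has_port_pm_off "quiet loglevel=3 pcie_port_pm=off"   # -> yes
```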
Comment 5 Bjorn Helgaas 2025-02-10 16:50:53 UTC
Sorry for the regression, and thanks very much for the debugging you've already done.  I don't see anything obvious in the vfio or PCI core changes between v6.12 and v6.13.  If it's practical for you to bisect this, that would be a simple (though tedious) way to zero in on it.
Comment 6 Joel Mathew Thomas 2025-02-10 16:56:27 UTC
(In reply to Bjorn Helgaas from comment #5)
> Sorry for the regression, and thanks very much for the debugging you've
> already done.  I don't see anything obvious in the vfio or PCI core changes
> between v6.12 and v6.13.  If it's practical for you to bisect this, that
> would be a simple (though tedious) way to zero in on it.

Thanks for taking the time to look into this and for your response. I really appreciate it. I'll go ahead and bisect between these versions to pinpoint the exact commit that introduced the regression.

I'll provide any findings and assist in every way I can to help resolve this. Thanks again for your support!
Comment 7 Joel Mathew Thomas 2025-02-10 19:25:21 UTC
(In reply to Bjorn Helgaas)
>Just to confirm, did you see the suggestion to try reverting
>dc421bb3c0db ("PCI: Enable runtime PM of the host bridge")?

>That would be a quicker test than a full bisection.

>I didn't explicitly cc you (I used bcc) to avoid exposing your email
>address. Feel free to respond to the linux-pci thread if you don't
>mind your address being public.

>Bjorn

I did see it. I reverted the commit and built the kernel successfully.

However, that didn't fix it. More surprisingly, this time around the pcie_port_pm=off kernel parameter did not work either.

So right now I'm bisecting the kernel from 6.12.10 to 6.13.0.
Comment 8 Joel Mathew Thomas 2025-02-11 18:03:57 UTC
I have successfully bisected the issue.


The first bad commit is: 665745f274870c921020f610e2c99a3b1613519b - PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller.


Additionally, its child commit also needed to be reverted: de9a6c8d5dbfedb5eb3722c822da0490f6a59a45 - PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed


The pcie_port_pm=off kernel parameter did not work as a workaround.

I suspect the reason it worked on the Arch Linux-built kernel (6.13.0-arch1-1) is due to differences in their build configuration.

After reverting both commits, I successfully built the latest kernel from 6.14.0-rc2-00036-gecb69e7576a0, and GPU passthrough works correctly again.

I will be attaching the following files for reference:

1. git bisect log
2. Patch file containing the reverts

If needed, I can assist with further testing.
Please review the attached patch and consider reverting these commits upstream.
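For readers unfamiliar with the workflow, the bisection above boils down to marking a known-good and known-bad revision and letting git narrow down the first bad commit. A self-contained toy illustration (commit subjects are invented; the real test step was building and booting each kernel and checking passthrough, here a file stands in for "passthrough works"):

```shell
# Toy repro of the bisect workflow in a throwaway repository.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email demo@example.com   # placeholder identity
git config user.name "bisect demo"
echo works > gpu; git add gpu; git commit -qm "baseline (good)"
git commit -qm "unrelated change" --allow-empty
echo broken > gpu; git add gpu; git commit -qm "bwctrl change (first bad)"
git commit -qm "follow-up" --allow-empty
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
# The test command's exit status drives good/bad classification
# (0 = good, 1-127 except 125 = bad).
git bisect run grep -q works gpu >/dev/null 2>&1
first_bad=$(git show -s --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "$first_bad"   # -> bwctrl change (first bad)
```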
Comment 9 Joel Mathew Thomas 2025-02-11 18:05:03 UTC
Created attachment 307614 [details]
git bisect log
Comment 10 Joel Mathew Thomas 2025-02-11 18:05:33 UTC
Created attachment 307615 [details]
Patch containing the reverts of commits that caused the regression.
Comment 11 Joel Mathew Thomas 2025-02-12 10:37:33 UTC
I'd like to share some additional findings.

Currently, on the latest kernel 6.14.0-rc2-00036-gecb69e7576a0, with the reverts applied, GPU passthrough works properly with no functional issues. However, dmesg logs show the following errors:

vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
vfio-pci 0000:01:00.0:   device [10de:28e1] error status/mask=00000040/0000a000
vfio-pci 0000:01:00.0:    [ 6] BadTLP                
pcieport 0000:00:01.1: PME: Spurious native interrupt!
pcieport 0000:00:01.1: PME: Spurious native interrupt!


Observations:

These errors disappear when using the pcie_port_pm=off kernel parameter.

However, like before, this parameter does not fix GPU passthrough failures. If passthrough is broken, setting pcie_port_pm=off will not resolve it; it only removes the above dmesg errors.
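As a side note, the correctable-error line quoted above can be decoded from the PCIe AER Correctable Error Status register layout: bit 6 is Bad TLP, matching the `[ 6] BadTLP` annotation, while the mask suppresses bits 13 (Advisory Non-Fatal) and 15 (Header Log Overflow). A minimal decode sketch, using the values from the log:

```shell
# Values copied from the dmesg line above: status/mask=00000040/0000a000
status=0x00000040
mask=0x0000a000
# Bit 6 of the AER Correctable Error Status register is Bad TLP
[ $(( status & (1 << 6) )) -ne 0 ] && echo "BadTLP reported"
# Bits set in the mask (13, 15) would be suppressed from reporting
printf 'unmasked status: 0x%08x\n' $(( status & ~mask ))
```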
Comment 12 Joel Mathew Thomas 2025-02-12 10:38:14 UTC
Created attachment 307623 [details]
dmesg logs when the pcie_port_pm=off parameter is not set.
Comment 13 Joel Mathew Thomas 2025-02-12 10:38:32 UTC
Created attachment 307624 [details]
dmesg logs when the pcie_port_pm=off parameter is set.
Comment 14 Ilpo Järvinen 2025-02-12 11:18:16 UTC
Could you also attach lspci output from the working and non-working configurations, please?

For the record, those bus errors seem to be present also in the log with 6.12 so nothing new there.
Comment 15 Joel Mathew Thomas 2025-02-12 11:53:37 UTC
(In reply to Ilpo Järvinen from comment #14)
> Could you attach also lspci from working and non-working configuration
> please.
> 
> For the record, those bus errors seem to be present also in the log with
> 6.12 so nothing new there.

I have attached the lspci logs for both working and non-working configurations as requested.

Working configuration: 6.14.0-rc2-00036-gecb69e7576a0 with reverts applied
Non-working configuration: 6.13.2-arch1-1
A few observations:

    In the non-working configuration, the NVIDIA GPU audio controller is initially visible before the VM starts, but it disappears after the VM is started.
    
    I'm not sure if this is relevant: in the working configuration, the kernel modules listed for the NVIDIA GPU are nouveau only, not nvidia or nvidia_drm. This is simply because I haven’t built the proprietary NVIDIA kernel modules due to missing Linux headers.
Comment 16 Joel Mathew Thomas 2025-02-12 11:54:11 UTC
Created attachment 307625 [details]
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
Comment 17 Joel Mathew Thomas 2025-02-12 11:54:32 UTC
Created attachment 307626 [details]
lspci_working_config_6.13.2-arch1-1
Comment 18 Ilpo Järvinen 2025-02-12 11:56:56 UTC
I'm sorry, I forgot to mention that the lspci output should be taken with -vvv.
Comment 19 Joel Mathew Thomas 2025-02-12 12:05:50 UTC
Created attachment 307627 [details]
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
Comment 20 Joel Mathew Thomas 2025-02-12 12:06:28 UTC
Created attachment 307628 [details]
lspci_non_working_config_6.13.2-arch1-1
Comment 21 Joel Mathew Thomas 2025-02-12 12:06:52 UTC
(In reply to Ilpo Järvinen from comment #18)
> I'm sorry, I forgot to mention that the lspci output should be taken with -vvv.

I have updated the lspci outputs.
Comment 22 Ilpo Järvinen 2025-02-12 13:16:41 UTC
Besides the changes expected from the PCIe BW controller being enabled, the Root Port (00:01.1) has Secondary Status MAbort+ set, and its PME status bits seem to be off...

-       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
+       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-

                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
-                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
+                       ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
                LnkSta: Speed 2.5GT/s, Width x8
-                       TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
+                       TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-

-               RootSta: PME ReqID 0009, PMEStatus+ PMEPending+
+               RootSta: PME ReqID 0009, PMEStatus- PMEPending-


01:00.0 is obviously in quite bad shape, but it's unclear why its config (including one capability register bit that is read-only as per the PCIe spec!?!) got so wretched. Many of those changes look unrelated to what the PCIe BW controller itself does (in fact, bwctrl shouldn't even touch 01:00.0).

The loss of 01:00.1 is probably just collateral damage in the grand scheme of things.

Perhaps 01:00.0 is fine also on 6.13 before those resets are done for it so maybe take lspci -vvv before that point, if possible? (Before the VM start, I guess.)

In any case, there are so many changes in the lspci output for 01:00.0 that it would be useful to confirm that they are all related to the bwctrl change alone (rather than to changing the entire kernel version; if I understood correctly, even the GPU drivers were not the same).
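For context, the `<MAbort+` flip in the Secondary Status diff above corresponds to bit 13 of the PCI Status register layout, Received Master Abort, i.e. a transaction forwarded onto the secondary bus was master-aborted. A tiny decode sketch (the register value here is hypothetical, chosen to show only that bit set):

```shell
# Hypothetical Secondary Status value with only Received Master Abort set
sec_status=0x2000
# Bit 13 of the (Secondary) Status register: Received Master Abort
[ $(( sec_status & (1 << 13) )) -ne 0 ] && echo "Received Master Abort (<MAbort+)"
```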
Comment 23 Joel Mathew Thomas 2025-02-12 13:47:59 UTC
(In reply to Ilpo Järvinen from comment #22)

> Perhaps 01:00.0 is fine also on 6.13 before those resets are done for it so
> maybe take lspci -vvv before that point, if possible? (Before the VM start,
> I guess.)
  
I'll attach the lspci output from before the VM starts in the non-working config.
  
> In any case, there are so many changes in the lspci output for 01:00.0 that
> it would be useful to confirm that all those are related to the bwctrl
> change only (instead of changing a lot by changing the entire kernel version
> and if I understood correctly, even the gpu drivers were not the same).

To verify this, I will:

    1. Build and test the kernel from before the first bad commit.
    2. Build and test the kernel from the first bad commit and its immediate child commit, as both introduced the regression.
    3. Collect and attach lspci logs from these tests to determine whether the changes are solely related to BWCTRL.

> ...even the gpu drivers were not the same).
The GPU drivers themselves have remained the same, but the NVIDIA kernel modules were just not built in the working config.
Comment 24 Joel Mathew Thomas 2025-02-12 13:49:03 UTC
Created attachment 307629 [details]
lspci_non_working_config_6.13.2-arch1-1 before vm starts
Comment 25 Ilpo Järvinen 2025-02-12 14:44:49 UTC
Created attachment 307630 [details]
Disable bwctrl during reset

Perhaps it would help to disable BW controller while performing the reset.

A test patch attached. The patch doesn't handle parallel MFD resets entirely correctly but even in this simple form it might be enough to show if this approach does help (I'm out of time for today to code the more complex one to handle the parallel resets with a counter).
Comment 26 Joel Mathew Thomas 2025-02-12 14:52:03 UTC
(In reply to Ilpo Järvinen from comment #25)
> Created attachment 307630 [details]
> Disable bwctrl during reset
> 
> Perhaps it would help to disable BW controller while performing the reset.
> 
> A test patch attached. The patch doesn't handle parallel MFD resets entirely
> correctly but even in this simple form it might be enough to show if this
> approach does help (I'm out of time for today to code the more complex one
> to handle the parallel resets with a counter).

Sorry, I'm a bit confused. Should I apply this patch to the commit that caused the regression, or should I test it against the latest kernel?
Comment 27 Joel Mathew Thomas 2025-02-12 15:44:10 UTC
(In reply to Joel Mathew Thomas from comment #26) 
> Sorry, I'm a bit confused. Should i apply this patch to the commit that
> caused the regression, or should I test it against the latest kernel?

Disregard my previous comment; I was able to apply the patch on the latest commit. No further action needed.
Comment 28 Joel Mathew Thomas 2025-02-12 17:03:57 UTC
(In reply to Ilpo Järvinen from comment #25)
> Created attachment 307630 [details]
> Disable bwctrl during reset
> 
> Perhaps it would help to disable BW controller while performing the reset.
> 
> A test patch attached. The patch doesn't handle parallel MFD resets entirely
> correctly but even in this simple form it might be enough to show if this
> approach does help (I'm out of time for today to code the more complex one
> to handle the parallel resets with a counter).

I have performed tests on different kernel versions to analyze the regression affecting PCI passthrough. Below are the details of each tested kernel and the corresponding results:

1. Kernel: 6.12.0-rc1-00005-g3491f5096668

    Commit: 3491f5096668 (Good commit)
    Modifications: No reverts, no patches
    Result: Passthrough working

2. Kernel: 6.12.0-rc1-00007-gde9a6c8d5dbf

    Commit: de9a6c8d5dbf (Bad commit)
    Modifications: No reverts, no patches
    Result: Passthrough not working

3. Kernel: 6.14.0-rc2-00034-gfebbc555cf0f

    Commit: febbc555cf0f
    Modifications: Applied patch 0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
    Result: Passthrough not working
Comment 29 Joel Mathew Thomas 2025-02-12 17:05:40 UTC
Created attachment 307631 [details]
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Comment 30 Joel Mathew Thomas 2025-02-12 17:06:10 UTC
Created attachment 307632 [details]
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Comment 31 Joel Mathew Thomas 2025-02-12 17:07:52 UTC
Created attachment 307633 [details]
lspci_non_working_config_after_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
Comment 32 Joel Mathew Thomas 2025-02-12 17:09:45 UTC
Created attachment 307634 [details]
lspci_working_config_after_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Comment 33 Joel Mathew Thomas 2025-02-12 17:11:15 UTC
Created attachment 307635 [details]
lspci_non_working_config_before_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
Comment 34 Joel Mathew Thomas 2025-02-12 17:11:54 UTC
Created attachment 307636 [details]
lspci_before_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
Comment 35 Joel Mathew Thomas 2025-02-12 17:12:18 UTC
Created attachment 307637 [details]
lspci_after_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
Comment 36 Joel Mathew Thomas 2025-02-12 17:17:07 UTC
Created attachment 307638 [details]
dmesg_6.14.0-rc2-00034-gfebbc555cf0f_no_reverts_patch_applied
Comment 37 Ilpo Järvinen 2025-02-13 12:00:27 UTC
Thanks. There certainly were far fewer config space changes in those logs (this could be coincidental, but it rules out most of the bit changes in the config regardless). Sadly, it also means I have very few good theories remaining. I suggest next trying without enabling the bandwidth notifications; in pcie_bwnotif_enable(), comment out this call:

        pcie_capability_set_word(port, PCI_EXP_LNKCTL,
                                 PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);

(On top of most recent kernels is fine as a test, don't include the "fix" patch I attached earlier.)

If commenting that line out does help, then try with the set_word call but set only one of the two bits (LBMIE and LABIE) at a time to see if the problem is related to just one of them.
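For reference, the two Link Control interrupt-enable bits in question have these values in include/uapi/linux/pci_regs.h, so the set_word call above writes mask 0x0c00:

```shell
# Bit values from include/uapi/linux/pci_regs.h
LBMIE=0x0400   # PCI_EXP_LNKCTL_LBMIE: Link Bandwidth Management Interrupt Enable
LABIE=0x0800   # PCI_EXP_LNKCTL_LABIE: Link Autonomous Bandwidth Interrupt Enable
printf 'LNKCTL set_word mask: 0x%04x\n' $(( LBMIE | LABIE ))   # -> 0x0c00
```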
Comment 38 Joel Mathew Thomas 2025-02-13 15:06:58 UTC
(In reply to Ilpo Järvinen from comment #37)
> Thanks. There certainly was much less config space changes in those logs
> (could be coincidental but rules out most of the bit changes in the config
> regardless). Sadly it also means I've very little good theories remaining. I
> suggest next trying without enabling the bandwidth notifications, in
> pcie_bwnotif_enable() comment out this call:
> 
>         pcie_capability_set_word(port, PCI_EXP_LNKCTL,
>                                  PCI_EXP_LNKCTL_LBMIE |
> PCI_EXP_LNKCTL_LABIE);
> 
> (On top of most recent kernels is fine as a test, don't include the "fix"
> patch I attached earlier.)
> 
> If comment that line out does help, then try with the set word call but set
> only one of the two bits (LBIE and LABIE) at a time to see if the problem is
> related to just one of them.

Hi Ilpo,

Thanks for the suggestion. I tested the changes, and here's what I found:

    Commenting out the entire pcie_capability_set_word call: Passthrough works.
    Enabling only PCI_EXP_LNKCTL_LABIE: Passthrough works.
    Enabling only PCI_EXP_LNKCTL_LBMIE: Passthrough does not work.

It seems that LBMIE is the problematic bit. Let me know if you'd like me to run any further tests.


I've also attached the lspci outputs in case they are useful.
Comment 39 Joel Mathew Thomas 2025-02-13 15:08:13 UTC
Created attachment 307640 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
Comment 40 Joel Mathew Thomas 2025-02-13 15:08:41 UTC
Created attachment 307641 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
Comment 41 Joel Mathew Thomas 2025-02-13 15:09:25 UTC
Created attachment 307642 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
Comment 42 Joel Mathew Thomas 2025-02-13 15:10:20 UTC
Created attachment 307643 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
Comment 43 Joel Mathew Thomas 2025-02-13 15:11:45 UTC
Created attachment 307644 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
Comment 44 Joel Mathew Thomas 2025-02-13 15:12:07 UTC
Created attachment 307645 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
Comment 45 Ilpo Järvinen 2025-02-14 12:44:13 UTC
There's one additional thing that might be worth a try. I noticed that the Link Speed gets downgraded to 2.5GT/s whenever set_word is there, and there's a series that should fix the Link Speed degradation:

https://lore.kernel.org/linux-pci/20250123055155.22648-1-sjiwei@163.com/

While I'm not entirely convinced it resolves the issue, the variations in Link Speed between the most recent logs suggest it could play a role here that is directly related to whether LBMIE is set. The interrupt handler in bwctrl increments lbms_count, but might not do so in the early phases if only LABIE or no set_word is present, which in turn affects what the Target Speed quirk ends up doing (which relates to the fixes in that series).
Comment 46 Joel Mathew Thomas 2025-02-14 19:09:21 UTC
(In reply to Ilpo Järvinen from comment #45)
> There's one additional thing that might be worth a try. I noticed that the
> Link Speed gets downgraded to 2.5GT/s whenever set_word is there, and
> there's a series that should fix the Link Speed degradation:
> 
> https://lore.kernel.org/linux-pci/20250123055155.22648-1-sjiwei@163.com/
> 
> While I'm not entirely convinced it resolves the issue, the variations in
> Link Speed between the most recent logs may mean it could have a role here
> that is directly related to whether LBMIE is set or not. The interrupt
> handler in bwctrl increases lbms_count but might not do that in early phases
> if only LABIE or no set_word is present, which in turn impacts what the
> Target Speed quirk ends up doing (which relates to the fixes in that series).

Hi Ilpo, Jiwei, and everyone involved,

I would like to provide an update on my testing results.

Firstly, I sincerely apologize for my previous report regarding the 0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch. I mistakenly ran --check without actually applying the patch. After correctly applying and testing it, I can confirm that GPU passthrough works, but the PCIe link speed still downgrades from 16GT/s to 2.5GT/s after the VM is started.

Additionally, I have tested the two patches from the series:

    [PATCH v4 1/2] PCI: Fix the wrong reading of register fields
    [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register

With both patches applied, GPU passthrough fails, and the link speed also downgrades from 16GT/s to 2.5GT/s after VM boot.

I have attached the following logs for further analysis:

    lspci -vvv output before and after VM boot
    dmesg logs
Comment 47 Joel Mathew Thomas 2025-02-14 19:09:57 UTC
Created attachment 307652 [details]
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
Comment 48 Joel Mathew Thomas 2025-02-14 19:10:23 UTC
Created attachment 307653 [details]
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
Comment 49 Joel Mathew Thomas 2025-02-14 19:10:39 UTC
Created attachment 307654 [details]
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt
Comment 50 Joel Mathew Thomas 2025-02-14 19:11:09 UTC
Created attachment 307655 [details]
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
Comment 51 Joel Mathew Thomas 2025-02-14 19:11:28 UTC
Created attachment 307656 [details]
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
Comment 52 Joel Mathew Thomas 2025-02-14 19:11:48 UTC
Created attachment 307657 [details]
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.log
Comment 53 Joel Mathew Thomas 2025-02-14 19:14:32 UTC
To ensure clarity and avoid any misunderstandings, I am also attaching the exact patch files I applied from the series. This will help confirm that they were applied correctly and prevent any future confusion.
Comment 54 Joel Mathew Thomas 2025-02-14 19:15:10 UTC
Created attachment 307658 [details]
series patch file : regix_lnkctl2move.patch
Comment 55 Ilpo Järvinen 2025-02-17 12:24:45 UTC
Comment on attachment 307630 [details]
Disable bwctrl during reset

An updated patch coming in a moment.
Comment 56 Ilpo Järvinen 2025-02-17 12:37:21 UTC
Created attachment 307667 [details]
Improved disable BW notifications during reset patch

Okay, that's great news to hear. I've improved the fix patch to consider MFD siblings better. I also reordered things slightly from the previous version so that disable is only called after the device has been put into D0.

If you could test that the improved patch still works and possibly give your Tested-by for it :-) (please test the patch without any other patches so we know it solves the PCIe-device-lost-on-VM-start issue).

The lnkctl2 issue is likely orthogonal.

From your wording, it is unclear to me whether a test was conducted with the 1st version of the reset fix and the lnkctl2 fix series included at the same time. In addition to confirming that the improved reset fix works alone, I'd want to know the result of combining the fixes if that wasn't yet tested. Will the device remain operational, and what is the Link Speed, when both the reset fix and the lnkctl2 series are tested together?

FYI, I'll be quite busy the remaining week and might not reply until the next week.
Comment 57 Joel Mathew Thomas 2025-02-17 14:31:01 UTC
(In reply to Ilpo Järvinen from comment #56)
> Created attachment 307667 [details]
> Improved disable BW notifications during reset patch
> 
> Okay, that's great news to hear. I've improved the fix patch to consider MFD
> siblings better. I also slightly reordered things slightly from previous so
> that disable is only called after device has been put into D0.
> 
> If you could test the improved patch still works and possibly give your
> Tested-by for it :-) (please test the patch without any other patches so we
> know it solves the PCIe device lost on VM start issue).

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

I’ve tested the patch on my setup, and I can confirm that GPU passthrough is now functioning correctly.
Comment 58 Joel Mathew Thomas 2025-02-17 14:32:03 UTC
Created attachment 307668 [details]
lspci_before_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Comment 59 Joel Mathew Thomas 2025-02-17 14:32:32 UTC
Created attachment 307669 [details]
lspci_after_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Comment 60 Joel Mathew Thomas 2025-02-17 14:32:50 UTC
Created attachment 307670 [details]
dmesg_6.14.0-rc3-dirty-0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Comment 61 Joel Mathew Thomas 2025-02-17 14:39:59 UTC
(In reply to Ilpo Järvinen from comment #56)
> From your wording, it is unclear to me whether a test was conducted with the
> 1st version of the reset fix and the lnkctl2 fix series included at the same
> time. In addition to confirming the improved reset fix works alone, I'd want
> know what is the result when combining the fixes if that wasn't yet tested.
> Will the device remain operational and what is the Link Speed when both the
> reset fix and the lnkctl2 series are tested together?


If my understanding is correct, I think you are asking whether I applied the following patches together at once.

2025-01-23  5:51  [PATCH v4 1/2] PCI: Fix the wrong reading of register fields Jiwei Sun
2025-01-23  5:51  [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register Jiwei Sun

If so, yes, I applied both patches together and tested them as a single patch.
GPU passthrough fails, and the Link Speed also deteriorates from 16GT/s to 2.5GT/s.

I had also attached the exact patch (attachment 307658 [details]) that I applied, which was a combination of both of the above patches.

attachment 307658 [details]: series patch file : regix_lnkctl2move.patch

I had also attached the corresponding lspci outputs and dmesg logs:

attachment 307652 [details]: lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt

attachment 307653 [details]: lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt

attachment 307654 [details]: dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt
Comment 62 Ilpo Järvinen 2025-02-17 16:33:21 UTC
No, there is the reset fix plus a series of two patches related to lnkctl2, which totals 3 patches. I'd like to see the combined result with all three applied to the same kernel, because the 2.5GT/s is likely caused by the lack of the reset fix if only the 2-patch lnkctl2 series is used (same symptom but a different cause). So these 3:

[PATCH v4 1/2] PCI: Fix the wrong reading of register fields Jiwei Sun
[PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register Jiwei Sun
[PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset Ilpo Järvinen

Please confirm if you did test this combination or not?

(My apologies for the confusion, despite my trying to be very specific about what I meant.)

(And thanks for the Tested-by; I'll make the official submission of that fix now that it has been confirmed.)
Comment 63 Joel Mathew Thomas 2025-02-17 17:12:10 UTC
(In reply to Ilpo Järvinen from comment #62)
> No, there is the reset fix and a series of two patches related for lnkctl2,
> which totals to 3 patches. I'd like to see the combined result with all
> three applied into the same kernel because 2.5GT/s is likely caused by the
> lack of the reset fix if only those 2 patch lnkctl2 series are used (same
> symptom but a different cause). So these 3:
> 
> [PATCH v4 1/2] PCI: Fix the wrong reading of register fields Jiwei Sun
> [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2
> register Jiwei Sun
> [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset Ilpo Järvinen
> 
> Please confirm if you did test this combination or not?
> 
> (My apologies for the confusion despite me trying to be very specific what I
> meant.)
> 
> (And thanks for the tested by, I'll make the official submission of that fix
> now that it has been confirmed.)

Thank you for the clarification.

I apologize for the misunderstanding. To confirm, I have not yet tested the combination of all three patches you mentioned:

    [PATCH v4 1/2] PCI: Fix the wrong reading of register fields – Jiwei Sun
    [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register – Jiwei Sun
    [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset – Ilpo Järvinen (the improved patch attachment_307667)

I will proceed with testing this combination and provide feedback as soon as possible.

Regarding the official submission of the fix, could you please point me to where I can view it once it has been finalized? I’d appreciate knowing where I can track the submission.

Thank you for your work on this, and I’ll follow up shortly after testing the full patch set.
Comment 64 Joel Mathew Thomas 2025-02-17 17:50:06 UTC
(In reply to Joel Mathew Thomas from comment #63)
>     [PATCH v4 1/2] PCI: Fix the wrong reading of register fields – Jiwei Sun
>     [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2
> register – Jiwei Sun
>     [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset – Ilpo
> Järvinen (the improved patch attachment_307667)
> 
> I will proceed with testing this combination and provide feedback as soon as
> possible.
> Thank you for your work on this, and I’ll follow up shortly after testing
> the full patch set.

I have tested with all the three patches applied at once.
And the results are:

  GPU Passthrough works.
  Link speed gets downgraded to 2.5GT/s after VM start.
  Link speed goes back up to 16GT/s, after VM shutdown.

Attachments: lspci outputs, and dmesg logs.
Comment 65 Joel Mathew Thomas 2025-02-17 17:50:31 UTC
Created attachment 307673 [details]
lspci_before_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch
Comment 66 Joel Mathew Thomas 2025-02-17 17:50:49 UTC
Created attachment 307674 [details]
lspci_after_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch.txt
Comment 67 Joel Mathew Thomas 2025-02-17 17:51:06 UTC
Created attachment 307675 [details]
dmesg_6.14.0-rc3-dirty_reset_regix_lnk2ctl_patch
Comment 68 Joel Mathew Thomas 2025-02-17 17:52:54 UTC
Created attachment 307676 [details]
exact patch applied : reset_regix_lnkctl2move.patch (total 3 patches)
Comment 69 Ilpo Järvinen 2025-02-17 19:08:18 UTC
You were Cc'd on the patch submission to the linux-pci mailing list. Besides that, there's patchwork, which keeps track of patches (but not much interesting happens there besides patches just sitting in the queue until they're applied :-)):

https://patchwork.kernel.org/project/linux-pci/patch/20250217165258.3811-1-ilpo.jarvinen@linux.intel.com/

If I need to send another version, then patchwork will have a new entry for the v2 patch (and so on for any further versions).


I still need to dig deeper into all the logs so far on this 2.5GT/s downgrade, but that won't happen today; I also need to figure out a good way to debug it, as the Target Link Speed is at 16GT/s but for some reason the Link remains at 2.5GT/s.
Comment 70 Joel Mathew Thomas 2025-02-17 19:29:53 UTC
(In reply to Ilpo Järvinen from comment #69)
> You were as Cc in the patch submission to linux-pci mailing list. Besides
> that, there's patchwork that keeps track of patches (but not much
> interesting happens there besides patches just sitting in the queue until
> they're applied :-)):
> 
> https://patchwork.kernel.org/project/linux-pci/patch/20250217165258.3811-1-
> ilpo.jarvinen@linux.intel.com/
> 
> If I need to send another version, then the patchwork will have a new entiry
> for the v2 patch (and so on for any other version).
> 
> 
> I need to dig in more into all the logs so far to dig deeper into this
> 2.5GT/s downgrade but it won't happen today, and figure out a good way to
> debug it as Target Link Speed is at 16GT/s but for some reason Link remains
> at 2.5GT/s.

Thanks for the update and for CC’ing me on the patch submission. I’ll keep an eye on it.
I’m happy to help however I can.
Comment 71 Ilpo Järvinen 2025-03-04 14:34:08 UTC
Hi again,

Lukas Wunner has various suggestions on what should be looked at. He asked to check the values of (and if they differ between kernel versions):

/sys/bus/pci/devices/0000:01:00.0/reset_method
/sys/bus/pci/devices/0000:01:00.1/reset_method

He also wanted to have an acpidump.
Comment 72 Ilpo Järvinen 2025-03-04 14:42:05 UTC
Created attachment 307741 [details]
Increase a few waits that may be related

Test patch to see if an increased delay allows the transaction to complete; it should be tested without the bwctrl fix. It also contains another minor tweak to the NVIDIA HDA d3hot delay, just in case that is too small for some reason.
Comment 73 Joel Mathew Thomas 2025-03-04 18:25:46 UTC
Hi Ilpo,

I've performed the requested tests and observed the following:
Test Results
1. Arch Kernel (6.13.5-arch1-1) with pcie_port_pm=off

    GPU passthrough works (as before, pcie_port_pm=off only works on Arch kernel).
    Link speed remains at 2.5GT/s before and after VM boot because the GPU was attached to a process before VM boot.

2. Custom Kernel (6.14.0-rc5-00013-g99fa936e8e4f) - No Patches

    GPU passthrough does not work.
    Link speed downgrades from 16GT/s → 2.5GT/s after VM boot.
    Before VM boot, link speed remains at 16GT/s due to no NVIDIA modules being loaded in the custom kernel.

3. Custom Kernel with Fixes (6.14.0-rc5-00013-g99fa936e8e4f_fix-reg-read-adjust-wait)

Patches applied:

    [PATCH v4 1/2] PCI: Fix the wrong reading of register fields

    [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register

    0001-DEBUG-Adjust-a-few-waits.patch

    GPU passthrough does not work.

    Link speed downgrade observed (16GT/s → 2.5GT/s) after VM boot.

    Before VM boot, link speed remains at 16GT/s due to no NVIDIA modules being loaded.

4. Custom Kernel with Only Wait Adjustment Patch (6.14.0-rc5-00013-g99fa936e8e4f_wait)

Patches applied:

    0001-DEBUG-Adjust-a-few-waits.patch only

    GPU passthrough does not work.

    Link speed downgrade observed (16GT/s → 2.5GT/s) after VM boot.

    Before VM boot, link speed remains at 16GT/s due to no NVIDIA modules being loaded.

Additional Information

    acpidump logs will be attached as requested.
    Reset method values (/sys/bus/pci/devices/.../reset_method)
        No changes observed in values across different kernel versions.

Let me know if further tests are required.
Comment 74 Joel Mathew Thomas 2025-03-04 18:26:42 UTC
Created attachment 307742 [details]
acpidump
Comment 75 Joel Mathew Thomas 2025-03-04 18:27:02 UTC
Created attachment 307743 [details]
reset method values
Comment 76 Joel Mathew Thomas 2025-03-04 18:27:28 UTC
Created attachment 307744 [details]
dmesg_log_6.13.5-arch1-1-pcie_port_pm_off_before_vm_boot
Comment 77 Joel Mathew Thomas 2025-03-04 18:27:41 UTC
Created attachment 307745 [details]
lspci_6.13.5-arch1-1_pcie_port_pm_off_before_vm_boot
Comment 78 Joel Mathew Thomas 2025-03-04 18:28:14 UTC
Created attachment 307746 [details]
dmesg_log_6.13.5-arch1-1-pcie_port_pm_off_after_vm_boot_working
Comment 79 Joel Mathew Thomas 2025-03-04 18:28:29 UTC
Created attachment 307747 [details]
lspci_6.13.5-arch1-1_pcie_port_pm_off_after_vm_boot_working
Comment 80 Joel Mathew Thomas 2025-03-04 18:28:54 UTC
Created attachment 307748 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f_before_vm_boot
Comment 81 Joel Mathew Thomas 2025-03-04 18:29:21 UTC
Created attachment 307749 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f_before_vm_boot
Comment 82 Joel Mathew Thomas 2025-03-04 18:29:40 UTC
Created attachment 307750 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f_after_vm_boot_non_working
Comment 83 Joel Mathew Thomas 2025-03-04 18:29:56 UTC
Created attachment 307752 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f_after_vm_boot_non_working
Comment 84 Joel Mathew Thomas 2025-03-04 18:30:15 UTC
Created attachment 307753 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot
Comment 85 Joel Mathew Thomas 2025-03-04 18:30:36 UTC
Created attachment 307754 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot
Comment 86 Joel Mathew Thomas 2025-03-04 18:31:05 UTC
Created attachment 307755 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_after_vm_boot_non_working
Comment 87 Joel Mathew Thomas 2025-03-04 18:31:17 UTC
Created attachment 307756 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_after_vm_boot_non_working
Comment 88 Joel Mathew Thomas 2025-03-04 18:31:35 UTC
Created attachment 307757 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot
Comment 89 Joel Mathew Thomas 2025-03-04 18:31:48 UTC
Created attachment 307758 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_before_vm_boot
Comment 90 Joel Mathew Thomas 2025-03-04 18:35:07 UTC
Created attachment 307759 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_before_vm_boot
Comment 91 Joel Mathew Thomas 2025-03-04 18:35:26 UTC
Created attachment 307760 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_before_vm_boot
Comment 92 Joel Mathew Thomas 2025-03-04 18:35:51 UTC
Created attachment 307761 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_after_vm_boot_non_working
Comment 93 Joel Mathew Thomas 2025-03-04 18:36:36 UTC
Created attachment 307762 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_fix-reg-read-adjust-wait_after_vm_boot_non_working
Comment 94 Joel Mathew Thomas 2025-03-04 18:36:57 UTC
Created attachment 307763 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_wait_before_vm_boot
Comment 95 Joel Mathew Thomas 2025-03-04 18:37:12 UTC
Created attachment 307764 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_wait_before_vm_boot
Comment 96 Joel Mathew Thomas 2025-03-04 18:38:08 UTC
Created attachment 307765 [details]
dmesg_log_6.14.0-rc5-00013-g99fa936e8e4f-dirty_wait_after_vm_boot_non_working
Comment 97 Joel Mathew Thomas 2025-03-04 18:38:19 UTC
Created attachment 307766 [details]
lspci_6.14.0-rc5-00013-g99fa936e8e4f-dirty_after_wait_vm_boot_non_working
Comment 98 Ilpo Järvinen 2025-03-06 11:16:41 UTC
Okay thanks for testing anyway.

It would be nice to understand what triggers that SBR (reset slot) in the non-working case. Could you add WARN_ON(1) into pciehp_reset_slot(), after the probe check, to get a stack trace?

Also, the dmesg might provide more clues if taken with dyndbg enabled. So before starting the VM, please enable the PCI & VFIO dyndbg lines with:

echo 'file drivers/pci/* +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file drivers/pci/pcie/* +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file drivers/pci/hotplug/* +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file drivers/vfio/pci/* +p' > /sys/kernel/debug/dynamic_debug/control

(It might be that the subdir ones are unnecessary, I'm not sure about the nesting rules.)
Comment 99 Joel Mathew Thomas 2025-03-06 11:43:26 UTC
(In reply to Ilpo Järvinen from comment #98)
> Okay thanks for testing anyway.
> 
> It would be nice to understand what triggers that SBR (reset slot) in the
> non-working case. If you could add WARN_ON(1) into pciehp_reset_slot() after
> the probe has been checked to get a stacktrace.
> 
> Also, the dmesg might provide more clues if taken with dyndbg enabled. So
> before starting the VM, please enabled pci & vfio dyndebug lines with:
> 
> echo 'file drivers/pci/* +p' > /sys/kernel/debug/dynamic_debug/control
> echo 'file drivers/pci/pcie/* +p' > /sys/kernel/debug/dynamic_debug/control
> echo 'file drivers/pci/hotplug/* +p' >
> /sys/kernel/debug/dynamic_debug/control
> echo 'file drivers/vfio/pci/* +p' > /sys/kernel/debug/dynamic_debug/control
> 
> (It might be that the subdir ones are unnecessary, I'm not sure about the
> nesting rules.)

Could you clarify whether I should test this on a kernel without any patches, or if I should include specific patches? If so, which ones?
Comment 100 Ilpo Järvinen 2025-03-06 11:47:44 UTC
I think it would be best to have these patches included since they fix obvious, known problems:

[PATCH v4 1/2] PCI: Fix the wrong reading of register fields
[PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register
Comment 101 Joel Mathew Thomas 2025-03-06 18:09:39 UTC
(In reply to Ilpo Järvinen from comment #98)
> Okay thanks for testing anyway.
> 
> It would be nice to understand what triggers that SBR (reset slot) in the
> non-working case. If you could add WARN_ON(1) into pciehp_reset_slot() after
> the probe has been checked to get a stacktrace.
> 
> Also, the dmesg might provide more clues if taken with dyndbg enabled. So
> before starting the VM, please enabled pci & vfio dyndebug lines with:
> 
> echo 'file drivers/pci/* +p' > /sys/kernel/debug/dynamic_debug/control
> echo 'file drivers/pci/pcie/* +p' > /sys/kernel/debug/dynamic_debug/control
> echo 'file drivers/pci/hotplug/* +p' >
> /sys/kernel/debug/dynamic_debug/control
> echo 'file drivers/vfio/pci/* +p' > /sys/kernel/debug/dynamic_debug/control
> 
> (It might be that the subdir ones are unnecessary, I'm not sure about the
> nesting rules.)

I have performed the requested tests and gathered the necessary logs.

    Modifications made:
        Added WARN_ON(1); in pciehp_reset_slot() to capture a stack trace.
        Enabled dyndbg logging for PCI and VFIO as per the instructions.

    Attached logs:
        lspci output before and after VM boot.
        dmesg logs before and after VM boot with dyndbg enabled.

Let me know if you need any additional information or specific tests.

Thanks!
Comment 102 Joel Mathew Thomas 2025-03-06 18:10:25 UTC
Created attachment 307775 [details]
dmesg_log_6.14.0-rc5-00039-g848e07631744-dirty_before_vm_boot_stacktrace
Comment 103 Joel Mathew Thomas 2025-03-06 18:10:38 UTC
Created attachment 307776 [details]
lspci_6.14.0-rc5-00039-g848e07631744-dirty_before_vm_boot_stacktrace
Comment 104 Joel Mathew Thomas 2025-03-06 18:11:05 UTC
Created attachment 307777 [details]
dmesg_log_6.14.0-rc5-00039-g848e07631744-dirty_after_vm_boot_non_working_stacktrace
Comment 105 Joel Mathew Thomas 2025-03-06 18:11:20 UTC
Created attachment 307778 [details]
lspci_6.14.0-rc5-00039-g848e07631744-dirty_after_vm_boot_non_working_stacktrace
Comment 106 Ilpo Järvinen 2025-03-10 17:10:01 UTC
Created attachment 307788 [details]
Hotplug fix

I think I finally found the real cause for this problem on the hotplug side. The BW controller is related only by sharing the interrupt with hotplug: the extra interrupts that come from the BW controller setting LBMIE confuse hotplug interrupt handling, which thought it had disabled certain bits over the duration of the reset. Eventually that led to hotplug thinking the Link went down, which triggers unconfiguring the GPU.

It could be that the added atomic_and() is unnecessary, but I left it in place just in case. If the patch works, though, it would be useful to also test without that atomic_and(), as I think the synchronize_irq() should make sure there are no pending events left to handle.
Comment 107 Joel Mathew Thomas 2025-03-10 21:19:42 UTC
(In reply to Ilpo Järvinen from comment #106)
> Created attachment 307788 [details]
> Hotplug fix
> 
> I think I finally found the real cause for this problem on hotplug side. BW
> controller is related only by sharing the interrupt with hotplug, the extra
> interrupts that come due to BW controller setting LBMIE confuse hotplug
> interrupt handling which thought it had disabled certain bits over the
> duration of reset. Eventually that lead to hotplug thinking the Link went
> down which triggers unconfiguring the GPU.
> 
> It could be that the added atomic_and() is unnecessary but I left it in
> place just in case. But if the patch works, it would be useful to test also
> without that atomic_and() as I think the synchronize_irq() should make sure
> there are no pending events to handle.

Hi Ilpo,

I have tested the patch as requested and observed the following:

Kernel: 6.14.0-rc6-dirty-reset-hotplugfix

Patches applied:

    [PATCH v4 1/2] PCI: Fix the wrong reading of register fields
    [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register
    0001-PCI-hotplug-Disable-HPIE-over-reset.patch

Observations:

    Link speed downgrade observed (16GT/s → 2.5GT/s) after VM boot.
    GPU passthrough is working.


Kernel: 6.14.0-rc6-dirty_reset_without_atomic_and

Patches applied:

    [PATCH v4 1/2] PCI: Fix the wrong reading of register fields
    [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register
    0001-PCI-hotplug-Disable-HPIE-over-reset.patch without atomic_and(~stat_mask, &ctrl->pending_events);

Observations:

    Link speed downgrade observed (16GT/s → 2.5GT/s) after VM boot.
    GPU passthrough is working.


Kernel: 6.14.0-rc6-dirty_no_reset_without_atomic_and

Patches applied:

    0001-PCI-hotplug-Disable-HPIE-over-reset.patch without atomic_and(~stat_mask, &ctrl->pending_events);

Observations:

    Link speed downgrade observed (16GT/s → 2.5GT/s) after VM boot.
    GPU passthrough is working.


I will attach before and after lspci and dmesg logs for each kernel version.
Please let me know if any additional tests are needed.
Comment 108 Joel Mathew Thomas 2025-03-10 21:20:02 UTC
Created attachment 307789 [details]
dmesg_log_6.14.0-rc6-dirty_before_vm_boot_patch_reset_hotplugfix
Comment 109 Joel Mathew Thomas 2025-03-10 21:20:39 UTC
Created attachment 307790 [details]
lspci_6.14.0-rc6-dirty_before_vm_boot_patch_reset_hotplugfix
Comment 110 Joel Mathew Thomas 2025-03-10 21:21:02 UTC
Created attachment 307791 [details]
dmesg_log_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_hotplugfix
Comment 111 Joel Mathew Thomas 2025-03-10 21:21:14 UTC
Created attachment 307792 [details]
lspci_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_hotplugfix
Comment 112 Joel Mathew Thomas 2025-03-10 21:21:53 UTC
Created attachment 307793 [details]
dmesg_log_6.14.0-rc6-dirty_before_vm_boot_patch_reset_without_atomic_and
Comment 113 Joel Mathew Thomas 2025-03-10 21:22:21 UTC
Created attachment 307794 [details]
lspci_6.14.0-rc6-dirty_before_vm_boot_patch_reset_without_atomic_and
Comment 114 Joel Mathew Thomas 2025-03-10 21:22:43 UTC
Created attachment 307795 [details]
dmesg_log_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_without_atomic_and
Comment 115 Joel Mathew Thomas 2025-03-10 21:22:59 UTC
Created attachment 307796 [details]
lspci_6.14.0-rc6-dirty_after_vm_boot_working_patch_reset_without_atomic_and
Comment 116 Joel Mathew Thomas 2025-03-10 21:23:17 UTC
Created attachment 307797 [details]
dmesg_log_6.14.0-rc6-dirty_before_vm_boot_no_reset_without_atomic_and
Comment 117 Joel Mathew Thomas 2025-03-10 21:23:32 UTC
Created attachment 307798 [details]
lspci_6.14.0-rc6-dirty_before_vm_boot_no_reset_without_atomic_and
Comment 118 Joel Mathew Thomas 2025-03-10 21:23:57 UTC
Created attachment 307799 [details]
dmesg_log_6.14.0-rc6-dirty_after_vm_boot_working_no_reset_without_atomic_and
Comment 119 Joel Mathew Thomas 2025-03-10 21:24:12 UTC
Created attachment 307800 [details]
lspci_6.14.0-rc6-dirty_after_vm_boot_working_no_reset_without_atomic_and
Comment 120 Ilpo Järvinen 2025-03-11 16:44:09 UTC
Hi,

Thanks again for all the help, I'll make an official submission of the hotplug fix soon.

Looking at the various logs, I think the 2.5GT/s issue is either a red herring (the GPU driver somehow told the card to downgrade the Link Speed; I'm under the impression GPUs tend to do that) or it's caused by some entirely different commit, because lspci_working_config_after_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668.txt also shows 2.5GT/s, and that was before any bwctrl commits.
Comment 121 Ilpo Järvinen 2025-03-11 16:46:57 UTC
I forgot to ask, but may I put your Tested-by tag on this hotplug fix?
Comment 122 Joel Mathew Thomas 2025-03-11 16:57:40 UTC
(In reply to Ilpo Järvinen from comment #121)
> I forgot to ask but can I put your tested by tag to this hotplug fix?

Hi Ilpo,

Thanks for all your efforts on this! I’d be happy to have my Tested-by tag added to the hotplug fix—please go ahead.

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

Regarding the Link Speed downgrade to 2.5GT/s, I've observed similar behavior on an entirely different system. Running vkcube on a test-bench laptop with an Intel PCIe Root Port (Intel 12450H CPU, RTX 3050 Mobile dGPU) also results in a link speed reduction. If it is some kind of bug, hopefully it gets fixed in the future.

Let me know if there’s anything else you’d like me to do!
Comment 123 Lukas Wunner 2025-03-11 20:02:17 UTC
(In reply to Ilpo Järvinen from comment #106)
> Created attachment 307788 [details]
> Hotplug fix

I'd assume that clearing and re-enabling DLLSCE and PDCE in the Slot Control register is now unnecessary. Any reason why it's kept?

And yes the atomic_and() seems unnecessary.

I think I'd put this...

	if (!pciehp_poll_mode)
		ctrl_mask |= PCI_EXP_SLTCTL_HPIE;

... above the lines which determine stat_mask for clarity.

There seems to be an issue on unbind of pciehp: pciehp_remove() calls cleanup_slot() -> pci_hp_destroy(). Afterwards, pciehp_reset_slot() is no longer entered (but it may still be in execution concurrently -- oops). But before cleanup_slot() is called, the IRQ is already freed via pcie_shutdown_notification(). Once this has happened, I'm not sure it's safe to call synchronize_irq() or kthread_unpark(). Hmmm...
Comment 124 Lukas Wunner 2025-03-11 20:10:23 UTC
I'm working on patches to get rid of pci_slot_mutex and I'm thinking of holding a ref on the hotplug_slot as long as the reset is ongoing. So the hotplug_slot and controller struct will not be freed until the reset has concluded. Hence it's probably fine if you ignore the concurrency issue of ->reset_slot with pciehp_remove() for now as I can take care of it separately, but I'm worried how to avoid synchronize_irq() if the IRQ has already been freed...
Comment 125 Lukas Wunner 2025-03-12 06:36:56 UTC
(In reply to Lukas Wunner from comment #123)
> There seems to be an issue on unbind of pciehp: pciehp_remove() calls
> cleanup_slot() -> pci_hp_destroy(). Afterwards, pciehp_reset_slot() is no
> longer entered (but it may still be in execution concurrently -- oops).

Scratch the sentence in parentheses. I've realized that this is currently properly serialized by pci_slot_mutex.

But the problem remains that pciehp_reset_slot() probably shouldn't call synchronize_irq() or kthread_unpark() if pciehp is being removed.
Comment 126 Ilpo Järvinen 2025-03-12 14:26:47 UTC
I'm very hesitant to remove DLLSCE clearing because of the commit bbe54ea5330d ("PCI: pciehp: Disable Data Link Layer State Changed event on suspend"). Disabling just HPIE wasn't enough during suspend for some reason. At minimum, it should be removed in a separate patch so bisect can pinpoint it if it ends up causing an issue similar to what bbe54ea5330d fixed.

As for the race with freeing the irq/thread, I suggest taking reset_lock for read in pciehp_free_irq() and clearing the variables after they're freed (and checking the variable values in pciehp_reset_slot()).
Comment 127 Ilpo Järvinen 2025-03-13 14:32:40 UTC
Hi,

I've just sent a patch series to the linux-pci mailing list which attempts to address Lukas' concerns and suggestions.

I'd appreciate it if Joel could retest that the fixes in that series still work, as I've made some changes that could have unintended side effects or bugs in them. I didn't post the series here as Bugzilla is a bit annoying when dealing with more than one patch.

Joel should be among the recipients of the patch series, but in case you cannot find the emails, the patches can be found in the lore archives:

https://lore.kernel.org/linux-pci/20250313142333.5792-1-ilpo.jarvinen@linux.intel.com/T/#t
Comment 128 Joel Mathew Thomas 2025-03-13 15:28:38 UTC
(In reply to Ilpo Järvinen from comment #127)
> Hi,
> 
> I've just sent patch series to linux-pci mailing list which attempts to
> address Lukas' concerns and suggestion.
> 
> I'd appreciate it if Joel could retest that the fixes in that series still
> work, as I've made some changes that could have unintended side effects or
> bugs in them. I didn't post the series here as Bugzilla is a bit annoying
> when dealing with more than one patch.
> 
> Joel should be among the recipients of the patch series, but in case you
> cannot find the emails, the patches can be found in the lore archives:
> 
> https://lore.kernel.org/linux-pci/20250313142333.5792-1-ilpo.jarvinen@linux.
> intel.com/T/#t

I'll definitely test it, and let you know about the results.
Comment 129 Joel Mathew Thomas 2025-03-13 16:54:08 UTC
Hi Ilpo,

I have tested the patch series as requested and observed the following:

Kernel Version: 6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes
Patches Applied:
    [PATCH 1/4] PCI/hotplug: Disable HPIE over reset
    [PATCH 2/4] PCI/hotplug: Clearing HPIE for the duration of reset is enough
    [PATCH 3/4] PCI/hotplug: reset_lock is not required synchronizing with irq thread
    [PATCH 4/4] PCI/hotplug: Don't enable HPIE in poll mode

Observations:

    GPU passthrough functions correctly with the applied patches.
    The previously observed error messages related to PCIe Bus Errors, such as "BadTLP," are no longer present in the dmesg outputs.

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

The lspci and dmesg logs from before and after VM boot have been attached.
Please let me know if any additional tests are needed.

Thank you.
Comment 130 Joel Mathew Thomas 2025-03-13 16:55:00 UTC
Created attachment 307815 [details]
dmesg_log_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_before_vm_boot
Comment 131 Joel Mathew Thomas 2025-03-13 16:55:12 UTC
Created attachment 307816 [details]
lspci_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_before_vm_boot
Comment 132 Joel Mathew Thomas 2025-03-13 16:55:28 UTC
Created attachment 307817 [details]
dmesg_log_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_after_vm_boot_working
Comment 133 Joel Mathew Thomas 2025-03-13 16:55:45 UTC
Created attachment 307818 [details]
lspci_6.14.0-rc6-00022-gb7f94fcf5546-dirty-pci-hotplug-reset-fixes_after_vm_boot_working
Comment 134 Joel Mathew Thomas 2025-03-15 19:05:35 UTC
Hi Lukas,

> @Joel Mathew Thomas, could you give the below patch a spin and see
> if it helps?

I've tested the patch series along with the additional patch provided.

Kernel: 6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix

Patches applied:
- [PATCH 1/4] PCI/hotplug: Disable HPIE over reset
- [PATCH 2/4] PCI/hotplug: Clearing HPIE for the duration of reset is enough
- [PATCH 3/4] PCI/hotplug: reset_lock is not required synchronizing with irq thread
- [PATCH 4/4] PCI/hotplug: Don't enable HPIE in poll mode
- The latest patch from you:
  ```diff
  +	/* Ignore events masked by pciehp_reset_slot(). */
  +	events &= ctrl->slot_ctrl;
  +	if (!events)
  +		return IRQ_HANDLED;
  ```

Observations:

    GPU passthrough works.
    Link speed degrades from 16GT/s to 2.5GT/s (same as before).
    The previous error that disappeared with the 4-patch series alone has now reappeared with this patch:

    	[  351.460502] pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0
	[  351.460517] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
	[  351.460520] vfio-pci 0000:01:00.0:   device [10de:28e1] error status/mask=00000040/0000a000
	[  351.460523] vfio-pci 0000:01:00.0:    [ 6] BadTLP

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

lspci and dmesg logs will be attached.

Thanks again for the continued work on this!

Joel
Comment 135 Joel Mathew Thomas 2025-03-15 19:06:09 UTC
Created attachment 307830 [details]
dmesg_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_beforevmboot
Comment 136 Joel Mathew Thomas 2025-03-15 19:06:23 UTC
Created attachment 307831 [details]
lspci_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_beforevmboot
Comment 137 Joel Mathew Thomas 2025-03-15 19:06:40 UTC
Created attachment 307832 [details]
dmesg_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_aftervmboot_working
Comment 138 Joel Mathew Thomas 2025-03-15 19:06:55 UTC
Created attachment 307833 [details]
lspci_6.14.0-rc6-00043-g3571e8b091f4-dirty-pci-hotplug-reset-fixes-eventmask-fix_aftervmboot_working
Comment 139 Joel Mathew Thomas 2025-03-15 19:22:27 UTC
Hi all,

Apologies — I tried replying to the mailing list, but it seems my mail client doesn’t support plain-text replies in the interleaved style. I gave it a few attempts, but they got rejected.

So I’ll just reply here instead.

Also, sorry for the duplicate messages that came through despite the rejections.
Comment 140 Joel Mathew Thomas 2025-03-15 22:22:15 UTC
Hi Lukas,

> Could you test *only* the quoted diff, i.e. without patches [1/4] - [4/4], on
> top of a recent kernel?


I tested only the provided patch on top of a recent kernel without patches [1/4] - [4/4].

Kernel: 6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix

Patch applied:

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c  
index bb5a8d9f03ad..99a2ac13a3d1 100644  
--- a/drivers/pci/hotplug/pciehp_hpc.c  
+++ b/drivers/pci/hotplug/pciehp_hpc.c  
@@ -688,6 +688,11 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)  
 		return IRQ_HANDLED;  
 	}  
  
+	/* Ignore events masked by pciehp_reset_slot(). */  
+	events &= ctrl->slot_ctrl;  
+	if (!events)  
+		return IRQ_HANDLED;  
+  
 	/* Save pending events for consumption by IRQ thread. */  
 	atomic_or(events, &ctrl->pending_events);  
 	return IRQ_WAKE_THREAD;  


Observations:

    GPU passthrough works.
    Link speed still degrades from 16GT/s to 2.5GT/s after VM boot.
    The BadTLP error reappears:

	[  351.460502] pcieport 0000:00:01.1: AER: Correctable error message received from 0000:01:00.0  
	[  351.460517] vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)  
	[  351.460520] vfio-pci 0000:01:00.0:   device [10de:28e1] error status/mask=00000040/0000a000  
	[  351.460523] vfio-pci 0000:01:00.0:    [ 6] BadTLP  


Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

I’ll also attach the lspci and dmesg logs.

Thanks,

Joel
Comment 141 Joel Mathew Thomas 2025-03-15 22:22:40 UTC
Created attachment 307835 [details]
dmesg_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_beforevmboot
Comment 142 Joel Mathew Thomas 2025-03-15 22:22:58 UTC
Created attachment 307836 [details]
lspci_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_beforevmboot
Comment 143 Joel Mathew Thomas 2025-03-15 22:23:14 UTC
Created attachment 307837 [details]
dmesg_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_aftervmboot_working
Comment 144 Joel Mathew Thomas 2025-03-15 22:23:28 UTC
Created attachment 307838 [details]
lspci_6.14.0-rc6-00048-geb88e6bfbc0a-dirty-eventmask-fix_aftervmboot_working
Comment 145 Joel Mathew Thomas 2025-03-23 14:47:49 UTC
Hi Ilpo,

I just wanted to kindly check in and ask if there are any updates. If there's anything that needs testing or further input from my side, I’m happy to help.

Thank you again for your time and efforts!
Comment 146 Lukas Wunner 2025-03-23 17:01:36 UTC
I'll submit a proper patch in the next couple of days. Two other regressions came in, one of which I've dealt with, the other I'm working on right now. My apologies for the delay.
Comment 147 Joel Mathew Thomas 2025-03-23 17:04:13 UTC
(In reply to Lukas Wunner from comment #146)
> I'll submit a proper patch in the next couple of days. Two other regressions
> came in, one of which I've dealt with, the other I'm working on right now.
> My apologies for the delay.

No worries — thanks for the update and for continuing to work on this. Looking forward to the next patch when it’s ready!
Comment 148 Lukas Wunner 2025-04-03 14:08:45 UTC
I've pushed the branch:

  https://github.com/l1k/linux/commits/pci-for-6.16

... containing material for the upcoming cycle (which starts on Monday). The two commits:

  PCI: pciehp: Ignore Presence Detect Changed caused by DPC
  PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset

... seek to fix the issue you've reported here. So far this is compile-tested only. I'd be grateful if you could give it a spin and see if it resolves the issue.

The branch is based on Linus' current master, i.e. what will become v6.15-rc1 this Sunday. It should be possible to cherry-pick the two commits cleanly onto a recent kernel release such as v6.14 or v6.13. The commits are in a draft stage, I'm still honing the commit messages and need to add kernel-doc for the three helpers introduced by the second commit.
Comment 149 Joel Mathew Thomas 2025-04-03 14:44:42 UTC
(In reply to Lukas Wunner from comment #148)
> I've pushed the branch:
> 
>   https://github.com/l1k/linux/commits/pci-for-6.16
> 
> ... containing material for the upcoming cycle (which starts on Monday). The
> two commits:
> 
>   PCI: pciehp: Ignore Presence Detect Changed caused by DPC
>   PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
> 
> ... seek to fix the issue you've reported here. So far this is
> compile-tested only. I'd be grateful if you could give it a spin and see if
> it resolves the issue.
> 
> The branch is based on Linus' current master, i.e. what will become
> v6.15-rc1 this Sunday. It should be possible to cherry-pick the two commits
> cleanly onto a recent kernel release such as v6.14 or v6.13. The commits are
> in a draft stage, I'm still honing the commit messages and need to add
> kernel-doc for the three helpers introduced by the second commit.

Hey Lukas,
Thanks for working on this! I’d definitely like to help test the patch, but I need some time to set up my VM again. I’ll let you know once I’ve tested it or if I have any doubts along the way. I'll try to do it as fast as possible.
Comment 150 Joel Mathew Thomas 2025-04-03 21:07:54 UTC
(In reply to Lukas Wunner from comment #148)
> I've pushed the branch:
> 
>   https://github.com/l1k/linux/commits/pci-for-6.16
> 
> ... containing material for the upcoming cycle (which starts on Monday). The
> two commits:
> 
>   PCI: pciehp: Ignore Presence Detect Changed caused by DPC
>   PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset
> 
> ... seek to fix the issue you've reported here. So far this is
> compile-tested only. I'd be grateful if you could give it a spin and see if
> it resolves the issue.
> 
> The branch is based on Linus' current master, i.e. what will become
> v6.15-rc1 this Sunday. It should be possible to cherry-pick the two commits
> cleanly onto a recent kernel release such as v6.14 or v6.13. The commits are
> in a draft stage, I'm still honing the commit messages and need to add
> kernel-doc for the three helpers introduced by the second commit.

Hi Lukas,

I tested the commits, and GPU passthrough is working fine. Here’s what I tested:

Kernel Version: 6.14.0-ga1da676633d4

Last 3 Commits:

    * a1da67663 (HEAD -> master) PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset  
    * c305bc35f PCI: pciehp: Ignore Presence Detect Changed caused by DPC  
    * a2cc6ff5e (grafted, origin/master, origin/HEAD) Merge tag 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Results:

    GPU passthrough works.

    Link speed is being downgraded, though I’m not sure if it’s related to this patch.

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

I'll attach logs in case they're needed.
Thank you.
Comment 151 Joel Mathew Thomas 2025-04-03 21:08:39 UTC
Created attachment 307920 [details]
dmesg_6.14.0-ga1da676633d4_beforevmboot
Comment 152 Joel Mathew Thomas 2025-04-03 21:09:02 UTC
Created attachment 307921 [details]
lspci_6.14.0-ga1da676633d4_beforevmboot
Comment 153 Joel Mathew Thomas 2025-04-03 21:09:31 UTC
Created attachment 307922 [details]
dmesg_6.14.0-ga1da676633d4_aftervmboot_working
Comment 154 Joel Mathew Thomas 2025-04-03 21:09:55 UTC
Created attachment 307923 [details]
lspci_6.14.0-ga1da676633d4_aftervmboot_working
Comment 155 Joel Mathew Thomas 2025-04-03 21:12:22 UTC
(In reply to Joel Mathew Thomas from comment #150) 
>     Link speed is being downgraded, though I’m not sure if it’s related to
> this patch.

Edit: Link speed is being downgraded, though I'm not sure if it's related to this issue.
Comment 156 Joel Mathew Thomas 2025-04-10 18:19:51 UTC
Hi Lukas,

Quoting your message from the mailing list:

>    First of all, PCIe hotplug is deliberately ignoring link events occurring
>    as a side effect of Downstream Port Containment. But it's not yet ignoring
>    Presence Detect Changed events. These can happen if a hotplug bridge uses
>    in-band presence detect. Reported by Keith Busch, patch [1/2] seeks to fix
>    it.
>
>    Second, PCIe hotplug is deliberately ignoring link events and Presence
>    Detect Changed events occurring as a side effect of a Secondary Bus Reset.
>    But that's no longer working properly since the introduction of bandwidth
>    control in v6.13-rc1. Actually it never worked properly, but bandwidth
>    control is now mercilessly exposing the issue. VFIO is thus broken,
>    it resets the device on passthrough. Reported by Joel Mathew Thomas.
>
>    [...]
>    This leads me to believe that we need a generic mechanism to tell hotplug
>    drivers that spurious link changes are ongoing which need to be ignored.
>    Patch [2/2] introduces an API for it and the first user is SBR handling
>    in PCIe hotplug.

I've now tested both patches:

    [PATCH 1/2] PCI: pciehp: Ignore Presence Detect Changed caused by DPC

    [PATCH 2/2] PCI: pciehp: Ignore Link Down/Up caused by Secondary Bus Reset

Kernel: 6.15.0-rc1-gab59a8605604-dirty

Results:

    GPU passthrough works.

    AER "BadTLP" warnings still occur in dmesg during passthrough reset.

    Spurious native PME interrupts were also observed.

    Link speed is downgraded from 16GT/s to 2.5GT/s after reset:

    LnkSta: Speed 16GT/s, Width x8
    LnkSta: Speed 2.5GT/s (downgraded), Width x8


I'll attach the full dmesg and lspci output for reference.

Thanks again for your continued work on this!

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>
Comment 157 Joel Mathew Thomas 2025-04-10 18:20:09 UTC
Created attachment 307948 [details]
dmesg_6.15.0-rc1-gab59a8605604-dirty_before_vmboot
Comment 158 Joel Mathew Thomas 2025-04-10 18:20:24 UTC
Created attachment 307949 [details]
lspci_6.15.0-rc1-gab59a8605604-dirty_before_vmboot
Comment 159 Joel Mathew Thomas 2025-04-10 18:20:44 UTC
Created attachment 307950 [details]
dmesg_6.15.0-rc1-gab59a8605604-dirty_after_vmboot_working
Comment 160 Joel Mathew Thomas 2025-04-10 18:21:00 UTC
Created attachment 307951 [details]
lspci_6.15.0-rc1-gab59a8605604-dirty_after_vmboot_working
