Bug 219765

Summary: Regression: VFIO NVIDIA GPU Passthrough Fails in Linux 6.13.0 (GPU Reset & Audio Controller Disappears)
Product: Drivers
Reporter: Joel Mathew Thomas (proxy0)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: NEW
Severity: high
CC: bjorn, ilpo.jarvinen, proxy0
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: 665745f274870c921020f610e2c99a3b1613519b
Attachments: dmesg logs for the kernel in which gpu passthrough works
dmesg logs for the kernel in which gpu passthrough does not work
dmesg logs for the kernel in which gpu passthrough works
dmesg logs for kernel after using pcie_port_pm=off kernel parameter on kernel 6.13.2-arch1-1
git bisect log
Patch containing the reverts of commits that caused the regression.
dmesg logs when the pcie_port_pm=off parameter is not set.
dmesg logs when the pcie_port_pm=off parameter is set.
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
lspci_working_config_6.13.2-arch1-1
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
lspci_non_working_config_6.13.2-arch1-1
lspci_non_working_config_6.13.2-arch1-1 before vm starts
Disable bwctrl during reset
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
lspci_non_working_config_after_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
lspci_working_config_after_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
lspci_non_working_config_before_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
lspci_before_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
lspci_after_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
dmesg_6.14.0-rc2-00034-gfebbc555cf0f_no_reverts_patch_applied
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.log
series patch file : regix_lnkctl2move.patch
Improved disable BW notifications during reset patch
lspci_before_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
lspci_after_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
dmesg_6.14.0-rc3-dirty-0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
lspci_before_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch
lspci_after_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch.txt
dmesg_6.14.0-rc3-dirty_reset_regix_lnk2ctl_patch
exact patch applied : reset_regix_lnkctl2move.patch (total 3 patches)

Description Joel Mathew Thomas 2025-02-09 07:21:36 UTC
Created attachment 307599 [details]
dmesg logs for the kernel in which gpu passthrough works

After upgrading from Linux 6.12.10 to Linux 6.13.0, VFIO GPU passthrough fails for an NVIDIA GPU (AD107). The GPU is not passed through to the VM, and its audio device (01:00.1) disappears from Virt-Manager. This issue does not occur in Linux 6.12.10.

I have attached the logs.
Comment 1 Joel Mathew Thomas 2025-02-09 07:22:30 UTC
Created attachment 307600 [details]
dmesg logs for the kernel in which gpu passthrough does not work
Comment 2 Joel Mathew Thomas 2025-02-09 07:31:52 UTC
Created attachment 307601 [details]
dmesg logs for the kernel in which gpu passthrough works
Comment 3 Joel Mathew Thomas 2025-02-09 19:03:23 UTC
I was able to get a workaround by setting pcie_port_pm=off kernel parameter.

It works properly, except for poor power management.

I will attach logs below
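For anyone reproducing this workaround, a sketch of making the parameter persistent on a GRUB-based system. The file path and sample config line are typical defaults, not taken from this report, and the edit is demonstrated on a temporary copy rather than the real /etc/default/grub:

```shell
# Append pcie_port_pm=off to the kernel command line in a GRUB config.
# Demonstrated on a throwaway copy; on a real system edit /etc/default/grub.
conf=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' > "$conf"
# Insert the parameter just before the closing quote of the existing line.
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 pcie_port_pm=off"/' "$conf"
cat "$conf"
# On the real file, regenerate the config and reboot afterwards:
#   grub-mkconfig -o /boot/grub/grub.cfg
```

The parameter takes effect only after regenerating the bootloader config and rebooting; adjust for non-GRUB bootloaders.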
Comment 4 Joel Mathew Thomas 2025-02-09 19:04:12 UTC
Created attachment 307603 [details]
dmesg logs for kernel after using pcie_port_pm=off kernel parameter on kernel 6.13.2-arch1-1
Comment 5 Bjorn Helgaas 2025-02-10 16:50:53 UTC
Sorry for the regression, and thanks very much for the debugging you've already done.  I don't see anything obvious in the vfio or PCI core changes between v6.12 and v6.13.  If it's practical for you to bisect this, that would be a simple (though tedious) way to zero in on it.
Comment 6 Joel Mathew Thomas 2025-02-10 16:56:27 UTC
(In reply to Bjorn Helgaas from comment #5)
> Sorry for the regression, and thanks very much for the debugging you've
> already done.  I don't see anything obvious in the vfio or PCI core changes
> between v6.12 and v6.13.  If it's practical for you to bisect this, that
> would be a simple (though tedious) way to zero in on it.

Thanks for taking the time to look into this and for your response. I really appreciate it. I'll go ahead and bisect between these versions to pinpoint the exact commit that introduced the regression.

I'll provide any findings and assist in every way I can to help resolve this. Thanks again for your support!
Comment 7 Joel Mathew Thomas 2025-02-10 19:25:21 UTC
(In reply to Bjorn Helgaas)
>Just to confirm, did you see the suggestion to try reverting
>dc421bb3c0db ("PCI: Enable runtime PM of the host bridge")?

>That would be a quicker test than a full bisection.

>I didn't explicitly cc you (I used bcc) to avoid exposing your email
>address. Feel free to respond to the linux-pci thread if you don't
>mind your address being public.

>Bjorn

I did see it. I reverted the commit and built the kernel successfully.

However, that didn't fix it. More surprisingly, this time around the pcie_port_pm=off kernel parameter did not work either.

So right now I'm trying to bisect the kernel from 6.12.10 to 6.13.0.
Comment 8 Joel Mathew Thomas 2025-02-11 18:03:57 UTC
I have successfully bisected the issue.


The first bad commit is: 665745f274870c921020f610e2c99a3b1613519b - PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller.


Additionally, its child commit also needed to be reverted: de9a6c8d5dbfedb5eb3722c822da0490f6a59a45 - PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed


The pcie_port_pm=off kernel parameter did not work as a workaround.

I suspect the reason it worked on the Arch Linux-built kernel (6.13.0-arch1-1) is due to differences in their build configuration.

After reverting both commits, I successfully built the latest kernel from 6.14.0-rc2-00036-gecb69e7576a0, and GPU passthrough works correctly again.

I will be attaching the following files for reference:

1. git bisect log
2. Patch file containing the reverts

If needed, I can assist with further testing.
Please review the attached patch and consider reverting these commits upstream.
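The bisection above follows the standard git workflow; as a sketch, the same sequence run on a disposable toy repository standing in for the kernel tree (every name, commit, and the bug-check script here are illustrative, and each real step was a kernel build plus a VM boot test):

```shell
# Toy reproduction of the git bisect workflow: 6 commits, with the
# simulated "regression" landing at commit 4.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect
for i in 1 2 3 4 5 6; do
    echo "$i" > state
    git add state
    git commit -qm "commit $i"
done
git bisect start HEAD HEAD~5      # bad tip (cf. v6.13), good base (cf. v6.12.10)
# A script stands in for "build kernel, boot VM, test passthrough":
# exit 0 marks the commit good, non-zero marks it bad.
git bisect run sh -c 'test "$(cat state)" -lt 4'
git bisect log > bisect.log       # the kind of log attached to this report
git bisect reset
```

`git bisect run` automates the good/bad marking; without a scriptable test, each step is marked manually with `git bisect good` or `git bisect bad` after booting the candidate kernel.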
Comment 9 Joel Mathew Thomas 2025-02-11 18:05:03 UTC
Created attachment 307614 [details]
git bisect log
Comment 10 Joel Mathew Thomas 2025-02-11 18:05:33 UTC
Created attachment 307615 [details]
Patch containing the reverts of commits that caused the regression.
Comment 11 Joel Mathew Thomas 2025-02-12 10:37:33 UTC
I'd like to share some additional findings.

Currently, on the latest kernel 6.14.0-rc2-00036-gecb69e7576a0, with the reverts applied, GPU passthrough works properly with no functional issues. However, dmesg logs show the following errors:

vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
vfio-pci 0000:01:00.0:   device [10de:28e1] error status/mask=00000040/0000a000
vfio-pci 0000:01:00.0:    [ 6] BadTLP                
pcieport 0000:00:01.1: PME: Spurious native interrupt!
pcieport 0000:00:01.1: PME: Spurious native interrupt!


Observations:

These errors disappear when using the pcie_port_pm=off kernel parameter.

However, like before, this parameter does not fix GPU passthrough failures. If passthrough is broken, setting pcie_port_pm=off will not resolve it—it only removes the above dmesg errors.
Comment 12 Joel Mathew Thomas 2025-02-12 10:38:14 UTC
Created attachment 307623 [details]
dmesg logs when the pcie_port_pm=off parameter is not set.
Comment 13 Joel Mathew Thomas 2025-02-12 10:38:32 UTC
Created attachment 307624 [details]
dmesg logs when the pcie_port_pm=off parameter is set.
Comment 14 Ilpo Järvinen 2025-02-12 11:18:16 UTC
Could you also attach lspci output from the working and non-working configurations, please.

For the record, those bus errors seem to be present also in the logs with 6.12, so nothing new there.
Comment 15 Joel Mathew Thomas 2025-02-12 11:53:37 UTC
(In reply to Ilpo Järvinen from comment #14)
> Could you attach also lspci from working and non-working configuration
> please.
> 
> For the record, those bus errors seem to be present also in the log with
> 6.12 so nothing new there.

I have attached the lspci logs for both working and non-working configurations as requested.

Working configuration: 6.14.0-rc2-00036-gecb69e7576a0 with reverts applied
Non-working configuration: 6.13.2-arch1-1
A few observations:

    In the non-working configuration, the NVIDIA GPU audio controller is initially visible before the VM starts, but it disappears after the VM is started.
    
    I'm not sure if it's relevant. In the working configuration, the kernel modules listed for the NVIDIA GPU are nouveau only—not nvidia or nvidia_drm. This is simply because I haven’t built the proprietary NVIDIA kernel modules due to missing Linux headers.
Comment 16 Joel Mathew Thomas 2025-02-12 11:54:11 UTC
Created attachment 307625 [details]
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
Comment 17 Joel Mathew Thomas 2025-02-12 11:54:32 UTC
Created attachment 307626 [details]
lspci_working_config_6.13.2-arch1-1
Comment 18 Ilpo Järvinen 2025-02-12 11:56:56 UTC
I'm sorry, I forgot to mention that the lspci output should be taken with -vvv.
Comment 19 Joel Mathew Thomas 2025-02-12 12:05:50 UTC
Created attachment 307627 [details]
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
Comment 20 Joel Mathew Thomas 2025-02-12 12:06:28 UTC
Created attachment 307628 [details]
lspci_non_working_config_6.13.2-arch1-1
Comment 21 Joel Mathew Thomas 2025-02-12 12:06:52 UTC
(In reply to Ilpo Järvinen from comment #18)
> I'm sorry, I forgot mention the lspci should be taken with -vvv.

I have updated the lspci outputs.
Comment 22 Ilpo Järvinen 2025-02-12 13:16:41 UTC
Besides the changes expected from the PCIe BW controller being enabled, the Root Port (00:01.1) shows Secondary status MAbort+, and its PME status bits seem to have turned off...

-       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
+       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-

                LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
-                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
+                       ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
                LnkSta: Speed 2.5GT/s, Width x8
-                       TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
+                       TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-

-               RootSta: PME ReqID 0009, PMEStatus+ PMEPending+
+               RootSta: PME ReqID 0009, PMEStatus- PMEPending-


01:00.0 is obviously in quite bad shape, but it's unclear why its config (including one capability register bit that is read-only per the PCIe spec!?!) got so mangled. Lots of those changes look unrelated to what the PCIe BW controller itself does (in fact, bwctrl shouldn't even touch 01:00.0).

The loss of 01:00.1 is probably just collateral damage in the grand scheme of things.

Perhaps 01:00.0 is fine also on 6.13 before those resets are done for it so maybe take lspci -vvv before that point, if possible? (Before the VM start, I guess.)

In any case, there are so many changes in the lspci output for 01:00.0 that it would be useful to confirm that they are all related to the bwctrl change alone (rather than to changing the entire kernel version; if I understood correctly, even the GPU drivers were not the same).
Comment 23 Joel Mathew Thomas 2025-02-12 13:47:59 UTC
(In reply to Ilpo Järvinen from comment #22)

> Perhaps 01:00.0 is fine also on 6.13 before those resets are done for it so
> maybe take lspci -vvv before that point, if possible? (Before the VM start,
> I guess.)
  
I'll attach the lspci output before the VM starts in the non-working config.
  
> In any case, there are so many changes in the lspci output for 01:00.0 that
> it would be useful to confirm that all those are related to the bwctrl
> change only (instead of changing a lot by changing the entire kernel version
> and if I understood correctly, even the gpu drivers were not the same).

To verify this, I will:

    1. Build and test the kernel from before the first bad commit.
    2. Build and test the kernel from the first bad commit and its immediate child commit, as both introduced the regression.
    3. Collect and attach lspci logs from these tests to determine whether the changes are solely related to BWCTRL.

> ...even the gpu drivers were not the same).
The GPU drivers themselves have remained the same, but the NVIDIA kernel modules were just not built in the working config.
Comment 24 Joel Mathew Thomas 2025-02-12 13:49:03 UTC
Created attachment 307629 [details]
lspci_non_working_config_6.13.2-arch1-1 before vm starts
Comment 25 Ilpo Järvinen 2025-02-12 14:44:49 UTC
Created attachment 307630 [details]
Disable bwctrl during reset

Perhaps it would help to disable BW controller while performing the reset.

A test patch attached. The patch doesn't handle parallel MFD resets entirely correctly but even in this simple form it might be enough to show if this approach does help (I'm out of time for today to code the more complex one to handle the parallel resets with a counter).
Comment 26 Joel Mathew Thomas 2025-02-12 14:52:03 UTC
(In reply to Ilpo Järvinen from comment #25)
> Created attachment 307630 [details]
> Disable bwctrl during reset
> 
> Perhaps it would help to disable BW controller while performing the reset.
> 
> A test patch attached. The patch doesn't handle parallel MFD resets entirely
> correctly but even in this simple form it might be enough to show if this
> approach does help (I'm out of time for today to code the more complex one
> to handle the parallel resets with a counter).

Sorry, I'm a bit confused. Should I apply this patch to the commit that caused the regression, or should I test it against the latest kernel?
Comment 27 Joel Mathew Thomas 2025-02-12 15:44:10 UTC
(In reply to Joel Mathew Thomas from comment #26) 
> Sorry, I'm a bit confused. Should i apply this patch to the commit that
> caused the regression, or should I test it against the latest kernel?

Disregard my previous comment; I was able to apply the patch on the latest commit. No further action needed.
Comment 28 Joel Mathew Thomas 2025-02-12 17:03:57 UTC
(In reply to Ilpo Järvinen from comment #25)
> Created attachment 307630 [details]
> Disable bwctrl during reset
> 
> Perhaps it would help to disable BW controller while performing the reset.
> 
> A test patch attached. The patch doesn't handle parallel MFD resets entirely
> correctly but even in this simple form it might be enough to show if this
> approach does help (I'm out of time for today to code the more complex one
> to handle the parallel resets with a counter).

I have performed tests on different kernel versions to analyze the regression affecting PCI passthrough. Below are the details of each tested kernel and the corresponding results:

1. Kernel: 6.12.0-rc1-00005-g3491f5096668

    Commit: 3491f5096668 (Good commit)
    Modifications: No reverts, no patches
    Result: Passthrough working

2. Kernel: 6.12.0-rc1-00007-gde9a6c8d5dbf

    Commit: de9a6c8d5dbf (Bad commit)
    Modifications: No reverts, no patches
    Result: Passthrough not working

3. Kernel: 6.14.0-rc2-00034-gfebbc555cf0f

    Commit: febbc555cf0f
    Modifications: Applied patch 0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
    Result: Passthrough not working
Comment 29 Joel Mathew Thomas 2025-02-12 17:05:40 UTC
Created attachment 307631 [details]
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Comment 30 Joel Mathew Thomas 2025-02-12 17:06:10 UTC
Created attachment 307632 [details]
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Comment 31 Joel Mathew Thomas 2025-02-12 17:07:52 UTC
Created attachment 307633 [details]
lspci_non_working_config_after_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
Comment 32 Joel Mathew Thomas 2025-02-12 17:09:45 UTC
Created attachment 307634 [details]
lspci_working_config_after_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Comment 33 Joel Mathew Thomas 2025-02-12 17:11:15 UTC
Created attachment 307635 [details]
lspci_non_working_config_before_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
Comment 34 Joel Mathew Thomas 2025-02-12 17:11:54 UTC
Created attachment 307636 [details]
lspci_before_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
Comment 35 Joel Mathew Thomas 2025-02-12 17:12:18 UTC
Created attachment 307637 [details]
lspci_after_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
Comment 36 Joel Mathew Thomas 2025-02-12 17:17:07 UTC
Created attachment 307638 [details]
dmesg_6.14.0-rc2-00034-gfebbc555cf0f_no_reverts_patch_applied
Comment 37 Ilpo Järvinen 2025-02-13 12:00:27 UTC
Thanks. There were certainly far fewer config space changes in those logs (could be coincidental, but it rules out most of the bit changes in the config regardless). Sadly, it also means I have very few good theories remaining. I suggest next trying without enabling the bandwidth notifications: in pcie_bwnotif_enable(), comment out this call:

        pcie_capability_set_word(port, PCI_EXP_LNKCTL,
                                 PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);

(On top of most recent kernels is fine as a test, don't include the "fix" patch I attached earlier.)

If commenting that line out does help, then try with the set_word call but set only one of the two bits (LBMIE and LABIE) at a time to see if the problem is related to just one of them.
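For reference, the two bits in question are adjacent interrupt-enable bits in the PCIe Link Control register; a quick sketch of the masks involved (the mask values are the kernel's pci_regs.h definitions; the starting LNKCTL value is made up for illustration):

```shell
# Link Control interrupt-enable bits, per include/uapi/linux/pci_regs.h:
LBMIE=0x0400   # PCI_EXP_LNKCTL_LBMIE, Link Bandwidth Mgmt Int Enable (bit 10)
LABIE=0x0800   # PCI_EXP_LNKCTL_LABIE, Link Autonomous Bandwidth Int Enable (bit 11)
# pcie_bwnotif_enable() ORs both bits into LNKCTL in one set_word call;
# the suggested experiments OR in only one bit at a time.
lnkctl=0x0040                      # illustrative starting LNKCTL value
both=$(( lnkctl | LBMIE | LABIE ))
labie_only=$(( lnkctl | LABIE ))   # the variant later reported as working
lbmie_only=$(( lnkctl | LBMIE ))   # the variant later reported as broken
printf 'both=0x%04x labie_only=0x%04x lbmie_only=0x%04x\n' \
       "$both" "$labie_only" "$lbmie_only"
```

These are the same two bits lspci renders as BWInt+ and AutBWInt+ in the LnkCtl lines quoted earlier in this thread.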
Comment 38 Joel Mathew Thomas 2025-02-13 15:06:58 UTC
(In reply to Ilpo Järvinen from comment #37)
> Thanks. There certainly was much less config space changes in those logs
> (could be coincidental but rules out most of the bit changes in the config
> regardless). Sadly it also means I've very little good theories remaining. I
> suggest next trying without enabling the bandwidth notifications, in
> pcie_bwnotif_enable() comment out this call:
> 
>         pcie_capability_set_word(port, PCI_EXP_LNKCTL,
>                                  PCI_EXP_LNKCTL_LBMIE |
> PCI_EXP_LNKCTL_LABIE);
> 
> (On top of most recent kernels is fine as a test, don't include the "fix"
> patch I attached earlier.)
> 
> If comment that line out does help, then try with the set word call but set
> only one of the two bits (LBMIE and LABIE) at a time to see if the problem is
> related to just one of them.

Hi Ilpo,

Thanks for the suggestion. I tested the changes, and here's what I found:

    Commenting out the entire pcie_capability_set_word call: Passthrough works.
    Enabling only PCI_EXP_LNKCTL_LABIE: Passthrough works.
    Enabling only PCI_EXP_LNKCTL_LBMIE: Passthrough does not work.

It seems that LBMIE is the problematic bit. Let me know if you'd like me to run any further tests.


I've also attached the lspci outputs in case they are useful.
Comment 39 Joel Mathew Thomas 2025-02-13 15:08:13 UTC
Created attachment 307640 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
Comment 40 Joel Mathew Thomas 2025-02-13 15:08:41 UTC
Created attachment 307641 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
Comment 41 Joel Mathew Thomas 2025-02-13 15:09:25 UTC
Created attachment 307642 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
Comment 42 Joel Mathew Thomas 2025-02-13 15:10:20 UTC
Created attachment 307643 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
Comment 43 Joel Mathew Thomas 2025-02-13 15:11:45 UTC
Created attachment 307644 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
Comment 44 Joel Mathew Thomas 2025-02-13 15:12:07 UTC
Created attachment 307645 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
Comment 45 Ilpo Järvinen 2025-02-14 12:44:13 UTC
There's one additional thing that might be worth a try. I noticed that the Link Speed gets downgraded to 2.5GT/s whenever set_word is there, and there's a series that should fix the Link Speed degradation:

https://lore.kernel.org/linux-pci/20250123055155.22648-1-sjiwei@163.com/

While I'm not entirely convinced it resolves the issue, the variations in Link Speed between the most recent logs may mean it plays a role here that is directly related to whether LBMIE is set. The interrupt handler in bwctrl increments lbms_count, but might not do so in the early phases if only LABIE (or no set_word at all) is present, which in turn affects what the Target Speed quirk ends up doing (which relates to the fixes in that series).
Comment 46 Joel Mathew Thomas 2025-02-14 19:09:21 UTC
(In reply to Ilpo Järvinen from comment #45)
> There's one additional thing that might be worth a try. I noticed that the
> Link Speed gets downgraded to 2.5GT/s whenever set_word is there, and
> there's a series that should fix the Link Speed degradation:
> 
> https://lore.kernel.org/linux-pci/20250123055155.22648-1-sjiwei@163.com/
> 
> While I'm not entirely convinced it resolves the issue, the variations in
> Link Speed between the most recent logs may mean it could have a role here
> that is directly related to whether LBMIE is set or not. The interrupt
> handler in bwctrl increases lbms_count but might not do that in early phases
> if only LABIE or no set_word is present, which in turn impacts what the
> Target Speed quirk ends up doing (which relates to the fixes in that series).

Hi Ilpo, Jiwei, and everyone involved,

I would like to provide an update on my testing results.

Firstly, I sincerely apologize for my previous report regarding the 0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch. I mistakenly ran --check without actually applying the patch. After correctly applying and testing it, I can confirm that GPU passthrough works, but the PCIe link speed still downgrades from 16GT/s to 2.5GT/s after the VM is started.

Additionally, I have tested the two patches from the series:

    [PATCH v4 1/2] PCI: Fix the wrong reading of register fields
    [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register

With both patches applied, GPU passthrough fails, and the link speed also downgrades from 16GT/s to 2.5GT/s after VM boot.

I have attached the following logs for further analysis:

    lspci -vvv output before and after VM boot
    dmesg logs
Comment 47 Joel Mathew Thomas 2025-02-14 19:09:57 UTC
Created attachment 307652 [details]
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
Comment 48 Joel Mathew Thomas 2025-02-14 19:10:23 UTC
Created attachment 307653 [details]
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
Comment 49 Joel Mathew Thomas 2025-02-14 19:10:39 UTC
Created attachment 307654 [details]
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt
Comment 50 Joel Mathew Thomas 2025-02-14 19:11:09 UTC
Created attachment 307655 [details]
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
Comment 51 Joel Mathew Thomas 2025-02-14 19:11:28 UTC
Created attachment 307656 [details]
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
Comment 52 Joel Mathew Thomas 2025-02-14 19:11:48 UTC
Created attachment 307657 [details]
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.log
Comment 53 Joel Mathew Thomas 2025-02-14 19:14:32 UTC
To ensure clarity and avoid any misunderstandings, I am also attaching the exact patch files I applied from the series. This will help confirm that they were applied correctly and prevent any future confusion.
Comment 54 Joel Mathew Thomas 2025-02-14 19:15:10 UTC
Created attachment 307658 [details]
series patch file : regix_lnkctl2move.patch
Comment 55 Ilpo Järvinen 2025-02-17 12:24:45 UTC
Comment on attachment 307630 [details]
Disable bwctrl during reset

An updated patch coming in a moment.
Comment 56 Ilpo Järvinen 2025-02-17 12:37:21 UTC
Created attachment 307667 [details]
Improved disable BW notifications during reset patch

Okay, that's great news to hear. I've improved the fix patch to handle MFD siblings better. I also slightly reordered things from the previous version so that disable is only called after the device has been put into D0.

If you could test that the improved patch still works and possibly give your Tested-by for it :-) (please test the patch without any other patches so we know it solves the PCIe device lost on VM start issue).

The lnkctl2 issue is likely orthogonal.

From your wording, it is unclear to me whether a test was conducted with the 1st version of the reset fix and the lnkctl2 fix series included at the same time. In addition to confirming the improved reset fix works alone, I'd want to know the result of combining the fixes, if that wasn't yet tested. Will the device remain operational, and what is the Link Speed, when both the reset fix and the lnkctl2 series are tested together?

FYI, I'll be quite busy the remaining week and might not reply until the next week.
Comment 57 Joel Mathew Thomas 2025-02-17 14:31:01 UTC
(In reply to Ilpo Järvinen from comment #56)
> Created attachment 307667 [details]
> Improved disable BW notifications during reset patch
> 
> Okay, that's great news to hear. I've improved the fix patch to consider MFD
> siblings better. I also slightly reordered things slightly from previous so
> that disable is only called after device has been put into D0.
> 
> If you could test the improved patch still works and possibly give your
> Tested-by for it :-) (please test the patch without any other patches so we
> know it solves the PCIe device lost on VM start issue).

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

I’ve tested the patch on my setup, and I can confirm that GPU passthrough is now functioning correctly.
Comment 58 Joel Mathew Thomas 2025-02-17 14:32:03 UTC
Created attachment 307668 [details]
lspci_before_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Comment 59 Joel Mathew Thomas 2025-02-17 14:32:32 UTC
Created attachment 307669 [details]
lspci_after_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Comment 60 Joel Mathew Thomas 2025-02-17 14:32:50 UTC
Created attachment 307670 [details]
dmesg_6.14.0-rc3-dirty-0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Comment 61 Joel Mathew Thomas 2025-02-17 14:39:59 UTC
(In reply to Ilpo Järvinen from comment #56)
> From your wording, it is unclear to me whether a test was conducted with the
> 1st version of the reset fix and the lnkctl2 fix series included at the same
> time. In addition to confirming the improved reset fix works alone, I'd want
> know what is the result when combining the fixes if that wasn't yet tested.
> Will the device remain operational and what is the Link Speed when both the
> reset fix and the lnkctl2 series are tested together?


If my understanding is correct, you are asking whether I applied the following patches together at once.

2025-01-23  5:51  [PATCH v4 1/2] PCI: Fix the wrong reading of register fields Jiwei Sun
2025-01-23  5:51  [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register Jiwei Sun

If so, yes, I applied both patches together and tested them as a single combined patch.
GPU passthrough fails, and the link speed also deteriorates from 16GT/s to 2.5GT/s.

I had also attached the exact patch (attachment 307658 [details]) which I applied, which is a combination of both of the above patches.

attachment 307658 [details]: series patch file : regix_lnkctl2move.patch

I had also attached the corresponding lspci outputs and dmesg logs:

attachment 307652 [details]: lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt

attachment 307653 [details]: lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt

attachment 307654 [details]: dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt
Comment 62 Ilpo Järvinen 2025-02-17 16:33:21 UTC
No, there is the reset fix plus a series of two patches related to lnkctl2, which totals 3 patches. I'd like to see the combined result with all three applied to the same kernel, because the 2.5GT/s is likely caused by the lack of the reset fix if only the 2-patch lnkctl2 series is used (same symptom but a different cause). So these 3:

[PATCH v4 1/2] PCI: Fix the wrong reading of register fields Jiwei Sun
[PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register Jiwei Sun
[PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset Ilpo Järvinen

Please confirm whether or not you tested this combination.

(My apologies for the confusion, despite my trying to be very specific about what I meant.)

(And thanks for the tested by, I'll make the official submission of that fix now that it has been confirmed.)
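As a sketch of the mechanics of testing such a combination, all patches are applied onto one base tree with `git am`, demonstrated here on a throwaway repository (the repo, file names, and commit subjects below are illustrative, not the real kernel patches):

```shell
# Toy demonstration of stacking a patch series with `git am`:
# export two fixes as mbox patches, then reapply them onto the base.
set -e
src=$(mktemp -d)
cd "$src"
git init -q
git config user.email test@example.com
git config user.name test
echo base > f;  git add f; git commit -qm "base"
echo fix1 >> f; git commit -qam "PCI: illustrative fix 1"
echo fix2 >> f; git commit -qam "PCI/bwctrl: illustrative fix 2"
git format-patch -2 -o patches > /dev/null   # export the two fixes as mbox files
git checkout -q -b combined HEAD~2           # start again from the base tree
git am -q patches/*.patch                    # apply the whole stack in order
git log --oneline
```

`git am` preserves authorship and subject lines, so `git log` afterwards shows exactly which fixes a test kernel contains, which helps avoid the apply-order confusion discussed above.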
Comment 63 Joel Mathew Thomas 2025-02-17 17:12:10 UTC
(In reply to Ilpo Järvinen from comment #62)
> No, there is the reset fix and a series of two patches related for lnkctl2,
> which totals to 3 patches. I'd like to see the combined result with all
> three applied into the same kernel because 2.5GT/s is likely caused by the
> lack of the reset fix if only those 2 patch lnkctl2 series are used (same
> symptom but a different cause). So these 3:
> 
> [PATCH v4 1/2] PCI: Fix the wrong reading of register fields Jiwei Sun
> [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2
> register Jiwei Sun
> [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset Ilpo Järvinen
> 
> Please confirm if you did test this combination or not?
> 
> (My apologies for the confusion despite me trying to be very specific what I
> meant.)
> 
> (And thanks for the tested by, I'll make the official submission of that fix
> now that it has been confirmed.)

Thank you for the clarification.

I apologize for the misunderstanding. To confirm, I have not yet tested the combination of all three patches you mentioned:

    [PATCH v4 1/2] PCI: Fix the wrong reading of register fields – Jiwei Sun
    [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register – Jiwei Sun
    [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset – Ilpo Järvinen (the improved patch attachment_307667)

I will proceed with testing this combination and provide feedback as soon as possible.

Regarding the official submission of the fix, could you please point me to where I can view it once it has been finalized? I’d appreciate knowing where I can track the submission.

Thank you for your work on this, and I’ll follow up shortly after testing the full patch set.
Comment 64 Joel Mathew Thomas 2025-02-17 17:50:06 UTC
(In reply to Joel Mathew Thomas from comment #63)
>     [PATCH v4 1/2] PCI: Fix the wrong reading of register fields – Jiwei Sun
>     [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2
> register – Jiwei Sun
>     [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset – Ilpo
> Järvinen (the improved patch attachment_307667)
> 
> I will proceed with testing this combination and provide feedback as soon as
> possible.
> Thank you for your work on this, and I’ll follow up shortly after testing
> the full patch set.

I have tested with all three patches applied at once.
The results are:

  GPU passthrough works.
  Link speed gets downgraded to 2.5GT/s after VM start.
  Link speed goes back up to 16GT/s after VM shutdown.

Attachments: lspci outputs, and dmesg logs.
Comment 65 Joel Mathew Thomas 2025-02-17 17:50:31 UTC
Created attachment 307673 [details]
lspci_before_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch
Comment 66 Joel Mathew Thomas 2025-02-17 17:50:49 UTC
Created attachment 307674 [details]
lspci_after_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch.txt
Comment 67 Joel Mathew Thomas 2025-02-17 17:51:06 UTC
Created attachment 307675 [details]
dmesg_6.14.0-rc3-dirty_reset_regix_lnk2ctl_patch
Comment 68 Joel Mathew Thomas 2025-02-17 17:52:54 UTC
Created attachment 307676 [details]
exact patch applied : reset_regix_lnkctl2move.patch (total 3 patches)
Comment 69 Ilpo Järvinen 2025-02-17 19:08:18 UTC
You were Cc'd on the patch submission to the linux-pci mailing list. Besides that, there's patchwork, which keeps track of patches (but not much interesting happens there besides patches just sitting in the queue until they're applied :-)):

https://patchwork.kernel.org/project/linux-pci/patch/20250217165258.3811-1-ilpo.jarvinen@linux.intel.com/

If I need to send another version, patchwork will have a new entry for the v2 patch (and so on for any further versions).


I need to dig deeper into all the logs so far on this 2.5GT/s downgrade (it won't happen today) and figure out a good way to debug it, as the Target Link Speed is 16GT/s but for some reason the Link remains at 2.5GT/s.
Comment 70 Joel Mathew Thomas 2025-02-17 19:29:53 UTC
(In reply to Ilpo Järvinen from comment #69)
> You were as Cc in the patch submission to linux-pci mailing list. Besides
> that, there's patchwork that keeps track of patches (but not much
> interesting happens there besides patches just sitting in the queue until
> they're applied :-)):
> 
> https://patchwork.kernel.org/project/linux-pci/patch/20250217165258.3811-1-
> ilpo.jarvinen@linux.intel.com/
> 
> If I need to send another version, then the patchwork will have a new entiry
> for the v2 patch (and so on for any other version).
> 
> 
> I need to dig in more into all the logs so far to dig deeper into this
> 2.5GT/s downgrade but it won't happen today, and figure out a good way to
> debug it as Target Link Speed is at 16GT/s but for some reason Link remains
> at 2.5GT/s.

Thanks for the update and for CC’ing me on the patch submission. I’ll keep an eye on it.
I’m happy to help however I can.