Bug 219765
Created attachment 307600 [details]
dmesg logs for the kernel in which gpu passthrough does not work
Created attachment 307601 [details]
dmesg logs for the kernel in which gpu passthrough works
I was able to get a workaround by setting the pcie_port_pm=off kernel parameter. Passthrough works properly, except for poor power management. I will attach the logs below.

Created attachment 307603 [details]
dmesg logs for kernel after using pcie_port_pm=off kernel parameter on kernel 6.13.2-arch1-1
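For context, pcie_port_pm=off is a boot-time kernel parameter. On a GRUB-based setup (an assumption; other bootloaders and distros differ) it would typically be added like this, where "..." stands for whatever parameters are already present:

```shell
# /etc/default/grub  (GRUB example; adjust for your bootloader)
GRUB_CMDLINE_LINUX_DEFAULT="... pcie_port_pm=off"

# then regenerate the config and reboot:
#   grub-mkconfig -o /boot/grub/grub.cfg
```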
Sorry for the regression, and thanks very much for the debugging you've already done. I don't see anything obvious in the vfio or PCI core changes between v6.12 and v6.13. If it's practical for you to bisect this, that would be a simple (though tedious) way to zero in on it.

(In reply to Bjorn Helgaas from comment #5)
> Sorry for the regression, and thanks very much for the debugging you've
> already done. I don't see anything obvious in the vfio or PCI core changes
> between v6.12 and v6.13. If it's practical for you to bisect this, that
> would be a simple (though tedious) way to zero in on it.

Thanks for taking the time to look into this and for your response. I really appreciate it. I'll go ahead and bisect between these versions to pinpoint the exact commit that introduced the regression. I'll provide any findings and assist in every way I can to help resolve this. Thanks again for your support!

(In reply to Bjorn Helgaas)
> Just to confirm, did you see the suggestion to try reverting
> dc421bb3c0db ("PCI: Enable runtime PM of the host bridge")?
> That would be a quicker test than a full bisection.
>
> I didn't explicitly cc you (I used bcc) to avoid exposing your email
> address. Feel free to respond to the linux-pci thread if you don't
> mind your address being public.
>
> Bjorn

I did see it, and I reverted the commit and built the kernel successfully, but that didn't fix it. What's more surprising is that, this time around, the pcie_port_pm=off kernel parameter did not work. So right now I'm trying to bisect the kernel from 6.12.10 to 6.13.0.

I have successfully bisected the issue. The first bad commit is:

665745f274870c921020f610e2c99a3b1613519b - PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller

Additionally, its child commit also needed to be reverted:

de9a6c8d5dbfedb5eb3722c822da0490f6a59a45 - PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed

The pcie_port_pm=off kernel parameter did not work as a workaround.
I suspect the reason it worked on the Arch Linux-built kernel (6.13.0-arch1-1) is due to differences in their build configuration.

After reverting both commits, I successfully built the latest kernel from 6.14.0-rc2-00036-gecb69e7576a0, and GPU passthrough works correctly again.

I will be attaching the following files for reference:
1. git bisect log
2. Patch file containing the reverts

If needed, I can assist with further testing. Please review the attached patch and consider reverting these commits upstream.

Created attachment 307614 [details]
git bisect log
Created attachment 307615 [details]
Patch containing the reverts of commits that caused the regression.
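For reference, the attached patch corresponds to reverting the child commit first and then the first bad commit. The mechanics can be sketched on a throwaway repository (a toy demo, not the actual kernel tree):

```shell
# Toy illustration of reverting a commit and its child: give git revert the
# newest commit first so each revert applies cleanly.
dir=$(mktemp -d)
cd "$dir"
git init -q .
g() { git -c user.email=a@b -c user.name=tester "$@"; }
echo base > f;     git add f; g commit -qm "base"
echo change1 > f;  git add f; g commit -qm "first bad commit"
echo change2 >> f; git add f; g commit -qm "child commit"
g revert --no-edit HEAD HEAD~1 > /dev/null    # child first, then its parent
cat f                                         # back to the original content
```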
I'd like to share some additional findings. Currently, on the latest kernel 6.14.0-rc2-00036-gecb69e7576a0 with the reverts applied, GPU passthrough works properly with no functional issues. However, dmesg shows the following errors:

vfio-pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
vfio-pci 0000:01:00.0: device [10de:28e1] error status/mask=00000040/0000a000
vfio-pci 0000:01:00.0: [ 6] BadTLP
pcieport 0000:00:01.1: PME: Spurious native interrupt!
pcieport 0000:00:01.1: PME: Spurious native interrupt!

Observations:
- These errors disappear when using the pcie_port_pm=off kernel parameter.
- However, like before, this parameter does not fix GPU passthrough failures. If passthrough is broken, setting pcie_port_pm=off will not resolve it; it only removes the above dmesg errors.

Created attachment 307623 [details]
dmesg logs when the pcie_port_pm=off parameter is not set.
Created attachment 307624 [details]
dmesg logs when the pcie_port_pm=off parameter is set.
Could you also attach lspci output from the working and non-working configurations, please.

For the record, those bus errors seem to be present also in the log with 6.12, so nothing new there.

(In reply to Ilpo Järvinen from comment #14)
> Could you attach also lspci from working and non-working configuration
> please.
>
> For the record, those bus errors seem to be present also in the log with
> 6.12 so nothing new there.

I have attached the lspci logs for both working and non-working configurations as requested.

Working configuration: 6.14.0-rc2-00036-gecb69e7576a0 with reverts applied
Non-working configuration: 6.13.2-arch1-1

A few observations:
- In the non-working configuration, the NVIDIA GPU audio controller is initially visible before the VM starts, but it disappears after the VM is started. I'm not sure if this is relevant.
- In the working configuration, the kernel modules listed for the NVIDIA GPU are nouveau only, not nvidia or nvidia_drm. This is simply because I haven't built the proprietary NVIDIA kernel modules due to missing Linux headers.

Created attachment 307625 [details]
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
Created attachment 307626 [details]
lspci_non_working_config_6.13.2-arch1-1
I'm sorry, I forgot to mention that the lspci output should be taken with -vvv.

Created attachment 307627 [details]
lspci_working_config_6.14.0-rc2-00036-gecb69e7576a0_with_reverts
Created attachment 307628 [details]
lspci_non_working_config_6.13.2-arch1-1
(In reply to Ilpo Järvinen from comment #18)
> I'm sorry, I forgot mention the lspci should be taken with -vvv.

I have updated the lspci outputs.

Besides the changes expected from the PCIe BW controller being enabled, the Root Port (00:01.1) has Secondary MAbort+ and the PME status bits seem to be off...

-       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
+       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-

        LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
-               ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
+               ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+
        LnkSta: Speed 2.5GT/s, Width x8
-               TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
+               TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-

-       RootSta: PME ReqID 0009, PMEStatus+ PMEPending+
+       RootSta: PME ReqID 0009, PMEStatus- PMEPending-

01:00.0 is obviously in quite bad shape, but it's unclear why its config (including one capability register bit that is read-only as per the PCIe spec!?!) got so wretched. Lots of those changes look unrelated to what the PCIe BW controller itself does (in fact, bwctrl shouldn't even touch 01:00.0). The loss of 01:00.1 is probably just collateral damage in the grand scheme of things.

Perhaps 01:00.0 is fine also on 6.13 before those resets are done for it, so maybe take lspci -vvv before that point, if possible? (Before the VM start, I guess.)

In any case, there are so many changes in the lspci output for 01:00.0 that it would be useful to confirm that all of those are related to the bwctrl change only (instead of changing a lot by changing the entire kernel version and, if I understood correctly, even the gpu drivers not being the same).

(In reply to Ilpo Järvinen from comment #22)
> Perhaps 01:00.0 is fine also on 6.13 before those resets are done for it so
> maybe take lspci -vvv before that point, if possible? (Before the VM start,
> I guess.)
I'll attach the lspci output before the VM starts in the non-working config.

> In any case, there are so many changes in the lspci output for 01:00.0 that
> it would be useful to confirm that all those are related to the bwctrl
> change only (instead of changing a lot by changing the entire kernel version
> and if I understood correctly, even the gpu drivers were not the same).

To verify this, I will:
1. Build and test the kernel from before the first bad commit.
2. Build and test the kernel from the first bad commit and its immediate child commit, as both introduced the regression.
3. Collect and attach lspci logs from these tests to determine whether the changes are solely related to bwctrl.

> ...even the gpu drivers were not the same).

The GPU drivers themselves have remained the same, but the NVIDIA kernel modules were just not built in the working config.

Created attachment 307629 [details]
lspci_non_working_config_6.13.2-arch1-1 before vm starts
Created attachment 307630 [details]
Disable bwctrl during reset
Perhaps it would help to disable the BW controller while performing the reset.
A test patch is attached. The patch doesn't handle parallel MFD resets entirely correctly, but even in this simple form it might be enough to show whether this approach helps (I'm out of time today to code the more complex version that handles the parallel resets with a counter).
(In reply to Ilpo Järvinen from comment #25)
> Created attachment 307630 [details]
> Disable bwctrl during reset
>
> Perhaps it would help to disable BW controller while performing the reset.
>
> A test patch attached. The patch doesn't handle parallel MFD resets entirely
> correctly but even in this simple form it might be enough to show if this
> approach does help (I'm out of time for today to code the more complex one
> to handle the parallel resets with a counter).

Sorry, I'm a bit confused. Should I apply this patch to the commit that caused the regression, or should I test it against the latest kernel?

(In reply to Joel Mathew Thomas from comment #26)
> Sorry, I'm a bit confused. Should I apply this patch to the commit that
> caused the regression, or should I test it against the latest kernel?

Disregard my previous comment; I was able to apply the patch on the latest commit. No further action needed.

(In reply to Ilpo Järvinen from comment #25)
> Created attachment 307630 [details]
> Disable bwctrl during reset
>
> Perhaps it would help to disable BW controller while performing the reset.
>
> A test patch attached. The patch doesn't handle parallel MFD resets entirely
> correctly but even in this simple form it might be enough to show if this
> approach does help (I'm out of time for today to code the more complex one
> to handle the parallel resets with a counter).

I have performed tests on different kernel versions to analyze the regression affecting PCI passthrough. Below are the details of each tested kernel and the corresponding results:

1. Kernel: 6.12.0-rc1-00005-g3491f5096668
   Commit: 3491f5096668 (good commit)
   Modifications: No reverts, no patches
   Result: Passthrough working

2. Kernel: 6.12.0-rc1-00007-gde9a6c8d5dbf
   Commit: de9a6c8d5dbf (bad commit)
   Modifications: No reverts, no patches
   Result: Passthrough not working

3. Kernel: 6.14.0-rc2-00034-gfebbc555cf0f
   Commit: febbc555cf0f
   Modifications: Applied patch 0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
   Result: Passthrough not working

Created attachment 307631 [details]
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Created attachment 307632 [details]
lspci_working_config_before_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Created attachment 307633 [details]
lspci_non_working_config_after_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
Created attachment 307634 [details]
lspci_working_config_after_vm_boot_6.12.0-rc1-00005-g3491f5096668_commit_3491f5096668
Created attachment 307635 [details]
lspci_non_working_config_before_vm_boot_6.12.0-rc1-00007-gde9a6c8d5dbf_commit_de9a6c8d5dbf
Created attachment 307636 [details]
lspci_before_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
Created attachment 307637 [details]
lspci_after_vm_boot_latest_commit_febbc555cf0f_patch_applied_no_reverts
Created attachment 307638 [details]
dmesg_6.14.0-rc2-00034-gfebbc555cf0f_no_reverts_patch_applied
Thanks. There certainly were far fewer config space changes in those logs (could be coincidental, but it rules out most of the bit changes in the config regardless). Sadly it also means I have very few good theories remaining. I suggest next trying without enabling the bandwidth notifications; in pcie_bwnotif_enable(), comment out this call:

	pcie_capability_set_word(port, PCI_EXP_LNKCTL,
				 PCI_EXP_LNKCTL_LBMIE |
				 PCI_EXP_LNKCTL_LABIE);

(On top of the most recent kernels is fine as a test; don't include the "fix" patch I attached earlier.)

If commenting that line out does help, then try with the set_word call but set only one of the two bits (LBMIE and LABIE) at a time to see if the problem is related to just one of them.

(In reply to Ilpo Järvinen from comment #37)
> Thanks. There certainly was much less config space changes in those logs
> (could be coincidental but rules out most of the bit changes in the config
> regardless). Sadly it also means I've very little good theories remaining. I
> suggest next trying without enabling the bandwidth notifications, in
> pcie_bwnotif_enable() comment out this call:
>
> pcie_capability_set_word(port, PCI_EXP_LNKCTL,
>                          PCI_EXP_LNKCTL_LBMIE |
>                          PCI_EXP_LNKCTL_LABIE);
>
> (On top of most recent kernels is fine as a test, don't include the "fix"
> patch I attached earlier.)
>
> If comment that line out does help, then try with the set word call but set
> only one of the two bits (LBMIE and LABIE) at a time to see if the problem is
> related to just one of them.

Hi Ilpo,

Thanks for the suggestion. I tested the changes, and here's what I found:

- Commenting out the entire pcie_capability_set_word call: Passthrough works.
- Enabling only PCI_EXP_LNKCTL_LABIE: Passthrough works.
- Enabling only PCI_EXP_LNKCTL_LBMIE: Passthrough does not work.

It seems that LBMIE is the problematic bit. Let me know if you'd like me to run any further tests. I've also attached the lspci outputs in case they are useful.

Created attachment 307640 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
Created attachment 307641 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_commented_out
Created attachment 307642 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
Created attachment 307643 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LABIE
Created attachment 307644 [details]
lspci_before_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
Created attachment 307645 [details]
lspci_after_vm_boot_6.14.0-rc2-00034-gfebbc555cf0f-dirty_pcie_capability_set_word_LBMIE
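For reference, the two interrupt-enable bits tested above live in the PCIe Link Control register. A small sketch of picking them out of a raw register value (the mask values are from include/uapi/linux/pci_regs.h; the sample value and the setpci address in the comments are hypothetical):

```shell
# Bandwidth-notification enable bits in the PCIe Link Control register
# (mask values from include/uapi/linux/pci_regs.h).
LBMIE=0x0400   # PCI_EXP_LNKCTL_LBMIE: Link Bandwidth Management Interrupt Enable
LABIE=0x0800   # PCI_EXP_LNKCTL_LABIE: Link Autonomous Bandwidth Interrupt Enable

# Decode which of the two bits are set in a raw LNKCTL value
# (e.g. one read with `setpci -s <port> CAP_EXP+0x10.w`, hypothetical address).
decode_lnkctl() {
    out=""
    [ $(( $1 & LBMIE )) -ne 0 ] && out="$out LBMIE"
    [ $(( $1 & LABIE )) -ne 0 ] && out="$out LABIE"
    echo "${out# }"
}

decode_lnkctl 0x0c40   # hypothetical reading with both bits enabled
```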
There's one additional thing that might be worth a try. I noticed that the Link Speed gets downgraded to 2.5GT/s whenever the set_word is there, and there's a series that should fix the Link Speed degradation:

https://lore.kernel.org/linux-pci/20250123055155.22648-1-sjiwei@163.com/

While I'm not entirely convinced it resolves the issue, the variations in Link Speed between the most recent logs may mean it could have a role here that is directly related to whether LBMIE is set or not. The interrupt handler in bwctrl increases lbms_count but might not do that in early phases if only LABIE or no set_word is present, which in turn impacts what the Target Speed quirk ends up doing (which relates to the fixes in that series).

(In reply to Ilpo Järvinen from comment #45)
> There's one additional thing that might be worth a try. I noticed that the
> Link Speed gets downgraded to 2.5GT/s whenever set_word is there, and
> there's a series that should fix the Link Speed degradation:
>
> https://lore.kernel.org/linux-pci/20250123055155.22648-1-sjiwei@163.com/
>
> While I'm not entirely convinced it resolves the issue, the variations in
> Link Speed between the most recent logs may mean it could have a role here
> that is directly related to whether LBMIE is set or not. The interrupt
> handler in bwctrl increases lbms_count but might not do that in early phases
> if only LABIE or no set_word is present, which in turn impacts what the
> Target Speed quirk ends up doing (which relates to the fixes in that series).

Hi Ilpo, Jiwei, and everyone involved,

I would like to provide an update on my testing results.

Firstly, I sincerely apologize for my previous report regarding the 0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch. I mistakenly ran --check without actually applying the patch. After correctly applying and testing it, I can confirm that GPU passthrough works, but the PCIe link speed still downgrades from 16GT/s to 2.5GT/s after the VM is started.
Additionally, I have tested the two patches from the series:

[PATCH v4 1/2] PCI: Fix the wrong reading of register fields
[PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register

With both patches applied, GPU passthrough fails, and the link speed also downgrades from 16GT/s to 2.5GT/s after VM boot.

I have attached the following logs for further analysis:
- lspci -vvv output before and after VM boot
- dmesg logs

Created attachment 307652 [details]
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
Created attachment 307653 [details]
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
Created attachment 307654 [details]
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt
Created attachment 307655 [details]
lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
Created attachment 307656 [details]
lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.patch
Created attachment 307657 [details]
dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-downstr.log
To ensure clarity and avoid any misunderstandings, I am also attaching the exact patch files I applied from the series. This will help confirm that they were applied correctly and prevent any future confusion.

Created attachment 307658 [details]
series patch file : regix_lnkctl2move.patch
Comment on attachment 307630 [details]
Disable bwctrl during reset
An updated patch is coming in a moment.
Created attachment 307667 [details]
Improved disable BW notifications during reset patch
Okay, that's great news to hear. I've improved the fix patch to consider MFD siblings better. I also slightly reordered things from the previous version so that disable is only called after the device has been put into D0.
It would be great if you could test that the improved patch still works and possibly give your Tested-by for it :-) (please test the patch without any other patches, so we know it solves the PCIe device lost on VM start issue).
The lnkctl2 issue is likely orthogonal.
From your wording, it is unclear to me whether a test was conducted with the 1st version of the reset fix and the lnkctl2 fix series included at the same time. In addition to confirming that the improved reset fix works alone, I'd want to know what the result is when combining the fixes, if that wasn't yet tested. Will the device remain operational, and what is the Link Speed, when both the reset fix and the lnkctl2 series are tested together?
FYI, I'll be quite busy the remaining week and might not reply until the next week.
(In reply to Ilpo Järvinen from comment #56)
> Created attachment 307667 [details]
> Improved disable BW notifications during reset patch
>
> Okay, that's great news to hear. I've improved the fix patch to consider MFD
> siblings better. I also slightly reordered things from previous so that
> disable is only called after device has been put into D0.
>
> If you could test the improved patch still works and possibly give your
> Tested-by for it :-) (please test the patch without any other patches so we
> know it solves the PCIe device lost on VM start issue).

Tested-by: Joel Mathew Thomas <proxy0@tutamail.com>

I've tested the patch on my setup, and I can confirm that GPU passthrough is now functioning correctly.

Created attachment 307668 [details]
lspci_before_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Created attachment 307669 [details]
lspci_after_vm_boot_6.14.0-rc3-dirty_0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
Created attachment 307670 [details]
dmesg_6.14.0-rc3-dirty-0001-PCI-bwctrl-Disable-PCIe-BW-controller-during-reset_patch
(In reply to Ilpo Järvinen from comment #56)
> From your wording, it is unclear to me whether a test was conducted with the
> 1st version of the reset fix and the lnkctl2 fix series included at the same
> time. In addition to confirming the improved reset fix works alone, I'd want
> know what is the result when combining the fixes if that wasn't yet tested.
> Will the device remain operational and what is the Link Speed when both the
> reset fix and the lnkctl2 series are tested together?

If my understanding is correct, I think you are asking whether I applied the following patches together at once:

2025-01-23 5:51 [PATCH v4 1/2] PCI: Fix the wrong reading of register fields - Jiwei Sun
2025-01-23 5:51 [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register - Jiwei Sun

If so, yes, I applied both patches together and tested them as a single patch. GPU passthrough fails, and the link speed also deteriorates from 16GT/s to 2.5GT/s.

I had also attached the exact patch (attachment 307658 [details]) which I applied, which was a combination of both of the above patches:

attachment 307658 [details]: series patch file : regix_lnkctl2move.patch

I had also attached the corresponding lspci outputs and dmesg logs:

attachment 307652 [details]: lspci_before_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
attachment 307653 [details]: lspci_after_vm_boot_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move.txt
attachment 307654 [details]: dmesg_6.14.0-rc2-00185-g128c8f96eb86-dirty_regfix_lnkctl2move_patch.txt

No, there are the reset fix and a series of two patches related to lnkctl2, which totals 3 patches. I'd like to see the combined result with all three applied to the same kernel, because the 2.5GT/s is likely caused by the lack of the reset fix if only the 2-patch lnkctl2 series is used (same symptom but a different cause).
So these 3:

[PATCH v4 1/2] PCI: Fix the wrong reading of register fields - Jiwei Sun
[PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register - Jiwei Sun
[PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset - Ilpo Järvinen

Please confirm whether you did test this combination or not. (My apologies for the confusion, despite me trying to be very specific about what I meant.)

(And thanks for the Tested-by; I'll make the official submission of that fix now that it has been confirmed.)

(In reply to Ilpo Järvinen from comment #62)
> No, there is the reset fix and a series of two patches related for lnkctl2,
> which totals to 3 patches. I'd like to see the combined result with all
> three applied into the same kernel because 2.5GT/s is likely caused by the
> lack of the reset fix if only those 2 patch lnkctl2 series are used (same
> symptom but a different cause). So these 3:
>
> [PATCH v4 1/2] PCI: Fix the wrong reading of register fields Jiwei Sun
> [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2
> register Jiwei Sun
> [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset Ilpo Järvinen
>
> Please confirm if you did test this combination or not?

Thank you for the clarification. I apologize for the misunderstanding. To confirm, I have not yet tested the combination of all three patches you mentioned:

[PATCH v4 1/2] PCI: Fix the wrong reading of register fields – Jiwei Sun
[PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2 register – Jiwei Sun
[PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset – Ilpo Järvinen (the improved patch, attachment 307667)

I will proceed with testing this combination and provide feedback as soon as possible.
Regarding the official submission of the fix, could you please point me to where I can view it once it has been finalized? I'd appreciate knowing where I can track the submission.

Thank you for your work on this, and I'll follow up shortly after testing the full patch set.

(In reply to Joel Mathew Thomas from comment #63)
> [PATCH v4 1/2] PCI: Fix the wrong reading of register fields – Jiwei Sun
> [PATCH v4 2/2] PCI: Adjust the position of reading the Link Control 2
> register – Jiwei Sun
> [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset – Ilpo
> Järvinen (the improved patch attachment_307667)
>
> I will proceed with testing this combination and provide feedback as soon as
> possible.

I have tested with all three patches applied at once. The results are:

- GPU passthrough works.
- Link speed gets downgraded to 2.5GT/s after VM start.
- Link speed goes back up to 16GT/s after VM shutdown.

Attachments: lspci outputs and dmesg logs.

Created attachment 307673 [details]
lspci_before_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch
Created attachment 307674 [details]
lspci_after_vm_boot_6.14.0-rc3-dirty_reset_regfix_lnkctl2move_patch.txt
Created attachment 307675 [details]
dmesg_6.14.0-rc3-dirty_reset_regix_lnk2ctl_patch
Created attachment 307676 [details]
exact patch applied : reset_regix_lnkctl2move.patch (total 3 patches)
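For reference, the 16GT/s and 2.5GT/s figures above come from the Current Link Speed field of the PCIe Link Status register, which lspci reports as e.g. "LnkSta: Speed 2.5GT/s". A small sketch of decoding that field from a raw register value (encoding per the PCIe spec; the sample values and the setpci address in the comments are hypothetical):

```shell
# Map the Current Link Speed field of the PCIe Link Status register
# (bits 3:0) to a human-readable rate. A raw value could come from e.g.
# `setpci -s 01:00.0 CAP_EXP+0x12.w` (hypothetical address).
link_speed() {
    case $(( $1 & 0xf )) in
        1) echo "2.5GT/s" ;;
        2) echo "5GT/s"   ;;
        3) echo "8GT/s"   ;;
        4) echo "16GT/s"  ;;
        5) echo "32GT/s"  ;;
        *) echo "unknown" ;;
    esac
}

link_speed 0x7081   # hypothetical LnkSta reading with speed field 1
```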
You were on Cc in the patch submission to the linux-pci mailing list. Besides that, there's patchwork, which keeps track of patches (but not much interesting happens there besides patches just sitting in the queue until they're applied :-)):

https://patchwork.kernel.org/project/linux-pci/patch/20250217165258.3811-1-ilpo.jarvinen@linux.intel.com/

If I need to send another version, then patchwork will have a new entry for the v2 patch (and so on for any other version).

I need to dig more into all the logs so far to get to the bottom of this 2.5GT/s downgrade, and to figure out a good way to debug it, as the Target Link Speed is at 16GT/s but for some reason the Link remains at 2.5GT/s. It won't happen today, though.

(In reply to Ilpo Järvinen from comment #69)
> You were as Cc in the patch submission to linux-pci mailing list. Besides
> that, there's patchwork that keeps track of patches (but not much
> interesting happens there besides patches just sitting in the queue until
> they're applied :-)):
>
> https://patchwork.kernel.org/project/linux-pci/patch/20250217165258.3811-1-
> ilpo.jarvinen@linux.intel.com/
>
> If I need to send another version, then the patchwork will have a new entry
> for the v2 patch (and so on for any other version).
>
> I need to dig in more into all the logs so far to dig deeper into this
> 2.5GT/s downgrade but it won't happen today, and figure out a good way to
> debug it as Target Link Speed is at 16GT/s but for some reason Link remains
> at 2.5GT/s.

Thanks for the update and for CC'ing me on the patch submission. I'll keep an eye on it. I'm happy to help however I can.
Created attachment 307599 [details]
dmesg logs for the kernel in which gpu passthrough works

After upgrading from Linux 6.12.10 to Linux 6.13.0, VFIO GPU passthrough fails for an NVIDIA GPU (AD107). The GPU is not passed through to the VM, and its audio device (01:00.1) disappears from Virt-Manager. This issue does not occur in Linux 6.12.10. I have attached the logs.