Bug 208363
Summary: | Restart failure with IOMMU errors | ||
---|---|---|---|
Product: | ACPI | Reporter: | Hsiao-Ting Wang, Tiffany (hsiaoting.wang) |
Component: | BIOS | Assignee: | Lu Baolu (baolu.lu) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | ashok.raj, baolu.lu, gicmo, koba.ko, leho, rui.zhang, superm1, tiffany.wang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
kernel log with patches as described.
A test change
A potential fix patch ask for test |
Description
Hsiao-Ting Wang, Tiffany
2020-06-29 03:35:21 UTC
More observation: when we mark iommu_disable_translation()@intel-iommu.c, reboot works and the system goes into the OS quickly.

Created attachment 289955
A test change
Can anybody help to test whether the attached change helps here?

We are checking and will update you.

@Lu Baolu,
We tested your test change from comment 2; the system no longer gets stuck and restarts okay.
May I know whether there is any side effect of this change?
We found it also skips the iommu as a workaround.
Thank you for your effort.

@Hsiao-Ting Wang
I'm not sure about the side effects, hence I need to do more tests before posting it upstream. Thanks a lot for your test.

Can you also test kexec with the latest patch from Baolu?

@Ashok,
What's the difference between installing a kernel and kexec? The waiting point is in the kernel (iommu_disable_translation()@intel-iommu.c).

Is this possibly the same root cause as https://bugzilla.kernel.org/show_bug.cgi?id=206571 ?

(In reply to Hsiao-Ting Wang from comment #5)
> @Lu Baolu,
> We tested your test change from comment 2; the system no longer gets stuck
> and restarts okay.
> May I know whether there is any side effect of this change?
> We found it also skips the iommu as a workaround.
>
> Thank you for your effort.
Can you please let me know the PCI vendor/device IDs of the integrated graphics device?
Best regards,
baolu

@Baolu,
It's an Intel GPU (i915). vendor id/device id = 0086:9a49

0086 isn't the vendor id for Intel. Can you please double check?

Sorry, my fault. The corrected information: vendor id/device id = 8086:9a49

@KobaKo, since you're not in the CC list on https://bugzilla.kernel.org/show_bug.cgi?id=206571: there was a proposed patch there. Can you see if it helps for the TGL case?

@Mario,
Tried the patch and it doesn't work. The machine still hangs in iommu_disable_translation()@intel-iommu.c.

Created attachment 290317
A potential fix patch ask for test
Hi, can anybody help to test the patch attached (A potential fix patch ask for test)?
Best regards,
baolu
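
A side note on the vendor/device ID mix-up earlier in this thread (0086:9a49 versus 8086:9a49): a check like the one below would catch it mechanically. This is only a minimal sketch in Python. The helper names `parse_pci_id` and `is_intel` are made up for illustration and do not come from any tool mentioned in this bug; the only hard fact it relies on is that Intel's PCI vendor ID is 0x8086.

```python
INTEL_VENDOR_ID = 0x8086  # PCI vendor ID assigned to Intel


def parse_pci_id(text):
    """Parse a 'vvvv:dddd' hex string into (vendor, device) integers.

    Hypothetical helper for illustration only.
    """
    vendor_str, sep, device_str = text.strip().partition(":")
    if not sep or len(vendor_str) != 4 or len(device_str) != 4:
        raise ValueError(f"expected 'vvvv:dddd', got {text!r}")
    return int(vendor_str, 16), int(device_str, 16)


def is_intel(text):
    """Return True if the vendor half of 'vvvv:dddd' is Intel's ID."""
    vendor, _device = parse_pci_id(text)
    return vendor == INTEL_VENDOR_ID


if __name__ == "__main__":
    # The two values quoted in the thread.
    for candidate in ("0086:9a49", "8086:9a49"):
        print(candidate, "Intel" if is_intel(candidate) else "not Intel")
```

The first value from the thread fails the check and the corrected one passes, which matches the exchange above.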
@Baolu,
Could you explain more about why gfx would ignore the request to disable TE?

@Kobako,
I have explained it in the commit message. Is that sufficient for you? I am not sure whether it's the root cause of the issue reported here, hence I am asking for some tests.

@Baolu,
It's nice, and the information in the patch is very detailed. Thanks.

@Baolu,
With the test patch I tried multiple times (around 6); the machine (TGL) didn't hang during shutdown and didn't wait a long time (*).
* With the previous patch, the machine didn't hang but waited a long time to restart.

@Kobako,
Thanks for the testing. Can I add your Tested-by when I submit this patch to the upstream Linux kernel?

@Baolu,
Can we reset the iommu during resume?
After the power state transition is triggered, is the DMA translation in a corrupted state? If it is and you don't recover it, does DMA translation work well afterwards?

@Baolu,
Yes, please take it.
tested-by: koba.ko@canonical.com

@Baolu,
Will you refine anything in the patch? Is the last patch you provided the final version?

(In reply to KobaKo from comment #23)
> @Baolu,
> Can we reset the iommu during resume?
> After the power state transition is triggered, is the DMA translation in a
> corrupted state? If it is and you don't recover it, does DMA translation
> work well afterwards?
The same thing happens during iommu suspend/resume.
https://bugzilla.kernel.org/show_bug.cgi?id=206571
The patch is asking for test for this issue.

(In reply to KobaKo from comment #25)
> @Baolu,
> Will you refine anything in the patch? Is the last patch you provided the
> final version?
It should be, if it passes the test for 206571. Do you have any review comments?

(In reply to Lu Baolu from comment #26)
> (In reply to KobaKo from comment #23)
> > @Baolu,
> > Can we reset the iommu during resume?
> > After the power state transition is triggered, is the DMA translation in a
> > corrupted state? If it is and you don't recover it, does DMA translation
> > work well afterwards?
>
> The same thing happens during iommu suspend/resume.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=206571
>
> The patch is asking for test for this issue.
On my side, a suspend must happen before the reboot to trigger the issue. Does that mean the iommu is in a corrupted state after suspend? Should we recover/reset the iommu when the machine comes back from suspend?

@Baolu,
Would you please share the official patch once you push it upstream? Thanks.

Sure! I will update here.

I put the patch into the current rawhide kernel (kernel-5.8.0-0.rc5.20200715gite9919e11e219) for a user who currently has shutdown issues on his Dell XPS 9300 (Ice Lake IIRC) to test, and they report that "Both shutdown and reboot are working."

@Baolu, what is the status of this issue?

It has been upstreamed. Is this issue still there?

Bug closed as the fix is already in upstream. |
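
For readers trying to picture the hang: the thread reports the machine waiting inside iommu_disable_translation(), where software asks the hardware to turn translation off and then waits for the hardware to acknowledge. The toy model below sketches that "request, then poll a status bit" pattern with a bounded number of polls. It is not the kernel code. `FakeIommu`, `disable_translation`, the bit position, and the poll limit are all assumptions made for illustration.

```python
TE_STATUS_BIT = 1 << 31  # assumption: status modelled as a single bit


class FakeIommu:
    """Toy stand-in for an IOMMU; not the kernel's data structure."""

    def __init__(self, clears_after_polls):
        # Translation starts enabled, so the status bit is set.
        self.status = TE_STATUS_BIT
        # Number of reads before the bit clears; None means it never clears.
        self._polls_left = clears_after_polls

    def request_disable(self):
        # Real hardware would be told to stop translating here. The model
        # only tracks how long the acknowledgement takes to appear.
        pass

    def read_status(self):
        if self._polls_left is not None:
            if self._polls_left == 0:
                self.status &= ~TE_STATUS_BIT
            else:
                self._polls_left -= 1
        return self.status


def disable_translation(iommu, max_polls):
    """Request disable, then poll; True if the status bit cleared in time."""
    iommu.request_disable()
    for _ in range(max_polls):
        if not (iommu.read_status() & TE_STATUS_BIT):
            return True
    return False


if __name__ == "__main__":
    print(disable_translation(FakeIommu(3), max_polls=10))     # acknowledges
    print(disable_translation(FakeIommu(None), max_polls=10))  # never does
```

In this model, a device that never acknowledges makes the caller spin until its poll budget runs out, which is the kind of long wait or hang described in the comments above.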