Description:
The system hangs after reaching "Reboot: Power Down" and I have to force a shutdown by holding the power button for a few seconds. I'm on a Dell XPS 7390 2-in-1 with Arch, kernel version 5.5.4. I have had this bug since version 5.5.

Steps to reproduce:
It always hangs if I suspend and wake up the system before shutdown. This is not the only condition, but I have not identified other ways to reproduce it reliably.

Fix:
Reverting commit 6c3a44ed3c553c324845744f30bcd1d3b07d61fd "iommu/vt-d: Turn off translations at shutdown" resolves the issue.
Edoardo (and anyone else reading this),

Just chiming in with a +1 and "me too" report for an XPS 13" 2020. Same weird reboot/poweroff behaviour, where the laptop just sits there with the LED and keyboard backlight on and requires the 8-second power button press to turn it off. The history of my diagnostics can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1834277

The reason I'm commenting is that the kernel bisect and the related git commit (to do with "vt-d and iommu") pointed to a functional workaround! Go into the BIOS of the laptop, under Virtualization, and DISABLE the option "VT for Direct I/O". Once that is disabled, the laptop will reboot or power off properly on 5.5+ kernels.

I'm testing in FC31 on the aforementioned Dell XPS 13 2020 and it's working great. At present, I'm running kernel 5.6.11!
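To confirm that the BIOS change actually took effect after a reboot, one option is to look at which IOMMU units the kernel registered. The following is only a minimal sketch: it assumes the usual sysfs layout, where active units show up under /sys/class/iommu (named like dmar0 on Intel VT-d systems), and the function name list_iommu_units is made up for illustration.

```python
from pathlib import Path


def list_iommu_units(sysfs_dir="/sys/class/iommu"):
    """Return the sorted names of IOMMU units found in sysfs_dir.

    Assumption: the kernel exposes each active IOMMU unit as an entry
    in /sys/class/iommu. An empty list means none were registered.
    """
    root = Path(sysfs_dir)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    units = list_iommu_units()
    if units:
        print("IOMMU units registered:", ", ".join(units))
    else:
        print("No IOMMU units registered")
```

If the list is empty with "VT for Direct I/O" disabled and non-empty with it enabled, the workaround is in place. Checking the kernel log for DMAR messages is another way to cross-check.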
I had the same problem with kernel 5.3.0.53. Reverting to 5.3.0.51 fixed it. The system is an ASUS A320-M-K with an AMD Ryzen 3 2200G.
7390 2-in-1, same problem (obviously). So which is the better option: reverting the commit, or turning off the BIOS option? What is "VT for Direct I/O" useful for?
If you can, you're better off reverting the commit for now. However, while you're rebuilding your kernel, a good data point to provide would be whether this is fixed in the current mainline tree. The particular code in that commit is still present (the IOMMU is turned off at shutdown), but it is worth checking that other IOMMU changes haven't altered the behaviour.

Regarding what that BIOS option does: it uses the IOMMU to prevent devices from accessing each other's memory space. It helps protect against DMA attacks such as Thunderspy.
Can you please let me know the PCI vendor/device IDs of the integrated graphics device?
leho@papaya ~ $ [-] sudo lspci -nn
...
00:02.0 VGA compatible controller [0300]: Intel Corporation Iris Plus Graphics G7 [8086:8a52] (rev 07)
...
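For anyone else asked for the same information, the vendor/device pair is the bracketed "vvvv:dddd" field near the end of each `lspci -nn` line. Here is a small sketch that pulls it out of a line of output; the helper name pci_ids is hypothetical and the regular expression only assumes the format shown above.

```python
import re

# Matches the "[vvvv:dddd]" vendor:device pair printed by `lspci -nn`.
# The class code such as "[0300]" has no colon, so it is not matched.
PCI_ID = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_ids(line):
    """Return (vendor, device) as lowercase hex strings, or None if absent."""
    match = PCI_ID.search(line)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


if __name__ == "__main__":
    sample = ("00:02.0 VGA compatible controller [0300]: Intel Corporation "
              "Iris Plus Graphics G7 [8086:8a52] (rev 07)")
    print(pci_ids(sample))  # ('8086', '8a52')
```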
I think this is happening because some devices don't get shut down properly, especially if they are in a suspended power state. When such a device comes back up, the IOMMU is gone and the device is possibly accessing physical addresses directly.

Can you check whether the following patch works for you?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a3a29647272c..879194797dee 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5040,6 +5040,8 @@ void intel_iommu_shutdown(void)
 {
 	struct dmar_drhd_unit *drhd;
 	struct intel_iommu *iommu = NULL;
+	struct device *dev;
+	int i;
 
 	if (no_iommu || dmar_disabled)
 		return;
@@ -5047,11 +5049,21 @@ void intel_iommu_shutdown(void)
 	down_write(&dmar_global_lock);
 
 	/* Disable PMRs explicitly here. */
-	for_each_iommu(iommu, drhd)
-		iommu_disable_protect_mem_regions(iommu);
+	for_each_iommu(iommu, drhd) {
+		/*
+		 * All PCI devices managed by this unit should have been destroyed.
+		 */
+		if (!drhd->include_all && drhd->devices && drhd->devices_cnt) {
+			for_each_active_dev_scope(drhd->devices,
+						  drhd->devices_cnt, i, dev)
+				continue;
+		}
 
-	/* Make sure the IOMMUs are switched off */
-	intel_disable_iommus();
+		disable_dmar_iommu(iommu);
+		free_dmar_iommu(iommu);
+
+		iommu_disable_protect_mem_regions(iommu);
+	}
 
 	up_write(&dmar_global_lock);
 

If it does, the AMD IOMMU code has a similar issue. I can figure out a clean way of doing this and post it to the mailing list.

Sorry for the delay in looking into this; I did not have an account. Mario contacted me directly by email.
(In reply to deepa from comment #7)
> I think this is happening because some devices don't get shutdown properly
> especially if in suspend power states and when the device comes back up, the
> IOMMU is gone and the device is possibly accessing physical addresses
> directly.
>
> Can you try if the following patch works for you?
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index a3a29647272c..879194797dee 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5040,6 +5040,8 @@ void intel_iommu_shutdown(void)
>  {
>  	struct dmar_drhd_unit *drhd;
>  	struct intel_iommu *iommu = NULL;
> +	struct device *dev;
> +	int i;
>
>  	if (no_iommu || dmar_disabled)
>  		return;
> @@ -5047,11 +5049,21 @@ void intel_iommu_shutdown(void)
>  	down_write(&dmar_global_lock);
>
>  	/* Disable PMRs explicitly here. */
> -	for_each_iommu(iommu, drhd)
> -		iommu_disable_protect_mem_regions(iommu);
> +	for_each_iommu(iommu, drhd) {
> +		/*
> +		 * All PCI devices managed by this unit should have been destroyed.
> +		 */
> +		if (!drhd->include_all && drhd->devices && drhd->devices_cnt) {
> +			for_each_active_dev_scope(drhd->devices,
> +						  drhd->devices_cnt, i, dev)
> +				continue;
> +		}
>
> -	/* Make sure the IOMMUs are switched off */
> -	intel_disable_iommus();
> +		disable_dmar_iommu(iommu);
> +		free_dmar_iommu(iommu);
> +
> +		iommu_disable_protect_mem_regions(iommu);
> +	}
>
>  	up_write(&dmar_global_lock);
>
>
> If it does, the AMD IOMMU code also has a similar issue, I can figure out a
> clean way of doing this and post to the mailing list.
>
> Sorry for the delay in looking into this, I did not have an account.
> Mario contacted me on email directly.

Please refer to the comment at https://bugzilla.kernel.org/show_bug.cgi?id=208363#c16. Unfortunately, this suggested patch does not seem to work.
Can you please check whether the test patch posted at https://bugzilla.kernel.org/show_bug.cgi?id=208363 can help here?
Just to make it clearer, Lu means the patch in this comment on that bug: https://bugzilla.kernel.org/attachment.cgi?id=289955&action=diff
Created attachment 290315
A potential fix patch ask for test

Hi, can anybody help test the attached patch (A potential fix patch ask for test)?

Best regards,
baolu
I can confirm that this is fixed by the patch that landed in 5.8.2.
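For readers wondering whether their running kernel is at or past that release, here is a rough sketch of a version comparison. It is only a heuristic, and the names kernel_tuple and at_least_fixed_release are made up for illustration: it assumes mainline-style version strings, and it cannot tell whether a distribution kernel has backported either the offending commit or the fix.

```python
import platform
import re

# Release mentioned in this thread as containing the fix.
FIXED_IN = (5, 8, 2)


def kernel_tuple(release):
    """Parse '5.6.11-300.fc31.x86_64' into (5, 6, 11); missing patch level is 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def at_least_fixed_release(release):
    """True if the release string is numerically >= 5.8.2."""
    return kernel_tuple(release) >= FIXED_IN


if __name__ == "__main__":
    running = platform.release()
    print(running, "is at or past 5.8.2:", at_least_fixed_release(running))
```

A False result does not by itself mean the kernel is affected, since kernels older than 5.5 may not carry the offending commit at all.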
I'm fairly certain this bug can be closed as RESOLVED, as I haven't seen this happen on my 7390 for a year or longer.