Bug 199513 - Boot hangs with "AMD-Vi Command Buffer Timeout"
Summary: Boot hangs with "AMD-Vi Command Buffer Timeout"
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: IOMMU
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_iommu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-25 21:37 UTC by ojab
Modified: 2018-04-26 20:30 UTC

See Also:
Kernel Version: 4.16.4
Tree: Mainline
Regression: No


Attachments
Booting 2400G APU with IOMMU enabled in BIOS results in this Kernel Panic (3.82 MB, image/jpeg)
2018-04-26 13:59 UTC, oyvinds
2400G with IOMMU enabled, Wait loop timed out messages before Total Kernel Panic (4.22 MB, image/jpeg)
2018-04-26 14:09 UTC, oyvinds

Description ojab 2018-04-25 21:37:57 UTC
Ryzen 2400G + ASUS A320M-K motherboard here. If IOMMU is enabled, boot hangs with
`[    0.001000] AMD-Vi Command Buffer Timeout` printed to the console many times. It boots fine with `amd_iommu=off` or with IOMMU disabled in BIOS/EFI.
Comment 1 ojab 2018-04-26 06:17:52 UTC
Reproducible on 4.16.4 & 4.17-rc2.
Comment 2 oyvinds 2018-04-26 09:53:37 UTC
Can confirm that this happens on ASUS B350M-PLUS TUF motherboard + 2400G *if* there is a dedicated GPU (not tested with other PCI-e cards). IOMMU does throw some error but the system does boot if there is no dedicated GPU in the system and the iGPU part of the 2400G is used.

Happens with kernels 4.16.4 and 4.17.0-0.rc2.git0.
Comment 3 ojab 2018-04-26 09:56:45 UTC
I've tried to boot only with CPU/RAM/SATA SSD inserted into MB, same error.
So it's reproducible even without external GPU for me.
Comment 4 oyvinds 2018-04-26 13:59:11 UTC
Created attachment 275587 [details]
Booting 2400G APU with IOMMU enabled in BIOS results in this Kernel Panic
Comment 5 oyvinds 2018-04-26 14:09:04 UTC
Created attachment 275589 [details]
2400G with IOMMU enabled, Wait loop timed out messages before Total Kernel Panic

Not sure if this is very useful but there it is.
AMD-Vi: Completion-Wait loop timed out
iommu ivhd0: AMD-Vi: Event Logged
iommu ivhd0: IOTLB_INV_TIMEOUT device=07:00.0 address=0x00...

07:00.X is
07:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a (rev c6)
07:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 15df
07:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15e0
07:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15e1
07:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Device 15e3
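
For anyone triaging similar logs, here is a minimal Python sketch of the lookup done by hand above: pull the `device=` field out of IOTLB_INV_TIMEOUT lines and list the lspci entries on the same bus:device. The function names are made up for illustration, and the sample inputs are just the lines quoted in this comment. Only the device field is parsed, nothing else in the event line.

```python
import re

# Matches the "device=BB:DD.F" field of an AMD-Vi event line.
# Assumption: the address is printed as bus:device.function in hex,
# as in the IOTLB_INV_TIMEOUT line quoted in this comment.
DEVICE_RE = re.compile(r"\bdevice=([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])")


def timed_out_devices(log_lines):
    """Return PCI addresses named in IOTLB_INV_TIMEOUT lines, in order, no duplicates."""
    seen = []
    for line in log_lines:
        if "IOTLB_INV_TIMEOUT" not in line:
            continue
        match = DEVICE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def functions_in_same_slot(address, lspci_lines):
    """Return the lspci lines that share bus:device with address (any function)."""
    slot_prefix = address.rsplit(".", 1)[0] + "."
    return [line for line in lspci_lines if line.startswith(slot_prefix)]


log = [
    "AMD-Vi: Completion-Wait loop timed out",
    "iommu ivhd0: AMD-Vi: Event Logged",
    "iommu ivhd0: IOTLB_INV_TIMEOUT device=07:00.0 address=0x00...",
]
lspci = [
    "07:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a (rev c6)",
    "07:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15e0",
]

for addr in timed_out_devices(log):
    for entry in functions_in_same_slot(addr, lspci):
        print(entry)
```

In practice the two lists would come from `dmesg` and `lspci` output saved to files, which is left out here.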
Comment 6 ojab 2018-04-26 20:30:48 UTC
AMD guys accidentally removed iommu@ mail-list from this reply, so it's not visible there, reposting here for posterity:

> I believe this might be related to an SME and IOMMU interaction.  There is a
> fix that is supposed to be in BIOS/UEFI to configure an IOMMU setting,
> otherwise you see those messages when booting with SME enabled.  I was trying
> to tie that fix to a microcode level (that was the only way to somehow figure
> out if it was present), but somehow some levels of UEFI went out without the
> fix but at or above the microcode level (see amd_iommu_sme_check() in
> amd_iommu_init.c).  I suspect if the user sets mem_encrypt=off that he won't
> see the IOMMU error messages and the system would boot successfully.  A BIOS
> update from the board manufacturer would be required (that would hopefully
> have the fix).  Otherwise, the user will have to ensure that SME is off at
> boot (either through kernel config to not enable it by default or with
> mem_encrypt=off command line parameter).
> I believe they'll need an updated AGESA - I need to double check but I
> think the fix is in 1007.


My system works fine with mem_encrypt=off and IOMMU on, so closing this.
If your issue is still reproducible with mem_encrypt=off, it's a different issue, and it's better to create a new bugzilla ticket (a mail to iommu@lists.linux-foundation.org probably wouldn't hurt either) and mention the AGESA version of your BIOS.
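
Before filing a new ticket it may help to confirm which workaround the running kernel was actually booted with. A rough Python sketch, assuming the two parameters mentioned in this bug (`mem_encrypt=off`, `amd_iommu=off`) are the ones of interest; the helper names are hypothetical, and treating the last occurrence of a repeated parameter as the effective one is an assumption of this sketch, not something confirmed in this thread.

```python
def kernel_param(cmdline, name):
    """Return the value of the last name=value token on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val  # assumption: a later occurrence overrides an earlier one
    return value


def active_workarounds(cmdline):
    """List which of the workarounds from this bug appear on the command line."""
    found = []
    for name in ("mem_encrypt", "amd_iommu"):
        if kernel_param(cmdline, name) == "off":
            found.append(name + "=off")
    return found


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

Calling `active_workarounds(read_running_cmdline())` on the affected machine shows what is in effect. If it lists `mem_encrypt=off` and the timeouts still appear, that matches the "different issue" case described above.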
