Bug 208981
Summary: | trace with B550I AORUS PRO AX and AMD Ryzen 5 PRO 4650G | | |
---|---|---|---|
Product: | Drivers | Reporter: | Florian La Roche (florian.laroche) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | NEW --- | | |
Severity: | normal | CC: | alexdeucher, anton, arthurborsboom, liliorg, tino+kernel |
Priority: | P1 | | |
Hardware: | x86-64 | | |
OS: | Linux | | |
Kernel Version: | 5.8.2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg output | | |
Description
Florian La Roche
2020-08-20 19:03:59 UTC
Is this a regression? If so, can you bisect? Please attach your full dmesg output.

New system, so no regression for me. I'll try to check some older kernels over the next few days and report back here.

Thanks a lot,
Florian La Roche

Created attachment 292039
dmesg output

Full dmesg output of the system.

Hi, have you solved your problem? I have the same problem as you. Here is my environment:

- ASRock A520M-ITX/AC + AMD 4750G
- Ubuntu 20.04.1 LTS + AMDGPU-pro-20.30-1109583 - Ubuntu-20.04.tar.xz

I have tested kernels 5.6.19, 5.7.10 and 5.7.17 and they all show this problem. I assume your report means this also happens on a 5.4.x kernel (Ubuntu 20.04 LTS).

The display seems to work OK and I am mostly using this as a server machine, so it may not be a huge problem, but there is still a trace on each reboot... :-) (The kernel source mentions Display Port, clock values for power management etc. ???) It also seems to depend on BIOS data (?), so I'll check again with future BIOS versions as well as future kernel source code for fixes.

    [ 4.207712] smu driver if version = 0x0000000b, smu fw if version = 0x0000000e, smu fw version = 0x00374100 (55.65.0)
    [ 4.207717] SMU driver if version not matched
    [ 4.207795] SMU is initialized successfully!

best regards,
Florian La Roche

Similar trace with Radeon RX 5500M, see the end of this report: https://bugzilla.kernel.org/show_bug.cgi?id=209225
It might be related to the same cause.

The same happens with the following setup:

- Kernel 5.9-rc8 with mostly Debian kernel config
- AMD Ryzen 5 4650G CPU
- MSI MAG B550M Mortar mainboard
- MSI AMD RX460 graphics card

The same happened with the following setup:

- Kernel 5.8.14 and 5.9 with mostly Gentoo kernel config
- AMD Ryzen 7 PRO 4750G CPU+iGPU
- ASRock A520M-ITX/ac mainboard + ECC UDIMM memory

The trace mentioned above disappeared when I updated the BIOS (v. 1.20 from 2020/9/18, it contains AGESA 1.0.8.0).

However, I'm still not able to run ROCm OpenCL (tried various versions, including 3.7 and 3.8). The system either hangs or, if the program is killed early, dmesg shows

    Evicting PASID 0x8001 queues

BTW, clinfo causes GPU resets and leaves the GPU at 99% utilization, while dmesg shows something like

    qcm fence wait loop timeout expired
    The cp might be in an unrecoverable state due to an unsuccessful queues preemption
    amdgpu: Failed to evict process queues
    amdgpu: Failed to quiesce KFD
    amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
    [drm] free PSP TMP buffer
    amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume

...(and similarly for kernel 5.9.0)

This is probably off-topic, but it seems to be related to the amdgpu driver, and I don't know how to move forward (and somebody reported that the ROCk 3.7 driver works well with the Renoir APU).

Hello,

On Wed, 14 Oct 2020 at 11:44, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> - Kernel 5.8.14 and 5.9 with mostly Gentoo kernel config
> - AMD Ryzen 7 PRO 4750G CPU+iGPU
> - ASRock A520M-ITX/ac mainboard + ECC UDIMM memory
>
> The trace mentioned above disappeared when I updated the BIOS (v. 1.20 from
> 2020/9/18, it contains AGESA 1.0.8.0). However, I'm still not able to run
> ROCm

I have updated my motherboard, a Gigabyte B550I AORUS PRO AX, to BIOS F10 from 09/18/2020 with AMD AGESA ComboV2 1.0.8.1. The trace is still present, so this issue is still open for me.

> OpenCL (tried various versions, including 3.7 and 3.8). The system either
> hangs or, if the program is killed early, dmesg shows
>
> Evicting PASID 0x8001 queues
>
> BTW, clinfo causes GPU resets and leaves the GPU at 99% utilization, while
> dmesg shows something like
>
> qcm fence wait loop timeout expired
> The cp might be in an unrecoverable state due to an unsuccessful queues
> preemption
> amdgpu: Failed to evict process queues
> amdgpu: Failed to quiesce KFD
> amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
> [drm] free PSP TMP buffer
> amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
> ...(and similarly for kernel 5.9.0)
>
> This is probably off-topic, but it seems to be related to the amdgpu driver,
> and I don't know how to move forward (and somebody reported that the ROCk 3.7
> driver works well with the Renoir APU).

It seems this is all unrelated to my bug report.

best regards,
Florian La Roche

This seems to be fixed after updating to BIOS F12 from 2021-01-18, BIOS Revision: 5.17. There are even newer BIOS revisions available, but they only work with RAM at 2133 MT/s instead of the usual 3200 MT/s and seem to be unstable.

best regards,
Florian La Roche
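
For anyone comparing their own dmesg against the SMU version line quoted earlier in this thread, here is a minimal Python sketch that pulls the three hex values out of that line and decodes the firmware version. The helper name `parse_smu_line` and the regular expression are hypothetical, written only for this illustration. The byte layout (major, minor, patch from high to low) is an assumption inferred from the log itself, which prints 0x00374100 as 55.65.0; it is not taken from the driver source.

```python
import re

# Matches the SMU version line in the form it appears in the dmesg excerpt
# from this thread. The exact wording may differ between kernel versions.
SMU_LINE = re.compile(
    r"smu driver if version = (0x[0-9a-fA-F]+), "
    r"smu fw if version = (0x[0-9a-fA-F]+), "
    r"smu fw version = (0x[0-9a-fA-F]+)"
)


def parse_smu_line(line):
    """Return the parsed SMU versions as a dict, or None if the line does not match."""
    match = SMU_LINE.search(line)
    if match is None:
        return None
    driver_if, fw_if, fw = (int(group, 16) for group in match.groups())
    # Assumed layout, inferred from 0x00374100 being logged as 55.65.0:
    # bits 16 and up = major, bits 8-15 = minor, bits 0-7 = patch.
    fw_version = (fw >> 16, (fw >> 8) & 0xFF, fw & 0xFF)
    return {
        "driver_if": driver_if,
        "fw_if": fw_if,
        "fw_version": fw_version,
        "if_matched": driver_if == fw_if,
    }


if __name__ == "__main__":
    sample = (
        "[ 4.207712] smu driver if version = 0x0000000b, "
        "smu fw if version = 0x0000000e, "
        "smu fw version = 0x00374100 (55.65.0)"
    )
    print(parse_smu_line(sample))
```

On the line from this report, the driver interface version is 0x0b and the firmware interface version is 0x0e, so `if_matched` comes out false, which lines up with the "SMU driver if version not matched" message that follows it in the log. Whether that mismatch has anything to do with the trace itself is not established in this thread.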