Bug 208149 - amdgpu makes system crash (even when using fbdev for X)
Summary: amdgpu makes system crash (even when using fbdev for X)
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Console/Framebuffers
Hardware: All Linux
Importance: P1 normal
Assignee: James Simmons
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-12 20:26 UTC by pkk
Modified: 2021-02-14 14:47 UTC
CC List: 4 users

See Also:
Kernel Version: 5.7.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output from shortly before the crash (93.43 KB, text/plain)
2020-06-12 20:26 UTC, pkk
dmesg with kernel parameter amdgpu.runpm=0 (88.88 KB, text/plain)
2020-09-11 09:07 UTC, Arthur Borsboom

Description pkk 2020-06-12 20:26:41 UTC
Created attachment 289637
dmesg output from shortly before the crash

Today, I received an MSI Bravo 15 (Ryzen 4800H + Radeon 5500M).

When loaded, the amdgpu module makes the system very unstable; X usually crashes after a few seconds, even if X actually uses the fbdev driver.

The system is stable when amdgpu is not loaded (but then fbdev only gives me 1024x768, while with amdgpu loaded I get more resolutions).

I see the same problem with the 5.6.14 kernel from Debian, so this is apparently not a regression.

The attached dmesg is pre-crash. What happened here:

System was booted with amdgpu blacklisted. X server started (fbdev, 1024x768 resolution).

I then ran modprobe amdgpu (resulting in the "[   55.887409] [drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)" message).

Then I switched to the console on which X was running. This resulted in a higher resolution than before, and in the rest of the output in dmesg. I switched back to a text console.

The system is still running at that point, but switching back to X will result in a crash very soon, usually within seconds.

Philipp
Comment 1 pkk 2020-06-20 09:20:55 UTC
I updated the EC firmware and tested kernel 5.7.4 on my Bravo 15 A4DDR today.

With amdgpu.runpm=0, it works very well for me. I have not seen a single crash yet.

Without amdgpu.runpm=0 I get an instant crash the moment I start X.
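
For anyone trying the workaround above, it can help to confirm that the parameter actually reached the kernel. The following is a minimal Python sketch, not part of the original report. It assumes the boot command line is readable at /proc/cmdline and that the loaded module exposes its value at /sys/module/amdgpu/parameters/runpm; the helper names runpm_from_cmdline and read_text_or_none are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def runpm_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value given for amdgpu.runpm on a kernel command line.

    Returns None if the parameter is absent. If it appears more than
    once, the last value seen is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "amdgpu.runpm" and sep:
            value = val
    return value


def read_text_or_none(path: str) -> Optional[str]:
    """Read a small text file, or return None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Value requested at boot time (None if the option was not passed).
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("boot command line:", runpm_from_cmdline(cmdline))

    # Value the loaded module reports (assumed sysfs path; None if
    # amdgpu is not loaded or the file is not present).
    print("loaded module:", read_text_or_none("/sys/module/amdgpu/parameters/runpm"))
```

If the first line prints None, the option was not passed at boot, and the bootloader configuration is the place to look.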

Some more discussion, including posts by at least two other GNU/Linux users of the MSI Bravo 15, can be found at https://www.reddit.com/r/Amd/comments/h87bn6/msi_bravo_15_ryzen_7_4800h_radeon_rx_5500mbased/
Comment 2 Arthur Borsboom 2020-09-11 09:06:17 UTC
Confirming similar behavior for the Bravo 17 A4DDR (Ryzen 4800H + Radeon 5500M).

Arch Linux kernel 5.8.8

The system crashes almost instantly without the kernel option amdgpu.runpm=0.

With amdgpu.runpm=0 the system seems to run stably, but dmesg shows several errors, such as the following:

** snd_pci_acp3x 0000:07:00.5: Invalid ACP audio mode : 0
** [drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)
** acp_pdm_mach acp_pdm_mach.0: snd_soc_register_card(acp) failed: -517
** [drm:dm_helpers_dp_write_dpcd [amdgpu]] *ERROR* Failed to find connector for link!
** [amdgpu] oops in dc_link_set_backlight_level
** [drm:mod_hdcp_add_display_to_topology [amdgpu]] *ERROR* Failed to add display topology, DTM TA is not initialized.
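
When comparing logs between machines, messages like the ones listed above can be extracted from a longer dmesg dump with a small filter. The following Python sketch is an illustration only. It matches just the "[drm:<function> [amdgpu]] *ERROR*" form seen in this report, so the audio messages and the oops line are deliberately not captured; the name amdgpu_errors is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
#   [drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)
DRM_ERROR = re.compile(
    r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.*)"
)


def amdgpu_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (function, message) pairs for amdgpu DRM error lines."""
    found = []
    for line in lines:
        match = DRM_ERROR.search(line)
        if match:
            found.append((match.group("func"), match.group("msg").strip()))
    return found


# Sample input taken from the messages quoted in this report.
sample = [
    "snd_pci_acp3x 0000:07:00.5: Invalid ACP audio mode : 0",
    "[drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)",
    "[drm:dm_helpers_dp_write_dpcd [amdgpu]] *ERROR* Failed to find connector for link!",
]

for func, msg in amdgpu_errors(sample):
    print(f"{func}: {msg}")
```

The same function can be given the lines of a saved dmesg file, for example one of the attachments to this bug.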

If I understand correctly, amdgpu.runpm=0 is only a workaround, and it shortens battery life because the second GPU is not powered down when unused. I therefore hope someone will take the time and effort to fix this.

Dmesg is attached.
Comment 3 Arthur Borsboom 2020-09-11 09:07:40 UTC
Created attachment 292469
dmesg with kernel parameter amdgpu.runpm=0
Comment 4 kirtangajjar95 2021-01-07 14:42:55 UTC
Is there any update on this?
Comment 5 boldos 2021-01-07 17:17:05 UTC
Same issues as described above confirmed by my friend.

He bought a new MSI Bravo 17 A4DDR-034XCZ (AMD Ryzen 7 4800H, 17.3" 144Hz, RAM 16GB DDR4, AMD Vega integrated + AMD Radeon RX 5500M 4GB) a couple of days ago (Jan 2021) and faced exactly the same issues as described here. He wanted it for Linux, but unfortunately gave up on Linux and bought and installed Windows :(

I see this issue has been open for a long time; is there any progress, please?
Comment 6 Kishan 2021-02-14 14:32:07 UTC
Did you get any solution?
Comment 7 Kishan 2021-02-14 14:32:39 UTC
Did you get any solution for the following error?
[drm:amdgpu_get_bios [amdgpu]] *ERROR* ACPI VFCT table present but broken (too short #2)
Comment 8 Arthur Borsboom 2021-02-14 14:47:17 UTC
To start with, I believe the amdgpu developers no longer use Bugzilla. AFAIK bugs are now tracked here:

https://gitlab.freedesktop.org/drm/amd/-/issues

I reported a similar bug there, and it resulted in a solution:

https://gitlab.freedesktop.org/drm/amd/-/issues/1304

If you have another issue that you believe is an amdgpu driver issue, I suggest creating a bug report at the first link above.
