Using Arch Linux's kernel 6.13.7 with two Radeon GPUs. The main GPU is an RX 5500 XT, the second is an HD 7850 (Pitcairn). Booting works fine with both cards. However, connecting a monitor to any of the HD 7850's ports triggers the following Oops:

Oops: Oops: 0010 [#1] PREEMPT SMP NOPTI
CPU: 0 UID: 0 PID: 138 Comm: kworker/0:1H Tainted: G S 6.13.7-arch1-1 #1 c1fb750cdab658a6e7961595e6231210fa8606e4
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: To Be Filled By O.E.M. B550 Phantom Gaming 4/ac/B550 Phantom Gaming 4/ac, BIOS P2.40 10/19/2022
Workqueue: events_highpri dm_irq_work_func [amdgpu]
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffad114066b868 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffa047552802a8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa04736648060
RBP: 0000000000000780 R08: ffffa04694d10000 R09: 0000000000000000
R10: 00000000007a1200 R11: ffffa04755280a88 R12: ffffa04736648000
R13: 0000000000000000 R14: 0000000000000780 R15: 0000000000000780
FS: 0000000000000000(0000) GS:ffffa049aea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000010d8c8000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 ? __die_body.cold+0x19/0x27
 ? page_fault_oops+0x15c/0x2e0
 ? exc_page_fault+0x81/0x190
 ? asm_exc_page_fault+0x26/0x30
 resource_get_odm_slice_dst_width+0xc2/0x120 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 resource_get_odm_slice_dst_rect+0xba/0x140 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 resource_get_odm_slice_src_rect+0x57/0x130 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 resource_build_scaling_params+0x2b/0x940 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 resource_append_dpp_pipes_for_plane_composition+0x1dc/0x2a0 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 ? srso_return_thunk+0x5/0x5f
 ? dce110_get_pix_clk_dividers+0x233/0x2a0 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 dc_state_add_plane+0xca/0x260 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 create_validate_stream_for_sink+0x380/0x400 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 amdgpu_dm_connector_mode_valid+0x63/0x200 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 drm_connector_mode_valid+0x3b/0x60
 __drm_helper_update_and_validate+0x127/0x3e0
 ? srso_return_thunk+0x5/0x5f
 drm_helper_probe_single_connector_modes+0x332/0x630
 drm_client_modeset_probe+0x273/0x1740
 ? srso_return_thunk+0x5/0x5f
 ? __wake_up+0x44/0x60
 ? kmem_cache_free+0x3f0/0x450
 __drm_fb_helper_initial_config_and_unlock+0x3b/0x4d0
 ? srso_return_thunk+0x5/0x5f
 drm_client_dev_hotplug+0xa1/0xf0
 handle_hpd_irq_helper+0x176/0x190 [amdgpu 63b2a590acaeeee8c3b2e1cf2368f882ac94c973]
 process_one_work+0x17e/0x330
 worker_thread+0x2ce/0x3f0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xd2/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
Modules linked in: tls snd_seq_dummy rfcomm snd_hrtimer snd_seq nf_conntrack_netlink xt_nat iptable_raw xt_tcpudp veth xt_conntrack xt_MASQUERADE bridge stp l>
 btmtk crypto_simd snd_pcm sp5100_tco cryptd videodev wmi_bmof snd_timer mdio_devres blake2b_generic rapl cfg80211 snd i2c_piix4 bluetooth xor mc pcspkr k10te>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffad114066b868 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffa047552802a8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffa04736648060
RBP: 0000000000000780 R08: ffffa04694d10000 R09: 0000000000000000
R10: 00000000007a1200 R11: ffffa04755280a88 R12: ffffa04736648000
R13: 0000000000000000 R14: 0000000000000780 R15: 0000000000000780
FS: 0000000000000000(0000) GS:ffffa049aea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 000000010d8c8000 CR4: 0000000000350ef0
note: kworker/0:1H[138] exited with irqs disabled

I also tested a vanilla 6.13.0, which gave the same error. Going back further, the 6.10.2-rt kernel from Arch produced no video output, but it didn't crash (nothing in the logs).

My kernel is configured with the following parameters:

radeon.si_support=0 radeon.cik_support=0 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.dpm=1 amdgpu.dc=1 amdgpu.ppfeaturemask=0xffffffff
Since I had not encountered this problem until recently (the HD7850 was sitting as the primary GPU until my secondary GPU had to be replaced), I tried swapping the primary GPU (RX 5500 XT) with the secondary GPU (HD7850).

So, when the HD7850 sits as the primary GPU, it still doesn't output any video on 6.13.7 and it freezes.

However, it works properly on 6.10.2 when sitting as primary (the HD7850 still doesn't output anything when sitting as the secondary GPU, but it doesn't crash. Thus, this seems to be a different issue).

I'll try to narrow it down a bit more to figure out whether it was introduced between 6.10 and 6.13.
6.11.0 exhibits the same problem: there is no output. There must be a signal established though, since the monitor doesn't complain about "No signal". However, the boot process is stuck.

Using the radeon driver works as expected.
(In reply to Alexandre Demers from comment #1)
> Since I had not encountered this problem until recently (the HD7850 was
> sitting as the primary GPU until my secondary GPU had to be replaced), I
> tried to inverse the primary GPU (RX 5500 XT) with the secondary GPU
> (HD7850).
>
> So, when the HD7850 sits as the primary GPU, it still doesn't output any
> video on 6.13.7 and it freezes.

To be precise, when I say "it freezes", I mean the GPUs freeze: there is no output on any of the GPUs. However, I still see activity on the computer's front light, so the kernel itself is still running. If I connect the monitor to the RX 5500 XT in this configuration (sitting as the secondary GPU) before booting, the RX 5500 XT works properly until I connect the monitor to the HD7850; after that, there is no output on any GPU.

Alexandre

>
> However, it works properly on 6.10.2 when sitting as primary (the HD7850
> still doesn't output anything when sitting as the secondary GPU, but it
> doesn't crash. Thus, this seems be a different issue).
>
> I'll try to narrow it down a bit more to figure out if it was introduced
> between 6.10 and 6.13.
6.12.0 was already "Oopsing". However, I hadn't checked whether 6.11.0 was "Oopsing" in the log before testing 6.12.0; I'll have to check that later.
Bisecting identified the following culprit:

e6a901a00822659181c93c86d8bbc2a17779fddc is the first bad commit
commit e6a901a00822659181c93c86d8bbc2a17779fddc (HEAD)
Author: Wenjing Liu <wenjing.liu@amd.com>
Date:   Wed Apr 17 15:23:08 2024 -0400

    drm/amd/display: use even ODM slice width for two pixels per container

    [why]
    When optc uses two pixel per container, each ODM slice width must be an
    even number.

    [how]
    If ODM slice width is odd number increase it by 1.

    Reviewed-by: Dillon Varone <dillon.varone@amd.com>
    Acked-by: Wayne Lin <wayne.lin@amd.com>
    Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
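
To make the [how] part of that commit message concrete, here is a minimal sketch of the rounding it describes. It is a simplified illustration and not the amdgpu code: the function name odm_slice_width and its parameters are hypothetical, and it assumes the slice width is simply the active width divided by the slice count.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, NOT the kernel's function: split the active width
 * evenly across ODM slices, then round an odd result up to the next even
 * value when two pixels share one container.
 */
static int odm_slice_width(int h_active, int slice_count,
                           bool two_pixels_per_container)
{
    int width;

    /* Guard against a nonsensical slice count instead of dividing by zero. */
    if (slice_count <= 0)
        return 0;

    width = h_active / slice_count;

    /* "If ODM slice width is odd number increase it by 1." */
    if (two_pixels_per_container && (width % 2))
        width++;

    return width;
}
```

For example, a 1366-pixel-wide mode split into two slices gives 683 per slice, which becomes 684 when two pixels per container are in use. The point relevant to this bug is that the decision depends on asking the timing generator whether two pixels per container applies.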
Looking at the commit, this bug probably affects a wider range of GPUs, since DCE12.0 and DCE8.0 rely on the change applied to DCE11.0.
DCE6's dce60_tg_funcs structure is missing:

.is_two_pixels_per_container = dce110_is_two_pixels_per_container

It seems the fix is already in 6.14-rc7.
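
As an illustration of why a missing hook would be consistent with the RIP: 0010:0x0 seen in the trace, here is a minimal sketch of a function-pointer table with one entry left unset. All names in it (struct example_timing, struct example_tg_funcs, the hook body) are hypothetical stand-ins, not the amdgpu definitions. Calling through the unset pointer would jump to address 0; the helper below checks the pointer first, which is only a defensive alternative and not what the fix does (the fix fills in the missing entry, as noted above).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the timing data passed to the hook. */
struct example_timing {
    bool two_pixels;
};

/* Hypothetical stand-in for a timing generator function table. */
struct example_tg_funcs {
    bool (*is_two_pixels_per_container)(const struct example_timing *t);
};

/* Hypothetical hook body, for illustration only. */
static bool example_is_two_pixels(const struct example_timing *t)
{
    return t && t->two_pixels;
}

/* Table with the hook left unset, like the entry reported missing above. */
static const struct example_tg_funcs example_funcs_missing_hook = {
    .is_two_pixels_per_container = NULL,
};

/* Table with the hook filled in. */
static const struct example_tg_funcs example_funcs_with_hook = {
    .is_two_pixels_per_container = example_is_two_pixels,
};

/*
 * Defensive caller: an unset hook is treated as "no two-pixel alignment
 * needed" instead of being called, which would dereference address 0.
 */
static bool example_needs_two_pixel_alignment(const struct example_tg_funcs *funcs,
                                              const struct example_timing *t)
{
    if (!funcs || !funcs->is_two_pixels_per_container)
        return false;

    return funcs->is_two_pixels_per_container(t);
}
```

With example_funcs_missing_hook the helper returns false without making the call; with example_funcs_with_hook it returns whatever the hook reports.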
For reference, the fix is commit e204aab79e01bc8ff750645666993ed8b719de57.
It will be patched in the upcoming 6.14, but this should be backported.