Bug 211033 - [bisected][regression] amdgpu: *ERROR* Restoring old state failed with -12
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Duplicates: 211065
Depends on:
Blocks:
 
Reported: 2021-01-04 17:28 UTC by Žilvinas Žaltiena
Modified: 2023-08-15 17:37 UTC (History)
CC List: 15 users

See Also:
Kernel Version: 5.10.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kernel log (177.13 KB, text/plain)
2021-02-04 04:49 UTC, Shawn Anastasio

Description Žilvinas Žaltiena 2021-01-04 17:28:59 UTC
I have an AMD RX560 GPU connected to an LG 27UD-88 display via an ATEN CS782DP DisplayPort KVM switch.

I can't reliably switch the KVM port to the machine running kernel 5.10.4 with this GPU. The screen is usually blank and the display even goes to standby after the first try. Sometimes (if I switch KVM ports quickly a few times) it turns on, but with the wrong resolution. dmesg complains with:

[drm:dm_restore_drm_connector_state] *ERROR* Restoring old state failed with -12

for every failed port switch.

The KVM switch is configured to "redetect display": basically, it emulates replugging when the port is switched. I have been using it for 4 years with the same GPU, and it worked correctly until kernel 5.10.4.

I did a bisect, and it showed the first bad commit was: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd?h=linux-rolling-stable&id=ea64b21c6638d1ac3120fff89eb20c0e525a21d9

Reverting this commit fixed my issue (it works as it did before 5.10.4).
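
For readers unfamiliar with the numeric code: the -12 in the message is a negated errno value, and errno 12 is ENOMEM. A minimal Python sketch for decoding such return codes follows. The helper name decode_kernel_errno is hypothetical, and the message text returned by os.strerror depends on the host platform.

```python
import errno
import os


def decode_kernel_errno(code: int) -> str:
    """Translate a negative kernel-style return code into its errno name and text."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names such as "ENOMEM".
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


# The code reported by dm_restore_drm_connector_state in this bug.
print(decode_kernel_errno(-12))
```

This only names the error class (an allocation failure was reported); it says nothing about which allocation failed or why.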
Comment 2 Stuart Foster 2021-01-04 22:23:16 UTC
Got the same problem on:

ASUS Name/PRIME A320M-K, BIOS 5603 10/14/2020 (with KVM switch).
amdgpu is raven2.
Comment 3 Jindrich Makovicka 2021-01-05 10:41:56 UTC
On Tue, 5 Jan 2021 07:54:02 +0100, Greg Kroah-Hartman wrote:

> Can you test 5.11-rc to see if this issue is there as well?

Tested with 5.11.0-rc2 and 5600 XT, same issue.

"xset dpms force off" turns the display off and it won't come back,
with the same error in the syslog.

Jan  5 11:11:54 holly kernel: [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12

Can be worked around by

1) turning display power off completely
2) switching to console
3) turning display power on - linux console shows up
4) switching back to X
Comment 4 Oleg Serytsan 2021-01-05 12:19:42 UTC
Got the same problem with 5.4.86 and AMD RX560.

Reverting the following commit fixed the issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=e1b1f10c3404c8d40c45c3a6846d304fd403fa2c
Comment 5 Mike Old 2021-01-05 21:15:36 UTC
Same problem on Arch (5.10.4), AMD RX580. In my case it works without issues if I set the monitor to 120 Hz instead of 144 Hz in GNOME settings.

It can also be worked around by setting the monitor (in the monitor's own settings) to DisplayPort version 1.1, and thus a lower refresh rate, or by using HDMI instead of DP.
Comment 6 Shlomo 2021-01-06 17:20:38 UTC
I think I have the same bug on Arch after upgrading linux from 5.10.3.arch1-1 to 5.10.4.arch2-1.

My graphics card is Gigabyte Radeon RX VEGA 56 GAMING OC 8G, connected to 6 monitors: 3 HDMI outputs + 3 DisplayPort outputs. (One of the DisplayPort outputs is connected to a DisplayPort->HDMI converter). No KVM.
All monitors run at 59.95 or 60 Hz.

Sometimes when I try to wake up the monitors, the 3 monitors on the HDMI outputs (or occasionally just 1 or 2 of them) don't turn back on. dmesg shows:

[   56.906065] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[   56.936004] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[   56.976021] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[   62.646390] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[   62.671813] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[   62.697193] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12

The DisplayPort outputs work fine.

Sometimes all monitors correctly turn on, but dmesg shows the following errors which didn't appear before:

[  542.425462] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!
[  542.592191] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!
[  542.937936] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!
[  543.104672] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!
[  543.421457] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!
[  543.588188] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!
[  543.771657] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!

ea64b21c6638d1ac3120fff89eb20c0e525a21d9 "drm/amd/display: Fix memory leaks in S3 resume" is the first bad commit.
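
To compare captures like the ones above between kernel versions, the amdgpu *ERROR* lines can be tallied per function. This is a minimal sketch, not an official tool: the regular expression is an assumption derived from the line format shown in this report, and count_drm_errors is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape, taken from the logs in this report:
# [   56.906065] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
ERROR_LINE = re.compile(
    r"\[drm:(?P<func>\w+)(?: \[amdgpu\])?\] \*ERROR\* (?P<msg>.*)"
)


def count_drm_errors(lines):
    """Count *ERROR* lines per (function, message) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            counts[(match.group("func"), match.group("msg").strip())] += 1
    return counts


sample = [
    "[   56.906065] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12",
    "[   56.936004] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12",
    "[  542.425462] [drm:dce110_enable_timing_synchronization [amdgpu]] *ERROR* GSL: Timeout on reset trigger!",
]

for (func, msg), n in count_drm_errors(sample).most_common():
    print(f"{n:3d}  {func}: {msg}")
```

For a real capture, the lines could come from a saved dmesg file instead of the inline sample.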
Comment 7 Andre Tomt 2021-01-06 18:42:46 UTC
A revert for ea64b21c6638d1ac3120fff89eb20c0e525a21d9 has been queued in stable-queue, meaning it should show up in 5.10.6 and 5.4.88 if all goes well.
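
A rough way to check whether a given kernel release string falls at or after those releases is sketched below. It assumes the queued revert landed exactly as announced here (5.10.6 and 5.4.88); the function name revert_expected and the lookup table are hypothetical, and other series (for example 5.11) are deliberately reported as unknown.

```python
import re

# First stable releases expected to carry the revert, per this comment.
# Assumption: the queued revert shipped in these releases as planned.
FIRST_FIXED = {(5, 10): 6, (5, 4): 88}


def revert_expected(release: str):
    """Return True/False for known stable series, or None if the series is unknown."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    first_fixed_patch = FIRST_FIXED.get((major, minor))
    if first_fixed_patch is None:
        return None  # no information in this report for that series
    return patch >= first_fixed_patch


for release in ("5.10.4", "5.10.6-aw", "5.4.86", "5.11.0-rc2"):
    print(release, revert_expected(release))
```

A True result only means the revert should be present; later comments in this thread describe similar symptoms on kernels that include it.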
Comment 8 Artur Bac 2021-01-06 23:48:32 UTC
*** Bug 211065 has been marked as a duplicate of this bug. ***
Comment 9 Artur Bac 2021-01-06 23:50:56 UTC
I can confirm this regression with an
AMD Radeon RX 5700 XT (NAVI10, DRM 3.40.0, 5.10.3, LLVM 11.0.0).
With 5.10.4 and 5.10.5, only one of my 2 connected monitors shows a picture.
Comment 10 Artur Bac 2021-01-11 00:39:16 UTC
5.10.6 works OK again with 2 monitors and a Sapphire Nitro+ Radeon RX 5700 XT.

Here is the drm-related dmesg output on 5.10.6:

[    3.471791] systemd[1]: Starting Load Kernel Module drm...
[    3.482194] systemd[1]: modprobe@drm.service: Succeeded.
[    3.482336] systemd[1]: Finished Load Kernel Module drm.
[    3.562351] [drm] amdgpu kernel modesetting enabled.
[    3.563495] [drm] initializing kernel modesetting (NAVI10 0x1002:0x731F 0x1DA2:0xE409 0xC1).
[    3.563505] [drm] register mmio base: 0xFCC00000
[    3.563506] [drm] register mmio size: 524288
[    3.564808] [drm] add ip block number 0 <nv_common>
[    3.564810] [drm] add ip block number 1 <gmc_v10_0>
[    3.564811] [drm] add ip block number 2 <navi10_ih>
[    3.564812] [drm] add ip block number 3 <psp>
[    3.564813] [drm] add ip block number 4 <smu>
[    3.564814] [drm] add ip block number 5 <dm>
[    3.564815] [drm] add ip block number 6 <gfx_v10_0>
[    3.564816] [drm] add ip block number 7 <sdma_v5_0>
[    3.564817] [drm] add ip block number 8 <vcn_v2_0>
[    3.564818] [drm] add ip block number 9 <jpeg_v2_0>
[    3.564848] [drm] VCN decode is enabled in VM mode
[    3.564849] [drm] VCN encode is enabled in VM mode
[    3.564850] [drm] JPEG decode is enabled in VM mode
[    3.564865] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    3.564878] [drm] Detected VRAM RAM=8176M, BAR=256M
[    3.564879] [drm] RAM width 256bits GDDR6
[    3.564935] [drm] amdgpu: 8176M of VRAM memory ready
[    3.564937] [drm] amdgpu: 8176M of GTT memory ready.
[    3.564938] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    3.565069] [drm] PCIE GART of 512M enabled (table at 0x0000008000900000).
[    3.588344] [drm] Found VCN firmware Version ENC: 1.10 DEC: 5 VEP: 0 Revision: 13
[    3.588351] [drm] PSP loading VCN firmware
[    4.166778] [drm] reserve 0x900000 from 0x81fe400000 for PSP TMR
[    4.412518] [drm] Display Core initialized with v3.2.104!
[    4.625413] [drm] kiq ring mec 2 pipe 1 q 0
[    4.634394] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[    4.634532] [drm] JPEG decode initialized successfully.
[    4.639105] [drm] fb mappable at 0xE0B0A000
[    4.639107] [drm] vram apper at 0xE0000000
[    4.639108] [drm] size 33177600
[    4.639110] [drm] fb depth is 24
[    4.639111] [drm]    pitch is 15360
[    4.639171] fbcon: amdgpudrmfb (fb0) is primary device
[    4.791132] amdgpu 0000:0f:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    4.816294] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:0f:00.0 on minor 0
Comment 11 Andreas 2021-01-11 16:51:41 UTC
I'm sorry, but I can confirm the bug is still present in 5.10.6! It seems to occur less often, but it still exists.
(System: Kubuntu 20.10, Ryzen 7 PRO 4750G, using only the integrated GPU)

Jan 10 15:22:02 localhost kernel: [    0.000000] Linux version 5.10.6-aw (root@icehome) (gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35.1) #1 SMP PREEMPT Sat Jan 9 19:50:09 CET 2021
Jan 10 15:22:02 localhost kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.6-aw root=UUID=6ff371aa-4315-475d-b8ec-b0a642c9eb5b ro nosplash video=1920x1080
...
Jan 11 12:47:22 localhost kernel: [76631.564613] [drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
Jan 11 12:47:23 localhost kernel: [76632.038945] [drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
Jan 11 12:47:23 localhost kernel: [76632.570188] [drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed
Jan 11 12:47:24 localhost kernel: [76633.145733] [drm] enabling link 1 failed: 15
Jan 11 12:47:48 localhost kernel: [76657.512604] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jan 11 12:47:48 localhost kernel: [76657.512767] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B200 (len 3615, WS 8, PS 0) @ 0xB34E
Jan 11 12:47:48 localhost kernel: [76657.512933] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B0F4 (len 268, WS 4, PS 0) @ 0xB147
Jan 11 12:47:48 localhost kernel: [76657.513150] [drm:dcn10_link_encoder_enable_dp_output [amdgpu]] *ERROR* dcn10_link_encoder_enable_dp_output: Failed to execute VBIOS command table!
Jan 11 12:47:50 localhost kernel: [76658.934111] [drm] amdgpu_dm_irq_schedule_work FAILED src 2
Jan 11 12:48:02 localhost kernel: [76671.080303] ------------[ cut here ]------------
Jan 11 12:48:02 localhost kernel: [76671.080468] WARNING: CPU: 11 PID: 34192 at decide_link_settings+0x243/0x250 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.080470] Modules linked in: snd_usb_audio uvcvideo snd_usbmidi_lib snd_seq_dummy snd_hrtimer vmnet(OE) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) binfmt_misc si2157 si2168 m88rs6000t a8293 cx25840 nls_iso8859_1 wmi_bmof amdgpu snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec amd64_edac_mod snd_hwdep edac_mce_amd snd_hda_core kvm_amd cx23885 snd_seq_midi kvm gpu_sched snd_seq_midi_event tveeprom ttm snd_rawmidi altera_ci crct10dif_pclmul i2c_algo_bit cx2341x ghash_clmulni_intel tda18271 drm_kms_helper snd_pcm joydev snd_seq rapl snd_seq_device altera_stapl snd_timer syscopyarea videobuf2_dvb sysfillrect videobuf2_dma_sg sysimgblt m88ds3103 fb_sys_fops efi_pstore snd i2c_mux cec dvb_core ccp soundcore videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common k10temp videodev mc rc_core nf_log_ipv6 wmi xt_hl ip6_tables ip6t_rt video nf_log_ipv4 nf_log_common xt_LOG nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6
Jan 11 12:48:02 localhost kernel: [76671.080540]  nf_defrag_ipv4 nft_compat sch_fq_codel nft_counter nct6775 hwmon_vid lm92 nf_tables nfnetlink lm83 drm ip_tables x_tables autofs4 raid10 raid1 raid0 multipath linear hid_generic usbhid hid crc32_pclmul i2c_piix4 e1000e r8169 ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic
Jan 11 12:48:02 localhost kernel: [76671.080571] CPU: 11 PID: 34192 Comm: Xorg Tainted: G           OE     5.10.6-aw #1
Jan 11 12:48:02 localhost kernel: [76671.080573] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B550M Pro4, BIOS P1.70 12/01/2020
Jan 11 12:48:02 localhost kernel: [76671.080724] RIP: 0010:decide_link_settings+0x243/0x250 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.080727] Code: 8b 54 24 18 49 89 06 49 89 56 08 e9 07 ff ff ff 48 8b 83 88 00 00 00 48 8b 93 90 00 00 00 49 89 06 49 89 56 08 e9 ed fe ff ff <0f> 0b e9 d7 fe ff ff e8 41 6d f6 f7 90 0f 1f 44 00 00 55 48 89 e5
Jan 11 12:48:02 localhost kernel: [76671.080729] RSP: 0018:ffffa3aac0bdf650 EFLAGS: 00010246
Jan 11 12:48:02 localhost kernel: [76671.080731] RAX: 0000000000000000 RBX: ffff91ee84bc6c00 RCX: 00000000000009c5
Jan 11 12:48:02 localhost kernel: [76671.080733] RDX: ffffffffc0ed3bf0 RSI: ffffffffc0f3f183 RDI: 0000000000000000
Jan 11 12:48:02 localhost kernel: [76671.080734] RBP: ffffa3aac0bdf698 R08: ffff91ef997c2000 R09: ffffa3aac0bdf618
Jan 11 12:48:02 localhost kernel: [76671.080735] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000093828
Jan 11 12:48:02 localhost kernel: [76671.080736] R13: ffffa3aac0bdf660 R14: ffffa3aac0bdf6a8 R15: ffffa3aac0bdf6a8
Jan 11 12:48:02 localhost kernel: [76671.080739] FS:  00007f212bf39a40(0000) GS:ffff91fd2f4c0000(0000) knlGS:0000000000000000
Jan 11 12:48:02 localhost kernel: [76671.080740] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 11 12:48:02 localhost kernel: [76671.080742] CR2: 00005622049f7630 CR3: 000000010597a000 CR4: 0000000000350ee0
Jan 11 12:48:02 localhost kernel: [76671.080743] Call Trace:
Jan 11 12:48:02 localhost kernel: [76671.080892]  ? amdgpu_cgs_read_register+0x14/0x20 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.081040]  enable_link_dp+0x15a/0x230 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.081185]  core_link_enable_stream+0x6a3/0x830 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.081368]  dce110_apply_ctx_to_hw+0x590/0x5d0 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.081512]  ? dm_read_reg_func+0x3e/0xb0 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.081663]  dc_commit_state+0x32a/0xa70 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.081694]  ? drm_calc_timestamping_constants+0x199/0x200 [drm]
Jan 11 12:48:02 localhost kernel: [76671.081929]  amdgpu_dm_atomic_commit_tail+0x529/0x2420 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.081945]  ? ttm_bo_move_accel_cleanup+0x1fa/0x3f0 [ttm]
Jan 11 12:48:02 localhost kernel: [76671.082058]  ? amdgpu_move_blit+0xce/0x210 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.082182]  ? amdgpu_vram_mgr_new+0x363/0x3c0 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.082295]  ? amdgpu_bo_move+0xa4/0x2b0 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.082303]  ? ttm_bo_handle_move_mem+0xba/0x4a0 [ttm]
Jan 11 12:48:02 localhost kernel: [76671.082305]  ? ttm_bo_validate+0x137/0x150 [ttm]
Jan 11 12:48:02 localhost kernel: [76671.082353]  ? dm_plane_helper_prepare_fb+0x198/0x250 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.082357]  ? wait_for_completion_timeout+0xc0/0xf0
Jan 11 12:48:02 localhost kernel: [76671.082365]  commit_tail+0x99/0x130 [drm_kms_helper]
Jan 11 12:48:02 localhost kernel: [76671.082369]  drm_atomic_helper_commit+0x123/0x150 [drm_kms_helper]
Jan 11 12:48:02 localhost kernel: [76671.082415]  amdgpu_dm_atomic_commit+0x11/0x20 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.082430]  drm_atomic_commit+0x4a/0x50 [drm]
Jan 11 12:48:02 localhost kernel: [76671.082438]  drm_atomic_helper_set_config+0x7c/0xc0 [drm_kms_helper]
Jan 11 12:48:02 localhost kernel: [76671.082449]  drm_mode_setcrtc+0x20b/0x7e0 [drm]
Jan 11 12:48:02 localhost kernel: [76671.082487]  ? amdgpu_cs_wait_ioctl+0xd8/0x160 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.082514]  ? drm_mode_getcrtc+0x190/0x190 [drm]
Jan 11 12:48:02 localhost kernel: [76671.082520]  drm_ioctl_kernel+0xae/0xf0 [drm]
Jan 11 12:48:02 localhost kernel: [76671.082526]  drm_ioctl+0x245/0x400 [drm]
Jan 11 12:48:02 localhost kernel: [76671.082532]  ? drm_mode_getcrtc+0x190/0x190 [drm]
Jan 11 12:48:02 localhost kernel: [76671.082565]  amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
Jan 11 12:48:02 localhost kernel: [76671.082567]  __x64_sys_ioctl+0x91/0xc0
Jan 11 12:48:02 localhost kernel: [76671.082569]  do_syscall_64+0x38/0x90
Jan 11 12:48:02 localhost kernel: [76671.082570]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 11 12:48:02 localhost kernel: [76671.082571] RIP: 0033:0x7f212c39e31b
Jan 11 12:48:02 localhost kernel: [76671.082573] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 3b 0d 00 f7 d8 64 89 01 48
Jan 11 12:48:02 localhost kernel: [76671.082574] RSP: 002b:00007ffce35a8bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 11 12:48:02 localhost kernel: [76671.082576] RAX: ffffffffffffffda RBX: 00007ffce35a8c10 RCX: 00007f212c39e31b
Jan 11 12:48:02 localhost kernel: [76671.082577] RDX: 00007ffce35a8c10 RSI: 00000000c06864a2 RDI: 000000000000000f
Jan 11 12:48:02 localhost kernel: [76671.082577] RBP: 00000000c06864a2 R08: 0000000000000000 R09: 00005622052b8df0
Jan 11 12:48:02 localhost kernel: [76671.082577] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Jan 11 12:48:02 localhost kernel: [76671.082578] R13: 000000000000000f R14: 00005622042462c0 R15: 0000000000000000
Jan 11 12:48:02 localhost kernel: [76671.082580] ---[ end trace d8392f22819e77e9 ]---
Jan 11 13:23:24 localhost kernel: [78778.721432] [drm] amdgpu_dm_irq_schedule_work FAILED src 2
Jan 11 13:23:25 localhost kernel: [78779.094587] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jan 11 13:23:25 localhost kernel: [78779.094762] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B200 (len 3615, WS 8, PS 0) @ 0xB34E
Jan 11 13:23:25 localhost kernel: [78779.094941] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B0F4 (len 268, WS 4, PS 0) @ 0xB147
Jan 11 13:23:25 localhost kernel: [78779.095175] [drm:dcn10_link_encoder_enable_dp_output [amdgpu]] *ERROR* dcn10_link_encoder_enable_dp_output: Failed to execute VBIOS command table!
Jan 11 13:23:44 localhost kernel: [78798.812677] [drm] amdgpu_dm_irq_schedule_work FAILED src 2
Jan 11 13:23:45 localhost kernel: [78799.637347] [drm] amdgpu_dm_irq_schedule_work FAILED src 2
Jan 11 13:23:47 localhost kernel: [78800.947592] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jan 11 13:23:47 localhost kernel: [78800.947767] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B200 (len 3615, WS 8, PS 0) @ 0xB6EA
Jan 11 13:23:47 localhost kernel: [78800.947945] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B0F4 (len 268, WS 4, PS 0) @ 0xB147
Jan 11 13:23:47 localhost kernel: [78800.948179] [drm:dcn10_link_encoder_enable_dp_output [amdgpu]] *ERROR* dcn10_link_encoder_enable_dp_output: Failed to execute VBIOS command table!
Jan 11 13:23:55 localhost kernel: [78809.545424] [drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
Jan 11 13:23:56 localhost kernel: [78810.658465] [drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
Jan 11 13:24:17 localhost kernel: [78831.232553] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Jan 11 13:24:17 localhost kernel: [78831.232591] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B200 (len 3615, WS 8, PS 0) @ 0xB34E
Jan 11 13:24:17 localhost kernel: [78831.232627] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing B0F4 (len 268, WS 4, PS 0) @ 0xB147
Jan 11 13:24:17 localhost kernel: [78831.232679] [drm:dcn10_link_encoder_enable_dp_output [amdgpu]] *ERROR* dcn10_link_encoder_enable_dp_output: Failed to execute VBIOS command table!
Jan 11 13:24:17 localhost kernel: [78831.243045] [drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed
Jan 11 13:24:18 localhost kernel: [78832.455340] [drm] enabling link 1 failed: 15
Jan 11 13:24:21 localhost kernel: [78835.109927] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
Jan 11 13:24:21 localhost kernel: [78835.109964] clocksource:                       'hpet' wd_now: 9fd5133d wd_last: 9ee2f985 mask: ffffffff
Jan 11 13:24:21 localhost kernel: [78835.109970] clocksource:                       'tsc' cs_now: 10385bde83018 cs_last: 103851817b114 mask: ffffffffffffffff
Jan 11 13:24:21 localhost kernel: [78835.109974] tsc: Marking TSC unstable due to clocksource watchdog
Jan 11 13:24:21 localhost kernel: [78835.110724] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Jan 11 13:24:21 localhost kernel: [78835.110730] sched_clock: Marking unstable (79377631293089, -542502030791)<-(78835125546559, -15519055)
Jan 11 13:24:23 localhost kernel: [78836.812226] clocksource: Switched to clocksource hpet
Jan 11 13:26:26 localhost kernel: [78959.736998] [drm] amdgpu_dm_irq_schedule_work FAILED src 2
Jan 11 13:26:28 localhost kernel: [78961.849240] INFO: task kworker/7:0:43964 blocked for more than 122 seconds.
Jan 11 13:26:28 localhost kernel: [78961.849247]       Tainted: G        W  OE     5.10.6-aw #1
Jan 11 13:26:28 localhost kernel: [78961.849249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 11 13:26:28 localhost kernel: [78961.849252] task:kworker/7:0     state:D stack:    0 pid:43964 ppid:     2 flags:0x00004000
Jan 11 13:26:28 localhost kernel: [78961.849433] Workqueue: events dm_irq_work_func [amdgpu]
Jan 11 13:26:28 localhost kernel: [78961.849435] Call Trace:
Jan 11 13:26:28 localhost kernel: [78961.849444]  __schedule+0x267/0x7c0
Jan 11 13:26:28 localhost kernel: [78961.849447]  schedule+0x68/0xe0
Jan 11 13:26:28 localhost kernel: [78961.849450]  schedule_preempt_disabled+0x15/0x20
Jan 11 13:26:28 localhost kernel: [78961.849453]  __ww_mutex_lock.constprop.0+0x316/0x7c0
Jan 11 13:26:28 localhost kernel: [78961.849457]  __ww_mutex_lock_slowpath+0x16/0x20
Jan 11 13:26:28 localhost kernel: [78961.849459]  ww_mutex_lock+0x77/0x90
Jan 11 13:26:28 localhost kernel: [78961.849487]  drm_modeset_lock+0x35/0xb0 [drm]
Jan 11 13:26:28 localhost kernel: [78961.849507]  drm_modeset_lock_all_ctx+0x28/0x300 [drm]
Jan 11 13:26:28 localhost kernel: [78961.849526]  drm_modeset_lock_all+0x5e/0xb0 [drm]
Jan 11 13:26:28 localhost kernel: [78961.849689]  handle_hpd_irq+0xd2/0x120 [amdgpu]
Jan 11 13:26:28 localhost kernel: [78961.849843]  dm_irq_work_func+0x4e/0x60 [amdgpu]
Jan 11 13:26:28 localhost kernel: [78961.849847]  process_one_work+0x1e3/0x3b0
Jan 11 13:26:28 localhost kernel: [78961.849850]  worker_thread+0x50/0x3f0
Jan 11 13:26:28 localhost kernel: [78961.849853]  ? rescuer_thread+0x390/0x390
Jan 11 13:26:28 localhost kernel: [78961.849856]  kthread+0x145/0x170
Jan 11 13:26:28 localhost kernel: [78961.849859]  ? __kthread_bind_mask+0x70/0x70
Jan 11 13:26:28 localhost kernel: [78961.849863]  ret_from_fork+0x22/0x30
Comment 12 Andreas 2021-01-11 16:54:55 UTC
Sorry, wrong bug report. How can I remove my last comment above?
Comment 13 Andreas 2021-01-11 17:00:44 UTC
Wrong bug report above, but the trigger was the same: simply 1) switch the monitor off, 2) wake the system up from standby/suspend, and then 3) switch the monitor on -> blank screen, no signal.
Comment 14 Shawn Anastasio 2021-01-27 00:10:42 UTC
I can confirm the same issue on 5.10.4 with a 2x 4K KVM setup on an AMD Radeon Pro WX 5100 (Talos II POWER9 host).
Comment 16 Shawn Anastasio 2021-02-04 04:49:20 UTC
Created attachment 295063 [details]
kernel log

I have encountered the same issue on 5.10.10 which contains the revert. Attached is my kernel log after switching inputs through my KVM.

System specifications:

GPU: AMD Radeon WX 5100
CPU: IBM POWER9 8-core (x2)
Kernel: 5.10.10
Comment 17 Jan Klos 2021-03-06 13:33:24 UTC
I don't think this is limited just to KVMs and such. On my Vega 64 + 5.11.3-arch1 (I had the same problem with 5.11.2, 5.10, etc.), sometimes, when I return to the PC after a while and my 2 monitors are sleeping, only one monitor wakes up and the other one remains in sleep mode. dmesg shows this:

[Mar 6 14:22] [drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[Mar 6 14:23] [drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
[  +0,473352] [drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed
[  +0,437753] [drm] enabling link 0 failed: 15
[  +0,432276] [drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[  +0,405827] [drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
[  +0,476172] [drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed
[  +0,415466] [drm] enabling link 1 failed: 15

Switching to a terminal and waiting a second or two makes both monitors work; after switching back to X11, everything is OK. It seems to me that there might be some kind of bug where, if the first link training attempt fails, the subsequent ones ALWAYS fail as well, so that only a single link training attempt actually has a chance to succeed.
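
One way to collect evidence for or against that hypothesis is to group the logged attempts per failed link across many captures. The sketch below only summarizes what the log shows and does not prove anything about the driver's retry logic; the regular expressions are assumptions based on the messages quoted above, and summarize_link_training is a hypothetical helper name.

```python
import re

# Message shapes assumed from the log excerpt quoted in this comment.
ATTEMPT = re.compile(r"Link training attempt (\d+) of (\d+) failed")
LINK_FAILED = re.compile(r"enabling link (\d+) failed: (\d+)")


def summarize_link_training(lines):
    """Return (link_index, logged_failed_attempts) for each 'enabling link' failure."""
    results = []
    failed_attempts = 0
    for line in lines:
        if ATTEMPT.search(line):
            failed_attempts += 1
            continue
        match = LINK_FAILED.search(line)
        if match:
            results.append((int(match.group(1)), failed_attempts))
            failed_attempts = 0  # start counting afresh for the next link
    return results


sample = [
    "[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed",
    "[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed",
    "[drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed",
    "[drm] enabling link 0 failed: 15",
    "[drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed",
    "[drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed",
    "[drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed",
    "[drm] enabling link 1 failed: 15",
]

print(summarize_link_training(sample))
```

If captures from many wake-ups never show a sequence that stops after one or two failed attempts without a final "enabling link" failure, that would be consistent with the hypothesis, though it would not confirm it.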
Comment 18 Alex Deucher 2021-03-08 21:48:08 UTC
The original issue reported here was fixed.  If you are having other issues, please open new bugs.
