Bug 211649

Summary: "drm/amd/display: reuse current context instead of recreating one" cause hdmi hotplug blackscreen on amdgpu
Product: Drivers Reporter: youling257
Component: Video(DRI - non Intel)Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX    
Severity: normal CC: alexdeucher, eric, eseifert, nicholas.kazlauskas, pmenzel+bugzilla.kernel.org, stefan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.11-rc7 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg.txt
amdgpu_dm_dtn_log
dmesg.txt
trace.txt

Description youling257 2021-02-09 06:51:06 UTC
5.11-rc7 kernel,amdgpu 3400g, plug out hdmi, replug in hdmi, blackscreen.

the bad commit is 8866a67 drm/amd/display: reuse current context instead of recreating one.
Comment 1 Alex Deucher 2021-02-10 14:35:09 UTC
Please attach your xorg log (if using X) and your dmesg output.
Comment 2 Nicholas Kazlauskas 2021-02-10 14:42:03 UTC
Please reproduce the blackscreen issue and attach the following logs/information:

- dmesg.log (with drm.debug=6 set as part of your modeline)
- Xorg.0.log if using X
- The name of your desktop environment - eg. GNOME, KDE, etc.

I'm unable to reproduce this locally on my end and it doesn't seem to have applied generically to DCN21/DCN30 per Dan's test report.
Comment 3 youling257 2021-02-10 16:14:44 UTC
replug hdmi, [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out


[  252.377374] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out

[  262.617295] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:67:crtc-0] flip_done timed out

[  270.268771] [drm] amdgpu_dm_irq_schedule_work FAILED src 2
[  272.857440] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:55:plane-3] flip_done timed out
[  272.906034] ------------[ cut here ]------------
[  272.906037] WARNING: CPU: 0 PID: 15 at amdgpu_dm_atomic_commit_tail+0x2582/0x25f0 [amdgpu]
[  272.906404] Modules linked in: nct6775 hwmon_vid ccm binfmt_misc overlay exportfs kvm_amd kvm crc32c_intel crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore snd_intel_dspcfg amdgpu drm_kms_helper cec fb_sys_fops syscopyarea sysfillrect sysimgblt gpu_sched drm_ttm_helper ttm drm ccp rng_core hid_multitouch btusb iwlmvm btrtl btintel btbcm bluetooth ecdh_generic ecc mac80211 libarc4 iwlwifi cfg80211 igb i2c_algo_bit k10temp gpio_amdpt gpio_generic efi_pstore sdcardfs
[  272.906461] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G        W         5.11.0-rc5-android-x86_64+ #1
[  272.906466] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Gaming-ITX/ac, BIOS P4.10 07/08/2020
[  272.906468] Workqueue: events dm_irq_work_func [amdgpu]
[  272.906826] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2582/0x25f0 [amdgpu]
[  272.907181] Code: a0 fd ff ff 01 c7 85 9c fd ff ff 37 00 00 00 c7 85 a4 fd ff ff 20 00 00 00 e8 5a 30 13 00 e9 a0 fa ff ff 0f 0b e9 f0 f8 ff ff <0f> 0b e9 3f f9 ff ff 0f 0b 0f 0b e9 55 f9 ff ff 49 8b 06 41 0f b6
[  272.907185] RSP: 0018:ffffa0a8000a3ab8 EFLAGS: 00210002
[  272.907189] RAX: 0000000000000002 RBX: 0000000000000003 RCX: ffff996d91211918
[  272.907191] RDX: 0000000000000001 RSI: 0000000000200297 RDI: ffff996d510a0178
[  272.907194] RBP: ffffa0a8000a3db8 R08: 0000000000000005 R09: 0000000000000000
[  272.907196] R10: ffffa0a8000a3a18 R11: ffffa0a8000a3a1c R12: 0000000000200287
[  272.907198] R13: ffff996d6c35d800 R14: ffff996d91211800 R15: ffff996d6f4d3600
[  272.907200] FS:  0000000000000000(0000) GS:ffff996e56e00000(0000) knlGS:0000000000000000
[  272.907203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.907206] CR2: 00000000144c2000 CR3: 000000010e262000 CR4: 00000000001506f0
[  272.907209] Call Trace:
[  272.907219]  commit_tail+0x94/0x130 [drm_kms_helper]
[  272.907251]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[  272.907280]  dm_restore_drm_connector_state+0xef/0x170 [amdgpu]
[  272.907635]  handle_hpd_irq+0x118/0x150 [amdgpu]
[  272.907990]  dm_irq_work_func+0x49/0x60 [amdgpu]
[  272.908346]  process_one_work+0x1c1/0x3c0
[  272.908353]  worker_thread+0x4d/0x3d0
[  272.908357]  ? rescuer_thread+0x3b0/0x3b0
[  272.908362]  kthread+0x133/0x150
[  272.908367]  ? kthread_create_worker_on_cpu+0x70/0x70
[  272.908373]  ret_from_fork+0x22/0x30
[  272.908379] ---[ end trace 9e7a0ad9c8574ed6 ]---
[  272.908402] ------------[ cut here ]------------
[  272.908403] WARNING: CPU: 0 PID: 15 at amdgpu_dm_atomic_commit_tail+0x258b/0x25f0 [amdgpu]
[  272.908757] Modules linked in: nct6775 hwmon_vid ccm binfmt_misc overlay exportfs kvm_amd kvm crc32c_intel crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore snd_intel_dspcfg amdgpu drm_kms_helper cec fb_sys_fops syscopyarea sysfillrect sysimgblt gpu_sched drm_ttm_helper ttm drm ccp rng_core hid_multitouch btusb iwlmvm btrtl btintel btbcm bluetooth ecdh_generic ecc mac80211 libarc4 iwlwifi cfg80211 igb i2c_algo_bit k10temp gpio_amdpt gpio_generic efi_pstore sdcardfs
[  272.908803] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G        W         5.11.0-rc5-android-x86_64+ #1
[  272.908807] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Gaming-ITX/ac, BIOS P4.10 07/08/2020
[  272.908809] Workqueue: events dm_irq_work_func [amdgpu]
[  272.909163] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x258b/0x25f0 [amdgpu]
[  272.909517] Code: ff ff 37 00 00 00 c7 85 a4 fd ff ff 20 00 00 00 e8 5a 30 13 00 e9 a0 fa ff ff 0f 0b e9 f0 f8 ff ff 0f 0b e9 3f f9 ff ff 0f 0b <0f> 0b e9 55 f9 ff ff 49 8b 06 41 0f b6 8e 2d 01 00 00 48 c7 c6 38
[  272.909521] RSP: 0018:ffffa0a8000a3ab8 EFLAGS: 00210086
[  272.909524] RAX: 0000000000000001 RBX: 0000000000000003 RCX: ffff996d91211918
[  272.909526] RDX: 0000000000000001 RSI: 0000000000200297 RDI: ffff996d510a0178
[  272.909528] RBP: ffffa0a8000a3db8 R08: 0000000000000005 R09: 0000000000000000
[  272.909530] R10: ffffa0a8000a3a18 R11: ffffa0a8000a3a1c R12: 0000000000200287
[  272.909532] R13: ffff996d6c35d800 R14: ffff996d91211800 R15: ffff996d6f4d3600
[  272.909534] FS:  0000000000000000(0000) GS:ffff996e56e00000(0000) knlGS:0000000000000000
[  272.909537] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.909539] CR2: 00000000144c2000 CR3: 000000010e262000 CR4: 00000000001506f0
[  272.909541] Call Trace:
[  272.909549]  commit_tail+0x94/0x130 [drm_kms_helper]
[  272.909579]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[  272.909608]  dm_restore_drm_connector_state+0xef/0x170 [amdgpu]
[  272.909961]  handle_hpd_irq+0x118/0x150 [amdgpu]
[  272.910315]  dm_irq_work_func+0x49/0x60 [amdgpu]
[  272.910669]  process_one_work+0x1c1/0x3c0
[  272.910674]  worker_thread+0x4d/0x3d0
[  272.910678]  ? rescuer_thread+0x3b0/0x3b0
[  272.910682]  kthread+0x133/0x150
[  272.910687]  ? kthread_create_worker_on_cpu+0x70/0x70
[  272.910692]  ret_from_fork+0x22/0x30
[  272.910697] ---[ end trace 9e7a0ad9c8574ed7 ]---
Comment 4 youling257 2021-02-10 16:41:53 UTC
Created attachment 295189 [details]
dmesg.txt

echo 6 > /sys/module/drm/parameters/debug
replug hdmi dmesg.
Comment 5 Nicholas Kazlauskas 2021-02-10 16:57:30 UTC
It would be good to know what OS/Userspace this is, I don't think I've seen anything that has atomic check failures on nearly every flip, eg:

[  179.142898] [drm:amdgpu_dm_atomic_check [amdgpu]] Atomic check failed with err: -22

Black screen comes from the page-flip timeout, but it's hard to say what the root cause is.

When the screen is black can you cat the DTN log and then attach the result to this ticket?

cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log

Thanks!
Comment 6 youling257 2021-02-10 17:28:07 UTC
Created attachment 295193 [details]
amdgpu_dm_dtn_log

I using amd 3400g running on androidx86 7 with mesa21.0

cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log
Comment 7 Nicholas Kazlauskas 2021-02-10 18:29:09 UTC
That makes more sense then, that's the behavior I'd expect on CrOS/Android.

Also explains why we haven't seen this on X/Wayland based compositors like GNOME on Ubuntu.

One more log I'd like captured here, the trace log. Please record a log from before the black screen occurs until after.

I've included a snippet below that shows how to capture this log:

#!/bin/bash

set +x
OUTPUT_DIR=$(pwd)

mount -o remount,rw /
echo 4 > /sys/module/drm/parameters/debug
echo 65536 > /sys/kernel/debug/tracing/buffer_size_kb

cd /sys/kernel/debug/tracing/
echo amdgpu_dm* > set_ftrace_filter
echo function > current_tracer
echo 1 > events/amdgpu_dm/enable
echo 1 > tracing_on
echo > trace
echo "Capturing a trace...\n"
read -n 1
cat trace > $OUTPUT_DIR/trace.txt
echo "Saved trace to ${OUTPUT_DIR}/trace.txt"
dmesg > $OUTPUT_DIR/dmesg.txt
echo "Saved dmesg to ${OUTPUT_DIR}/dmesg.txt"
Comment 8 youling257 2021-02-10 19:35:47 UTC
(In reply to Nicholas Kazlauskas from comment #7)
> That makes more sense then, that's the behavior I'd expect on CrOS/Android.
> 
> Also explains why we haven't seen this on X/Wayland based compositors like
> GNOME on Ubuntu.
> 
> One more log I'd like captured here, the trace log. Please record a log from
> before the black screen occurs until after.
> 
> I've included a snippet below that shows how to capture this log:
> 
> #!/bin/bash
> 
> set +x
> OUTPUT_DIR=$(pwd)
> 
> mount -o remount,rw /
> echo 4 > /sys/module/drm/parameters/debug
> echo 65536 > /sys/kernel/debug/tracing/buffer_size_kb
> 
> cd /sys/kernel/debug/tracing/
> echo amdgpu_dm* > set_ftrace_filter
> echo function > current_tracer
> echo 1 > events/amdgpu_dm/enable
> echo 1 > tracing_on
> echo > trace
> echo "Capturing a trace...\n"
> read -n 1
> cat trace > $OUTPUT_DIR/trace.txt
> echo "Saved trace to ${OUTPUT_DIR}/trace.txt"
> dmesg > $OUTPUT_DIR/dmesg.txt
> echo "Saved dmesg to ${OUTPUT_DIR}/dmesg.txt"

run the script,only echo Capturing a trace...\n
failed output trace.txt and dmesg.txt.
Comment 9 youling257 2021-02-10 19:46:29 UTC
Created attachment 295197 [details]
dmesg.txt

I delete "read -n 1", can output dmesg.txt and trace.txt.
trace.txt,
# tracer: function
#
# entries-in-buffer/entries-written: 0/0   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
Comment 10 Nicholas Kazlauskas 2021-02-10 19:51:53 UTC
I don't expect the trace log to be completely empty.

You'll need to leave the trace log running after enabling it as well (with tracing on/trace).

You'll also need various config options set in your kernel:

CONFIG_FUNCTION_TRACER
CONFIG_FUNCTION_GRAPH_TRACER
CONFIG_STACK_TRACER
CONFIG_DYNAMIC_FTRACE

Sorry if this is a little bit involved, but I'd like capture the exact sequence of atomic commits with the trace so I can reproduce the sequence on my end.

Thanks!
Comment 11 youling257 2021-02-10 19:55:15 UTC
Created attachment 295201 [details]
trace.txt

cat /sys/kernel/debug/tracing/trace
Comment 12 Nicholas Kazlauskas 2021-02-10 19:56:16 UTC
Thanks, this was what I'm looking for.
Comment 13 Alex Deucher 2021-02-15 15:22:23 UTC
See also:
https://gitlab.freedesktop.org/drm/amd/-/issues/1487
Comment 14 Eric Samuelson 2021-02-18 14:35:22 UTC
*** Bug 211821 has been marked as a duplicate of this bug. ***
Comment 16 Paul Menzel 2021-03-04 12:54:48 UTC
Can this bug be closed then?
Comment 17 Alex Deucher 2021-03-04 14:01:03 UTC
(In reply to Paul Menzel from comment #16)
> Can this bug be closed then?

Only the person that opens the bug can close it, but yes, it can be closed.
Comment 18 Stefan de Konink 2021-03-04 17:40:18 UTC
*** Bug 211997 has been marked as a duplicate of this bug. ***