Bug 215648

Summary: amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic
Product: Drivers
Reporter: Philipp Riederer (pr_kernel)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED UNREPRODUCIBLE
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.15.12
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg from boot using kernel 5.15.12@94ba5b0fb52d6dbf1f200351876a839afb74aedd

Description Philipp Riederer 2022-03-01 14:32:15 UTC
Hi!

My Lenovo T14s (AMD) crashes with a panic (https://imgur.com/a/P6Twvov) when I unplug/replug any monitor. This also happens when an external display wakes from DPMS.

I have bisected the issue to the same commit, 0f591d17e36e08313b0c440b99b0e57b47e01a9a, that Jose Mestre found in #215511. The patch proposed there (which is already mainlined, if I see that correctly) does not help. Alex Deucher asked me to open this as a new bug.

I have tried all kernels up to 5.15.24. I cannot try 5.16 because I use ZFS as my root device and the zfs module is not (yet) compatible with 5.16.

Is there anything you would like me to try, or should my issue be fixed in 5.16+?

Cheers,
Philipp
Comment 1 Alex Deucher 2022-03-01 15:06:08 UTC
Can you attach your full dmesg output when the issue happens?  Is it actually a segfault or just a warning?
Comment 2 Philipp Riederer 2022-03-02 08:57:53 UTC
Hey,

this is the log I could recover:

> <4>[   70.829010] RSP: 0018:ffffad060ad67838 EFLAGS: 00000202
> <4>[   70.829013] RAX: 0000000000000000 RBX: ffff92c82ff28000 RCX:
> 00000000000001f5
> <4>[   70.829014] RDX: ffffffffc2fc3970 RSI: ffffffffc3047f09 RDI:
> 0000000000000000
> <4>[   70.829014] RBP: 0000000000000012 R08: 0000000000000000 R09:
> ffffad060ad67610
> <4>[   70.829015] R10: ffffad060ad67608 R11: ffffffff9b743bc8 R12:
> ffff92c827b30c00
> <4>[   70.829016] R13: ffff92c7e0b35000 R14: ffff92c82c8c0000 R15:
> ffff92c82c8c0b58
> <4>[   70.829017] FS:  0000000000000000(0000) GS:ffff92cabf900000(0000)
> knlGS:0000000000000000
> <4>[   70.829018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[   70.829018] CR2: 00007f2afe48ef20 CR3: 000000014a5b4000 CR4:
> 0000000000350ee0
> <0>[   70.829020] Kernel panic - not syncing: Fatal exception in interrupt
> <0>[   70.829047] Kernel Offset: 0x19000000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> <4>[   70.741431] ---[ end trace d042cf4ec67f5116 ]---
> <4>[   70.829001] RIP: 0010:kgdb_breakpoint+0x10/0x20
> <4>[   70.829009] Code: c0 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 c0 c3
> 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f0 ff 05 fc f9 f3 01 0f ae f8 cc <0f>
> ae f8 f0 ff 0d ee f9 f3 01 c3 0f 1f 44 00 00 0f 1f 44 00 00 48
> <4>[   70.702654]  videodev snd_hda_core ecdh_generic drm_kms_helper ecc
> snd_hwdep cdc_acm joydev mc thinkpad_acpi sp5100_tco snd_rn_pci_acp3x
> serio_raw efi_pstore k10temp cfg80211 snd_pcm i2c_piix4 nvram cec rtsx_pci
> snd_pci_acp3x ccp snd_timer rc_core ipmi_devintf ledtrig_audio ucsi_acpi
> platform_profile mfd_core ipmi_msghandler typec_ucsi snd roles mac_hid typec
> soundcore rfkill wmi video i2c_designware_platform i2c_scmi pinctrl_amd
> amd_pmc i2c_designware_core acpi_cpufreq vboxnetadp(OE) vboxnetflt(OE) nfsd
> auth_rpcgss vboxdrv(OE) drm nfs_acl lockd grace backlight fuse i2c_core
> configfs sunrpc bpf_preload efivarfs zfs(POE) zunicode(POE) zzstd(OE)
> zlua(OE) zavl(POE) icp(POE) crc32_pclmul crc32c_intel zcommon(POE)
> znvpair(POE) spl(OE) xhci_pci xhci_pci_renesas aesni_intel r8169 crypto_simd
> realtek cryptd xhci_hcd mdio_devres ehci_pci libphy ehci_hcd
> <4>[   70.702619]  ? set_kthread_struct+0x40/0x40
> <4>[   70.702620]  ret_from_fork+0x22/0x30
> <4>[   70.702623]  </TASK>
> <4>[   70.702623] Modules linked in: ccm xt_mark xt_comment tun xt_CHECKSUM
> xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle
> ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter ip_tables
> bridge stp llc cmac algif_skcipher bnep intel_rapl_msr intel_rapl_common
> snd_acp3x_rn snd_soc_dmic snd_acp3x_pdm_dma snd_soc_core snd_compress
> rtsx_pci_sdmmc edac_mce_amd snd_pcm_dmaengine mmc_core ac97_bus wmi_bmof
> kvm_amd snd_ctl_led iwlmvm snd_hda_codec_realtek amdgpu mac80211
> snd_hda_codec_generic kvm snd_hda_codec_hdmi uvcvideo btusb libarc4 btrtl
> irqbypass btbcm drm_ttm_helper cdc_ether snd_hda_intel btintel
> videobuf2_vmalloc crct10dif_pclmul videobuf2_memops ttm snd_intel_dspcfg
> snd_usb_audio usbnet snd_intel_sdw_acpi ghash_clmulni_intel videobuf2_v4l2
> iommu_v2 snd_usbmidi_lib gpu_sched bluetooth iwlwifi snd_hda_codec r8152
> videobuf2_common snd_rawmidi rapl i2c_algo_bit mii snd_seq_device
> <4>[   70.702134]  ? dc_validate_global_state+0x321/0x3c0 [amdgpu]
> <4>[   70.702259]  ? dm_plane_helper_prepare_fb+0x231/0x2b0 [amdgpu]
> <4>[   70.702382]  ? __cond_resched+0x16/0x40
> <4>[   70.702385]  ? __wait_for_common+0x3b/0x160
> <4>[   70.702386]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
> <4>[   70.702390]  commit_tail+0x94/0x120 [drm_kms_helper]
> <4>[   70.702401]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
> <4>[   70.702412]  drm_client_modeset_commit_atomic+0x1e4/0x220 [drm]
> <4>[   70.702430]  drm_client_modeset_commit_locked+0x56/0x150 [drm]
> <4>[   70.702444]  drm_client_modeset_commit+0x24/0x40 [drm]
> <4>[   70.702458]  drm_fb_helper_set_par+0xa5/0xd0 [drm_kms_helper]
> <4>[   70.702468]  drm_fb_helper_hotplug_event.part.0+0xa8/0xc0
> [drm_kms_helper]
> <4>[   70.702476]  drm_kms_helper_hotplug_event+0x26/0x30 [drm_kms_helper]
> <4>[   70.702486]  handle_hpd_irq+0x12b/0x160 [amdgpu]
> <4>[   70.702611]  process_one_work+0x1f1/0x390
> <4>[   70.702614]  worker_thread+0x53/0x3e0
> <4>[   70.702615]  ? process_one_work+0x390/0x390
> <4>[   70.702617]  kthread+0x127/0x150
> <4>[   70.701096] RBP: 0000000000000012 R08: 0000000000000000 R09:
> ffffad060ad67610
> <4>[   70.701096] R10: ffffad060ad67608 R11: ffffffff9b743bc8 R12:
> ffff92c827b30c00
> <4>[   70.701097] R13: ffff92c7e0b35000 R14: ffff92c82c8c0000 R15:
> ffff92c82c8c0b58
> <4>[   70.701098] FS:  0000000000000000(0000) GS:ffff92cabf900000(0000)
> knlGS:0000000000000000
> <4>[   70.701099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[   70.701099] CR2: 00007f2afe48ef20 CR3: 000000003f610000 CR4:
> 0000000000350ee0
> <4>[   70.701100] Call Trace:
> <4>[   70.701102]  <TASK>
> <4>[   70.701103]  mpc2_assert_idle_mpcc+0xaf/0xf0 [amdgpu]
> <4>[   70.701242]  dcn20_program_front_end_for_ctx+0x70f/0xd40 [amdgpu]
> <4>[   70.701373]  dc_commit_state+0x49c/0xa60 [amdgpu]
> <4>[   70.701507]  amdgpu_dm_atomic_commit_tail+0x55c/0x2630 [amdgpu]
> <4>[   70.701635]  ? dcn21_validate_bandwidth_fp+0x109/0x700 [amdgpu]
> <4>[   70.701760]  ? kfree+0xba/0x400
> <4>[   70.701764]  ? dcn21_validate_bandwidth_fp+0xc1/0x700 [amdgpu]
> <4>[   70.701886]  ? dc_fpu_end+0x70/0x80 [amdgpu]
> <4>[   70.702010]  ? dcn21_validate_bandwidth+0x44/0x50 [amdgpu]

The system is locked up hard after this.
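
Note that the recovered chunks above are not in chronological order: the timestamps run from 70.829 back down to 70.701, and long records are wrapped onto continuation lines. A minimal sketch for putting such a paste back in order, assuming the log was saved as text with the same `> <level>[timestamp]` prefix as quoted here (the function name `reorder_log` is made up for illustration and is not part of any tool):

```python
import re

# Matches the start of a kernel log record as quoted above, e.g.
# "> <4>[   70.829010] RSP: ..." (the leading "> " quote marker is optional).
RECORD_START = re.compile(r"^(?:>\s?)?<(\d)>\[\s*(\d+\.\d+)\]")


def reorder_log(text):
    """Return log records sorted by timestamp, with wrapped lines rejoined."""
    records = []  # each entry is [timestamp, record_text]
    for raw in text.splitlines():
        match = RECORD_START.match(raw)
        if match:
            # New record: drop the quote marker and remember its timestamp.
            line = re.sub(r"^>\s?", "", raw)
            records.append([float(match.group(2)), line])
        elif records:
            # Continuation of a wrapped record: strip the quote marker and append.
            cont = re.sub(r"^>\s?", "", raw).strip()
            if cont:
                records[-1][1] += " " + cont
    # sorted() is stable, so records sharing a timestamp keep their pasted order.
    return [line for _, line in sorted(records, key=lambda r: r[0])]
```

Calling `reorder_log(open("panic.txt").read())` (the file name is only an example) would return the records starting with the register dump at 70.701096 and the `Call Trace:` line, and ending with the `Kernel panic` lines at 70.829.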
Comment 3 Alex Deucher 2022-03-02 14:06:14 UTC
Thanks.  Can you get the dmesg output from boot prior to the hang?
Comment 4 Philipp Riederer 2022-03-02 14:11:07 UTC
Created attachment 300517
dmesg from boot using kernel 5.15.12@94ba5b0fb52d6dbf1f200351876a839afb74aedd
Comment 5 Philipp Riederer 2022-03-02 14:12:18 UTC
Hey,

Thank you for working on this!

I added the dmesg as an attachment to the bug. Please note that this is from the working kernel (commit 94ba5b0fb52d6dbf1f200351876a839afb74aedd), as that is the one I have running now. If it helps, I can also provide the messages from a non-working kernel later.

Cheers,
Philipp
Comment 6 Philipp Riederer 2022-03-11 16:34:19 UTC
I can no longer reproduce this issue with 5.16.14.

Sorry for the noise.

Regards,
Philipp