Bug 206519

Summary: [amdgpu] kernel NULL pointer dereference on shutdown when CONFIG_DRM_AMD_DC_HDCP=y
Product: Drivers
Reporter: Shlomo (shlomo)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.5.1.arch1-1, 5.5.3-arch1-1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: shutdown screen photo
dmesg after boot
possible fix

Description Shlomo 2020-02-13 16:07:48 UTC
Created attachment 287353
shutdown screen photo

When I try to power off my machine, it shows the usual shutdown messages and the screens turn off, but the machine is still powered on. The virtual console shows a kernel NULL pointer dereference at address 0.

I run Arch Linux.

The bug occurs even if I never run X. I can turn on the machine and immediately try to shut it down, and the same bug still occurs.

This bug has occurred since I upgraded linux from 5.4.15.arch1-1 to 5.5.1.arch1-1. I now run linux 5.5.3.arch1-1 and the bug still exists.

My graphics card is a Gigabyte Radeon RX VEGA 56 GAMING OC 8G, connected to six monitors.

A photo of the screen at shutdown is attached. I think these are the relevant lines for this bug:

BUG: kernel NULL pointer dereference, address: 0 [...]
RIP: 0010:queue_work_on+0x17/0x40
Code: fd ff ff 44 89 e0 5d 41 5c c3 [...]
Call Trace:
 handle_hpd_rx_irq+0x26e/0x320 [amdgpu]
 ? _raw_spin_unlock_irq+0x1d/0x30
 dm_irq_work_func+0x49/0x60 [amdgpu]
 process_one_work+0x1e1/0x3d0
 [...]
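
One plausible reading of the trace above is that the interrupt handler queued work through a pointer to state that was never allocated, so the queueing call dereferenced address 0. The following is a minimal userspace sketch of that pattern and of a guard against it. It is not the amdgpu code and not the attached patch; every name in it (`fake_work`, `fake_feature_state`, `fake_queue_work`, `handle_irq_guarded`) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred work item. */
struct fake_work {
    int pending;
};

/* Hypothetical per-feature state; the pointer to it stays NULL
 * when the feature was never set up on this hardware. */
struct fake_feature_state {
    struct fake_work irq_work;
};

/* Marks the item pending, as a queueing call would. It dereferences
 * `work`, so calling it with a NULL-derived pointer would crash. */
static bool fake_queue_work(struct fake_work *work)
{
    if (work->pending)
        return false;   /* already queued */
    work->pending = 1;
    return true;
}

/* Guarded handler (assumption: a NULL state means "feature absent").
 * Skips queueing instead of touching memory at address 0. */
static bool handle_irq_guarded(struct fake_feature_state *state)
{
    if (state == NULL)
        return false;
    return fake_queue_work(&state->irq_work);
}
```

With this guard, `handle_irq_guarded(NULL)` returns false without dereferencing anything, while a call with initialized state queues the work once and reports false on a repeated call until the pending flag is cleared.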
Comment 1 Shlomo 2020-02-13 16:09:02 UTC
Created attachment 287355
dmesg after boot
Comment 2 Alex Deucher 2020-02-13 16:20:46 UTC
Can you bisect?
Comment 3 Shlomo 2020-02-14 11:54:39 UTC
The bug first occurs in Arch Linux 5.5.arch1-1, which set CONFIG_DRM_AMD_DC_HDCP=y [1].

Arch Linux 5.4.15.arch1-1 is good.
Arch Linux 5.4.15.arch1-1 with CONFIG_DRM_AMD_DC_HDCP=y set (and no other changes) is bad.

Arch Linux 5.5.arch1-1 (and later) is bad. (CONFIG_DRM_AMD_DC_HDCP=y is set)

Testing the most recent Arch Linux kernel shows the same:
Arch Linux 5.5.3.arch1 is bad.
Arch Linux 5.5.3.arch1 with CONFIG_DRM_AMD_DC_HDCP unset is good.

This means that this bug was triggered by changes to the config, not kernel changes, so I don't know if this is a regression or not.

[1] https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux&id=019514c4cdff26addfd49db8a78a857cb03994d9
Comment 4 Shlomo 2020-02-18 17:19:41 UTC
I bisected the bug.

The first bad commit is 96a3b32e67236f547cc8acd69d5a3cef125b2295 (drm/amd/display: only enable HDCP for DCN+) with ea268870d6f548d0661e896e9746673210c1fa79 (drm/amd/display: Add hdcp to Kconfig) cherry-picked on top of it.

(The previous commit da3fd7ac0bcf372cc57117bdfcd725cca7ef975a with ea268870d6f548d0661e896e9746673210c1fa79 cherry-picked on top of it is good.)

The call trace for this bug is the same as I posted above.
Comment 5 Alex Deucher 2020-02-18 18:25:55 UTC
Created attachment 287487
possible fix

I think this patch should fix it.
Comment 6 Shlomo 2020-02-19 08:32:21 UTC
Yes, this fixes the bug.

I applied your patch over linux v5.5, but I first had to modify it so it would apply:
-                       drm_connector_attach_content_protection_property(&aconnector->base, true);
+                       drm_connector_attach_content_protection_property(&aconnector->base, false);
Comment 7 Shlomo 2020-04-07 17:20:14 UTC
Confirmed fixed on Arch Linux 5.6.2-arch1-2. Thanks.