Bug 206519 - [amdgpu] kernel NULL pointer dereference on shutdown when CONFIG_DRM_AMD_DC_HDCP=y
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-13 16:07 UTC by Shlomo
Modified: 2020-04-07 17:20 UTC

See Also:
Kernel Version: 5.5.1.arch1-1, 5.5.3-arch1-1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
shutdown screen photo (1018.59 KB, image/jpeg)
2020-02-13 16:07 UTC, Shlomo
dmesg after boot (82.90 KB, text/plain)
2020-02-13 16:09 UTC, Shlomo
possible fix (2.34 KB, patch)
2020-02-18 18:25 UTC, Alex Deucher

Description Shlomo 2020-02-13 16:07:48 UTC
Created attachment 287353
shutdown screen photo

When I try to power off my machine, it shows the usual shutdown messages and the screens turn off, but the machine is still powered on. The virtual console shows a kernel NULL pointer dereference at address 0.

I run Arch Linux.

The bug occurs even if I never run X. I can turn on the machine and immediately try to shut it down, and the same bug still occurs.

This bug has occurred ever since I upgraded from linux 5.4.15.arch1-1 to 5.5.1.arch1-1. I now run linux 5.5.3.arch1-1 and the bug still exists.

My graphics card is a Gigabyte Radeon RX VEGA 56 GAMING OC 8G, connected to six monitors.

A photo of the screen at shutdown is attached. I think these are the relevant lines for this bug:

BUG: kernel NULL pointer dereference, address: 0 [...]
RIP: 0010:queue_work_on+0x17/0x40
Code: fd ff ff 44 89 e0 5d 41 5c c3 [...]
Call Trace:
 handle_hpd_rx_irq+0x26e/0x320 [amdgpu]
 ? _raw_spin_unlock_irq+0x1d/0x30
 dm_irq_work_func+0x49/0x60 [amdgpu]
 process_one_work+0x1e1/0x3d0
 [...]
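
For readers unfamiliar with the trace format: each frame reads `symbol+0xOFFSET/0xSIZE [module]`, where the offset is the position inside the function and the size is the function's total length in bytes. A leading `?` marks a frame the unwinder considers unreliable. The following is a minimal sketch of decoding one such frame; `struct trace_frame` and `parse_trace_frame` are hypothetical names chosen for illustration and are not kernel APIs.

```c
#include <stdio.h>
#include <string.h>

/* One decoded frame of the form "symbol+0xOFF/0xSIZE [module]".
 * Hypothetical helper type, not part of the kernel. */
struct trace_frame {
    char symbol[64];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* total size of the function in bytes */
    char module[32];        /* empty when no "[module]" suffix is present */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_trace_frame(const char *text, struct trace_frame *out)
{
    int matched;

    memset(out, 0, sizeof(*out));

    /* Skip leading blanks and the "?" marker used for unreliable frames. */
    while (*text == ' ' || *text == '?')
        text++;

    /* %lx accepts an optional "0x" prefix; the module part is optional. */
    matched = sscanf(text, "%63[^+]+%lx/%lx [%31[^]]]",
                     out->symbol, &out->offset, &out->size, out->module);
    return matched >= 3 ? 0 : -1;
}
```

Applied to the faulting frame `queue_work_on+0x17/0x40`, this yields offset 0x17 inside a 0x40-byte function with no module, meaning the fault happened in core kernel code called from the `[amdgpu]` frame below it.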
Comment 1 Shlomo 2020-02-13 16:09:02 UTC
Created attachment 287355
dmesg after boot
Comment 2 Alex Deucher 2020-02-13 16:20:46 UTC
Can you bisect?
Comment 3 Shlomo 2020-02-14 11:54:39 UTC
The bug first occurs in Arch Linux 5.5.arch1-1, which set CONFIG_DRM_AMD_DC_HDCP=y [1].

Arch Linux 5.4.15.arch1-1 is good.
Arch Linux 5.4.15.arch1-1 with CONFIG_DRM_AMD_DC_HDCP=y set (and no other changes) is bad.

Arch Linux 5.5.arch1-1 (and later) is bad. (CONFIG_DRM_AMD_DC_HDCP=y is set)

Testing the most recent Arch Linux kernel shows the same:
Arch Linux 5.5.3.arch1 is bad.
Arch Linux 5.5.3.arch1 with CONFIG_DRM_AMD_DC_HDCP unset is good.

This means that this bug was triggered by changes to the config, not kernel changes, so I don't know if this is a regression or not.

[1] https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux&id=019514c4cdff26addfd49db8a78a857cb03994d9
Comment 4 Shlomo 2020-02-18 17:19:41 UTC
I bisected the bug.

The first bad commit is 96a3b32e67236f547cc8acd69d5a3cef125b2295 (drm/amd/display: only enable HDCP for DCN+) with ea268870d6f548d0661e896e9746673210c1fa79 (drm/amd/display: Add hdcp to Kconfig) cherry-picked on top of it.

(The previous commit da3fd7ac0bcf372cc57117bdfcd725cca7ef975a with ea268870d6f548d0661e896e9746673210c1fa79 cherry-picked on top of it is good.)

The call trace for this bug is the same as I posted above.
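
One reading consistent with the bisect result and the trace, though not confirmed anywhere in this thread: if HDCP setup is skipped for some hardware, the interrupt handler may still try to queue HDCP work on a queue pointer that was never initialized, and `queue_work_on` then dereferences NULL. The user-space sketch below illustrates that general failure pattern and the usual guard. Every name in it (`struct display_dev`, `display_init`, `handle_cp_irq`) is a hypothetical stand-in, not the real amdgpu code and not the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real amdgpu types. */
struct work_queue {
    int pending;                  /* number of queued work items */
};

struct display_dev {
    struct work_queue storage;    /* backing storage for the queue */
    struct work_queue *hdcp_wq;   /* stays NULL when HDCP is not set up */
};

/* Assumption: HDCP state is created only for hardware that supports it. */
void display_init(struct display_dev *dev, bool supports_hdcp)
{
    dev->storage.pending = 0;
    dev->hdcp_wq = supports_hdcp ? &dev->storage : NULL;
}

/* Returns true if work was queued, false if the event was ignored.
 * Without the NULL check, the increment below would dereference NULL
 * on devices where display_init() skipped HDCP setup. */
bool handle_cp_irq(struct display_dev *dev)
{
    if (dev->hdcp_wq == NULL)
        return false;

    dev->hdcp_wq->pending++;
    return true;
}
```

Keying the check on the existence of the queue itself, rather than repeating a hardware-generation test at every call site, keeps the setup path and the interrupt path from drifting apart.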
Comment 5 Alex Deucher 2020-02-18 18:25:55 UTC
Created attachment 287487
possible fix

I think this patch should fix it.
Comment 6 Shlomo 2020-02-19 08:32:21 UTC
Yes, this fixes the bug.

I applied your patch over linux v5.5, but I first had to modify it so it would apply:
-                       drm_connector_attach_content_protection_property(&aconnector->base, true);
+                       drm_connector_attach_content_protection_property(&aconnector->base, false);
Comment 7 Shlomo 2020-04-07 17:20:14 UTC
Confirmed fixed on Arch Linux 5.6.2-arch1-2. Thanks.
