Bug 178421 - [regression] Radeon Oops on shutdown
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
Reported: 2016-10-19 16:30 UTC by Jouni Mettälä
Modified: 2016-11-17 15:34 UTC

See Also:
Kernel Version: 4.9-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
picture of panic (1.32 MB, image/jpeg)
2016-10-19 16:30 UTC, Jouni Mettälä
recent picture of panic (1.14 MB, image/jpeg)
2016-10-24 13:20 UTC, Jouni Mettälä

Description Jouni Mettälä 2016-10-19 16:30:55 UTC
Created attachment 241971
picture of panic

Between 4.8 and 4.9-rc1, shutdown stopped working without using the power button. With a custom kernel, the Caps Lock and Scroll Lock LEDs blinked. With 4.9.0-040900rc1-generic from the Ubuntu kernel PPA the LEDs did not blink, but the machine didn't power off either. Suspend worked, at least with the custom kernel.

I tried to bisect, with some uncertainty. Still, bisect said c0d5fb4d0d9224ccaad0475c9b58740873970e7e is the first bad commit. I tried git revert -n -m 1 c0d5fb4d0d9224ccaad0475c9b58740873970e7e, which gave an Oops on screen after a shutdown attempt. A picture is attached.
Comment 2 Kevin 2016-10-21 00:38:21 UTC
Will these fixes be added to kernel 4.8 in a future minor version, or will I have to wait until 4.9?

I can't shut down since updating to 4.8. Reboot is OK though.


radeon 7870
arch linux at 4.8.3-ck
Comment 3 Jouni Mettälä 2016-10-24 13:20:52 UTC
Created attachment 242511
recent picture of panic
Comment 4 Jouni Mettälä 2016-10-24 13:27:59 UTC
I still get an oops on 4.9-rc2. A picture is attached. It looks different from the already fixed bug, to me at least.

Kevin, you probably have a different bug. Reboot doesn't work for me. What is the last known good kernel for you?
Comment 5 Michel Dänzer 2016-10-25 03:15:34 UTC
(In reply to Jouni Mettälä from comment #0)
> I tried to bisect. There was some uncertainty. Still bisect said
> c0d5fb4d0d9224ccaad0475c9b58740873970e7e is the first bad commit.

That's a pure merge commit, so the problem can't really have started at that commit. In order to avoid getting an incorrect bisection result again:

* Manually apply the patches referenced in comment 1 for each commit where they're not applied yet
* Test every commit multiple times before marking it as "good"
* Only mark commits as "bad" which show exactly the same symptoms, otherwise "skip" (for commits which fail to shut down / reboot for other reasons) or "good"
Comment 6 Jouni Mettälä 2016-10-30 11:01:58 UTC
With patches referenced in comment 1, bisect pointed to 6b25e21fa6f26d0f0d45f161d169029411c84286
Merge tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux

I was still a bit unsure, but after removing these changes shutdown seems to work.

@@ -362,6 +361,17 @@ radeon_pci_remove(struct pci_dev *pdev)
 	drm_put_dev(dev);
 }
 
+static void
+radeon_pci_shutdown(struct pci_dev *pdev)
+{
+	/* if we are running in a VM, make sure the device
+	 * torn down properly on reboot/shutdown.
+	 * unfortunately we can't detect certain
+	 * hypervisors so just do this all the time.
+	 */
+	radeon_pci_remove(pdev);
+}
+
 static int radeon_pmops_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);


@@ -574,6 +588,7 @@ static struct pci_driver radeon_kms_pci_driver = {
 	.id_table = pciidlist,
 	.probe = radeon_pci_probe,
 	.remove = radeon_pci_remove,
+	.shutdown = radeon_pci_shutdown,
 	.driver.pm = &radeon_pm_ops,
 };

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/drivers/gpu/drm/radeon/radeon_drv.c?id=6b25e21fa6f26d0f0d45f161d169029411c84286

During the bisect, some kernels failed to boot with initrd and UUID. Without initrd they booted but still didn't shut down.
Comment 7 Alex Deucher 2016-10-30 16:11:47 UTC
As per comment 1, shutdown was working even with the commit that adds the shutdown callback earlier in the cycle.  Some other change appears to have regressed it.
Comment 8 Borislav Petkov 2016-10-30 23:49:52 UTC
What about this one:

https://lkml.kernel.org/r/3b57a593-776b-b008-a5f2-672b9343f18a@lwfinger.net

I don't know the code so I can't say whether testing the ->ddc_bus ptr is the proper fix...
Comment 9 Alex Deucher 2016-10-31 16:27:12 UTC
Yes, that should fix it.  All of the boards I tested with had a DDC bus on all connectors, so it never came up before.  This only triggers on really old boards with TV connectors, which don't have a DDC bus for those connectors.
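
Editorial illustration: the failure mode described above can be sketched in plain C as a teardown path that must tolerate a connector with no DDC bus. This is only a minimal sketch, not the patch linked in comment 8. The type and function names (i2c_bus_sketch, connector_sketch, connector_teardown_sketch) are hypothetical stand-ins and are not the radeon driver's real types or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the radeon driver's real types. */
struct i2c_bus_sketch {
	bool registered;	/* true while the bus is usable */
};

struct connector_sketch {
	const char *name;
	/* NULL for connectors without a DDC bus, e.g. TV-out on old boards */
	struct i2c_bus_sketch *ddc_bus;
};

/*
 * Tear down one connector.
 * Returns true if a DDC bus was released, false if the connector had none.
 * Testing the pointer first is what avoids a NULL dereference when the
 * teardown path runs for a connector that never had a DDC bus.
 */
static bool connector_teardown_sketch(struct connector_sketch *conn)
{
	assert(conn != NULL);

	if (conn->ddc_bus == NULL)
		return false;

	conn->ddc_bus->registered = false;
	conn->ddc_bus = NULL;
	return true;
}
```

In this sketch, calling connector_teardown_sketch() on a TV connector whose ddc_bus is NULL returns false and touches nothing, while a connector with a bus gets it released. Once the shutdown callback runs the remove path on every shutdown, a missing check of this kind would be hit on such boards every time.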
Comment 10 Borislav Petkov 2016-10-31 21:52:43 UTC
Yap, it does. Just did two suspend-to-disk runs.

Thanks.
Comment 11 Jouni Mettälä 2016-11-17 15:34:16 UTC
This bug is fixed between 4.9-rc4 and 4.9-rc5.
