Bug 113861

Summary: [radeon] Xorg fatal freeze upon startx
Product: Drivers
Reporter: centuryplague
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: high
CC: alexdeucher, martin.hamrle, szg00000
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.4.4, 4.1.18 LTS
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg for radeon xorg lockup upon startx

Description centuryplague 2016-03-07 06:53:52 UTC
Source of bug report:
https://bugs.archlinux.org/task/48468
https://bbs.archlinux.org/viewtopic.php?id=209576

When startx is used to start Xorg on an Arch system booted to the command line with the radeon module loaded, the Xorg process freezes completely and cannot be killed or terminated. The following message repeats:
"waiting for X server to begin accepting connections...
...
..."
etc.
(CTRL-C 500 times to stop the polling; the Xorg wrapper does not respond to kill -9)
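
A process that ignores kill -9 is often in uninterruptible sleep (state D), although the report does not confirm that this is the case here. A minimal sketch for checking it, assuming the standard /proc/<pid>/stat layout; the helper names and the sample PID in the usage note are hypothetical:

```python
def parse_proc_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST ')' rather than on whitespace.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def process_state(pid):
    """Read the state of a live process; raises OSError if it is gone."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_proc_stat_state(fh.read())
```

For example, process_state(1234) would return "D" for a task blocked in uninterruptible sleep, where 1234 stands in for the PID of the stuck Xorg process.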

This is happening on both 4.1.18 LTS and 4.4.4 (latter is vanilla + grsec). It is not happening on 4.4.3 from distribution (but pure vanilla not tried).

Does not occur if radeon is blacklisted and VESA used instead.

Two reports so far.
Mine is a switchable-graphics system:
[AMD/ATI] RS880M [Mobility Radeon HD 4225/4250]
[AMD/ATI] Whistler [Radeon HD 6630M/6650M/6750M/7670M/7690M]
Forum user:
[AMD/ATI] Mars [Radeon HD 8730M]
... (see link)

Forum user recorded a stack trace (from his link https://gist.github.com/Bernolt/ae162d69444f9937610d):

[<ffffffff813f2151>] rpm_resume+0x171/0x660
[<ffffffff813f267f>] __pm_runtime_resume+0x3f/0x60
[<ffffffffa06b05c4>] radeon_dp_detect+0x64/0x2e0 [radeon]
[<ffffffffa040c20a>] drm_helper_hpd_irq_event+0xaa/0x140 [drm_kms_helper]
[<ffffffffa0688bec>] radeon_resume_kms+0x22c/0x420 [radeon]
[<ffffffffa0686143>] radeon_pmops_runtime_resume+0x73/0xb0 [radeon]
[<ffffffff8130517f>] pci_pm_runtime_resume+0x7f/0xc0
[<ffffffff813df139>] vga_switcheroo_runtime_resume+0x39/0x40
[<ffffffff813f10a4>] __rpm_callback+0x34/0x90
[<ffffffff813f1128>] rpm_callback+0x28/0x90
[<ffffffff813f2439>] rpm_resume+0x459/0x660
[<ffffffff813f267f>] __pm_runtime_resume+0x3f/0x60
[<ffffffffa068b016>] radeon_driver_open_kms+0x36/0x1d0 [radeon]
[<ffffffffa035b7ef>] drm_open+0x1af/0x4c0 [drm]
[<ffffffffa0362689>] drm_stub_open+0xa9/0x120 [drm]
[<ffffffff811dc0e6>] chrdev_open+0xb6/0x1b0
[<ffffffff811d50a7>] do_dentry_open+0x227/0x330
[<ffffffff811d62e6>] vfs_open+0x56/0x60
[<ffffffff811e5bf4>] do_last.isra.11+0x344/0xf50
[<ffffffff811e6891>] path_openat+0x91/0x670
[<ffffffff811e82f9>] do_filp_open+0x49/0xd0
[<ffffffff811d66ed>] do_sys_open+0x14d/0x250
[<ffffffff811d680e>] SyS_open+0x1e/0x20
[<ffffffff8158256e>] system_call_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
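
For readers skimming the trace, here is a small sketch (not part of the original report) that parses frames in this format and lists functions that appear more than once. On the trace above it flags rpm_resume and __pm_runtime_resume: the runtime resume started from radeon_driver_open_kms is entered again through radeon_dp_detect. The regular expression and helper names are illustrative assumptions, and SAMPLE is an excerpt of the frames above.

```python
import re
from collections import Counter

# Matches lines such as "[<ffffffffa06b05c4>] radeon_dp_detect+0x64/0x2e0 [radeon]".
# The trailing "[module]" part is optional (core kernel symbols have none).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace):
    """Return a list of (function, module_or_None) tuples, innermost frame first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("func"), match.group("module")))
    return frames


def repeated_functions(frames):
    """Return the sorted names of functions that occur more than once."""
    counts = Counter(func for func, _ in frames)
    return sorted(func for func, n in counts.items() if n > 1)


SAMPLE = """\
[<ffffffff813f2151>] rpm_resume+0x171/0x660
[<ffffffff813f267f>] __pm_runtime_resume+0x3f/0x60
[<ffffffffa06b05c4>] radeon_dp_detect+0x64/0x2e0 [radeon]
[<ffffffffa040c20a>] drm_helper_hpd_irq_event+0xaa/0x140 [drm_kms_helper]
[<ffffffffa0688bec>] radeon_resume_kms+0x22c/0x420 [radeon]
[<ffffffff813f2439>] rpm_resume+0x459/0x660
[<ffffffff813f267f>] __pm_runtime_resume+0x3f/0x60
[<ffffffffa068b016>] radeon_driver_open_kms+0x36/0x1d0 [radeon]
"""

if __name__ == "__main__":
    print(repeated_functions(parse_frames(SAMPLE)))
```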
Comment 1 centuryplague 2016-03-07 06:58:50 UTC
Note that NO Xorg.0.log is produced; it is a full freeze.
Comment 2 centuryplague 2016-03-07 07:16:28 UTC
"It is not happening on 4.4.3 from distribution (but pure vanilla not tried)."
(But note that 4.4.2, which was vanilla + grsec, was also tried and had no problems, so a change in 4.4.4 specifically is likely what triggers this bug.)
Comment 3 Alex Deucher 2016-03-07 16:30:37 UTC
Please attach your dmesg output and xorg log.
Comment 4 centuryplague 2016-03-07 17:13:52 UTC
Created attachment 207961 [details]
dmesg for radeon xorg lockup upon startx
Comment 5 centuryplague 2016-03-07 17:16:21 UTC
I've attached my dmesg from a run under 4.1.18 LTS. This was a boot as a regular user (the user doesn't matter; it also happens with root). The first command was "startx", which polls endlessly; I then pressed CTRL+C 500 times and saved the dmesg output.

There is no xorg log produced in any location.
Comment 6 Michel Dänzer 2016-03-08 03:56:38 UTC
Does radeon.runpm=0 on the kernel command line avoid the problem?

Can you bisect?
Comment 7 Alex Deucher 2016-03-08 05:45:47 UTC
Probably the same issue as reported here:
https://lists.freedesktop.org/archives/dri-devel/2016-March/102379.html
Does reverting dbb17a21c131eca94eb31136eee9a7fe5aff00d9 fix it?
Comment 8 centuryplague 2016-03-08 06:21:40 UTC
(In reply to Michel Dänzer from comment #6)
> Does radeon.runpm=0 on the kernel command line avoid the problem?

Yes, I tried moments ago and radeon.runpm=0 allows X to start! I had not tried that.
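
To confirm that the option actually reached the kernel, a minimal sketch that looks for radeon.runpm on a kernel command line and, when run on the affected machine, also prints the value the loaded module reports. It assumes /proc/cmdline and /sys/module/radeon/parameters/runpm are readable; the function names are illustrative only.

```python
def module_param_from_cmdline(cmdline, module, param):
    """Return the last value given for module.param on a kernel command line, or None."""
    key = f"{module}.{param}="
    value = None
    for token in cmdline.split():
        if token.startswith(key):
            value = token[len(key):]  # later occurrences override earlier ones
    return value


def read_text(path):
    """Return the stripped contents of path, or None if it cannot be read."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text("/proc/cmdline") or ""
    print("cmdline runpm:", module_param_from_cmdline(cmdline, "radeon", "runpm"))
    # Prints None when the radeon module is not loaded.
    print("sysfs runpm:", read_text("/sys/module/radeon/parameters/runpm"))
```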

(In reply to Alex Deucher from comment #7)
> Probably the same issue as reported here:
> https://lists.freedesktop.org/archives/dri-devel/2016-March/102379.html
> Does reverting dbb17a21c131eca94eb31136eee9a7fe5aff00d9 fix it?

That sounds exactly the same. I can't compile tonight and may not get it to work for a couple of days (non-standard setup). I'll mention it to the other guy.
Comment 9 centuryplague 2016-03-08 19:03:34 UTC
(In reply to Alex Deucher from comment #7)
> Probably the same issue as reported here:
> https://lists.freedesktop.org/archives/dri-devel/2016-March/102379.html
> Does reverting dbb17a21c131eca94eb31136eee9a7fe5aff00d9 fix it?

Went a bit out of my way to test this faster. I can now confirm that reverting this commit on a 4.4.4-based kernel (arch linux-grsec 4.4.4.201603032158-1-grsec) allows startx to start again on my (non-standard) system. I don't know about the other guy.

Thanks, I will report this on the Arch bug. Will this be in 4.4.5?
Comment 12 martin.hamrle 2016-03-13 17:34:56 UTC
I had the same issue, and the latest torvalds master (which contains the revert of that patch) fixed my problem.
Comment 13 centuryplague 2016-03-20 18:44:29 UTC
This is now fixed in 4.4.6 (though not 4.1.20 yet), thank you.