Bug 88481

Summary: Kernel oops leaves system without graphical output [radeon]
Product: Drivers
Component: Video(DRI - non Intel)
Reporter: Daniel Otero (daniel.otero)
Assignee: drivers_video-dri
CC: adamw, alexdeucher, amerikapsn, cfroemmel
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: high
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: v3.17.3
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
    dmesg output
    kernel config
    complete dmesg when X starts and HDMI (with audio) is connected, dpm enabled
    lspci -nn

Description Daniel Otero 2014-11-19 01:33:32 UTC
Created attachment 158101 [details]
dmesg output

When trying to start the X server through the login manager, the radeon module seems to crash, leaving the system without graphical output (but otherwise working). This happens every time, as soon as the PC boots.

Going back to v3.17.2 fixes the issue.

This seems to be the relevant part of dmesg:

[    2.640374] BUG: unable to handle kernel paging request at ffffec2000000900
[    2.640419] IP: [<ffffffff811ac4e6>] kfree+0x56/0x1a0
[    2.640449] PGD 0 
[    2.640462] Oops: 0000 [#1] PREEMPT SMP 
[    2.640488] Modules linked in: [...]
[    2.641120] CPU: 0 PID: 287 Comm: Xorg.bin Not tainted 3.17.3-1-ARCH #1
[    2.641152] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q77M-D2H, BIOS F2 12/20/2012
[    2.641197] task: ffff8800dc131420 ti: ffff8800dc168000 task.ti: ffff8800dc168000
[    2.641232] RIP: 0010:[<ffffffff811ac4e6>]  [<ffffffff811ac4e6>] kfree+0x56/0x1a0
[    2.641269] RSP: 0018:ffff8800dc16ba30  EFLAGS: 00010286
[    2.641294] RAX: 0000022000000900 RBX: 0000100000024414 RCX: 0000000000010005
[    2.641328] RDX: 000077ff80000000 RSI: 0000000000000005 RDI: 0000100000024414
[    2.641361] RBP: ffff8800dc16ba48 R08: 0000000000000005 R09: ffffec2000000900
[    2.641393] R10: 0000000000000010 R11: 0000000000000000 R12: ffff8800dd430800
[    2.641426] R13: ffffffffa0736e8c R14: 00000000000120f0 R15: 0000000000001800
[    2.641460] FS:  00007f3efe93e8c0(0000) GS:ffff88021dc00000(0000) knlGS:0000000000000000
[    2.641497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.641524] CR2: ffffec2000000900 CR3: 000000020e2e2000 CR4: 00000000001407f0
[    2.641556] Stack:
[    2.641567]  ffff8800dd9d4000 ffff8800dd430800 ffff8800dd9d4000 ffff8800dc16bb30
[    2.641607]  ffffffffa0736e8c 000000000000004c 000120480001212c ffff880000000008
[    2.641648]  0000ffff00012044 000000000001212c 0000000000012048 0000000000012044
[    2.641688] Call Trace:
[    2.641721]  [<ffffffffa0736e8c>] evergreen_hdmi_setmode+0xd8c/0x1970 [radeon]
[    2.641763]  [<ffffffffa034c8d4>] ? drm_detect_hdmi_monitor+0x74/0xc0 [drm]
[    2.641810]  [<ffffffffa073f388>] radeon_atom_encoder_mode_set+0x178/0x3c0 [radeon]
[    2.641849]  [<ffffffffa04a7986>] drm_crtc_helper_set_mode+0x356/0x530 [drm_kms_helper]
[    2.641897]  [<ffffffffa06d989c>] radeon_property_change_mode.isra.1+0x3c/0x40 [radeon]
[    2.641943]  [<ffffffffa06d9a4e>] radeon_connector_set_property+0x1ae/0x3f0 [radeon]
[    2.641986]  [<ffffffffa03494e2>] drm_mode_obj_set_property_ioctl+0x1b2/0x3a0 [drm]
[    2.642027]  [<ffffffffa034970f>] drm_mode_connector_property_set_ioctl+0x3f/0x60 [drm]
[    2.642069]  [<ffffffffa0339fef>] drm_ioctl+0x1df/0x680 [drm]
[    2.642100]  [<ffffffff8105e9ac>] ? __do_page_fault+0x2ec/0x600
[    2.642134]  [<ffffffffa06b204c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[    2.642166]  [<ffffffff811da3c0>] do_vfs_ioctl+0x2d0/0x4b0
[    2.642194]  [<ffffffff811c9f31>] ? __sb_end_write+0x31/0x60
[    2.642222]  [<ffffffff811da621>] SyS_ioctl+0x81/0xa0
[    2.642249]  [<ffffffff8153d8e9>] system_call_fastpath+0x16/0x1b
[    2.642277] Code: 00 00 00 80 ff 77 00 00 49 b9 00 00 00 00 00 ea ff ff 48 01 d8 48 0f 42 15 38 bb 66 00 48 01 d0 48 c1 e8 0c 48 c1 e0 06 49 01 c1 <49> 8b 01 f6 c4 80 0f 85 0e 01 00 00 49 8b 01 a8 80 0f 84 83 00 
[    2.644387] RIP  [<ffffffff811ac4e6>] kfree+0x56/0x1a0
[    2.646397]  RSP <ffff8800dc16ba30>
[    2.648359] CR2: ffffec2000000900
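
The faulting address can be cross-checked against the register dump. The sketch below (Python, with a hypothetical helper name) redoes the address arithmetic that kfree performs to locate the struct page for its argument. It assumes the x86-64 layout constants of kernels from this period: direct map at 0xffff880000000000, vmemmap at 0xffffea0000000000, 4 KiB pages and a 64-byte struct page. None of these are stated in the report; they are inferred from the Code: bytes and the RDX value above. Feeding it RDI from the dump reproduces CR2, which suggests kfree was handed a value that is not a valid kernel pointer.

```python
MASK64 = (1 << 64) - 1

# Assumed layout constants (inferred from the oops, not stated in the report).
START_KERNEL_MAP = 0xFFFFFFFF80000000   # base of the kernel text mapping
PAGE_OFFSET = 0xFFFF880000000000        # base of the direct mapping
VMEMMAP_START = 0xFFFFEA0000000000      # base of the struct page array
PAGE_SHIFT = 12                         # 4 KiB pages
STRUCT_PAGE_SHIFT = 6                   # 64-byte struct page


def struct_page_address(ptr, phys_base=0):
    """Address of the struct page that would describe virtual address ptr."""
    y = (ptr - START_KERNEL_MAP) & MASK64
    if ptr >= START_KERNEL_MAP:
        # Kernel text mapping: physical = offset + phys_base.
        phys = (y + phys_base) & MASK64
    else:
        # Treated as a direct-map address: physical = ptr - PAGE_OFFSET.
        phys = (y + START_KERNEL_MAP - PAGE_OFFSET) & MASK64
    pfn = phys >> PAGE_SHIFT
    return (VMEMMAP_START + (pfn << STRUCT_PAGE_SHIFT)) & MASK64


rdi = 0x0000100000024414   # argument passed to kfree (RDI in the dump)
cr2 = 0xFFFFEC2000000900   # faulting address (CR2 in the dump)

print(hex(struct_page_address(rdi)))
print(struct_page_address(rdi) == cr2)
```

The intermediate value `pfn << STRUCT_PAGE_SHIFT` for this input is 0x22000000900, which matches RAX in the dump.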
Comment 1 Christopher Frömmel 2014-11-19 03:40:12 UTC
I can confirm this.
My laptop can boot 3.17.3 with 2 additional monitors connected (DVI + HDMI), and all 3 screens work.
But when X starts, the kernel oopses if HDMI is connected. I can then reboot via ssh.
If HDMI is connected AFTER X has been started, the system enters a hard lockup state.

Will add my current .config and dmesg part.
Comment 2 Christopher Frömmel 2014-11-19 03:42:38 UTC
Created attachment 158131 [details]
kernel config
Comment 3 Christopher Frömmel 2014-11-19 03:48:04 UTC
Created attachment 158141 [details]
complete dmesg when X starts and HDMI (with audio) is connected, dpm enabled
Comment 4 Christopher Frömmel 2014-11-19 03:50:59 UTC
Created attachment 158151 [details]
lspci -nn
Comment 5 Michel Dänzer 2014-11-19 07:34:49 UTC
Does reverting commit ffe0245532b98efc4bc0e06f29c51d3f0e471152 help? If not, can you bisect?
Comment 6 Daniel Otero 2014-11-19 14:38:30 UTC
Reverting commit ffe0245532b98efc4bc0e06f29c51d3f0e471152 on top of the tag v3.17.3 does resolve the issue.

Thanks for your time.
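
For anyone who wants to repeat this test, here is a minimal sketch of the revert step, written in Python as a dry run that only prints the git commands. It assumes a checkout of a kernel tree that contains both the v3.17.3 tag and the commit; the branch name and helper name are hypothetical, and configuring, building and installing the kernel are left out.

```python
import shlex

# Commit named in comment 5 as the suspect.
SUSPECT = "ffe0245532b98efc4bc0e06f29c51d3f0e471152"


def revert_test_commands(tag, commit, branch="revert-test"):
    """Return the git invocations that revert `commit` on top of `tag`."""
    return [
        # Create a scratch branch starting at the release tag.
        ["git", "checkout", "-b", branch, tag],
        # Revert the suspect commit, keeping the default commit message.
        ["git", "revert", "--no-edit", commit],
    ]


# Dry run: print the commands instead of executing them.
for argv in revert_test_commands("v3.17.3", SUSPECT):
    print(shlex.join(argv))
```

To actually run the commands, each list could be passed to `subprocess.run(argv, check=True)` from inside the kernel tree.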
Comment 7 Alex Deucher 2014-11-19 16:46:30 UTC
Should be fixed by http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=83d04c39f9048807a8500e575ae3f1718a3f45bb
which needs to go to stable.
Comment 8 Daniel Otero 2014-11-19 21:06:56 UTC
Applying the patch from the commit Alex linked solved the issue.
Comment 9 Christopher Frömmel 2014-11-20 01:08:31 UTC
Thanks to everyone involved. Report and fix within 20 hours. Awesome!
Comment 10 Adam Williamson 2014-11-22 00:54:32 UTC
I'm not sure it's right to mark it as fixed until the patch goes to 3.17 - it's fixed in mainline, sure, but not stable. The fix seems to have missed the boat for 3.17.4. (I got burned by this with OpenELEC 5.0 beta 3, which happened to get kernel 3.17.3, and is commonly used on systems with Radeon graphics adapters and HDMI displays as it's an HTPC appliance distro...)
Comment 11 Daniel Otero 2014-11-22 13:35:22 UTC
The bug is pretty nasty and the fix is quite trivial, so I don't know why it wasn't included in the 3.17.4 release. I assume it's because it didn't seem so urgent, but it actually leaves my system unusable.

I don't know the procedure for getting a fix into the current stable branch, but if there is going to be a 3.17.5 release, this fix should be in it. Should I mail Greg KH about the bug?

Anyway, I don't think leaving it open will make any difference.
Comment 12 Alex Deucher 2014-11-27 01:33:50 UTC
The patch was sent to stable last week.  It should show up any time now.
Comment 13 Cherry Ontop 2014-12-12 10:58:42 UTC
Does this bug still persist in 3.17.6-1 or is it patched there?