Bug 54311

Summary: computer hangs during resume from S3 in i915 drm
Product: Drivers
Reporter: Thomas Meyer (thomas.mey)
Component: Video(DRI - Intel)
Assignee: intel-gfx-bugs (intel-gfx-bugs)
Status: RESOLVED CODE_FIX
Severity: normal
CC: aaron.lu, daniel, intel-gfx-bugs, thomas
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.8.0
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 56331

Description Thomas Meyer 2013-02-23 10:56:46 UTC
I put my machine into suspend-to-RAM in the evening. The next day I resumed from RAM and the computer hung (it was frozen).

The strange thing is that the 3.8.0 kernel seems to resume fine after short sleep periods (a few seconds).

I'll switch back to 3.7.8, which doesn't exhibit this behaviour.
Comment 1 Aaron Lu 2013-02-25 05:23:28 UTC
(In reply to comment #0)
> I put my machine into suspend-to-RAM in the evening. The next day I resumed
> from RAM and the computer hung (it was frozen).

Did you see any output when it hangs?
You can try booting into console mode and adding nomodeset no_console_suspend to the kernel command line to see what happened.

> 
> The strange thing is that the 3.8.0 kernel seems to resume fine after short
> sleep periods (a few seconds).
> 
> I'll switch back to 3.7.8, which doesn't exhibit this behaviour.

Using git-bisect to find the offending commit would be best :-)
Thanks.
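
For reference, such a bisect could be started roughly as follows. This is only a sketch: the function name, the kernel checkout path, and the default tags are assumptions, in particular that mainline v3.7 behaves like the 3.7.8 release that resumes fine.

```shell
# Sketch: start a bisect between a presumed-good and a known-bad kernel tag.
# Assumptions (not from the report): a kernel git checkout at $1, and that
# v3.7 is as good as the 3.7.8 stable release.
start_s3_bisect() (
    repo=$1
    good=${2:-v3.7}
    bad=${3:-v3.8}
    # Runs in a subshell, so the caller's working directory is untouched.
    cd "$repo" || exit 1
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
)
```

After each step, build and boot the kernel that git checks out, run several suspend/resume cycles, and mark the result with `git bisect good` or `git bisect bad`. Because the hang does not show up on every resume, a kernel should survive a number of cycles before it is marked good.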
Comment 2 Thomas Meyer 2013-03-03 12:28:42 UTC
Hi,

Some more info:

The crash occurs non-deterministically and does not seem to depend on the duration of the suspend-to-RAM (S2R), contrary to what the bug's title suggests.

The kernel wakes up correctly, and I can ssh into the machine. The problem seems to be the graphics subsystem: the X server seems to hang. This is the i915 driver.


The problem still occurs in 3.8.1.

I tried to attach gdb to the Xorg process, but that got stuck somehow!
Comment 3 Thomas Meyer 2013-03-03 12:34:58 UTC
$ su -c 'echo w > /proc/sysrq-trigger'

[...]
[81535.305064] SysRq : Show Blocked State
[81535.305072]   task                        PC stack   pid father
[81535.305096] Xorg            D 7fffffffffffffff     0   552      1 0x00400084
[81535.305102]  ffff88012f6bfad8 0000000000000086 ffff88012a085f40 ffff88012f6bffd8
[81535.305106]  ffff88012f6bffd8 ffff88012f6bffd8 ffff88010f9d8000 ffff88012a085f40
[81535.305109]  0000000100000000 0000000000000304 003fffffffc00c2a 0000000000000000
[81535.305113] Call Trace:
[81535.305123]  [<ffffffff814f9163>] schedule+0x23/0x60
[81535.305127]  [<ffffffff814f7e35>] schedule_timeout+0x105/0x140
[81535.305131]  [<ffffffff814f8ffa>] wait_for_common+0xaa/0x140
[81535.305137]  [<ffffffff810553e0>] ? try_to_wake_up+0x80/0x80
[81535.305141]  [<ffffffff814f9138>] wait_for_completion+0x18/0x20
[81535.305146]  [<ffffffff810478ea>] flush_workqueue+0x10a/0x390
[81535.305152]  [<ffffffff81317223>] intel_crtc_page_flip+0x133/0x350
[81535.305157]  [<ffffffff812e49a5>] drm_mode_page_flip_ioctl+0x235/0x2a0
[81535.305161]  [<ffffffff812df161>] ? drm_mode_object_find+0x61/0x90
[81535.305165]  [<ffffffff812defcc>] ? drm_crtc_convert_to_umode+0xcc/0x150
[81535.305170]  [<ffffffff812d3bd3>] drm_ioctl+0x4c3/0x570
[81535.305174]  [<ffffffff812e4770>] ? drm_mode_gamma_get_ioctl+0x120/0x120
[81535.305180]  [<ffffffff810f7fca>] do_vfs_ioctl+0x8a/0x560
[81535.305186]  [<ffffffff811b9635>] ? inode_has_perm.isra.40.constprop.70+0x25/0x30
[81535.305190]  [<ffffffff811babaf>] ? file_has_perm+0x8f/0xa0
[81535.305190]  [<ffffffff810f8531>] sys_ioctl+0x91/0xb0
[81535.305190]  [<ffffffff814ffa50>] system_call_fastpath+0x16/0x1b
[81535.305190] kworker/u:52    D ffffffff8150b740     0  4107      2 0x00000080
[81535.305190]  ffff88012b1d7d28 0000000000000046 ffff8800af349900 ffff88012b1d7fd8
[81535.305190]  ffff88012b1d7fd8 ffff88012b1d7fd8 ffffffff816d4460 ffff8800af349900
[81535.305190]  ffff88012b1d7d68 ffffffff814f8bfd ffff8800af349900 ffff88013b21bb30
[81535.305190] Call Trace:
[81535.305190]  [<ffffffff814f8bfd>] ? __schedule+0x22d/0x500
[81535.305190]  [<ffffffff814f9163>] schedule+0x23/0x60
[81535.305190]  [<ffffffff814f92c9>] schedule_preempt_disabled+0x9/0x10
[81535.305190]  [<ffffffff814f843d>] __mutex_lock_slowpath+0x5d/0x90
[81535.305190]  [<ffffffff814f817d>] mutex_lock+0x1d/0x30
[81535.305190]  [<ffffffff812f3348>] i915_hotplug_work_func+0x28/0xa0
[81535.305190]  [<ffffffff812f3320>] ? i915_error_work_func+0x100/0x100
[81535.305190]  [<ffffffff810471cd>] process_one_work+0x11d/0x420
[81535.305190]  [<ffffffff81048355>] worker_thread+0x135/0x3d0
[81535.305190]  [<ffffffff81048220>] ? manage_workers+0x240/0x240
[81535.305190]  [<ffffffff8104c94a>] kthread+0xba/0xc0
[81535.305190]  [<ffffffff8104c890>] ? kthread_create_on_node+0x110/0x110
[81535.305190]  [<ffffffff814ff9aa>] ret_from_fork+0x7a/0xb0
[81535.305190]  [<ffffffff8104c890>] ? kthread_create_on_node+0x110/0x110
[81535.305190] Xorg            D ffff88009cc2f200     0  4757   4754 0x00400084
[81535.305190]  ffff8800a579dd68 0000000000000082 ffff88009cc2f200 ffff8800a579dfd8
[81535.305190]  ffff8800a579dfd8 ffff8800a579dfd8 ffff88013b08cc80 ffff88009cc2f200
[81535.305190]  ffff8800a579ddd0 ffff88013b21b800 00000000fffffff2 ffff88013b21bb30
[81535.305190] Call Trace:
[81535.305190]  [<ffffffff814f9163>] schedule+0x23/0x60
[81535.305190]  [<ffffffff814f92c9>] schedule_preempt_disabled+0x9/0x10
[81535.305190]  [<ffffffff814f843d>] __mutex_lock_slowpath+0x5d/0x90
[81535.305190]  [<ffffffff814f817d>] mutex_lock+0x1d/0x30
[81535.305190]  [<ffffffff812e39da>] drm_fb_release+0x2a/0x80
[81535.305190]  [<ffffffff812d4738>] drm_release+0x568/0x600
[81535.305190]  [<ffffffff810e9027>] __fput+0xe7/0x220
[81535.305190]  [<ffffffff810e91f9>] ____fput+0x9/0x10
[81535.305190]  [<ffffffff81049c57>] task_work_run+0x77/0xc0
[81535.305190]  [<ffffffff81002846>] do_notify_resume+0x56/0x80
[81535.305190]  [<ffffffff814ffcd4>] int_signal+0x12/0x17
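
To pick out where each blocked task is waiting, the output can be condensed with a small awk filter. This is a sketch only: the function name is made up, and the list of "generic" frames to skip (scheduler, mutex, and wait helpers) is an assumption chosen to fit this trace.

```shell
# Summarise "Show Blocked State" output: one line per blocked task with
# the first stack frame that is not generic scheduler/locking code.
blocked_summary() {
    awk '
    {
        sub(/^\[[ 0-9.]+\] ?/, "")      # drop the dmesg timestamp
    }
    # Task header, e.g. "Xorg  D 7fffffffffffffff  0  552  1 0x00400084"
    $2 == "D" && NF >= 7 {
        task = $1; pid = $5; found = 0
        next
    }
    # Stack frame, e.g. "[<ffffffff814f9163>] schedule+0x23/0x60"
    task != "" && !found && /\[<[0-9a-f]+>\]/ {
        if ($2 == "?") next             # skip unreliable frames
        fn = $2
        sub(/\+.*/, "", fn)             # strip the +offset/size suffix
        if (fn ~ /^(__)?schedule|^mutex_lock|^__mutex_lock|^wait_for_/) next
        printf "%s (pid %s) blocked in %s\n", task, pid, fn
        found = 1
    }'
}
```

Used as `dmesg | blocked_summary`. On the trace above it reports Xorg (pid 552) in flush_workqueue, kworker/u:52 (pid 4107) in i915_hotplug_work_func, and Xorg (pid 4757) in drm_fb_release.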
Comment 4 Aaron Lu 2013-03-04 01:11:45 UTC
Hi Thomas,

Attaching the full dmesg output after you ssh into the system would be helpful, thanks.
Comment 5 Aaron Lu 2013-03-04 01:13:00 UTC
And I'll move this bug to drivers/i915.
Comment 6 Jani Nikula 2013-03-04 10:19:40 UTC
(In reply to comment #1)
> Using git-bisect to find the offending commit would be best :-)
> Thanks.

Seconded.

(In reply to comment #4)
> Attaching the full dmesg output after you ssh into the system would be helpful, thanks.

Please do this with the drm.debug=0xe module parameter set.
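
The parameter can be given on the kernel command line as `drm.debug=0xe`, or set at runtime through sysfs. A minimal sketch of the runtime variant follows; the function name is hypothetical, and the optional second argument exists only so the function can be tried against a scratch file.

```shell
# Sketch: raise DRM debug verbosity at runtime.
# /sys/module/drm/parameters/debug is the drm module's "debug" parameter;
# writing to it requires root.
set_drm_debug() {
    value=${1:-0xe}
    param=${2:-/sys/module/drm/parameters/debug}
    printf '%s\n' "$value" > "$param"
}
```

Run `set_drm_debug 0xe` as root, reproduce the hang, then save the log over ssh with `dmesg > dmesg.txt` and attach that file.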
Comment 7 Daniel Vetter 2013-03-04 10:30:42 UTC
Hm, smells like a deadlock on the mode_config lock. Can you please re-hang your machine with lockdep enabled too? That should spit out all current lock holders.

Also, please retest with the latest drm-intel-nightly from http://cgit.freedesktop.org/~danvet/drm-intel; we've fixed a bunch of bugs in that area recently. Alternatively, test a 3.9 kernel, since all of those patches are now merged upstream.
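
Whether a given kernel was built with lockdep can be checked from its config file. The following is a sketch under assumptions: the function name is made up, and the default config path follows a common distribution convention that may not apply to a self-built kernel.

```shell
# Sketch: succeed if a kernel config enables lockdep-based lock checking.
# CONFIG_PROVE_LOCKING selects CONFIG_LOCKDEP in mainline kernels.
has_lockdep() {
    config=${1:-/boot/config-$(uname -r)}
    grep -Eq '^CONFIG_(LOCKDEP|PROVE_LOCKING)=y' "$config"
}
```

With lockdep enabled, `echo d > /proc/sysrq-trigger` asks the kernel to print all locks that are currently held, which can be captured from dmesg while the X server is hung.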
Comment 8 Thomas Meyer 2013-03-06 21:20:34 UTC
Hi, hmm. Strange: with DEBUG_LOCKDEP enabled, the hang no longer occurs... I'll update you if I can catch the hang again with this debug option enabled.
Comment 9 Thomas Meyer 2013-03-28 11:28:22 UTC
I'm using 3.9.0-rc4+ right now. The 3.9-rcX kernels seem to be okay: I haven't encountered the hang in the last few weeks with the current development kernel. I will stay on 3.9 for now. Feel free to close this bug.
Comment 10 Daniel Vetter 2013-03-28 11:33:03 UTC
OK, sounds like some piece of duct tape in 3.9 helps. Thanks for reporting this issue, and please reopen if it pops up again.