Bug 93711
| Summary: | BUG in i915_gem.c freezed the console | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Martin Ziegler (ziegler) |
| Component: | Video(DRI - Intel) | Assignee: | intel-gfx-bugs (intel-gfx-bugs) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | high | CC: | igor.raits, intel-gfx-bugs, nickross, xry111 |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 4.0.0-rc1 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | from the syslog, from the syslog (2), from the syslog, Linus' patch, my patch, my patch (2), damien's patch, Daniel's patch, Josh's patch | | |

Created attachment 168501 [details]
from the syslog (2)
The bug is still there after

commit ae1aa797e0ace9bbce055e31de1f641e422a082a
Merge: a015d33 21689a4
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat Feb 28 10:36:48 2015 -0800

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

I attach the syslog (2).

Regards
Martin

Created attachment 168531 [details]
from the syslog
I hit this bug too. I've attached my syslog, which looks similar to Martin's.

Linus has fixed the bug. I think the patch will be merged into the kernel tree soon. In fact the bug is not in i915_gem.c but in intel_atomic_plane.c. See <https://people.freedesktop.org/patch/43712/>.

Created attachment 168621 [details]
Linus' patch
My bad. After actually testing, I found that none of the patches on freedesktop.org solves the problem. It still occurs in 4.0.0-rc2.

I confirm that the bug is still in 4.0.0-rc2.

After some debugging I found the problem. In drivers/gpu/drm/i915/intel_display.c:9733, intel_crtc_page_flip, the routine does the following:

1) Adds a work item to the workqueue of the DRM device. This work will unpin the old framebuffer object (crtc->primary->fb).
2) Tries to lock the mutex of the DRM device.
3) Assigns crtc->primary->fb to another framebuffer object (the argument fb).
4) Pins the newly assigned crtc->primary->fb.
5) Unlocks the mutex of the DRM device.

Unfortunately, in step 2, if the mutex is already locked, the routine sleeps. The kernel may then run the work created in step 1, which may drop the pin_count of the old framebuffer object to zero. However, since step 3 has not run yet, crtc->primary->fb still points to the old framebuffer. If we are unlucky, the routines that switch from X to the console run in this "intermediate" state and try to unpin the old framebuffer again. Then a kernel BUG occurs.

I am trying to move step 2 before step 1 to solve the problem. However, I am afraid of deadlocks, so I will analyze the kernel code further.

My previous comment seems to be incorrect, since I am not a professional display driver developer :( But now I have found and fixed the bug: in intel_crtc_page_flip, we changed crtc->primary->fb but forgot to change crtc->primary->state->fb. The fix is simple: call drm_atomic_set_fb_for_plane to update crtc->primary->state->fb so that it stays the same as crtc->primary->fb.

To the administrator:
(1) Mark this as FIXED.
(2) Fix it in the kernel tree.

Created attachment 169661 [details]
my patch
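
For readers following along, the mismatch described above (the flip updates plane->fb while plane->state->fb keeps pointing at the old framebuffer) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every toy_* type and function is a hypothetical stand-in, not i915 code and not the attached patch, and the idea that the console-switch path unpins whatever state->fb points at is a simplification made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; NOT the real DRM types. */
struct toy_fb {
    int pin_count;               /* times the buffer is pinned for scanout */
};

struct toy_plane_state {
    struct toy_fb *fb;           /* models plane->state->fb */
};

struct toy_plane {
    struct toy_fb *fb;           /* models plane->fb */
    struct toy_plane_state state;
};

static void toy_pin(struct toy_fb *fb)
{
    fb->pin_count++;
}

/* Returns -1 on an unbalanced unpin (where the kernel would hit a BUG). */
static int toy_unpin(struct toy_fb *fb)
{
    if (fb->pin_count == 0)
        return -1;
    fb->pin_count--;
    return 0;
}

/*
 * Models a page flip. With sync_state == 0 only plane->fb is updated
 * (the behaviour described as buggy); with sync_state != 0 the state
 * pointer is updated as well (the behaviour described as the fix).
 */
static void toy_page_flip(struct toy_plane *plane, struct toy_fb *new_fb,
                          int sync_state)
{
    struct toy_fb *old_fb = plane->fb;

    assert(new_fb != NULL);
    toy_pin(new_fb);
    plane->fb = new_fb;
    if (sync_state)
        plane->state.fb = new_fb;
    toy_unpin(old_fb);           /* the deferred unpin of the old fb */
}

/* Assumption: the X-to-console switch unpins the fb found in the state. */
static int toy_disable_plane(struct toy_plane *plane)
{
    int ret = toy_unpin(plane->state.fb);

    plane->state.fb = NULL;
    plane->fb = NULL;
    return ret;
}

/* Runs flip + console switch; 0 means balanced, -1 means double unpin. */
static int toy_flip_then_switch_to_console(int sync_state)
{
    struct toy_fb a = { .pin_count = 1 };   /* currently scanned out */
    struct toy_fb b = { .pin_count = 0 };   /* flip target */
    struct toy_plane plane = { .fb = &a, .state = { .fb = &a } };

    toy_page_flip(&plane, &b, sync_state);
    return toy_disable_plane(&plane);
}
```

In this model, toy_flip_then_switch_to_console(0) reports an unbalanced unpin because the stale state pointer leads back to the already unpinned framebuffer, while toy_flip_then_switch_to_console(1) stays balanced.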
Sorry. The previous patch includes some debug output code I added, so it will produce warnings when applied to the original kernel. I'll upload a new patch.

Created attachment 169671 [details]
my patch (2)
Thanks. I will test the patch.
Martin

Fixed by

commit 2dccc9898d45cd552f372c3f0b4a7f42126312f1
Author: Xi Ruoyao <xry111@outlook.com>
Date: Thu Mar 12 20:16:32 2015 +0800

drm/i915: Ensure plane->state->fb stays in sync with plane->fb

in drm-intel-fixes. Will reach some v4.0-rcN. Thanks for the report and the fix.

4.0.0-rc5 won't boot for me on an Intel Core i7 4770. I get eight penguins and no further. I have bisected the issue to (on Linus's tree):

commit 319c1d420a0b62d9dbb88104afebaabc968cdbfa
Author: Xi Ruoyao <xry111@outlook.com>
Date: Thu Mar 12 20:16:32 2015 +0800

which seems to be the above patch. Any ideas? Anything I can try to help?

(In reply to Nick Ross from comment #17)
> 4.0.0-rc5 won't boot for me on an intel core i7 4770. I get eight penguins
> and no further. I have bisected the issue to (on Linus's tree):
> commit 319c1d420a0b62d9dbb88104afebaabc968cdbfa
> Author: Xi Ruoyao <xry111@outlook.com>
> Date: Thu Mar 12 20:16:32 2015 +0800
>
> which seems to be the above patch. Any ideas? Anything I can try to help?

I received many emails about this today. Daniel recommended cherry-picking

commit f55548b5af87ebfc586ca75748947f1c1b1a4a52
Author: Damien Lespiau <damien.lespiau@intel.com>
Date: Thu Feb 5 18:30:20 2015 +0000

drm/i915: Don't try to reference the fb in get_initial_plane_config()

from linux-next.

I haven't built and tested rc5 yet. But in rc4+ my patch works well (on my machine). I'll build an rc5 and test again immediately.

> I haven't build and test rc5 yet. But in rc4+ my patch works well (on my
> machine).
> I'll build an rc5 and test again immediately.
rc5 still works well on my machine. But I found some WARNINGs in kernel
log which may be related to your problem.
I tried Damien's solution. It solved the WARNING.
Created attachment 172221 [details]
damien's patch
Damien's patch fixes my boot problem. Thank you.

Damien's patch fixes my problem. Thank you.

I applied Damien's patch to 4.0-rc5. Before the patch I had two warnings during boot:

WARNING: CPU: 3 PID: 1 at include/linux/kref.h:47 drm_framebuffer_reference+0x56/0x5f()
WARNING: CPU: 2 PID: 6 at drivers/gpu/drm/drm_atomic.c:482 drm_atomic_check_only+0x3a9/0x3cf()

After the patch, only the following warning remains:

Mar 25 21:46:01 zertz kernel: [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
...
fbcon: inteldrmfb (fb0) is primary device
------------[ cut here ]------------
WARNING: CPU: 0 PID: 6 at drivers/gpu/drm/drm_atomic.c:482 drm_atomic_check_only+0x3a9/0x3cf()
Modules linked in:
CPU: 0 PID: 6 Comm: kworker/u8:0 Tainted: G U 4.0.0-rc5-00003-gad5de1d #77
Hardware name: LENOVO 4349WK7/4349WK7, BIOS 6MET81WW (1.41) 10/26/2010
Workqueue: events_unbound async_run_entry_fn
0000000000000000 0000000000000009 ffffffff813d2040 0000000000000000
ffffffff81036594 ffff8801334c2710 ffffffff81241786 0000000000000088
ffff8800b6c66180 ffff8800b6c93000 0000000000000002 ffff8800b6c5a180
Call Trace:
[<ffffffff813d2040>] ? dump_stack+0x40/0x50
[<ffffffff81036594>] ? warn_slowpath_common+0x93/0xab
[<ffffffff81241786>] ? drm_atomic_check_only+0x3a9/0x3cf
[<ffffffff81241786>] ? drm_atomic_check_only+0x3a9/0x3cf
[<ffffffff812417ba>] ? drm_atomic_commit+0xe/0x4d
[<ffffffff812264da>] ? drm_atomic_helper_plane_set_property+0x68/0xa3
[<ffffffff81240c43>] ? modeset_lock+0x8f/0xf2
[<ffffffff812345ee>] ? drm_mode_plane_set_obj_prop+0x28/0x49
[<ffffffff81227a8c>] ? restore_fbdev_mode+0x5f/0xc7
[<ffffffff8122934c>] ? drm_fb_helper_restore_fbdev_mode_unlocked+0x1e/0x54
[<ffffffff812293b0>] ? drm_fb_helper_set_par+0x2e/0x32
[<ffffffff8129fc1b>] ? intel_fbdev_set_par+0x11/0x55
[<ffffffff811bc715>] ? fbcon_init+0x327/0x442
[<ffffffff81209ec1>] ? visual_init+0xaf/0x102
[<ffffffff8120b422>] ? do_bind_con_driver+0x18e/0x295
[<ffffffff8120b9db>] ? do_take_over_console+0x150/0x179
[<ffffffff811b90df>] ? do_fbcon_takeover+0x58/0x94
[<ffffffff8104a8de>] ? notifier_call_chain+0x35/0x59
[<ffffffff8104aaff>] ? __blocking_notifier_call_chain+0x42/0x5d
[<ffffffff811c0017>] ? register_framebuffer+0x281/0x2b4
[<ffffffff81229632>] ? drm_fb_helper_initial_config+0x27e/0x330
[<ffffffff81001684>] ? __switch_to+0x1ff/0x45d
[<ffffffff8104bb5a>] ? async_run_entry_fn+0x2d/0xbf
[<ffffffff810467a5>] ? process_one_work+0x142/0x214
[<ffffffff81046ce3>] ? worker_thread+0x1c3/0x26d
[<ffffffff81046b20>] ? rescuer_thread+0x284/0x284
[<ffffffff8104a0c8>] ? kthread+0xab/0xb3
[<ffffffff81040000>] ? get_signal+0x2e0/0x4ce
[<ffffffff8104a01d>] ? __kthread_parkme+0x5d/0x5d
[<ffffffff813d60c8>] ? ret_from_fork+0x58/0x90
[<ffffffff8104a01d>] ? __kthread_parkme+0x5d/0x5d
---[ end trace 8d8be2074054d8dc ]---

Martin

The fix for bug 93711 in rc5 was cherry-picked from linux-next, but it caused a lot of trouble. We have now found that other fixes from linux-next are needed to keep everything working correctly (>_<). The solution from the intel-gfx mailing list is:

(1) Cherry-pick Daniel Vetter's commit 8218c3f4df3bb1c637c17552405039a6dd3c1ee1, "drm: Fixup racy refcounting in plane_force_disable", from linux-next.
(2) Cherry-pick Damien Lespiau's commit f55548b5af87ebfc586ca75748947f1c1b1a4a52, "drm/i915: Don't try to reference the fb in get_initial_plane_config()", from linux-next.
(3) Apply Josh Boyer's patch from the intel-gfx mailing list.

I'll upload the patches from Daniel and Josh. Maintaining something between the mainline tree and linux-next always causes trouble... I hope this set of patches puts an end to all this mess!

Created attachment 172411 [details]
Daniel's patch
Created attachment 172421 [details]
Josh's patch
No more warnings during boot after the three patches.

Thanks
Martin

Also fixed in drm-intel-fixes, hopefully on its way to v4.0-rc6.

*** Bug 93991 has been marked as a duplicate of this bug. ***
Created attachment 168081 [details]
from the syslog

When I switched from X to the console, the console became unresponsive. I had to use SysRq to reboot. I attach the relevant part of the syslog.

Regards
Martin