Created attachment 129791
Kernel oops in syslog

I am experiencing a regression that shows up after resuming from a long hibernation. The symptom is that the X11 display freezes after the kernel emits an oops. This seems to depend on the video chipset: a paging request failure occurs on a Thinkpad X200 Tablet with the Intel GM45 chipset, whereas there is no problem on a Thinkpad X220 Tablet with Intel HD Graphics 3000.

The problem does not occur if the hibernation is short. I can reproduce the error reliably if the hibernation lasts for several hours. I use the compositing window manager compiz 0.8.8.

Since the problem depends on the kernel version, I performed a bisection. The first bad commit is:

commit 17fec8a08698bcab98788e1e89f5b8e7502ababd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 4 00:23:33 2013 +0100

    drm/i915: Use Graphics Base of Stolen Memory on all gen3+

    So I made the mistake of missing that the desktop and mobile chipsets
    have different layouts in their PCI configurations, and we were
    incorrectly setting the wrong physical address for stolen memory on
    mobile chipsets.

    Since all gen3+ are actually consistent in the location of the GBSM
    register in the PCI configuration space on device 2 (the GPU), use it.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    [danvet: Drop cc: stable and fudge conflicts.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

I am attaching: kernel oops in syslog, Xorg.0.log, bisection log.
Created attachment 129801
Xorg.0.log
Created attachment 129811
Bisection log
Tainted. By what?
Sorry, I forgot that I was using modules from https://github.com/evgeni/tp_smapi . They are hdaps.ko, thinkpad_ec.ko, tp_smapi.ko. I am going to retest this without those modules, if you want.
I was more concerned that this may not have been the first warning. Can you please run

    addr2line -i -e </path/to/i915.ko> 0xffffffffa038a1c4

or perhaps better

    gdb </path/to/i915.ko>
    list *intel_gen4_queue_flip+0xc4
I realized that my ebuild script was stripping the kernel modules before installing them, so I recompiled the kernel without stripping the modules. I also switched on the CONFIG_DEBUG_INFO option. But then the system locked up hard after resume, so I could not get dmesg or syslog. Therefore, I just tried the gdb command, hoping that the address does not change depending on whether the modules are stripped or whether CONFIG_DEBUG_INFO is enabled. The result is:

    # gdb /lib/modules/3.10.0-1+/kernel/drivers/gpu/drm/i915/i915.ko
    GNU gdb (Gentoo 7.6.2 p1) 7.6.2
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu".
    For bug reporting instructions, please see:
    <http://bugs.gentoo.org/>...
    Reading symbols from /lib64/modules/3.10.0-1+/kernel/drivers/gpu/drm/i915/i915.ko...done.
    (gdb) list *intel_gen4_queue_flip+0xc4
    0x2d1b4 is in intel_gen4_queue_flip (/var/tmp/portage/sys-kernel/bisect-3.99.99/work/linux-3.99.99/drivers/gpu/drm/i915/intel_ringbuffer.h:233).
    228     /var/tmp/portage/sys-kernel/bisect-3.99.99/work/linux-3.99.99/drivers/gpu/drm/i915/intel_ringbuffer.h: No such file or directory.

Line intel_ringbuffer.h:233 is in the function:

    229 static inline void intel_ring_emit(struct intel_ring_buffer *ring,
    230                                    u32 data)
    231 {
    232         iowrite32(data, ring->virtual_start + ring->tail);
    233         ring->tail += 4;
    234 }

BTW, the kernel still crashed without the tp_smapi modules.
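For readers following the addresses in this thread, here is a minimal sketch of the offset arithmetic. It assumes that the oops address 0xffffffffa038a1c4 and intel_gen4_queue_flip+0xc4 refer to the same instruction, as the earlier request implies. The constant and function names are illustrative only and do not come from the kernel or from gdb.

```python
# Values quoted in this thread.
OOPS_ADDRESS = 0xFFFFFFFFA038A1C4  # runtime address given to addr2line
SYMBOL_OFFSET = 0xC4               # offset in intel_gen4_queue_flip+0xc4
GDB_ADDRESS = 0x2D1B4              # address gdb printed for the unloaded i915.ko


def function_start(address, offset):
    """Return the start of a function given an address inside it and its offset."""
    return address - offset


# Start of intel_gen4_queue_flip in the running kernel (assumption: the oops
# address and the symbol+offset form describe the same instruction).
runtime_start = function_start(OOPS_ADDRESS, SYMBOL_OFFSET)

# Start of the same function as seen by gdb in the module file on disk.
file_start = function_start(GDB_ADDRESS, SYMBOL_OFFSET)

print(hex(runtime_start))
print(hex(file_start))
```

Because both results are derived from the same symbol offset, the symbol+offset form stays comparable between the loaded module and the file on disk as long as the function body itself is unchanged, which is the hope expressed in the comment above.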
Chris, another candidate for your ring init rework patches?
Maybe, but he didn't say that there were any error messages upon resume. You would have thought he would have noticed the *ERROR* first.
Created attachment 130801
*ERROR* and kernel oops in syslog
Sorry, I missed the *ERROR*:

    [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 0000c82c tail 00000000 start 00003000
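When comparing such messages across several logs, it can help to split them into fields. The following is a minimal sketch that assumes the four values are hexadecimal, which their formatting suggests. The helper name parse_ring_error is hypothetical and is not part of any kernel tool.

```python
import re

# The message quoted in this thread.
ERROR_LINE = (
    "[drm:init_ring_common] *ERROR* render ring initialization failed "
    "ctl 0001f001 head 0000c82c tail 00000000 start 00003000"
)


def parse_ring_error(line):
    """Return (ring name, {field: int}) or None if the line does not match.

    Assumption: the values after the message are name/value pairs in hex.
    """
    match = re.search(r"\*ERROR\* (\w+) ring initialization failed (.*)", line)
    if match is None:
        return None
    ring, rest = match.groups()
    tokens = rest.split()
    # Pair every name (even positions) with the value that follows it.
    fields = {
        name: int(value, 16)
        for name, value in zip(tokens[0::2], tokens[1::2])
    }
    return ring, fields


ring, fields = parse_ring_error(ERROR_LINE)
print(ring, {name: hex(value) for name, value in fields.items()})
```

The sketch only extracts the numbers; it does not interpret what the individual register values mean.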
Definitely one for Chris' patches.
I've rebased the patches against drm-intel-nightly, so they should be easier to apply: http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug76554
I tried the bug76554 branch with head cfa8aaa35f180268c99e72964228c944930af680 by (shallow-)cloning the git repo. Now the long hibernation issue seems to be gone. Thank you.

However, I hit "*ERROR* render ring initialization failed" under a different condition. Possibly because of this, compiz, or sometimes the X server, crashes. The good thing is that this takes much less time to reproduce. The steps to trigger the error are:

0. turn off the computer
1. disconnect the external display from the VGA port
2. turn on the Thinkpad X200 Tablet and wait until the X server starts up
3. connect an LCD TV to the VGA port
4. log in (compiz then starts up)
5. hibernate
6. resume

I attach dmesg with drm.debug=7 and Xorg.0.log. Should I file a different bug?
Created attachment 131511
dmesg with external TV connected after X startup
Created attachment 131521
Xorg.0.log with external TV connected after X startup
Chris, is this a dupe of https://bugs.freedesktop.org/show_bug.cgi?id=76554? Can we close this one?
Close enough. The bug in the summary had a different fix.
Assuming the bug in the summary is now fixed, please reopen if this is not the case. We'll track the render ring initialization issue at https://bugs.freedesktop.org/show_bug.cgi?id=76554. Thanks for the report.