Bug 15737

Summary: System hangs
Product: Other
Reporter: uzytkownik2 (uzytkownik2)
Component: Other
Assignee: other_other
Status: RESOLVED OBSOLETE
Severity: high
CC: alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33, 2.6.34-rc3, 2.6.34-rc7, 2.6.34, 2.6.36-rc3
Subsystem:
Regression: Yes
Bisected commit-id:

Description uzytkownik2@gmail.com 2010-04-09 11:37:37 UTC
I'm sorry that this bug report does not contain much information, but I don't know how to find out what the problem is.

After running for some time, the system hangs, i.e.:
- The Caps Lock/Num Lock LEDs do not blink as they would in a kernel panic
- The screen is frozen
- The computer does not respond to ping over Wi-Fi

Affected versions:
- 2.6.33 - both vanilla and vanilla+gentoo patches+ck+tux-on-ice
- 2.6.34-rc3 - vanilla

It seems that it was triggered by some software update (the problem was absent for a long time with 2.6.33); however, I suspect the kernel because the machine stops responding to ping.

The computer is a ThinkPad T500 with Intel graphics.
Comment 1 uzytkownik2@gmail.com 2010-04-15 08:08:32 UTC
This crash of X occurred at the same time as at least one occurrence of the problem:


#0  0x00007f173c6e91b5 in *__GI_raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        pid = <value optimized out>
        selftid = <value optimized out>
#1  0x00007f173c6ea5e0 in *__GI_abort () at abort.c:92
        act = {__sigaction_handler = {sa_handler = 0x344feb0, 
            sa_sigaction = 0x344feb0}, sa_mask = {__val = {32, 16, 8, 4, 
              139737775137196, 139737774676752, 139737805238272, 4, 
              4294967295, 1, 1, 8184064, 0, 56315696, 46, 0}}, 
          sa_flags = 1042310354, sa_restorer = 0x200000000001}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x000000000046a62e in OsAbort () at utils.c:1321
No locals.
#3  0x0000000000468ccd in AbortServer () at log.c:418
No locals.
#4  0x00000000004695f0 in FatalError (
    f=0x587b68 "Caught signal %d (%s). Server aborting\n") at log.c:546
        args = {{gp_offset = 24, fp_offset = 48, 
            overflow_arg_area = 0x7fff3f748ee0, 
            reg_save_area = 0x7fff3f748e20}}
        beenhere = 1
#5  0x0000000000465bde in OsSigHandler (signo=11, sip=0x290, 
    unused=<value optimized out>) at osinit.c:156
No locals.
#6  <signal handler called>
No symbol table info available.
#7  0x000000000045b85a in privateExists (privates=0x290, key=0x7f173ad2a23c)
    at privates.c:79
No locals.
#8  dixLookupPrivate (privates=0x290, key=0x7f173ad2a23c) at privates.c:162
        ptr = <value optimized out>
#9  0x00007f173ab26c59 in DRI2DestroyDrawable ()
   from /usr/lib64/xorg/modules/extensions/libdri2.so
No symbol table info available.
#10 0x00007f173b38dea4 in ?? ()
   from /usr/lib64/xorg/modules/extensions/libglx.so
No symbol table info available.
#11 0x00007f173b382d50 in ?? ()
   from /usr/lib64/xorg/modules/extensions/libglx.so
No symbol table info available.
#12 0x000000000042c87d in FreeClientResources (client=<value optimized out>)
    at resource.c:810
        rtype = <value optimized out>
        resources = <value optimized out>
        this = 0x341e600
        j = 46
#13 0x000000000045315e in CloseDownClient (client=0x35b7620) at dispatch.c:3631
        really_close_down = 1
#14 0x0000000000457bf7 in KillAllClients () at dispatch.c:3655
        i = 8
#15 Dispatch () at dispatch.c:468
        result = 8257088
        client = <value optimized out>
        nready = 0
        start_tick = 90960
#16 0x0000000000424d75 in main (argc=<value optimized out>, argv=0x7ebc28, 
    envp=<value optimized out>) at main.c:286
        i = 1
        alwaysCheckForInput = {0, 1}
Comment 2 uzytkownik2@gmail.com 2010-04-28 19:39:13 UTC
I cannot reproduce it with 2.6.32. The X crash only coincided in time. The problem is very erratic in nature (sometimes a week passes without a problem, sometimes it happens every 15 minutes).
Comment 3 uzytkownik2@gmail.com 2010-05-22 15:12:16 UTC
When I use Alt+SysRq+r (when the error is not triggered) on 2.6.34 under X, it changes the VT to tty1 and the system hangs.

Could anyone say how to debug this bug?
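
One possible starting point, offered as a sketch and not taken from this report: check which SysRq functions the running kernel allows, since Alt+SysRq+r (unraw) falls under the keyboard-control bit. The script below decodes the value of /proc/sys/kernel/sysrq using the bit meanings from the kernel's SysRq documentation. The names `SYSRQ_BITS`, `decode_sysrq`, and `read_sysrq_mask` are made up for this illustration.

```python
# Bit meanings for /proc/sys/kernel/sysrq, as listed in the kernel's
# SysRq documentation. The dictionary name and labels are illustrative.
SYSRQ_BITS = {
    2: "console log level",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(mask):
    """Return the list of SysRq function groups enabled by `mask`."""
    if mask == 0:
        return ["disabled"]
    if mask == 1:
        return ["all functions enabled"]
    return [name for bit, name in SYSRQ_BITS.items() if mask & bit]


def read_sysrq_mask(path="/proc/sys/kernel/sysrq"):
    """Read the current SysRq mask from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    print(decode_sysrq(read_sysrq_mask()))
```

If the process-dump group is enabled, Alt+SysRq+t or Alt+SysRq+w on a text console just before a hang may leave task state in the kernel log, provided the log reaches disk or a serial/network console.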
Comment 4 uzytkownik2@gmail.com 2010-08-09 08:04:32 UTC
Additional info: zen-sources with the SLAB allocator works[1], but with SLQB I can reproduce the bug (this was the only option changed). Vanilla sources were tested with SLAB.

[1] Somewhat. Only X hangs, not the whole system, and I can switch to a VT and restart it. However, if I switch to a VT and back to X without restarting, the system sometimes hangs. In any case, the system works after a restart.
Comment 5 uzytkownik2@gmail.com 2010-09-06 23:26:06 UTC
Reproducible on zen-sources 2.6.35, though only sometimes. Fragment of dmesg captured:

[ 1611.616749] [drm:i915_gem_do_execbuffer] *ERROR* Failed to pin buffer 22 of 24, total 242929664 bytes, 0 fences: -28
[ 1611.616753] [drm:i915_gem_do_execbuffer] *ERROR* 1010 objects [28 pinned], 347590656 object bytes [32833536 pinned], 32833536/234881024 gtt bytes
[ 1611.794369] [drm:i915_gem_do_execbuffer] *ERROR* Failed to pin buffer 18 of 20, total 242655232 bytes, 0 fences: -28
[ 1611.794372] [drm:i915_gem_do_execbuffer] *ERROR* 881 objects [24 pinned], 354299904 object bytes [32559104 pinned], 32559104/234881024 gtt bytes

I stopped X11 and restarted it. After logging back into GNOME Shell, I jumped to the console and back several times until I caught:

[ 1687.344040] [drm:i915_gem_do_execbuffer] *ERROR* Failed to pin buffer 22 of 24, total 242929664 bytes, 0 fences: -28
[ 1687.344044] [drm:i915_gem_do_execbuffer] *ERROR* 907 objects [28 pinned], 319623168 object bytes [32833536 pinned], 32833536/234881024 gtt bytes
[ 1687.932780] [drm:i915_gem_do_execbuffer] *ERROR* Failed to pin buffer 22 of 24, total 242929664 bytes, 0 fences: -28
[ 1687.932784] [drm:i915_gem_do_execbuffer] *ERROR* 835 objects [28 pinned], 318877696 object bytes [32833536 pinned], 32833536/234881024 gtt bytes

When I switched to X11, the system hung.
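
The i915 error lines quoted above can be summarised with a short script. This is a minimal sketch: the function name `summarize` and both regular expressions are made up for this illustration, and the reading of each field is inferred from the message text alone. On Linux, error code -28 corresponds to ENOSPC ("No space left on device").

```python
import errno
import os
import re

# Patterns for the two kinds of i915 lines quoted in this report.
PIN_RE = re.compile(
    r"Failed to pin buffer (?P<index>\d+) of (?P<count>\d+), "
    r"total (?P<total>\d+) bytes, (?P<fences>\d+) fences: (?P<err>-?\d+)"
)
GTT_RE = re.compile(r"(?P<used>\d+)/(?P<size>\d+) gtt bytes")

MIB = 2 ** 20


def summarize(line):
    """Return a short summary of one i915 error line, or None if no match."""
    m = PIN_RE.search(line)
    if m:
        code = -int(m.group("err"))  # kernel reports negative errno values
        name = errno.errorcode.get(code, "unknown")
        total = int(m.group("total")) / MIB
        return f"pin failed: {name} ({os.strerror(code)}), total {total:.1f} MiB"
    m = GTT_RE.search(line)
    if m:
        used, size = int(m.group("used")), int(m.group("size"))
        pct = 100 * used / size
        return f"GTT {used / MIB:.1f} of {size / MIB:.1f} MiB ({pct:.1f}%)"
    return None


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for raw in sys.stdin:
        summary = summarize(raw)
        if summary:
            print(summary)
```

In the quoted lines, the reported total (242929664 bytes, about 231.7 MiB) is larger than the reported GTT size (234881024 bytes, exactly 224 MiB). If "total" is the combined size of the buffers in one batch, that would be consistent with the ENOSPC result, but this is an interpretation and not something confirmed in the report.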

I do understand that zen-sources are not supported, and I will try to reproduce on vanilla sources. However, I would appreciate any help with debugging this issue.
Comment 6 uzytkownik2@gmail.com 2010-09-14 14:14:35 UTC
Reproduced with 2.6.36-rc3, with slight differences:
 - After the soft hang, the compositing WM crashes
 - The i915_gem_do_execbuffer error appears once instead of twice.
Comment 7 Alan 2012-06-18 20:20:40 UTC
Closing as obsolete as it's an old kernel and both Xen and the kernel bits have changed a ton, with Xen also being integrated. If you can reproduce it on a modern kernel, please do update the bug.