Bug 87791 - radeonsi lockup and oops
Summary: radeonsi lockup and oops
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-05 18:55 UTC by aCaB
Modified: 2016-03-23 18:54 UTC
CC List: 2 users

See Also:
Kernel Version: 3.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lockup example (no oops) (9.87 KB, text/plain)
2014-11-05 18:55 UTC, aCaB
lockup example (with oops) (5.86 KB, text/plain)
2014-11-05 18:56 UTC, aCaB

Description aCaB 2014-11-05 18:55:55 UTC
Created attachment 156761 [details]
lockup example (no oops)

After upgrading mesa from mesa-10.3.0 to mesa-10.3.1 the Radeon HD 7700 card locks up several times a day without any specific trigger (or reliable way to reproduce it).
Xorg appears frozen with just the mouse pointer moving. It's not even possible to switch to a VT; however, everything else works just fine. At any given time the X backtrace looks like this:
Thread 2 (Thread 0x7fe8147be700 (LWP 2415)):
#0  0x00007fe81b4d911c in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x00007fe816a807a3 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#2  0x00007fe816a7ffc7 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#3  0x00007fe81b4d5083 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fe81b9da3ad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fe81d607880 (LWP 2380)):
#0  0x00007fe81b9836e9 in __memcpy_sse2_unaligned () from /lib64/libc.so.6
#1  0x00007fe816857a46 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#2  0x00007fe816858742 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#3  0x00007fe816859452 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#4  0x00007fe8168abf13 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#5  0x00007fe8168ac993 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#6  0x00007fe81684ad74 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#7  0x00007fe81684c510 in ?? () from /usr/lib64/dri/radeonsi_dri.so
#8  0x00007fe81a58b883 in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#9  0x00007fe81a58c64e in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#10 0x00007fe81a58d160 in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#11 0x00007fe81a58d81c in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#12 0x00007fe81a56c674 in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#13 0x00007fe81a56cd84 in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#14 0x00007fe81a56dd5e in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#15 0x00007fe81a56e6ba in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#16 0x0000000000563e2d in miCopyRegion ()
#17 0x00000000005643b6 in miDoCopy ()
#18 0x00007fe81a56e6fd in ?? () from /usr/lib64/xorg/modules/libglamoregl.so
#19 0x0000000000511828 in ?? ()
#20 0x0000000000432291 in ?? ()
#21 0x0000000000435d3e in ?? ()
#22 0x0000000000439b6a in ?? ()
#23 0x00007fe81b913a65 in __libc_start_main () from /lib64/libc.so.6
#24 0x000000000042531e in _start ()
Comment 1 aCaB 2014-11-05 18:56:18 UTC
Created attachment 156771 [details]
lockup example (with oops)
Comment 2 Alex Deucher 2014-11-05 19:39:06 UTC
(In reply to aCaB from comment #0)
> Created attachment 156761 [details]
> lockup example (no oops)
> 
> After upgrading mesa from mesa-10.3.0 to mesa-10.3.1 the Radeon HD 7700 card
> locks up several times a day without any specific trigger (or reliable way
> to reproduce it).
>

This sounds like a mesa regression rather than a kernel driver bug.  Can you bisect mesa?
Comment 3 aCaB 2014-11-05 20:25:43 UTC
(In reply to Alex Deucher from comment #2)
> This sounds like a mesa regression rather than a kernel driver bug.  Can you
> bisect mesa?

I understand mesa may be sending crap to the kernel space but that doesn't sound like a good reason to deref a NULL.

As for bisecting mesa, I am certainly willing to do that, but I need a reliable way to trigger the lockup rather than just logging in and waiting for it to occur.
Will see if I get more hints over the next few days.
Comment 4 Michel Dänzer 2014-11-06 03:05:08 UTC
(In reply to aCaB from comment #3)
> I understand mesa may be sending crap to the kernel space but that doesn't
> sound like a good reason to deref a NULL.

AFAICT that should be fixed by the changes to radeon_ttm.c in https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=57d20a43c9b30663bdbacde8294a902edef35a84 .
Comment 5 aCaB 2014-11-06 21:46:56 UTC
Michel,
Thanks for your pointer and sorry for the late answer.

I'll try harder to find a reproducible case (Firefox with some large animation seems to trigger it sometimes).
In the meantime, if the lockup is a mesa bug, then feel free to close this ticket.
