|Summary:||[bisected] General stability issues since "futex: Sanitize cmpxchg_futex_value_locked API" commit|
|Product:||Platform Specific/Hardware||Reporter:||Émeric Maschino (emeric.maschino)|
|Severity:||high||CC:||emeric.maschino, jrnieder, tony.luck|
Description Émeric Maschino 2012-02-11 13:26:39 UTC
As first reported in http://lists.debian.org/debian-ia64/2012/01/msg00016.html, I'm getting stability issues with kernels > 2.6.38. I've bisected the problem to commit 37a9d912b24f96a0591773e6e6c3642991ae5a70 (futex: Sanitize cmpxchg_futex_value_locked API).

Here are some issues that can be easily observed:
- In an X session (tested under GNOME Classic as well as TWM), hitting the Tab key while in a terminal window instantly triggers an X restart. X doesn't crash, as I wrongly assumed initially: from the logs, it's definitely properly shut down and restarted.
- Still in an X session, clicking on the "Edit" menu or "Back" button of Firefox/Iceweasel triggers a crash of Firefox/Iceweasel. For this scenario, I have some kind of gdb stack trace in PulseAudio, before gdb itself goes wrong (more on this later; core file available).

My first investigations led me to wrongly assume that these issues were related to something bad in PulseAudio, as uninstalling PulseAudio fixed both of them (the Tab key issue in a terminal and the Firefox/Iceweasel crash). However, other crashes make me believe that PulseAudio was only a symptom of something more general that has been broken since the bisected commit. Most notably, gdb can't be started at all: every attempt to debug a program immediately ends with a SIGTRAP signal. Quite problematic for debugging further...

It's noteworthy that the exact same system doesn't exhibit these issues when rebooted with kernel 2.6.38 (Debian linux-image-2.6.38-2-mckinley if needed).
Comment 1 Émeric Maschino 2012-02-12 11:35:51 UTC
While also testing GL rendering on my system, trying to run the ioQuake3-based Quake 3 demo gives at startup:

    ------ Initializing Sound ------
    Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:106, function pa_mutex_unlock(). Aborting.
    ----- Client Shutdown (Received signal 6) -----
    RE_Shutdown( 1 )

So, still triggered in PulseAudio, but definitely mutex-related...
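For readers unfamiliar with the failed check: pthread_mutex_unlock() returns 0 on success and an error number otherwise, and the assertion aborts when the result is non-zero. Below is a minimal user-space sketch of that kind of check. The helper names are hypothetical and this is not PulseAudio's code; it only shows the expected return values on a healthy system.

```c
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper in the style of the failed check:
 * abort if unlocking reports an error. */
static void checked_unlock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_unlock(m);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is set */
}

/* Lock and unlock a default mutex; returns 0 when every call succeeds. */
int roundtrip_default_mutex(void)
{
    pthread_mutex_t m;
    int rc = pthread_mutex_init(&m, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_mutex_lock(&m);
    if (rc == 0)
        checked_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}

/* Unlock an error-checking mutex that the caller never locked.
 * POSIX specifies EPERM for this case. */
int unlock_unowned_errorcheck(void)
{
    pthread_mutex_t m;
    pthread_mutexattr_t attr;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

On a working system the first function returns 0, so a failing assertion of this form in correct application code points at the layers below it (libc or the kernel futex path) rather than at the caller.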
Comment 2 Jonathan Nieder 2012-03-05 00:45:54 UTC
Forwarded to the linux-ia64 list, though I chose a poor subject line. http://thread.gmane.org/gmane.linux.kernel/1111752/focus=22096
Comment 3 Tony Luck 2012-04-13 20:08:11 UTC
Created attachment 72915: proposed patch

I think I found the problem. GCC re-orders code because it does not know that the ia64 fault handler may change the value of register r8 to -EFAULT.
Comment 4 Émeric Maschino 2012-04-15 21:26:57 UTC
Hello,

I just rebuilt the kernel with the patch proposed in attachment #72915. Issue fixed :-)

Many thanks,
Emeric

PS: gdb is still returning early with SIGTRAP when debugging Iceweasel (I didn't have time to try other programs). However, all the other reported issues (futex test suite, Tab keystroke in a terminal window, Iceweasel's buttons and menus, ioQuake3-based Quake 3 demo) now work fine. So it may be a problem with gdb itself.
Comment 5 Tony Luck 2012-04-17 04:56:38 UTC
Fix (slightly modified from patch attached here because Linus pointed out that we should tell GCC that the __asm__ code modifies r8) is now upstream: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=c76f39bddb84f93f70a5520d9253ec0317bec216
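As a general illustration of "telling GCC that the __asm__ code modifies a register": GCC's extended asm lets code declare registers it changes behind the compiler's back, for example through the clobber list, so the compiler does not keep live values there or assume the register is unchanged across the statement. The sketch below is an x86-64 stand-in, not the ia64 fix itself; the function names are hypothetical, and the exact form used in the upstream commit is not reproduced here.

```c
#include <assert.h>

/* Illustration only: adds 1 to x using eax as a scratch register.
 * The "eax" entry in the clobber list tells GCC the asm overwrites
 * that register; "cc" says the flags are modified by addl. */
int add_one(int x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    int out;
    __asm__ __volatile__(
        "movl %1, %%eax\n\t"  /* copy the input into the scratch register */
        "addl $1, %%eax\n\t"  /* modify the scratch register */
        "movl %%eax, %0"      /* publish the result */
        : "=r"(out)           /* output operand */
        : "r"(x)              /* input operand */
        : "eax", "cc");       /* registers/flags changed by the asm */
    return out;
#else
    /* Fallback for other compilers/architectures: plain C. */
    return x + 1;
#endif
}

/* Small usage check for the sketch above. */
void add_one_demo(void)
{
    assert(add_one(41) == 42);
}
```

Without the declaration, the compiler would be free to place an unrelated live value in the scratch register or to move code around the statement on the assumption that the register is untouched, which is the class of problem described in this report.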
Comment 6 Émeric Maschino 2012-04-17 20:16:22 UTC
New patch version works flawlessly. Thanks, Emeric