As first reported in http://lists.debian.org/debian-ia64/2012/01/msg00016.html, I'm seeing stability issues with kernels newer than 2.6.38.
I've bisected the problem to commit 37a9d912b24f96a0591773e6e6c3642991ae5a70 (futex: Sanitize cmpxchg_futex_value_locked API).
Here are some issues that can be easily observed:
- in an X session (tested under GNOME Classic as well as TWM), hitting the Tab key while in a terminal window instantly triggers an X restart. X hasn't crashed, as I wrongly assumed initially. From the logs, it's definitely properly shut down and restarted.
- still in an X session, clicking on the "Edit" menu or the "Back" button of Firefox/Iceweasel triggers a crash of Firefox/Iceweasel. For this scenario, I have a partial gdb stack trace pointing into PulseAudio, taken before gdb itself goes wrong (more on this later; core file available).
Initial investigation led me to wrongly assume that these issues were related to something bad in PulseAudio, as uninstalling PulseAudio fixed both of them (the Tab key issue in a terminal and the Firefox/Iceweasel crash).
However, other crashes make me believe that PulseAudio was only a symptom of something more general that has been broken since the bisected commit. Most notably, gdb can't be started at all. Every attempt to debug a program immediately ends with a SIGTRAP signal. Quite problematic for debugging further...
It's noteworthy that the exact same system doesn't exhibit these issues when rebooted with kernel 2.6.38 (Debian linux-image-2.6.38-2-mckinley if needed).
While also testing GL rendering on my system, trying to run the ioQuake3-based Quake 3 demo gives the following at startup:
------ Initializing Sound ------
Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:106, function pa_mutex_unlock(). Aborting.
----- Client Shutdown (Received signal 6) -----
RE_Shutdown( 1 )
So, still triggered in PulseAudio, but definitely mutex-related...
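To take PulseAudio out of the picture, a small contended-mutex check along the following lines could be run on the affected kernel. This is only a sketch and not part of the original report or of the futex test suite mentioned later; the names `worker`, `run_mutex_stress`, `NTHREADS` and `ITERS` are made up for illustration. It relies on the fact that contended glibc mutexes go through the futex syscall on Linux, so a nonzero return from `pthread_mutex_lock`/`pthread_mutex_unlock` or a wrong final counter would point at the same area as the assertion above.

```c
#include <pthread.h>

#define NTHREADS 4        /* hypothetical values, chosen only to force contention */
#define ITERS    100000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;      /* protected by lock */

/* Each worker counts its own lock/unlock failures in the int passed via arg. */
static void *worker(void *arg)
{
    int *errors = arg;

    for (int i = 0; i < ITERS; i++) {
        if (pthread_mutex_lock(&lock) != 0) {
            (*errors)++;
            continue;
        }
        counter++;
        if (pthread_mutex_unlock(&lock) != 0)
            (*errors)++;
    }
    return NULL;
}

/*
 * Returns 0 if every lock/unlock succeeded and the counter is exact,
 * -1 if a thread could not be created, otherwise a positive problem count.
 */
static int run_mutex_stress(void)
{
    pthread_t tid[NTHREADS];
    int errors[NTHREADS] = {0};
    int created = 0;
    int total = 0;

    counter = 0;
    for (; created < NTHREADS; created++) {
        if (pthread_create(&tid[created], NULL, worker, &errors[created]) != 0)
            break;
    }
    /* Join whatever was started, even if creation stopped early. */
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    if (created < NTHREADS)
        return -1;

    for (int i = 0; i < NTHREADS; i++)
        total += errors[i];
    if (counter != (long)NTHREADS * ITERS)
        total++;          /* lost updates mean mutual exclusion failed */
    return total;
}
```

Calling `run_mutex_stress()` from a small `main` and building with `-pthread` should return 0 on a healthy system; any other value on the affected kernel would confirm the problem without PulseAudio being involved.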
Forwarded to the linux-ia64 list, though I chose a poor subject line.
Created attachment 72915
I think I found the problem. GCC reorders code because it does not know that the ia64 fault handler may change the value of register r8 to -EFAULT.
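The general idea can be shown outside the kernel. The following is a portable sketch, not the ia64 code and not the patch from the attachment: `guarded_load` is a hypothetical helper, and `err` stands in for the status register that something outside the compiler's view (the fault handler, in the kernel case) might overwrite. Listing `err` as a read-write operand of the `__asm__` statement tells GCC that the value may change there, so it cannot assume `err` is still 0 afterwards or move the check above the statement.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a guarded memory access. In the real kernel
 * code the asm body would contain the access itself; here the template is
 * empty, so no instructions are emitted and err is never actually changed.
 */
static inline int guarded_load(const int *addr, int *out)
{
    int err = 0;          /* status value the compiler must not assume stays 0 */
    int val;

    assert(addr != NULL && out != NULL);
    val = *addr;

    /*
     * "+r"(err) and "+r"(val): both are read and possibly written by the asm.
     * "memory": memory accesses are not cached in registers or moved across
     * this statement.
     */
    __asm__ __volatile__("" : "+r"(err), "+r"(val) : : "memory");

    if (err)
        return err;       /* would be the error path if err had been modified */
    *out = val;
    return 0;
}
```

Without the `"+r"(err)` operand, the compiler would be free to treat `err` as the constant 0 and drop the error path entirely, which is the same class of problem as described above.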
Just rebuilt the kernel with the patch proposed in attachment #72915. Issue fixed :-)
PS: gdb is still returning early with SIGTRAP when debugging Iceweasel (I didn't have time to try other programs). However, all the other reported issues (futex test suite, Tab keystroke in a terminal window, Iceweasel's buttons and menus, ioQuake3-based Quake 3 demo) are now gone. So it may be a problem with gdb itself.
Fix (slightly modified from patch attached here because Linus pointed out that we should tell GCC that the __asm__ code modifies r8) is now upstream:
New patch version works flawlessly.