Bug 42757 - [bisected] General stability issues since "futex: Sanitize cmpxchg_futex_value_locked API" commit
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: IA-64
Hardware: IA-64 Linux
Importance: P1 high
Assignee: platform_ia-64
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-11 13:26 UTC by Émeric Maschino
Modified: 2012-04-17 20:16 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.39
Tree: Mainline
Regression: Yes


Attachments
proposed patch (1.55 KB, patch)
2012-04-13 20:08 UTC, Tony Luck

Description Émeric Maschino 2012-02-11 13:26:39 UTC
As first reported in http://lists.debian.org/debian-ia64/2012/01/msg00016.html, I'm getting stability issues with kernels > 2.6.38.

I've bisected the problem to commit 37a9d912b24f96a0591773e6e6c3642991ae5a70 (futex: Sanitize cmpxchg_futex_value_locked API).

Here are some issues that can be easily observed:
- in an X session (tested under GNOME Classic as well as TWM), hitting the Tab key while in a terminal window instantly triggers an X restart. X hasn't crashed, as I wrongly assumed initially: according to the logs, it is definitely shut down properly and restarted
- still in an X session, clicking on the "Edit" menu or the "Back" button of Firefox/Iceweasel crashes Firefox/Iceweasel. For this scenario, I have a gdb stack trace of sorts in PulseAudio, captured before gdb itself goes wrong (more on this later; core file available)

Initial investigation led me to wrongly assume that these issues were caused by something bad in PulseAudio, as uninstalling PulseAudio fixed both of them (the Tab key issue in a terminal and the Firefox/Iceweasel crash).

However, other crashes make me believe that PulseAudio was only one symptom of something more general that has been broken since the bisected commit. Most notably, gdb can't be started at all: every attempt to debug a program immediately ends with a SIGTRAP signal. That makes further debugging quite problematic...

It's noteworthy that the exact same system doesn't exhibit these issues when rebooted with kernel 2.6.38 (Debian linux-image-2.6.38-2-mckinley if needed).
Comment 1 Émeric Maschino 2012-02-12 11:35:51 UTC
While also testing GL rendering on my system, trying to run the ioQuake3-based Quake 3 demo gives the following at startup:

------ Initializing Sound ------
Assertion 'pthread_mutex_unlock(&m->mutex) == 0' failed at pulsecore/mutex-posix.c:106, function pa_mutex_unlock(). Aborting.
----- Client Shutdown (Received signal 6) -----
RE_Shutdown( 1 )

So, still triggered in PulseAudio, but definitely mutex-related...
Comment 2 Jonathan Nieder 2012-03-05 00:45:54 UTC
Forwarded to the linux-ia64 list, though I chose a poor subject line.
http://thread.gmane.org/gmane.linux.kernel/1111752/focus=22096
Comment 3 Tony Luck 2012-04-13 20:08:11 UTC
Created attachment 72915
proposed patch

I think I found the problem. GCC re-orders code because it does not know that the ia64 fault handler may change the value of register r8 to -EFAULT.
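
Editorial illustration: the sketch below is a user-space model, not kernel code, of the calling convention that (as far as I understand the bisected commit) the sanitized API uses: the status comes back as the return value, and the previous contents of the futex word come back through a pointer. Everything in it is an assumption made for illustration: `model_cmpxchg` is a hypothetical name, a NULL address stands in for a faulting user pointer, and the GCC/Clang `__atomic` builtin replaces the ia64 instruction. On ia64 the status lives in r8, which is why a compiler that assumes r8 is still 0 can return the wrong answer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of a cmpxchg helper with two outputs:
 *   - return value: 0 on success, -EFAULT if the address is unusable
 *   - *uval:        the value found at *uaddr before the operation
 * A NULL uaddr is only a stand-in for a faulting user address.
 */
int model_cmpxchg(uint32_t *uval, uint32_t *uaddr,
                  uint32_t oldval, uint32_t newval)
{
    uint32_t expected = oldval;

    assert(uval != NULL);

    if (uaddr == NULL)
        return -EFAULT;         /* simulated fault: *uval is left untouched */

    /* Stores newval only if *uaddr == oldval. On mismatch, the builtin
     * writes the value it actually found into `expected`. */
    __atomic_compare_exchange_n(uaddr, &expected, newval, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

    *uval = expected;           /* previous value in both outcomes */
    return 0;
}
```

A caller has to check the return value first and only then compare `*uval` with the value it expected. If the status is wrongly reported as 0 after a fault, the caller goes on to trust a value that was never read.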
Comment 4 Émeric Maschino 2012-04-15 21:26:57 UTC
Hello,

Just rebuilt the kernel with the patch proposed in attachment #72915. Issue fixed :-)

Many thanks,

     Emeric

PS: gdb is still returning early with SIGTRAP when debugging Iceweasel (I didn't have time to try other programs). However, all the other reported issues (futex test suite, Tab keystroke in a terminal window, Iceweasel's buttons and menus, ioQuake3-based Quake 3 demo) are now gone. So it may be a problem with gdb itself.
Comment 5 Tony Luck 2012-04-17 04:56:38 UTC
Fix (slightly modified from patch attached here because Linus pointed out that we should tell GCC that the __asm__ code modifies r8) is now upstream:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=c76f39bddb84f93f70a5520d9253ec0317bec216
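
Editorial illustration: one way GCC extended asm can be told that an asm statement modifies a value is to list that value as a read-write (`"+r"`) operand. The sketch below is portable and deliberately not ia64-specific. It is not a copy of the upstream commit, which remains the authoritative reference. `status_after_asm` is a hypothetical name, and the empty asm template is an assumption standing in for the real instruction sequence.

```c
#include <assert.h>

/*
 * Hypothetical helper: `status` plays the role of a variable that something
 * outside the compiler's view (on ia64, the fault handler writing r8) may
 * change while the asm statement runs.
 */
int status_after_asm(int initial)
{
    int status = initial;

    /* The template is empty, so nothing changes at run time. The "+r"
     * read-write operand still makes the compiler treat `status` as
     * possibly modified here, so it cannot fold the return value into
     * the constant it held before the asm statement. */
    __asm__ __volatile__("" : "+r"(status) : : "memory");

    /* Holds only because the template above is empty. */
    assert(status == initial);
    return status;
}
```

Without such an operand (or an equivalent clobber), the compiler is free to assume the variable still holds its earlier value and to move or drop code accordingly, which is the kind of re-ordering described in comment 3.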
Comment 6 Émeric Maschino 2012-04-17 20:16:22 UTC
New patch version works flawlessly.

Thanks,

    Emeric
