Bug 7967

Summary: TIMER_ABSTIME cleared resulting in userland deadlock
Product: Timers Reporter: R (rdenis)
Component: OtherAssignee: Thomas Gleixner (tglx)
Status: REJECTED INSUFFICIENT_DATA    
Severity: normal CC: akpm, drepper, hgb, larsbj, mathias_koehrer, protasnb, roland, spam2
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.20 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Sample kernel config

Description R 2007-02-08 09:25:28 UTC
Most recent kernel where this bug did *NOT* occur: not known
Distribution: Debian Unstable
Hardware Environment: PC both SMT and non-SMT, with/without preemption
Software Environment: glibc 2.3 or 2.5
Problem Description:
Under some circumstances TIMER_ABSTIME seems to be lost from the 
clock_nanosleep() system call. One such case is when a straceing the process 
and triggering an ignored signal during clock_nanosleep. When clock_nanosleep 
is restarted, flags changes to 0 and the process hence starts waiting for a 
VERY long time (~35 years if using CLOCK_REALTIME) as the timer value is 
interpreted as a delay instead of an "absolute" deadline.

This seems to happen without debugger too (deadlocking VLC after many many 
clock_nanosleep calls here), but I could not find a testcase so far.

Steps to reproduce:
Runs the following program under strace.


#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <signal.h>
#include <unistd.h>

int main (void)
{
        struct timespec ts;

        if (clock_gettime (CLOCK_MONOTONIC, &ts))
        {
                perror ("Please upgrade your kernel");
                return 1;
        }

        signal (SIGALRM, SIG_IGN);
        alarm (1);

        ts.tv_sec += 5;
        clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);

        puts ("Your kernel is fine");
        return 0;
}
Comment 1 Thomas Gleixner 2007-02-08 09:40:50 UTC
The kernel code of this has not changed since quite a time. 

Not reproducible here: Kernel 2.6.20, glibc 2.5, FC6

clock_gettime(CLOCK_MONOTONIC, {17630, 666273121}) = 0
rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
alarm(1)                                = 0
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {17635, 666273121}, 0) = ?
ERESTARTNOHAND (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {17635, 666273121}, NULL) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0

Comment 2 Thomas Gleixner 2007-02-08 09:44:14 UTC
Question: Is this a 32 bit test program on a 64 bit kernel ?
Comment 3 R 2007-02-08 10:02:45 UTC
No. IA32 hardware (and hence software).

The problem vanishes with Debian kernel (2.6.18 + Debian patches) but appears 
with various .config on vanilla 2.6.20... I have not been able to identify what 
options triggers it :( but at least it is present on two different systems with 
different kernel configs!?
Comment 4 Thomas Gleixner 2007-02-08 10:06:41 UTC
Looked deeper. Seems to be a glibc problem.

        /* Absolute timers do not update the rmtp value and restart: */
        if (mode == HRTIMER_ABS)
                return -ERESTARTNOHAND;

clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {17635, 666273121}, 0) = ?
ERESTARTNOHAND (To be restarted)

Can you provide the output of strace please ?
Comment 5 R 2007-02-08 10:16:46 UTC
In the failure case, it looks like this:
...
rt_sigaction(SIGRTMIN, {0xb7e088a0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0xb7e087c0, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sys="Linux", node="auguste", ...}) = 0
clock_gettime(CLOCK_MONOTONIC, {4074, 735569772}) = 0
rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
alarm(1)                                = 0
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {4079, 735569772}, 0) = ? 
ERESTARTNOHAND (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
clock_nanosleep(CLOCK_MONOTONIC, 0, {4079, 735569772}, 

and I did not feel like waiting one hour for the continuation...
Comment 6 Thomas Gleixner 2007-02-08 10:31:01 UTC
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {4079, 735569772}, 0) = ? 
ERESTARTNOHAND (To be restarted)

The kernel returned ERESTARTNOHAND, which means there is no restart handler
function stored inside the kernel. This is only done for relative sleeps, as the
user space can not keep track, but the absolute expiry time is not changing.

--- SIGALRM (Alarm clock) @ 0 (0) ---
clock_nanosleep(CLOCK_MONOTONIC, 0, {4079, 735569772}, 

--------------------------------^^^
This comes from glibc, not from the kernel.

I'm closing that one.

Comment 7 Andrew Morton 2007-02-08 10:33:01 UTC
Ulrich, we have a possible globc problem here.
Comment 8 Ulrich Drepper 2007-02-08 17:31:48 UTC
The comment about glibc is bogus.  The kernel cannot return ERESTARTNOHAND. 
That's no allowed error code.  If strace reports this it means that strace is
involved in the restart and it is responsible for doing this right.

On the libc side there is nothing special we do.  clock_nanosleep is just an
ordinary system call wrapper.  Specifically, it does not do any restart of
system calls in case they return with an error.
Comment 9 Thomas Gleixner 2007-02-09 06:15:58 UTC
Oops, sorry. The kernel returns -EINTR of course. The translation seems to be
done by strace.

Comment 10 Ulrich Drepper 2007-02-09 06:54:04 UTC
And if the kernel returns -EINTR the userlevel wrapper will return EINTR.  It's
that easy.

But what you're seeing is more likely that under strace the implicit restart is
somehow messec up.  Maybe Roland knows.  It's certainly no problem in glibc.
Comment 11 Andrew Morton 2007-02-09 13:19:18 UTC
Added roland
Comment 12 Roland McGrath 2007-02-09 13:47:01 UTC
strace shows ERESTARTHAND because that's what's in the register at the
syscall-exit tracing stop.  By the time it gets to userland, it handles the
signal and updates the registers to do the restart.

What's suspicious is that the register for the flags argument (%ecx) got reset
to zero somehow.
Comment 13 R 2007-03-01 08:51:32 UTC
This misses the point. The problem is not strace-specific. It's merely way 
easier to trigger with it.
Comment 14 Thomas Gleixner 2007-03-04 08:30:47 UTC
> This misses the point. The problem is not strace-specific. It's merely way
> easier to trigger with it.

Right. Can you please send me your .config ?
Comment 15 Thomas Gleixner 2007-03-04 08:31:36 UTC
Another question:

Which toolchain is used to build the kernel ?
Comment 16 R 2007-03-04 09:15:48 UTC
Created attachment 10611 [details]
Sample kernel config

(from /proc/config.gz)
Comment 17 R 2007-03-04 09:18:08 UTC
gcc 4.1.1 from Debian unstable (which looks more like gcc 4.1.2)
binutils 2.17

Not sure if anything else is of interest
Comment 18 Thomas Gleixner 2007-03-04 09:27:36 UTC
> gcc 4.1.1 from Debian unstable (which looks more like gcc 4.1.2)
> binutils 2.17
> 
> Not sure if anything else is of interest

Can you try a different compiler ?

	tglx


Comment 19 Thomas Gleixner 2007-03-21 08:15:48 UTC
Did you have a chance to test with a different compiler ?
Comment 20 R 2007-03-23 08:23:07 UTC
I eventually will try, but I am too busy at the moment. Not sure if it should 
be switched to will fix later
Comment 21 Natalie Protasevich 2007-07-07 14:33:15 UTC
Any update on this problem? Has it been tested with latest kernels (2.6.22+) since?
Thanks.
Comment 22 Lars Gullik Bjønnes 2007-07-16 08:05:58 UTC
We see the same problem here as well, both with a gentoo dist, and with
our own build dist. One point might be that we see the prob with gdb (strace not
tested.)

On our own dist we have kernel 2.6.21, gcc 4.1.1 and glibc 2.4.
(the kernel is not compiled with dynticks or hires timers)

Any pointers to how we can help solving this problem would be welcome.
Comment 23 Ruud Schramp (via spam catcher) 2007-07-18 00:39:33 UTC
Just CC
Comment 24 Ruud Schramp (via spam catcher) 2007-07-19 14:12:20 UTC
Emulated the bug in qemu.
the problem is repeatable these:

Below 2 short traces. One of the breakpoints + stacktraces of the kernel. One of the strace output inside the emulator.

Some hints of sensible tracepoints and places to break would be appreciated.

Br. Ruud

Breakpoint 3, hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_ABS, clockid=1) at kernel/hrtimer.c:1290
1290    {
(gdb) bt
#0  hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_ABS, clockid=1) at kernel/hrtimer.c:1290
#1  0xc012d496 in sys_clock_nanosleep (which_clock=1, flags=0, rqtp=0xbfa7a83c, rmtp=0xbfa7a834) at kernel/posix-timers.c:952
#2  0xc0102ae8 in syscall_call () at include/asm/bitops.h:246
#3  0x00000001 in ?? ()
#4  0x00000000 in ?? ()
(gdb) cont
Continuing.

Breakpoint 3, hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_REL, clockid=1) at kernel/hrtimer.c:1290
1290    {
(gdb) bt
#0  hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_REL, clockid=1) at kernel/hrtimer.c:1290
#1  0xc012d496 in sys_clock_nanosleep (which_clock=1, flags=1, rqtp=0xbfa7a83c, rmtp=0xbfa7a834) at kernel/posix-timers.c:952
#2  0xc0102ae8 in syscall_call () at include/asm/bitops.h:246
#3  0x00000001 in ?? ()
#4  0x00000001 in ?? ()
#5  0xbfa7a83c in ?? ()
#6  0xbfa7a834 in ?? ()
#7  0xb7f67ff4 in ?? ()
#8  0xbfa7a818 in ?? ()
#9  0xffffffda in ?? ()
#10 0x0000007b in ?? ()
#11 0x0000007b in ?? ()
#12 0x00000000 in ?? ()
(gdb)



[b7f74410] futex(0xb7f594bc, FUTEX_WAKE, 2147483647) = 0
[b7f628f2] clock_gettime(CLOCK_MONOTONIC, {2093, 884323024}) = 0
[b7f74410] rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
[b7f74410] alarm(1)                     = 0
[b7f62d11] clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {2098, 884323024}, 0xbfa7a834) = ? ERESTARTNOHAND (To be restarted)
[b7f62d11] --- SIGALRM (Alarm clock) @ 0 (0) ---
[b7f62d11] clock_nanosleep(CLOCK_MONOTONIC, 0, {2098, 884323024},
Comment 25 Ruud Schramp (via spam catcher) 2007-07-20 13:06:24 UTC
Recompiled only the kernel with gcc-3.4.5. This solves/hides the problem:

brk(0x80ef000)                          = 0x80ef000
clock_gettime(CLOCK_MONOTONIC, {288, 72038510}) = 0
rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
alarm(1)                                = 0
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {293, 72038510}, 0xbff752c4) = ? ERESTARTNOHAND (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {293, 72038510}, {1, 3220656872}) = 0
fstat64(1, {st_mode=S_IFCHR|0622, st_rdev=makedev(4, 65), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f75000
write(1, "Your kernel is fine\n", 20Your kernel is fine
)   = 20
exit_group(0)                           = ?
Process 1075 detached
Comment 26 Ruud Schramp (via spam catcher) 2007-07-27 14:04:40 UTC
Was encountered on 2.6.20.

Validated in 2.6.21.2
Validated in 2.6.22.1
Comment 27 John Gumb 2007-08-10 06:24:17 UTC
seems like kernel parameter CONFIG_CC_OPTIMIZE_FOR_SIZE is related to this.

CONFIG_CC_OPTIMIZE_FOR_SIZE=y and the example program works.

# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set and the example program fails.

kernel: 2.6.22.1
gcc version: 4.1.1
glibc version: 2.4
Comment 28 Ruud Schramp (via spam catcher) 2007-08-10 12:23:51 UTC
Also found a weird thingy. trying to debug the problem with printk I found that having a printf in sys_clock_nanosloop makes the problem disappear.

BTW, the same printf showing the content of "l" does NOT make the problem disappear.

BTW. If there is a need for generated assembly for those situations of this function, I can upload those.




asmlinkage long
sys_clock_nanosleep(const clockid_t which_clock, int flags,
                    const struct timespec __user *rqtp,
                    struct timespec __user *rmtp)
{
        struct timespec t;
        long l;

        if (invalid_clockid(which_clock))
                return -EINVAL;

        if (copy_from_user(&t, rqtp, sizeof (struct timespec)))
                return -EFAULT;

        if (!timespec_valid(&t))
                return -EINVAL;

        l=CLOCK_DISPATCH(which_clock, nsleep,
                              (which_clock, flags, &t, rmtp));
        printk(KERN_ERR "Leave sys_clock_nanosleep %d\n",flags);
        return l;
}
Comment 29 Ruud Schramp (via spam catcher) 2007-08-20 06:31:41 UTC
ok, found the problem upto the instruction now.

The problem is that "flags" is stored in ecx and that ecx is corrupted in the second call of common_nsleep.

sys_clock_nanosleep is called twice because there is no restart handler which results apperantly in this behaviour when restarting after the alarm signal has been triggered.

    return hrtimer_nanosleep(tsave, rmtp, flags & TIMER_ABSTIME ?
                 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
                 which_clock);

The bittest for TIMER_ABSTIME corrupts the ecx register that holds the flags argument.
Comment 30 Thomas Gleixner 2007-09-10 03:35:56 UTC
Ruud, which compiler are you using ? This seems to be a compiler problem. Which compiler version are you using ?
Comment 31 Ruud Schramp (via spam catcher) 2007-09-10 03:46:32 UTC
Hi Thomas,

I tried several different compilers, most giving the faulty result:

Vanila compiler of Fedora 6 (gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13))
Vanila compiler of Fedora 7 (gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12))

A selfbuild cross compile toolchain:
x86_gcc-3.4.5_glibc-2.3.6_i686-linux  (This is the only one NOT having the bug)
x86_gcc-4.2.1_glibc-2.4_i686-linux
x86_gcc-4.1.1_glibc-2.4_i686-linux
Comment 32 Thomas Gleixner 2007-09-10 04:11:01 UTC
> I tried several different compilers, most giving the faulty result:
> 
> Vanila compiler of Fedora 6 (gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13))
> Vanila compiler of Fedora 7 (gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12))
> 
> A selfbuild cross compile toolchain:
> x86_gcc-3.4.5_glibc-2.3.6_i686-linux  (This is the only one NOT having the
> bug)
> x86_gcc-4.2.1_glibc-2.4_i686-linux
> x86_gcc-4.1.1_glibc-2.4_i686-linux

Just reread your decoding results. The ecx value which is modified is
not the problem. flags is only local in this function and should not
affect the syscall parameters. Can you please disassemble the code
section in question ?

	tglx
Comment 33 Thomas Gleixner 2007-11-14 14:44:37 UTC
Ruud, any updates to this ?
Comment 34 Thomas Gleixner 2007-11-17 09:42:41 UTC
John Gumb (Ruud's coworker) said:

I've not had much time to spend on this but I *believe* the problem is
fixed in 2.6.24 series kernels. Certainly the code there that's related
to this looks quite different.

I close this bug with insufficient data.

John, if you find time to look at it and you still have problems, please reopen.

Thanks,
     tglx
Comment 35 Mathias 2009-04-09 19:36:58 UTC
Hi!

I have the same (or similar) issue with kernel 2.6.29.1 when 
using clock_nanosleep with TIMER_ABSTIME in gdb.
When I interrupt my program by CTRL-C it does not continue
afterwards.
It is somehow related to CONFIG_CC_OPT_SIZE.
When this configuration is set, everything works fine, if it is not set
I have the error situation from above.

My setup:
Debian 5.0 (Lenny), gcc 4.3.2, Intel Core2Quad, vanilla kernel.

The following C file illustrates the setup,
once I press CTRL-C in gdb, it stops fine but does not continue to run properly.
***************** BEGIN ************************
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <linux/unistd.h>

int main(void)
{
    int i;
    struct timespec ts;
    clock_gettime(CLOCK_TO_USE, &ts);
    for (i=0; i<=1000; i++)
    {
        ts.tv_sec++;      
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
        printf("%i\n",i);
    }
    return 0;
}
/* Build with
   gcc test_clock_nanosleep.c -g -Wall -lrt
*/
**************** END *******************

Thanks for all feedback on this!

Mathias