Most recent kernel where this bug did *NOT* occur: not known
Distribution: Debian Unstable
Hardware Environment: PC, both SMT and non-SMT, with/without preemption
Software Environment: glibc 2.3 or 2.5

Problem Description:
Under some circumstances the TIMER_ABSTIME flag seems to be lost from the clock_nanosleep() system call. One such case is stracing the process and triggering an ignored signal during clock_nanosleep(). When clock_nanosleep() is restarted, flags changes to 0, so the process starts waiting for a VERY long time (~35 years if using CLOCK_REALTIME) because the timer value is interpreted as a delay instead of an "absolute" deadline.

This seems to happen without a debugger too (deadlocking VLC after many, many clock_nanosleep calls here), but I could not find a test case so far.

Steps to reproduce:
Run the following program under strace.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <signal.h>
    #include <unistd.h>

    int main (void)
    {
        struct timespec ts;

        if (clock_gettime (CLOCK_MONOTONIC, &ts))
        {
            perror ("Please upgrade your kernel");
            return 1;
        }

        /* An ignored signal interrupts the sleep and forces a restart. */
        signal (SIGALRM, SIG_IGN);
        alarm (1);

        ts.tv_sec += 5;
        clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
        puts ("Your kernel is fine");
        return 0;
    }
The kernel code for this has not changed for quite some time.

Not reproducible here: Kernel 2.6.20, glibc 2.5, FC6

    clock_gettime(CLOCK_MONOTONIC, {17630, 666273121}) = 0
    rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
    alarm(1) = 0
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {17635, 666273121}, 0) = ? ERESTARTNOHAND (To be restarted)
    --- SIGALRM (Alarm clock) @ 0 (0) ---
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {17635, 666273121}, NULL) = 0
    fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0
Question: Is this a 32 bit test program on a 64 bit kernel ?
No. IA32 hardware (and hence software). The problem vanishes with the Debian kernel (2.6.18 + Debian patches) but appears with various .config files on vanilla 2.6.20... I have not been able to identify which options trigger it :( but at least it is present on two different systems with different kernel configs!?
Looked deeper. Seems to be a glibc problem.

    /* Absolute timers do not update the rmtp value and restart: */
    if (mode == HRTIMER_ABS)
        return -ERESTARTNOHAND;

    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {17635, 666273121}, 0) = ? ERESTARTNOHAND (To be restarted)

Can you provide the output of strace please ?
In the failure case, it looks like this:

    ...
    rt_sigaction(SIGRTMIN, {0xb7e088a0, [], SA_SIGINFO}, NULL, 8) = 0
    rt_sigaction(SIGRT_1, {0xb7e087c0, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
    rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
    getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
    uname({sys="Linux", node="auguste", ...}) = 0
    clock_gettime(CLOCK_MONOTONIC, {4074, 735569772}) = 0
    rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
    alarm(1) = 0
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {4079, 735569772}, 0) = ? ERESTARTNOHAND (To be restarted)
    --- SIGALRM (Alarm clock) @ 0 (0) ---
    clock_nanosleep(CLOCK_MONOTONIC, 0, {4079, 735569772},

and I did not feel like waiting one hour for the continuation...
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {4079, 735569772}, 0) = ? ERESTARTNOHAND (To be restarted)

The kernel returned ERESTARTNOHAND, which means there is no restart handler function stored inside the kernel. A restart handler is only set up for relative sleeps, because user space cannot keep track of the remaining time; for absolute sleeps the expiry time does not change.

    --- SIGALRM (Alarm clock) @ 0 (0) ---
    clock_nanosleep(CLOCK_MONOTONIC, 0, {4079, 735569772},
    --------------------------------^^^

This comes from glibc, not from the kernel. I'm closing that one.
Ulrich, we have a possible glibc problem here.
The comment about glibc is bogus. The kernel cannot return ERESTARTNOHAND. That's not an allowed error code. If strace reports this, it means that strace is involved in the restart and it is responsible for doing this right. On the libc side there is nothing special we do. clock_nanosleep is just an ordinary system call wrapper. Specifically, it does not do any restart of system calls in case they return with an error.
Oops, sorry. The kernel returns -EINTR of course. The translation seems to be done by strace.
And if the kernel returns -EINTR, the userlevel wrapper will return EINTR. It's that easy. But what you're seeing is more likely that under strace the implicit restart is somehow messed up. Maybe Roland knows. It's certainly no problem in glibc.
Added roland
strace shows ERESTARTNOHAND because that's what's in the register at the syscall-exit tracing stop. By the time it gets to userland, the kernel has handled the signal and updated the registers to do the restart. What's suspicious is that the register for the flags argument (%ecx) got reset to zero somehow.
This misses the point. The problem is not strace-specific. It's merely way easier to trigger with it.
> This misses the point. The problem is not strace-specific. It's merely way
> easier to trigger with it.

Right. Can you please send me your .config ?
Another question: Which toolchain is used to build the kernel ?
Created attachment 10611 [details] Sample kernel config (from /proc/config.gz)
gcc 4.1.1 from Debian unstable (which looks more like gcc 4.1.2)
binutils 2.17

Not sure if anything else is of interest
> gcc 4.1.1 from Debian unstable (which looks more like gcc 4.1.2)
> binutils 2.17
>
> Not sure if anything else is of interest

Can you try a different compiler ?

tglx
Did you have a chance to test with a different compiler ?
I will try eventually, but I am too busy at the moment. Not sure if the status should be switched to "will fix later".
Any update on this problem? Has it been tested with latest kernels (2.6.22+) since? Thanks.
We see the same problem here as well, both with a gentoo dist, and with our own build dist. One point might be that we see the prob with gdb (strace not tested.) On our own dist we have kernel 2.6.21, gcc 4.1.1 and glibc 2.4. (the kernel is not compiled with dynticks or hires timers) Any pointers to how we can help solving this problem would be welcome.
Just CC
Emulated the bug in qemu. The problem is repeatable there.

Below are two short traces: one of the breakpoints and stack traces in the kernel, and one of the strace output inside the emulator.

Some hints about sensible tracepoints and places to break would be appreciated.

Br.
Ruud

    Breakpoint 3, hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_ABS, clockid=1) at kernel/hrtimer.c:1290
    1290 {
    (gdb) bt
    #0 hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_ABS, clockid=1) at kernel/hrtimer.c:1290
    #1 0xc012d496 in sys_clock_nanosleep (which_clock=1, flags=0, rqtp=0xbfa7a83c, rmtp=0xbfa7a834) at kernel/posix-timers.c:952
    #2 0xc0102ae8 in syscall_call () at include/asm/bitops.h:246
    #3 0x00000001 in ?? ()
    #4 0x00000000 in ?? ()
    (gdb) cont
    Continuing.

    Breakpoint 3, hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_REL, clockid=1) at kernel/hrtimer.c:1290
    1290 {
    (gdb) bt
    #0 hrtimer_nanosleep (rqtp=0xc197ffa0, rmtp=0xbfa7a834, mode=HRTIMER_MODE_REL, clockid=1) at kernel/hrtimer.c:1290
    #1 0xc012d496 in sys_clock_nanosleep (which_clock=1, flags=1, rqtp=0xbfa7a83c, rmtp=0xbfa7a834) at kernel/posix-timers.c:952
    #2 0xc0102ae8 in syscall_call () at include/asm/bitops.h:246
    #3 0x00000001 in ?? ()
    #4 0x00000001 in ?? ()
    #5 0xbfa7a83c in ?? ()
    #6 0xbfa7a834 in ?? ()
    #7 0xb7f67ff4 in ?? ()
    #8 0xbfa7a818 in ?? ()
    #9 0xffffffda in ?? ()
    #10 0x0000007b in ?? ()
    #11 0x0000007b in ?? ()
    #12 0x00000000 in ?? ()
    (gdb)

    [b7f74410] futex(0xb7f594bc, FUTEX_WAKE, 2147483647) = 0
    [b7f628f2] clock_gettime(CLOCK_MONOTONIC, {2093, 884323024}) = 0
    [b7f74410] rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
    [b7f74410] alarm(1) = 0
    [b7f62d11] clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {2098, 884323024}, 0xbfa7a834) = ? ERESTARTNOHAND (To be restarted)
    [b7f62d11] --- SIGALRM (Alarm clock) @ 0 (0) ---
    [b7f62d11] clock_nanosleep(CLOCK_MONOTONIC, 0, {2098, 884323024},
Recompiled only the kernel with gcc-3.4.5. This solves/hides the problem:

    brk(0x80ef000) = 0x80ef000
    clock_gettime(CLOCK_MONOTONIC, {288, 72038510}) = 0
    rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
    alarm(1) = 0
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {293, 72038510}, 0xbff752c4) = ? ERESTARTNOHAND (To be restarted)
    --- SIGALRM (Alarm clock) @ 0 (0) ---
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {293, 72038510}, {1, 3220656872}) = 0
    fstat64(1, {st_mode=S_IFCHR|0622, st_rdev=makedev(4, 65), ...}) = 0
    ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B115200 opost isig icanon echo ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f75000
    write(1, "Your kernel is fine\n", 20Your kernel is fine
    ) = 20
    exit_group(0) = ?
    Process 1075 detached
Was encountered on 2.6.20. Validated in 2.6.21.2 Validated in 2.6.22.1
Seems like the kernel config option CONFIG_CC_OPTIMIZE_FOR_SIZE is related to this.

With "CONFIG_CC_OPTIMIZE_FOR_SIZE=y" the example program works.
With "# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set" the example program fails.

kernel: 2.6.22.1
gcc version: 4.1.1
glibc version: 2.4
Also found a weird thingy. While trying to debug the problem with printk, I found that having a printk of "flags" in sys_clock_nanosleep makes the problem disappear. BTW, the same printk showing the content of "l" does NOT make the problem disappear.

BTW, if there is a need for the generated assembly of this function in those situations, I can upload it.

    asmlinkage long
    sys_clock_nanosleep(const clockid_t which_clock, int flags,
                        const struct timespec __user *rqtp,
                        struct timespec __user *rmtp)
    {
        struct timespec t;
        long l;

        if (invalid_clockid(which_clock))
            return -EINVAL;

        if (copy_from_user(&t, rqtp, sizeof (struct timespec)))
            return -EFAULT;

        if (!timespec_valid(&t))
            return -EINVAL;

        l = CLOCK_DISPATCH(which_clock, nsleep,
                           (which_clock, flags, &t, rmtp));
        /* Printing flags here makes the problem disappear. */
        printk(KERN_ERR "Leave sys_clock_nanosleep %d\n", flags);
        return l;
    }
OK, found the problem down to the instruction now. The problem is that "flags" is stored in ecx and that ecx is corrupted in the second call of common_nsleep. sys_clock_nanosleep is called twice because there is no restart handler, which apparently results in this behaviour when restarting after the alarm signal has been triggered.

    return hrtimer_nanosleep(tsave, rmtp, flags & TIMER_ABSTIME ?
                             HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
                             which_clock);

The bit test for TIMER_ABSTIME corrupts the ecx register that holds the flags argument.
Ruud, this seems to be a compiler problem. Which compiler and compiler version are you using ?
Hi Thomas,

I tried several different compilers, most giving the faulty result:

Vanilla compiler of Fedora 6 (gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13))
Vanilla compiler of Fedora 7 (gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12))

A self-built cross-compile toolchain:
x86_gcc-3.4.5_glibc-2.3.6_i686-linux (This is the only one NOT having the bug)
x86_gcc-4.2.1_glibc-2.4_i686-linux
x86_gcc-4.1.1_glibc-2.4_i686-linux
> I tried several different compilers, most giving the faulty result:
>
> Vanila compiler of Fedora 6 (gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13))
> Vanila compiler of Fedora 7 (gcc (GCC) 4.1.2 20070502 (Red Hat 4.1.2-12))
>
> A selfbuild cross compile toolchain:
> x86_gcc-3.4.5_glibc-2.3.6_i686-linux (This is the only one NOT having the
> bug)
> x86_gcc-4.2.1_glibc-2.4_i686-linux
> x86_gcc-4.1.1_glibc-2.4_i686-linux

Just reread your decoding results. The ecx value which is modified is not the problem. flags is only local in this function and should not affect the syscall parameters.

Can you please disassemble the code section in question ?

tglx
Ruud, any updates to this ?
John Gumb (Ruud's coworker) said:

> I've not had much time to spend on this but I *believe* the problem is
> fixed in 2.6.24 series kernels. Certainly the code there that's related
> to this looks quite different.

I close this bug with insufficient data. John, if you find time to look at it and you still have problems, please reopen.

Thanks,
tglx
Hi!

I have the same (or a similar) issue with kernel 2.6.29.1 when using clock_nanosleep with TIMER_ABSTIME in gdb. When I interrupt my program with CTRL-C, it does not continue afterwards.

It is somehow related to CONFIG_CC_OPTIMIZE_FOR_SIZE. When this option is set, everything works fine; if it is not set, I get the error situation from above.

My setup: Debian 5.0 (Lenny), gcc 4.3.2, Intel Core2Quad, vanilla kernel.

The following C file illustrates the setup. Once I press CTRL-C in gdb, it stops fine but does not continue to run properly.

***************** BEGIN ************************
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <linux/unistd.h>

    /* The deadline must be taken from the same clock that is slept on. */
    #define CLOCK_TO_USE CLOCK_MONOTONIC

    int main(void)
    {
        int i;
        struct timespec ts;

        clock_gettime(CLOCK_TO_USE, &ts);
        for (i = 0; i <= 1000; i++) {
            ts.tv_sec++;
            clock_nanosleep(CLOCK_TO_USE, TIMER_ABSTIME, &ts, NULL);
            printf("%i\n", i);
        }
        return 0;
    }
    /* Build with gcc test_clock_nanosleep.c -g -Wall -lrt */
**************** END *******************

Thanks for all feedback on this!
Mathias