I think I have found a bug. It is fairly harmless but non-the-less, not the expected behavior. I have only done tests on x86_64. First, there is one assumption I am making, so if this is incorrect, then this bug is invalid. From my experience, doing a ptrace with PTRACE_SINGLESTEP sets a flag in rflags which causes the processor to execute a single instruction, **the flag is then cleared**, finally a trap event is delivered. So far it has been my experience, that the flag is supposed to be cleared, so I am running on that assumption. Given a program which looks like this (any signal handling code will do, but it's easier with an example). #include <stdio.h> #include <signal.h> #include <unistd.h> void handler(int n) { puts("alarm"); alarm(5); } int main() { signal(SIGALRM, handler); alarm(5); while(1) {} return 0; } If we run this with ptrace, and do a PTRACE_CONT at the start, we will get a stop event inside the while(1) {}. If we then do a PTRACE_SINGLESTEP passing the signal to the debugee, we end up stopped at the first instruction of the signal handler. At this point, all is working as expected. If we then do a PTRACE_SINGLESTEP, the debugee single steps as expected, **but the single step flag remains set in the eflags!**. if we follow this up with a PTRACE_CONT, we still get single step events. This only happens if you single step the very first instruction of the signal handler. If we set a breakpoint in the middle of the signal handler and step from there, the flag is unset as expected. I originally encountered this in my personal debugger, but here is a log of a GDB session demonstrating the issue as well: $ gdb ./test warning: Can not parse XML syscalls information; XML support was disabled at compile time. GNU gdb (Gentoo 7.0.1 p1) 7.0.1 Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". For bug reporting instructions, please see: <http://bugs.gentoo.org/>... Reading symbols from /####/test...(no debugging symbols found)...done. (gdb) handle SIGALRM stop Signal Stop Print Pass to program Description SIGALRM Yes Yes Yes Alarm clock (gdb) run Starting program: /####/test Program received signal SIGALRM, Alarm clock. 0x00007ffff7b216e0 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:82 82 ../sysdeps/unix/syscall-template.S: No such file or directory. in ../sysdeps/unix/syscall-template.S Current language: auto The current source language is "auto; currently asm". (gdb) stepi 0x0000000000400614 in handler () (gdb) disassemble Dump of assembler code for function handler: 0x0000000000400614 <handler+0>: push %rbp 0x0000000000400615 <handler+1>: mov %rsp,%rbp 0x0000000000400618 <handler+4>: sub $0x10,%rsp 0x000000000040061c <handler+8>: mov %edi,-0x4(%rbp) 0x000000000040061f <handler+11>: mov $0x40074c,%edi 0x0000000000400624 <handler+16>: callq 0x4004e0 <puts@plt> 0x0000000000400629 <handler+21>: mov $0x5,%edi 0x000000000040062e <handler+26>: callq 0x400520 <alarm@plt> 0x0000000000400633 <handler+31>: leaveq 0x0000000000400634 <handler+32>: retq End of assembler dump. (gdb) info registers rax 0x0 0 rbx 0x7fffffffda60 140737488345696 rcx 0xffffffffffffffff -1 rdx 0x7fffffffd4c0 140737488344256 rsi 0x7fffffffd5f0 140737488344560 rdi 0xe 14 rbp 0xffffffff 0xffffffff rsp 0x7fffffffd4b8 0x7fffffffd4b8 r8 0x7fffffffd940 140737488345408 r9 0x7ffff7decb20 140737351961376 r10 0x8 8 r11 0x246 582 r12 0x7fffffffd9e0 140737488345568 r13 0x7fffffffdc00 140737488346112 r14 0x0 0 r15 0x0 0 rip 0x400614 0x400614 <handler> eflags 0x246 [ PF ZF IF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 fctrl 0x37f 895 fstat 0x0 0 ftag 0xffff 65535 fiseg 0x0 0 fioff 0x0 0 foseg 0x0 0 fooff 0x0 0 fop 0x0 0 mxcsr 0x1f80 [ IM DM ZM OM UM PM ] (gdb) # as you can see TF is not set here (gdb) stepi 0x0000000000400615 in handler () (gdb) info registers rax 0x0 0 rbx 0x7fffffffda60 140737488345696 rcx 0xffffffffffffffff -1 rdx 0x7fffffffd4c0 140737488344256 rsi 0x7fffffffd5f0 140737488344560 rdi 0xe 14 rbp 0xffffffff 0xffffffff rsp 0x7fffffffd4b0 0x7fffffffd4b0 r8 0x7fffffffd940 140737488345408 r9 0x7ffff7decb20 140737351961376 r10 0x8 8 r11 0x246 582 r12 0x7fffffffd9e0 140737488345568 r13 0x7fffffffdc00 140737488346112 r14 0x0 0 r15 0x0 0 rip 0x400615 0x400615 <handler+1> eflags 0x346 [ PF ZF TF IF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 fctrl 0x37f 895 fstat 0x0 0 ftag 0xffff 65535 fiseg 0x0 0 fioff 0x0 0 foseg 0x0 0 fooff 0x0 0 fop 0x0 0 mxcsr 0x1f80 [ IM DM ZM OM UM PM ] (gdb) # suddenly TF is set, breaking continue (gdb) cont Continuing. Program received signal SIGTRAP, Trace/breakpoint trap. 0x0000000000400618 in handler () (gdb) # it should run until the next SIGARLM at the same place as the first, but we got a single step event instead (gdb) quit
Agreed, this looks like a bug to me. Perhaps TIF_FORCED_TF logic. I'll try to investigate on Monday. Thanks Evan!
Created attachment 26587 [details] [patch] the quick hack for illustration This patch fixes the problem, but I doubt it is really right. I'll try to think more. So: - the tracee reports a signal SIG - the tracer does ptrace(SINGLESTEP, SIG), this approves the signal and also sets TF - the tracee dequeues the signal, changes its IP to sig_handler(). then it notices TIF_SINGLESTEP and notifies the tracer without return to user-space _and_ without clearing TF or TIF_SINGLESTEP - the tracer does ptrace(SINGLESTEP) again, but now enable_single_step() looses TIF_FORCED_TF. - the tracer does ptrace(CONT), but user_disable_single_step() doesn't clear TF since TIF_FORCED_TF is not set - the tracee returns to user-space with X86_EFLAGS_TF in eflags I am not sure what would be the "right" fix. I read enable_single_step() billion times, and I still can't say I understand the comments/logic. But perhaps this patch makes sense anyway?
Damn, I was in a hurry, the description is not accurate... (In reply to comment #3) > > then it notices TIF_SINGLESTEP and notifies the tracer without > return to user-space _and_ without clearing TF or TIF_SINGLESTEP ... _and_ without clearing TIF_SINGLESTEP, but TF is cleared (right before setup_rt_frame()) > - the tracer does ptrace(SINGLESTEP) again, but now enable_single_step() > looses TIF_FORCED_TF. and that is why enable_single_step() doesn't set FORCED. > But perhaps this patch makes sense anyway? As for x86, we can just change handle_signal to use test_and_clear(TIF_SINGLESTEP) as an argument to the tracehook, but then we probably need fixes for other machines. Roland, what do you think?
Oleg, please get an appropriate test case for this into the ptrace-tests suite. That change might be the right one, but we should discuss it more in email, and look at the situation on other machines.
Why is this marked OBSOLETE? Has this been fixed in any version? I still reproduce it in 3.16.0-43-generic from Kubuntu 14.04.
> --- Comment #6 from Ruslan <b7.10110111@gmail.com> --- > Why is this marked OBSOLETE? Has this been fixed in any version? I still > reproduce it in 3.16.0-43-generic from Kubuntu 14.04. This was hopefully fixed by https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=fd0f86b66425bd8c6af8985881e82b28c30fd450 Oleg.