Most recent kernel where this bug did not occur: unknown
Distribution: Gentoo
Hardware Environment: Dell Latitude D800, 2 GHz Pentium-M, 2 GB RAM

Problem Description:
It is possible to make the kernel dereference a NULL pointer with simple ptrace code. The oops indicates that it occurs in arch_ptrace.

While working on my debugger, I noticed that if I set the child's CS to certain invalid values, the child does not simply get a segfault (which I believe is the correct response to an invalid CS). Instead, not only does the ptraced application crash, but so does the debugger. After further investigation, I realized that the debugger crash was the kernel terminating the ptracer because of a NULL pointer dereference in one of the ptrace functions.

Steps to reproduce: run the following program:

----snip-test.c----
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    pid_t pid = fork();
    struct user_regs_struct regs;
    int status;
    int retval;

    switch(pid) {
    case 0: /* child */
        while(1) {
            sleep(1);
        }
        break;
    case -1: /* error */
        break;
    default: /* parent */
        ptrace(PTRACE_ATTACH, pid, 0, 0);
        retval = waitpid(pid, &status, 0);
        printf("[waitpid] %d\n", retval);

        /* overwrite the child's CS with an invalid selector */
        ptrace(PTRACE_GETREGS, pid, 0, &regs);
        regs.xcs = 0xff;
        ptrace(PTRACE_SETREGS, pid, 0, &regs);

        retval = ptrace(PTRACE_SINGLESTEP, pid, 0, 0);
        /* DOESN'T GET HERE */
        printf("[ptrace] %d\n", retval);
        retval = waitpid(pid, &status, 0);
        printf("[waitpid] %d\n", retval);
        ptrace(PTRACE_SINGLESTEP, pid, 0, 0);
        break;
    }
    return 0;
}
----snip-test.c----
Here's the oops:

BUG: unable to handle kernel NULL pointer dereference at virtual address 000000fc
 printing eip:
c0104b84
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: rtc 3c59x mii ide_cd cdrom yenta_socket rsrc_nonstatic pcmcia_core psmouse uhci_hcd usbcore evdev pcspkr
CPU:    0
EIP:    0060:[<c0104b84>]    Not tainted VLI
EFLAGS: 00010202   (2.6.20-gentoo-r8 #1)
EIP is at arch_ptrace+0x5ea/0x828
eax: 000000f8   ebx: f6dfd580   ecx: 000000ff   edx: f6f1a030
esi: 000000ff   edi: b7fad410   ebp: 00000000   esp: c2209f68
ds: 007b   es: 007b   ss: 0068
Process test (pid: 28205, ti=c2208000 task=f6dc8550 task.ti=c2208000)
Stack: c020850a f6c9f380 f6f1a030 00000000 0000000b 00000001 00000046 c01163b5
       f6f1a030 00000000 00000009 00000000 c0119033 00000000 00000000 00000009
       00000000 b7fa3ff4 c2208000 c01028be 00000009 00006e2e 00000000 00000000
Call Trace:
 [<c020850a>] net_tx_action+0x3b/0xba
 [<c01163b5>] __do_softirq+0x3e/0x83
 [<c0119033>] sys_ptrace+0x63/0x89
 [<c01028be>] sysenter_past_esp+0x5f/0x85
 [<c0240033>] sysctl_tcp_congestion_control+0x7b/0x83
 =======================
Code: 00 ff 8b 5c 01 00 00 79 0b 8d 83 5c 01 00 00 e8 d7 81 14 00 8b 54 24 08 8b 9a 80 00 00 00 89 f0 25 f8 ff 00 00 03 83 6c 01 00 00 <8b> 48 04 89 fa 81 e2 ff ff 00 00 f7 c1 00 00 40 00 0f 44 fa 0f
EIP: [<c0104b84>] arch_ptrace+0x5ea/0x828 SS:ESP 0068:c2209f68
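A note on where the fault address seems to come from. This is a rough sketch based only on the register dump and the Code: bytes, not a reading of the kernel source. The bytes just before the faulting instruction (25 f8 ff 00 00, then 03 83 6c 01 00 00, then <8b> 48 04) decode to: mask the selector with 0xfff8, add a table base loaded from memory, and read 4 bytes past the result. Selector 0xff has the TI bit (bit 2) set, so it would index the LDT. Assuming the test process never set up an LDT and the base is therefore 0, the read lands at 0xf8 + 4 = 0xfc, which matches both eax and the reported address. The helper names below are hypothetical and only illustrate the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: they only model x86 segment selector layout,
 * they are not kernel functions. */

/* Bit 2 of a selector is the table indicator: 1 = LDT, 0 = GDT. */
bool selector_uses_ldt(uint16_t sel)
{
    return (sel & 0x4) != 0;
}

/* Clearing the low 3 bits (RPL + TI) leaves index * 8, i.e. the byte
 * offset of the 8-byte descriptor within its table. */
uint32_t selector_byte_offset(uint16_t sel)
{
    return sel & 0xfff8u;
}

/* Address of the upper 32 bits of the descriptor, which is what the
 * faulting "mov 0x4(%eax),%ecx" reads. table_base is an assumption:
 * 0 models a process without an LDT. */
uint32_t descriptor_high_word_addr(uint32_t table_base, uint16_t sel)
{
    return table_base + selector_byte_offset(sel) + 4;
}
```

For the values in this report, descriptor_high_word_addr(0, 0xff) evaluates to 0xfc, so a missing check that the selector actually falls inside an existing LDT would be enough to explain the oops.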
The following commit fixes this bug:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=29eb51101c02df517ca64ec472d7501127ad1da8

Indeed, the test case does not cause any oopses when using the 2.6.23 kernel. Therefore, this bug can be closed now.