Bug 16
Summary: | reproduceable oops in lock_get_status | ||
---|---|---|---|
Product: | File System | Reporter: | Burton Windle (bwindle-kbt) |
Component: | VFS | Assignee: | Matthew Wilcox (matthew) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | khoa |
Priority: | P2 | ||
Hardware: | IA-32 | ||
OS: | Linux | ||
Kernel Version: | | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Matthew, since this is in your area of expertise, I took the liberty of making you the owner of this bug. If not, please let me know. Thanks.

I have also been able to reproduce it with the C code found in this posting to the LKML:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103610606822889&w=2

Basically, I was running the first program (with the sleep uncommented so it stays around for a while), killing it, and then cat'ing /proc/locks.

I can now reproduce this at will with a simple test case. Take the first C program from http://marc.theaimsgroup.com/?l=linux-kernel&m=103610606822889&w=2, uncomment the trailing 'sleep 30', and compile it so it is named "1". Running the following script then causes an oops on my machine 100% of the time:

    #!/bin/sh
    echo "asdf" > /tmp/dmo
    ./1 &
    sleep 2
    killall 1
    rm /tmp/dmo
    echo "asdf" > /tmp/dmo
    ./1 &
    sleep 2
    cat /proc/locks
    killall 1
    #next cat will oops
    cat /proc/locks

As of 2.5.54-bk1, this problem still exists, but the oops looks different:

    Unable to handle kernel NULL pointer dereference at virtual address 00000008
     printing eip:
    c014ac4f
    *pde = 00000000
    Oops: 0000
    CPU:    0
    EIP:    0060:[<c014ac4f>]    Not tainted
    EFLAGS: 00010286
    EIP is at posix_unblock_lock+0x17/0x20c
    eax: 00000000   ebx: cab2702f   ecx: cab27000   edx: cab2702f
    esi: c134739c   edi: 00000000   ebp: 00000400   esp: caafbee0
    ds: 007b   es: 007b   ss: 0068
    Process cat (pid: 285, threadinfo=caafa000 task=cb33e760)
    Stack: c1347d68 c13473a0 c134739c c014af5c cab2702f c134739c 00000002 c0293293
           caafa000 00000400 00000400 cab27000 caafbf1c caafbf20 00000002 cab2702f
           0000002f c015e4ea cab27000 caafbf7c 00000000 00000400 00000000 00000400
    Call Trace:
     [<c014af5c>] move_lock_status+0x80/0x148
     [<c015e4ea>] cmdline_read_proc+0x36/0x80
     [<c015c2d8>] proc_match+0xec/0x190
     [<c0138935>] do_sync_read+0xa5/0x138
     [<c0138be2>] vfs_write+0x2a/0x3c
     [<c0108a0f>] system_call+0x7/0xb
    Code: 8b 78 08 8b 44 24 1c 50 8b 44 24 1c 50 68 ac 31 29 c0 53 e8

Erp, never mind, the oops still looks the same.
kallsyms was broken and printed the wrong symbols. As of 2.5.56, the oops still looks like this:

    Unable to handle kernel NULL pointer dereference at virtual address 00000008
     printing eip:
    c014b38f
    *pde = 00000000
    Oops: 0000
    CPU:    0
    EIP:    0060:[<c014b38f>]    Not tainted
    EFLAGS: 00010282
    EIP is at lock_get_status+0x17/0x20c
    eax: 00000000   ebx: ca7ed02f   ecx: ca7ed000   edx: ca7ed02f
    esi: c136d7ac   edi: 00000000   ebp: 00000400   esp: caafdee0
    ds: 007b   es: 007b   ss: 0068
    Process cat (pid: 281, threadinfo=caafc000 task=cac90760)
    Stack: c136d548 c136d7b0 c136d7ac c014b69c ca7ed02f c136d7ac 00000002 c0294313
           caafc000 00000400 00000400 ca7ed000 caafdf1c caafdf20 00000002 ca7ed02f
           0000002f c015ecfa ca7ed000 caafdf7c 00000000 00000400 00000000 00000400
    Call Trace:
     [<c014b69c>] get_locks_status+0x80/0x148
     [<c015ecfa>] locks_read_proc+0x36/0x80
     [<c015cac8>] proc_file_read+0xec/0x190
     [<c0138ed5>] vfs_read+0xa5/0x138
     [<c0139182>] sys_read+0x2a/0x3c
     [<c01089c7>] syscall_call+0x7/0xb
    Code: 8b 78 08 8b 44 24 1c 50 8b 44 24 1c 50 68 2c 42 29 c0 53 e8

This issue has been open for 4 months now, and the problem is still 100% reproducible on the latest kernel, 2.5.63-bk3. Is this problem no longer being worked on? I know that at one point there was a patch that fixed this, but its author said it caused NFS locking problems, and that was back in December.

PLM #1593: the problem does not happen with the following patch:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm1/broken-out/flock-fix.patch

The patch will be in 2.5.65.
Exact Kernel version: 2.5.47, but many older 2.5 kernels have the same problem
Distribution: Debian Testing
Hardware Environment: single x86 CPU
Software Environment: preempt enabled, non-SMP
Problem Description:
Reliable oops when reading /proc/locks.

    Unable to handle kernel NULL pointer dereference at virtual address 00000008
    c014c08f
    *pde = 00000000
    Oops: 0000
    CPU:    0
    EIP:    0060:[<c014c08f>]    Not tainted
    Using defaults from ksymoops -t elf32-i386 -a i386
    EFLAGS: 00010286
    eax: 00000000   ebx: c868f000   ecx: 00000001   edx: c8657f20
    esi: c13c07ac   edi: 00000000   ebp: 00000400   esp: c8657ee0
    ds: 0068   es: 0068   ss: 0068
    Stack: c8657f1c c13c07b0 c13c07ac c014c39c c868f000 c13c07ac 00000001 c0285593
           c8656000 00000400 00000400 c868f000 c8657f1c c8657f20 00000001 c868f000
           00000000 c015f65a c868f000 c8657f7c 00000000 00000400 00000000 00000400
    Call Trace: [<c014c39c>] [<c015f65a>] [<c015d378>] [<c0139c61>] [<c0139f32>] [<c01088f3>]
    Code: 8b 78 08 8b 44 24 1c 50 8b 44 24 1c 50 68 ac 54 28 c0 53 e8

    >>EIP; c014c08f <lock_get_status+17/20c>   <=====
    Trace; c014c39c <get_locks_status+80/148>
    Trace; c015f65a <locks_read_proc+36/80>
    Trace; c015d378 <proc_file_read+ec/190>
    Trace; c0139c61 <vfs_read+c1/158>
    Trace; c0139f32 <sys_read+2a/3c>
    Trace; c01088f3 <syscall_call+7/b>
    Code;  c014c08f <lock_get_status+17/20c>
    00000000 <_EIP>:
    Code;  c014c08f <lock_get_status+17/20c>   <=====
       0:   8b 78 08                  mov    0x8(%eax),%edi   <=====
    Code;  c014c092 <lock_get_status+1a/20c>
       3:   8b 44 24 1c               mov    0x1c(%esp,1),%eax
    Code;  c014c096 <lock_get_status+1e/20c>
       7:   50                        push   %eax
    Code;  c014c097 <lock_get_status+1f/20c>
       8:   8b 44 24 1c               mov    0x1c(%esp,1),%eax
    Code;  c014c09b <lock_get_status+23/20c>
       c:   50                        push   %eax
    Code;  c014c09c <lock_get_status+24/20c>
       d:   68 ac 54 28 c0            push   $0xc02854ac
    Code;  c014c0a1 <lock_get_status+29/20c>
      12:   53                        push   %ebx
    Code;  c014c0a2 <lock_get_status+2a/20c>
      13:   e8 00 00 00 00            call   18 <_EIP+0x18> c014c0a7 <lock_get_status+2f/20c>

Steps to reproduce:
On Debian Testing with the ntop package installed, start ntop with '/etc/init.d/ntop start', let it run for a few seconds, stop it with '/etc/init.d/ntop stop', and then cat /proc/locks. This gives a non-fatal oops 100% of the time on two different Debian Testing machines.

According to Matthew Wilcox:

    if (fl->fl_file != NULL) {
        if (fl->fl_file->f_dentry) {
            inode = fl->fl_file->f_dentry->d_inode;
        } else {
            printk(KERN_EMERG "null dentry at %d\n", id);
        }
    }

That will avoid the oops and tell us who managed to set a file lock on a file without a dentry.