Bug 16

Summary: reproducible oops in lock_get_status
Product: File System
Reporter: Burton Windle (bwindle-kbt)
Component: VFS
Assignee: Matthew Wilcox (matthew)
Status: CLOSED CODE_FIX
Severity: normal
CC: khoa
Priority: P2
Hardware: IA-32
OS: Linux
Kernel Version:
Subsystem:
Regression: ---
Bisected commit-id:

Description Burton Windle 2002-11-14 12:28:21 UTC
Exact Kernel version: 2.5.47, but many many older 2.5 kernels have same problem
Distribution: Debian Testing
Hardware Environment: single x86 CPU
Software Environment: preempt enabled, non-SMP
Problem Description:

Reliable oops when reading /proc/locks.
Unable to handle kernel NULL pointer dereference at virtual address 00000008
c014c08f
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0060:[<c014c08f>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000000   ebx: c868f000   ecx: 00000001   edx: c8657f20
esi: c13c07ac   edi: 00000000   ebp: 00000400   esp: c8657ee0
ds: 0068   es: 0068   ss: 0068
Stack: c8657f1c c13c07b0 c13c07ac c014c39c c868f000 c13c07ac 00000001 c0285593
       c8656000 00000400 00000400 c868f000 c8657f1c c8657f20 00000001 c868f000
       00000000 c015f65a c868f000 c8657f7c 00000000 00000400 00000000 00000400
Call Trace: [<c014c39c>]  [<c015f65a>]  [<c015d378>]  [<c0139c61>]  
[<c0139f32>]  [<c01088f3>]
Code: 8b 78 08 8b 44 24 1c 50 8b 44 24 1c 50 68 ac 54 28 c0 53 e8


>>EIP; c014c08f <lock_get_status+17/20c>   <=====

Trace; c014c39c <get_locks_status+80/148>
Trace; c015f65a <locks_read_proc+36/80>
Trace; c015d378 <proc_file_read+ec/190>
Trace; c0139c61 <vfs_read+c1/158>
Trace; c0139f32 <sys_read+2a/3c>
Trace; c01088f3 <syscall_call+7/b>

Code;  c014c08f <lock_get_status+17/20c>
00000000 <_EIP>:
Code;  c014c08f <lock_get_status+17/20c>   <=====
   0:   8b 78 08                  mov    0x8(%eax),%edi   <=====
Code;  c014c092 <lock_get_status+1a/20c>
   3:   8b 44 24 1c               mov    0x1c(%esp,1),%eax
Code;  c014c096 <lock_get_status+1e/20c>
   7:   50                        push   %eax
Code;  c014c097 <lock_get_status+1f/20c>
   8:   8b 44 24 1c               mov    0x1c(%esp,1),%eax
Code;  c014c09b <lock_get_status+23/20c>
   c:   50                        push   %eax
Code;  c014c09c <lock_get_status+24/20c>
   d:   68 ac 54 28 c0            push   $0xc02854ac
Code;  c014c0a1 <lock_get_status+29/20c>
  12:   53                        push   %ebx
Code;  c014c0a2 <lock_get_status+2a/20c>
  13:   e8 00 00 00 00            call   18 <_EIP+0x18> c014c0a7 <lock_get_status+2f/20c>
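
The faulting instruction is `mov 0x8(%eax),%edi` and the register dump shows eax: 00000000, so the CPU tried to read address 0 + 8, which matches the reported virtual address 00000008. That pattern is typical of reading a field at offset 8 through a NULL pointer. A minimal userspace sketch of the arithmetic follows; `struct example_obj` and `effective_address` are hypothetical names for illustration, and the layout is not taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout, for illustration only. The report does not
 * say which kernel structure is being read. */
struct example_obj {
    uint32_t field_a;   /* offset 0 */
    uint32_t field_b;   /* offset 4 */
    uint32_t field_c;   /* offset 8: a load of this field through a NULL
                           pointer touches address 0x00000008 */
};

/* Address accessed by `mov disp(%base), %reg`: base register plus
 * displacement. */
static uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}
```

With base 0 (the eax value) and displacement 8, `effective_address` gives 8, the address in the oops message.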




Steps to reproduce:
On Debian Testing with the ntop package installed, start ntop with
'/etc/init.d/ntop start', let it run for a few seconds, stop it with
'/etc/init.d/ntop stop', then run 'cat /proc/locks'. This gives a non-fatal oops
100% of the time on two different Debian Testing machines.

According to Matthew Wilcox:
        if (fl->fl_file != NULL) {
                if (fl->fl_file->f_dentry) {
                        inode = fl->fl_file->f_dentry->d_inode;
                } else {
                        printk(KERN_EMERG "null dentry at %d\n", id);
                }
        }

That will avoid the oops, and tell us who managed to set a file lock on
a file without a dentry.
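
The guard above can be tried outside the kernel. The sketch below is a userspace stand-in, not kernel code: the struct definitions are cut down to the fields the snippet touches, `lock_inode` is a hypothetical wrapper name, and `fprintf` replaces `printk`.

```c
#include <stddef.h>
#include <stdio.h>

/* Cut-down userspace stand-ins for the kernel types (assumption: only the
 * fields used by the snippet are modelled). */
struct inode     { unsigned long i_ino; };
struct dentry    { struct inode *d_inode; };
struct file      { struct dentry *f_dentry; };
struct file_lock { struct file *fl_file; };

/* Returns the inode behind a lock, or NULL when the lock has no file or
 * the file has no dentry. The second case is the one that gets logged. */
static struct inode *lock_inode(const struct file_lock *fl, int id)
{
    struct inode *inode = NULL;

    if (fl->fl_file != NULL) {
        if (fl->fl_file->f_dentry) {
            inode = fl->fl_file->f_dentry->d_inode;
        } else {
            /* printk(KERN_EMERG ...) in the kernel version */
            fprintf(stderr, "null dentry at %d\n", id);
        }
    }
    return inode;
}
```

Callers still have to cope with a NULL return; the guard only moves the failure from a crash to a log line that identifies the offending lock.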
Comment 1 Khoa Huynh 2002-11-19 09:16:10 UTC
Matthew -- since this is in your area of expertise, I took the liberty of
making you the owner of this bug.  If not, please let me know.  Thanks.
Comment 2 Burton Windle 2002-11-19 14:57:23 UTC
I have also been able to reproduce it with the C code found in this posting to 
the LKML:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103610606822889&w=2

Basically, I was running the first program (uncomment the sleep so it hangs out 
for a while), and killing it, then cat'ing /proc/locks.
Comment 3 Burton Windle 2002-11-20 11:37:30 UTC
I can now reproduce this at will with a simple test case. Take the first C
program from
http://marc.theaimsgroup.com/?l=linux-kernel&m=103610606822889&w=2,
uncomment the ending 'sleep 30', and compile it so it is named "1". Running
this script then causes an oops on my machine 100% of the time.

#!/bin/sh
echo "asdf" > /tmp/dmo
./1 &
sleep 2
killall 1
rm /tmp/dmo
echo "asdf" > /tmp/dmo
./1 &
sleep 2
cat /proc/locks
killall 1
#next cat will oops
cat /proc/locks
Comment 4 Burton Windle 2002-12-19 13:32:08 UTC
See also:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103825968004879&w=2
Comment 5 Burton Windle 2003-01-03 07:42:56 UTC
As of 2.5.54-bk1, this problem still exists, but the oops looks different:
Unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c014ac4f
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0060:[<c014ac4f>]    Not tainted
EFLAGS: 00010286
EIP is at posix_unblock_lock+0x17/0x20c
eax: 00000000   ebx: cab2702f   ecx: cab27000   edx: cab2702f
esi: c134739c   edi: 00000000   ebp: 00000400   esp: caafbee0
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 285, threadinfo=caafa000 task=cb33e760)
Stack: c1347d68 c13473a0 c134739c c014af5c cab2702f c134739c 00000002 c0293293
       caafa000 00000400 00000400 cab27000 caafbf1c caafbf20 00000002 cab2702f
       0000002f c015e4ea cab27000 caafbf7c 00000000 00000400 00000000 00000400
Call Trace:
 [<c014af5c>] move_lock_status+0x80/0x148
 [<c015e4ea>] cmdline_read_proc+0x36/0x80
 [<c015c2d8>] proc_match+0xec/0x190
 [<c0138935>] do_sync_read+0xa5/0x138
 [<c0138be2>] vfs_write+0x2a/0x3c
 [<c0108a0f>] system_call+0x7/0xb

Code: 8b 78 08 8b 44 24 1c 50 8b 44 24 1c 50 68 ac 31 29 c0 53 e8
Comment 6 Burton Windle 2003-01-10 14:26:05 UTC
Erp, never mind, the oops still looks the same. ksymall was broken and output
the wrong symbols. The oops, as of 2.5.56, still looks like:

Unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c014b38f
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0060:[<c014b38f>]    Not tainted
EFLAGS: 00010282
EIP is at lock_get_status+0x17/0x20c
eax: 00000000   ebx: ca7ed02f   ecx: ca7ed000   edx: ca7ed02f
esi: c136d7ac   edi: 00000000   ebp: 00000400   esp: caafdee0
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 281, threadinfo=caafc000 task=cac90760)
Stack: c136d548 c136d7b0 c136d7ac c014b69c ca7ed02f c136d7ac 00000002 c0294313
       caafc000 00000400 00000400 ca7ed000 caafdf1c caafdf20 00000002 ca7ed02f
       0000002f c015ecfa ca7ed000 caafdf7c 00000000 00000400 00000000 00000400
Call Trace:
 [<c014b69c>] get_locks_status+0x80/0x148
 [<c015ecfa>] locks_read_proc+0x36/0x80
 [<c015cac8>] proc_file_read+0xec/0x190
 [<c0138ed5>] vfs_read+0xa5/0x138
 [<c0139182>] sys_read+0x2a/0x3c
 [<c01089c7>] syscall_call+0x7/0xb

Code: 8b 78 08 8b 44 24 1c 50 8b 44 24 1c 50 68 2c 42 29 c0 53 e8
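
The symbol+offset/size notation in these traces comes from looking up each raw address in a symbol table, which is why a mismatched table can yield wrong names while the offsets and sizes stay the same. A minimal sketch of that lookup follows. The start addresses are derived from the first oops in the description (address minus offset, e.g. c014c08f - 0x17 = c014c078); the table and the `resolve` function are illustrative assumptions, not how ksymoops or the kernel implement it.

```c
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint32_t start;     /* first address of the function */
    uint32_t size;      /* length of the function in bytes */
    const char *name;
};

/* Entries reconstructed from the 2.5.47 oops in the description. */
static const struct sym table[] = {
    { 0xc014c078u, 0x20cu, "lock_get_status"  },
    { 0xc014c31cu, 0x148u, "get_locks_status" },
    { 0xc015f624u, 0x080u, "locks_read_proc"  },
};

/* Find the symbol containing addr and report the offset into it.
 * Returns NULL when no table entry covers the address. */
static const struct sym *resolve(uint32_t addr, uint32_t *offset)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (addr >= table[i].start &&
            addr - table[i].start < table[i].size) {
            *offset = addr - table[i].start;
            return &table[i];
        }
    }
    return NULL;
}
```

Resolving c014c08f against this table gives lock_get_status with offset 0x17, matching the `lock_get_status+17/20c` line in the decoded oops.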
Comment 7 Burton Windle 2003-02-27 07:25:55 UTC
This issue has been open for 4 months now, and the problem is still 100%
reproducible on the latest kernel, 2.5.63-bk3. Is this problem no longer being
worked on?  At one time there was a patch that fixed this, though its author
said it caused NFS locking problems; that was back in December.
Comment 8 Matthew Wilcox 2003-02-27 08:18:27 UTC
PLM #1593
Comment 9 Burton Windle 2003-02-28 08:26:44 UTC
Problem does not happen with the following patch:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm1/broken-out/flock-fix.patch
Comment 10 Matthew Wilcox 2003-03-07 20:13:04 UTC
patch will be in 2.5.65