Bug 68991

Summary: fs/proc: BUG race on accessing /proc under high load
Product: Process Management Reporter: wiebittewas
Component: Other Assignee: process_other
Status: NEW ---    
Severity: high CC: adobriyan, alan, wiebittewas, zsalab
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.10.27 Subsystem:
Regression: No Bisected commit-id:
Attachments: oops-msg and objdump of relevant code
kernel-file, where the error occurs
configuration for attached kernel-file
the script for listing the entries
the vmlinux-file

Description wiebittewas 2014-01-19 17:43:03 UTC
Created attachment 122621 [details]
oops-msg and objdump of relevant code

Running find /proc -type f -exec sha512sum {} \; a few hundred times in parallel can trigger a BUG in d_path().

The taint comes from an earlier message during boot, but this is a fresh kernel:
[    6.270947] mtrr: your BIOS has configured an incorrect mask, fixing it.

As far as I can see from the oops and the objdump, the problem occurs in fs/dcache.c:d_path(), where the code tests whether a function pointer exists and then calls it:

if (path->dentry->d_op && path->dentry->d_op->d_dname)
   return path->dentry->d_op->d_dname(path->dentry, buf, buflen);

Looking at fs/proc/base.c:proc_pid_readlink(), "path" is initialized by one of the get-link functions in the same file: proc_cwd_link() or proc_root_link().

Both functions first get the task structure via get_proc_task(); if it is not NULL, they lock the task and then acquire and lock the wanted fs_struct.

I don't have much experience here, but maybe the task can already be invalidated (because it exited) between getting the task struct and locking it. If so, the lock should be taken inside get_proc_task(), or at least the code should check after locking whether the task is still alive.

This problem shouldn't occur very often, since the find/shasum calls are not very useful in practice. I found it while investigating a complete freeze of another computer running X and many tasks with a similar (but genuinely tainted) kernel. The two problems may be connected, though I cannot be sure: the machine used for these tests is booted to the console only, with almost nothing else running, so the two environments are not comparable in complexity.
Comment 1 wiebittewas 2014-01-19 23:49:19 UTC
Another test shows that checking the task after locking it doesn't seem to help. The only check I found is to look at PF_EXITING in the task flags, which is set in exit_signals() before exit_fs() in do_exit().

Starting the find/shasum run leads to the BUG message after a similar time, without ever printing the temporarily added printk that fires when PF_EXITING is set.

OK, for now I have no further idea why the path is filled with bad data, especially since I didn't find any check for task locks in do_exit().
Comment 2 Alexey Dobriyan 2014-02-14 12:11:57 UTC
please, attach .config and "vmlinux" file to bug report
Comment 3 Alexey Dobriyan 2014-02-14 12:16:00 UTC
most importantly vmlinux
i doubt d_path() is the culprit
Comment 4 Alexey Dobriyan 2014-02-14 14:54:45 UTC
Please run memtest on the box.

If there was 0000 => 1000 bitflip, then code would oops in your way.
But if there wasn't, code would correctly skip NULL ->d_dname as it should.
Comment 5 Alexey Dobriyan 2014-02-14 14:57:03 UTC
but then again 0x1000 == PAGE_SIZE which is quite a coincidence.
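A minimal userspace sketch of the check discussed in comments 4 and 5. The struct fake_dentry_ops and the helpers would_call_dname() and would_call_with_raw() are hypothetical stand-ins, not kernel code. The sketch only illustrates that the NULL test quoted from d_path() skips a zeroed ->d_dname but lets any non-NULL garbage value such as 0x1000 through; it never calls the pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ->d_dname callback type. */
typedef char *(*dname_fn)(void *dentry, char *buf, int buflen);

/* Hypothetical stand-in for the part of dentry_operations used here. */
struct fake_dentry_ops {
    dname_fn d_dname;
};

/* Mirrors the test quoted from d_path(): nonzero means the callback
 * would be invoked. The pointer is only compared, never called. */
int would_call_dname(const struct fake_dentry_ops *ops)
{
    return ops != NULL && ops->d_dname != NULL;
}

/* Put an arbitrary raw address into the callback slot and run the check.
 * Converting an integer to a function pointer is implementation-defined;
 * this assumes a common platform where 0 converts to a null pointer. */
int would_call_with_raw(uintptr_t raw)
{
    struct fake_dentry_ops ops;

    ops.d_dname = (dname_fn)raw;
    return would_call_dname(&ops);
}
```

Under these assumptions, a raw value of 0 is skipped by the check, while 0x1000 passes it, so real code would then jump to that bogus address.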
Comment 6 wiebittewas 2014-02-15 19:57:59 UTC
Well, the error is exactly reproducible with different kernels and after various reboots, in the same way and at the same location, so I don't think it's caused by memory failures. Nevertheless, I tested the whole memory twice: no errors found. To make sure, I swapped the test board for another one with different memory: the failure is reproducible as before. So the chance that it's caused by hardware errors is really small.
I'll attach the requested files.
Comment 7 wiebittewas 2014-02-15 20:03:51 UTC
Created attachment 126261 [details]
kernel-file, where the error occurs

kernel-file as requested, where the error occurs
Comment 8 wiebittewas 2014-02-15 20:05:15 UTC
Created attachment 126271 [details]
configuration for attached kernel-file

config-file as requested
Comment 9 Alexey Dobriyan 2014-02-17 10:54:04 UTC
thanks for ruling hardware out of the equation

the bzImage file you've provided was stripped of symbolic information.
I have a hard time locating proc_pid_readlink(), though I've found d_path().
it seems the kernel was relocated a bit

could you please do the following:

compile clean kernel from scratch
reproduce
post an oops from that kernel
post the "vmlinux" file which is left in the source tree (not the bzImage in /boot) from that kernel

------

if you still have full kernel build tree from the oops you've attached post "vmlinux" 


    Alexey
Comment 10 Alexey Dobriyan 2014-02-17 11:00:31 UTC
Another test:

you're doing

    find /proc -type f -exec sha512sum {} \;

"sha512sum" is equivalent to slow "cat"
there is "pagemap" file for every process, reading it could take very long time on x86_64.

can you confirm that the following also reproduces the bug?

   find /proc -type f ! -name 'pagemap' -exec /bin/cat {} \;
Comment 11 wiebittewas 2014-02-18 23:05:48 UTC
First: a big sorry that I provided the bzImage instead of the vmlinux. I didn't think about what the request actually needed. I'll attach the vmlinux.

Additionally, I've run some tests again, because it seems that you have trouble reproducing the problem, and I realize that it's not as easy as I thought.
Here I'm using a script to get complete file lists, and even a small change to that script, such as writing less garbage to the console, seems to make the bug difficult to reproduce. So I'll attach that script, too.
I started it in /proc as 
"for (( i=0; i<2000; i++)) ;do ( ~/ooops \! -name pagemap  & ); done"

The first tests were done on an Athlon BE2400 dual-core with 16MB and a Phenom X4-910e quad-core with 8, now with 16MB. If necessary, I can try it on newer machines (which are currently in use).
Most of the time it takes roughly 300-400 seconds from the start of the loop until the first occurrence of a BUG message. Excluding pagemap doesn't seem to make a difference.

But again: sorry that I didn't include the script with my first message. I didn't think it might be important.
Comment 12 wiebittewas 2014-02-18 23:06:50 UTC
Created attachment 126691 [details]
the script for listing the entries
Comment 13 wiebittewas 2014-02-18 23:09:23 UTC
Created attachment 126701 [details]
the vmlinux-file
Comment 14 wiebittewas 2014-02-18 23:21:07 UTC
Just ran another test with kernel 3.10.30: the error occurs there, too.
(nearly 300 seconds after starting the loop, similar to the previous tests)
Comment 15 Alexey Dobriyan 2014-02-21 11:10:22 UTC
reproduced locally once
Comment 16 Alan 2014-03-03 14:14:45 UTC
*** Bug 70971 has been marked as a duplicate of this bug. ***
Comment 17 Alexey Dobriyan 2014-03-06 12:56:45 UTC
> *** Bug 70971 has been marked as a duplicate of this bug. ***

this seems wrong
Comment 18 Alexey Dobriyan 2014-03-06 14:53:42 UTC
Steps to reproduce!

CONFIG_CHECKPOINT_RESTORE=y

fd = open(...)
while (1) {
    addr = mmap(..., fd, ...);
    munmap(addr, ...);
}

ls -la /proc/$PID/map_files
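The steps above could be fleshed out roughly as the following C helper. This is a sketch under assumptions, not the exact program used in this report: the function name map_unmap_cycles(), the one-page read-only private mapping, and the iteration parameter are all illustrative choices.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map and unmap one page of `path` repeatedly (hypothetical helper).
 * Returns the number of completed cycles, or -1 if the file cannot be
 * opened. A negative `iterations` keeps looping until mmap() fails.
 */
long map_unmap_cycles(const char *path, long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    long done = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    while (iterations < 0 || done < iterations) {
        /* A file-backed mapping, so it shows up under map_files. */
        void *addr = mmap(NULL, (size_t)page, PROT_READ, MAP_PRIVATE, fd, 0);

        if (addr == MAP_FAILED)
            break;
        /* munmap() takes the mapped address and length, not the fd. */
        munmap(addr, (size_t)page);
        done++;
    }

    close(fd);
    return done;
}
```

Calling map_unmap_cycles() with a negative iteration count from a small main() keeps the process cycling; the ls -la /proc/$PID/map_files step is then run from another shell against that process, on a kernel built with CONFIG_CHECKPOINT_RESTORE=y.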
Comment 19 Alexey Dobriyan 2014-03-06 14:54:07 UTC
it will trigger immediately and fix is obvious
Comment 20 Alexey Dobriyan 2014-03-11 11:12:14 UTC
commit 70335abb2689c8cd5df91bf2d95a65649addf50b
fs/proc/base.c: fix GPF in /proc/$PID/map_files
Comment 21 wiebittewas 2014-03-14 18:54:54 UTC
(In reply to Alexey Dobriyan from comment #20)
> commit 70335abb2689c8cd5df91bf2d95a65649addf50b
> fs/proc/base.c: fix GPF in /proc/$PID/map_files

just applied this patch manually to a 3.12.14-kernel and it seems to work. (no oops within 15min of running the looped test-script)

good work - many thanks.