Bug 199437

Summary: fsnotify: Race between fsnotify() and fsnotify_put_mark() causing kernel panic
Product: File System
Component: Other
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.11-rc7 and later
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Robert Kolchmeyer (rkolchmeyer)
Assignee: fs_other
CC: danielwzhg, jack, pradeep.sawlani
Attachments: Includes a reproducer and an example stack trace.

Description Robert Kolchmeyer 2018-04-18 19:42:46 UTC
Created attachment 275441
Includes a reproducer and an example stack trace.

There seems to be a race between fsnotify() and fsnotify_put_mark() that can cause a kernel panic. It seems to me that it's possible for fsnotify() to use conn->list after fsnotify_put_mark() assigns conn->destroy_next, causing fsnotify() to traverse a corrupted linked list.
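The aliasing described above can be illustrated with a minimal userspace sketch. It is not the kernel code: the type and function names (mark_node, mark_list, conn_shared, conn_split, and the two helper functions) are hypothetical stand-ins, and the sketch is single-threaded, so it shows only that writing destroy_next clobbers the list head when the two share a union, not the SRCU timing of the race itself. The second struct assumes the layout mentioned later in this thread, with destroy_next moved out of the union.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the real kernel types carry more fields. */
struct mark_node { struct mark_node *next; };
struct mark_list { struct mark_node *first; };

/* Layout as described in the report: list and destroy_next share storage. */
struct conn_shared {
    union {
        struct mark_list list;
        struct conn_shared *destroy_next;
    };
};

/* Assumed alternative layout: destroy_next has its own storage. */
struct conn_split {
    struct mark_list list;
    struct conn_split *destroy_next;
};

/* With the union, the struct is no larger than the list head alone,
 * so the two members necessarily overlap. */
static_assert(sizeof(struct conn_shared) == sizeof(struct mark_list),
              "destroy_next overlaps list");

/* Returns true if a reader still sees an empty list after the connector
 * has been queued for deferred destruction behind `pending`. */
bool shared_list_survives_queueing(struct conn_shared *pending)
{
    struct conn_shared conn;
    conn.list.first = NULL;          /* last mark detached: list is empty */
    conn.destroy_next = pending;     /* writer queues the connector */
    return conn.list.first == NULL;  /* reader re-reads the aliased storage */
}

bool split_list_survives_queueing(struct conn_split *pending)
{
    struct conn_split conn;
    conn.list.first = NULL;
    conn.destroy_next = pending;     /* no longer touches the list head */
    return conn.list.first == NULL;
}
```

With a non-NULL `pending`, the shared layout reports a non-empty list whose "first node" is really another connector, which is the kind of corrupted traversal suspected above; the split layout keeps the list head intact.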

To reproduce:

1) Insert a udelay(1) after line 251 in fs/notify/fsnotify.c and recompile:

250
251   conn = srcu_dereference(*connp, &fsnotify_mark_srcu);
252   udelay(1);
253   if (conn)
254     node = srcu_dereference(conn->list.first, &fsnotify_mark_srcu);
255

2) Run the attached reproducer on a machine running an impacted kernel:

$ tar zxvf fsnotify_bug.tar.gz
$ /bin/bash run.sh

The attached tarball includes an example stack trace from the kernel HEAD. I will follow up with a patch that I believe resolves this issue. I believe this issue was introduced in the patchset that ended at revision 054c636e5c8054884ede889be82ce059879945e6.
Comment 1 Robert Kolchmeyer 2018-04-19 04:57:57 UTC
Proposed patch: https://patchwork.kernel.org/patch/10349009/
Comment 2 Pradeep Sawlani 2018-04-19 05:07:41 UTC
Similar thread discussing this issue:
https://lkml.org/lkml/2018/3/8/896
Comment 3 Robert Kolchmeyer 2018-04-19 05:10:10 UTC
It's not clear to me that that issue is related. But I suppose it might be valuable to keep in mind.
Comment 4 Jan Kara 2018-04-19 11:31:56 UTC
Thanks for the report and the reproducer. I tested it and hit a kernel crash after 50 iterations of your ./run.sh script in my test KVM. I also tried my suggestion of moving destroy_next out of the union and could not reproduce any crash in 400 iterations.
Comment 5 Jan Kara 2018-04-19 11:32:41 UTC
I'll integrate your testcase into LTP among other inotify tests.
Comment 6 Robert Kolchmeyer 2018-04-19 18:01:58 UTC
Sounds great, thank you! Patch v2: https://patchwork.kernel.org/patch/10351417/
Comment 7 Jan Kara 2018-04-19 20:25:57 UTC
Thanks. I've added the patch to my tree and will push it to Linus.
Comment 8 Daniel Wang 2018-10-26 02:30:24 UTC
Hi @Jan-

Do we have any plan to add the bug fix/patch to some specific kernel version? Thanks!

All the best,
Daniel Wang
Comment 9 Jan Kara 2018-10-29 08:48:54 UTC
Your patch has been in the kernel since 4.17-rc3. Is that what you are asking about?