Created attachment 275441
Includes a reproducer and an example stack trace.
There appears to be a race between fsnotify() and fsnotify_put_mark() that can cause a kernel panic. As far as I can tell, fsnotify() can still read conn->list after fsnotify_put_mark() has assigned conn->destroy_next. Since the two fields share a union, that assignment overwrites the list head, and fsnotify() ends up traversing a corrupted linked list.
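To make the hazard concrete, here is a minimal userspace sketch of the aliasing, not the kernel code itself. The struct and function names (connector_sketch, destroy_next_clobbers_list) are hypothetical, and the hlist types are simplified stand-ins for the kernel definitions. It only shows that a store to destroy_next changes what a later load of list.first returns; it does not model SRCU or the actual timing of the race.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel list types (assumption, not the real definitions). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Hypothetical connector with both fields overlapping in one union. */
struct connector_sketch {
    union {
        struct hlist_head list;                 /* list the reader walks */
        struct connector_sketch *destroy_next;  /* link used when queueing for destruction */
    };
};

/* Returns 1 if a store to destroy_next changes the value read from list.first. */
int destroy_next_clobbers_list(void)
{
    struct hlist_node mark = { .next = NULL };
    struct connector_sketch other = { .list = { .first = NULL } };
    struct connector_sketch conn = { .list = { .first = &mark } };

    /* Before teardown the list head points at the mark. */
    assert(conn.list.first == &mark);

    /* Teardown path: queue the connector for destruction. */
    conn.destroy_next = &other;

    /* Reader path: list.first now holds the destroy_next pointer. */
    return (void *)conn.list.first == (void *)&other;
}
```

A reader that then follows list.first would be treating another connector as a list node, which matches the corrupted traversal described above.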
1) Insert a udelay(1) after line 251 in fs/notify/fsnotify.c and recompile:
251 conn = srcu_dereference(*connp, &fsnotify_mark_srcu);
253 if (conn)
254 node = srcu_dereference(conn->list.first, &fsnotify_mark_srcu);
2) Run the attached reproducer on a machine running an affected kernel:
$ tar zxvf fsnotify_bug.tar.gz
$ /bin/bash run.sh
The attached tarball includes an example stack trace from the kernel HEAD. I will follow up with a patch that I believe resolves this issue. I believe this issue was introduced in the patchset that ended at revision 054c636e5c8054884ede889be82ce059879945e6.
Proposed patch: https://patchwork.kernel.org/patch/10349009/
Similar thread discussing this issue:
It's not clear to me that that issue is related, but it might be worth keeping in mind.
Thanks for the report and for the reproducer. I've tested it and hit a kernel crash after 50 iterations of your ./run.sh script in my test KVM guest. I've also tried my suggestion of moving destroy_next out of the union, and with that change I wasn't able to reproduce any crash in 400 iterations.
I'll integrate your testcase into LTP among other inotify tests.
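The layout change tested above, with destroy_next moved out of the union, can be illustrated with a small userspace sketch. This is an assumption-laden simplification, not the patch itself: the names connector_split_sketch and list_survives_destroy_next are hypothetical, and the hlist types are stand-ins for the kernel ones. It only shows that once the two fields have separate storage, a store to destroy_next no longer disturbs list.first.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel list types (assumption, not the real definitions). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Hypothetical connector where destroy_next has its own storage. */
struct connector_split_sketch {
    struct hlist_head list;                       /* list the reader walks */
    struct connector_split_sketch *destroy_next;  /* no longer overlaps list */
};

/* Returns 1 if list.first is unchanged after a store to destroy_next. */
int list_survives_destroy_next(void)
{
    struct hlist_node mark = { .next = NULL };
    struct connector_split_sketch other = { .list = { .first = NULL }, .destroy_next = NULL };
    struct connector_split_sketch conn = { .list = { .first = &mark }, .destroy_next = NULL };

    assert(conn.list.first == &mark);

    /* Teardown path: queue the connector for destruction. */
    conn.destroy_next = &other;

    /* Reader path: the list head still points at the mark. */
    return conn.list.first == &mark;
}
```

The cost of this layout is one extra pointer per connector, which is why the fields were presumably overlapped in the first place.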
Sounds great, thank you! Patch v2: https://patchwork.kernel.org/patch/10351417/
Thanks. I've added the patch to my tree and will push it to Linus.
Is there a plan to include this fix in a specific kernel version? Thanks!
Your patch has been in the kernel since 4.17-rc3. Is that what you are asking about?