Bug 62221
Summary: | Fanotify blocks any write operation to CIFS mounted volumes for about 10-15 secs | ||
---|---|---|---|
Product: | File System | Reporter: | priyamn (pria.mn9) |
Component: | CIFS | Assignee: | fs_cifs (fs_cifs) |
Status: | NEW --- | ||
Severity: | blocking | CC: | douglas.leeder, eparis, jlayton, lsahlber, smfrench, tim.pickersgill |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
syslog output without fanotify
syslog output with fanotify |
Description
priyamn
2013-09-27 13:46:35 UTC
Eric, any idea of what the problem here might be? Any updates on this ? Easily reproduced using Eric Paris's fanotify-example test program (from http://git.infradead.org/users/eparis/fanotify-example.git/tree) E.g.: ./fanotify -o open_perm -m /mnt/cifs We see a 30s or 40s delay when creating a new file with touch. Other accesses (including writes) appear to be okay. Used git bisect to find the commit where this was introduced: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8b0192a5f478da1c1ae906bf3ffff53f26204f56 commit 8b0192a5f478da1c1ae906bf3ffff53f26204f56 Author: Jeff Layton <jlayton@redhat.com> Date: Tue Feb 7 06:30:52 2012 -0500 cifs: request oplock when doing open on lookup Currently, it's always set to 0 (no oplock requested). Cc: <stable@vger.kernel.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Setting enable_oplocks (in /sys/module/cifs/parameters) to 0 makes the problem go away. Created attachment 128441 [details]
syslog output without fanotify
syslog output (with cifsFYI=7) when running touch to create a new file, without fanotify.
Created attachment 128451 [details]
syslog output with fanotify
syslog output (with cifsFYI=7) when running touch to create a new file, with fanotify.
Have set cifsFYI to 7 and re-run touch with and without fanotify trapping OPEN_PERM events. The syslog output from both are attached. with fanotify trapping OPEN_PERM events, there's a cifs_open call in the middle of the cifs_lookup, which doesn't occur normally. I've inserted blank lines into the logs to highlight this difference. Everything else looks the same to me. I guess this additional open may have been happening before, but without the oplock, it just worked anyway? Backtrace of the fanotify and touch processes during the pause: 0xffff8800379617c0 1241 1204 0 0 D 0xffff880037961d28 fanotify_test 0xffff88003d3e8000 1244 1204 0 0 S 0xffff88003d3e8568 su 0xffff8800379e2f80 1245 1244 0 0 D 0xffff8800379e34e8 touch [0]kdb> btp 1241 Stack traceback for pid 1241 0xffff8800379617c0 1241 1204 0 0 D 0xffff880037961d28 fanotify_test ffff88003d3e1a80 0000000000000086 ffff8800379617c0 ffffffff81909480 ffff88003d3e1fd8 00000000000125c0 00000000000125c0 ffff880037b325d0 ffff88003d04f968 0000000000000000 ffff88003d4d8000 ffff88003d3e1b64 Call Trace: [<ffffffff815f9689>] schedule+0x29/0x70 [<ffffffff812c4ada>] wait_for_response.isra.7+0x9a/0xf0 [<ffffffff8107d0b0>] ? prepare_to_wait_event+0x100/0x100 [<ffffffff812c5a3f>] SendReceive+0x12f/0x300 [<ffffffff812a6e34>] CIFSPOSIXCreate+0x194/0x3b0 [<ffffffff812b7280>] cifs_posix_open+0x180/0x2b0 [<ffffffff815f26d0>] ? printk+0x67/0x69 [<ffffffff812b8f7d>] cifs_open+0x44d/0x650 [<ffffffff81074f45>] ? set_next_entity+0x85/0xa0 [<ffffffff815f9223>] ? __schedule+0x2f3/0x730 [<ffffffff811511d2>] ? kmem_cache_alloc_trace+0x132/0x140 [<ffffffff81156d5f>] do_dentry_open+0x1ff/0x2d0 [<ffffffff812b8b30>] ? cifsFileInfo_get+0x30/0x30 [<ffffffff81156e89>] dentry_open+0x59/0xe0 [<ffffffff8119c730>] fanotify_read+0x210/0x550 [<ffffffff8107d0b0>] ? prepare_to_wait_event+0x100/0x100 [<ffffffff8115935b>] vfs_read+0x9b/0x160 [<ffffffff81159f76>] SyS_read+0x46/0xb0 [<ffffffff8116df95>] ? SyS_poll+0x65/0x100 [<ffffffff81604f92>] system_call_fastpath+0x16/0x1b [0]kdb> btp 1245 Stack traceback for pid 1245 0xffff8800379e2f80 1245 1244 0 0 D 0xffff8800379e34e8 touch ffff88003d32d9d8 0000000000000086 ffff8800379e2f80 ffff8800379617c0 ffff88003d32dfd8 00000000000125c0 00000000000125c0 0000000000000000 ffff88003d27a3c0 ffff880037b30c00 ffff880037b30cd8 0000000000000000 Call Trace: [<ffffffff815f9689>] schedule+0x29/0x70 [<ffffffff8119c028>] fanotify_handle_event+0x1a8/0x310 [<ffffffff8107d0b0>] ? prepare_to_wait_event+0x100/0x100 [<ffffffff811984f1>] send_to_group+0xd1/0x1b0 [<ffffffff811987a3>] fsnotify+0x1d3/0x2f0 [<ffffffff812f6ace>] security_file_open+0x6e/0x70 [<ffffffff81156c24>] do_dentry_open+0xc4/0x2d0 [<ffffffff811569e0>] ? finish_no_open+0x20/0x20 [<ffffffff81157001>] finish_open+0x31/0x40 [<ffffffff812b625b>] cifs_atomic_open+0x16b/0x300 [<ffffffff81167f3f>] do_last+0x76f/0x1200 [<ffffffff812f68a6>] ? security_file_alloc+0x16/0x20 [<ffffffff81168a8b>] path_openat+0xbb/0x660 [<ffffffff8112e739>] ? handle_mm_fault+0x389/0x9a0 [<ffffffff8116982a>] do_filp_open+0x3a/0x90 [<ffffffff81175d0d>] ? __alloc_fd+0x7d/0x120 [<ffffffff81158878>] do_sys_open+0x128/0x220 [<ffffffff8115898e>] SyS_open+0x1e/0x20 [<ffffffff81604f92>] system_call_fastpath+0x16/0x1b The fanotify case is resulting in an extra open of the file, which is just wrong. With a local filesystem, an extra open is generally painless -- not so much with a network filesystem. So, it looks like the the client requests an oplock on the file (as it should), but then second open races in before the reply is processed. That causes the server to immediately request that the oplock be broken, but we have no record of that oplock yet. The original open hasn't had a chance to complete because it's blocked in the fanotify code. Eventually (15s later) the server gives up on the client and revokes the oplock anyway which lets the second open complete and that unwedges things. The only way I can see to fix this is to fix fanotify not to do this extra open. How would we do that? We need access to the file contents to to content-based access control, so would need to ?duplicate? the existing open file handle? (And close it if permission is denied) The other alternatives we thought of: a) Take the op-lock later, after the security check b) Remember the file earlier so the release can happen. c) Release unknown op-locks: Send the appropriate message back to the server if we can't match the op-lock to a file. (the code looks like it needs the inode to send the message though?) (In reply to Douglas Leeder from comment #10) > How would we do that? We need access to the file contents to to > content-based access control, so would need to ?duplicate? the existing open > file handle? (And close it if permission is denied) > > The other alternatives we thought of: > > a) Take the op-lock later, after the security check > b) Remember the file earlier so the release can happen. > c) Release unknown op-locks: Send the appropriate message back to the server > if we can't match the op-lock to a file. (the code looks like it needs the > inode to send the message though?) All of those options are pretty icky. Why reopen the file in the first place? What makes you think that the server is going to allow you to read the file. What if you don't have any credentials that allow it? I think Trond's comments in bug 60563 also apply here. The correct way to do this is to plumb this sort of notification into the VFS layer as a proper operation, but even with that I'm not sure this will ever work properly with a network filesystem like NFS or CIFS. If you don't have an open file handle that you can read from, then I don't think you can reasonably do any sort of content-based access control. c) is certainly very icky. a) or b) seem like they close a general race-condition. cifs isn't ready for a broken op-lock at the time when it requests the op-lock, which seems like it could happen anyway if another machine requested the same file. Ok, I'll buy that... I don't see how you can reasonably do "a" since at this level you don't know that there will be a recursive open occurring. "b" is possible, but that would require an overhaul of the oplock handling. That said...doing that might fix your immediate problem, but it's going to mean that performance will suck. Without an oplock, you can't do any client-side caching, so the *best* thing is still to avoid doing any sort of extra open on the file if you can help it. I do wonder though -- why doesn't commit 233839b1df65a prevent this from occurring? Ahh right...that patch only fixed this for the SMB2 case. The SMB1 case still has the same problem. Still, you can probably hook into the same infrastructure to fix the race there too. SMB1 is getting obsolete fast. Since this is fixed for SMB2+ I suggest we close this bug. Have we verified that this still fails on SMB1 even? It does remind me that we should improve fanotify support |