Bug 77111

Summary: inotify: inotify_add_watch reuses watch descriptor ID before the queue is drained of events with the same ID
Product: File System Reporter: Heinrich Schuchardt (xypron.glpk)
Component: OtherAssignee: fs_other
Status: NEW ---    
Severity: normal CC: eparis, jack, john, mtk.manpages, rlove, szg00000, xypron.glpk
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.14.4 Subsystem:
Regression: No Bisected commit-id:
Attachments: test_wd_reuse.c - Test watch descriptor reuse
[PATCH 1/1] inotify: bug 77111 - fix reusage of watch descriptors

Description Heinrich Schuchardt 2014-05-29 17:14:46 UTC
Created attachment 137691 [details]
test_wd_reuse.c - Test watch descriptor reuse

Watch descriptor IDs are returned by inotify_add_watch.
When calling inotify_rm_watch an IN_IGNORE is placed on the inotify queue pointing to the ID of the removed watch.

inotify_add_watch should not return a watch descriptor ID for which events are still on the queue but should return an unused ID.

Unfortunately the existing Kernel code does not provide such a guarantee.

Actually in rare cases watch descriptor IDs are returned by inotify_add_watch for which events are still on the inotify queue.

I verified this bug with the test program appended.

It creates one watch for file "0" to demonstrate the problem.
Afterwards it creates and removes watches again and again just to force idr_alloc_cyclic to reach INT_MAX.
Then events are created for file "0" on the queue.
The watch for "0" is removed.
A watch for another file is created.

The output of the test program confirms that a watch descriptor ID has been returned by inotify_add_watch while events for the same ID still exist on the queue.


Test program output (after a few hours, on Linux 3.14.4 x86_64):

...
2147459044      2147467235      2147475426      2147483617      2147483647      
Preparation done
BINGO Collision detected
Watch descriptor 1 for /tmp/test/8192
Watch descriptor 1 for /tmp/test/0
Comment 1 Heinrich Schuchardt 2014-06-02 18:00:16 UTC
Created attachment 137911 [details]
[PATCH 1/1] inotify: bug 77111 - fix reusage of watch descriptors

Without the patch inotify watch descriptors may be reused by
inotify_add_watch before all events for the previous usage
of the watch descriptor have been read.

With the patch watch descriptors are removed from the idr only
after the IN_IGNORED event has been read.

The sequence of some static routines is rearranged.

The significant change moving the call of
inotify_remove_from_idr form inotify_ignored_and_remove_idr to
to copy_event_to_user and renaming inotify_ignored_and_remove_idr
to inotify_ignored.
Comment 2 Michael Kerrisk 2014-06-23 07:27:09 UTC
Just for completeness, I add Eric Pasis's thoughts from the thread at 
http://thread.gmane.org/gmane.linux.kernel/1716290/focus=1721343:

From: Eric Paris <eparis <at> redhat.com>
Subject: Re: [PATCH 1/1] inotify: bug 77111 - fix reusage of watch descriptors
Newsgroups: gmane.linux.kernel
Date: 2014-06-09 19:16:39 GMT (1 week, 6 days, 12 hours and 9 minutes ago)

This 'bug' feels very theoretical to me.  There were about 3 kernel
releases back when inotify was rewriten onto fsnotify where it was
intentionally reusing wd's.  So instead of a MAX_INT wrap all you have
to do was a single create/destroy/create to get reuse.  Almost every
utility survived...   But we did have 2 things 'misbehave'.   udev and
restorecond (an SELinux utility)   Both of which were rewritten to
handle reuse, but then we stopped re-use because obviously it had
broken userspace...

I'm also not all that worried about the 'long lived daemon' comment
since it wasn't until about 4 kernels ago (the idr_alloc_cyclic work
from jlayton) that the idr COULD loop.  And I'd never seen a complaint
that anyone hit the max.  So any looping seems unlikely.

In any case...   I'm so scared of changing any object lifetime in this
code.  It's just really complex!

What happens with this patch if I close(inotify_fd) ?  We
obviously can't write the IN_IGNORED event to userspace so we don't free
the mark/idr entry?  Ever?  pretty sure it'll BUG()...

What happens if you cause an IGNORED events, don't read it, then
close(inotify_fd)...

Also if the copy to userspace fails we NEVER free the mark?  That'll
BUG() eventually, I'm pretty sure...

Also by leaving he mark until you sent the ignored to userspace, can't
we keep generating events afer the ignored?

I'm not sure this patch is helping, but maybe I'm not seeing it right...

I do think a mention of potential reuse in the man page is
appropriate...