Bug 12754 - inotify doesn't free memory allocated to watches
Summary: inotify doesn't free memory allocated to watches
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Alexey Dobriyan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-02-22 16:32 UTC by russell bell
Modified: 2009-12-02 01:50 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.28
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Decrement the userspace inotify watch counter in fsnotify freeing_marks callback (1.13 KB, patch)
2009-07-27 17:33 UTC, Alex Riesen
Details | Diff

Description russell bell 2009-02-22 16:32:47 UTC
Latest working kernel version: none
Earliest failing kernel version:
Distribution:  CentOS 5
Hardware Environment: intel
Software Environment:
Problem Description:  run out of watches

Steps to reproduce:  use enough watches

We have webcams that take photos every 5 minutes.  I run a daemon
that detects the creation of a new webcam photo then makes a thumbnail
of it.  Eventually it stopped working.  I figured out that I could
not create new watches:  inotify_add_watch reported "No space left on device".
I create all the watches with the ONESHOT parameter so they are deleted
as soon as they are triggered.  When I make it display the watch number
it's always 3.  A new watch is added only when the old watch has been
triggered.  inotify isn't recovering the memory from deleted watches.
Comment 1 Anonymous Emailer 2009-02-24 13:05:29 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sun, 22 Feb 2009 16:32:47 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12754
> 
>            Summary: inotify doesn't free memory allocated to watches
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.28
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: fs_other@kernel-bugs.osdl.org
>         ReportedBy: russell@rickstewart.com
> 
> 
> Latest working kernel version: none
> Earliest failing kernel version:
> Distribution:  CentOS 5
> Hardware Environment: intel
> Software Environment:
> Problem Description:  run out of watches
> 
> Steps to reproduce:  use enough watches
> 
> We have webcams that take photos every 5 minutes.  I run a daemon
> that detects the creation of a new webcam photo then makes a thumbnail
> of it.  Eventually it stopped working.  I figured out that I could
> not create new watches:  inotify_add_watch reported "No space left on
> device".
> I create all the watches with the ONESHOT parameter so they are deleted
> as soon as they are triggered.  When I make it display the watch number
> it's always 3.  A new watch is added only when the old watch has been
> triggered.  inotify isn't recovering the memory from deleted watches.
> 

So we have a serious leak.  

A few fixes have gone into inotify since 2.6.28.  I don't immediately
see any which would address this bug, but it would be worth testing
2.6.29-rc6 if poss, please.  Hopefully those fixes also got fed back
into 2.6.28.x, but that path is somewhat unreliable.

If it isn't yet fixed then I'm not sure how to get it fixed, really -
inotify development is a bit quiet.
Comment 2 Anonymous Emailer 2009-02-24 13:38:10 UTC
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 01:05:23PM -0800, Andrew Morton wrote:

> > We have webcams that take photos every 5 minutes.  I run a daemon
> > that detects the creation of a new webcam photo then makes a thumbnail
> > of it.  Eventually it stopped working.  I figured out that I could
> > not create new watches:  inotify_add_watch reported "No space left on
> device".
> > I create all the watches with the ONESHOT parameter so they are deleted
> > as soon as they are triggered.  When I make it display the watch number
> > it's always 3.  A new watch is added only when the old watch has been
> > triggered.  inotify isn't recovering the memory from deleted watches.
> 
> So we have a serious leak.  
> 
> A few fixes have gone into inotify since 2.6.28.  I don't immediately
> see any which would address this bug, but it would be worth testing
> 2.6.29-rc6 if poss, please.  Hopefully those fixes also got fed back
> into 2.6.28.x, but that path is somewhat unreliable.
> 
> If it isn't yet fixed then I'm not sure how to get it fixed, really -
> inotify development is a bit quiet.

Build with CONFIG_DEBUG_SLAB_LEAK (and yes, that means CONFIG_SLAB),
reproduce that and see what's in /proc/slab_allocators.  FWIW, we probably
ought to port that stuff to SLUB et.al., but that's a different story...
Comment 3 Anonymous Emailer 2009-02-24 15:23:47 UTC
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 09:38:00PM +0000, Al Viro wrote:
> On Tue, Feb 24, 2009 at 01:05:23PM -0800, Andrew Morton wrote:
> 
> > > We have webcams that take photos every 5 minutes.  I run a daemon
> > > that detects the creation of a new webcam photo then makes a thumbnail
> > > of it.  Eventually it stopped working.  I figured out that I could
> > > not create new watches:  inotify_add_watch reported "No space left on
> device".
> > > I create all the watches with the ONESHOT parameter so they are deleted
> > > as soon as they are triggered.  When I make it display the watch number
> > > it's always 3.  A new watch is added only when the old watch has been
> > > triggered.  inotify isn't recovering the memory from deleted watches.

IN_ONESHOT means that they will be *removed* as they are triggered.  You still
have to call put_inotify_watch() from your ->handle_event() when you get
IN_ONESHOT in the mask.  IOW, check your ->handle_event(); unless it does
that put_inotify_watch(), you are leaking.
Comment 4 Anonymous Emailer 2009-02-24 15:40:27 UTC
Reply-To: akpm@linux-foundation.org

On Tue, 24 Feb 2009 23:23:37 +0000
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Tue, Feb 24, 2009 at 09:38:00PM +0000, Al Viro wrote:
> > On Tue, Feb 24, 2009 at 01:05:23PM -0800, Andrew Morton wrote:
> > 
> > > > We have webcams that take photos every 5 minutes.  I run a daemon
> > > > that detects the creation of a new webcam photo then makes a thumbnail
> > > > of it.  Eventually it stopped working.  I figured out that I could
> > > > not create new watches:  inotify_add_watch reported "No space left on
> device".
> > > > I create all the watches with the ONESHOT parameter so they are deleted
> > > > as soon as they are triggered.  When I make it display the watch number
> > > > it's always 3.  A new watch is added only when the old watch has been
> > > > triggered.  inotify isn't recovering the memory from deleted watches.
> 
> IN_ONESHOT means that they will be *removed* as they are triggered.  You
> still
> have to call put_inotify_watch() from your ->handle_event() when you get
> IN_ONESHOT in the mask.  IOW, check your ->handle_event(); unless it does
> that put_inotify_watch(), you are leaking.

y:/usr/src/linux-2.6.29-rc6> grep -rl inotify_operations .
./Documentation/filesystems/inotify.txt
./kernel/audit.c
./kernel/audit_tree.c
./fs/notify/inotify/inotify_user.c
./fs/notify/inotify/inotify.c
./include/linux/inotify.h

I assume it's inotify_dev_queue_event()?

        if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
                put_inotify_watch(w); /* final put */
Comment 5 russell bell 2009-02-24 15:51:33 UTC
     viro@ZenIV.linux.org.uk wrote:

     'IN_ONESHOT means that they will be *removed* as they are
triggered.'
     Just as I thought!     

     'You still have to call put_inotify_watch() from your ->handle_event() when you get IN_ONESHOT in the mask.'
     ?  I can't find put_inotify_watch or inotify_put_watch as system
calls.  Are you talking about rewriting the kernel?

russell bell
Comment 6 Anonymous Emailer 2009-02-24 16:29:15 UTC
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 03:40:23PM -0800, Andrew Morton wrote:
> ./Documentation/filesystems/inotify.txt
> ./kernel/audit.c
> ./kernel/audit_tree.c
> ./fs/notify/inotify/inotify_user.c
> ./fs/notify/inotify/inotify.c
> ./include/linux/inotify.h
> 
> I assume it's inotify_dev_queue_event()?
> 
>         if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
>                 put_inotify_watch(w); /* final put */

Duh.  Nevermind, I'd misparsed the report.  If we are talking about
inotify_user.c (and not some new client), then we are back to "let's
see what gets leaked"...

Actually, looking at inotify_user.c, we seem to be doing something rather
fishy.  Look: event gets triggered, we pick the watch, get inotify_device
(inotify_user-specific stuff) from it, grap mutex on it (dev->ev_mutex)
and drop reference to inotify_watch.  Which happily triggers ->destroy_watch,
which does put_inotify_dev().  Which is
        if (atomic_dec_and_test(&dev->count)) {
                atomic_dec(&dev->user->inotify_devs);
                free_uid(dev->user);
                kfree(dev);
        }
What's to stop that from happening when we'd been holding the last reference
to that sucker?  kfree() while holding a mutex inside the structure being
freed is not nice...
Comment 7 Anonymous Emailer 2009-02-24 16:40:32 UTC
Reply-To: akpm@linux-foundation.org

On Wed, 25 Feb 2009 00:28:33 +0000
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Tue, Feb 24, 2009 at 03:40:23PM -0800, Andrew Morton wrote:
> > ./Documentation/filesystems/inotify.txt
> > ./kernel/audit.c
> > ./kernel/audit_tree.c
> > ./fs/notify/inotify/inotify_user.c
> > ./fs/notify/inotify/inotify.c
> > ./include/linux/inotify.h
> > 
> > I assume it's inotify_dev_queue_event()?
> > 
> >         if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
> >                 put_inotify_watch(w); /* final put */
> 
> Duh.  Nevermind, I'd misparsed the report.  If we are talking about
> inotify_user.c (and not some new client), then we are back to "let's
> see what gets leaked"...
> 
> Actually, looking at inotify_user.c, we seem to be doing something rather
> fishy.  Look: event gets triggered, we pick the watch, get inotify_device
> (inotify_user-specific stuff) from it, grap mutex on it (dev->ev_mutex)
> and drop reference to inotify_watch.  Which happily triggers ->destroy_watch,
> which does put_inotify_dev().  Which is
>         if (atomic_dec_and_test(&dev->count)) {
>                 atomic_dec(&dev->user->inotify_devs);
>                 free_uid(dev->user);
>                 kfree(dev);
>         }
> What's to stop that from happening when we'd been holding the last reference
> to that sucker?  kfree() while holding a mutex inside the structure being
> freed is not nice...

slab and slub have runtime checking for kfree() of a currently-locked
lock.  I forget which DEBUG_foo option turns it on. 
CONFIG_DEBUG_OBJECTS_FREE?
Comment 8 Anonymous Emailer 2009-02-24 18:40:11 UTC
Reply-To: josef@redhat.com

On Wed, Feb 25, 2009 at 12:28:33AM +0000, Al Viro wrote:
> On Tue, Feb 24, 2009 at 03:40:23PM -0800, Andrew Morton wrote:
> > ./Documentation/filesystems/inotify.txt
> > ./kernel/audit.c
> > ./kernel/audit_tree.c
> > ./fs/notify/inotify/inotify_user.c
> > ./fs/notify/inotify/inotify.c
> > ./include/linux/inotify.h
> > 
> > I assume it's inotify_dev_queue_event()?
> > 
> >         if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
> >                 put_inotify_watch(w); /* final put */
> 
> Duh.  Nevermind, I'd misparsed the report.  If we are talking about
> inotify_user.c (and not some new client), then we are back to "let's
> see what gets leaked"...
> 
> Actually, looking at inotify_user.c, we seem to be doing something rather
> fishy.  Look: event gets triggered, we pick the watch, get inotify_device
> (inotify_user-specific stuff) from it, grap mutex on it (dev->ev_mutex)
> and drop reference to inotify_watch.  Which happily triggers ->destroy_watch,
> which does put_inotify_dev().  Which is
>         if (atomic_dec_and_test(&dev->count)) {
>                 atomic_dec(&dev->user->inotify_devs);
>                 free_uid(dev->user);
>                 kfree(dev);
>         }
> What's to stop that from happening when we'd been holding the last reference
> to that sucker?  kfree() while holding a mutex inside the structure being
> freed is not nice...

That shouldn't happen, we should only be doing the last put on the dev when all
watches have been removed.  When we do the inotify_init we do the get on the
dev, and then for every watch theres a pair of get/put for instantiation and
removal, so we can't free the dev when doing a put for a watch (well obviously
we can, but we shouldn't be anyway).  If the user is getting back -ENOSPC from
inotify then it can only mean we've run out of watches

        if (atomic_read(&dev->user->inotify_watches) >=                       
                        inotify_max_user_watches)
                return -ENOSPC;

is from create_watch in inotify_user.c.  So it seems we are just forgetting to
do a put somewhere on our watch, or the user unknowingly hitting their max.
Thanks,

Josef
Comment 9 Anonymous Emailer 2009-02-24 19:13:32 UTC
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 09:38:50PM -0500, Josef Bacik wrote:

> > Actually, looking at inotify_user.c, we seem to be doing something rather
> > fishy.  Look: event gets triggered, we pick the watch, get inotify_device
> > (inotify_user-specific stuff) from it, grap mutex on it (dev->ev_mutex)
> > and drop reference to inotify_watch.  Which happily triggers
> ->destroy_watch,
> > which does put_inotify_dev().  Which is
> >         if (atomic_dec_and_test(&dev->count)) {
> >                 atomic_dec(&dev->user->inotify_devs);
> >                 free_uid(dev->user);
> >                 kfree(dev);
> >         }
> > What's to stop that from happening when we'd been holding the last
> reference
> > to that sucker?  kfree() while holding a mutex inside the structure being
> > freed is not nice...
> 
> That shouldn't happen, we should only be doing the last put on the dev when
> all
> watches have been removed.  When we do the inotify_init we do the get on the
> dev, and then for every watch theres a pair of get/put for instantiation and
> removal, so we can't free the dev when doing a put for a watch (well
> obviously
> we can, but we shouldn't be anyway).

Hold on.  Either that sucker can happen after inotify_release() has dropped
its reference or it can not.  In the former case, we are screwed since _we_
are holding the last remaining reference.  In the latter, WTF are we grabbing
references to ->dev at watch creation?

>  If the user is getting back -ENOSPC from
> inotify then it can only mean we've run out of watches

> 
>         if (atomic_read(&dev->user->inotify_watches) >=                       
>                         inotify_max_user_watches)
>                 return -ENOSPC;

... or that idr_pre_get() has failed in inotify.c
Comment 10 russell bell 2009-03-23 12:48:01 UTC
Is there anyway to recover this memory other than re-booting?
If I unload the filesystem, perhaps?  It's mounting up and
up.  If it were my personal system, which I turn off once
daily, I wouldn't mind, but our server stays politely up
for months.
Comment 11 Alex Riesen 2009-07-26 15:56:17 UTC
Still broken in v2.6.31-rc4, despite the patch from KeithP:
http://lkml.org/lkml/2009/7/2/7 (bdae997f44535ac4ebe1 in mainline).
Comment 12 Alex Riesen 2009-07-26 16:55:34 UTC
Umm, sorry. The patch from Keith was not about the count of watches at all.
So the problem was never actually fixed...

Russell: you can just comment out "goto out_err;" after the line
"ret = -ENOSPC;" (it is the only line containing ENOSPC in
fs/notify/inotify/inotify_user.c).
I assume it is you very own appliance with the webcam, so you wouldn't be
that much concerned about the only user on it which uses inotify to be
not safeguarded anymore by the counter of inotify watches.
Comment 13 Alex Riesen 2009-07-27 17:33:52 UTC
Created attachment 22512 [details]
Decrement the userspace inotify watch counter in fsnotify freeing_marks callback
Comment 14 Alex Riesen 2009-07-28 06:05:02 UTC
Fixed in mainline commit 5549f7cdf84c02939fd368d0842aa2f472bb6e98:
Author: Eric Paris <eparis@redhat.com>  2009-07-07 16:28:23
Committer: Eric Paris <eparis@redhat.com>  2009-07-21 21:26:26
Parent: aea1f7964ae6cba5eb419a958956deb9016b3341 (Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes)
Child:  75fe2b26394c59c8e16bd7b76f4be5d048103ad1 (inotify: do not leak inode marks in inotify_add_watch)
Branch: master
Follows: v2.6.31-rc3
Precedes: 

    inotify: drop user watch count when a watch is removed
    
    The inotify rewrite forgot to drop the inotify watch use cound when a watch
    was removed.  This means that a single inotify fd can only ever register a
    maximum of /proc/sys/fs/max_user_watches even if some of those had been
    freed.
    
(which is exactly the same patch I sent. And if I looked (or just merged) at
the Eric's inotify branch properly I should have seen that).
Comment 15 Alexey Dobriyan 2009-12-02 01:50:05 UTC
commit 5549f7cdf84c02939fd368d0842aa2f472bb6e98
"inotify: drop user watch count when a watch is removed"

Note You need to log in before you can comment on or make changes to this bug.