Latest working kernel version: none
Earliest failing kernel version:
Distribution: CentOS 5
Hardware Environment: Intel
Software Environment:
Problem Description: run out of watches

Steps to reproduce: use enough watches

We have webcams that take photos every 5 minutes. I run a daemon that
detects the creation of a new webcam photo and then makes a thumbnail of
it. Eventually it stopped working. I figured out that I could not create
new watches: inotify_add_watch reported "No space left on device".

I create all the watches with the IN_ONESHOT flag, so they are deleted
as soon as they are triggered. When I make the daemon display the watch
number it's always 3; a new watch is added only after the old watch has
been triggered. inotify isn't recovering the memory from deleted
watches.
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sun, 22 Feb 2009 16:32:47 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12754
>
>            Summary: inotify doesn't free memory allocated to watches
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.28
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: fs_other@kernel-bugs.osdl.org
>         ReportedBy: russell@rickstewart.com
>
> Latest working kernel version: none
> Earliest failing kernel version:
> Distribution: CentOS 5
> Hardware Environment: intel
> Software Environment:
> Problem Description: run out of watches
>
> Steps to reproduce: use enough watches
>
> We have webcams that take photos every 5 minutes. I run a daemon
> that detects the creation of a new webcam photo then makes a thumbnail
> of it. Eventually it stopped working. I figured out that I could
> not create new watches: inotify_add_watch reported "No space left on
> device".
> I create all the watches with the ONESHOT parameter so they are deleted
> as soon as they are triggered. When I make it display the watch number
> it's always 3. A new watch is added only when the old watch has been
> triggered. inotify isn't recovering the memory from deleted watches.
> So we have a serious leak.

A few fixes have gone into inotify since 2.6.28. I don't immediately
see any which would address this bug, but it would be worth testing
2.6.29-rc6 if poss, please. Hopefully those fixes also got fed back
into 2.6.28.x, but that path is somewhat unreliable.

If it isn't yet fixed then I'm not sure how to get it fixed, really -
inotify development is a bit quiet.
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 01:05:23PM -0800, Andrew Morton wrote:
> > We have webcams that take photos every 5 minutes. I run a daemon
> > that detects the creation of a new webcam photo then makes a thumbnail
> > of it. Eventually it stopped working. I figured out that I could
> > not create new watches: inotify_add_watch reported "No space left on
> > device".
> > I create all the watches with the ONESHOT parameter so they are deleted
> > as soon as they are triggered. When I make it display the watch number
> > it's always 3. A new watch is added only when the old watch has been
> > triggered. inotify isn't recovering the memory from deleted watches.
> > So we have a serious leak.
>
> A few fixes have gone into inotify since 2.6.28. I don't immediately
> see any which would address this bug, but it would be worth testing
> 2.6.29-rc6 if poss, please. Hopefully those fixes also got fed back
> into 2.6.28.x, but that path is somewhat unreliable.
>
> If it isn't yet fixed then I'm not sure how to get it fixed, really -
> inotify development is a bit quiet.

Build with CONFIG_DEBUG_SLAB_LEAK (and yes, that means CONFIG_SLAB),
reproduce that and see what's in /proc/slab_allocators.

FWIW, we probably ought to port that stuff to SLUB et al., but that's a
different story...
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 09:38:00PM +0000, Al Viro wrote:
> On Tue, Feb 24, 2009 at 01:05:23PM -0800, Andrew Morton wrote:
>
> > > We have webcams that take photos every 5 minutes. I run a daemon
> > > that detects the creation of a new webcam photo then makes a thumbnail
> > > of it. Eventually it stopped working. I figured out that I could
> > > not create new watches: inotify_add_watch reported "No space left on
> > > device".
> > > I create all the watches with the ONESHOT parameter so they are deleted
> > > as soon as they are triggered. When I make it display the watch number
> > > it's always 3. A new watch is added only when the old watch has been
> > > triggered. inotify isn't recovering the memory from deleted watches.

IN_ONESHOT means that they will be *removed* as they are triggered. You
still have to call put_inotify_watch() from your ->handle_event() when
you get IN_ONESHOT in the mask. IOW, check your ->handle_event(); unless
it does that put_inotify_watch(), you are leaking.
Reply-To: akpm@linux-foundation.org

On Tue, 24 Feb 2009 23:23:37 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Tue, Feb 24, 2009 at 09:38:00PM +0000, Al Viro wrote:
> > On Tue, Feb 24, 2009 at 01:05:23PM -0800, Andrew Morton wrote:
> >
> > > > We have webcams that take photos every 5 minutes. I run a daemon
> > > > that detects the creation of a new webcam photo then makes a thumbnail
> > > > of it. Eventually it stopped working. I figured out that I could
> > > > not create new watches: inotify_add_watch reported "No space left on
> > > > device".
> > > > I create all the watches with the ONESHOT parameter so they are deleted
> > > > as soon as they are triggered. When I make it display the watch number
> > > > it's always 3. A new watch is added only when the old watch has been
> > > > triggered. inotify isn't recovering the memory from deleted watches.
>
> IN_ONESHOT means that they will be *removed* as they are triggered. You still
> have to call put_inotify_watch() from your ->handle_event() when you get
> IN_ONESHOT in the mask. IOW, check your ->handle_event(); unless it does
> that put_inotify_watch(), you are leaking.

y:/usr/src/linux-2.6.29-rc6> grep -rl inotify_operations .
./Documentation/filesystems/inotify.txt
./kernel/audit.c
./kernel/audit_tree.c
./fs/notify/inotify/inotify_user.c
./fs/notify/inotify/inotify.c
./include/linux/inotify.h

I assume it's inotify_dev_queue_event()?

	if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
		put_inotify_watch(w);	/* final put */
viro@ZenIV.linux.org.uk wrote: "IN_ONESHOT means that they will be
*removed* as they are triggered."

Just as I thought!

"You still have to call put_inotify_watch() from your ->handle_event()
when you get IN_ONESHOT in the mask."

? I can't find put_inotify_watch or inotify_put_watch as system calls.
Are you talking about rewriting the kernel?

russell bell
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 03:40:23PM -0800, Andrew Morton wrote:
> ./Documentation/filesystems/inotify.txt
> ./kernel/audit.c
> ./kernel/audit_tree.c
> ./fs/notify/inotify/inotify_user.c
> ./fs/notify/inotify/inotify.c
> ./include/linux/inotify.h
>
> I assume it's inotify_dev_queue_event()?
>
> 	if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
> 		put_inotify_watch(w);	/* final put */

Duh. Never mind, I'd misparsed the report. If we are talking about
inotify_user.c (and not some new client), then we are back to "let's
see what gets leaked"...

Actually, looking at inotify_user.c, we seem to be doing something rather
fishy. Look: an event gets triggered, we pick the watch, get the
inotify_device (inotify_user-specific stuff) from it, grab the mutex on
it (dev->ev_mutex) and drop the reference to inotify_watch. Which happily
triggers ->destroy_watch, which does put_inotify_dev(). Which is

	if (atomic_dec_and_test(&dev->count)) {
		atomic_dec(&dev->user->inotify_devs);
		free_uid(dev->user);
		kfree(dev);
	}

What's to stop that from happening when we'd been holding the last
reference to that sucker? kfree() while holding a mutex inside the
structure being freed is not nice...
Reply-To: akpm@linux-foundation.org

On Wed, 25 Feb 2009 00:28:33 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Tue, Feb 24, 2009 at 03:40:23PM -0800, Andrew Morton wrote:
> > ./Documentation/filesystems/inotify.txt
> > ./kernel/audit.c
> > ./kernel/audit_tree.c
> > ./fs/notify/inotify/inotify_user.c
> > ./fs/notify/inotify/inotify.c
> > ./include/linux/inotify.h
> >
> > I assume it's inotify_dev_queue_event()?
> >
> > 	if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
> > 		put_inotify_watch(w);	/* final put */
>
> Duh. Never mind, I'd misparsed the report. If we are talking about
> inotify_user.c (and not some new client), then we are back to "let's
> see what gets leaked"...
>
> Actually, looking at inotify_user.c, we seem to be doing something rather
> fishy. Look: an event gets triggered, we pick the watch, get the
> inotify_device (inotify_user-specific stuff) from it, grab the mutex on
> it (dev->ev_mutex) and drop the reference to inotify_watch. Which happily
> triggers ->destroy_watch, which does put_inotify_dev(). Which is
>
> 	if (atomic_dec_and_test(&dev->count)) {
> 		atomic_dec(&dev->user->inotify_devs);
> 		free_uid(dev->user);
> 		kfree(dev);
> 	}
>
> What's to stop that from happening when we'd been holding the last
> reference to that sucker? kfree() while holding a mutex inside the
> structure being freed is not nice...

slab and slub have runtime checking for kfree() of a currently-locked
lock. I forget which DEBUG_foo option turns it on.
CONFIG_DEBUG_OBJECTS_FREE?
Reply-To: josef@redhat.com

On Wed, Feb 25, 2009 at 12:28:33AM +0000, Al Viro wrote:
> On Tue, Feb 24, 2009 at 03:40:23PM -0800, Andrew Morton wrote:
> > ./Documentation/filesystems/inotify.txt
> > ./kernel/audit.c
> > ./kernel/audit_tree.c
> > ./fs/notify/inotify/inotify_user.c
> > ./fs/notify/inotify/inotify.c
> > ./include/linux/inotify.h
> >
> > I assume it's inotify_dev_queue_event()?
> >
> > 	if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
> > 		put_inotify_watch(w);	/* final put */
>
> Duh. Never mind, I'd misparsed the report. If we are talking about
> inotify_user.c (and not some new client), then we are back to "let's
> see what gets leaked"...
>
> Actually, looking at inotify_user.c, we seem to be doing something rather
> fishy. Look: an event gets triggered, we pick the watch, get the
> inotify_device (inotify_user-specific stuff) from it, grab the mutex on
> it (dev->ev_mutex) and drop the reference to inotify_watch. Which happily
> triggers ->destroy_watch, which does put_inotify_dev(). Which is
>
> 	if (atomic_dec_and_test(&dev->count)) {
> 		atomic_dec(&dev->user->inotify_devs);
> 		free_uid(dev->user);
> 		kfree(dev);
> 	}
>
> What's to stop that from happening when we'd been holding the last
> reference to that sucker? kfree() while holding a mutex inside the
> structure being freed is not nice...

That shouldn't happen; we should only be doing the last put on the dev
when all watches have been removed. When we do the inotify_init we do
the get on the dev, and then for every watch there's a pair of get/put
for instantiation and removal, so we can't free the dev when doing a put
for a watch (well, obviously we can, but we shouldn't be anyway).

If the user is getting back -ENOSPC from inotify then it can only mean
we've run out of watches:

	if (atomic_read(&dev->user->inotify_watches) >=
	    inotify_max_user_watches)
		return -ENOSPC;

is from create_watch in inotify_user.c. So it seems we are just
forgetting to do a put somewhere on our watch, or the user is
unknowingly hitting their max.
Thanks, Josef
Reply-To: viro@ZenIV.linux.org.uk

On Tue, Feb 24, 2009 at 09:38:50PM -0500, Josef Bacik wrote:
> > Actually, looking at inotify_user.c, we seem to be doing something rather
> > fishy. Look: an event gets triggered, we pick the watch, get the
> > inotify_device (inotify_user-specific stuff) from it, grab the mutex on
> > it (dev->ev_mutex) and drop the reference to inotify_watch. Which happily
> > triggers ->destroy_watch, which does put_inotify_dev(). Which is
> >
> > 	if (atomic_dec_and_test(&dev->count)) {
> > 		atomic_dec(&dev->user->inotify_devs);
> > 		free_uid(dev->user);
> > 		kfree(dev);
> > 	}
> >
> > What's to stop that from happening when we'd been holding the last
> > reference to that sucker? kfree() while holding a mutex inside the
> > structure being freed is not nice...
>
> That shouldn't happen; we should only be doing the last put on the dev
> when all watches have been removed. When we do the inotify_init we do
> the get on the dev, and then for every watch there's a pair of get/put
> for instantiation and removal, so we can't free the dev when doing a put
> for a watch (well, obviously we can, but we shouldn't be anyway).

Hold on. Either that sucker can happen after inotify_release() has
dropped its reference or it can not. In the former case, we are screwed
since _we_ are holding the last remaining reference. In the latter, WTF
are we grabbing references to ->dev at watch creation?

> If the user is getting back -ENOSPC from inotify then it can only mean
> we've run out of watches:
>
> 	if (atomic_read(&dev->user->inotify_watches) >=
> 	    inotify_max_user_watches)
> 		return -ENOSPC;

... or that idr_pre_get() has failed in inotify.c
Is there any way to recover this memory other than rebooting? If I
unload the filesystem, perhaps? The count keeps mounting up and up. If
it were my personal system, which I turn off once daily, I wouldn't
mind, but our server stays politely up for months.
Still broken in v2.6.31-rc4, despite the patch from KeithP: http://lkml.org/lkml/2009/7/2/7 (bdae997f44535ac4ebe1 in mainline).
Umm, sorry. The patch from Keith was not about the count of watches at
all, so the problem was never actually fixed...

Russell: you can just comment out the "goto out_err;" after the line
"ret = -ENOSPC;" (it is the only line containing ENOSPC in
fs/notify/inotify/inotify_user.c). I assume it is your very own
appliance with the webcam, so you wouldn't be much concerned that the
only user on it which uses inotify is no longer safeguarded by the
counter of inotify watches.
Created attachment 22512 [details] Decrement the userspace inotify watch counter in fsnotify freeing_marks callback
Fixed in mainline commit 5549f7cdf84c02939fd368d0842aa2f472bb6e98:

Author:    Eric Paris <eparis@redhat.com>  2009-07-07 16:28:23
Committer: Eric Paris <eparis@redhat.com>  2009-07-21 21:26:26
Parent:    aea1f7964ae6cba5eb419a958956deb9016b3341 (Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes)
Child:     75fe2b26394c59c8e16bd7b76f4be5d048103ad1 (inotify: do not leak inode marks in inotify_add_watch)
Branch:    master
Follows:   v2.6.31-rc3
Precedes:

    inotify: drop user watch count when a watch is removed

    The inotify rewrite forgot to drop the inotify watch use count when a
    watch was removed. This means that a single inotify fd can only ever
    register a maximum of /proc/sys/fs/max_user_watches even if some of
    those had been freed.

(Which is exactly the same patch I sent. If I had looked at, or just
merged, Eric's inotify branch properly I would have seen that.)
commit 5549f7cdf84c02939fd368d0842aa2f472bb6e98 "inotify: drop user watch count when a watch is removed"