Bug 14646 - Warning inotify_fsnotify.c:129 idr_callback+0x41/0x70()
Summary: Warning inotify_fsnotify.c:129 idr_callback+0x41/0x70()
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Eric Paris
URL:
Keywords:
: 14912 15023 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-11-20 14:32 UTC by Tomas Simonaitis
Modified: 2012-06-18 12:41 UTC (History)
7 users (show)

See Also:
Kernel Version: 2.6.31.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (23.27 KB, text/plain)
2009-11-20 14:32 UTC, Tomas Simonaitis
Details

Description Tomas Simonaitis 2009-11-20 14:32:07 UTC
Created attachment 23843 [details]
dmesg

It looks like this bug should be fixed by
9f0d793b52eb2266359661369ef6303838904855 in 2.6.31.6, however
I still got problems after upgrading to 2.6.31.6:

------------[ cut here ]------------
WARNING: at fs/notify/inotify/inotify_fsnotify.c:129 idr_callback+0x41/0x70()
Hardware name: P4SC8
inotify closing but id=0 for entry=f69d2c1c in group=ed23d480 still in idr.  Probably leaking memory
Modules linked in: sha1_generic arc4 ecb ppp_mppe ppp_async crc_ccitt ppp_generic slhc af_packet ipt_REDIRECT iptable_nat nf_nat xt_hashlimit xt_limit ipt_ULOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter ip_tables ip6table_filter ip6_tables x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod nf_conntrack_ftp nf_conntrack bonding ipv6 psmouse i2c_i801 skge tulip ehci_hcd i2c_core bitrev crc32 e1000 uhci_hcd rng_core fan floppy usbcore processor thermal thermal_sys button evdev unix
Pid: 11148, comm: imap Not tainted 2.6.31.6 #1
Call Trace:
 [<c10979d1>] ? idr_callback+0x41/0x70
 [<c10979d1>] ? idr_callback+0x41/0x70
 [<c1022f64>] warn_slowpath_common+0x66/0xb2
 [<c10979d1>] ? idr_callback+0x41/0x70
 [<c1022fed>] warn_slowpath_fmt+0x26/0x28
 [<c10979d1>] idr_callback+0x41/0x70
 [<c1140dd0>] idr_for_each+0x62/0xb0
 [<c1097990>] ? idr_callback+0x0/0x70
 [<c1095e9e>] ? fsnotify_put_event+0x39/0x51
 [<c109797f>] inotify_free_group_priv+0x15/0x26
 [<c1095faf>] fsnotify_final_destroy_group+0x19/0x23
 [<c1096075>] fsnotify_put_group+0x70/0x77
 [<c1097b5c>] inotify_release+0x19/0x26
 [<c1070669>] __fput+0xd3/0x1b8
 [<c1070769>] fput+0x1b/0x23
 [<c106dc44>] filp_close+0x41/0x5f
 [<c101588e>] ? do_page_fault+0x143/0x22e
 [<c106dccb>] sys_close+0x69/0xa8
 [<c1002c35>] syscall_call+0x7/0xb
---[ end trace b3c59749bdcc3e76 ]---
entry->group=(null) inode=(null) wd=1024

------------[ cut here ]------------
WARNING: at fs/notify/inotify/inotify_fsnotify.c:129 idr_callback+0x41/0x70()
Hardware name: P4SC8
inotify closing but id=0 for entry=e7f4cc94 in group=db4e3e80 still in idr.  Probably leaking memory
Modules linked in: sha1_generic arc4 ecb ppp_mppe ppp_async crc_ccitt ppp_generic slhc af_packet ipt_REDIRECT iptable_nat nf_nat xt_hashlimit xt_limit ipt_ULOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter ip_tables ip6table_filter ip6_tables x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod nf_conntrack_ftp nf_conntrack bonding ipv6 psmouse i2c_i801 skge tulip ehci_hcd i2c_core bitrev crc32 e1000 uhci_hcd rng_core fan floppy usbcore processor thermal thermal_sys button evdev unix
Pid: 27572, comm: imap Tainted: G        W  2.6.31.6 #1
Call Trace:
 [<c10979d1>] ? idr_callback+0x41/0x70
 [<c10979d1>] ? idr_callback+0x41/0x70
 [<c1022f64>] warn_slowpath_common+0x66/0xb2
 [<c10979d1>] ? idr_callback+0x41/0x70
 [<c1022fed>] warn_slowpath_fmt+0x26/0x28
 [<c10979d1>] idr_callback+0x41/0x70
 [<c1140dd0>] idr_for_each+0x62/0xb0
 [<c1097990>] ? idr_callback+0x0/0x70
 [<c1095e9e>] ? fsnotify_put_event+0x39/0x51
 [<c109797f>] inotify_free_group_priv+0x15/0x26
 [<c1095faf>] fsnotify_final_destroy_group+0x19/0x23
 [<c1096075>] fsnotify_put_group+0x70/0x77
 [<c1097b5c>] inotify_release+0x19/0x26
 [<c1070669>] __fput+0xd3/0x1b8
 [<c1070769>] fput+0x1b/0x23
 [<c106dc44>] filp_close+0x41/0x5f
 [<c101588e>] ? do_page_fault+0x143/0x22e
 [<c106dccb>] sys_close+0x69/0xa8
 [<c1002c35>] syscall_call+0x7/0xb
---[ end trace b3c59749bdcc3e77 ]---
entry->group=(null) inode=(null) wd=1024
Comment 1 Andrew Morton 2009-11-23 22:52:13 UTC
Hi, Eric :)
Comment 2 Gleboff 2009-12-13 19:20:06 UTC
This bug present in 2.6.31.7

WARNING: at fs/notify/inotify/inotify_fsnotify.c:129 idr_callback+0x4f/0x80()
Hardware name: EP35-DS3L
inotify closing but id=0 for entry=e047b5c0 in group=effd3080 still in idr.  Probably leaking memory
Pid: 18542, comm: kded Tainted: P        W  2.6.31.7-comp #1
Comment 3 Mark Granoff 2009-12-19 14:01:17 UTC
I see this problem a LOT in 2.6.31-16.

When it occurs, always at random intervals (could be days before I see it happen), it is often in the wee hours of the morning (say, right around the time when logs are rotated and daemons are restarted).

The result, however, is that my server's clock essentially slows to a crawl (skewing it horribly) and the system cannot be accessed over the network.

I can always "fix" this by going to the console and pressing enter. That's it. Except for resetting the time by hand, the system resumes normal operation, until the next time this Oops occurs.

Here's my trace, with additional comments following:

Dec 19 01:01:46 opus kernel: [356951.742091] WARNING: at /build/buildd/linux-2.6.31/fs/notify/inotify/inotify_fsnotify.c:129 idr_callback+0x4f/0x80()
Dec 19 01:01:46 opus kernel: [356951.742096] Hardware name: A7V400-MX
Dec 19 01:01:46 opus kernel: [356951.742100] inotify closing but id=0 for entry=c5a60440 in group=f3cb3100 still in idr.  Probably leaking memory
Dec 19 01:01:46 opus kernel: [356951.742104] Modules linked in: iptable_filter ip_tables i2c_viapro snd_via82xx snd_ac97_codec x_tables ac97_bus via_ircc snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_mpu401_uart snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi lp snd_seq_midi_event snd_seq snd_timer snd_seq_device ppdev irda snd via_agp shpchp soundcore agpgart parport_pc crc_ccitt parport ns558 gameport raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 via_rhine mii multipath linear
Dec 19 01:01:46 opus kernel: [356951.742156] Pid: 20047, comm: imap Not tainted 2.6.31-16-generic #53-Ubuntu
Dec 19 01:01:46 opus kernel: [356951.742160] Call Trace:
Dec 19 01:01:46 opus kernel: [356951.742175]  [<c014518d>] warn_slowpath_common+0x6d/0xa0
Dec 19 01:01:46 opus kernel: [356951.742180]  [<c021620f>] ? idr_callback+0x4f/0x80
Dec 19 01:01:46 opus kernel: [356951.742185]  [<c021620f>] ? idr_callback+0x4f/0x80
Dec 19 01:01:46 opus kernel: [356951.742190]  [<c0145206>] warn_slowpath_fmt+0x26/0x30
Dec 19 01:01:46 opus kernel: [356951.742195]  [<c021620f>] idr_callback+0x4f/0x80
Dec 19 01:01:46 opus kernel: [356951.742204]  [<c0310fe6>] idr_for_each+0x76/0xd0
Dec 19 01:01:46 opus kernel: [356951.742208]  [<c02161c0>] ? idr_callback+0x0/0x80
Dec 19 01:01:46 opus kernel: [356951.742213]  [<c0214357>] ? fsnotify_put_event+0x37/0x50
Dec 19 01:01:46 opus kernel: [356951.742218]  [<c0214401>] ? fsnotify_flush_notify+0x91/0xa0
Dec 19 01:01:46 opus kernel: [356951.742225]  [<c0570f18>] ? _spin_lock+0x8/0x10
Dec 19 01:01:46 opus kernel: [356951.742230]  [<c02161a5>] inotify_free_group_priv+0x15/0x30
Dec 19 01:01:46 opus kernel: [356951.742235]  [<c0214479>] fsnotify_final_destroy_group+0x19/0x30
Dec 19 01:01:46 opus kernel: [356951.742240]  [<c0214567>] fsnotify_put_group+0x77/0x80
Dec 19 01:01:46 opus kernel: [356951.742245]  [<c0216401>] inotify_release+0x21/0x40
Dec 19 01:01:46 opus kernel: [356951.742253]  [<c01e915a>] __fput+0xda/0x1f0
Dec 19 01:01:46 opus kernel: [356951.742257]  [<c01e9285>] fput+0x15/0x20
Dec 19 01:01:46 opus kernel: [356951.742262]  [<c01e56a7>] filp_close+0x47/0x70
Dec 19 01:01:46 opus kernel: [356951.742266]  [<c01e5743>] sys_close+0x73/0xb0
Dec 19 01:01:46 opus kernel: [356951.742272]  [<c010336c>] syscall_call+0x7/0xb
Dec 19 01:01:46 opus kernel: [356951.742276] ---[ end trace 511ccd8d09f7ef25 ]---
Dec 19 01:01:46 opus kernel: [356951.742280] entry->group=(null) inode=(null) wd=1024


I am REALLY surprised that this bug has not received Critical status and attention! It happens a LOT to a LOT of people! It is causing my system to be VERY unreliable.

I see on launchpad that some kind of fix has been committed and is slated for 2.6.32, but that kernel is not apparently going to be released until the Spring. I can't wait that long!

Further, it also appears that people who have tried that kernel already are reporting they are seeing this same problem STILL!

WE NEED THIS FIXED!!!! I cannot put too much emphasis on this!

PLEASE, PLEASE, fix this bug in 2.6.31 and newer once and for all!

Thank you.

-Mark
Comment 4 Eric Paris 2009-12-23 03:50:12 UTC
Hey Mark, I've spent the last 2 days working on this.  I'm a bit surprised about your clock skew, are you connecting your kernel output to a slow serial port?  I found a bug where it's possible we are pushing too much to the serial port which might do it.

As another note I can't seem to make my latest devel tree

http://git.infradead.org/users/eparis/notify.git/shortlog/refs/heads/for-linus

output any problems.  I'm not 100% confident in my reproducer, but maybe you can test that kernel an let me know your results?

If you are comfortable with patching and building your own kernel I can certainly fix the dmesg spew bug which is likely causing your clock skew issues....
Comment 5 Mark Granoff 2009-12-23 15:33:24 UTC
Hi Eric. Thanks for the response. (I am glad to know that folks actually work on these issues! :-) Great to make your acquantance!)

Here's some more info.

Having eliminated all the smoking guns I saw, my system "lag/slowdown/freeze" problem continues, in both kernels 2.6.31 and 2.6..28.

Interestingly, when it occurred yesterday, I was running the 2.6.28 kernel, and the occurance was NOT accompanied by a kernel oops.

I am beginning to believe that my clock skew and system unresponsiveness issue (which so far has always occurred randomly) has less (if anything) to do with the idr_callback bug, and more to do with "something else", where "something else" might be connected to an issue with: dynamic ticks, serial I/O, interrupt processing during idle times... who knows.

Just yesterday I found the following "fresh" post

http://ubuntuforums.org/showthread.php?p=8546746

which seems to be exactly what I have been experiencing. At least, it sounds pretty close. The suggestion in that thread was to add nohz=off to the kernel boot options. I did this last evening about 18:00 EST and booted the 2.6.31 kernel. Overnight, the system remains stable, and to this point today it's fine. But overnight stability does not indicate a solved problem, in my opinion, since I have seen my system run for several days before exhibiting the lag/slowdown/freeze.

As I noted in the the discussion over at ubuntuforums, I am wondering if there is something going on with interrupt processing, or if interrupt processing is being affected somehow by some other issue (perhaps the dmesg bug you mentioned). It is interesting that by pressing return on the console (which triggers an interrupt, yes?) everything returns to "normal". (And I have to reset the time manually.)

Regarding the dmesg issue you mentioned. I guess one smoking gun that I see consistently is that on the console I see output about daemons being restarted (e.g. when logs are rotated). That may be fodder for your theory about sending too much data to the console causing a clock skew issue. Log rotation and daemon restart can cause a lot of logging, and I always see my issue (it seems) when this rotation/restart cycle has occurred...

My console is connected through a KVM switch. There is no mouse attached to the system (it's a server without X running). The connections are PS2-style.

I am not comfortable building my own kernel. I did it once and had a dickens of a time getting back to the mainstream officially released stuff. Where this is a server on which I rely for my e-mail and some web sites I host, I am hesitant to move to far askew from "important stuff that is automatically updated", so to speak.

Now, if the dmesg fix can be provided in an installable package on top of my existing 2.6.31-16 kernel, I am willing to try that.

Barring that, is there a way to completely turn off console logging once the system is up? I am happy to have all that go into log files anyway.

If you have created a new bug report for the dmesg problem you identified and feel this discussion is better continued in that thread, please direct me to it.

Thanks for your efforts! I really appreciate it (as I am sure everyone else in the Ubuntu community does as well)!

-Mark
Comment 7 Eric Paris 2010-01-16 13:48:55 UTC

*** This bug has been marked as a duplicate of bug 14912 ***
Comment 8 Eric Paris 2010-01-16 13:50:26 UTC
*** Bug 14912 has been marked as a duplicate of this bug. ***
Comment 9 Eric Paris 2010-01-16 13:57:53 UTC
This problem should go away thanks to the changes in commit 9e572cc9877ee6c43af60778f6b8d5ba0692d935 which can already be found in Linus' git tree.  It has also been submitted to -stable.

That patch does NOT fix the real bug, which wasn't an inotify bug at all.  That patch changes how inotify uses the idr which should prevent the bug in lib/idr.c from causing us problems.  The real bug here was in the idr and was rather hard to find but my results of that can be found here:

http://marc.info/?l=linux-kernel&m=126359014911675&w=2

That patch needs more review from people who understand the code better than I so it has not been commit'd to Linus' tree yet.  But I think we are finally to the bottom of these issues and if you see problems using 2.6.33-rc5 or later please let us know!
Comment 10 Eric Paris 2010-01-16 14:02:33 UTC
*** Bug 15023 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.