From the crash report, which I promise to provide here in a while, it seems to be related to the nfs + automount setup that I use. For now let me attach dmesg; I will try to capture a dmesg with a crash in it. After compiling the kernel without the option in question, there are no more crashes.
Created attachment 50592: dmesg
I used to overclock my CPU, and after normalising my CPU settings the crashes disappeared. So it seems this was caused by my own settings. However, if anybody is reading, I would also like to point out that with the "Automatic process group scheduling" option on, my computer (which runs an AMD Phenom X6 1090T CPU) performs much worse under load. When the system is under load, all programs get really sluggish and firefox takes about 15 seconds to open. With the option turned off I get much better performance. I was under the impression that this option would make things better when the CPU is under load. Am I missing something here? Also, I am attaching my config.
Created attachment 50662: kernel config
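A side note on the scheduling question above: "Automatic process group scheduling" is CONFIG_SCHED_AUTOGROUP, and kernels built with it expose a runtime switch, so the two behaviours can be compared under load without recompiling. A minimal sketch (assuming the 2.6.38-era /proc/sys/kernel/sched_autogroup_enabled sysctl; needs root):

/* Toggle autogroup scheduling at runtime. Writing "0" disables it,
 * "1" re-enables it; fails if the kernel lacks CONFIG_SCHED_AUTOGROUP. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/kernel/sched_autogroup_enabled", "w");

	if (!f) {
		perror("fopen");	/* not root, or option not built in */
		return 1;
	}
	fputs("0\n", f);
	fclose(f);
	return 0;
}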
Here is that crash happening again, the system was NOT running overclocked or anything...

[ 1860.156122] ------------[ cut here ]------------
[ 1860.156124] kernel BUG at fs/dcache.c:943!
[ 1860.156126] invalid opcode: 0000 [#1] SMP
[ 1860.156127] last sysfs file: /sys/devices/platform/it87.552/fan3_input
[ 1860.156128] CPU 3
[ 1860.156129] Modules linked in: iptable_mangle iptable_nat nf_nat ipt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_mac iptable_filter xt_multiport xt_mark xt_conntrack xt_connmark nf_conntrack ip_tables x_tables nvidia(P)
[ 1860.156137]
[ 1860.156139] Pid: 7388, comm: umount.nfs Tainted: P 2.6.38-rc8 #9 Gigabyte Technology Co., Ltd. GA-790FXTA-UD5/GA-790FXTA-UD5
[ 1860.156142] RIP: 0010:[<ffffffff810e9648>] [<ffffffff810e9648>] shrink_dcache_for_umount_subtree+0x268/0x270
[ 1860.156147] RSP: 0018:ffff8800be82fe08 EFLAGS: 00010296
[ 1860.156149] RAX: 0000000000000065 RBX: ffff88023f96e600 RCX: 000000000003ffff
[ 1860.156150] RDX: ffffffff8161f888 RSI: 0000000000000046 RDI: ffffffff8174c9f8
[ 1860.156151] RBP: ffff88023f96e600 R08: 0000000000012c37 R09: 0000000000000006
[ 1860.156152] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023a07f5e0
[ 1860.156154] R13: ffff88023f96e65c R14: ffff8800be82ff18 R15: ffff880211d38740
[ 1860.156155] FS: 00007f3428cb2700(0000) GS:ffff8800bfac0000(0000) knlGS:00000000f74186c0
[ 1860.156156] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1860.156157] CR2: 00007f7c97da1000 CR3: 00000000bea08000 CR4: 00000000000006e0
[ 1860.156159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1860.156160] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1860.156161] Process umount.nfs (pid: 7388, threadinfo ffff8800be82e000, task ffff880211fd5640)
[ 1860.156162] Stack:
[ 1860.156163]  ffff88020c05ce50 0000000000000000 ffff88023fc07128 ffff88020c05cc00
[ 1860.156165]  ffff88023f96e6c0 ffff8800be82ff28 ffff88023f96e300 ffffffff810e96a4
[ 1860.156167]  ffff88023f49f480 ffff88020c05cc00 ffffffff8146d4a0 ffffffff810d5d15
[ 1860.156169] Call Trace:
[ 1860.156172]  [<ffffffff810e96a4>] ? shrink_dcache_for_umount+0x54/0x60
[ 1860.156174]  [<ffffffff810d5d15>] ? generic_shutdown_super+0x25/0x100
[ 1860.156176]  [<ffffffff810d5e79>] ? kill_anon_super+0x9/0x40
[ 1860.156179]  [<ffffffff81179aed>] ? nfs_kill_super+0xd/0x20
[ 1860.156181]  [<ffffffff810d5f13>] ? deactivate_locked_super+0x43/0x70
[ 1860.156183]  [<ffffffff810ef4d8>] ? release_mounts+0x68/0x90
[ 1860.156185]  [<ffffffff810efa54>] ? sys_umount+0x314/0x3d0
[ 1860.156187]  [<ffffffff8100243b>] ? system_call_fastpath+0x16/0x1b
[ 1860.156188] Code: 8b 0a 31 d2 48 85 f6 74 07 48 8b 96 a8 00 00 00 48 05 50 02 00 00 48 89 de 48 c7 c7 40 3a 52 81 48 89 04 24 31 c0 e8 a1 bc 35 00 <0f> 0b eb fe 0f 0b eb fe 55 53 48 89 fb 48 8d 7f 68 48 83 ec 08
[ 1860.156201] RIP [<ffffffff810e9648>] shrink_dcache_for_umount_subtree+0x268/0x270
[ 1860.156204] RSP <ffff8800be82fe08>
[ 1860.156205] ---[ end trace ee03486c16c108a7 ]---
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

Seems that we have a nasty involving autofs, nfs and the VFS.

Mehmet, the kernel should have printed some diagnostics prior to doing the BUG() call:

	if (dentry->d_count != 0) {
		printk(KERN_ERR
		       "BUG: Dentry %p{i=%lx,n=%s}"
		       " still in use (%d)"
		       " [unmount of %s %s]\n",
		       dentry,
		       dentry->d_inode ?
		       dentry->d_inode->i_ino : 0UL,
		       dentry->d_name.name,
		       dentry->d_count,
		       dentry->d_sb->s_type->name,
		       dentry->d_sb->s_id);
		BUG();
	}

Please find those in the log and email them to us - someone might find it useful.

On Tue, 15 Mar 2011 21:02:23 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=30882
>
> --- Comment #4 from Mehmet Giritli <mehmet@giritli.eu> 2011-03-15 21:02:22 ---
> Here is that crash happening again, the system was NOT running overclocked or
> anything...
>
> [snip: quoted crash report, identical to comment #4 above]
Reply-To: mgiritli@giritli.eu

Oops, sorry about that. The missing piece is as follows:

Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]

On Tue, 2011-03-15 at 21:25 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:

> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> Seems that we have a nasty involving autofs, nfs and the VFS.
>
> [snip: rest of quoted message and crash report]
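For reference when decoding that report: it is produced by the printk() Andrew quoted, so the fields map as follows (values read straight from the log line and the oops registers above):

	dentry             = ffff88023f96e600   (the same value as RBX/RBP in the oops)
	d_inode->i_ino     = 0x25f56f
	d_name.name        = ""                  (the n= field is empty here)
	d_count            = 1                   (one reference still held at umount time)
	d_sb->s_type->name = "nfs"
	d_sb->s_id         = "0:f"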
Reply-To: mgiritli@giritli.eu

The missing piece is as follows:

Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]

(sorry for the inconvenience Andrew)

On Tue, 2011-03-15 at 14:24 -0700, Andrew Morton wrote:

> [snip: quoted message and crash report, duplicated above]
On Wed, 2011-03-16 at 01:54 +0200, Mehmet Giritli wrote:
> The missing piece is as follows:
>
> Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry
> ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]

This might be the same problem I saw and described in rc1. However, for me the fs in the BUG() report was autofs. Hopefully that just means my autofs setup is different.

At the moment I believe a dentry leak Al Viro spotted is the cause. Please try this patch.

autofs4 - fix dentry leak in autofs4_expire_direct()

From: Ian Kent <raven@themaw.net>

There is a missing dput() when returning from autofs4_expire_direct() when we see that the dentry is already a pending mount.

Signed-off-by: Ian Kent <raven@themaw.net>
---

 fs/autofs4/expire.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index c896dd6..c403abc 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -290,10 +290,8 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
 	spin_lock(&sbi->fs_lock);
 	ino = autofs4_dentry_ino(root);
 	/* No point expiring a pending mount */
-	if (ino->flags & AUTOFS_INF_PENDING) {
-		spin_unlock(&sbi->fs_lock);
-		return NULL;
-	}
+	if (ino->flags & AUTOFS_INF_PENDING)
+		goto out;
 	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
 		struct autofs_info *ino = autofs4_dentry_ino(root);
 		ino->flags |= AUTOFS_INF_EXPIRING;
@@ -301,6 +299,7 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
 		spin_unlock(&sbi->fs_lock);
 		return root;
 	}
+out:
 	spin_unlock(&sbi->fs_lock);
 	dput(root);

> (sorry for the inconvenience Andrew)
>
> [snip: earlier quoted messages and crash report]
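To see why that one missing dput() ends in the umount-time BUG(), here is a small self-contained userspace illustration of the same reference-counting pattern (hypothetical take()/put() helpers standing in for dget()/dput(); only the branch shape mirrors the kernel function above):

/* The pre-patch "pending" early return skips the reference drop, so the
 * count stays nonzero -- the "Dentry ... still in use (1)" condition that
 * shrink_dcache_for_umount_subtree() later trips over. */
#include <stdio.h>

struct obj { int refcount; };

static void take(struct obj *o) { o->refcount++; }	/* stands in for dget() */
static void put(struct obj *o)  { o->refcount--; }	/* stands in for dput() */

/* Pre-patch shape: the "pending" early return forgets to drop the ref. */
static struct obj *expire_buggy(struct obj *o, int pending, int busy)
{
	take(o);
	if (pending)
		return NULL;	/* bug: put(o) never runs on this path */
	if (!busy)
		return o;	/* caller takes over the reference */
	put(o);
	return NULL;
}

/* Post-patch shape: the pending path funnels through the final put(). */
static struct obj *expire_fixed(struct obj *o, int pending, int busy)
{
	take(o);
	if (pending)
		goto out;
	if (!busy)
		return o;	/* caller takes over the reference */
out:
	put(o);
	return NULL;
}

int main(void)
{
	struct obj a = { 0 }, b = { 0 };

	expire_buggy(&a, 1, 1);	/* a pending mount hits the early return */
	expire_fixed(&b, 1, 1);
	printf("buggy: refcount=%d (leaked)\n", a.refcount);	/* prints 1 */
	printf("fixed: refcount=%d\n", b.refcount);		/* prints 0 */
	return 0;
}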
Reply-To: mgiritli@giritli.eu

Ian,

I am having much more frequent crashes now. I haven't been able to cleanly reboot my machine yet, and I have tried three times so far. Init scripts fail to unmount the file systems and I have to reboot manually.

On Wed, 2011-03-16 at 10:32 +0800, Ian Kent wrote:
> This might be the same problem I saw and described in rc1.
> However, for me the fs in the BUG() report was autofs.
> Hopefully that just means my autofs setup is different.
>
> At the moment I believe a dentry leak Al Viro spotted is the cause.
> Please try this patch.
>
> [snip: patch and earlier quoted messages]
On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> Ian,
>
> I am having much more frequent crashes now. I haven't been able to
> cleanly reboot my machine yet, and I have tried three times so far. Init
> scripts fail to unmount the file systems and I have to reboot manually.

What do your autofs maps look like?

> [snip: earlier quoted messages, patch and crash report]
Reply-To: mgiritli@giritli.eu

On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> What do your autofs maps look like?

Here are the contents of my auto.misc:

gollum-media      -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/mnt/media
gollum-distfiles  -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/usr/portage/distfiles
gollum-www        -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/www
gollum-WebDav     -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/dav

> [snip: earlier quoted messages, patch and crash report]
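(For readers unfamiliar with autofs map syntax: each entry above is <key> <mount options> <server>:<path>, so accessing, for example, the gollum-media key under the configured mount point triggers an NFS mount of gollum.giritli.eu:/mnt/media. The mount-point prefix itself comes from the auto.master entry, which Mehmet quotes further down as "/mnt/autofs /etc/auto.misc --timeout=300 --ghost".)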
On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> Here are the contents of my auto.misc:
>
> gollum-media      -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/mnt/media
> gollum-distfiles  -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/usr/portage/distfiles
> gollum-www        -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/www
> gollum-WebDav     -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/dav

What, that's it? You're only using "/misc /etc/auto.misc" in the master map and you're having problems?

Are the crashes always the same?

How have you established that the BUG()s are in fact due to automount umounting mounts, and that the BUG()s correspond to NFS mounts previously mounted by autofs?

Is there any noise at all in the syslog?

Are you sure you're using a kernel with the dentry leak patch?

What sort of automounting load is happening on the machine, ie. frequency of mounts and umounts, and what timeout are you using?

The dentry leak patch got rid of the BUG()s I was seeing, but by that time I did have a couple of other patches. I still don't think the other patches made much difference for this particular case.

> [snip: earlier quoted messages, patch and crash report]
Reply-To: mgiritli@giritli.eu

Ian,

Well, here is a copy-paste of the crash that I am seeing *every time at shutdown* now that I am running with your patch. It is pretty much identical to the older ones, except that I see it every time, instead of just rare crashes... Now I am going to revert your patch and check for changes in the frequency of the crashes...

Mar 16 18:49:27 mordor kernel: [ 5696.670114] BUG: Dentry ffff88015007f300{i=2,n=donkey} still in use (1) [unmount of nfs 0:f]
Mar 16 18:49:27 mordor kernel: [ 5696.670134] ------------[ cut here ]------------
Mar 16 18:49:27 mordor kernel: [ 5696.670187] kernel BUG at fs/dcache.c:943!
Mar 16 18:49:27 mordor kernel: [ 5696.670237] invalid opcode: 0000 [#1] SMP
Mar 16 18:49:27 mordor kernel: [ 5696.670369] last sysfs file: /sys/devices/platform/it87.552/pwm1_enable
Mar 16 18:49:27 mordor kernel: [ 5696.670421] CPU 2
Mar 16 18:49:27 mordor kernel: [ 5696.670466] Modules linked in: ipt_LOG xt_tcpudp xt_state xt_mac iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle xt_multiport xt_mark xt_conntrack xt_connmark nf_conntr$
Mar 16 18:49:27 mordor kernel: [ 5696.671003]
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Pid: 21015, comm: umount.nfs Tainted: P 2.6.38-gentoo #3 Gigabyte Technology Co., Ltd. GA-790FXTA-UD5/GA-790FXTA-UD5
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RIP: 0010:[<ffffffff810e95c8>] [<ffffffff810e95c8>] shrink_dcache_for_umount_subtree+0x268/0x270
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RSP: 0018:ffff880219903e08 EFLAGS: 00010296
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RAX: 0000000000000066 RBX: ffff88015007f300 RCX: 000000000003ffff
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RDX: ffffffff81623888 RSI: 0000000000000046 RDI: ffffffff817509f8
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RBP: ffff88023a1480c0 R08: 000000000001fa7c R09: 0000000000000006
Mar 16 18:49:27 mordor kernel: [ 5696.671003] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88015007f3a0
Mar 16 18:49:27 mordor kernel: [ 5696.671003] R13: ffff88023a14811c R14: ffff880219903f18 R15: ffff8802107d9640
Mar 16 18:49:27 mordor kernel: [ 5696.671003] FS: 00007ff9ec925700(0000) GS:ffff8800bfa80000(0000) knlGS:0000000000000000
Mar 16 18:49:27 mordor kernel: [ 5696.671003] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 16 18:49:27 mordor kernel: [ 5696.671003] CR2: 00007fd326cc2bc8 CR3: 0000000232272000 CR4: 00000000000006e0
Mar 16 18:49:27 mordor kernel: [ 5696.671003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 16 18:49:27 mordor kernel: [ 5696.671003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Process umount.nfs (pid: 21015, threadinfo ffff880219902000, task ffff88023ca16780)
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Stack:
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  ffff880219af1650 0000000000000000 ffff88023fc07128 ffff880219af1400
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  ffff88023a1486c0 ffff880219903f28 ffff88023a03a180 ffffffff810e9624
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  ffff88023f4f2480 ffff880219af1400 ffffffff81471460 ffffffff810d5ce5
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Call Trace:
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810e9624>] ? shrink_dcache_for_umount+0x54/0x60
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810d5ce5>] ? generic_shutdown_super+0x25/0x100
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810d5e49>] ? kill_anon_super+0x9/0x40
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff81179acd>] ? nfs_kill_super+0xd/0x20
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810d5ee3>] ? deactivate_locked_super+0x43/0x70
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810ef4d0>] ? release_mounts+0x70/0x90
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810efa44>] ? sys_umount+0x314/0x3d0
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff8100243b>] ? system_call_fastpath+0x16/0x1b
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Code: 8b 0a 31 d2 48 85 f6 74 07 48 8b 96 a8 00 00 00 48 05 50 02 00 00 48 89 de 48 c7 c7 e0 7f 52 81 48 89 04 24 31 c0 e8 f1 f0 35 00 <0f> 0b eb fe 0f 0b eb fe 55 53 48 89 fb 48 8d 7f 68 48$
Mar 16 18:49:27 mordor kernel: [ 5696.678048] RIP [<ffffffff810e95c8>] shrink_dcache_for_umount_subtree+0x268/0x270
Mar 16 18:49:27 mordor kernel: [ 5696.678048] RSP <ffff880219903e08>
Mar 16 18:49:27 mordor kernel: [ 5696.678552] ---[ end trace 24269d237584cd43 ]---
Mar 16 18:49:28 mordor mountd[4362]: Caught signal 15, un-registering and exiting.
Mar 16 18:49:28 mordor kernel: [ 5697.256066] nfsd: last server has exited, flushing export cache
Mar 16 18:49:28 mordor kernel: [ 5697.272903] nfsd: last server has exited, flushing export cache
Reply-To: mgiritli@giritli.eu

On Wed, 2011-03-16 at 23:44 +0800, Ian Kent wrote:
> On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> > On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> > > On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> > > > Ian,
> > > >
> > > > I am having much more frequent crashes now. I haven't been able to
> > > > cleanly reboot my machine yet and I have tried three times so far. Init
> > > > scripts fail to unmount the file systems and I have to reboot manually.
> > >
> > > What do your autofs maps look like?
> >
> > Here is the contents of my auto.misc:
> >
> > gollum-media -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/mnt/media
> > gollum-distfiles -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/usr/portage/distfiles
> > gollum-www -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/www
> > gollum-WebDav -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/dav
>
> What, that's it, and you're only using "/misc /etc/auto.misc" in the
> master map, and you're having problems?

yes

> Are the crashes always the same?

identical

> How have you established that the BUG()s are in fact due to automount
> umounting mounts and that the BUG()s correspond to NFS mounts previously
> mounted by autofs?

I haven't established anything. However, that's the only way I mount nfs,
and my file manager hangs, init scripts hang when trying to unmount...

> Is there any noise at all in the syslog?

nothing unusual

> Are you sure you're using a kernel with the dentry leak patch?

yes

> What sort of automounting load is happening on the machine, ie.
> frequency of mounts and umounts, and what timeout are you using?

from auto.master:

/mnt/autofs /etc/auto.misc --timeout=300 --ghost

Not very much. Let's say 2-3 times every hour for each mount point.

> The dentry leak patch got rid of the BUG()s I was seeing but by that
> time I did have a couple of other patches. I still don't think the other
> patches made much difference for this particular case.
>
> > > > On Wed, 2011-03-16 at 10:32 +0800, Ian Kent wrote:
> > > > > On Wed, 2011-03-16 at 01:54 +0200, Mehmet Giritli wrote:
> > > > > > The missing piece is as follows:
> > > > > >
> > > > > > Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry
> > > > > > ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]
> > > > >
> > > > > This might be the same problem I saw and described in rc1.
> > > > > However, for me the fs in the BUG() report was autofs.
> > > > > Hopefully that just means my autofs setup is different.
> > > > >
> > > > > At the moment I believe a dentry leak Al Viro spotted is the cause.
> > > > > Please try this patch.
> > > > >
> > > > > autofs4 - fix dentry leak in autofs4_expire_direct()
> > > > >
> > > > > From: Ian Kent <raven@themaw.net>
> > > > >
> > > > > There is a missing dput() when returning from autofs4_expire_direct()
> > > > > when we see that the dentry is already a pending mount.
> > > > >
> > > > > Signed-off-by: Ian Kent <raven@themaw.net>
> > > > > ---
> > > > >
> > > > >  fs/autofs4/expire.c |    7 +++----
> > > > >  1 files changed, 3 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> > > > > index c896dd6..c403abc 100644
> > > > > --- a/fs/autofs4/expire.c
> > > > > +++ b/fs/autofs4/expire.c
> > > > > @@ -290,10 +290,8 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > > >  	spin_lock(&sbi->fs_lock);
> > > > >  	ino = autofs4_dentry_ino(root);
> > > > >  	/* No point expiring a pending mount */
> > > > > -	if (ino->flags & AUTOFS_INF_PENDING) {
> > > > > -		spin_unlock(&sbi->fs_lock);
> > > > > -		return NULL;
> > > > > -	}
> > > > > +	if (ino->flags & AUTOFS_INF_PENDING)
> > > > > +		goto out;
> > > > >  	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
> > > > >  		struct autofs_info *ino = autofs4_dentry_ino(root);
> > > > >  		ino->flags |= AUTOFS_INF_EXPIRING;
> > > > > @@ -301,6 +299,7 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > > >  		spin_unlock(&sbi->fs_lock);
> > > > >  		return root;
> > > > >  	}
> > > > > +out:
> > > > >  	spin_unlock(&sbi->fs_lock);
> > > > >  	dput(root);
> > > > >
> > > > > (sorry for the inconvenience Andrew)
> > > > >
> > > > > > On Tue, 2011-03-15 at 14:24 -0700, Andrew Morton wrote:
> > > > > > > (switched to email. Please respond via emailed reply-to-all, not via
> > > > > > > the bugzilla web interface).
> > > > > > >
> > > > > > > Seems that we have a nasty involving autofs, nfs and the VFS.
> > > > > > >
> > > > > > > Mehmet, the kernel should have printed some diagnostics prior to doing
> > > > > > > the BUG() call:
> > > > > > >
> > > > > > > 	if (dentry->d_count != 0) {
> > > > > > > 		printk(KERN_ERR
> > > > > > > 		       "BUG: Dentry %p{i=%lx,n=%s}"
> > > > > > > 		       " still in use (%d)"
> > > > > > > 		       " [unmount of %s %s]\n",
> > > > > > > 		       dentry,
> > > > > > > 		       dentry->d_inode ?
> > > > > > > 		       dentry->d_inode->i_ino : 0UL,
> > > > > > > 		       dentry->d_name.name,
> > > > > > > 		       dentry->d_count,
> > > > > > > 		       dentry->d_sb->s_type->name,
> > > > > > > 		       dentry->d_sb->s_id);
> > > > > > > 		BUG();
> > > > > > > 	}
> > > > > > >
> > > > > > > Please find those in the log and email them to us - someone might find
> > > > > > > it useful.
> > > > > > >
> > > > > > > On Tue, 15 Mar 2011 21:02:23 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> > > > > > >
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=30882
> > > > > > > >
> > > > > > > > --- Comment #4 from Mehmet Giritli <mehmet@giritli.eu> 2011-03-15 21:02:22 ---
> > > > > > > > Here is that crash happening again, the system was NOT running
> > > > > > > > overclocked or anything...
> > > > > > > >
> > > > > > > > [snip: identical oops already quoted earlier in this report]
On Wed, 2011-03-16 at 19:47 +0200, Mehmet Giritli wrote:
> Well, here is a copy-paste of the crash that I am seeing *every time at
> shutdown* now that I am running with your patch. It is pretty much
> identical to the older ones... except that I see it every time, instead
> of just rare crashes...

That's odd because, with the maps you're using, the patch changes code
that's never executed.
(In reply to comment #0)
> From the crash report which I promise to provide here in a while, it seems
> like it has to do with nfs + automount that I use. For now let me attach
> dmesg and will try to log the dmesg with a crash in it. After compiling the
> kernel without the option in question, there are no more crashes.

Umm .. what option?
On Wed, 2011-03-16 at 17:50 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #14 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org> 2011-03-16 17:50:54 ---
> Reply-To: mgiritli@giritli.eu
>
> On Wed, 2011-03-16 at 23:44 +0800, Ian Kent wrote:
> > On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> > > On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> > > > On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> > > > > Ian,
> > > > >
> > > > > I am having much more frequent crashes now. I haven't been able to
> > > > > cleanly reboot my machine yet and I have tried three times so far. Init
> > > > > scripts fail to unmount the file systems and I have to reboot manually.

I expect you have had to revert to an earlier kernel.
Hopefully you will be willing to test some more when you get time.

As I said, with the maps you're using the change I sent won't make any
difference.

The other thing is that the autofs Connectathon test that I also use for
testing worked OK from rc1, and it has a bunch of stuff in it just like
your map. So what you're seeing is a bit of a surprise.

There must be something in your environment that is causing some
unexpected usage pattern. The problem is working out what that is.
Perhaps we could enable autofs debug logging and see if that gives us
any information about it. If you do enable debug logging in the autofs
configuration, remember to ensure that syslog is logging daemon.*
somewhere or we won't get the full debug log output.

> > > > What do your autofs maps look like?
> > >
> > > Here is the contents of my auto.misc:
> > >
> > > gollum-media -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/mnt/media
> > > gollum-distfiles -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/usr/portage/distfiles
> > > gollum-www -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/www
> > > gollum-WebDav -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/dav
> >
> > What, that's it, and you're only using "/misc /etc/auto.misc" in the
> > master map, and you're having problems?
>
> yes
>
> > Are the crashes always the same?
>
> identical
>
> > How have you established that the BUG()s are in fact due to automount
> > umounting mounts and that the BUG()s correspond to NFS mounts previously
> > mounted by autofs?
>
> I haven't established anything. However, that's the only way I mount nfs,
> and my file manager hangs, init scripts hang when trying to unmount...

The file manager hang is also interesting because I only ever saw BUG()s
at shutdown after all activity was finished. It sounds like the init
scripts hang because of the BUG() at shutdown. Could you try and get a
gdb backtrace of automount when the file manager hangs, or has the
machine already BUG()ed? Use "thr a a bt" to get the backtrace. Note
that a backtrace of automount without gdb debug information isn't useful,
so you'll need to find out how your distribution makes applications with
debugging information available and use that.

One thing that tends to happen in graphical environments is file system
scanning to keep file views up to date. This tends to happen when the
mount table changes, causing immediate expire to mount activity.

In 2.6.38-rc the concurrent merge of the vfs-scale patch series along
with the vfs-automount series caught me by surprise. Both of these patch
series make fairly significant changes to the same general area of the
VFS and that has caused problems.
I've been working on this for a while now and I have a series of patches
(although not finished yet) that might help your situation. I've just
about finished testing against 2.6.38 now (the previous run was against
rc7), so I could send a combined diff of those changes for you to test if
you are still willing.

Of course this assumes that it is an autofs problem, which might not be
the case.

> > Is there any noise at all in the syslog?
>
> nothing unusual
>
> > Are you sure you're using a kernel with the dentry leak patch?
>
> yes
>
> > What sort of automounting load is happening on the machine, ie.
> > frequency of mounts and umounts, and what timeout are you using?
>
> from auto.master:
>
> /mnt/autofs /etc/auto.misc --timeout=300 --ghost
>
> Not very much. Let's say 2-3 times every hour for each mount point.

Do any of the exports from the NFS server that you're using in the map
entries above have nohide exports within them? That is to say, is NFS
doing any of its own automounting?

Ian
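For anyone following along, Ian's two suggestions translate roughly to the
following. Paths, package names and config locations vary by distribution,
so treat everything below as assumptions rather than exact instructions:

    # Backtrace of the running automount daemon; the autofs debuginfo/dbg
    # package must be installed for the symbols to be meaningful.
    gdb -p "$(pidof automount)"
    (gdb) thread apply all bt        # the long form of "thr a a bt"

    # Turn on autofs debug logging (many distributions read LOGGING from a
    # sysconfig/default file), then make sure syslog records daemon.*:
    echo 'LOGGING="debug"' >> /etc/sysconfig/autofs
    echo 'daemon.*    /var/log/daemon.log' >> /etc/syslog.conf
    /etc/init.d/autofs restart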
(In reply to comment #16)
> (In reply to comment #0)
> > From the crash report which I promise to provide here in a while, it seems
> > like it has to do with nfs + automount that I use. For now let me attach
> > dmesg and will try to log the dmesg with a crash in it. After compiling the
> > kernel without the option in question, there are no more crashes.
>
> Umm .. what option?

Initially I thought the problem was related to the CONFIG_SCHED_AUTOGROUP
option that I turned on in the kernel, because that is when I first saw this
bug. But that is perhaps just a coincidence, now that I am having crashes
without it.
> I expect you have had to revert to an earlier kernel.
> Hopefully you will be willing to test some more when you get time.
>
> As I said, with the maps you're using the change I sent won't make any
> difference.
>
> The other thing is that the autofs Connectathon test that I also use for
> testing worked OK from rc1, and it has a bunch of stuff in it just like
> your map. So what you're seeing is a bit of a surprise.
>
> There must be something in your environment that is causing some
> unexpected usage pattern. The problem is working out what that is.
> Perhaps we could enable autofs debug logging and see if that gives us
> any information about it. If you do enable debug logging in the autofs
> configuration, remember to ensure that syslog is logging daemon.*
> somewhere or we won't get the full debug log output.

I have been testing the latest mainline kernel 2.6.38, partially disabling
things and trying to find some pattern. In the meantime, I upgraded my
network to NFSv4, hoping that might change things. As I found out this
morning, it didn't. I still have crashes, but less frequently. Now I am
trying it without autofs, using manual nfs mounting. Will report back after
a while or whenever I get crashes. Perhaps we can focus on autofs if I
don't get any crashes from this point on.

> The file manager hang is also interesting because I only ever saw BUG()s
> at shutdown after all activity was finished. It sounds like the init
> scripts hang because of the BUG() at shutdown. Could you try and get a
> gdb backtrace of automount when the file manager hangs, or has the
> machine already BUG()ed? Use "thr a a bt" to get the backtrace. Note
> that a backtrace of automount without gdb debug information isn't useful,
> so you'll need to find out how your distribution makes applications with
> debugging information available and use that.
>
> One thing that tends to happen in graphical environments is file system
> scanning to keep file views up to date. This tends to happen when the
> mount table changes, causing immediate expire to mount activity.
>
> In 2.6.38-rc the concurrent merge of the vfs-scale patch series along
> with the vfs-automount series caught me by surprise. Both of these patch
> series make fairly significant changes to the same general area of the
> VFS and that has caused problems. I've been working on this for a while
> now and I have a series of patches (although not finished yet) that
> might help your situation. I've just about finished testing against
> 2.6.38 now (the previous run was against rc7), so I could send a combined
> diff of those changes for you to test if you are still willing.

Yep, send them this way please... no problem testing...

> Of course this assumes that it is an autofs problem, which might not be
> the case.

I think my current test mode (disabling autofs) will answer that at least..

> Do any of the exports from the NFS server that you're using in the map
> entries above have nohide exports within them? That is to say, is NFS
> doing any of its own automounting?

Yes, they do. As of yesterday, I upgraded to NFSv4. My new exports look
like this:

/exports 192.168.2.0/24(rw,fsid=0,insecure,no_subtree_check,async)
/exports/media 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
/exports/media/backups 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
/exports/media/donkey 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
/exports/distfiles 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide,no_root_squash)
/exports/www 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)

The old one was as follows (NFSv3):

/mnt/media 192.168.2.0/24(async,rw,no_subtree_check,nohide,crossmnt)
/usr/portage/distfiles 192.168.2.0/24(async,rw,no_subtree_check,no_root_squash,nohide,crossmnt)
/var/www/ 192.168.2.3(async,rw,no_subtree_check,no_root_squash,nohide)

But since I am having identical crashes with both, the distinction between
the two versions of nfs is perhaps irrelevant.

/mnt/media contains multiple sub-mounts. Some are LVM volumes and some are
just plain ext4 partitions. /usr/portage/distfiles is also a (single) mount
point for an ext4 partition. /var/www does not contain any sub-mounts.

I see no errors or anything unusual in my server logs.
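Those nohide/crossmnt exports mean the client crosses into separate
superblocks as it walks down the tree - the same kind of automatic submount
behavior the 2.6.38 VFS changes touch. A quick way to watch it happen from
a client (the mount point and paths below are made up for illustration):

    # Mount the NFSv4 pseudo-root, then step into a nohide export:
    mount -t nfs4 gollum.giritli.eu:/ /mnt/gollum
    ls /mnt/gollum/media/donkey      # crossing in triggers the submount
    grep nfs4 /proc/mounts           # the submount shows up as its own entry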
On Fri, 2011-03-18 at 11:29 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #19 from Mehmet Giritli <mehmet@giritli.eu> 2011-03-18 11:29:21 ---
>
> > The other thing is that the autofs Connectathon test that I also use for
> > testing worked OK from rc1, and it has a bunch of stuff in it just like
> > your map. So what you're seeing is a bit of a surprise.
> >
> > There must be something in your environment that is causing some
> > unexpected usage pattern. The problem is working out what that is.
> > Perhaps we could enable autofs debug logging and see if that gives us
> > any information about it. If you do enable debug logging in the autofs
> > configuration, remember to ensure that syslog is logging daemon.*
> > somewhere or we won't get the full debug log output.
>
> I have been testing the latest mainline kernel 2.6.38, partially disabling
> things and trying to find some pattern. In the meantime, I upgraded my
> network to NFSv4, hoping that might change things. As I found out this
> morning, it didn't. I still have crashes, but less frequently. Now I am
> trying it without autofs, using manual nfs mounting. Will report back after
> a while or whenever I get crashes. Perhaps we can focus on autofs if I
> don't get any crashes from this point on.

That may show a difference, but I expect you won't see the BUG()s at all.
I believe the problem is related to concurrent expire to mount, and even
if the problems go away we still can't say that the issue is entirely
autofs, because autofs will just facilitate the expire to mount behavior
that triggers the problem.

> > The file manager hang is also interesting because I only ever saw BUG()s
> > at shutdown after all activity was finished. It sounds like the init
> > scripts hang because of the BUG() at shutdown. Could you try and get a
> > gdb backtrace of automount when the file manager hangs, or has the
> > machine already BUG()ed? Use "thr a a bt" to get the backtrace. Note
> > that a backtrace of automount without gdb debug information isn't useful,
> > so you'll need to find out how your distribution makes applications with
> > debugging information available and use that.

Can you offer any more info on this?

> > In 2.6.38-rc the concurrent merge of the vfs-scale patch series along
> > with the vfs-automount series caught me by surprise. Both of these patch
> > series make fairly significant changes to the same general area of the
> > VFS and that has caused problems. I've been working on this for a while
> > now and I have a series of patches (although not finished yet) that
> > might help your situation. I've just about finished testing against
> > 2.6.38 now (the previous run was against rc7), so I could send a combined
> > diff of those changes for you to test if you are still willing.
>
> Yep, send them this way please... no problem testing...

OK, I'll send individual patches in a series so that you can see the
individual patch descriptions and have some idea of what they might be
fixing.

> > Of course this assumes that it is an autofs problem, which might not be
> > the case.
>
> I think my current test mode (disabling autofs) will answer that at least..

I suspect it won't really do that, although it may appear so. The other
problem is that by changing autofs I may well be hiding problems in NFS
exposed by the 2.6.38 path walking changes.

> > Do any of the exports from the NFS server that you're using in the map
> > entries above have nohide exports within them? That is to say, is NFS
> > doing any of its own automounting?
>
> Yes, they do. As of yesterday, I upgraded to NFSv4. My new exports look
> like this:
>
> /exports 192.168.2.0/24(rw,fsid=0,insecure,no_subtree_check,async)
> /exports/media 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
> /exports/media/backups 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
> /exports/media/donkey 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
> /exports/distfiles 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide,no_root_squash)
> /exports/www 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)

Right, so that has a chance of exposing problems with the NFS path walk.

Ian
I have the same bug running a Ubuntu-bowdlerised/augmented/mangled 2.6.38
kernel, build 2.6.38-7-generic, running autofs (5.0.5-0ubuntu4) with a
60-second timeout on an nfs4-only network. The bug does not always hit, but
once it does, the system has to be rebooted to get nfs4 services back.

These are the relevant lines from auto.master/auto.ufs:

auto.master:
/net /etc/auto.ufs --timeout=60

auto.ufs:
mntstr="-fstype=nfs4,noatime,async,proto=tcp,retry=60,hard,intr $server:/"

The auto.ufs script mounts local shares as bind mounts to make it possible
to use the same paths network-wide without incurring speed penalties.
Hackish, but it works (see the sketch after this comment).

This is a typical export:

/export/src 192.168.1.0/24(rw,nohide,insecure,no_subtree_check,async)

Some exports use sync, most are async. All except for the root mount use
nohide and no_subtree_check.

The kernel oopses are virtually identical to the ones mentioned in this
bug report. Tests are performed on a Thinkpad T23 with a 1.2GHz PIII-M.

Anything you want me to try? I run recent git pulls on several other
machines; I could use that here as well.
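As a sketch of what such an executable map can look like - hypothetical,
not the reporter's actual auto.ufs - note that autofs program maps are
invoked with the lookup key as the first argument and must print a map
entry on stdout:

    #!/bin/sh
    # Hypothetical auto.ufs: serve the local host's share as a bind
    # mount, everything else over NFSv4 with the options quoted above.
    key="$1"
    if [ "$key" = "$(hostname)" ]; then
        echo "-fstype=bind :/export"
    else
        echo "-fstype=nfs4,noatime,async,proto=tcp,retry=60,hard,intr ${key}:/"
    fi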
As to whether this is an autofs-related problem, I'd wager to say it is most
likely not. The fact that it has thus far surfaced in combination with
autofs is most likely related to the increased umount frequency in an
autofs-managed system compared to a manual mount configuration. As an
experiment I have changed the dismount timeout in auto.master to 3600
seconds. The bug has yet to rear its ugly head, which is remarkable, as the
system has now been running without problems for more than 4 hours.
(In reply to comment #22)
> As to whether this is an autofs-related problem, I'd wager to say it is most
> likely not. The fact that it has thus far surfaced in combination with
> autofs is most likely related to the increased umount frequency in an
> autofs-managed system compared to a manual mount configuration. As an
> experiment I have changed the dismount timeout in auto.master to 3600
> seconds. The bug has yet to rear its ugly head, which is remarkable, as the
> system has now been running without problems for more than 4 hours.

LOL, we can't say it isn't autofs either.

It's true that it is most likely happening during path walks that occur
concurrently with mount/umount activity. I've studied that code a lot over
the last several weeks and I can't see any obvious problem.

The reason autofs got so much attention was that there is a dentry leak
which causes a similar BUG(), but it is in an area of code that is never
executed when using maps like the ones here.

The fact is that there were two significant patch series merged at the same
time in 2.6.38: the vfs-scale and the vfs-automount series. Both made
considerable changes to the same area of the VFS. Each was tested
independently (certainly the autofs changes were), but together they
exposed a few problems. Once the initial issues were dealt with, though,
none were as easy to reproduce as what has been seen here. There has
continued to be quite a bit of churn in the VFS as a result of these
merges.

The question that really needs to be answered is: what makes this so easy
to reproduce for the reporters here? What's worse is that the crash
probably occurs well after the event that led to the increased reference
count which causes the BUG(), so we have no way of knowing where to look or
where to insert debug prints.

Anyway, it may be a good idea to try the current git, since there has been
quite a bit of change to the path walking code as well as changes to the
mount locking. Be aware that there was also one report of a lockup when
umounting a tmpfs fs which hasn't been resolved.

Ian
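For those wanting to act on that suggestion, fetching and building the
then-current mainline went something like this (the tree path below was the
mainline location of the day and is an assumption, as is reusing the
running kernel's config):

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    cd linux-2.6
    cp "/boot/config-$(uname -r)" .config
    make oldconfig                   # answer the prompts for new options
    make -j"$(nproc)" && make modules_install install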
I've been getting bit by the exact same bug and have been bisecting for the
past couple of weeks. It's slow going, as it can sometimes take a day for
the BUG() to show up (though it can also at times take 10 minutes). And
I've also seen it more than once where something was good after a day and
then BUG()'d later on, just to make things more complicated. So the upshot
is that while I feel confident enough about this latest batch of bisecting
to post it here, I wouldn't bet my life on it. I hope this isn't a case
where bisecting just shows where the bug gets exposed but not where it
actually got planted :)

Incidentally, I tried the patch from the top of this thread and it didn't
seem to make a difference. I still got bit.

I've posted on the linux-fsdevel thread that Jeff Layton started about it,
http://www.spinics.net/lists/linux-nfs/msg20280.html
if you need more details on my setup (though I'll be happy to provide
anything else you need). Though in that thread you'll see that I'm not
using autofs explicitly, the Netapp GX cluster NFS appears to use autofs to
do the implicit submounts (I'm not 100% sure that's the correct
terminology, so hopefully you know what I mean).

Here's my bisect log, ending up at commit
e61da20a50d21725ff27571a6dff9468e4fb7146:

git bisect start 'fs'
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [7c955fca3e1d8132982148267d9efcafae849bb6] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
git bisect good 7c955fca3e1d8132982148267d9efcafae849bb6
# good: [c32b0d4b3f19c2f5d29568f8b7b72b61693f1277] fs/mpage.c: consolidate code
git bisect good c32b0d4b3f19c2f5d29568f8b7b72b61693f1277
# bad: [f8206b925fb0eba3a11839419be118b09105d7b1] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
git bisect bad f8206b925fb0eba3a11839419be118b09105d7b1
# good: [a8f2800b4f7b76cecb7209cb6a7d2b14904fc711] nfsd4: fix callback restarting
git bisect good a8f2800b4f7b76cecb7209cb6a7d2b14904fc711
# bad: [6651149371b842715906311b4631b8489cebf7e8] autofs4: Clean up autofs4_free_ino()
git bisect bad 6651149371b842715906311b4631b8489cebf7e8
# good: [0ad53eeefcbb2620b6a71ffdaad4add20b450b8b] afs: add afs_wq and use it instead of the system workqueue
git bisect good 0ad53eeefcbb2620b6a71ffdaad4add20b450b8b
# good: [01c64feac45cea1317263eabc4f7ee1b240f297f] CIFS: Use d_automount() rather than abusing follow_link()
git bisect good 01c64feac45cea1317263eabc4f7ee1b240f297f
# good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
git bisect good b5b801779d59165c4ecf1009009109545bd1f642
# bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
# good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
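For reference, a log like the one above is the output of a loop along
these lines - the test step is whatever reproduces the BUG(), which as
noted can take anywhere from minutes to a day per candidate:

    git bisect start -- fs           # restrict suspects to fs/, as in the log
    git bisect good v2.6.37
    git bisect bad v2.6.38-rc1
    # build and boot each candidate, run the NFS umount workload, then:
    git bisect good                  # or: git bisect bad
    git bisect log > bisect.log      # produces a log like the one above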
I ended up opening a bug against Fedora since I wasn't aware of this bug on
kernel.org. I have a reproducer in that bug, but it's a bit convoluted and
I haven't had a lot of luck making it simpler:

https://bugzilla.redhat.com/show_bug.cgi?id=708039

I did a bisect with my reproducer and was able to bisect it down to:

commit 36d43a43761b004ad1879ac21471d8fc5f3157ec
Author: David Howells <dhowells@redhat.com>
Date:   Fri Jan 14 18:45:42 2011 +0000

    NFS: Use d_automount() rather than abusing follow_link()

    Make NFS use the new d_automount() dentry operation rather than abusing
    follow_link() on directories.

    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Acked-by: Ian Kent <raven@themaw.net>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If it's of any help, I'm affected by this bug, too (2.6.39.1, nfsv4,
AUTOFS4, autofs (userspace) 5.0.4). After the bug is triggered I'm still
able to mount nfsv4 shares through plain 'mount -t nfs -o nfsv4 ...'.

[63595.511635] ------------[ cut here ]------------
[63595.511654] kernel BUG at fs/dcache.c:925!
[63595.511667] invalid opcode: 0000 [#1] PREEMPT SMP
[63595.511687] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
[63595.511704] CPU 0
[63595.511711] Modules linked in: btusb scsi_wait_scan pata_marvell
[63595.511738]
[63595.511744] Pid: 11449, comm: umount.nfs4 Not tainted 2.6.39.1 #3 MICRO-STAR INTERNATIONAL CO.,LTD MS-7345/MS-7345
[63595.511779] RIP: 0010:[<ffffffff810f1f83>]  [<ffffffff810f1f83>] shrink_dcache_for_umount_subtree+0x263/0x270
[63595.511809] RSP: 0018:ffff88006dac3df8  EFLAGS: 00010296
[63595.511828] RAX: 0000000000000067 RBX: ffff8800b60c39c0 RCX: 000000000003ffff
[63595.511848] RDX: ffffffff81ada388 RSI: 0000000000000046 RDI: ffffffff81c339f8
[63595.511867] RBP: ffff8800b60c39c0 R08: 0000000000000000 R09: 0000000000000000
[63595.511886] R10: 000000000000000a R11: 0000000000000001 R12: ffff8800b60c3a1c
[63595.511905] R13: dead000000200200 R14: dead000000100100 R15: ffff88006dac3f28
[63595.511926] FS:  00007fe9cd733700(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[63595.511951] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63595.511966] CR2: 00007fe9cd0d1b30 CR3: 000000006da0a000 CR4: 00000000000406f0
[63595.511986] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[63595.512006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[63595.512026] Process umount.nfs4 (pid: 11449, threadinfo ffff88006dac2000, task ffff880071e72080)
[63595.512055] Stack:
[63595.512062]  ffff880096cb2658 ffff8800c98dd0b0 ffff880096cb2400 ffff8800b60c3900
[63595.512089]  ffff8800b60c395c ffff8800b60c3300 0000000000000000 ffffffff810f2f79
[63595.512117]  ffff880096cb2400 ffffffff81862de0 ffff88006dac3f28 ffffffff810ddc35
[63595.512147] Call Trace:
[63595.512155]  [<ffffffff810f2f79>] ? shrink_dcache_for_umount+0x59/0x70
[63595.512173]  [<ffffffff810ddc35>] ? generic_shutdown_super+0x25/0x100
[63595.512190]  [<ffffffff810ddd99>] ? kill_anon_super+0x9/0x50
[63595.512206]  [<ffffffff811de291>] ? nfs4_kill_super+0x31/0x90
[63595.512221]  [<ffffffff810de063>] ? deactivate_locked_super+0x43/0x70
[63595.512238]  [<ffffffff810f9bd7>] ? release_mounts+0x67/0x90
[63595.512253]  [<ffffffff810fa173>] ? sys_umount+0x1b3/0x3d0
[63595.512269]  [<ffffffff8183693b>] ? system_call_fastpath+0x16/0x1b
[63595.512285] Code: 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 07 48 8b 90 a8 00 00 00 48 89 34 24 48 c7 c7 a8 d8 9d 81 48 89 de 31 c0 e8 49 10 74 00 <0f> 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 41 55 4c 8d 6f 5c 41 54
[63595.512438] RIP  [<ffffffff810f1f83>] shrink_dcache_for_umount_subtree+0x263/0x270
[63595.512460] RSP <ffff88006dac3df8>
[63595.517799] ---[ end trace 11c7e3bfba462d9a ]---
Created attachment 62262 [details]
Patch - VFS: Fix vfsmount overput on simultaneous automount

If anyone cc'd on this bug is still interested in a resolution for this
problem, please try this patch. It should apply to 2.6.38 and 2.6.39 but
may report some line number offset mismatches. If that is really a problem
(in that the patch actually doesn't apply, or you are concerned it doesn't
apply correctly), let me know, along with the specific kernel version, and
I'll produce one that does apply.
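Applying an attachment like this typically goes as follows; the filename is
made up here, since the patch is only identified as attachment 62262:

    cd linux-2.6.39.1
    # a dry run shows any offset/fuzz messages without touching the tree
    patch -p1 --dry-run < vfs-fix-vfsmount-overput.patch
    patch -p1 < vfs-fix-vfsmount-overput.patch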
I applied the patch to 2.6.39.1. Got a hard lockup (could not even Magic-SysRq). Don't know if it is, but the dying breath looks related to this bug. Here's a picture of it: http://obfusc.gavagai.nl/nfscrash.jpg
(In reply to comment #28)
> I applied the patch to 2.6.39.1. Got a hard lockup (could not even
> Magic-SysRq). Don't know if it is, but the dying breath looks related to
> this bug. Here's a picture of it:
>
> http://obfusc.gavagai.nl/nfscrash.jpg

Thanks for the testing. It's hard to see how this happens or to know for
sure if it is related to the original problem. We'll need to wait and see
as others look into it.
Ian, unfortunately I will be away from my computer for at least two months.
Otherwise I would test your patch immediately. Sorry, I can't help right
now.
(In reply to comment #28)
> I applied the patch to 2.6.39.1. Got a hard lockup (could not even
> Magic-SysRq). Don't know if it is, but the dying breath looks related to
> this bug. Here's a picture of it:
>
> http://obfusc.gavagai.nl/nfscrash.jpg

This looks like an entirely different issue, but I suppose it's possible
this is the end result of another refcount imbalance. Can you reproduce
this at will with this patch installed?
(In reply to comment #28)
> I applied the patch to 2.6.39.1. Got a hard lockup (could not even
> Magic-SysRq). Don't know if it is, but the dying breath looks related to
> this bug. Here's a picture of it:
>
> http://obfusc.gavagai.nl/nfscrash.jpg

It's just been pointed out to me that the screen capture has at least one
other oops before the one on the screen. So that capture isn't all that
useful, because it likely isn't the root cause of the problem. I suppose
none of it finds its way to syslog, right?
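One common way to capture an oops that never reaches syslog is netconsole,
which streams printk output over UDP to another machine. The addresses,
ports and interface below are placeholders for an example LAN:

    # On the crashing machine (sender -> 192.168.1.10:6666):
    modprobe netconsole netconsole=6665@192.168.1.5/eth0,6666@192.168.1.10/00:11:22:33:44:55
    # On the receiving machine:
    nc -u -l 6666 | tee oops.log    # some netcat variants want: nc -u -l -p 6666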
I don't know about the others, but I have been running 3.0.6 for a while
now, and this bug doesn't occur anymore.
I haven't had a crash like this for many months. I've been running ≥ 3.0.6
since October 19th and actually can't remember having a crash while running
any 3.x kernel (which I have been doing since August 16th).