From the crash report, which I promise to provide here in a while, it seems to be related to the nfs + automount setup that I use. For now let me attach dmesg; I will try to capture a dmesg with a crash in it. After compiling the kernel without the option in question, there are no more crashes.
Created attachment 50592: dmesg
I used to overclock my CPU, and after normalising my CPU settings the crashes disappeared. So it seems this was caused by my own settings. However, if anybody is reading, I would also like to point out that with the "Automatic process group scheduling" option on, my computer (which runs an AMD Phenom X6 1090T CPU) performs much worse under load. When the system is under load, all programs get really sluggish and firefox takes about 15 seconds to open. With the option turned off I get much better performance. I was under the impression that this option would make things better when the CPU is under load. Am I missing something here? Also, I am attaching my config.
Created attachment 50662: kernel config
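A side note on the scheduling question above: "Automatic process group scheduling" is CONFIG_SCHED_AUTOGROUP, and kernels built with it expose a runtime switch, so the two behaviours can be compared under load without recompiling. A minimal sketch (assuming the 2.6.38-era /proc/sys/kernel/sched_autogroup_enabled sysctl; needs root):

/* Toggle autogroup scheduling at runtime. Writing "0" disables it,
 * "1" re-enables it; fails if the kernel lacks CONFIG_SCHED_AUTOGROUP. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/kernel/sched_autogroup_enabled", "w");

	if (!f) {
		perror("fopen");	/* not root, or option not built in */
		return 1;
	}
	fputs("0\n", f);
	fclose(f);
	return 0;
}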
Here is that crash happening again, the system was NOT running overclocked or anything...

[ 1860.156122] ------------[ cut here ]------------
[ 1860.156124] kernel BUG at fs/dcache.c:943!
[ 1860.156126] invalid opcode: 0000 [#1] SMP
[ 1860.156127] last sysfs file: /sys/devices/platform/it87.552/fan3_input
[ 1860.156128] CPU 3
[ 1860.156129] Modules linked in: iptable_mangle iptable_nat nf_nat ipt_LOG xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_mac iptable_filter xt_multiport xt_mark xt_conntrack xt_connmark nf_conntrack ip_tables x_tables nvidia(P)
[ 1860.156137]
[ 1860.156139] Pid: 7388, comm: umount.nfs Tainted: P 2.6.38-rc8 #9 Gigabyte Technology Co., Ltd. GA-790FXTA-UD5/GA-790FXTA-UD5
[ 1860.156142] RIP: 0010:[<ffffffff810e9648>] [<ffffffff810e9648>] shrink_dcache_for_umount_subtree+0x268/0x270
[ 1860.156147] RSP: 0018:ffff8800be82fe08 EFLAGS: 00010296
[ 1860.156149] RAX: 0000000000000065 RBX: ffff88023f96e600 RCX: 000000000003ffff
[ 1860.156150] RDX: ffffffff8161f888 RSI: 0000000000000046 RDI: ffffffff8174c9f8
[ 1860.156151] RBP: ffff88023f96e600 R08: 0000000000012c37 R09: 0000000000000006
[ 1860.156152] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023a07f5e0
[ 1860.156154] R13: ffff88023f96e65c R14: ffff8800be82ff18 R15: ffff880211d38740
[ 1860.156155] FS: 00007f3428cb2700(0000) GS:ffff8800bfac0000(0000) knlGS:00000000f74186c0
[ 1860.156156] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1860.156157] CR2: 00007f7c97da1000 CR3: 00000000bea08000 CR4: 00000000000006e0
[ 1860.156159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1860.156160] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1860.156161] Process umount.nfs (pid: 7388, threadinfo ffff8800be82e000, task ffff880211fd5640)
[ 1860.156162] Stack:
[ 1860.156163]  ffff88020c05ce50 0000000000000000 ffff88023fc07128 ffff88020c05cc00
[ 1860.156165]  ffff88023f96e6c0 ffff8800be82ff28 ffff88023f96e300 ffffffff810e96a4
[ 1860.156167]  ffff88023f49f480 ffff88020c05cc00 ffffffff8146d4a0 ffffffff810d5d15
[ 1860.156169] Call Trace:
[ 1860.156172]  [<ffffffff810e96a4>] ? shrink_dcache_for_umount+0x54/0x60
[ 1860.156174]  [<ffffffff810d5d15>] ? generic_shutdown_super+0x25/0x100
[ 1860.156176]  [<ffffffff810d5e79>] ? kill_anon_super+0x9/0x40
[ 1860.156179]  [<ffffffff81179aed>] ? nfs_kill_super+0xd/0x20
[ 1860.156181]  [<ffffffff810d5f13>] ? deactivate_locked_super+0x43/0x70
[ 1860.156183]  [<ffffffff810ef4d8>] ? release_mounts+0x68/0x90
[ 1860.156185]  [<ffffffff810efa54>] ? sys_umount+0x314/0x3d0
[ 1860.156187]  [<ffffffff8100243b>] ? system_call_fastpath+0x16/0x1b
[ 1860.156188] Code: 8b 0a 31 d2 48 85 f6 74 07 48 8b 96 a8 00 00 00 48 05 50 02 00 00 48 89 de 48 c7 c7 40 3a 52 81 48 89 04 24 31 c0 e8 a1 bc 35 00 <0f> 0b eb fe 0f 0b eb fe 55 53 48 89 fb 48 8d 7f 68 48 83 ec 08
[ 1860.156201] RIP [<ffffffff810e9648>] shrink_dcache_for_umount_subtree+0x268/0x270
[ 1860.156204] RSP <ffff8800be82fe08>
[ 1860.156205] ---[ end trace ee03486c16c108a7 ]---
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

Seems that we have a nasty involving autofs, nfs and the VFS.

Mehmet, the kernel should have printed some diagnostics prior to doing the BUG() call:

	if (dentry->d_count != 0) {
		printk(KERN_ERR
		       "BUG: Dentry %p{i=%lx,n=%s}"
		       " still in use (%d)"
		       " [unmount of %s %s]\n",
		       dentry,
		       dentry->d_inode ?
		       dentry->d_inode->i_ino : 0UL,
		       dentry->d_name.name,
		       dentry->d_count,
		       dentry->d_sb->s_type->name,
		       dentry->d_sb->s_id);
		BUG();
	}

Please find those in the log and email them to us - someone might find it useful.

On Tue, 15 Mar 2011 21:02:23 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=30882
>
> --- Comment #4 from Mehmet Giritli <mehmet@giritli.eu> 2011-03-15 21:02:22 ---
> Here is that crash happening again, the system was NOT running overclocked or
> anything...
>
> [snip: quoted crash report, identical to comment #4 above]
Reply-To: mgiritli@giritli.eu

Oops, sorry about that. The missing piece is as follows:

Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]

On Tue, 2011-03-15 at 21:25 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:

> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> Seems that we have a nasty involving autofs, nfs and the VFS.
>
> [snip: rest of quoted message and crash report]
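For reference when decoding that report: it is produced by the printk() Andrew quoted, so the fields map as follows (values read straight from the log line and the oops registers above):

	dentry             = ffff88023f96e600   (the same value as RBX/RBP in the oops)
	d_inode->i_ino     = 0x25f56f
	d_name.name        = ""                  (the n= field is empty here)
	d_count            = 1                   (one reference still held at umount time)
	d_sb->s_type->name = "nfs"
	d_sb->s_id         = "0:f"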
Reply-To: mgiritli@giritli.eu

The missing piece is as follows:

Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]

(sorry for the inconvenience Andrew)

On Tue, 2011-03-15 at 14:24 -0700, Andrew Morton wrote:

> [snip: quoted message and crash report, duplicated above]
On Wed, 2011-03-16 at 01:54 +0200, Mehmet Giritli wrote:
> The missing piece is as follows:
>
> Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry
> ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]

This might be the same problem I saw and described in rc1. However, for me the fs in the BUG() report was autofs. Hopefully that just means my autofs setup is different.

At the moment I believe a dentry leak Al Viro spotted is the cause. Please try this patch.

autofs4 - fix dentry leak in autofs4_expire_direct()

From: Ian Kent <raven@themaw.net>

There is a missing dput() when returning from autofs4_expire_direct() when we see that the dentry is already a pending mount.

Signed-off-by: Ian Kent <raven@themaw.net>
---

 fs/autofs4/expire.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index c896dd6..c403abc 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -290,10 +290,8 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
 	spin_lock(&sbi->fs_lock);
 	ino = autofs4_dentry_ino(root);
 	/* No point expiring a pending mount */
-	if (ino->flags & AUTOFS_INF_PENDING) {
-		spin_unlock(&sbi->fs_lock);
-		return NULL;
-	}
+	if (ino->flags & AUTOFS_INF_PENDING)
+		goto out;
 	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
 		struct autofs_info *ino = autofs4_dentry_ino(root);
 		ino->flags |= AUTOFS_INF_EXPIRING;
@@ -301,6 +299,7 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
 		spin_unlock(&sbi->fs_lock);
 		return root;
 	}
+out:
 	spin_unlock(&sbi->fs_lock);
 	dput(root);

> (sorry for the inconvenience Andrew)
>
> [snip: earlier quoted messages and crash report]
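To see why that one missing dput() ends in the umount-time BUG(), here is a small self-contained userspace illustration of the same reference-counting pattern (hypothetical take()/put() helpers standing in for dget()/dput(); only the branch shape mirrors the kernel function above):

/* The pre-patch "pending" early return skips the reference drop, so the
 * count stays nonzero -- the "Dentry ... still in use (1)" condition that
 * shrink_dcache_for_umount_subtree() later trips over. */
#include <stdio.h>

struct obj { int refcount; };

static void take(struct obj *o) { o->refcount++; }	/* stands in for dget() */
static void put(struct obj *o)  { o->refcount--; }	/* stands in for dput() */

/* Pre-patch shape: the "pending" early return forgets to drop the ref. */
static struct obj *expire_buggy(struct obj *o, int pending, int busy)
{
	take(o);
	if (pending)
		return NULL;	/* bug: put(o) never runs on this path */
	if (!busy)
		return o;	/* caller takes over the reference */
	put(o);
	return NULL;
}

/* Post-patch shape: the pending path funnels through the final put(). */
static struct obj *expire_fixed(struct obj *o, int pending, int busy)
{
	take(o);
	if (pending)
		goto out;
	if (!busy)
		return o;	/* caller takes over the reference */
out:
	put(o);
	return NULL;
}

int main(void)
{
	struct obj a = { 0 }, b = { 0 };

	expire_buggy(&a, 1, 1);	/* a pending mount hits the early return */
	expire_fixed(&b, 1, 1);
	printf("buggy: refcount=%d (leaked)\n", a.refcount);	/* prints 1 */
	printf("fixed: refcount=%d\n", b.refcount);		/* prints 0 */
	return 0;
}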
Reply-To: mgiritli@giritli.eu

Ian,

I am having much more frequent crashes now. I haven't been able to cleanly reboot my machine yet, and I have tried three times so far. Init scripts fail to unmount the file systems and I have to reboot manually.

On Wed, 2011-03-16 at 10:32 +0800, Ian Kent wrote:
> This might be the same problem I saw and described in rc1.
> However, for me the fs in the BUG() report was autofs.
> Hopefully that just means my autofs setup is different.
>
> At the moment I believe a dentry leak Al Viro spotted is the cause.
> Please try this patch.
>
> [snip: patch and earlier quoted messages]
On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> Ian,
>
> I am having much more frequent crashes now. I haven't been able to
> cleanly reboot my machine yet, and I have tried three times so far. Init
> scripts fail to unmount the file systems and I have to reboot manually.

What do your autofs maps look like?

> [snip: earlier quoted messages, patch and crash report]
Reply-To: mgiritli@giritli.eu

On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> What do your autofs maps look like?

Here are the contents of my auto.misc:

gollum-media      -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/mnt/media
gollum-distfiles  -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/usr/portage/distfiles
gollum-www        -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/www
gollum-WebDav     -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/dav

> [snip: earlier quoted messages, patch and crash report]
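(For readers unfamiliar with autofs map syntax: each entry above is <key> <mount options> <server>:<path>, so accessing, for example, the gollum-media key under the configured mount point triggers an NFS mount of gollum.giritli.eu:/mnt/media. The mount-point prefix itself comes from the auto.master entry, which Mehmet quotes further down as "/mnt/autofs /etc/auto.misc --timeout=300 --ghost".)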
On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> Here are the contents of my auto.misc:
>
> gollum-media      -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/mnt/media
> gollum-distfiles  -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/usr/portage/distfiles
> gollum-www        -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/www
> gollum-WebDav     -rsize=8192,wsize=8192,soft,timeo=10,rw  gollum.giritli.eu:/var/dav

What, that's it? You're only using "/misc /etc/auto.misc" in the master map and you're having problems?

Are the crashes always the same?

How have you established that the BUG()s are in fact due to automount umounting mounts, and that the BUG()s correspond to NFS mounts previously mounted by autofs?

Is there any noise at all in the syslog?

Are you sure you're using a kernel with the dentry leak patch?

What sort of automounting load is happening on the machine, ie. frequency of mounts and umounts, and what timeout are you using?

The dentry leak patch got rid of the BUG()s I was seeing, but by that time I did have a couple of other patches. I still don't think the other patches made much difference for this particular case.

> [snip: earlier quoted messages, patch and crash report]
Reply-To: mgiritli@giritli.eu

Ian,

Well, here is a copy-paste of the crash that I am seeing *every time at shutdown* now that I am running with your patch. It is pretty much identical to the older ones, except that I see it every time, instead of just rare crashes... Now I am going to revert your patch and check for changes in the frequency of the crashes...

Mar 16 18:49:27 mordor kernel: [ 5696.670114] BUG: Dentry ffff88015007f300{i=2,n=donkey} still in use (1) [unmount of nfs 0:f]
Mar 16 18:49:27 mordor kernel: [ 5696.670134] ------------[ cut here ]------------
Mar 16 18:49:27 mordor kernel: [ 5696.670187] kernel BUG at fs/dcache.c:943!
Mar 16 18:49:27 mordor kernel: [ 5696.670237] invalid opcode: 0000 [#1] SMP
Mar 16 18:49:27 mordor kernel: [ 5696.670369] last sysfs file: /sys/devices/platform/it87.552/pwm1_enable
Mar 16 18:49:27 mordor kernel: [ 5696.670421] CPU 2
Mar 16 18:49:27 mordor kernel: [ 5696.670466] Modules linked in: ipt_LOG xt_tcpudp xt_state xt_mac iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle xt_multiport xt_mark xt_conntrack xt_connmark nf_conntr$
Mar 16 18:49:27 mordor kernel: [ 5696.671003]
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Pid: 21015, comm: umount.nfs Tainted: P 2.6.38-gentoo #3 Gigabyte Technology Co., Ltd. GA-790FXTA-UD5/GA-790FXTA-UD5
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RIP: 0010:[<ffffffff810e95c8>] [<ffffffff810e95c8>] shrink_dcache_for_umount_subtree+0x268/0x270
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RSP: 0018:ffff880219903e08 EFLAGS: 00010296
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RAX: 0000000000000066 RBX: ffff88015007f300 RCX: 000000000003ffff
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RDX: ffffffff81623888 RSI: 0000000000000046 RDI: ffffffff817509f8
Mar 16 18:49:27 mordor kernel: [ 5696.671003] RBP: ffff88023a1480c0 R08: 000000000001fa7c R09: 0000000000000006
Mar 16 18:49:27 mordor kernel: [ 5696.671003] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88015007f3a0
Mar 16 18:49:27 mordor kernel: [ 5696.671003] R13: ffff88023a14811c R14: ffff880219903f18 R15: ffff8802107d9640
Mar 16 18:49:27 mordor kernel: [ 5696.671003] FS: 00007ff9ec925700(0000) GS:ffff8800bfa80000(0000) knlGS:0000000000000000
Mar 16 18:49:27 mordor kernel: [ 5696.671003] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 16 18:49:27 mordor kernel: [ 5696.671003] CR2: 00007fd326cc2bc8 CR3: 0000000232272000 CR4: 00000000000006e0
Mar 16 18:49:27 mordor kernel: [ 5696.671003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 16 18:49:27 mordor kernel: [ 5696.671003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Process umount.nfs (pid: 21015, threadinfo ffff880219902000, task ffff88023ca16780)
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Stack:
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  ffff880219af1650 0000000000000000 ffff88023fc07128 ffff880219af1400
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  ffff88023a1486c0 ffff880219903f28 ffff88023a03a180 ffffffff810e9624
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  ffff88023f4f2480 ffff880219af1400 ffffffff81471460 ffffffff810d5ce5
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Call Trace:
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810e9624>] ? shrink_dcache_for_umount+0x54/0x60
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810d5ce5>] ? generic_shutdown_super+0x25/0x100
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810d5e49>] ? kill_anon_super+0x9/0x40
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff81179acd>] ? nfs_kill_super+0xd/0x20
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810d5ee3>] ? deactivate_locked_super+0x43/0x70
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810ef4d0>] ? release_mounts+0x70/0x90
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff810efa44>] ? sys_umount+0x314/0x3d0
Mar 16 18:49:27 mordor kernel: [ 5696.671003]  [<ffffffff8100243b>] ? system_call_fastpath+0x16/0x1b
Mar 16 18:49:27 mordor kernel: [ 5696.671003] Code: 8b 0a 31 d2 48 85 f6 74 07 48 8b 96 a8 00 00 00 48 05 50 02 00 00 48 89 de 48 c7 c7 e0 7f 52 81 48 89 04 24 31 c0 e8 f1 f0 35 00 <0f> 0b eb fe 0f 0b eb fe 55 53 48 89 fb 48 8d 7f 68 48$
Mar 16 18:49:27 mordor kernel: [ 5696.678048] RIP [<ffffffff810e95c8>] shrink_dcache_for_umount_subtree+0x268/0x270
Mar 16 18:49:27 mordor kernel: [ 5696.678048] RSP <ffff880219903e08>
Mar 16 18:49:27 mordor kernel: [ 5696.678552] ---[ end trace 24269d237584cd43 ]---
Mar 16 18:49:28 mordor mountd[4362]: Caught signal 15, un-registering and exiting.
Mar 16 18:49:28 mordor kernel: [ 5697.256066] nfsd: last server has exited, flushing export cache
Mar 16 18:49:28 mordor kernel: [ 5697.272903] nfsd: last server has exited, flushing export cache
Reply-To: mgiritli@giritli.eu

On Wed, 2011-03-16 at 23:44 +0800, Ian Kent wrote:
> On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> > On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> > > On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> > > > Ian,
> > > >
> > > > I am having much more frequent crashes now. I haven't been able to
> > > > cleanly reboot my machine yet and I have tried three times so far. Init
> > > > scripts fail to unmount the file systems and I have to reboot manually.
> > >
> > > What do your autofs maps look like?
> >
> > Here is the contents of my auto.misc:
> >
> > gollum-media -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/mnt/media
> > gollum-distfiles -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/usr/portage/distfiles
> > gollum-www -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/www
> > gollum-WebDav -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/dav
>
> What, that's it, and you're only using "/misc /etc/auto.misc" in the
> master map, and you're having problems?

yes

> Are the crashes always the same?

identical

> How have you established that the BUG()s are in fact due to automount
> umounting mounts and that the BUG()s correspond to NFS mounts previously
> mounted by autofs?

I haven't established anything. However, that's the only way I mount nfs,
and my file manager hangs, init scripts hang when trying to unmount...

> Is there any noise at all in the syslog?

nothing unusual

> Are you sure you're using a kernel with the dentry leak patch?

yes

> What sort of automounting load is happening on the machine, ie.
> frequency of mounts and umounts, and what timeout are you using?

from auto.master:

/mnt/autofs /etc/auto.misc --timeout=300 --ghost

Not very much. Let's say 2-3 times every hour for each mount point.

> The dentry leak patch got rid of the BUG()s I was seeing but by that
> time I did have a couple of other patches. I still don't think the other
> patches made much difference for this particular case.
>
> > > > On Wed, 2011-03-16 at 10:32 +0800, Ian Kent wrote:
> > > > > On Wed, 2011-03-16 at 01:54 +0200, Mehmet Giritli wrote:
> > > > > > The missing piece is as follows:
> > > > > >
> > > > > > Mar 15 22:37:38 mordor kernel: [ 1860.156114] BUG: Dentry
> > > > > > ffff88023f96e600{i=25f56f,n=} still in use (1) [unmount of nfs 0:f]
> > > > >
> > > > > This might be the same problem I saw and described in rc1.
> > > > > However, for me the fs in the BUG() report was autofs.
> > > > > Hopefully that just means my autofs setup is different.
> > > > >
> > > > > At the moment I believe a dentry leak Al Viro spotted is the cause.
> > > > > Please try this patch.
> > > > >
> > > > > autofs4 - fix dentry leak in autofs4_expire_direct()
> > > > >
> > > > > From: Ian Kent <raven@themaw.net>
> > > > >
> > > > > There is a missing dput() when returning from autofs4_expire_direct()
> > > > > when we see that the dentry is already a pending mount.
> > > > >
> > > > > Signed-off-by: Ian Kent <raven@themaw.net>
> > > > > ---
> > > > >
> > > > >  fs/autofs4/expire.c |    7 +++----
> > > > >  1 files changed, 3 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
> > > > > index c896dd6..c403abc 100644
> > > > > --- a/fs/autofs4/expire.c
> > > > > +++ b/fs/autofs4/expire.c
> > > > > @@ -290,10 +290,8 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > > >  	spin_lock(&sbi->fs_lock);
> > > > >  	ino = autofs4_dentry_ino(root);
> > > > >  	/* No point expiring a pending mount */
> > > > > -	if (ino->flags & AUTOFS_INF_PENDING) {
> > > > > -		spin_unlock(&sbi->fs_lock);
> > > > > -		return NULL;
> > > > > -	}
> > > > > +	if (ino->flags & AUTOFS_INF_PENDING)
> > > > > +		goto out;
> > > > >  	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
> > > > >  		struct autofs_info *ino = autofs4_dentry_ino(root);
> > > > >  		ino->flags |= AUTOFS_INF_EXPIRING;
> > > > > @@ -301,6 +299,7 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
> > > > >  		spin_unlock(&sbi->fs_lock);
> > > > >  		return root;
> > > > >  	}
> > > > > +out:
> > > > >  	spin_unlock(&sbi->fs_lock);
> > > > >  	dput(root);
> > > > >
> > > > > (sorry for the inconvenience Andrew)
> > > > >
> > > > > > On Tue, 2011-03-15 at 14:24 -0700, Andrew Morton wrote:
> > > > > > > (switched to email. Please respond via emailed reply-to-all, not via
> > > > > > > the bugzilla web interface).
> > > > > > >
> > > > > > > Seems that we have a nasty involving autofs, nfs and the VFS.
> > > > > > >
> > > > > > > Mehmet, the kernel should have printed some diagnostics prior to doing
> > > > > > > the BUG() call:
> > > > > > >
> > > > > > > 	if (dentry->d_count != 0) {
> > > > > > > 		printk(KERN_ERR
> > > > > > > 		       "BUG: Dentry %p{i=%lx,n=%s}"
> > > > > > > 		       " still in use (%d)"
> > > > > > > 		       " [unmount of %s %s]\n",
> > > > > > > 		       dentry,
> > > > > > > 		       dentry->d_inode ?
> > > > > > > 		       dentry->d_inode->i_ino : 0UL,
> > > > > > > 		       dentry->d_name.name,
> > > > > > > 		       dentry->d_count,
> > > > > > > 		       dentry->d_sb->s_type->name,
> > > > > > > 		       dentry->d_sb->s_id);
> > > > > > > 		BUG();
> > > > > > > 	}
> > > > > > >
> > > > > > > Please find those in the log and email them to us - someone might find
> > > > > > > it useful.
> > > > > > >
> > > > > > > On Tue, 15 Mar 2011 21:02:23 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> > > > > > >
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=30882
> > > > > > > >
> > > > > > > > --- Comment #4 from Mehmet Giritli <mehmet@giritli.eu> 2011-03-15 21:02:22 ---
> > > > > > > > Here is that crash happening again, the system was NOT running
> > > > > > > > overclocked or anything...
> > > > > > > >
> > > > > > > > [snip: identical oops already quoted earlier in this report]
On Wed, 2011-03-16 at 19:47 +0200, Mehmet Giritli wrote:
> Well, here is a copy-paste of the crash that I am seeing *every time at
> shutdown* now that I am running with your patch. It is pretty much
> identical to the older ones... except that I see it every time, instead
> of just rare crashes...

That's odd because, with the maps you're using, the patch changes code
that's never executed.
(In reply to comment #0)
> From the crash report which I promise to provide here in a while, it seems
> like it has to do with nfs + automount that I use. For now let me attach
> dmesg and will try to log the dmesg with a crash in it. After compiling the
> kernel without the option in question, there are no more crashes.

Umm .. what option?
On Wed, 2011-03-16 at 17:50 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #14 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org> 2011-03-16 17:50:54 ---
> Reply-To: mgiritli@giritli.eu
>
> On Wed, 2011-03-16 at 23:44 +0800, Ian Kent wrote:
> > On Wed, 2011-03-16 at 17:29 +0200, Mehmet Giritli wrote:
> > > On Wed, 2011-03-16 at 23:21 +0800, Ian Kent wrote:
> > > > On Wed, 2011-03-16 at 16:27 +0200, Mehmet Giritli wrote:
> > > > > Ian,
> > > > >
> > > > > I am having much more frequent crashes now. I haven't been able to
> > > > > cleanly reboot my machine yet and I have tried three times so far. Init
> > > > > scripts fail to unmount the file systems and I have to reboot manually.

I expect you have had to revert to an earlier kernel.
Hopefully you will be willing to test some more when you get time.

As I said, with the maps you're using the change I sent won't make any
difference.

The other thing is that the autofs Connectathon test that I also use for
testing worked OK from rc1, and it has a bunch of stuff in it just like
your map. So what you're seeing is a bit of a surprise.

There must be something in your environment that is causing some
unexpected usage pattern. The problem is working out what that is.
Perhaps we could enable autofs debug logging and see if that gives us
any information about it. If you do enable debug logging in the autofs
configuration, remember to ensure that syslog is logging daemon.*
somewhere or we won't get the full debug log output.

> > > > What do your autofs maps look like?
> > >
> > > Here is the contents of my auto.misc:
> > >
> > > gollum-media -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/mnt/media
> > > gollum-distfiles -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/usr/portage/distfiles
> > > gollum-www -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/www
> > > gollum-WebDav -rsize=8192,wsize=8192,soft,timeo=10,rw gollum.giritli.eu:/var/dav
> >
> > What, that's it, and you're only using "/misc /etc/auto.misc" in the
> > master map, and you're having problems?
>
> yes
>
> > Are the crashes always the same?
>
> identical
>
> > How have you established that the BUG()s are in fact due to automount
> > umounting mounts and that the BUG()s correspond to NFS mounts previously
> > mounted by autofs?
>
> I haven't established anything. However, that's the only way I mount nfs,
> and my file manager hangs, init scripts hang when trying to unmount...

The file manager hang is also interesting because I only ever saw BUG()s
at shutdown after all activity was finished. It sounds like the init
scripts hang because of the BUG() at shutdown. Could you try and get a
gdb backtrace of automount when the file manager hangs, or has the
machine already BUG()ed? Use "thr a a bt" to get the backtrace. Note
that a backtrace of automount without gdb debug information isn't useful,
so you'll need to find out how your distribution makes applications with
debugging information available and use that.

One thing that tends to happen in graphical environments is file system
scanning to keep file views up to date. This tends to happen when the
mount table changes, causing immediate expire to mount activity.

In 2.6.38-rc the concurrent merge of the vfs-scale patch series along
with the vfs-automount series caught me by surprise. Both of these patch
series make fairly significant changes to the same general area of the
VFS and that has caused problems.
I've been working on this for a while now and I have a series of patches
(although not finished yet) that might help your situation. I've just
about finished testing against 2.6.38 now (the previous run was against
rc7), so I could send a combined diff of those changes for you to test if
you are still willing.

Of course this assumes that it is an autofs problem, which might not be
the case.

> > Is there any noise at all in the syslog?
>
> nothing unusual
>
> > Are you sure you're using a kernel with the dentry leak patch?
>
> yes
>
> > What sort of automounting load is happening on the machine, ie.
> > frequency of mounts and umounts, and what timeout are you using?
>
> from auto.master:
>
> /mnt/autofs /etc/auto.misc --timeout=300 --ghost
>
> Not very much. Let's say 2-3 times every hour for each mount point.

Do any of the exports from the NFS server that you're using in the map
entries above have nohide exports within them? That is to say, is NFS
doing any of its own automounting?

Ian
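For anyone following along, Ian's two suggestions translate roughly to the
following. Paths, package names and config locations vary by distribution,
so treat everything below as assumptions rather than exact instructions:

    # Backtrace of the running automount daemon; the autofs debuginfo/dbg
    # package must be installed for the symbols to be meaningful.
    gdb -p "$(pidof automount)"
    (gdb) thread apply all bt        # the long form of "thr a a bt"

    # Turn on autofs debug logging (many distributions read LOGGING from a
    # sysconfig/default file), then make sure syslog records daemon.*:
    echo 'LOGGING="debug"' >> /etc/sysconfig/autofs
    echo 'daemon.*    /var/log/daemon.log' >> /etc/syslog.conf
    /etc/init.d/autofs restart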
(In reply to comment #16)
> (In reply to comment #0)
> > From the crash report which I promise to provide here in a while, it seems
> > like it has to do with nfs + automount that I use. For now let me attach
> > dmesg and will try to log the dmesg with a crash in it. After compiling the
> > kernel without the option in question, there are no more crashes.
>
> Umm .. what option?

Initially I thought the problem was related to the CONFIG_SCHED_AUTOGROUP
option that I turned on in the kernel, because that is when I first saw this
bug. But that is perhaps just a coincidence, now that I am having crashes
without it.
> I expect you have had to revert to an earlier kernel.
> Hopefully you will be willing to test some more when you get time.
>
> As I said, with the maps you're using the change I sent won't make any
> difference.
>
> The other thing is that the autofs Connectathon test that I also use for
> testing worked OK from rc1, and it has a bunch of stuff in it just like
> your map. So what you're seeing is a bit of a surprise.
>
> There must be something in your environment that is causing some
> unexpected usage pattern. The problem is working out what that is.
> Perhaps we could enable autofs debug logging and see if that gives us
> any information about it. If you do enable debug logging in the autofs
> configuration, remember to ensure that syslog is logging daemon.*
> somewhere or we won't get the full debug log output.

I have been testing the latest mainline kernel 2.6.38, partially disabling
things and trying to find some pattern. In the meantime, I upgraded my
network to NFSv4, hoping that might change things. As I found out this
morning, it didn't. I still have crashes, but less frequently. Now I am
trying it without autofs, using manual nfs mounting. Will report back after
a while or whenever I get crashes. Perhaps we can focus on autofs if I
don't get any crashes from this point on.

> The file manager hang is also interesting because I only ever saw BUG()s
> at shutdown after all activity was finished. It sounds like the init
> scripts hang because of the BUG() at shutdown. Could you try and get a
> gdb backtrace of automount when the file manager hangs, or has the
> machine already BUG()ed? Use "thr a a bt" to get the backtrace. Note
> that a backtrace of automount without gdb debug information isn't useful,
> so you'll need to find out how your distribution makes applications with
> debugging information available and use that.
>
> One thing that tends to happen in graphical environments is file system
> scanning to keep file views up to date. This tends to happen when the
> mount table changes, causing immediate expire to mount activity.
>
> In 2.6.38-rc the concurrent merge of the vfs-scale patch series along
> with the vfs-automount series caught me by surprise. Both of these patch
> series make fairly significant changes to the same general area of the
> VFS and that has caused problems. I've been working on this for a while
> now and I have a series of patches (although not finished yet) that
> might help your situation. I've just about finished testing against
> 2.6.38 now (the previous run was against rc7), so I could send a combined
> diff of those changes for you to test if you are still willing.

Yep, send them this way please... no problem testing...

> Of course this assumes that it is an autofs problem, which might not be
> the case.

I think my current test mode (disabling autofs) will answer that at least..

> Do any of the exports from the NFS server that you're using in the map
> entries above have nohide exports within them? That is to say, is NFS
> doing any of its own automounting?

Yes, they do. As of yesterday, I upgraded to NFSv4. My new exports look
like this:

/exports 192.168.2.0/24(rw,fsid=0,insecure,no_subtree_check,async)
/exports/media 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
/exports/media/backups 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
/exports/media/donkey 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
/exports/distfiles 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide,no_root_squash)
/exports/www 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)

The old one was as follows (NFSv3):

/mnt/media 192.168.2.0/24(async,rw,no_subtree_check,nohide,crossmnt)
/usr/portage/distfiles 192.168.2.0/24(async,rw,no_subtree_check,no_root_squash,nohide,crossmnt)
/var/www/ 192.168.2.3(async,rw,no_subtree_check,no_root_squash,nohide)

But since I am having identical crashes with both, the distinction between
the two versions of nfs is perhaps irrelevant.

/mnt/media contains multiple sub-mounts. Some are LVM volumes and some are
just plain ext4 partitions. /usr/portage/distfiles is also a (single) mount
point for an ext4 partition. /var/www does not contain any sub-mounts.

I see no errors or anything unusual in my server logs.
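Those nohide/crossmnt exports mean the client crosses into separate
superblocks as it walks down the tree - the same kind of automatic submount
behavior the 2.6.38 VFS changes touch. A quick way to watch it happen from
a client (the mount point and paths below are made up for illustration):

    # Mount the NFSv4 pseudo-root, then step into a nohide export:
    mount -t nfs4 gollum.giritli.eu:/ /mnt/gollum
    ls /mnt/gollum/media/donkey      # crossing in triggers the submount
    grep nfs4 /proc/mounts           # the submount shows up as its own entry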
On Fri, 2011-03-18 at 11:29 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #19 from Mehmet Giritli <mehmet@giritli.eu> 2011-03-18 11:29:21 ---
>
> > The other thing is that the autofs Connectathon test that I also use for
> > testing worked OK from rc1, and it has a bunch of stuff in it just like
> > your map. So what you're seeing is a bit of a surprise.
> >
> > There must be something in your environment that is causing some
> > unexpected usage pattern. The problem is working out what that is.
> > Perhaps we could enable autofs debug logging and see if that gives us
> > any information about it. If you do enable debug logging in the autofs
> > configuration, remember to ensure that syslog is logging daemon.*
> > somewhere or we won't get the full debug log output.
>
> I have been testing the latest mainline kernel 2.6.38, partially disabling
> things and trying to find some pattern. In the meantime, I upgraded my
> network to NFSv4, hoping that might change things. As I found out this
> morning, it didn't. I still have crashes, but less frequently. Now I am
> trying it without autofs, using manual nfs mounting. Will report back after
> a while or whenever I get crashes. Perhaps we can focus on autofs if I
> don't get any crashes from this point on.

That may show a difference, but I expect you won't see the BUG()s at all.
I believe the problem is related to concurrent expire to mount, and even
if the problems go away we still can't say that the issue is entirely
autofs, because autofs will just facilitate the expire to mount behavior
that triggers the problem.

> > The file manager hang is also interesting because I only ever saw BUG()s
> > at shutdown after all activity was finished. It sounds like the init
> > scripts hang because of the BUG() at shutdown. Could you try and get a
> > gdb backtrace of automount when the file manager hangs, or has the
> > machine already BUG()ed? Use "thr a a bt" to get the backtrace. Note
> > that a backtrace of automount without gdb debug information isn't useful,
> > so you'll need to find out how your distribution makes applications with
> > debugging information available and use that.

Can you offer any more info on this?

> > In 2.6.38-rc the concurrent merge of the vfs-scale patch series along
> > with the vfs-automount series caught me by surprise. Both of these patch
> > series make fairly significant changes to the same general area of the
> > VFS and that has caused problems. I've been working on this for a while
> > now and I have a series of patches (although not finished yet) that
> > might help your situation. I've just about finished testing against
> > 2.6.38 now (the previous run was against rc7), so I could send a combined
> > diff of those changes for you to test if you are still willing.
>
> Yep, send them this way please... no problem testing...

OK, I'll send individual patches in a series so that you can see the
individual patch descriptions and have some idea of what they might be
fixing.

> > Of course this assumes that it is an autofs problem, which might not be
> > the case.
>
> I think my current test mode (disabling autofs) will answer that at least..

I suspect it won't really do that, although it may appear so. The other
problem is that by changing autofs I may well be hiding problems in NFS
exposed by the 2.6.38 path walking changes.

> > Do any of the exports from the NFS server that you're using in the map
> > entries above have nohide exports within them? That is to say, is NFS
> > doing any of its own automounting?
>
> Yes, they do. As of yesterday, I upgraded to NFSv4. My new exports look
> like this:
>
> /exports 192.168.2.0/24(rw,fsid=0,insecure,no_subtree_check,async)
> /exports/media 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
> /exports/media/backups 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
> /exports/media/donkey 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)
> /exports/distfiles 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide,no_root_squash)
> /exports/www 192.168.2.0/24(rw,insecure,no_subtree_check,async,nohide)

Right, so that has a chance of exposing problems with the NFS path walk.

Ian
I have the same bug running a Ubuntu-bowdlerised/augmented/mangled 2.6.38
kernel, build 2.6.38-7-generic, running autofs (5.0.5-0ubuntu4) with a
60-second timeout on an nfs4-only network. The bug does not always hit, but
once it does, the system has to be rebooted to get nfs4 services back.

These are the relevant lines from auto.master/auto.ufs:

auto.master:
/net /etc/auto.ufs --timeout=60

auto.ufs:
mntstr="-fstype=nfs4,noatime,async,proto=tcp,retry=60,hard,intr $server:/"

The auto.ufs script mounts local shares as bind mounts to make it possible
to use the same paths network-wide without incurring speed penalties.
Hackish, but it works (see the sketch after this comment).

This is a typical export:

/export/src 192.168.1.0/24(rw,nohide,insecure,no_subtree_check,async)

Some exports use sync, most are async. All except for the root mount use
nohide and no_subtree_check.

The kernel oopses are virtually identical to the ones mentioned in this
bug report. Tests are performed on a Thinkpad T23 with a 1.2GHz PIII-M.

Anything you want me to try? I run recent git pulls on several other
machines; I could use that here as well.
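As a sketch of what such an executable map can look like - hypothetical,
not the reporter's actual auto.ufs - note that autofs program maps are
invoked with the lookup key as the first argument and must print a map
entry on stdout:

    #!/bin/sh
    # Hypothetical auto.ufs: serve the local host's share as a bind
    # mount, everything else over NFSv4 with the options quoted above.
    key="$1"
    if [ "$key" = "$(hostname)" ]; then
        echo "-fstype=bind :/export"
    else
        echo "-fstype=nfs4,noatime,async,proto=tcp,retry=60,hard,intr ${key}:/"
    fi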
As to whether this is an autofs-related problem, I'd wager to say it is most
likely not. The fact that it has thus far surfaced in combination with
autofs is most likely related to the increased umount frequency in an
autofs-managed system compared to a manual mount configuration. As an
experiment I have changed the dismount timeout in auto.master to 3600
seconds. The bug has yet to rear its ugly head, which is remarkable, as the
system has now been running without problems for more than 4 hours.
(In reply to comment #22)
> As to whether this is an autofs-related problem, I'd wager to say it is most
> likely not. The fact that it has thus far surfaced in combination with
> autofs is most likely related to the increased umount frequency in an
> autofs-managed system compared to a manual mount configuration. As an
> experiment I have changed the dismount timeout in auto.master to 3600
> seconds. The bug has yet to rear its ugly head, which is remarkable, as the
> system has now been running without problems for more than 4 hours.

LOL, we can't say it isn't autofs either.

It's true that it is most likely happening during path walks that occur
concurrently with mount/umount activity. I've studied that code a lot over
the last several weeks and I can't see any obvious problem.

The reason autofs got so much attention was that there is a dentry leak
which causes a similar BUG(), but it is in an area of code that is never
executed when using maps like the ones here.

The fact is that there were two significant patch series merged at the same
time in 2.6.38: the vfs-scale and the vfs-automount series. Both made
considerable changes to the same area of the VFS. Each was tested
independently (certainly the autofs changes were), but together they
exposed a few problems. Once the initial issues were dealt with, though,
none were as easy to reproduce as what has been seen here. There has
continued to be quite a bit of churn in the VFS as a result of these
merges.

The question that really needs to be answered is: what makes this so easy
to reproduce for the reporters here? What's worse is that the crash
probably occurs well after the event that led to the increased reference
count which causes the BUG(), so we have no way of knowing where to look or
where to insert debug prints.

Anyway, it may be a good idea to try the current git, since there has been
quite a bit of change to the path walking code as well as changes to the
mount locking. Be aware that there was also one report of a lockup when
umounting a tmpfs fs which hasn't been resolved.

Ian
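For those wanting to act on that suggestion, fetching and building the
then-current mainline went something like this (the tree path below was the
mainline location of the day and is an assumption, as is reusing the
running kernel's config):

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    cd linux-2.6
    cp "/boot/config-$(uname -r)" .config
    make oldconfig                   # answer the prompts for new options
    make -j"$(nproc)" && make modules_install install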
I've been getting bit by the exact same bug and have been bisecting for the
past couple of weeks. It's slow going, as it can sometimes take a day for
the BUG() to show up (though it can also at times take 10 minutes). And
I've also seen it more than once where something was good after a day and
then BUG()'d later on, just to make things more complicated. So the upshot
is that while I feel confident enough about this latest batch of bisecting
to post it here, I wouldn't bet my life on it. I hope this isn't a case
where bisecting just shows where the bug gets exposed but not where it
actually got planted :)

Incidentally, I tried the patch from the top of this thread and it didn't
seem to make a difference. I still got bit.

I've posted on the linux-fsdevel thread that Jeff Layton started about it,
http://www.spinics.net/lists/linux-nfs/msg20280.html
if you need more details on my setup (though I'll be happy to provide
anything else you need). Though in that thread you'll see that I'm not
using autofs explicitly, the Netapp GX cluster NFS appears to use autofs to
do the implicit submounts (I'm not 100% sure that's the correct
terminology, so hopefully you know what I mean).

Here's my bisect log, ending up at commit
e61da20a50d21725ff27571a6dff9468e4fb7146:

git bisect start 'fs'
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [7c955fca3e1d8132982148267d9efcafae849bb6] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
git bisect good 7c955fca3e1d8132982148267d9efcafae849bb6
# good: [c32b0d4b3f19c2f5d29568f8b7b72b61693f1277] fs/mpage.c: consolidate code
git bisect good c32b0d4b3f19c2f5d29568f8b7b72b61693f1277
# bad: [f8206b925fb0eba3a11839419be118b09105d7b1] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
git bisect bad f8206b925fb0eba3a11839419be118b09105d7b1
# good: [a8f2800b4f7b76cecb7209cb6a7d2b14904fc711] nfsd4: fix callback restarting
git bisect good a8f2800b4f7b76cecb7209cb6a7d2b14904fc711
# bad: [6651149371b842715906311b4631b8489cebf7e8] autofs4: Clean up autofs4_free_ino()
git bisect bad 6651149371b842715906311b4631b8489cebf7e8
# good: [0ad53eeefcbb2620b6a71ffdaad4add20b450b8b] afs: add afs_wq and use it instead of the system workqueue
git bisect good 0ad53eeefcbb2620b6a71ffdaad4add20b450b8b
# good: [01c64feac45cea1317263eabc4f7ee1b240f297f] CIFS: Use d_automount() rather than abusing follow_link()
git bisect good 01c64feac45cea1317263eabc4f7ee1b240f297f
# good: [b5b801779d59165c4ecf1009009109545bd1f642] autofs4: Add d_manage() dentry operation
git bisect good b5b801779d59165c4ecf1009009109545bd1f642
# bad: [e61da20a50d21725ff27571a6dff9468e4fb7146] autofs4: Clean up inode operations
git bisect bad e61da20a50d21725ff27571a6dff9468e4fb7146
# good: [8c13a676d5a56495c350f3141824a5ef6c6b4606] autofs4: Remove unused code
git bisect good 8c13a676d5a56495c350f3141824a5ef6c6b4606
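For reference, a log like the one above is the output of a loop along
these lines - the test step is whatever reproduces the BUG(), which as
noted can take anywhere from minutes to a day per candidate:

    git bisect start -- fs           # restrict suspects to fs/, as in the log
    git bisect good v2.6.37
    git bisect bad v2.6.38-rc1
    # build and boot each candidate, run the NFS umount workload, then:
    git bisect good                  # or: git bisect bad
    git bisect log > bisect.log      # produces a log like the one above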
I ended up opening a bug against Fedora since I wasn't aware of this bug on
kernel.org. I have a reproducer in that bug, but it's a bit convoluted and
I haven't had a lot of luck making it simpler:

https://bugzilla.redhat.com/show_bug.cgi?id=708039

I did a bisect with my reproducer and was able to bisect it down to:

commit 36d43a43761b004ad1879ac21471d8fc5f3157ec
Author: David Howells <dhowells@redhat.com>
Date:   Fri Jan 14 18:45:42 2011 +0000

    NFS: Use d_automount() rather than abusing follow_link()

    Make NFS use the new d_automount() dentry operation rather than abusing
    follow_link() on directories.

    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Acked-by: Ian Kent <raven@themaw.net>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If it's of any help, I'm affected by this bug, too (2.6.39.1, nfsv4,
AUTOFS4, autofs (userspace) 5.0.4). After the bug is triggered I'm still
able to mount nfsv4 shares through plain 'mount -t nfs -o nfsv4 ...'.

[63595.511635] ------------[ cut here ]------------
[63595.511654] kernel BUG at fs/dcache.c:925!
[63595.511667] invalid opcode: 0000 [#1] PREEMPT SMP
[63595.511687] last sysfs file: /sys/devices/platform/coretemp.1/temp1_input
[63595.511704] CPU 0
[63595.511711] Modules linked in: btusb scsi_wait_scan pata_marvell
[63595.511738]
[63595.511744] Pid: 11449, comm: umount.nfs4 Not tainted 2.6.39.1 #3 MICRO-STAR INTERNATIONAL CO.,LTD MS-7345/MS-7345
[63595.511779] RIP: 0010:[<ffffffff810f1f83>]  [<ffffffff810f1f83>] shrink_dcache_for_umount_subtree+0x263/0x270
[63595.511809] RSP: 0018:ffff88006dac3df8  EFLAGS: 00010296
[63595.511828] RAX: 0000000000000067 RBX: ffff8800b60c39c0 RCX: 000000000003ffff
[63595.511848] RDX: ffffffff81ada388 RSI: 0000000000000046 RDI: ffffffff81c339f8
[63595.511867] RBP: ffff8800b60c39c0 R08: 0000000000000000 R09: 0000000000000000
[63595.511886] R10: 000000000000000a R11: 0000000000000001 R12: ffff8800b60c3a1c
[63595.511905] R13: dead000000200200 R14: dead000000100100 R15: ffff88006dac3f28
[63595.511926] FS:  00007fe9cd733700(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[63595.511951] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63595.511966] CR2: 00007fe9cd0d1b30 CR3: 000000006da0a000 CR4: 00000000000406f0
[63595.511986] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[63595.512006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[63595.512026] Process umount.nfs4 (pid: 11449, threadinfo ffff88006dac2000, task ffff880071e72080)
[63595.512055] Stack:
[63595.512062]  ffff880096cb2658 ffff8800c98dd0b0 ffff880096cb2400 ffff8800b60c3900
[63595.512089]  ffff8800b60c395c ffff8800b60c3300 0000000000000000 ffffffff810f2f79
[63595.512117]  ffff880096cb2400 ffffffff81862de0 ffff88006dac3f28 ffffffff810ddc35
[63595.512147] Call Trace:
[63595.512155]  [<ffffffff810f2f79>] ? shrink_dcache_for_umount+0x59/0x70
[63595.512173]  [<ffffffff810ddc35>] ? generic_shutdown_super+0x25/0x100
[63595.512190]  [<ffffffff810ddd99>] ? kill_anon_super+0x9/0x50
[63595.512206]  [<ffffffff811de291>] ? nfs4_kill_super+0x31/0x90
[63595.512221]  [<ffffffff810de063>] ? deactivate_locked_super+0x43/0x70
[63595.512238]  [<ffffffff810f9bd7>] ? release_mounts+0x67/0x90
[63595.512253]  [<ffffffff810fa173>] ? sys_umount+0x1b3/0x3d0
[63595.512269]  [<ffffffff8183693b>] ? system_call_fastpath+0x16/0x1b
[63595.512285] Code: 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 07 48 8b 90 a8 00 00 00 48 89 34 24 48 c7 c7 a8 d8 9d 81 48 89 de 31 c0 e8 49 10 74 00 <0f> 0b 0f 0b 66 0f 1f 84 00 00 00 00 00 41 55 4c 8d 6f 5c 41 54
[63595.512438] RIP  [<ffffffff810f1f83>] shrink_dcache_for_umount_subtree+0x263/0x270
[63595.512460] RSP <ffff88006dac3df8>
[63595.517799] ---[ end trace 11c7e3bfba462d9a ]---
Created attachment 62262 [details]
Patch - VFS: Fix vfsmount overput on simultaneous automount

If anyone cc'd on this bug is still interested in a resolution for this
problem, please try this patch. It should apply to 2.6.38 and 2.6.39 but
may report some line number offset mismatches. If that is really a problem
(in that the patch actually doesn't apply, or you are concerned it doesn't
apply correctly), let me know, along with the specific kernel version, and
I'll produce one that does apply.
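Applying an attachment like this typically goes as follows; the filename is
made up here, since the patch is only identified as attachment 62262:

    cd linux-2.6.39.1
    # a dry run shows any offset/fuzz messages without touching the tree
    patch -p1 --dry-run < vfs-fix-vfsmount-overput.patch
    patch -p1 < vfs-fix-vfsmount-overput.patch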
I applied the patch to 2.6.39.1. Got a hard lockup (could not even Magic-SysRq). Don't know if it is, but the dying breath looks related to this bug. Here's a picture of it: http://obfusc.gavagai.nl/nfscrash.jpg
(In reply to comment #28)
> I applied the patch to 2.6.39.1. Got a hard lockup (could not even
> Magic-SysRq). Don't know if it is, but the dying breath looks related to
> this bug. Here's a picture of it:
>
> http://obfusc.gavagai.nl/nfscrash.jpg

Thanks for the testing. It's hard to see how this happens or to know for
sure if it is related to the original problem. We'll need to wait and see
as others look into it.
Ian, unfortunately I will be away from my computer for at least two months.
Otherwise I would test your patch immediately. Sorry, I can't help right
now.
(In reply to comment #28)
> I applied the patch to 2.6.39.1. Got a hard lockup (could not even
> Magic-SysRq). Don't know if it is, but the dying breath looks related to
> this bug. Here's a picture of it:
>
> http://obfusc.gavagai.nl/nfscrash.jpg

This looks like an entirely different issue, but I suppose it's possible
this is the end result of another refcount imbalance. Can you reproduce
this at will with this patch installed?
(In reply to comment #28)
> I applied the patch to 2.6.39.1. Got a hard lockup (could not even
> Magic-SysRq). Don't know if it is, but the dying breath looks related to
> this bug. Here's a picture of it:
>
> http://obfusc.gavagai.nl/nfscrash.jpg

It's just been pointed out to me that the screen capture has at least one
other oops before the one on the screen. So that capture isn't all that
useful, because it likely isn't the root cause of the problem. I suppose
none of it finds its way to syslog, right?
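One common way to capture an oops that never reaches syslog is netconsole,
which streams printk output over UDP to another machine. The addresses,
ports and interface below are placeholders for an example LAN:

    # On the crashing machine (sender -> 192.168.1.10:6666):
    modprobe netconsole netconsole=6665@192.168.1.5/eth0,6666@192.168.1.10/00:11:22:33:44:55
    # On the receiving machine:
    nc -u -l 6666 | tee oops.log    # some netcat variants want: nc -u -l -p 6666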
I don't know about the others, but I have been running 3.0.6 for a while
now, and this bug doesn't occur anymore.
I haven't had a crash like this for many months. I've been running ≥ 3.0.6
since October 19th and actually can't remember having a crash while running
any 3.x kernel (which I have been doing since August 16th).