Bug 48421
Summary: | BUG: __rpc_pipefs_event: failed to remove clnt dir | ||
---|---|---|---|
Product: | File System | Reporter: | Cyril B. (cbay) |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alan, florian, skinsbursky |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.6.0, 3.5.x | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info
[PATCH v2] SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info SUNRPC: continue over the look of client instead of break on PiepFS event |
Description
Cyril B.
2012-10-05 11:49:35 UTC
Happens on 3.4.18 as well. I don't get the bug on 3.3.8, and still get it on 3.4.4. Anything I can do to help? Created attachment 85941 [details]
SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info
Does the above patch help? It fixes a couple of pointer validity issues that may be pertinent to the above Oops. Created attachment 85951 [details]
[PATCH v2] SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info
The previous patch had a theoretical race if 2 processes within the same
net namespace are trying to mount the rpc_pipefs filesystem, and the first
attempt fails.
Thanks, I can confirm that it fixes the problem (tested on 3.4.18). Thanks for testing! Was this the second patch that you tested? Yes, I didn't even try the first one. Perfect. Thanks again! Marking this bug as resolved. I will close it once the patch is upstream. Whoa, I've just hit the bug again. Either I tested something wrong yesterday, or the bug disappeared for no reason and reappeared. I'll have to recompile kernels and retry everything. I'm very sorry. Okay, so the bug is definitely not fixed, but it doesn't happen every single time I reboot, as I initially thought. When I tested the patched kernel yesterday for the first time, the bug didn't occur, which is why I thought it was fixed. I've just rebooted a patched kernel (3.4.18) several times in a row, and sometimes I get the bug, sometimes I don't. I couldn't find a pattern; all I'm doing is logging in over SSH and typing 'reboot'. More details: I'm on a Debian Lenny with the following versions: portmap 6.0-9 nfs-common 1:1.2.0-2 nfs-kernel-server 1:1.2.0-2 This machine is both an NFS server and client. It actually mounts 4 different NFSv4 filesystems, and I get: NFS: __rpc_pipefs_event: failed to remove clnt dir! actually 4 times before the stack trace appears. I think I have found a pattern: since I normally use kexec, 'reboot' will actually do a kexec-reboot. After a cold reboot (going through POST), a 'reboot' will make the kernel crash. After a kexec-reboot, a 'reboot' won't hit the bug. Manually umounting the NFS filesystems before rebooting will always avoid the bug. I have no idea why kexec doesn't trigger the bug. Maybe because of delays? A kexec-reboot is very fast (~ 20 seconds), whereas a cold reboot is longer (~ 2 minutes). At least I have a way to reliably test new patches (unless my pattern theory is pure luck, but I have tested it more than 4 times). Oops, I hadn't changed the status. I've finally managed to bisect it, here's the culprit: 7aab449e5a2ebfa9c5116e87e16536bc4195e4de SUNRPC: skip clients with program without PipeFS entries This is most probably a race... Could you provide more crash traces here? Traces look quite similar to me. Here are a few more, from my bisect session: Dec 3 11:10:54 spare1 kernel: [ 83.562889] Kernel BUG at ffffffff81119c1e [verbose debug info unavailable] Dec 3 11:10:54 spare1 kernel: [ 83.562961] invalid opcode: 0000 [#1] SMP Dec 3 11:10:54 spare1 kernel: [ 83.563119] CPU 5 Dec 3 11:10:54 spare1 kernel: [ 83.563169] Modules linked in: xt_NFLOG nfnetlink_log nfnetlink nf_conntrack_ftp nf_conntrack_ipv6 nf_defrag_ipv6 xt_recent n f_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport sb_edac edac_core i2c_i801 coretemp hwmon ioatdma usbhid ehci_hcd usbcore usb_common netcons ole igb dca [last unloaded: scsi_wait_scan] Dec 3 11:10:54 spare1 kernel: [ 83.564621] Dec 3 11:10:54 spare1 kernel: [ 83.564686] Pid: 13083, comm: umount Not tainted 3.4.18-alwaysdata #1 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F /X9SRi/X9SRi-3F Dec 3 11:10:54 spare1 kernel: [ 83.564917] RIP: 0010:[<ffffffff81119c1e>] [<ffffffff81119c1e>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 11:10:54 spare1 kernel: [ 83.565056] RSP: 0018:ffff88102d6fbd78 EFLAGS: 00010296 Dec 3 11:10:54 spare1 kernel: [ 83.565124] RAX: 0000000000000073 RBX: ffff881078cffcc0 RCX: 0000000000000082 Dec 3 11:10:54 spare1 kernel: [ 83.565196] RDX: 000000000000006b RSI: 0000000000000046 RDI: 0000000000000246 Dec 3 11:10:54 spare1 kernel: [ 83.565267] RBP: ffff88102d6fbd98 R08: 0000000000000000 R09: 0000000000000000 Dec 3 11:10:54 spare1 kernel: [ 83.565338] R10: 0000000000000000 R11: 0000000000000000 R12: ffff881078f84c80 Dec 3 11:10:54 spare1 kernel: [ 83.565409] R13: ffffffff81701cc0 R14: ffff88102e7ca400 R15: ffff88102e7ca400 Dec 3 11:10:54 spare1 kernel: [ 83.565481] FS: 00007fcec3b19730(0000) GS:ffff88107fd40000(0000) knlGS:0000000000000000 Dec 3 11:10:54 spare1 kernel: [ 83.565567] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 3 11:10:54 spare1 kernel: [ 83.565637] CR2: 00007fcec31e6ad5 CR3: 0000001027ced000 CR4: 00000000000407e0 Dec 3 11:10:54 spare1 kernel: [ 83.565708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 3 11:10:54 spare1 kernel: [ 83.565779] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 3 11:10:54 spare1 kernel: [ 83.565851] Process umount (pid: 13083, threadinfo ffff88102d6fa000, task ffff88102c19e840) Dec 3 11:10:54 spare1 kernel: [ 83.565938] Stack: Dec 3 11:10:54 spare1 kernel: [ 83.566001] ffff88102e7ca6c0 ffffffff8111625a ffff88102e7ca400 ffff88102fede6f0 Dec 3 11:10:54 spare1 kernel: [ 83.566266] ffff88102d6fbdb8 ffffffff81119c62 ffff881078f84d20 ffff88102e7ca400 Dec 3 11:10:54 spare1 kernel: [ 83.566530] ffff88102d6fbde8 ffffffff81104577 ffff881078ccd200 0000000000000011 Dec 3 11:10:54 spare1 kernel: [ 83.566795] Call Trace: Dec 3 11:10:54 spare1 kernel: [ 83.566863] [<ffffffff8111625a>] ? try_to_ascend+0x3a/0x90 Dec 3 11:10:54 spare1 kernel: [ 83.566933] [<ffffffff81119c62>] shrink_dcache_for_umount+0x32/0x60 Dec 3 11:10:54 spare1 kernel: [ 83.567006] [<ffffffff81104577>] generic_shutdown_super+0x27/0xe0 Dec 3 11:10:54 spare1 kernel: [ 83.567076] [<ffffffff811046c1>] kill_anon_super+0x11/0x20 Dec 3 11:10:54 spare1 kernel: [ 83.567144] [<ffffffff811046f2>] kill_litter_super+0x22/0x30 Dec 3 11:10:54 spare1 kernel: [ 83.567216] [<ffffffff8159c09e>] rpc_kill_sb+0x6e/0xf0 Dec 3 11:10:54 spare1 kernel: [ 83.567284] [<ffffffff811049a5>] deactivate_locked_super+0x45/0x70 Dec 3 11:10:54 spare1 kernel: [ 83.567354] [<ffffffff81105721>] deactivate_super+0x61/0x70 Dec 3 11:10:54 spare1 kernel: [ 83.567425] [<ffffffff8111fe5f>] mntput_no_expire+0x16f/0x1b0 Dec 3 11:10:54 spare1 kernel: [ 83.567495] [<ffffffff81120c36>] sys_umount+0x76/0x380 Dec 3 11:10:54 spare1 kernel: [ 83.567566] [<ffffffff815eb939>] system_call_fastpath+0x16/0x1b Dec 3 11:10:54 spare1 kernel: [ 83.567635] Code: 8b 08 48 8b 43 30 48 85 c0 74 04 48 8b 50 40 48 8d 86 c0 02 00 00 48 c7 c7 a8 0b 82 81 48 89 de 48 89 04 24 31 c0 e8 0f 86 4c 00 <0f> 0b eb fe 0f 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 Dec 3 11:10:54 spare1 kernel: [ 83.570649] RIP [<ffffffff81119c1e>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 11:10:54 spare1 kernel: [ 83.570780] RSP <ffff88102d6fbd78> Dec 3 11:10:54 spare1 kernel: [ 83.570865] ---[ end trace 5aed5d79590e2681 ]--- Dec 3 18:11:09 spare1 kernel: [ 1142.183765] Kernel BUG at ffffffff811192ee [verbose debug info unavailable] Dec 3 18:11:09 spare1 kernel: [ 1142.183836] invalid opcode: 0000 [#1] SMP Dec 3 18:11:09 spare1 kernel: [ 1142.183996] CPU 4 Dec 3 18:11:09 spare1 kernel: [ 1142.184045] Modules linked in: xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport coretemp hwmon u sbhid sb_edac edac_core ioatdma i2c_i801 ehci_hcd usbcore usb_common netconsole igb dca [last unloaded: scsi_wait_scan] Dec 3 18:11:09 spare1 kernel: [ 1142.185251] Dec 3 18:11:09 spare1 kernel: [ 1142.185315] Pid: 15410, comm: umount Not tainted 3.4.0-rc4-alwaysdata-bug-48421-5+ #5 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3 F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F Dec 3 18:11:09 spare1 kernel: [ 1142.185545] RIP: 0010:[<ffffffff811192ee>] [<ffffffff811192ee>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 18:11:09 spare1 kernel: [ 1142.185684] RSP: 0018:ffff88102c5e1d88 EFLAGS: 00010292 Dec 3 18:11:09 spare1 kernel: [ 1142.185751] RAX: 0000000000000073 RBX: ffff88102c871b40 RCX: 0000000000000082 Dec 3 18:11:09 spare1 kernel: [ 1142.185822] RDX: 00000000000000b2 RSI: 0000000000000046 RDI: 0000000000000246 Dec 3 18:11:09 spare1 kernel: [ 1142.185893] RBP: ffff88102c5e1da8 R08: 0000000000000000 R09: 0000000000000000 Dec 3 18:11:09 spare1 kernel: [ 1142.185963] R10: 0000000000000000 R11: 0000000000000000 R12: ffff881078f6dc00 Dec 3 18:11:09 spare1 kernel: [ 1142.186034] R13: ffffffff81700000 R14: ffff88102ba3a400 R15: ffff88102e921ca0 Dec 3 18:11:09 spare1 kernel: [ 1142.186105] FS: 00007fb2bd9ac730(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000 Dec 3 18:11:09 spare1 kernel: [ 1142.186191] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 3 18:11:09 spare1 kernel: [ 1142.186260] CR2: 00007fb2bd079ad5 CR3: 000000102d45f000 CR4: 00000000000407e0 Dec 3 18:11:09 spare1 kernel: [ 1142.186330] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 3 18:11:09 spare1 kernel: [ 1142.186401] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 3 18:11:09 spare1 kernel: [ 1142.186472] Process umount (pid: 15410, threadinfo ffff88102c5e0000, task ffff88102fe78900) Dec 3 18:11:09 spare1 kernel: [ 1142.186558] Stack: Dec 3 18:11:09 spare1 kernel: [ 1142.186620] ffff88102ba3a6c0 ffffffff8111599a ffff88102ba3a400 ffff88102fd9b6c0 Dec 3 18:11:09 spare1 kernel: [ 1142.186887] ffff88102c5e1dc8 ffffffff81119332 ffff881078f6dca0 ffff88102ba3a400 Dec 3 18:11:09 spare1 kernel: [ 1142.187204] ffff88102c5e1df8 ffffffff81103d27 ffff881078f2db40 0000000000000011 Dec 3 18:11:09 spare1 kernel: [ 1142.187465] Call Trace: Dec 3 18:11:09 spare1 kernel: [ 1142.187531] [<ffffffff8111599a>] ? try_to_ascend+0x3a/0x90 Dec 3 18:11:09 spare1 kernel: [ 1142.187600] [<ffffffff81119332>] shrink_dcache_for_umount+0x32/0x60 Dec 3 18:11:09 spare1 kernel: [ 1142.187672] [<ffffffff81103d27>] generic_shutdown_super+0x27/0xe0 Dec 3 18:11:09 spare1 kernel: [ 1142.187741] [<ffffffff81103e71>] kill_anon_super+0x11/0x20 Dec 3 18:11:09 spare1 kernel: [ 1142.187810] [<ffffffff81103ea2>] kill_litter_super+0x22/0x30 Dec 3 18:11:09 spare1 kernel: [ 1142.187880] [<ffffffff81589b4c>] rpc_kill_sb+0x7c/0xa0 Dec 3 18:11:09 spare1 kernel: [ 1142.187948] [<ffffffff81104155>] deactivate_locked_super+0x45/0x70 Dec 3 18:11:09 spare1 kernel: [ 1142.188017] [<ffffffff81104ed1>] deactivate_super+0x61/0x70 Dec 3 18:11:09 spare1 kernel: [ 1142.188088] [<ffffffff8111f4af>] mntput_no_expire+0x16f/0x1b0 Dec 3 18:11:09 spare1 kernel: [ 1142.188157] [<ffffffff81120286>] sys_umount+0x76/0x380 Dec 3 18:11:09 spare1 kernel: [ 1142.188228] [<ffffffff815d8539>] system_call_fastpath+0x16/0x1b Dec 3 18:11:09 spare1 kernel: [ 1142.188297] Code: 8b 08 48 8b 43 30 48 85 c0 74 04 48 8b 50 40 48 8d 86 c0 02 00 00 48 c7 c7 80 d4 81 81 48 89 de 48 89 04 24 31 c0 e8 df 5a 4b 00 <0f> 0b eb fe 0f 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 Dec 3 18:11:09 spare1 kernel: [ 1142.191314] RIP [<ffffffff811192ee>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 18:11:09 spare1 kernel: [ 1142.191445] RSP <ffff88102c5e1d88> Dec 3 18:17:06 spare1 kernel: [ 148.145579] Kernel BUG at ffffffff811192ee [verbose debug info unavailable] Dec 3 18:17:06 spare1 kernel: [ 148.145650] invalid opcode: 0000 [#1] SMP Dec 3 18:17:06 spare1 kernel: [ 148.145806] CPU 4 Dec 3 18:17:06 spare1 kernel: [ 148.145854] Modules linked in: xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_multiport coretemp sb_edac hwmon edac_core i2c_i801 usbhid ioatdma ehci_hcd usbcore usb_common netconsole igb dca [last unloaded: scsi_wait_scan] Dec 3 18:17:06 spare1 kernel: [ 148.147006] Dec 3 18:17:06 spare1 kernel: [ 148.147070] Pid: 13090, comm: umount Not tainted 3.4.0-rc4-alwaysdata-bug-48421-6+ #6 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3 F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F Dec 3 18:17:06 spare1 kernel: [ 148.147302] RIP: 0010:[<ffffffff811192ee>] [<ffffffff811192ee>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 18:17:06 spare1 kernel: [ 148.147443] RSP: 0018:ffff88102b6c7d88 EFLAGS: 00010292 Dec 3 18:17:06 spare1 kernel: [ 148.147512] RAX: 0000000000000073 RBX: ffff88102c519840 RCX: 0000000000000082 Dec 3 18:17:06 spare1 kernel: [ 148.147583] RDX: 000000000000009e RSI: 0000000000000046 RDI: 0000000000000246 Dec 3 18:17:06 spare1 kernel: [ 148.147654] RBP: ffff88102b6c7da8 R08: 0000000000000000 R09: 0000000000000000 Dec 3 18:17:06 spare1 kernel: [ 148.147728] R10: 0000000000000000 R11: 0000000000000000 R12: ffff881078c27e00 Dec 3 18:17:06 spare1 kernel: [ 148.147799] R13: ffffffff81700000 R14: ffff88102fd24c00 R15: ffff88102e77dce0 Dec 3 18:17:06 spare1 kernel: [ 148.147870] FS: 00007f7fefc24730(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000 Dec 3 18:17:06 spare1 kernel: [ 148.147957] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 3 18:17:06 spare1 kernel: [ 148.148026] CR2: 00007f7fef2f1ad5 CR3: 000000102d67f000 CR4: 00000000000407e0 Dec 3 18:17:06 spare1 kernel: [ 148.148097] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 3 18:17:06 spare1 kernel: [ 148.148168] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 3 18:17:06 spare1 kernel: [ 148.148239] Process umount (pid: 13090, threadinfo ffff88102b6c6000, task ffff88102cdb6300) Dec 3 18:17:06 spare1 kernel: [ 148.148326] Stack: Dec 3 18:17:06 spare1 kernel: [ 148.148388] ffff88102fd24ec0 ffffffff8111599a ffff88102fd24c00 ffff88102fd4e6c0 Dec 3 18:17:06 spare1 kernel: [ 148.148651] ffff88102b6c7dc8 ffffffff81119332 ffff881078c27ea0 ffff88102fd24c00 Dec 3 18:17:06 spare1 kernel: [ 148.148916] ffff88102b6c7df8 ffffffff81103d27 ffff881078c2b740 0000000000000011 Dec 3 18:17:06 spare1 kernel: [ 148.149189] Call Trace: Dec 3 18:17:06 spare1 kernel: [ 148.149258] [<ffffffff8111599a>] ? try_to_ascend+0x3a/0x90 Dec 3 18:17:06 spare1 kernel: [ 148.149327] [<ffffffff81119332>] shrink_dcache_for_umount+0x32/0x60 Dec 3 18:17:06 spare1 kernel: [ 148.149398] [<ffffffff81103d27>] generic_shutdown_super+0x27/0xe0 Dec 3 18:17:06 spare1 kernel: [ 148.149470] [<ffffffff81103e71>] kill_anon_super+0x11/0x20 Dec 3 18:17:06 spare1 kernel: [ 148.149539] [<ffffffff81103ea2>] kill_litter_super+0x22/0x30 Dec 3 18:17:06 spare1 kernel: [ 148.149610] [<ffffffff81589b8c>] rpc_kill_sb+0x7c/0xa0 Dec 3 18:17:06 spare1 kernel: [ 148.149678] [<ffffffff81104155>] deactivate_locked_super+0x45/0x70 Dec 3 18:17:06 spare1 kernel: [ 148.149748] [<ffffffff81104ed1>] deactivate_super+0x61/0x70 Dec 3 18:17:06 spare1 kernel: [ 148.149818] [<ffffffff8111f4af>] mntput_no_expire+0x16f/0x1b0 Dec 3 18:17:06 spare1 kernel: [ 148.149888] [<ffffffff81120286>] sys_umount+0x76/0x380 Dec 3 18:17:06 spare1 kernel: [ 148.149959] [<ffffffff815d8579>] system_call_fastpath+0x16/0x1b Dec 3 18:17:06 spare1 kernel: [ 148.150027] Code: 8b 08 48 8b 43 30 48 85 c0 74 04 48 8b 50 40 48 8d 86 c0 02 00 00 48 c7 c7 a8 d4 81 81 48 89 de 48 89 04 24 31 c0 e8 1f 5b 4b 00 <0f> 0b eb fe 0f 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 Dec 3 18:17:06 spare1 kernel: [ 148.153054] RIP [<ffffffff811192ee>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 18:17:06 spare1 kernel: [ 148.153186] RSP <ffff88102b6c7d88> Dec 3 18:21:08 spare1 kernel: [ 131.681864] Kernel BUG at ffffffff811192ee [verbose debug info unavailable] Dec 3 18:21:08 spare1 kernel: [ 131.681936] invalid opcode: 0000 [#1] SMP Dec 3 18:21:08 spare1 kernel: [ 131.682094] CPU 6 Dec 3 18:21:08 spare1 kernel: [ 131.682144] Modules linked in: ehci_hcd usbcore usb_common netconsole igb dca [last unloaded: scsi_wait_scan] Dec 3 18:21:08 spare1 kernel: [ 131.682655] Dec 3 18:21:08 spare1 kernel: [ 131.682718] Pid: 12982, comm: umount Not tainted 3.4.0-rc4-alwaysdata-bug-48421-6+ #6 Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3 F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F Dec 3 18:21:08 spare1 kernel: [ 131.682948] RIP: 0010:[<ffffffff811192ee>] [<ffffffff811192ee>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 18:21:08 spare1 kernel: [ 131.683090] RSP: 0018:ffff88102b9e7d88 EFLAGS: 00010292 Dec 3 18:21:08 spare1 kernel: [ 131.683159] RAX: 0000000000000073 RBX: ffff88102ca7b2c0 RCX: 0000000000000082 Dec 3 18:21:08 spare1 kernel: [ 131.683230] RDX: 00000000000000e9 RSI: 0000000000000046 RDI: 0000000000000246 Dec 3 18:21:08 spare1 kernel: [ 131.683302] RBP: ffff88102b9e7da8 R08: 0000000000000000 R09: 0000000000000000 Dec 3 18:21:08 spare1 kernel: [ 131.683372] R10: 0000000000000000 R11: 0000000000000000 R12: ffff881078f8d600 Dec 3 18:21:08 spare1 kernel: [ 131.683444] R13: ffffffff81700000 R14: ffff88102f20e800 R15: ffff881030bcfbe0 Dec 3 18:21:08 spare1 kernel: [ 131.683515] FS: 00007ff6970bd730(0000) GS:ffff88107fd80000(0000) knlGS:0000000000000000 Dec 3 18:21:08 spare1 kernel: [ 131.683602] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 3 18:21:08 spare1 kernel: [ 131.683673] CR2: 00007ff69678aad5 CR3: 000000102c38e000 CR4: 00000000000407e0 Dec 3 18:21:08 spare1 kernel: [ 131.683744] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 3 18:21:08 spare1 kernel: [ 131.683815] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 3 18:21:08 spare1 kernel: [ 131.683886] Process umount (pid: 12982, threadinfo ffff88102b9e6000, task ffff88102bf84140) Dec 3 18:21:08 spare1 kernel: [ 131.683973] Stack: Dec 3 18:21:08 spare1 kernel: [ 131.684035] ffff88102f20eac0 ffffffff8111599a ffff88102f20e800 ffff88102fee26c0 Dec 3 18:21:08 spare1 kernel: [ 131.684297] ffff88102b9e7dc8 ffffffff81119332 ffff881078f8d6a0 ffff88102f20e800 Dec 3 18:21:08 spare1 kernel: [ 131.684560] ffff88102b9e7df8 ffffffff81103d27 ffff881078c56200 0000000000000011 Dec 3 18:21:08 spare1 kernel: [ 131.684823] Call Trace: Dec 3 18:21:08 spare1 kernel: [ 131.684889] [<ffffffff8111599a>] ? try_to_ascend+0x3a/0x90 Dec 3 18:21:08 spare1 kernel: [ 131.684957] [<ffffffff81119332>] shrink_dcache_for_umount+0x32/0x60 Dec 3 18:21:08 spare1 kernel: [ 131.685030] [<ffffffff81103d27>] generic_shutdown_super+0x27/0xe0 Dec 3 18:21:08 spare1 kernel: [ 131.685100] [<ffffffff81103e71>] kill_anon_super+0x11/0x20 Dec 3 18:21:08 spare1 kernel: [ 131.685171] [<ffffffff81103ea2>] kill_litter_super+0x22/0x30 Dec 3 18:21:08 spare1 kernel: [ 131.685243] [<ffffffff81589b8c>] rpc_kill_sb+0x7c/0xa0 Dec 3 18:21:08 spare1 kernel: [ 131.685311] [<ffffffff81104155>] deactivate_locked_super+0x45/0x70 Dec 3 18:21:08 spare1 kernel: [ 131.685381] [<ffffffff81104ed1>] deactivate_super+0x61/0x70 Dec 3 18:21:08 spare1 kernel: [ 131.685451] [<ffffffff8111f4af>] mntput_no_expire+0x16f/0x1b0 Dec 3 18:21:08 spare1 kernel: [ 131.685520] [<ffffffff81120286>] sys_umount+0x76/0x380 Dec 3 18:21:08 spare1 kernel: [ 131.685591] [<ffffffff815d8579>] system_call_fastpath+0x16/0x1b Dec 3 18:21:08 spare1 kernel: [ 131.685660] Code: 8b 08 48 8b 43 30 48 85 c0 74 04 48 8b 50 40 48 8d 86 c0 02 00 00 48 c7 c7 a8 d4 81 81 48 89 de 48 89 04 24 31 c0 e8 1f 5b 4b 00 <0f> 0b eb fe 0f 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 Dec 3 18:21:08 spare1 kernel: [ 131.688672] RIP [<ffffffff811192ee>] shrink_dcache_for_umount_subtree+0x1de/0x1f0 Dec 3 18:21:08 spare1 kernel: [ 131.688802] RSP <ffff88102b9e7d88> Yes, this is understandable. Could you build a kernel if I'll provide you a debug patch? Look like we have some tiny races here, which I can't find in the code for now. If yes, please, specify your kernel sources version: is it still 3.4.4? Yes you can send me patches. I was working on a 3.4.18, but you can send me patches for other versions if you prefer. I also tested reverting the commit on a recent 3.6, it worked fine. Created attachment 88971 [details]
SUNRPC: continue over the look of client instead of break on PiepFS event
Please, try this patch (not debug one).
This patch is against current 3.7 kernel. Yes, it fixes the bug. I first made sure that the vanilla 3.7 still exhibited the bug, and after patching it, I cannot hit it again, despite multiple reboots and coldreboots. OK. Stanislav, can you please resend that patch with a fixed up comment (so that we know why it is correct)? Also, should we Cc: stable? If so, which kernel versions are affected? Patch titled "SUNRPC: continue run over clients list on PipeFS event instead of break" in nfs-lkml. A patch referencing this bug report has been merged in Linux v3.8-rc1: commit 642fe4d00db56d65060ce2fd4c105884414acb16 Author: Trond Myklebust <Trond.Myklebust@netapp.com> Date: Thu Nov 8 10:01:26 2012 -0500 SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info |