hi.. we are exporting very large (16 TB) XFS filesystems via nfs.. starting with linux-2.6.36-rc4 we keep getting the following kernel BUG messages. currently 2.6.36-rc6 is running on the server. this BUG was not triggered with linux-2.6.35.. when the BUG was triggered once it returns (exactly) every 3 Minutes.. we will try 2.6.36-rc7 in the next minutes.. regards, marius tolzmann & peter marquardt ------------[ cut here ]------------ kernel BUG at fs/nfsd/nfsfh.h:199! invalid opcode: 0000 [#13] PREEMPT SMP last sysfs file: /sys/devices/pci0000:80/0000:80:0d.0/0000:8b:00.0/host5/scsi_host/host5/stats CPU 0 Modules linked in: 3w_9xxx Pid: 3841, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 RIP: 0010:[<ffffffff81226172>] [<ffffffff81226172>] nfsd3_proc_remove+0x122/0x130 RSP: 0018:ffff880434f25de0 EFLAGS: 00010246 RAX: 0000000046000000 RBX: ffff8814365d6c00 RCX: ffffc9000da920a8 RDX: ffffffff818aab00 RSI: ffffffff81220e50 RDI: ffff880432254d98 RBP: ffff880434f25e10 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000004000 R11: dead000000200200 R12: ffff8814365d6c08 R13: ffff881434760000 R14: ffff8814365da000 R15: ffff8803fe48d000 FS: 00007f7cb66586f0(0000) GS:ffff880002000000(0000) knlGS:00000000f76156c0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000514c58 CR3: 0000000c3596d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process nfsd (pid: 3841, threadinfo ffff880434f24000, task ffff880435a7b340) Stack: ffff881434760000 ffff8814365da000 ffff881434760000 ffffffff81c9d000 <0> ffff8803fe48d018 000000000000001c ffff880434f25e50 ffffffff81218b15 <0> ffffffff81c9d000 ffff881434760000 ffffffff81c9d000 ffff881435a69bc0 Call Trace: [<ffffffff81218b15>] nfsd_dispatch+0xb5/0x250 [<ffffffff81811d26>] svc_process+0x4b6/0x820 [<ffffffff81219180>] ? nfsd+0x0/0x160 [<ffffffff8121923d>] nfsd+0xbd/0x160 [<ffffffff810588f6>] kthread+0x96/0xa0 [<ffffffff81003154>] kernel_thread_helper+0x4/0x10 [<ffffffff81058860>] ? kthread+0x0/0xa0 [<ffffffff81003150>] ? kernel_thread_helper+0x0/0x10 Code: 00 00 48 89 f7 48 89 4d d8 e8 eb 55 ff ff 48 8b 4d d8 48 89 c6 44 89 e2 48 c7 c7 74 86 b0 81 31 c0 e8 1a ed 64 00 e9 0c ff ff ff <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec RIP [<ffffffff81226172>] nfsd3_proc_remove+0x122/0x130 RSP <ffff880434f25de0> ---[ end trace 066d5aa4346b0b50 ]--- # egrep "(: Linux version|Tainted)" /var/log/messages Sep 9 10:26:47 tiaotiao kernel: Linux version 2.6.35.x86_64.48 (root@venustraphobie.molgen.mpg.de) (gcc version 4.4.2 (GCC) ) #1 SMP PREEMPT Mon Aug 2 16:35:40 CEST 2010 Sep 15 08:47:15 tiaotiao kernel: Linux version 2.6.36-rc4.x86_64.50 (root@venustraphobie.molgen.mpg.de) (gcc version 4.4.2 (GCC) ) #1 SMP PREEMPT Mon Sep 13 12:41:33 CEST 2010 Sep 24 09:25:08 tiaotiao kernel: Linux version 2.6.36-rc4.x86_64.50 (root@venustraphobie.molgen.mpg.de) (gcc version 4.4.2 (GCC) ) #1 SMP PREEMPT Mon Sep 13 12:41:33 CEST 2010 Oct 12 14:41:57 tiaotiao kernel: Pid: 3857, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 14:44:57 tiaotiao kernel: Pid: 3861, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 14:47:57 tiaotiao kernel: Pid: 3868, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 14:50:57 tiaotiao kernel: Pid: 3846, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 14:53:57 tiaotiao kernel: Pid: 3849, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 14:56:58 tiaotiao kernel: Pid: 3852, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 14:59:58 tiaotiao kernel: Pid: 3847, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:02:58 tiaotiao kernel: Pid: 3876, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:05:58 tiaotiao kernel: Pid: 3867, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:08:58 tiaotiao kernel: Pid: 3856, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:11:59 tiaotiao kernel: Pid: 3871, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:14:59 tiaotiao kernel: Pid: 3872, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:17:59 tiaotiao kernel: Pid: 3848, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:20:59 tiaotiao kernel: Pid: 3870, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:23:59 tiaotiao kernel: Pid: 3851, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:27:00 tiaotiao kernel: Pid: 3863, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:30:00 tiaotiao kernel: Pid: 3866, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:33:00 tiaotiao kernel: Pid: 3864, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:36:00 tiaotiao kernel: Pid: 3877, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:39:00 tiaotiao kernel: Pid: 3853, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:42:01 tiaotiao kernel: Pid: 3873, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:45:01 tiaotiao kernel: Pid: 3858, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:48:01 tiaotiao kernel: Pid: 3854, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:51:01 tiaotiao kernel: Pid: 3850, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:54:01 tiaotiao kernel: Pid: 3862, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 15:57:01 tiaotiao kernel: Pid: 3874, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 16:00:02 tiaotiao kernel: Pid: 3875, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 16:03:02 tiaotiao kernel: Pid: 3855, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 16:06:02 tiaotiao kernel: Pid: 3859, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 16:09:02 tiaotiao kernel: Pid: 3865, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 12 16:12:02 tiaotiao kernel: Pid: 3869, comm: nfsd Tainted: G D 2.6.36-rc4.x86_64.50 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:08:35 tiaotiao kernel: Linux version 2.6.36-rc6.x86_64.51 (root@venustraphobie.molgen.mpg.de) (gcc version 4.4.2 (GCC) ) #1 SMP PREEMPT Fri Oct 1 09:55:18 CEST 2010 Oct 13 14:11:40 tiaotiao kernel: Pid: 3854, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:14:40 tiaotiao kernel: Pid: 3851, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:17:40 tiaotiao kernel: Pid: 3855, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:20:40 tiaotiao kernel: Pid: 3850, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:23:40 tiaotiao kernel: Pid: 3856, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:26:41 tiaotiao kernel: Pid: 3853, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:29:41 tiaotiao kernel: Pid: 3849, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:32:41 tiaotiao kernel: Pid: 3846, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:35:41 tiaotiao kernel: Pid: 3847, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:38:41 tiaotiao kernel: Pid: 3840, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:41:41 tiaotiao kernel: Pid: 3839, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:44:42 tiaotiao kernel: Pid: 3841, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8 Oct 13 14:47:42 tiaotiao kernel: Pid: 3838, comm: nfsd Tainted: G D 2.6.36-rc6.x86_64.51 #1 Tyan Transport VX50-B4985/CK8
Server issue. Reassigning to Bruce...
It looks like it is related to a different server trying to delete a nonexistant .nfs-file: at least thats what we found repeatedly on the network 14:47:42.295359 IP (tos 0x0, ttl 64, id 62547, offset 0, flags [DF], proto: TCP (6), length: 220) floccinaucinihi.1463474626 > tiaotiao.molgen.nfs: 168 remove fh[1000001:20000800:1000100:5b00f0cf:4e99ca52] ".nfs000000005b00f0d70000022a" 14:47:42.295383 IP (tos 0x0, ttl 64, id 22783, offset 0, flags [DF], proto: TCP (6), length: 52) tiaotiao.molgen.mpg.de.nfs > floccinaucinihilipilification.molgen.mpg.de.831: ., cksum 0x51e4 (incorrect (-> 0x19d7), 3624702097:3624702097(0) ack 1703194159 win 213 <nop,nop,timestamp 2066835 3809351168> 14:50:42.490048 IP (tos 0x0, ttl 64, id 62549, offset 0, flags [DF], proto: TCP (6), length: 220) floccinaucinihi.1463474626 > tiaotiao.molgen.nfs: 168 remove fh[1000001:20000800:1000100:5b00f0cf:4e99ca52] ".nfs000000005b00f0d70000022a" 14:50:42.490067 IP (tos 0x0, ttl 64, id 22785, offset 0, flags [DF], proto: TCP (6), length: 52) tiaotiao.molgen.mpg.de.nfs > floccinaucinihilipilification.molgen.mpg.de.831: ., cksum 0x51e4 (incorrect (-> 0x986c), 3624702097:3624702097(0) ack 1703194495 win 230 <nop,nop,timestamp 2247063 3809531392> 14:53:42.684327 IP (tos 0x0, ttl 64, id 62550, offset 0, flags [DF], proto: TCP (6), length: 220) floccinaucinihi.1463474626 > tiaotiao.molgen.nfs: 168 remove fh[1000001:20000800:1000100:5b00f0cf:4e99ca52] ".nfs000000005b00f0d70000022a" 14:53:42.684353 IP (tos 0x0, ttl 64, id 22786, offset 0, flags [DF], proto: TCP (6), length: 52) tiaotiao.molgen.mpg.de.nfs > floccinaucinihilipilification.molgen.mpg.de.831: ., cksum 0x51e4 (incorrect (-> 0x17b3), 3624702097:3624702097(0) ack 1703194663 win 238 <nop,nop,timestamp 2427291 3809711616> 14:59:43.074311 IP (tos 0x0, ttl 64, id 62553, offset 0, flags [DF], proto: TCP (6), length: 220) floccinaucinihi.1463474626 > tiaotiao.molgen.nfs: 168 remove fh[1000001:20000800:1000100:5b00f0cf:4e99ca52] ".nfs000000005b00f0d70000022a" 14:59:43.074328 IP (tos 0x0, ttl 64, id 22789, offset 0, flags [DF], proto: TCP (6), length: 52) tiaotiao.molgen.mpg.de.nfs > floccinaucinihilipilification.molgen.mpg.de.831: ., cksum 0x51e4 (incorrect (-> 0x158d), 3624702097:3624702097(0
Created attachment 33482 [details] (ponk) Packet-Of-NFS-Kernelbug tcpdump format We suspect this network packet to trigger the bug. decoded it tries to remove the file ".nfs000000005b00f0d70000022a"
This was probably introduced by 43a9aa64a2f4330a9cb59aaf5c5636566bce067c "NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR". We either need to be more careful about when we call unlock (maybe move it inside nfsd_create() and nfsd_unlink()?), or maybe move the BUG_ON() inside the fhp->fh_locked check.
Created attachment 33492 [details] nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink This should be a fix, but I'd like to confirm that. It's more difficult than expected to reproduce the problem--I may need an artificial NFSv3 client. If someone else can confirm it that would be great.
OK, I found that client# touch FOO server# exportfs -ua client# rm FOO in rapid succession would reproduce the problem, and didn't see any bad effects from removing the BUG(). So I'll run a few more tests and then pass this along to Linus. Thanks for catching this!
Handled-By : Bruce Fields <bfields@fieldses.org> Patch : https://bugzilla.kernel.org/attachment.cgi?id=33492
just for the files: every call to this assertion killed a [nfsd]. so with 32 nfsd procs running our system stopped serving nfs after exactly 96 minutes 8) thanks for fixing it so fast.
Fixed by commit b1e86db1de2e8bc2be9fb94fae3451c2a776e8c1 .