Most recent kernel where this bug did *NOT* occur: 2.6.11.10 Distribution: RHEL4 update 3 (x86_64), Fedora 6 (both i386 and x86_64); IRIX 6.5.22 Hardware Environment: HP DL585 (4x Opteron 848), Athlon FX51, Pentium 4 3.0GHz; SGI Origin2000 (IP27, 8x R12000) Software Environment: nfs-utils 1.0.6 from RHEL4, also 1.0.10-5 from FC6 Problem Description: After upgrading the NFS server (HP DL585) from 2.6.11.10 to 2.6.20-rc7 we see problems with locking from an IRIX 6.5.22 client. After a fresh reboot of both server and client, the locking sometimes works, but then after some time the fcntl(,F_SETLK,) just locks up, the process gets stuck in this syscall, and adds +1 to the system load average on the client. It can be suspended using ^Z on the terminal, but cannot be killed even with SIGKILL. It mainly affects our users who read their mail using mutt on the NFS client, because mutt uses the same syscall for locking. 2.6.11.10 was without problems. Steps to reproduce: On a Linux box: mkdir /tmp/test ls -l / > /tmp/test/file echo '/tmp/test irix.box(ro,sync)' >>/etc/exports [ start nfs or run exportfs -r ] On an IRIX box: mount -t nfs -o vers=3,ro,soft,intr linuxbox:/tmp/test /mnt cat > lktest.c <<EOF #include <stdio.h> #include <unistd.h> #include <fcntl.h> int main () { struct flock fl; int rv; fl.l_type = F_RDLCK; fl.l_whence = SEEK_SET; fl.l_start = 0; fl.l_len = 0; fprintf(stderr, "Trying to lock stdin:\n"); rv = fcntl(0, F_SETLK, &fl); if (rv == 1) { perror("fcntl(0, F_SETLK, ...)"); return 1; } else { fprintf(stderr, "fcntl(0, F_SETLK, ...) == %d\n", rv); return 0; } } EOF cc -o lktest lktest.c ./lktest < /mnt/file - it prints "Trying to lock stdin:", but nothing else (i.e. it gets stuck inside the fcntl call). Additional info: - tried both NFSv2 and NFSv3, rw and ro, soft,intr and without soft,intr with the same results. - the behaviour does not depend on SMP or UP on the Linux server. - the behaviour is the same on 32-bit and 64-bit Linux server. - I will attach two packet dumps of the NFS traffic - one is between the FC6/x86_64 server and IRIX client, and the other is between FC6/i386 server and the same IRIX client. - the client gets stuck in fcntl() even for a freshly created file without any locks held.
Created attachment 10254 [details] x86_64 server, IRIX client (tcpdump -w)
Created attachment 10255 [details] i386 server, IRIX client (tcpdump -w)
Hmm, I have found that the problem is not IRIX-specific - the fcntl() locking fails also from the Linux client (I will attach tcpdump soon) to the same server. Apparently it is a problem in lockd on the NFS server side. When I tried to kill -9 lockd, I've got the following oops: general protection fault: 0000 [1] SMP CPU 0 Modules linked in: ohci_hcd usbcore i2c_amd756 i2c_core k8temp hwmon qla2xxx amd74xx ide_core Pid: 5144, comm: lockd Not tainted 2.6.20-rc7 #1 RIP: 0010:[<ffffffff80334a1e>] [<ffffffff80334a1e>] nlm_release_host+0x24/0x77 RSP: 0018:ffff8105f1211e50 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 2020202033363542 RCX: ffffffff805cda48 RDX: ffffffff805cda48 RSI: 0000000000000046 RDI: 2020202033363542 RBP: ffff8101547d7c00 R08: ffffffff805cda48 R09: 0000000000000100 R10: 0000000000000000 R11: ffffffff802664da R12: 00000000ffffffdb R13: 000000000000000a R14: ffff8105f1211eb0 R15: ffffffff804d45a0 FS: 00002b5cafdd4b00(0000) GS:ffffffff8060f000(0000) knlGS:00000000f7fa36c0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000005b3bb4 CR3: 000000035a6e5000 CR4: 00000000000006e0 Process lockd (pid: 5144, threadinfo ffff8105f1210000, task ffff8105f8e4c870) Stack: ffff8101547d7c00 ffffffff803339e3 ffff810142abaac0 ffffffff80333d13 ffff81001ee5dec0 ffff81001ee5dec0 ffff8101547d7c38 ffff8101547d7c00 ffff81014be90480 7fffffffffffffff 00000000fffffff4 ffffffff80333d63 Call Trace: [<ffffffff803339e3>] nlm_release_call+0xd/0x24 [<ffffffff80333d13>] __nlm_async_call+0x7f/0x8f [<ffffffff80333d63>] nlm_async_call+0x40/0x45 [<ffffffff803365c9>] nlmsvc_retry_blocked+0x172/0x199 [<ffffffff80335417>] lockd+0xe3/0x227 [<ffffffff80335334>] lockd+0x0/0x227 [<ffffffff80335334>] lockd+0x0/0x227 [<ffffffff80254328>] child_rip+0xa/0x12 [<ffffffff80335334>] lockd+0x0/0x227 [<ffffffff8025431e>] child_rip+0x0/0x12 Code: 83 7b 74 00 79 04 0f 0b eb fe f0 ff 4b 74 0f 94 c0 84 c0 74 RIP [<ffffffff80334a1e>] nlm_release_host+0x24/0x77 RSP <ffff8105f1211e50>
Created attachment 10257 [details] Linux x86_64 server vs. Linux i386 client (tcpdump -w) It seems the NFS server does not reply to the NLM v4 LOCK call.
I have tried the stable version (2.6.19.2) on the NFS server, and ran while ./lktest < lktest.c; do date; sleep 1; done to see when it fails. The locking has failed after ~15 minutes of uptime. When trying to do "/etc/init.d/nfslock restart", I've got the following oops: Feb 2 18:17:53 anxur kernel: lockd: couldn't create RPC handle for nymfe27 Feb 2 18:17:53 anxur kernel: lockd: couldn't create RPC handle for nymfe27 Feb 2 18:17:53 anxur kernel: ----------- [cut here ] --------- [please bite her e ] --------- Feb 2 18:17:53 anxur kernel: Kernel BUG at fs/lockd/host.c:288 Feb 2 18:17:53 anxur kernel: invalid opcode: 0000 [1] SMP Feb 2 18:17:53 anxur kernel: CPU 2 Feb 2 18:17:53 anxur kernel: Modules linked in: ohci_hcd usbcore i2c_amd756 i2c _core k8temp hwmon qla2xxx amd74xx ide_core Feb 2 18:17:53 anxur kernel: Pid: 5233, comm: lockd Not tainted 2.6.19.2 #2 Feb 2 18:17:53 anxur kernel: RIP: 0010:[<ffffffff80337436>] [<ffffffff80337436 >] nlm_release_host+0x2f/0x94 Feb 2 18:17:53 anxur kernel: RSP: 0018:ffff8105ee4a9d80 EFLAGS: 00010286 Feb 2 18:17:53 anxur kernel: RAX: 00000000ffffffff RBX: ffff8101f8d449c0 RCX: f fff8101e99bda28 Feb 2 18:17:53 anxur kernel: RDX: ffff8101e4bd8334 RSI: ffffffff803385e5 RDI: f fff8101f8d449c0 Feb 2 18:17:53 anxur kernel: RBP: ffff8101bcd8f2c0 R08: ffff8105ee4a8000 R09: 0 5000000e5cb3273 Feb 2 18:17:53 anxur kernel: R10: 05000000e5cb3273 R11: ffff8104000efa40 R12: f fff8101e99bda10 Feb 2 18:17:53 anxur kernel: R13: ffffffff8033a178 R14: 0000000000000000 R15: 0 0000000fffffff4 Feb 2 18:17:53 anxur kernel: FS: 00002b75b4728b00(0000) GS:ffff8104000ef640(00 00) knlGS:00000000f66f9bb0 Feb 2 18:17:53 anxur kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Feb 2 18:17:53 anxur kernel: CR2: 0000000000459520 CR3: 00000005c0eb5000 CR4: 0 0000000000006e0 Feb 2 18:17:53 anxur nfslock: lockd -KILL succeeded Feb 2 18:17:53 anxur kernel: Process lockd (pid: 5233, threadinfo ffff8105ee4a8 000, task ffff8105f1d44730) Feb 2 18:17:53 anxur kernel: Stack: ffff8101e4bd8000 ffffffff8033640a ffff8101 e99bda40 ffffffff80338660 Feb 2 18:17:53 anxur rpc.statd[4498]: Caught signal 15, un-registering and exit ing.
Hmm, in comment #5 it seems not all messages got through syslog. I am adding the dmesg output of the same incident as well: lockd: couldn't create RPC handle for nymfe27 lockd: couldn't create RPC handle for nymfe27 ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at fs/lockd/host.c:288 invalid opcode: 0000 [1] SMP CPU 2 Modules linked in: ohci_hcd usbcore i2c_amd756 i2c_core k8temp hwmon qla2xxx amd74xx ide_core Pid: 5233, comm: lockd Not tainted 2.6.19.2 #2 RIP: 0010:[<ffffffff80337436>] [<ffffffff80337436>] nlm_release_host+0x2f/0x94 RSP: 0018:ffff8105ee4a9d80 EFLAGS: 00010286 RAX: 00000000ffffffff RBX: ffff8101f8d449c0 RCX: ffff8101e99bda28 RDX: ffff8101e4bd8334 RSI: ffffffff803385e5 RDI: ffff8101f8d449c0 RBP: ffff8101bcd8f2c0 R08: ffff8105ee4a8000 R09: 05000000e5cb3273 R10: 05000000e5cb3273 R11: ffff8104000efa40 R12: ffff8101e99bda10 R13: ffffffff8033a178 R14: 0000000000000000 R15: 00000000fffffff4 FS: 00002b75b4728b00(0000) GS:ffff8104000ef640(0000) knlGS:00000000f66f9bb0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000459520 CR3: 00000005c0eb5000 CR4: 00000000000006e0 Process lockd (pid: 5233, threadinfo ffff8105ee4a8000, task ffff8105f1d44730) Stack: ffff8101e4bd8000 ffffffff8033640a ffff8101e99bda40 ffffffff80338660 ffff8101bcd8f2c0 ffff8101bcd8f2c0 ffffffff803385e5 ffffffff8022f79d ffffffff805595a0 ffff8101bcd8f2c0 ffff8101e99bd980 ffffffff80338713 Call Trace: [<ffffffff8033640a>] nlm_release_call+0xd/0x2a [<ffffffff80338660>] nlmsvc_free_block+0x7b/0x90 [<ffffffff803385e5>] nlmsvc_free_block+0x0/0x90 [<ffffffff8022f79d>] kref_put+0x7a/0x85 [<ffffffff80338713>] nlmsvc_traverse_blocks+0x9e/0xa3 [<ffffffff8033a178>] nlmsvc_is_client+0x0/0x1e [<ffffffff80339e8b>] nlm_traverse_files+0x78/0x21d [<ffffffff80273cd2>] __cond_resched+0x1c/0x44 [<ffffffff80257daa>] cond_resched+0x2e/0x39 [<ffffffff804b54e3>] svc_recv+0x18f/0x3d8 [<ffffffff802734f5>] default_wake_function+0x0/0xe [<ffffffff80337e40>] lockd+0xcc/0x227 [<ffffffff80337d74>] lockd+0x0/0x227 [<ffffffff80337d74>] lockd+0x0/0x227 [<ffffffff802552b8>] child_rip+0xa/0x12 [<ffffffff80337d74>] lockd+0x0/0x227 [<ffffffff802552ae>] child_rip+0x0/0x12 Code: 0f 0b 68 a7 71 4f 80 c2 20 01 f0 ff 4b 74 0f 94 c0 84 c0 74 RIP [<ffffffff80337436>] nlm_release_host+0x2f/0x94 RSP <ffff8105ee4a9d80> svc: unknown version (3) svc: unknown version (3) svc: unknown version (3) general protection fault: 0000 [2] SMP CPU 2 Modules linked in: ohci_hcd usbcore i2c_amd756 i2c_core k8temp hwmon qla2xxx amd74xx ide_core Pid: 23768, comm: procmail Not tainted 2.6.19.2 #2 RIP: 0010:[<ffffffff80273528>] [<ffffffff80273528>] __wake_up_common+0x25/0x61 RSP: 0018:ffff810198a85de8 EFLAGS: 00010046 RAX: ffff8101e4bd8008 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8101e4bd8000 RBP: ffff810198a85e18 R08: 38313a3831203220 R09: ffff810400011440 R10: 0000003e4ee31e88 R11: ffff81040005c680 R12: ffff8101e4bd8000 R13: 0000000000000001 R14: 0000000000000246 R15: 0000000000000000 FS: 00002b7776acedc0(0000) GS:ffff8104000ef640(0000) knlGS:00000000f7fcb6c0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003e4ee2e740 CR3: 0000000000201000 CR4: 00000000000006e0 Process procmail (pid: 23768, threadinfo ffff810198a84000, task ffff8101f071ca70) Stack: 0000000100000000 ffff8101e4bd8000 0000000000000000 0000000000000001 0000000000000246 0000000000000001 ffff810198a85e58 ffffffff802296ac 00000000ffffffff 0000000000000001 0000000000000000 ffff8101eef63278 Call Trace: [<ffffffff802296ac>] __wake_up+0x35/0x4b [<ffffffff8023dd2d>] pipe_release+0x66/0xa9 [<ffffffff802103bf>] __fput+0x9e/0x179 [<ffffffff80220258>] filp_close+0x59/0x61 [<ffffffff80232cd2>] put_files_struct+0x69/0xac [<ffffffff80212cef>] do_exit+0x233/0x841 [<ffffffff8024280a>] cpu_idle+0x0/0x6f [<ffffffff802544fe>] system_call+0x7e/0x83 Code: 4d 8b 30 49 39 c0 74 28 49 8d 78 e8 45 8b 68 e8 4c 89 f9 8b RIP [<ffffffff80273528>] __wake_up_common+0x25/0x61 RSP <ffff810198a85de8> <1>Fixing recursive fault but reboot is needed!
reassign to neilb: this is an NFS server problem, I think. A pretty serious one too - the kernel oopsed.
Created attachment 10267 [details] NLM: Fix double free in __nlm_async_call rpc_call_async() will always call rpc_release_calldata(), so it is an error for __nlm_async_call() to do so as well. Does this suffice to fix your bug?
whee, thanks. Jan, please test this as soon as possible, thanks.
Tried patch in the comment #8. The server now has an uptime of 30 minutes, with some locking tests running on its NFS clients. I tried "/etc/init.d/nfslock restart", and it did not crash, and locking still works. So it is definitely an improvement (thanks for the fast reply!). However, it is Saturday 02am in this part of the world, so the traffic on the NFS server is low (not to mention the main NFS load is from students' workstations, which are not much used during the weekend). I would probably wait till Monday evening for a final ACK.
The locking problem is probably gone, but there are still problems: I ran the following testing loop to see whether the locking fails: while ./lktest < lktest.c; do date; sleep 1; done on the Linux NFS client (FC6-x86_64, 2.6.20-rc7 kernel), and the loop sometimes exits with "lktest.c: no such file or directory". The total runtime of the loop varies somewhere between 5 minutes and 1 hour. When I restart the loop, it continues for another few minutes between exiting with the same error. Moreover, the NFS server has oopsed in __d_lookup (see bellow). I have already seen this oops (see http://lkml.org/lkml/2005/10/22/13 as an example - it was 2.6.14-rc4), which is the reason I still run 2.6.11.10. In fact, every time I try to upgrade from 2.6.11.10 to something newer, the oops in __d_lookup (called from the sys_lstat()) appears usually no later than 1 day since the reboot. We have already replaced the server mainboard and some other components, so I think HW failure is unlikely (but who knows). Should I open a separate bug for the __d_lookup problem? Unable to handle kernel paging request at 00000000fffffff4 RIP: [<ffffffff80208dea>] __d_lookup+0x72/0x112 PGD 5cea88067 PUD 0 Oops: 0000 [46] SMP CPU 2 Modules linked in: ohci_hcd usbcore i2c_amd756 i2c_core k8temp hwmon qla2xxx amd74xx ide_core Pid: 32250, comm: rsync Not tainted 2.6.20-rc7 #96 RIP: 0010:[<ffffffff80208dea>] [<ffffffff80208dea>] __d_lookup+0x72/0x112 RSP: 0018:ffff81012f09bbf8 EFLAGS: 00010202 RAX: 00000000fffffff4 RBX: ffff81067d184080 RCX: 0000000000000016 RDX: 0000000000117295 RSI: 018723826d117295 RDI: ffff8103b4ad7188 RBP: 00000000fffffff4 R08: ffff81012f09be48 R09: 0000000000000246 R10: 0000000000000246 R11: 0000000000000246 R12: ffff8103b4ad7188 R13: ffff81012f09bcc8 R14: 00000000bd3c3a0d R15: 0000000000000006 FS: 00002ad0394e23a0(0000) GS:ffff8104000ef640(0000) knlGS:00000000f6ecebb0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000fffffff4 CR3: 00000005b8630000 CR4: 00000000000006e0 Process rsync (pid: 32250, threadinfo ffff81012f09a000, task ffff810138a7d850) Stack: 0000000000000246 0000000000000000 ffff810578fc9015 0000000000000000 ffff81012f09be48 ffff8103b4ad55e8 0000000000000000 ffff81012f09be48 ffff81012f09bcc8 ffffffff8020c3ef ffff81012f09be48 ffff8104002ba580 Call Trace: [<ffffffff8020c3ef>] do_lookup+0x2a/0x1b8 [<ffffffff8020971e>] __link_path_walk+0x894/0xccb [<ffffffff80299598>] zone_statistics+0x41/0x63 [<ffffffff8020dc88>] link_path_walk+0x4c/0xc2 [<ffffffff8020c25e>] do_path_lookup+0x1b0/0x20c [<ffffffff802200ac>] __user_walk_fd+0x37/0x4c [<ffffffff80238ce2>] vfs_lstat_fd+0x18/0x47 [<ffffffff80226723>] sys_newlstat+0x19/0x31 [<ffffffff8025357e>] system_call+0x7e/0x83 Code: 48 8b 45 00 0f 18 08 48 8d 5d e8 44 39 73 30 75 72 4c 39 63 RIP [<ffffffff80208dea>] __d_lookup+0x72/0x112 RSP <ffff81012f09bbf8> CR2: 00000000fffffff4
Yes, please file the d_lookup() issue as a separate bug. That looks like dcache corruption rather than an NFS/NLM specific thing. When you file it, please remember to list the different types of filesystem that are in use on your machine in addition to which filesystem the "rsync" task was running on (if you know).
The __d_lookup() problem filled as bug #7929. I am not sure about the other problem ("no such file or directory" when opening a file). Maybe it should be filled as a separate bug as well? I ran "while cat /some/file >/dev/null; do date; sleep 1; done", and the loop also exited with "/some/file: no such file or directory" after several iterations (/some/file is NFS-mounted). The NFS client was 2.6.20-rc7 Fedora 6/x86_64 UP, the NFS server this time was different - 2.6.19-rc4 Fedora 5/x86_64 SMP.
About the "file not found" issue. Have you turned off subtree checking on the server using the "no_subtree_check" option? There was an issue (that should be fixed in newer kernels) whereby lookups on the server would fail if you had subtree checking enabled.
One of the servers (comment #11) has no_subtree_check option enabled, the other (comment #13) has not.
Created attachment 10275 [details] RPC: Fix double free in portmapper code This fixes a similar issue in the kernel's RPC portmapper code.
Trond, should I whizz that into 2.6.20? And 2.6.19?
Yes. I think these 2 are both nasty enough to warrant fast-tracking into mainline. AFAICS the NLM problem has existed since at least 2.6.17, whereas the portmapper problem started with 2.6.19.
In order to work around the Oops from comment #11 (subsequently reported as bug#7929), I have moved all my XFS filesystems to ext3, so I am now able to get uptimes longer than few hours with something newer than 2.6.11.10. However, in this longer timeframe, the problems from the original bug report of this bug (incl. comments #1 and #2) are back. After two days of uptime (2.6.21-rc4, so patches from the comment #8 and comment #16 are already in), my NFS clients started to hang on fcntl lock operations. I am even not able to kill -9 lockd, it just remains running. Restarting NFS services on the server does not help. In dmesg output there is nothing special, maybe the following lines: rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 svc: unknown version (0 for prog 100003, nfsd) rpcsvc: received unknown control message:-2145049676/-1 svc: unknown version (0 for prog 100003, nfsd) rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 svc: unknown version (0 for prog 100003, nfsd) svc: unknown version (0 for prog 100003, nfsd) rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 svc: unknown version (0 for prog 100003, nfsd) rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 rpcsvc: received unknown control message:-2145049676/-1 svc: unknown version (0 for prog 100003, nfsd) (and the "svc: unknown version" has been there in 2.6.11.10 as well, IIRC). Since I don't have XFS anymore, I will probably try RHEL4 or 5 kernels.
Jan, Have you resolved the issues, any updates on the problem? Thanks.
I have downgraded my system back to the RHEL4 kernel, so I do not see this issue now. I may try the newer kernel probably in August, as I cannot reboot my server now.
Last week we have upgraded the system to RHEL5, and so far it seems that NFS problems are not present in the 2.6.18-8.1.8.el5 kernel.