Bug 115521
Summary: | NFS Server waiting for data causing client hanging | ||
---|---|---|---|
Product: | File System | Reporter: | piknik (piknik) |
Component: | NFS | Assignee: | bfields |
Status: | RESOLVED UNREPRODUCIBLE | ||
Severity: | high | CC: | bekeanloinse, piknik, trondmy |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.4.5 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
piknik
2016-03-29 23:08:23 UTC
I gave up just the other day trying to fix anything, but today I've found that it works again after trying my setup without restarting any services. I haven't restarted the host at any time, but the only thing I can think of that may have fixed the issue is a nightly (03:00) pagecache, dentries and inodes clear with echo 3 > /proc/sys/vm/drop_caches. I noticed I had many binds to /srv/nfs4/proto when using the mount command. Maybe this was the problem. One of my scripts had repeatedly binded the same location. I'll see if fixing that fixes my problem. Not sure what was going on there, but I assume it's no longer a problem. Closing for now. (In reply to piknik from comment #0) > I have configured an NFS server on the host with the following export: > > "/srv/nfs4/proto" > 192.168.1.70(rw,insecure,no_subtree_check,all_squash,anonuid=1000, > anongid=1000) > > I intend to have this export available to any anonymous user. I have a qemu > VM with the exact same kernel version on a different IP mounting this export > to /srv/http. > > Unfortunately, this bug does not occur immediately. It takes time before the > NFS server becomes unresponsive to any and all clients, but when it does > hang, it requires a complete reboot of the host. I have confirmed that this > bug exists at least 4 times on my machine. > > I have enabled rpcdebug and have posted relevant logs below: > > During the 'hang' (repeated many times): > Mar 29 18:54:29 PIKNIK-SRV kernel: --> nfsd4_store_cache_entry slot > ffff8800ac6f3000 > Mar 29 18:54:29 PIKNIK-SRV kernel: renewing client (clientid > 56f9c078/00000001) > Mar 29 18:54:29 PIKNIK-SRV kernel: svc: socket ffff8800ac6f4000 > sendto([ffff8803d5d60000 92... ], 92) = 92 (addr 192.168.1.70, port=1013) > Mar 29 18:54:29 PIKNIK-SRV kernel: svc: server ffff88041cd14000 waiting for > data (to = 1080000) > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: socket ffff88006326d280 TCP data > ready (svsk ffff8800ac6f4000) > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: server ffff88041cd14000, pool 0, > transport ffff8800ac6f4000, inuse=4 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: tcp_recv ffff8800ac6f4000 data 1 > conn 0 close 0 https://drift-hunters.com > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: socket ffff8800ac6f4000 > recvfrom(ffff8800ac6f42b8, 4) = 4 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: TCP record, 212 bytes > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: socket ffff8800ac6f4000 > recvfrom(ffff8802f8cdc000, 4096) = 212 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: TCP final record (212 bytes) > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: got len=212 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: svc_authenticate (1) > Mar 29 18:54:44 PIKNIK-SRV kernel: RPC: Want update, refage=120, > age=120 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: server ffff88034c1ce000, pool 0, > transport ffff8800ac6f4000, inuse=5 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: calling dispatcher > Mar 29 18:54:44 PIKNIK-SRV kernel: nfsd_dispatch: vers 4 proc 1 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: tcp_recv ffff8800ac6f4000 data 1 > conn 0 close 0 > Mar 29 18:54:44 PIKNIK-SRV kernel: nfsv4 compound op #1/3: 53 (OP_SEQUENCE) > Mar 29 18:54:44 PIKNIK-SRV kernel: __find_in_sessionid_hashtbl: > 1459208312:1:2:0 > Mar 29 18:54:44 PIKNIK-SRV kernel: nfsd4_sequence: slotid 0 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: socket ffff8800ac6f4000 > recvfrom(ffff8800ac6f42b8, 4) = -11 > Mar 29 18:54:44 PIKNIK-SRV kernel: RPC: TCP recv_record got -11 > Mar 29 18:54:44 PIKNIK-SRV kernel: RPC: TCP recvfrom got EAGAIN > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: got len=0 > Mar 29 18:54:44 PIKNIK-SRV kernel: svc: server ffff88034c1ce000 waiting for > data (to = 1080000) > Mar 29 18:54:44 PIKNIK-SRV kernel: check_slot_seqid enter. seqid 69 > slot_seqid 68 > Mar 29 18:54:44 PIKNIK-SRV kernel: nfsv4 compound op ffff880345333080 opcnt > 3 #1: 53: status 0 > Mar 29 18:54:44 PIKNIK-SRV kernel: nfsv4 compound op #2/3: 22 (OP_PUTFH) > Mar 29 18:54:44 PIKNIK-SRV kernel: nfsd: fh_verify(28: 00070001 00762703 > 00000000 9b82b229 a1414d6c 2a240087) > Mar 29 18:54:44 PIKNIK-SRV kernel: RPC: Want update, refage=120, > age=100 > Mar 29 18:54:49 PIKNIK-SRV kernel: nfsv4 compound op ffff880345333080 opcnt > 3 #2: 22: status 10008 > Mar 29 18:54:49 PIKNIK-SRV kernel: nfsv4 compound returned 10008 > > When nfsd has been shut down (could be normal, but why have activity?): > Mar 29 18:59:59 PIKNIK-SRV kernel: nfsd: initializing export module (net: > ffff880353fe98c0). > Mar 29 18:59:59 PIKNIK-SRV kernel: RPC: registering /proc/net/rpc > Mar 29 18:59:59 PIKNIK-SRV kernel: nfsd: initializing export module (net: > ffff880066d318c0). > Mar 29 18:59:59 PIKNIK-SRV kernel: nfsd: shutting down export module (net: > ffff880066d318c0). > Mar 29 18:59:59 PIKNIK-SRV kernel: nfsd: export shutdown complete (net: > ffff880066d318c0). > Mar 29 18:59:59 PIKNIK-SRV kernel: RPC: unregistering /proc/net/rpc > Mar 29 18:59:59 PIKNIK-SRV kernel: RPC: registering /proc/net/rpc > Mar 29 18:59:59 PIKNIK-SRV kernel: nfsd: initializing export module (net: > ffff880066d34a40). > Mar 29 18:59:59 PIKNIK-SRV kernel: nfsd: shutting down export module (net: > ffff880066d34a40). > Mar 29 18:59:59 PIKNIK-SRV kernel: nfsd: export shutdown complete (net: > ffff880066d34a40). > Mar 29 18:59:59 PIKNIK-SRV kernel: RPC: unregistering /proc/net/rpc > Mar 29 19:00:12 PIKNIK-SRV kernel: nfsd: shutting down export module (net: > ffff880353feb180). > Mar 29 19:00:12 PIKNIK-SRV kernel: nfsd: export shutdown complete (net: > ffff880353feb180). > Mar 29 19:00:12 PIKNIK-SRV kernel: RPC: unregistering /proc/net/rpc > Mar 29 19:00:12 PIKNIK-SRV kernel: nfsd: shutting down export module (net: > ffff880353fe98c0). > Mar 29 19:00:12 PIKNIK-SRV kernel: nfsd: export shutdown complete (net: > ffff880353fe98c0). > Mar 29 19:00:12 PIKNIK-SRV kernel: RPC: unregistering /proc/net/rpc > > Let me know if there's more info I could provide. Double-check the NFS server configuration file (/etc/exports) to ensure there are no typos or misconfigurations. Your export configuration looks fine, but it's worth verifying. Review NFS Server Logs: Check the NFS server logs (/var/log/syslog, /var/log/messages, or wherever your system logs NFS-related messages) for any errors or warnings that might indicate the cause of the issue. |