Bug 8467

Summary: nfs locking problem
Product: File System
Reporter: Ryan Richter (ryan)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: REJECTED INSUFFICIENT_DATA
Severity: normal
CC: clameter, matt, protasnb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.20.11
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
  Fix for a timeout issue on lockd RPC calls
  sysrq-t trace from 2.6.23.16

Description Ryan Richter 2007-05-11 07:46:05 UTC
Most recent kernel where this bug did *NOT* occur: unknown
Distribution: Debian Etch
Hardware Environment: x86_64 (Intel Core2 Xeon)
Software Environment: vanilla 2.6.20.11 with nfs utils from Debian Etch
Problem Description:

This machine is an NFS server with several nfsroot clients (all also running
2.6.20.11).  Mount and exports options for the root:
ro,vers=3,rsize=8192,wsize=8192,hard,intr,nolock,proto=udp,timeo=7,retrans=3,sec=sys,addr=xarello
sync,no_subtree_check,ro,no_root_squash
and other fses:
rw,nosuid,nodev,vers=3,rsize=8192,wsize=8192,hard,intr,proto=udp,timeo=7,retrans=3,sec=sys,addr=xarello
sync,no_subtree_check,rw,no_root_squash
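
(For reference, export options like these would normally come from entries in the server's /etc/exports along the following lines; the paths below are hypothetical and only illustrate the shape of such entries, they are not taken from this report:

/srv/nfsroot   *.bcm.umontreal.ca(ro,sync,no_subtree_check,no_root_squash)
/home          *.bcm.umontreal.ca(rw,sync,no_subtree_check,no_root_squash)
)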

Processes attempting to lock files (e.g. mutt, KDE logins; none were attempting
to lock files on the root fs) were hanging in D state.  This happened on all
clients.  I tried restarting mountd, statd, portmap, etc.  This had no effect,
but it did produce some strange syslog messages:

May 11 09:31:04 xarello mountd[3127]: Caught signal 15, un-registering and exiting.
May 11 09:31:04 xarello kernel: nfsd: last server has exited
May 11 09:31:04 xarello kernel: nfsd: unexporting all filesystems
May 11 09:31:04 xarello kernel: lockd: couldn't create RPC handle for molinara
May 11 09:31:04 xarello kernel: lockd: couldn't create RPC handle for molinara
May 11 09:31:04 xarello rpc.statd[3191]: Caught signal 15, un-registering and
exiting.
May 11 09:31:05 xarello rpc.statd[10613]: Version 1.0.10 Starting
May 11 09:31:06 xarello nfsd[10635]: nfssvc: writting fds to kernel failed:
errno 0 (Success)
May 11 09:31:06 graciano rpc.statd[3396]: recv_rply: [127.0.0.1] service 100021
not registered
May 11 09:31:06 molinara rpc.statd[3363]: SM_NOTIFY from
xarello.bcm.umontreal.ca while not monitoring any hosts.
May 11 09:31:41 xarello nfsd[10635]: nfssvc: writting fds to kernel failed:
errno 0 (Success)
May 11 09:31:57 saperavi kernel: lockd: unexpected unlock status: 7

The server had to be rebooted to get the clients unhung.

Steps to reproduce: unknown; similar things have happened in the past, but
nothing in particular seems to trigger it.
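
(For what it's worth, a quick way to confirm which processes are stuck in D state on a client — a generic sketch, not something run for this report — is:

$ ps axo pid,stat,wchan:25,comm | awk '$2 ~ /^D/'

The wchan column usually points at the kernel function the process is sleeping in, e.g. a lockd/NFS wait.)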
Comment 1 Trond Myklebust 2007-05-14 10:28:13 UTC
Created attachment 11500 [details]
Fix for a timeout issue on lockd RPC calls
Comment 2 Trond Myklebust 2007-05-14 10:28:39 UTC
Does the above patch make a difference for you?
Comment 3 Ryan Richter 2007-05-14 10:34:45 UTC
I'll apply it next reboot.  Is the patch valid vs. 2.6.21.x also?
Comment 4 Trond Myklebust 2007-05-14 10:37:56 UTC
Yep. It should apply to all existing 2.6.21 kernels...
Comment 5 Ryan Richter 2007-05-15 07:35:15 UTC
I applied this patch to 2.6.20.11 last night.  I'm going on vacation in a few
days and decided to stay with the older kernel for now (in case any problems
come up while I'm gone).  As soon as people started logging in to the client
workstations this morning, the problem (or a very similar one) happened again.
This time there were a large number of kernel messages like:


lockd: server jacquere not responding, timed out
lockd: couldn't create RPC handle for jacquere
lockd: server jacquere not responding, timed out
lockd: couldn't create RPC handle for jacquere
lockd: server jacquere not responding, timed out
lockd: couldn't create RPC handle for jacquere

Was this patch not meant to work with 2.6.20.x?
Comment 6 Trond Myklebust 2007-05-15 07:43:38 UTC
It was meant to work with 2.6.20, but you appear to have an additional problem
with your setup.

Could you check the output from 'rpcinfo -t jacquere nlockmgr' and
'rpcinfo -t jacquere status'?
Comment 7 Ryan Richter 2007-05-15 07:48:58 UTC
I should note that, although all of the log messages I saw were from the client
jacquere, locking failed (i.e. KDE logins froze) on all of the clients.

$ rpcinfo -t jacquere nlockmgr
program 100021 version 1 ready and waiting
rpcinfo: RPC: Program/version mismatch; low version = 1, high version = 4
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting
$ rpcinfo -t jacquere status
program 100024 version 1 ready and waiting
Comment 8 Trond Myklebust 2007-05-15 09:58:02 UTC
What kernel are the clients running?
Comment 9 Ryan Richter 2007-05-15 10:01:14 UTC
They're all running vanilla 2.6.20.11.
Comment 10 Trond Myklebust 2007-05-15 11:08:09 UTC
I'm assuming that you restarted the portmapper, nfsd, lockd, and statd in the
correct order on the server? Do

rpcinfo -t localhost nlockmgr

and

rpcinfo -t localhost status

also work when run on the server?
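
(For the record, the usual by-hand restart order on a Debian Etch server would be roughly the following — init script names are Debian's and may differ elsewhere — portmap first, then statd via nfs-common, then nfsd via nfs-kernel-server:

# /etc/init.d/portmap restart
# /etc/init.d/nfs-common restart
# /etc/init.d/nfs-kernel-server restart
)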
Comment 11 Ryan Richter 2007-05-15 11:14:17 UTC
The server was rebooted (nothing else will fix it), so I didn't restart the
nfs/rpc stuff by hand.  I also switched it back to vanilla 2.6.20.11, since the
patch seemed to make things worse (people have to work on the workstations).  So
everything's working fine for now.  On the server (the last rpcinfo commands
were also run on the server):

$ rpcinfo -t localhost nlockmgr
program 100021 version 1 ready and waiting
rpcinfo: RPC: Program/version mismatch; low version = 1, high version = 4
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting
$ rpcinfo -t localhost status
program 100024 version 1 ready and waiting
Comment 12 Natalie Protasevich 2007-07-07 00:36:02 UTC
Can you please provide an alt-sysrq-t trace captured at the time of the hang?
Thanks.
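
(For reference, such a trace can be captured roughly as follows, assuming the magic SysRq key is available, either on the console or from the kernel log:

# echo 1 > /proc/sys/kernel/sysrq
# echo t > /proc/sysrq-trigger
# dmesg > sysrq-t.txt
)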
Comment 13 Ryan Richter 2008-05-07 08:30:40 UTC
Created attachment 16060 [details]
sysrq-t trace from 2.6.23.16

It happened again this morning; I'm now running 2.6.23.16.  I had to cut and paste the sysrq-t trace off the serial console in chunks, so the line breaks are a little funny, but all the data should be there.
Comment 14 Natalie Protasevich 2008-06-03 22:11:19 UTC
Trond, does this look like an NFS or RPC problem to you?

pdflush is trying to write out dirty pages and is in congestion_wait:
pdflush       D 0000000000000002     0  7172      2
 ffff8101974b5db0 0000000000000046 ffff810006df2668 ffff810007253dd0
 ffffffff80609480 ffff81025a4e6140 ffff81025fe507a0 ffff81025a4e6358
 0000000200000003 ffff8101974b5dc0 ffff8101974b5e20 000000011be047e8
Call Trace:
 [<ffffffff80231824>] __mod_timer+0xb6/0xc4
 [<ffffffff804b0a5e>] schedule_timeout+0x8a/0xad
 [<ffffffff802315ec>] process_timeout+0x0/0x5
 [<ffffffff804b043c>] io_schedule_timeout+0x28/0x33
 [<ffffffff80254634>] congestion_wait+0x66/0x80
 [<ffffffff8023a9cc>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80282737>] writeback_inodes+0xa3/0xd3
 [<ffffffff80250135>] wb_kupdate+0xba/0x111
 [<ffffffff8025042a>] pdflush+0x0/0x1ce
 [<ffffffff8025054e>] pdflush+0x124/0x1ce
 [<ffffffff8025007b>] wb_kupdate+0x0/0x111
 [<ffffffff8023a8bc>] kthread+0x47/0x73
 [<ffffffff8020c358>] child_rip+0xa/0x12
 [<ffffffff8023a875>] kthread+0x0/0x73
 [<ffffffff8020c34e>] child_rip+0x0/0x12

There is also the ext3 journaling thread, which is stuck trying to commit transactions:

kjournald     D 0000000000000000     0  1165      2
 ffff81025ff27d30 0000000000000046 0000000000000000 ffffffff80231824
 0000000000000287 ffff81025fe927a0 ffffffff805a43c0 ffff81025fe929b8
 000000005f74d358 ffffffff805a43c0 00000000000000ff ffffffff802f7034
Call Trace:

 [<ffffffff80231824>] __mod_timer+0xb6/0xc4
 [<ffffffff802f7034>] elv_next_request+0x139/0x149
 [<ffffffff802864d5>] sync_buffer+0x0/0x40
 [<ffffffff804b06a0>] io_schedule+0x28/0x33
 [<ffffffff80286510>] sync_buffer+0x3b/0x40
 [<ffffffff804b0bc9>] __wait_on_bit+0x40/0x6e
 [<ffffffff802864d5>] sync_buffer+0x0/0x40
 [<ffffffff804b0c63>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8023a9fa>] wake_bit_function+0x0/0x23
 [<ffffffff802b40c2>] journal_commit_transaction+0x550/0xff6
 [<ffffffff802316d9>] lock_timer_base+0x26/0x4b
 [<ffffffff802b79b4>] kjournald+0xc3/0x1e6
 [<ffffffff8023a9cc>] autoremove_wake_function+0x0/0x2e
 [<ffffffff802b78f1>] kjournald+0x0/0x1e6
 [<ffffffff8023a8bc>] kthread+0x47/0x73
 [<ffffffff8020c358>] child_rip+0xa/0x12
 [<ffffffff8023a875>] kthread+0x0/0x73
 [<ffffffff8020c34e>] child_rip+0x0/0x12

You should check whether the system is out of memory, using /proc/meminfo and top or vmstat output.

Some tuning may be needed, such as the pdflush tunables in /proc/sys/vm and the ext3 journaling parameters. Chris, correct me if I'm wrong...
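
(As a starting point — a generic sketch, not commands taken from this report — the memory state and the relevant writeback tunables can be inspected with:

$ cat /proc/meminfo
$ vmstat 5 5
$ sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs vm.dirty_writeback_centisecs
)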
Comment 15 Natalie Protasevich 2008-06-03 22:14:56 UTC
...Ryan, you should also try the latest kernel; 2.6.23 is really old.
Comment 16 Ryan Richter 2008-06-12 10:17:24 UTC
(In reply to comment #14)
> You should check if system is out of memory, with /proc/meminfo and top or
> vmstat output.

It just happened again, on the same old kernel I'm afraid.  I'll update the kernel soon.  Here's /proc/meminfo, which looks pretty typical for this machine:

MemTotal:      8163412 kB
MemFree:         51144 kB
Buffers:       1102236 kB
Cached:        5371748 kB
SwapCached:          0 kB
Active:        1039308 kB
Inactive:      5562868 kB
SwapTotal:     3960012 kB
SwapFree:      3959972 kB
Dirty:             232 kB
Writeback:           0 kB
AnonPages:      128196 kB
Mapped:          54420 kB
Slab:          1478300 kB
SReclaimable:  1378988 kB
SUnreclaim:      99312 kB
PageTables:       5884 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   8041716 kB
Committed_AS:   227260 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      2932 kB
VmallocChunk: 34359735411 kB
Comment 17 Christoph Lameter 2008-06-12 10:27:17 UTC
The backtraces show a number of processes stuck in __mod_timer().

__mod_timer calls into lock_timer_base, which looks suspiciously complex.

Could you enable lockdep and run the tests again?
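
(For reference, lockdep is enabled by rebuilding the kernel with the lock-debugging options under "Kernel hacking" turned on, roughly:

CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKDEP=y

Any lockdep warnings will then show up in dmesg/syslog when the hang is reproduced.)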