Created attachment 25684 [details]
dmesg with the traces from tonight
On a busy server (backup disk-pool with 24 SATA disks on an Adaptec Series 5 controller, during a raid rebuild and about 6 simultaneous writing nfs-clients over a bonded 2x1gb network), we saw frequent nfsd hangs last night. dmesg, ps auxf and vmstat from around that timeframe is attached.
I don't know if this is a bug or just an overloaded server. Uptime says:
14:32:07 up 19:27, 5 users, load average: 35.63, 37.88, 41.31
which is about 20 lower than this morning.
Created attachment 25685 [details]
ps auxf with some nfsds in D state
I forgot to mention that we included the fix from #15578 in the kernel running this server.
Created attachment 25686 [details]
vmstat 60 during backup
I wonder what this is?:
nscd: segfault at 10 ip 00007f2eef903685 sp 00007fff3d067e40 error 6 in nscd[7f2eef8fd000+1c000]
and whether it could cause a problem for NFSv4 idmapping?
But I doubt that's the problem.
The backtraces all include ext4_file_write, and it might be worth asking ext4 people for suggestions.
Nah, nscd is crashing *always* on *all* our servers, see also http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574990
These call traces haven't appeared again so far.