Created attachment 25684 [details] dmesg with the traces from tonight On a busy server (backup disk-pool with 24 SATA disks on an Adaptec Series 5 controller, during a raid rebuild and about 6 simultaneous writing nfs-clients over a bonded 2x1gb network), we saw frequent nfsd hangs last night. dmesg, ps auxf and vmstat from around that timeframe is attached. I don't know if this is a bug or just an overloaded server. Uptime says: 14:32:07 up 19:27, 5 users, load average: 35.63, 37.88, 41.31 which is about 20 lower than this morning.
Created attachment 25685 [details] ps auxf with some nfsds in D state
I forgot to mention that we included the fix from #15578 in the kernel running this server.
Created attachment 25686 [details] vmstat 60 during backup
I wonder what this is?: nscd[3566]: segfault at 10 ip 00007f2eef903685 sp 00007fff3d067e40 error 6 in nscd[7f2eef8fd000+1c000] and whether it could cause a problem for NFSv4 idmapping? But I doubt that's the problem. The backtraces all include ext4_file_write, and it might be worth asking ext4 people for suggestions.
Nah, nscd is crashing *always* on *all* our servers, see also http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574990 These call traces haven't appeared again so far.