Bug 15622

Summary: task nfsd blocked for more than 120 seconds
Product: File System
Reporter: lkolbe
Component: NFS
Assignee: bfields
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, trondmy
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  dmesg with the traces from tonight
  ps auxf with some nfsds in D state
  vmstat 60 during backup

Description lkolbe 2010-03-24 14:34:15 UTC
Created attachment 25684
dmesg with the traces from tonight

On a busy server (backup disk pool with 24 SATA disks on an Adaptec Series 5 controller, during a RAID rebuild, with about 6 NFS clients writing simultaneously over a bonded 2x1 Gb network), we saw frequent nfsd hangs last night. dmesg, ps auxf and vmstat output from around that timeframe are attached.

I don't know if this is a bug or just an overloaded server. Uptime says:
 14:32:07 up 19:27,  5 users,  load average: 35.63, 37.88, 41.31
which is about 20 lower than this morning.
Comment 1 lkolbe 2010-03-24 14:36:03 UTC
Created attachment 25685
ps auxf with some nfsds in D state
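
For anyone wanting to pull the same information out of a `ps auxf` dump, here is a minimal sketch that lists the tasks in uninterruptible sleep (D state). It assumes the usual 11-column `ps aux` layout with STAT in the eighth column. The function name `blocked_tasks` and the sample rows are made up for illustration and are not copied from the attachment.

```python
def blocked_tasks(ps_text):
    """Return (pid, command) pairs for tasks whose STAT starts with 'D'.

    Assumes the `ps aux` / `ps auxf` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    blocked = []
    for line in ps_text.splitlines():
        fields = line.split(None, 10)  # keep COMMAND (may contain spaces) whole
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and any short or malformed line
        pid, stat, command = fields[1], fields[7], fields[10]
        if stat.startswith("D"):
            blocked.append((int(pid), command.strip()))
    return blocked


# Hypothetical sample rows, NOT taken from the attached file.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1001  0.0  0.0      0     0 ?        D    Mar23   1:02 [nfsd]
root      1002  0.0  0.0      0     0 ?        S    Mar23   0:58 [nfsd]
root      1003  0.1  0.0      0     0 ?        D<   Mar23   0:10 [kjournald]
"""

if __name__ == "__main__":
    for pid, command in blocked_tasks(SAMPLE):
        print(pid, command)
```

Feeding it the real output (for example `blocked_tasks(open("ps.txt").read())`, with `ps.txt` being wherever the dump was saved) should give the list of nfsd threads that were stuck at the time.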
Comment 2 lkolbe 2010-03-24 14:38:15 UTC
I forgot to mention that we included the fix from #15578 in the kernel running this server.
Comment 3 lkolbe 2010-03-24 14:40:44 UTC
Created attachment 25686
vmstat 60 during backup
Comment 4 bfields 2010-03-29 23:38:15 UTC
I wonder what this is?:

nscd[3566]: segfault at 10 ip 00007f2eef903685 sp 00007fff3d067e40 error 6 in nscd[7f2eef8fd000+1c000]

and whether it could cause a problem for NFSv4 idmapping?

But I doubt that's the problem.

The backtraces all include ext4_file_write, and it might be worth asking ext4 people for suggestions.
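
As a side note on the nscd line quoted above, here is a rough sketch of how such a segfault message can be decoded. It assumes an x86 machine, the x86 page-fault error-code layout (bit 0 set = protection fault on a present page, bit 1 = write access, bit 2 = user mode), and that the kernel prints all of these numbers in hex. The helper name `decode_segfault` is hypothetical.

```python
import re

# Matches lines of the form:
#   comm[pid]: segfault at ADDR ip IP sp SP error ERR in OBJ[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r" in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a segfault line into its fields and decode the error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "comm": m.group("comm"),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # Assumed x86 page-fault error-code bits (see lead-in).
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


LINE = ("nscd[3566]: segfault at 10 ip 00007f2eef903685 "
        "sp 00007fff3d067e40 error 6 in nscd[7f2eef8fd000+1c000]")

if __name__ == "__main__":
    for key, value in decode_segfault(LINE).items():
        print(key, value)
```

Under those assumptions, the line describes a user-mode write to an unmapped page at address 0x10, with the faulting instruction at offset 0x6685 into the nscd mapping. That looks like a store through a NULL pointer plus a small offset, but this is only a guess from the numbers, not something the message proves.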
Comment 5 lkolbe 2010-03-30 12:32:30 UTC
Nah, nscd is crashing *always* on *all* our servers, see also http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574990

These call traces haven't appeared again so far.