Bug 15622 - task nfsd blocked for more than 120 seconds
Summary: task nfsd blocked for more than 120 seconds
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-24 14:34 UTC by lkolbe
Modified: 2012-06-18 15:16 UTC

See Also:
Kernel Version: 2.6.33
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with the traces from tonight (90.53 KB, text/plain)
2010-03-24 14:34 UTC, lkolbe
ps auxf with some nfsds in D state (56.97 KB, text/plain)
2010-03-24 14:36 UTC, lkolbe
vmstat 60 during backup (6.31 KB, text/plain)
2010-03-24 14:40 UTC, lkolbe

Description lkolbe 2010-03-24 14:34:15 UTC
Created attachment 25684
dmesg with the traces from tonight

On a busy server (a backup disk pool with 24 SATA disks on an Adaptec Series 5 controller, during a RAID rebuild, with about 6 NFS clients writing simultaneously over a bonded 2x1gb network), we saw frequent nfsd hangs last night. The dmesg, ps auxf and vmstat output from around that timeframe are attached.

I don't know if this is a bug or just an overloaded server. Uptime says:
 14:32:07 up 19:27,  5 users,  load average: 35.63, 37.88, 41.31
which is about 20 lower than this morning.
Comment 1 lkolbe 2010-03-24 14:36:03 UTC
Created attachment 25685
ps auxf with some nfsds in D state
Comment 2 lkolbe 2010-03-24 14:38:15 UTC
I forgot to mention that we included the fix from #15578 in the kernel running this server.
Comment 3 lkolbe 2010-03-24 14:40:44 UTC
Created attachment 25686
vmstat 60 during backup
Comment 4 bfields 2010-03-29 23:38:15 UTC
I wonder what this is?:

nscd[3566]: segfault at 10 ip 00007f2eef903685 sp 00007fff3d067e40 error 6 in nscd[7f2eef8fd000+1c000]

and whether it could cause a problem for NFSv4 idmapping?

But I doubt that's the problem.

The backtraces all include ext4_file_write, and it might be worth asking ext4 people for suggestions.
Comment 5 lkolbe 2010-03-30 12:32:30 UTC
Nah, nscd is crashing *always* on *all* our servers, see also http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=574990

These call traces haven't appeared again so far.
