Bug 9400

Summary: nfsd gets stuck when underlying filesystem is XFS
Product: File System
Reporter: Christian Kujau (kernel)
Component: XFS
Assignee: XFS Guru (xfs-masters)
Status: CLOSED CODE_FIX
Severity: normal
CC: bfields, cw, mingo, rjwysocki
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24-rc2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9243

Description Christian Kujau 2007-11-18 06:21:15 UTC
Most recent kernel where this bug did not occur: 2.6.23.1
Distribution: Debian/unstable, i386
Hardware Environment: AMD 2600+, 2GB RAM, see http://nerdbynature.de/bits/2.6.24-rc2/nfsd/dmesg.2.gz
Software Environment: nfs-kernel-server 1.1.1-8
Problem Description:

Exported NFS shares can be mounted (client: 2.6-git/powerpc32, nfs-common-1.1.1~git-20070709-3ubuntu1), but running "ls /mountpoint" (even without "-l") on the client is enough to put the [nfsd] processes into "D" state.
Doing this again puts more [nfsd] processes into "D" state, and the load average increases by 1. Restarting the rpc.nfsd process on the server does not help much; the new rpc.nfsd processes get stuck quickly.
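
One way to watch for the symptom on the server is to count nfsd threads in uninterruptible sleep. This is only a minimal sketch: count_d_state is a hypothetical helper (not part of nfs-utils), and it assumes a procps-style ps that supports the pid, stat and comm output fields.

```shell
#!/bin/sh
# Count processes with a given command name that are in uninterruptible
# sleep ("D" state). Reads "PID STAT COMMAND" lines on stdin.
# count_d_state is a hypothetical helper, not part of any NFS tooling.
count_d_state() {
    awk -v name="$1" '$3 == name && $2 ~ /^D/ { n++ } END { print n + 0 }'
}

# On the NFS server (the trailing "=" suppresses the ps header line):
#   ps -eo pid=,stat=,comm= | count_d_state nfsd
```

A count that grows after each "ls /mountpoint" on the client would match the behaviour described above.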

This has been reported by Chris Wedgwood too:
http://lkml.org/lkml/2007/11/14/50

More details: http://www.nerdbynature.de/bits/2.6.24-rc2/nfsd/

Steps to reproduce:
http://www.nerdbynature.de/bits/2.6.24-rc2/nfsd/debug.txt.gz

Workarounds:
  * mount -o vers=2            (tested, works)
  * mount -o vers=3,nordirplus
  * set CONFIG_NFSD to "depends on !XFS && ..."
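
As a sketch, the two mount-option workarounds could be spelled out as client-side commands like this. The helper nfs_workaround_cmd, the export "server:/export" and the mount point "/mnt/nfs" are placeholders and not taken from this report; the helper only prints the command instead of running it. Both options keep the client from sending READDIRPLUS requests, which is presumably why they sidestep the hang.

```shell
#!/bin/sh
# Print (do not run) the client-side mount command for one of the
# mount-option workarounds. nfs_workaround_cmd is a hypothetical helper;
# the share and mount point passed to it are placeholders.
nfs_workaround_cmd() {
    mode=$1
    share=$2
    mountpoint=$3
    case "$mode" in
        v2)         opts="vers=2" ;;             # NFSv2
        nordirplus) opts="vers=3,nordirplus" ;;  # NFSv3 without READDIRPLUS
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "mount -t nfs -o $opts $share $mountpoint"
}

nfs_workaround_cmd v2 server:/export /mnt/nfs
nfs_workaround_cmd nordirplus server:/export /mnt/nfs
```

Only the vers=2 variant is marked as tested in the list above.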
Comment 1 Rafael J. Wysocki 2007-11-18 06:49:33 UTC
*** Bug 9377 has been marked as a duplicate of this bug. ***
Comment 2 Rafael J. Wysocki 2007-11-18 06:50:02 UTC
*** Bug 9369 has been marked as a duplicate of this bug. ***
Comment 3 bfields 2007-11-18 11:37:59 UTC
This is clearly a regression.  (But I don't seem to have the rights to change it.)
Comment 4 Rafael J. Wysocki 2007-11-28 16:00:37 UTC
Patch is available: http://lkml.org/lkml/2007/11/25/39
Comment 5 Ingo Molnar 2007-12-04 03:42:23 UTC
The fix is still not in 2.6.24-rc4 because it is undergoing more QA:

 http://lkml.org/lkml/2007/11/30/24
Comment 6 Ingo Molnar 2007-12-10 12:44:37 UTC
resolved by:

 commit e89bc612d61edbcefaeb6f2244f86c0f3ec89d23
 Author: Christoph Hellwig <hch@infradead.org>
 Date:   Fri Dec 7 14:07:53 2007 +1100

     [XFS] revert to double-buffering readdir