Bug 14541

Summary: getcwd() incorrectly returning ENOENT...
Product: File System
Reporter: Daniel J Blueman (daniel.blueman)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX
Severity: high
CC: akpm
Priority: P1
Hardware: All
OS: Linux
URL: http://marc.info/?l=linux-nfs&m=125707965119418&w=2
Kernel Version: 2.6.30, 2.6.31, 2.6.32-rc5
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT

Description Daniel J Blueman 2009-11-04 09:28:18 UTC
Since 2.6.30-rc, I've been experiencing various issues relating to getcwd() returning ENOENT on NFS4 clients. This looks to be racy and is readily reproducible [1].

I bisected the issue [2] and confirm that manually backing out the bisected commit against 2.6.32-rc5 restores correct race-free behaviour. The issue did not occur with 2.6.29, so this is a regression.

To observe the change to user-level behaviour (after the reproducer commands):
# make clean
# strace -ffe getcwd make -n >list
[pid  3829] getcwd(0x7fffa269a380, 4096) = -1 ENOENT (No such file or directory)
make: getcwd: No such file or directory

-> this impacts userspace and causes varying levels of failure. These are often benign but can break applications, e.g. it has been impossible to build an Ubuntu/Debian kernel over NFS4 since 2.6.30-rc.
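
For comparison, getcwd() legitimately returns ENOENT when the current working directory has been unlinked, which is why the same error on a directory that still exists is spurious. The Python sketch below reproduces that legitimate case on a local filesystem so the errno can be compared with the strace output above. The helper names `probe_getcwd` and `demo_unlinked_cwd` are made up for this illustration and are not part of the report, and the sketch does not reproduce the NFSv4 race itself.

```python
import errno
import os
import tempfile


def probe_getcwd():
    """Return (path, None) on success or (None, errno_value) on failure."""
    try:
        return os.getcwd(), None
    except OSError as exc:
        return None, exc.errno


def demo_unlinked_cwd():
    """Show the legitimate ENOENT case: the working directory was removed."""
    saved = os.getcwd()
    scratch = tempfile.mkdtemp()
    try:
        os.chdir(scratch)
        os.rmdir(scratch)  # remove the directory we are standing in
        return probe_getcwd()
    finally:
        os.chdir(saved)  # always return to a valid directory


if __name__ == "__main__":
    path, err = demo_unlinked_cwd()
    if err is not None:
        print("getcwd failed:", errno.errorcode.get(err, err))
    else:
        print("getcwd returned:", path)
```

Calling `probe_getcwd()` from inside the build tree on the NFSv4 mount should return a path on an unaffected kernel; an ENOENT result there, with the directory still present on the server, would match the behaviour described in this report.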

--- [1]

booting eg:
http://mira.sunsite.utk.edu/ubuntu-releases/karmic/ubuntu-9.10-desktop-amd64.iso

$ sudo bash
# apt-get install build-essential
# apt-get build-dep apt
# mount server:/ /mnt -tnfs4 && cd /mnt
# apt-get source apt
# cd apt-0.7.23.1ubuntu2
# ./configure && make
 -> "getcwd: No such file or directory" messages are observed with the cited patch applied, and not without it

--- [2]

a65318bf3afc93ce49227e849d213799b072c5fd is first bad commit
commit a65318bf3afc93ce49227e849d213799b072c5fd
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Wed Mar 11 14:10:28 2009 -0400
NFSv4: Simplify some cache consistency post-op GETATTRs

Comment 1 Trond Myklebust 2009-11-05 17:32:27 UTC
Created attachment 23665
NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT

Changeset a65318bf3afc93ce49227e849d213799b072c5fd (NFSv4: Simplify some
cache consistency post-op GETATTRs) incorrectly changed the getattr
bitmap for readdir().
This causes the readdir() function to fail to return a
fileid/inode number, which in turn exposes a bug in the NFS readdir code that
causes spurious ENOENT errors to appear in applications (see
http://bugzilla.kernel.org/show_bug.cgi?id=14541).

The immediate band-aid is to revert the incorrect bitmap change, but in the
longer term we should change the NFS readdir code to cope with the
fact that NFSv4 servers are not required to support fileids/inode numbers.
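
To make the bitmap point concrete, here is a minimal Python sketch of how an NFSv4 attribute request bitmap encodes which attributes a READDIR asks for. It is an illustration only, not the kernel's code and not the attached patch. The attribute numbers (rdattr_error = 11, fileid = 20, mounted_on_fileid = 55) are taken from RFC 3530; the function names are hypothetical.

```python
# Attribute numbers as defined in RFC 3530 (NFSv4).
FATTR4_RDATTR_ERROR = 11
FATTR4_FILEID = 20
FATTR4_MOUNTED_ON_FILEID = 55


def make_bitmap(attr_numbers, words=2):
    """Build a bitmap of 32-bit words: attribute n is bit n % 32 of word n // 32."""
    bitmap = [0] * words
    for n in attr_numbers:
        bitmap[n // 32] |= 1 << (n % 32)
    return bitmap


def requests_attr(bitmap, n):
    """Return True if attribute number n is set in the bitmap."""
    word = n // 32
    return word < len(bitmap) and bool(bitmap[word] & (1 << (n % 32)))


def asks_for_inode_number(bitmap):
    """Hypothetical check: does the request ask for any fileid attribute?"""
    return (requests_attr(bitmap, FATTR4_FILEID)
            or requests_attr(bitmap, FATTR4_MOUNTED_ON_FILEID))


# A request that includes the fileid attributes, and one that omits them.
with_fileid = make_bitmap(
    [FATTR4_RDATTR_ERROR, FATTR4_FILEID, FATTR4_MOUNTED_ON_FILEID])
without_fileid = make_bitmap([FATTR4_RDATTR_ERROR])
```

With the second bitmap the server is never asked for a fileid, so the directory entries come back without an inode number, which is the condition the changelog above describes.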
Comment 2 Trond Myklebust 2009-11-05 17:38:09 UTC
Sorry for having taken so long to get back to this bug. A combination of travel
and family circumstances kept me busy.
Also thanks for having taken all the trouble to bisect the problem and setting
up the bug report...

Does the above patch suffice to fix things?
Comment 3 Daniel J Blueman 2009-11-06 00:35:20 UTC
Bingo! This patch fixes the behaviour and passed some heavy testing with two good test-cases.

Good work, Trond. Worthwhile for the stable stream also.
Comment 4 Andrew Morton 2009-11-10 21:56:20 UTC
I'm not seeing a Cc:stable@kernel.org in that changelog, but it's needed there, yes?
Comment 5 Trond Myklebust 2009-11-10 22:00:49 UTC
I wasn't aware that it is acceptable practice to Cc: stable@kernel.org in
bugzilla reports, but if it is, then yes we should add them...
Comment 6 Trond Myklebust 2009-11-10 22:02:03 UTC
Oh, sorry. You said 'changelog'... My reading skills will improve once I get a 
morning coffee in me...
Comment 7 Trond Myklebust 2010-02-03 21:07:03 UTC
Marking bug as CLOSED. Please reopen if the problem reoccurs.