Bug 16056

Summary: Crashes / traces after copying large files over NFS
Product: File System Reporter: Leszek Urbanski (tgr.kb)
Component: NFSAssignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX    
Severity: high CC: linuxram, rhuddusa
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32.12 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg immediately after failure
sysrq triggered trace and dmesg -s 900000
full sysrq triggered trace
full trace with kswapd in D state
sysrq-triggered trace with CONFIG_DEBUG_SPINLOCK (and _SLEEP)=y
trace from 2.6.34.1
sysrq-triggered full trace from 2.6.34.1
NFS: Don't let kswapd call nfs_wb_page()
NFS: kswapd must not block in nfs_release_page (attempt 2)
don't let kswapd call nfs_wb_page - 2.6.32.16

Description Leszek Urbanski 2010-05-26 20:26:25 UTC
Created attachment 26552 [details]
dmesg immediately after failure

Arch: amd64, NFS v3 client.

When copying a large file (larger than the amount of available physical memory) to an NFS-mounted filesystem, swapper and rpciod on the client complain about page allocation failures and then kswapd goes into a deadlock, resulting in a system-wide crash.

Filed as a new bug at Trond Myklebust's request from #15578.

Stack traces attached, including dmesg contents immediately after failure (the trace triggered from sysrq and read with dmesg -s 900000 seems not to cover everything).
Comment 1 Leszek Urbanski 2010-05-26 20:28:37 UTC
Created attachment 26553 [details]
sysrq triggered trace and dmesg -s 900000
Comment 2 Leszek Urbanski 2010-05-26 20:32:36 UTC
The above two traces show the kernel version as 2.6.32-5, but this is due to our internal packaging - it is in fact a 2.6.32.12 (also tried with .13 - no change, but this bug does not occur with < 2.6.30.
Comment 3 Trond Myklebust 2010-05-26 22:47:18 UTC
Hmm. From the above trace it looks as if kswapd is stuck in nfs_access_cache_shrinker().

Can you try applying the patch at
   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=61d5eb2985b3b1d69fd53d7dc9789037c27f8d91

If that doesn't work, can you please check if syslog caught those parts of the sysrq trace that do not appear in the 'dmesg' trace?
Comment 4 Leszek Urbanski 2010-05-27 17:00:26 UTC
Created attachment 26562 [details]
full sysrq triggered trace

It still doesn't work with that patch. After the failure, dmesg looks similar to the one in attachment 26552 [details]. I've attached full output of the sysrq-triggered trace (with the patch applied).
Comment 5 Trond Myklebust 2010-05-27 17:50:24 UTC
Have you tried turning on spinlock debugging? The only thing I can see that could cause kswapd to hang inside that loop is if something else is leaking the inode->i_lock.
Comment 6 Leszek Urbanski 2010-05-27 22:20:47 UTC
Created attachment 26564 [details]
full trace with kswapd in D state

I'll test with spinlock debugging. Meanwhile, I have another full trace - this time a while after the failure, with kswapd in D state.
Comment 7 Leszek Urbanski 2010-05-29 13:13:14 UTC
Created attachment 26576 [details]
sysrq-triggered trace with CONFIG_DEBUG_SPINLOCK (and _SLEEP)=y

There were no spinlock-specific printk()s in dmesg.
Comment 8 Leszek Urbanski 2010-06-02 20:52:41 UTC
Is there anything else I can do to support debugging?
Comment 9 Leszek Urbanski 2010-07-13 12:51:06 UTC
Created attachment 27087 [details]
trace from 2.6.34.1

I reproduced this bug on 2.6.32.16 and 2.6.34.1. Trond, is there anything you can do with this?
Comment 10 Leszek Urbanski 2010-07-13 12:52:14 UTC
Created attachment 27088 [details]
sysrq-triggered full trace from 2.6.34.1
Comment 11 Trond Myklebust 2010-07-13 21:18:33 UTC
So this is with a udp mount? Do you also reproduce it when you use tcp?
Comment 12 Leszek Urbanski 2010-07-14 16:10:30 UTC
It's a UDP mount, but I tried to reproduce with TCP and got the same result.
Comment 13 Trond Myklebust 2010-07-14 21:21:23 UTC
Created attachment 27106 [details]
NFS: Don't let kswapd call nfs_wb_page()

OK, given the above traces, I'm starting to believe that we need a specific exclusion for kswapd in nfs_release_page(). Can you try the following patch?
Comment 14 Leszek Urbanski 2010-07-16 11:35:08 UTC
That seems to have fixed it. No crashes after a few hours of testing.
Comment 15 Leszek Urbanski 2010-07-23 22:29:24 UTC
That patch definitely fixes this problem. Thanks! We haven't had a single NFS related crash since we applied it. Could you please push it to 2.6.32.x and mainline? :)
Comment 16 Trond Myklebust 2010-07-23 23:02:59 UTC
Created attachment 27235 [details]
NFS: kswapd must not block in nfs_release_page (attempt 2)

I've tried to make the patch a little less intrusive, and only target kswapd. I found out how to reproduce the hang on my own system, and have confirmed that it fixes the issue there. Can you please try it on your setup?
Comment 17 Leszek Urbanski 2010-07-25 19:10:08 UTC
Created attachment 27246 [details]
don't let kswapd call nfs_wb_page - 2.6.32.16

That also works. BTW: with 2.6.32.16, only a subset of that diff is required.

If this is going to be the final patch, please let me know - I will update it in Debian bugtracker.
Comment 18 Trond Myklebust 2010-07-31 18:13:10 UTC
Commit number in Linus's tree is b608b283a962caaa280756bc8563016a71712acf (NFS: kswapd must not block in nfs_release_page)

Also Cced stable@kernel.org

I'm therefore assuming this bug can be closed.
Comment 19 Ram Pai 2010-08-06 17:45:35 UTC
Trond,
 
Can you share your test case to reproduce the problem. We run into the issue on our RH based system but we need 4-6 hours to reproduce it. A quicker turn around time will help us reproduce the issue quickly. 

Thanks in advance,
RP
Comment 20 Trond Myklebust 2010-08-06 18:02:44 UTC
I believe that I was doing

      iozone -t 32 -B -r 128k -s 512m -i0 -i1

to trigger heavy mmapped writes.