Created attachment 26552 [details] dmesg immediately after failure
Arch: amd64, NFS v3 client.
When copying a large file (larger than the amount of available physical memory) to an NFS-mounted filesystem, swapper and rpciod on the client complain about page allocation failures and then kswapd goes into a deadlock, resulting in a system-wide crash. Filed as a new bug at Trond Myklebust's request from #15578. Stack traces attached, including dmesg contents immediately after failure (the trace triggered from sysrq and read with dmesg -s 900000 seems not to cover everything).
Created attachment 26553 [details] sysrq triggered trace and dmesg -s 900000
The above two traces show the kernel version as 2.6.32-5, but this is due to our internal packaging. It is in fact 2.6.32.12 (also tried with .13, no change). This bug does not occur with kernels < 2.6.30.
Hmm. From the above trace it looks as if kswapd is stuck in nfs_access_cache_shrinker(). Can you try applying the patch at
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=61d5eb2985b3b1d69fd53d7dc9789037c27f8d91
If that doesn't work, can you please check if syslog caught those parts of the sysrq trace that do not appear in the 'dmesg' trace?
Created attachment 26562 [details] full sysrq triggered trace
It still doesn't work with that patch. After the failure, dmesg looks similar to the one in attachment 26552 [details]. I've attached full output of the sysrq-triggered trace (with the patch applied).
Have you tried turning on spinlock debugging? The only thing I can see that could cause kswapd to hang inside that loop is if something else is leaking the inode->i_lock.
Created attachment 26564 [details] full trace with kswapd in D state
I'll test with spinlock debugging. Meanwhile, I have another full trace - this time a while after the failure, with kswapd in D state.
Created attachment 26576 [details] sysrq-triggered trace with CONFIG_DEBUG_SPINLOCK (and _SLEEP)=y
There were no spinlock-specific printk()s in dmesg.
Is there anything else I can do to support debugging?
Created attachment 27087 [details] trace from 2.6.34.1
I reproduced this bug on 2.6.32.16 and 2.6.34.1. Trond, is there anything you can do with this?
Created attachment 27088 [details] sysrq-triggered full trace from 2.6.34.1
So this is with a udp mount? Do you also reproduce it when you use tcp?
It's a UDP mount, but I tried to reproduce with TCP and got the same result.
Created attachment 27106 [details] NFS: Don't let kswapd call nfs_wb_page()
OK, given the above traces, I'm starting to believe that we need a specific exclusion for kswapd in nfs_release_page(). Can you try the following patch?
That seems to have fixed it. No crashes after a few hours of testing.
That patch definitely fixes this problem. Thanks! We haven't had a single NFS-related crash since we applied it. Could you please push it to 2.6.32.x and mainline? :)
Created attachment 27235 [details] NFS: kswapd must not block in nfs_release_page (attempt 2)
I've tried to make the patch a little less intrusive, and only target kswapd. I found out how to reproduce the hang on my own system, and have confirmed that the patch fixes the issue there. Can you please try it on your setup?
Created attachment 27246 [details] don't let kswapd call nfs_wb_page - 2.6.32.16
That also works. BTW: with 2.6.32.16, only a subset of that diff is required. If this is going to be the final patch, please let me know and I will update it in the Debian bug tracker.
The commit in Linus's tree is b608b283a962caaa280756bc8563016a71712acf (NFS: kswapd must not block in nfs_release_page). I have also CCed stable@kernel.org. I'm therefore assuming this bug can be closed.
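For readers following the fix, the idea in the patch title ("kswapd must not block in nfs_release_page") can be illustrated with a small userspace model. This is only a sketch under assumptions, not the kernel code from the commit: release_ctx, flush_mode and choose_flush_mode are hypothetical names invented for this illustration. It shows the decision being made: when the caller is kswapd, any flush is started without waiting, since the RPC work needed to complete it may itself be waiting for kswapd to free memory.

```c
#include <stdbool.h>

/* Hypothetical flush modes for this model; not the kernel's definitions. */
enum flush_mode {
    FLUSH_NONBLOCKING = 0, /* start the flush, do not wait for completion */
    FLUSH_BLOCKING    = 1  /* wait until the server acknowledges the data */
};

/* Hypothetical description of who is asking to release the page. */
struct release_ctx {
    bool may_do_io;  /* the allocation context permits starting I/O */
    bool is_kswapd;  /* the caller is the page reclaim daemon */
};

/*
 * Decide how a page may be flushed while it is being released.
 * Returns -1 when no I/O may be started at all, otherwise a flush_mode.
 */
int choose_flush_mode(const struct release_ctx *ctx)
{
    if (!ctx->may_do_io)
        return -1;

    /*
     * kswapd must never sleep waiting for an RPC to complete: the tasks
     * that would complete it may be blocked on memory that only kswapd
     * can free, which is the deadlock described in this report.
     */
    if (ctx->is_kswapd)
        return FLUSH_NONBLOCKING;

    return FLUSH_BLOCKING;
}
```

In this model an ordinary task that is allowed to do I/O still waits for the flush, so only the reclaim daemon's behaviour changes, which matches the stated goal of the second patch attempt (targeting only kswapd).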
Trond, can you share your test case for reproducing the problem? We run into the issue on our RH-based system, but we need 4-6 hours to reproduce it. A quicker reproducer would shorten our turnaround time considerably. Thanks in advance, RP
I believe that I was doing iozone -t 32 -B -r 128k -s 512m -i0 -i1 to trigger heavy mmapped writes.