Bug 9003
| Summary: | oops in nfs4_cb_recall... | | |
|---|---|---|---|
| Product: | File System | Reporter: | Daniel J Blueman (daniel.blueman) |
| Component: | NFS | Assignee: | bfields |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | akpm, bfields, daniel.blueman, michal.k.k.piotrowski, protasnb |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.23-rc4 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
| Attachments: | workaround patch, probable fix, additional fix | | |
Description
Daniel J Blueman
2007-09-10 07:35:42 UTC
Michal, this might be a post-2.6.23 regression. Could you test the latest -rc? There were a few NFS fixes in -rc5.
Created attachment 12797: workaround patch
This isn't a fix, but if it eliminates the symptoms then it would help confirm that the problem is what I think it is.
Looks to me like a symptom of buggy locking and reference counting in the NFSv4 state code. For example, we're reference counting the structure representing the delegation, but not the structure that contains the rpc_client used to make the callbacks. This doesn't appear to be a regression.
Created attachment 12809: probable fix
This is closer to a real fix. If you can confirm it solves the original problem, that'd be great.
This fix is for a couple races that have existed for a while, so (assuming it's the reported problem) the bug wasn't a regression.
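To illustrate the kind of race described above, here is a minimal user-space sketch in C. It is not the attached patch and not kernel code: `struct client`, `struct delegation`, `recall_begin`, `recall_finish` and `run_recall_demo` are all hypothetical names, and C11 atomics stand in for the kernel's reference counters. It only shows the idea that a recall in flight should pin both the delegation and the structure holding the callback channel, so that the client being torn down mid-recall cannot free the channel underneath it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-client state that owns the callback channel. */
struct client {
    atomic_int refcount;
    int cb_channel;            /* stands in for the rpc_client used for callbacks */
};

/* Hypothetical stand-in for a delegation; it only points at its client. */
struct delegation {
    atomic_int refcount;
    struct client *clp;
};

static int clients_freed;      /* lets the demo observe when the client is freed */

static struct client *client_alloc(int channel)
{
    struct client *clp = malloc(sizeof(*clp));
    if (clp) {
        atomic_init(&clp->refcount, 1);   /* owner's reference */
        clp->cb_channel = channel;
    }
    return clp;
}

static void client_put(struct client *clp)
{
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref. */
    if (atomic_fetch_sub(&clp->refcount, 1) == 1) {
        clients_freed++;
        free(clp);
    }
}

static struct delegation *delegation_alloc(struct client *clp)
{
    struct delegation *dp = malloc(sizeof(*dp));
    if (dp) {
        atomic_init(&dp->refcount, 1);    /* owner's reference */
        dp->clp = clp;
    }
    return dp;
}

static void delegation_put(struct delegation *dp)
{
    if (atomic_fetch_sub(&dp->refcount, 1) == 1)
        free(dp);
}

/* Start a recall: pin the delegation AND the client it points to. */
static void recall_begin(struct delegation *dp)
{
    atomic_fetch_add(&dp->refcount, 1);
    atomic_fetch_add(&dp->clp->refcount, 1);
}

/* Finish a recall: use the channel, then drop both references. */
static int recall_finish(struct delegation *dp)
{
    struct client *clp = dp->clp;
    int channel = clp->cb_channel;   /* safe: the recall still holds a client ref */
    delegation_put(dp);
    client_put(clp);
    return channel;
}

static int run_recall_demo(void)
{
    clients_freed = 0;
    struct client *clp = client_alloc(42);
    if (!clp)
        return -1;
    struct delegation *dp = delegation_alloc(clp);
    if (!dp) {
        client_put(clp);
        return -1;
    }

    recall_begin(dp);
    /* Simulate the client going away while the recall is still in flight. */
    delegation_put(dp);
    client_put(clp);
    assert(clients_freed == 0);      /* the recall's reference keeps it alive */

    int channel = recall_finish(dp);
    assert(clients_freed == 1);      /* freed only once the recall is done */
    return channel;
}
```

If `recall_begin` took only the delegation reference, the simulated teardown would free the client first and `recall_finish` would read freed memory, which is the shape of failure the comment above describes.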
Thanks for the patch, Bruce. I have been running 2.6.23-rc6 for 1.5 days without problems so far and will start testing this latest patch in the next day or so; however, it will take more than ~2 weeks to be confident it addresses the issue, as it's not so easy to hit. I found that I needed the 'clientaddr' mount option to get delegations working correctly, so perhaps this shows there is less exposure than expected/desired?
Any news?
I've queued up this patch for 2.6.24 (actually, the same change broken up into 3 separate patches), currently in git://linux-nfs.org/~bfields/linux.git nfs-server-stable.
No news - I didn't reproduce the bug in 2.6.23-rc7 after ~2 weeks, and I will be away from this system for a month, so I can't do further testing.
Created attachment 13765: additional fix
I believe that this fix (on top of 2.6.24-rc3, which also contains the previous patch) should fix another race that could cause the same oops.
Incidentally, I've been tracking upstream -rc kernel versions, and haven't seen any symptom of this bug at all with my usual multi-system workload, so it would seem less acute than before.
OK, we can close this bug now. Please reopen if the problem appears again.