Bug 13621
Summary: | xfs hangs with assertion failed | ||
---|---|---|---|
Product: | File System | Reporter: | Johannes Engel (jcnengel) |
Component: | XFS | Assignee: | Christoph Hellwig (hch) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | charles, hch, rjw, sandeen |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.30 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 13070 | ||
Attachments: |
kernel log of the xfs failure
git bisect log
kernel configuration
kernel configuration
kernel log of the xfs failure from 2.6.29-rc5 (after reverting the critical patch)
Patch to fix spinlock
Description
Johannes Engel
2009-06-25 10:07:11 UTC
Created attachment 22092 [details]
kernel log of the xfs failure
From Christoph on the mailing list:
> I have no idea how this could trigger correctly. If you look at
> the callers of xlog_state_want_sync all of them take the log
> less than five lines from the call and never have a branch before
> taking the log and calling the function.
Also note that the machine in question had only 1 CPU.
Stack corruption perhaps? Is this a 4kstacks kernel?
"but extensive r/w (rsync) or attempting to unmount the device makes the kernel hang in the following sense:"
Do you mean that it hangs on every unmount? If so, maybe I could lazily ask if you'd be willing to bisect it, if it looks like a regression. :)
This is not a 4kstacks kernel. I don't know when I will find the time to bisect; right now I am quite busy. But I will keep that in mind.
(In reply to comment #3)
> Do you mean that it hangs on every unmount?
Yes, it does.
Created attachment 22312 [details]
git bisect log
I ran a bisect restricting myself to the folder fs/xfs only. That spits out the culprit as commit d2859751cd0bf586941ffa7308635a293f943c17 (see attachment).
If that cannot be the core issue for reasons I do not see, someone might have to run a full bisect for which I do not have enough time at the moment.
Thanks for running the bisect, we'll take a look at that commit. Could you add your .config as well, just in case there's something unique there? I don't immediately see how that commit would affect this problem, but that's just from a quick look.
Thanks, -Eric
Created attachment 22316 [details]
kernel configuration
Of course. Thanks for looking into this. :)
Seems that kernel config was from 2.6.28, do you have the 2.6.30 version?
Thanks, -eric
Created attachment 22387 [details]
kernel configuration
Here is one I compiled 2.6.31-rc3 with. Is that ok?
That depends, does the resulting built kernel still hang this way? :)
Still there with 2.6.31-rc5.
d2859751cd0bf586941ffa7308635a293f943c17 actually got backed out not much later. Please try revision 3a011a171906a3a51a43bb860fb7c66a64cab140, which is the commit reverting it.
This revert should be part of 2.6.30-rc1 and thus of 2.6.31-rc5, shouldn't it?
In this case the reverts obviously confused the git bisect results. So trying the first version with that patch reverted will get us another possible anchor to look for the offending commit.
Created attachment 22632 [details]
kernel log of the xfs failure from 2.6.29-rc5 (after reverting the critical patch)
I tried commit 3a011a171906a3a51a43bb860fb7c66a64cab140 and still the umount does not work. Nonetheless, the error seems to be a bit different...
Are you running a uni-processor kernel maybe? From looking around at the implementations I have the fear that spin_is_locked doesn't work correctly on uni-processor kernels with CONFIG_PREEMPT, although I can't find any definitive documentation on it. Try replacing the ASSERT(spin_is_locked(&log->l_icloglock)); that tripped for you with an assert_spin_locked(&log->l_icloglock);
Alternatively, try just reverting commit 39e2defe73106ca2e1c85e5286038a0a13f49513, which shouldn't cause a problem but introduced this spin_is_locked assert, interestingly the only non-negated one in XFS and one of very few all over the tree.
Created attachment 22659 [details]
Patch to fix spinlock
Indeed, I am running a uniprocessor machine. Thanks for the hint, Christoph. The attached patch fixes the issue for me.
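For readers following along, here is a small userspace model (not kernel code, and not the attached patch) of the behaviour Christoph suspects. It assumes that on a uniprocessor build the lock keeps no state, so a spin_is_locked-style query reports "not locked" even while the lock is held, whereas an assert_spin_locked-style check compiles to a no-op there. All `model_*` and `MODEL_*` names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock; only the modelled SMP build records its state. */
struct model_lock { bool held; };

enum model_build { MODEL_SMP, MODEL_UP };
enum model_check { MODEL_RAW_IS_LOCKED, MODEL_ASSERT_SPIN_LOCKED };

static void model_lock_acquire(enum model_build b, struct model_lock *l)
{
    if (b == MODEL_SMP)
        l->held = true;   /* assumed UP behaviour: nothing is recorded */
}

static void model_lock_release(enum model_build b, struct model_lock *l)
{
    if (b == MODEL_SMP)
        l->held = false;
}

/* Stand-in for spin_is_locked(): assumed to be constant false on UP. */
static bool model_is_locked(enum model_build b, const struct model_lock *l)
{
    return b == MODEL_SMP ? l->held : false;
}

/* Stand-in for assert_spin_locked(): a real test on SMP, a no-op on UP. */
static bool model_assert_locked_ok(enum model_build b, const struct model_lock *l)
{
    return b == MODEL_SMP ? model_is_locked(b, l) : true;
}

/* Take the lock, evaluate the chosen check while holding it, then unlock.
 * Returns true if the modelled assertion would pass. */
bool model_check_passes(enum model_build b, enum model_check c)
{
    struct model_lock l = { false };
    bool ok;

    model_lock_acquire(b, &l);
    ok = (c == MODEL_RAW_IS_LOCKED) ? model_is_locked(b, &l)
                                    : model_assert_locked_ok(b, &l);
    model_lock_release(b, &l);
    return ok;
}

/* Runs all four combinations; only the raw check on the UP model fails. */
void model_self_check(void)
{
    assert(model_check_passes(MODEL_SMP, MODEL_RAW_IS_LOCKED));
    assert(model_check_passes(MODEL_SMP, MODEL_ASSERT_SPIN_LOCKED));
    assert(!model_check_passes(MODEL_UP, MODEL_RAW_IS_LOCKED));
    assert(model_check_passes(MODEL_UP, MODEL_ASSERT_SPIN_LOCKED));
}
```

Under these assumptions the caller really does hold the lock in every case, yet the raw check still fails on the uniprocessor model, which would match an assertion tripping only on a 1-CPU machine.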
Handled-By : Christoph Hellwig <hch@lst.de>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=22659
Fixed by commit a8914f3a6d72c97328597a556a99daaf5cc288ae.