Bug 12913
Summary: | BUG on fs/nfs/write.c:252 ! | ||
---|---|---|---|
Product: | File System | Reporter: | Rich Ercolani (rercola) |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.29-0.rc8.git2.fc11 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | mm: close page_mkwrite races; NFS: Close page_mkwrite() races | ||
Description
Rich Ercolani
2009-03-21 13:44:04 UTC
Tried Nick Piggin's patch to fix this race condition (the "try 3" variant - I'd link the thread, but -fsdevel appears to be temporarily down). It took a lot longer, but it triggered. I'm going to try his "incremental patch", though I don't think it'll fix this (based on my understanding of its intent). Is there any reasonable way I can instrument this and help debug it?

This bug should now be fixed in mainline by the two commits
b827e496c893de0c0f142abfaeb8730a2fd6b37f: mm: close page_mkwrite races
and
7fdf523067666b0eaff330f362401ee50ce187c4: NFS: Close page_mkwrite() races

Cheers
Trond

Created attachment 21235
mm: close page_mkwrite races
Change page_mkwrite to allow implementations to return with the page
locked, and also change its callers (in page fault paths) to hold the
lock until the page is marked dirty. This allows the filesystem to have
full control of page dirtying events coming from the VM.
Rather than simply holding the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow implementations to return
with it locked, so filesystems can avoid lock order reversal (LOR) conditions
with the page lock.
The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its ->page_mkwrite. It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).
In this window, the VM could write out the page, clearing page-dirty. The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.
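The interleaving described above can be replayed in a small userspace model. This is only a sketch: `struct toy_page` and the `toy_*` functions are hypothetical stand-ins invented for illustration, not kernel code, and the three steps are run sequentially to mimic one unlucky schedule.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are not the kernel's real types or functions. */
struct toy_page {
    bool dirty;        /* page dirty flag */
    bool pte_dirty;    /* dirty bit of the pte mapping the page */
    bool has_metadata; /* filesystem state that must exist while the page is dirty */
};

/* Old convention: set up the metadata, then return with the page unlocked. */
void toy_old_mkwrite(struct toy_page *page)
{
    page->has_metadata = true;
}

/* Writeout cleans the page; the filesystem then drops its per-page metadata. */
void toy_writeout(struct toy_page *page)
{
    assert(page->dirty); /* writeout is only started for a dirty page */
    page->dirty = false;
    page->has_metadata = false;
}

/* Remainder of the fault path: attach the dirty pte, then dirty the page. */
void toy_finish_fault(struct toy_page *page)
{
    page->pte_dirty = true;
    page->dirty = true;
}

/* Replays the bad schedule; true means the page is dirty without metadata. */
bool toy_replay_race(void)
{
    struct toy_page page = { .dirty = true, .pte_dirty = false, .has_metadata = false };

    toy_old_mkwrite(&page);  /* metadata attached, page left unlocked */
    toy_writeout(&page);     /* runs inside the unprotected window */
    toy_finish_fault(&page); /* page dirtied again, metadata already gone */

    return page.dirty && !page.has_metadata;
}
```

In this model `toy_replay_race()` returns true: the page ends up dirty while the filesystem believes it is clean, which is the inconsistent state the patch is meant to rule out.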
It is not always possible to perform the required metadata manipulation in
->set_page_dirty, because that function cannot block or fail. The
filesystem may need to allocate some data structure, for example.
And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.
This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window. This provides the filesystem with a strong
synchronisation against the VM here.
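A rough userspace sketch of the caller-side protocol may help make the three operations concrete. Everything here is an assumption made for illustration: `struct model_page`, the `MODEL_FAULT_*` values, and the `model_*` functions are invented names, and the boolean `locked` field only imitates the page lock rather than providing real synchronisation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes; the kernel uses VM_FAULT_* bit flags instead. */
enum model_fault { MODEL_FAULT_OK, MODEL_FAULT_LOCKED, MODEL_FAULT_SIGBUS };

struct model_page {
    bool locked;    /* stands in for the page lock */
    bool dirty;     /* stands in for the page dirty flag */
    bool pte_dirty; /* stands in for the dirty bit in the pte */
};

typedef enum model_fault (*model_mkwrite_fn)(struct model_page *page);

/* Implementation that returns with the page locked. */
enum model_fault model_mkwrite_locked(struct model_page *page)
{
    page->locked = true;
    return MODEL_FAULT_LOCKED;
}

/* Implementation that fails, for example because an allocation failed. */
enum model_fault model_mkwrite_fail(struct model_page *page)
{
    (void)page;
    return MODEL_FAULT_SIGBUS;
}

/* Writeback can only start cleaning a dirty page whose lock it can take. */
bool model_try_writeback(struct model_page *page)
{
    if (page->locked || !page->dirty)
        return false;
    page->dirty = false;
    return true;
}

/* Write-fault path: mkwrite, pte dirty, page dirty, all under the page lock. */
enum model_fault model_write_fault(struct model_page *page, model_mkwrite_fn mkwrite)
{
    enum model_fault ret;

    assert(!page->locked);   /* mkwrite is entered with the page unlocked */
    ret = mkwrite(page);
    if (ret == MODEL_FAULT_SIGBUS)
        return ret;          /* failure: the pte is never made dirty */
    if (ret != MODEL_FAULT_LOCKED)
        page->locked = true; /* implementation returned unlocked: lock it here */

    page->pte_dirty = true;
    page->dirty = true;
    page->locked = false;    /* only now may writeback look at the page */
    return MODEL_FAULT_OK;
}
```

Because `model_try_writeback()` refuses a locked page, no cleaning can begin between the return from the mkwrite callback and the point where the page is marked dirty, which is the window the commit message describes.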
Created attachment 21236
NFS: Close page_mkwrite() races
Follow up to Nick Piggin's patches to ensure that nfs_vm_page_mkwrite
returns with the page lock held, and sets the VM_FAULT_LOCKED flag.
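The filesystem side of that contract might be modelled as follows. This is a toy illustration, not the actual patch: `struct nfs_model_page`, `NFS_MODEL_*`, and `nfs_model_mkwrite()` are hypothetical names, and the `in_mapping` check is an assumed example of a condition under which the callback has to fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for VM_FAULT_LOCKED and VM_FAULT_SIGBUS. */
enum nfs_model_fault { NFS_MODEL_LOCKED, NFS_MODEL_SIGBUS };

struct nfs_model_page {
    bool locked;      /* stands in for the page lock */
    bool in_mapping;  /* false if the page no longer belongs to the file */
    bool has_request; /* per-page write request kept while the page is dirty */
};

enum nfs_model_fault nfs_model_mkwrite(struct nfs_model_page *page)
{
    assert(!page->locked);    /* entered with the page unlocked */
    page->locked = true;

    if (!page->in_mapping) {
        page->locked = false; /* failure: drop the lock before returning */
        return NFS_MODEL_SIGBUS;
    }

    page->has_request = true; /* metadata set up under the page lock */
    return NFS_MODEL_LOCKED;  /* success: lock stays held for the caller */
}
```

On success the caller receives the page still locked together with the "locked" return code, so it can mark the pte and the page dirty before anyone else gets to write the page out.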
Marking this bug as CLOSED. Please reopen if the problem reoccurs.