Bug 12579
Summary: | ext4 filesystem hang | ||
---|---|---|---|
Product: | File System | Reporter: | Eric Sandeen (sandeen) |
Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | dev, ole |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.29-rc4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
Tested patch
updated patch w/ my signoffs |
Description
Eric Sandeen
2009-01-29 21:03:40 UTC
Finally had some time, and I've tracked this down to the page-writeback.c changes since .28 - backing them all out the testcase runs overnight. Still working out which one and why. I suppose it's still possible it's an ext4 bug but if so the page-writeback.c changes exposed it. It's a classic deadlock; I usually have 2 threads stuck, pdflush vs. the livecd creator doing an fsync. Each is waiting for a page the other has locked. -Eric Two commits which are already in the mainline which fixed some changes after .28 are 89e1219004b3657cc014521663eeef0744f1c99d dcf6a79dda5cc2a2bec183e50d829030c0972aaa Both changes have implication wrt ext4. I would suggest try with the above two commits and see if the problem still exist. -aneesh Aneesh, thanks. I tried both of those last night, and the problem persists. >> Finally had some time, and I've tracked this down to the >> page-writeback.c changes since .28 - backing them all out the >> testcase runs overnight. Still working out which one and why. I >> suppose it's still possible it's an ext4 bug but if so the >> page-writeback.c changes exposed it. >> >> It's a classic deadlock; I usually have 2 threads stuck, pdflush >> vs. the livecd creator doing an fsync. Each is waiting for a page >> the other has locked. > > Two commits which are already in the mainline which fixed some changes > after .28 are > > 89e1219004b3657cc014521663eeef0744f1c99d > dcf6a79dda5cc2a2bec183e50d829030c0972aaa > I read Eric's email as saying he tracked it down to the page-writeback.c changes *after* 2.6.28. Eric, is that what you meant? - Ted Yep. the 2 patches Aneesh refers to are just hours hold, so they were fixes after I originally saw the hang and worth testing, but they don't seem to help. I'm still trying some bisection of the changes (unfortunately many intermediate points seem nonfunctional) and may have finally found a spot; if this doesn't yield results though it's time for another crash-course in writeback. ;) -Eric dJust for the record; here are the paths we're stuck on: livecd: sys_fdatasync do_fsync vfs_fsync filemap_fdatawrite __filemap_fdatawrite_range do_writepage ext4_da_writepages write_cache_pages lock page (currently locked by pdflush) pdflush: pdflush background_writeout writeback_inodes generic_sync_sb_inodes __writeback_single_inode do_writepages ext4_da_writepages write_cache_pages lock page (currently loscked by livecd) the pdflush wbc looks like: [1]kdb> wbc 0xffff88007d4abe10 struct writeback_control at 0xffff88007d4abe10 bdi = (null) sync_mode 0 (NONE) nr_to_write = 992 pages_skipped = 0 range_start 0, range_end 0 nonblocking 1 congestion 0 kupdate 0 reclaim 0 writepages 1 range_cyclic 1 more_io 0 no_update 1 and the livecd-creator fsync wbc looks like: [1]kdb> wbc 0xffff880012821ea8 struct writeback_control at 0xffff880012821ea8 bdi = (null) sync_mode 1 (ALL) nr_to_write = 9223372036854775689 pages_skipped = 0 range_start 0, range_end 9223372036854775807 nonblocking 0 congestion 0 kupdate 0 reclaim 0 writepages 1 range_cyclic 0 more_io 0 no_update 1 How about the below patch ? From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Subject: [PATCH] ext4: Don't use the range_cylic mode implemented by write_cache_pages With delayed allocation we lock the page in write_cache_pages and try to build an in memory extent of contiguous blocks. This is needed so that we can get large contiguous blocks request. Now with range_cyclic mode in write_cache_pages if we have not done an I/O we loop back to 0 index and try to write the page. That would imply we will attempt to take page lock of lower index page holding the page lock of higher index page. This can cause a dead lock with other writeback thread. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> --- fs/ext4/inode.c | 20 ++++++++++++++++++-- 1 files changed, 18 insertions(+), 2 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 61e8fc0..f743524 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2437,6 +2437,7 @@ static int ext4_da_writepages(struct address_space *mapping, int no_nrwrite_index_update; int pages_written = 0; long pages_skipped; + int range_cyclic = 0, cycled = 1, io_done = 0; int needed_blocks, ret = 0, nr_to_writebump = 0; struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb); @@ -2488,9 +2489,14 @@ static int ext4_da_writepages(struct address_space *mapping, if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) range_whole = 1; - if (wbc->range_cyclic) + if (wbc->range_cyclic) { index = mapping->writeback_index; - else + wbc->range_start = index << PAGE_CACHE_SHIFT; + wbc->range_end = LLONG_MAX; + wbc->range_cyclic = 0; + range_cyclic = 1; + cycled = 0; + } else index = wbc->range_start >> PAGE_CACHE_SHIFT; mpd.wbc = wbc; @@ -2504,6 +2510,7 @@ static int ext4_da_writepages(struct address_space *mapping, wbc->no_nrwrite_index_update = 1; pages_skipped = wbc->pages_skipped; +retry: while (!ret && wbc->nr_to_write > 0) { /* @@ -2546,6 +2553,7 @@ static int ext4_da_writepages(struct address_space *mapping, pages_written += mpd.pages_written; wbc->pages_skipped = pages_skipped; ret = 0; + io_done = 1; } else if (wbc->nr_to_write) /* * There is no more writeout needed @@ -2554,6 +2562,13 @@ static int ext4_da_writepages(struct address_space *mapping, */ break; } + if (!io_done && !cycled) { + cycled = 1; + index = 0; + wbc->range_start = index << PAGE_CACHE_SHIFT; + wbc->range_end = mapping->writeback_index - 1; + goto retry; + } if (pages_skipped != wbc->pages_skipped) printk(KERN_EMERG "This should not happen leaving %s " "with nr_to_write = %ld ret = %d\n", @@ -2561,6 +2576,7 @@ static int ext4_da_writepages(struct address_space *mapping, /* Update index */ index += pages_written; + wbc->range_cyclic = range_cyclic; if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0)) /* * set the writeback_index so that range_cyclic Created attachment 20238 [details]
Tested patch
Patch from Aneesh, un-whitespace-mangled.
Ted, can you push this out? Works great. :) We might want to ask the other reporter of something similar (next-20090206: deadlock on ext4) to test it too. I'll ping him.
Created attachment 20239 [details]
updated patch w/ my signoffs
Updated the patch w/ my tested/reviewed by.
I'd like to think about whether we can make write_cache_pages smarter and not need to sort of recreate bits of it in the ext4 code, but for now this solves my problem, at least.
> Patch from Aneesh, un-whitespace-mangled.
>
> Ted, can you push this out? Works great. :) We might want to ask
> the other reporter of something similar (next-20090206: deadlock on
> ext4) to test it too. I'll ping him.
Do we completely understand the root cause, in terms of which commit
broken the mm/page-writeback.c code we were depending on? And if so,
what of the code in mm/page-writeback.c? Does anyone else use it?
Can anyone sanely use it?
And am I right in assuming that this only applies to 2.6.29-rcX
kernels, and is not needed for 2.6.28 or earlier kernels?
I hadn't yet pushed it out because I needed time to understand all of
these issues, hence these questions....
- Ted
Ted, all good questions. :) If you think we need to digest this a bit that's fine, though I'd rather this go in than not at all, before 2.6.29 gets out the door. But I think we still have some time. The problem has not been seen on 2.6.28 kernels; I can't right now point for sure to one of Nick's patch series that changed the behavior; in fact many points in the series seemed to not work at all; even building kernels on other filesystems got me hung up. Didn't inspire tons of confidence. :) You're probably right though, and echoing my concern in the previous comment, that we might need to look at this holistically and decide if this exact change to ext4 is the right path or if the page-writeback infrastructure needs a tweak... -Eric On Fri, Feb 13, 2009 at 02:06:06PM -0800, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=12579 > > > > > > ------- Comment #8 from sandeen@redhat.com 2009-02-13 14:06 ------- > Created an attachment (id=20238) > --> (http://bugzilla.kernel.org/attachment.cgi?id=20238&action=view) > Tested patch > > Patch from Aneesh, un-whitespace-mangled. > > Ted, can you push this out? Works great. :) We might want to ask the other > reporter of something similar (next-20090206: deadlock on ext4) to test it > too. > I'll ping him. > I think this small changes is also needed. diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f743524..c80e038 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2495,7 +2495,10 @@ static int ext4_da_writepages(struct address_space *mapping, wbc->range_end = LLONG_MAX; wbc->range_cyclic = 0; range_cyclic = 1; - cycled = 0; + if (index == 0) + cycled = 1; + else + cycled = 0; } else index = wbc->range_start >> PAGE_CACHE_SHIFT; On Fri, Feb 13, 2009 at 08:50:18PM -0500, Theodore Tso wrote: > > Patch from Aneesh, un-whitespace-mangled. > > > > Ted, can you push this out? Works great. :) We might want to ask > > the other reporter of something similar (next-20090206: deadlock on > > ext4) to test it too. I'll ping him. > > Do we completely understand the root cause, in terms of which commit > broken the mm/page-writeback.c code we were depending on? And if so, > what of the code in mm/page-writeback.c? Does anyone else use it? > Can anyone sanely use it? AFAIU we need the changes even for older kernels. The reasoning is, with delayed allocation we cannot allow to retry with lower page index in write_cache_pages. We do retry even in older version of kernel. What made it so easy to reproduce it on later kernels is that we were doing a retry even if nr_to_write was zero. This got fixed on mainline by 3a4c6800f31ea8395628af5e7e490270ee5d0585. So with that change we are logically back to 2.6.28 state, But still the possibility of deadlock remain. > > And am I right in assuming that this only applies to 2.6.29-rcX > kernels, and is not needed for 2.6.28 or earlier kernels? I guess the hang can happen on 2.6.28 or earlier kernels. > > I hadn't yet pushed it out because I needed time to understand all of > these issues, hence these questions.... -aneesh On Sat, Feb 14, 2009 at 02:10:04PM +0530, Aneesh Kumar K.V wrote:
> On Fri, Feb 13, 2009 at 08:50:18PM -0500, Theodore Tso wrote:
> > > Patch from Aneesh, un-whitespace-mangled.
> > >
> > > Ted, can you push this out? Works great. :) We might want to ask
> > > the other reporter of something similar (next-20090206: deadlock on
> > > ext4) to test it too. I'll ping him.
> >
> > Do we completely understand the root cause, in terms of which commit
> > broken the mm/page-writeback.c code we were depending on? And if so,
> > what of the code in mm/page-writeback.c? Does anyone else use it?
> > Can anyone sanely use it?
>
> AFAIU we need the changes even for older kernels. The
> reasoning is, with delayed allocation we cannot allow to retry with lower
> page index in write_cache_pages. We do retry even in older version of
> kernel. What made it so easy to reproduce it on later kernels is that
> we were doing a retry even if nr_to_write was zero. This got fixed on
> mainline by 3a4c6800f31ea8395628af5e7e490270ee5d0585. So with that
> change we are logically back to 2.6.28 state, But still the possibility
> of deadlock remain.
>
I found commit 31a12666d8f0c22235297e1c1575f82061480029 to be the root
cause. The commit is correct in what it does. Ext4 was dependent on the
wrong behaviour. The relevant change is
@@ -897,7 +903,6 @@ retry:
min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
unsigned i;
- scanned = 1;
for (i = 0; i < nr_pages; i++) {
I think that caused us the retry. That would imply we may not need the
patch I did for 2.6.28. But given that Ext4 was dependent on the wrong
behaviour of write_cache_pages i would suggest we still push the patch
to 2.6.28
-aneesh
There is another bug reported about the deadlock ( http://bugzilla.kernel.org/show_bug.cgi?id=12610 ) happening on 2.6.28.2 and later. This patch has been merged into mainline. I will work on getting this pushed into a stable kernel. I sent mail a few hours ago about this bug occurring on a 2.6.27.y kernel. The patch which fixed the bug on mainline also fixes 2.6.27.y, but it depends on several other patches being merged in before it (mostly other ext4 fixes). |