Bug 12610

Summary: sync-Regression in 2.6.28.2?
Product: Other Reporter: Rafael J. Wysocki (rjw)
Component: Other Assignee: other_other
Status: CLOSED CODE_FIX    
Severity: normal CC: tytso
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.28.2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 12398    

Description Rafael J. Wysocki 2009-02-01 15:35:32 UTC
Subject    : sync-Regression in 2.6.28.2?
Submitter  : Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Date       : 2009-01-27 9:35
References : http://marc.info/?l=linux-kernel&m=123304977706620&w=4
Notify-Also : Federico Cuello <fedux@lugmen.org.ar>

This entry is being used for tracking a regression from 2.6.28.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Ralf Hildebrandt 2009-02-04 06:11:50 UTC
Right now 2.6.28.3 doesn't seem to exhibit the same behaviour. I'm keeping an eye on this.
Comment 2 Federico Cuello 2009-02-04 15:15:22 UTC
It's still happening to me in 2.6.28.3.
Comment 3 Rafael J. Wysocki 2009-02-15 06:45:07 UTC
On Sunday 15 February 2009, Ralf Hildebrandt wrote:
> * Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>:
> > > This patch seems good to me. If you would care to add a changelog and
> > > Signed-off-by: line, then we could get it merged?
> > > 
> > > I am not too sure about this bug. I have reproduced a strange hang with
> > > ext4 (which does include sys_sync and write_cache_pages traces), and
> > > also turned up a lockdep report. Also, we haven't seen any reports of
> > > this problem on other filesystems. So it could be an ext4 bug.
> > > 
> > > Your traces also have lots of tasks hung waiting for page lock. It is
> > > possible that wakeups get lost, which is fixed by this commit in
> > > mainline
> > > 777c6c5f1f6e757ae49ecca2ed72d6b1f523c007
> > > 
> > > Which might also be your bug.
> > > 
> > > 
> > > Any chance you can test this patch (as well as the existing patches
> > > you are using to fix write_cache_pages?).
> > 
> > I could test 2.6.28.4
> 
> Still there in 2.6.28.5 :(
Comment 4 Theodore Tso 2009-02-21 15:48:20 UTC
The fix for this has landed in mainline post 2.6.29-rc5, as commit
2acf2c.  The deadlock is technically not a regression, but it was made
*much* more likely to show up because of commit 31a1266 ("mm:
write_cache_pages cyclic fix"), which showed up in 2.6.28.1.

Commit 3a4c68 in mainline backs out the change made in 31a1266, so you
probably won't see this much after 2.6.28.6 (when 3a4c68 was
backported to 2.6.28.y), but we should get commit 2acf2c pushed to
2.6.28.x and 2.6.27.y to completely solve the deadlock problem.

                                                - Ted
Comment 5 Rafael J. Wysocki 2009-02-22 02:03:00 UTC
Closing, since it's fixed in the mainline.