Bug 15426 - Running many copies of bonnie++ on different filesystems seems to deadlock in sync
Summary: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: VFS
Hardware: All Linux
Importance: P1 normal
Assignee: fs_vfs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-02 16:31 UTC by John Hughes
Modified: 2012-07-05 15:39 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.32
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description John Hughes 2010-03-02 16:31:01 UTC
I create 8 ext3 filesystems on 8 disks and run a copy of bonnie++ on each filesystem.  All but two or three of the Bonnie++ processes seem to hang in sync and I get log messages like:

[ 1557.377626] INFO: task bonnie:2503 blocked for more than 120 seconds.
[ 1557.377693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1557.377775] bonnie        D 0000000000000000     0  2503   2489 0x00000000
[ 1557.377778]  ffffffff8144b1f0 0000000000000082 0000000000000000 ffff88034ac155c0
[ 1557.377780]  ffff88033c079880 ffffffff81047f43 000000000000f8a0 ffff88063b1e9fd8
[ 1557.377783]  00000000000155c0 00000000000155c0 ffff88063bbe3f90 ffff88063bbe4288
[ 1557.377785] Call Trace:
[ 1557.377787]  [<ffffffff81047f43>] ? finish_task_switch+0x3a/0xa7
[ 1557.377790]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
[ 1557.377792]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
[ 1557.377794]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
[ 1557.377796]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
[ 1557.377799]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
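
For reference, the sys_sync() and sync_filesystems() frames in the trace above correspond roughly to this path in 2.6.32. The following is a simplified paraphrase of fs/sync.c, not the verbatim source:

	/* Simplified paraphrase of the 2.6.32 sync path (not verbatim). */
	SYSCALL_DEFINE0(sync)
	{
		wakeup_flusher_threads(0);	/* kick per-bdi flusher threads */
		sync_filesystems(0);		/* pass 1: start async writeout */
		sync_filesystems(1);		/* pass 2: write out and wait */
		return 0;
	}

	static void sync_filesystems(int wait)
	{
		struct super_block *sb;
		static DEFINE_MUTEX(mutex);	/* shared by every sync(2) caller */

		mutex_lock(&mutex);	/* all concurrent sync() calls queue here */
		list_for_each_entry(sb, &super_blocks, s_list)
			if (sb->s_root && !(sb->s_flags & MS_RDONLY))
				__sync_filesystem(sb, wait);	/* one fs at a time */
		mutex_unlock(&mutex);
	}

The static mutex explains the shape of the hang: only one sync(2) caller makes progress at a time, and every other bonnie++ process sits in mutex_lock() until the writeout of all filesystems ahead of it completes.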
Comment 1 John Hughes 2010-03-02 16:32:39 UTC
(Debian bug 571242)
Comment 2 Andrew Morton 2010-03-03 00:17:52 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 2 Mar 2010 16:31:03 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=15426
> 
>            Summary: Running many copies of bonnie++ on different
>                     filesystems seems to deadlock in sync
>            Product: File System
>            Version: 2.5
>     Kernel Version: 2.6.32
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: VFS
>         AssignedTo: fs_vfs@kernel-bugs.osdl.org
>         ReportedBy: john@calva.com
>         Regression: No
> 
> 
> I create 8 ext3 filesystems on 8 disks and run a copy of bonnie++ on each
> filesystem.  All but two or three of the Bonnie++ processes seem to hang in
> sync and I get log messages like:
> 
> [... hung task message and bonnie stack trace snipped; identical to the
> description above ...]

When the system is stuck, please run

	echo w > /proc/sysrq-trigger

or, equivalently, hit `alt-sysrq-w'.

This will emit huge amounts of stack traces into the system logs and
onto the console.  There's no need to send all of them - please
identify typical-looking ones and then send them in reply to this
email.  Email clients like to wordwrap these traces - please try to
prevent that.

Thanks.
Comment 3 John Hughes 2010-03-03 12:02:24 UTC
Jens Axboe wrote:
> Is IO still going on, or does it appear to be stuck? From the traces
> below, we have various procs caught in waiting for a request. So if
> things are totally stuck, it could be some race in there.
>   
I see I/O happening on three or four of the disks.

Just a thought.  What exactly is sync(2) supposed to do - block until 
there are no more dirty pages, or block until all pages that were dirty 
when the sync was done are clean?  In other words is the problem simply 
that pages are being dirtied faster than the sync is writing them out?
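
One way to test that hypothesis directly is to keep one thread dirtying pages while another thread times sync(2). The following is a hypothetical userspace sketch, not part of this report; file name and sizes are arbitrary:

	/* Hypothetical test: does sync(2) return while pages keep being dirtied?
	 * Build: gcc -O2 -pthread sync_test.c (-lrt on older glibc). */
	#include <fcntl.h>
	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	static volatile int stop;

	static void *dirtier(void *arg)	/* redirties a ~100MB working set forever */
	{
		int fd = open((char *)arg, O_CREAT | O_WRONLY, 0644);
		char buf[4096];

		memset(buf, 'x', sizeof(buf));
		while (fd >= 0 && !stop)
			if (pwrite(fd, buf, sizeof(buf),
				   (rand() % 25600) * 4096L) < 0)
				break;
		if (fd >= 0)
			close(fd);
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		struct timespec a, b;

		pthread_create(&t, NULL, dirtier, "dirty.dat");
		sleep(2);			/* let dirty pages pile up */
		clock_gettime(CLOCK_MONOTONIC, &a);
		sync();				/* how long does this block? */
		clock_gettime(CLOCK_MONOTONIC, &b);
		printf("sync() took %.1fs\n", (b.tv_sec - a.tv_sec) +
		       (b.tv_nsec - a.tv_nsec) / 1e9);
		stop = 1;
		pthread_join(t, NULL);
		return 0;
	}

If sync() waits for "no dirty pages" rather than "pages dirtied before the call", its runtime grows without bound under this load.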
Comment 4 John Hughes 2010-03-03 12:02:33 UTC
Andrew Morton wrote:
> When the system is stuck, please run
>
>       echo w > /proc/sysrq-trigger
>   
Ok, we have:

[ 1128.964403] kjournald     D 0000000000000000     0  2196      2 0x00000000
[ 1128.964407]  ffffffff8144b1f0 0000000000000046 0000000000000000 ffffffff811837bb
[ 1128.964409]  ffff88063d1762e0 ffff880340002c80 000000000000f8a0 ffff88063c6e9fd8
[ 1128.964412]  00000000000155c0 00000000000155c0 ffff88063d1762e0 ffff88063d1765d8
[ 1128.964415] Call Trace:
[ 1128.964423]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964427]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964431]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964433]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964437]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964441]  [<ffffffff8117b28d>] ? blk_recount_segments+0x17/0x27
[ 1128.964443]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964447]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964450]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964455]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964463]  [<ffffffffa017f5a2>] ? journal_commit_transaction+0x8d9/0xe2b [jbd]
[ 1128.964467]  [<ffffffffa01823fb>] ? kjournald+0xdf/0x226 [jbd]
[ 1128.964469]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964472]  [<ffffffffa018231c>] ? kjournald+0x0/0x226 [jbd]
[ 1128.964474]  [<ffffffff81064789>] ? kthread+0x79/0x81
[ 1128.964479]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 1128.964481]  [<ffffffff81064710>] ? kthread+0x0/0x81
[ 1128.964483]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20

And three of (flush-8:32, flush-8:48 and flush-8:64):

[ 1128.964485] flush-8:32    D 0000000000000000     0  2221      2 0x00000000
[ 1128.964488]  ffff88033e44b880 0000000000000046 0000000000000000 ffffffff811837bb
[ 1128.964490]  ffff88063d177100 0000001000022c80 000000000000f8a0 ffff88063b66dfd8
[ 1128.964493]  00000000000155c0 00000000000155c0 ffff88063d177100 ffff88063d1773f8
[ 1128.964495] Call Trace:
[ 1128.964497]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964500]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964502]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964504]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964506]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964508]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964511]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964514]  [<ffffffff810114ce>] ? common_interrupt+0xe/0x13
[ 1128.964516]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964518]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964522]  [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
[ 1128.964525]  [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
[ 1128.964528]  [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
[ 1128.964532]  [<ffffffff810b9572>] ? __writepage+0xa/0x25
[ 1128.964534]  [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
[ 1128.964536]  [<ffffffff810b9568>] ? __writepage+0x0/0x25
[ 1128.964541]  [<ffffffff8110642e>] ? writeback_single_inode+0xe7/0x2da
[ 1128.964543]  [<ffffffff81107107>] ? writeback_inodes_wb+0x423/0x4fe
[ 1128.964546]  [<ffffffff8110730e>] ? wb_writeback+0x12c/0x1ab
[ 1128.964549]  [<ffffffff81107574>] ? wb_do_writeback+0x145/0x15b
[ 1128.964551]  [<ffffffff811075bb>] ? bdi_writeback_task+0x31/0x9d
[ 1128.964556]  [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
[ 1128.964558]  [<ffffffff810c7d36>] ? bdi_start_fn+0x70/0xca
[ 1128.964560]  [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
[ 1128.964562]  [<ffffffff81064789>] ? kthread+0x79/0x81
[ 1128.964564]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 1128.964566]  [<ffffffff81064710>] ? kthread+0x0/0x81
[ 1128.964568]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20


And a few copies of bonnie (14 in total):

Two like this:

[ 1128.964735] bonnie        D 0000000000000000     0  2240   2236 0x00000000
[ 1128.964737]  ffff88033e44b880 0000000000000086 0000000000000000 00000000000009ef
[ 1128.964740]  ffff8802bac52000 0000000000430bf5 000000000000f8a0 ffff88033d031fd8
[ 1128.964743]  00000000000155c0 00000000000155c0 ffff88033d811c40 ffff88033d811f38
[ 1128.964745] Call Trace:
[ 1128.964747]  [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
[ 1128.964749]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964752]  [<ffffffff8110c672>] ? sync_buffer+0x3b/0x40
[ 1128.964754]  [<ffffffff812ee339>] ? __wait_on_bit_lock+0x3f/0x84
[ 1128.964756]  [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
[ 1128.964758]  [<ffffffff812ee3e9>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
[ 1128.964760]  [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
[ 1128.964764]  [<ffffffffa017e068>] ? do_get_write_access+0x82/0x441 [jbd]
[ 1128.964766]  [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
[ 1128.964769]  [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
[ 1128.964774]  [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
[ 1128.964778]  [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
[ 1128.964782]  [<ffffffffa0199d86>] ? ext3_orphan_add+0x94/0x15b [ext3]
[ 1128.964786]  [<ffffffffa01941f8>] ? ext3_mark_inode_dirty+0x33/0x3c [ext3]
[ 1128.964790]  [<ffffffffa019a6cb>] ? ext3_unlink+0x166/0x1b4 [ext3]
[ 1128.964793]  [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
[ 1128.964795]  [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
[ 1128.964797]  [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
[ 1128.964800]  [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
[ 1128.964803]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

One like this:

[ 1128.964804] bonnie        D 0000000000000000     0  2241   2236 0x00000000
[ 1128.964807]  ffff88033e44b880 0000000000000082 0000000000000000 ffffffff810b90a9
[ 1128.964809]  ffff880000022c80 0000000000000000 000000000000f8a0 ffff88033c3abfd8
[ 1128.964812]  00000000000155c0 00000000000155c0 ffff88033d813f90 ffff88033d814288
[ 1128.964814] Call Trace:
[ 1128.964816]  [<ffffffff810b90a9>] ? __alloc_pages_nodemask+0x10f/0x5ce
[ 1128.964819]  [<ffffffff81064977>] ? bit_waitqueue+0x10/0xa0
[ 1128.964822]  [<ffffffffa017e212>] ? do_get_write_access+0x22c/0x441 [jbd]
[ 1128.964824]  [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
[ 1128.964826]  [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
[ 1128.964829]  [<ffffffff810170cd>] ? read_tsc+0xa/0x20
[ 1128.964832]  [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
[ 1128.964836]  [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
[ 1128.964840]  [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
[ 1128.964844]  [<ffffffffa01941e6>] ? ext3_mark_inode_dirty+0x21/0x3c [ext3]
[ 1128.964847]  [<ffffffffa019a6b4>] ? ext3_unlink+0x14f/0x1b4 [ext3]
[ 1128.964850]  [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
[ 1128.964852]  [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
[ 1128.964854]  [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
[ 1128.964856]  [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
[ 1128.964859]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b


One like this:

[ 1128.964955] bonnie        D 0000000000000000     0  2244   2236 0x00000000
[ 1128.964958]  ffffffff8144b1f0 0000000000000082 0000000000000000 ffffffff811837bb
[ 1128.964960]  ffff88033d813170 00000010a00fc7ab 000000000000f8a0 ffff88033ceadfd8
[ 1128.964963]  00000000000155c0 00000000000155c0 ffff88033d813170 ffff88033d813468
[ 1128.964965] Call Trace:
[ 1128.964967]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964969]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964971]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964973]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964976]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964978]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964980]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964983]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964985]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964987]  [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
[ 1128.964990]  [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
[ 1128.964992]  [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
[ 1128.964994]  [<ffffffff810b9572>] ? __writepage+0xa/0x25
[ 1128.964997]  [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
[ 1128.964999]  [<ffffffff810b9568>] ? __writepage+0x0/0x25
[ 1128.965003]  [<ffffffff810b3cda>] ? __filemap_fdatawrite_range+0x4b/0x54
[ 1128.965007]  [<ffffffffa019b5bb>] ? ext3_sync_fs+0x42/0x4b [ext3]
[ 1128.965009]  [<ffffffff810b4998>] ? filemap_write_and_wait+0x17/0x32
[ 1128.965013]  [<ffffffff8112d015>] ? sync_quota_sb+0x5d/0xf6
[ 1128.965015]  [<ffffffff8110a38d>] ? __sync_filesystem+0x43/0x70
[ 1128.965017]  [<ffffffff8110a454>] ? sync_filesystems+0x9a/0xe3
[ 1128.965019]  [<ffffffff8110a4e2>] ? sys_sync+0x1c/0x2e
[ 1128.965021]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

Seven bonnies like this:

[ 1128.965231] bonnie        D 0000000000000000     0  2253   2236 0x00000000
[ 1128.965233]  ffffffff8144b1f0 0000000000000082 0000000000000000 ffff88034ac155c0
[ 1128.965236]  ffff88033ce7f700 ffffffff81047f43 000000000000f8a0 ffff88033c249fd8
[ 1128.965239]  00000000000155c0 00000000000155c0 ffff88033d814db0 ffff88033d8150a8
[ 1128.965241] Call Trace:
[ 1128.965243]  [<ffffffff81047f43>] ? finish_task_switch+0x3a/0xa7
[ 1128.965246]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
[ 1128.965248]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
[ 1128.965250]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
[ 1128.965252]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
[ 1128.965255]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

And three bonnies like this:

[ 1128.965023] bonnie        D 0000000000000000     0  2245   2236 0x00000000
[ 1128.965026]  ffff88033e44b880 0000000000000082 0000000000000000 00000000000155c0
[ 1128.965028]  0000000000000005 00000000000155c0 000000000000f8a0 ffff88033cfc3fd8
[ 1128.965031]  00000000000155c0 00000000000155c0 ffff88033d817100 ffff88033d8173f8
[ 1128.965033] Call Trace:
[ 1128.965035]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
[ 1128.965038]  [<ffffffff8104a196>] ? try_to_wake_up+0x249/0x259
[ 1128.965040]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
[ 1128.965042]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
[ 1128.965044]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
[ 1128.965047]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Comment 5 John Hughes 2010-03-03 12:07:29 UTC
Jens Axboe wrote:
> On Wed, Mar 03 2010, John Hughes wrote:
>   
>> Just a thought.  What exactly is sync(2) supposed to do - block until  
>> there are no more dirty pages, or block until all pages that were dirty  
>> when the sync was done are clean?  In other words is the problem simply  
>> that pages are being dirtied faster than the sync is writing them out?
>>     
>
> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?
>
> The semantics of sync are supposed to be 'wait for dirty IO generated
> BEFORE this sync call'.
>   
Ok, I will try this patch and get back to you.
Comment 6 Anonymous Emailer 2010-03-03 12:21:44 UTC
Reply-To: jens.axboe@oracle.com

On Wed, Mar 03 2010, John Hughes wrote:
> Jens Axboe wrote:
>> Is IO still going on, or does it appear to be stuck? From the traces
>> below, we have various procs caught in waiting for a request. So if
>> things are totally stuck, it could be some race in there.
>>   
> I see I/O happening on three or four of the disks.
>
> Just a thought.  What exactly is sync(2) supposed to do - block until  
> there are no more dirty pages, or block until all pages that were dirty  
> when the sync was done are clean?  In other words is the problem simply  
> that pages are being dirtied faster than the sync is writing them out?

Our sync is currently broken in that regard, since it'll wait for too
long. We have a debated patch going, I have included it below. Any
chance you could give it a whirl?

The semantics of sync are supposed to be 'wait for dirty IO generated
BEFORE this sync call'.
Comment 7 Anonymous Emailer 2010-03-03 12:26:43 UTC
Reply-To: jens.axboe@oracle.com

On Wed, Mar 03 2010, John Hughes wrote:
> Jens Axboe wrote:
>> On Wed, Mar 03 2010, John Hughes wrote:
>>   
>>> Just a thought.  What exactly is sync(2) supposed to do - block until 
>>>  there are no more dirty pages, or block until all pages that were 
>>> dirty  when the sync was done are clean?  In other words is the 
>>> problem simply  that pages are being dirtied faster than the sync is 
>>> writing them out?
>>>     
>>
>> Our sync is currently broken in that regard, since it'll wait for too
>> long. We have a debated patch going, I have included it below. Any
>> chance you could give it a whirl?
>>
>> The semantics of sync are supposed to be 'wait for dirty IO generated
>> BEFORE this sync call'.
>>   
> Ok, I will try this patch and get back to you.

Looking at your bug report, I saw that you were on 2.6.32. If you are
testing 2.6.33, let me know: the patch won't apply there (I have another
variant for that).
Comment 8 Anonymous Emailer 2010-03-03 12:26:51 UTC
Reply-To: jens.axboe@oracle.com

On Wed, Mar 03 2010, John Hughes wrote:
> Andrew Morton wrote:
>> When the system is stuck, please run
>>
>>      echo w > /proc/sysrq-trigger
>>   
> Ok, we have:

Is IO still going on, or does it appear to be stuck? From the traces
below, we have various procs caught in waiting for a request. So if
things are totally stuck, it could be some race in there.


> [... sysrq-w traces snipped; identical to those in comment 4 above ...]
Comment 9 Anonymous Emailer 2010-03-03 14:44:09 UTC
Reply-To: maan@systemlinux.org

On 13:03, Jens Axboe wrote:

> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?

That's the same patch you posted two weeks ago and on which Linus
commented he thinks it is broken. He also proposed another, simpler
patch (included below) which I'm running for a few days and which
fixes the sync problem for me.

However, it is not clear at all whether that simpler patch is
correct. Any comments on that?

Thanks
Andre

---
 fs/fs-writeback.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1a7c42c..a0a8424 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -738,11 +738,16 @@ static long wb_writeback(struct bdi_writeback *wb,
 	long wrote = 0;
 	struct inode *inode;
 
-	if (wbc.for_kupdate) {
-		wbc.older_than_this = &oldest_jif;
-		oldest_jif = jiffies -
-				msecs_to_jiffies(dirty_expire_interval * 10);
-	}
+	/*
+	 * We never write back data that was dirtied _after_ we
+	 * started writeback. But kupdate doesn't even want to
+	 * write back recently dirtied stuff, only older data.
+	 */
+	oldest_jif = jiffies-1;
+	wbc.older_than_this = &oldest_jif;
+	if (wbc.for_kupdate)
+		oldest_jif -= msecs_to_jiffies(dirty_expire_interval * 10);
+
 	if (!wbc.range_cyclic) {
 		wbc.range_start = 0;
 		wbc.range_end = LLONG_MAX;
-- 
The only person who always got his work done by Friday was Robinson Crusoe
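
For context, the cutoff this patch installs unconditionally is consumed when writeback moves expired inodes from the dirty list to the I/O list. The following is a paraphrase of the relevant 2.6.32 fs/fs-writeback.c logic, simplified and not verbatim:

	/* Paraphrase of the 2.6.32 cutoff check (simplified, not verbatim). */
	static bool inode_dirtied_after(struct inode *inode, unsigned long t)
	{
		return time_after(inode->dirtied_when, t);
	}

	/* when queueing dirty inodes for writeout (move_expired_inodes()): */
	if (older_than_this && inode_dirtied_after(inode, *older_than_this))
		break;	/* everything past this was dirtied after the cutoff */

With oldest_jif set to jiffies-1 at the start of every pass, writeback no longer chases pages dirtied after it began, which is exactly the livelock the sync-semantics question above is about.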
Comment 11 John Hughes 2010-03-03 17:55:50 UTC
Jens Axboe wrote:
> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?
>
> The semantics of sync are supposed to be 'wait for dirty IO generated
> BEFORE this sync call'.
>
>   
This patch doesn't seem to change things.  It's a bit late here, I'll 
check my results tomorrow.
Comment 12 John Hughes 2010-03-04 11:17:20 UTC
Jens Axboe wrote:
> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?
>   
Ok, I confirm that this patch does not fix my problem.

I'll try the "simpler" patch.
Comment 13 John Hughes 2010-03-04 14:55:32 UTC
Andre Noll wrote:
> On 13:03, Jens Axboe wrote:
>
>   
>> Our sync is currently broken in that regard, since it'll wait for too
>> long. We have a debated patch going, I have included it below. Any
>> chance you could give it a whirl?
>>     
>
> That's the same patch you posted two weeks ago and on which Linus
> commented he thinks it is broken. He also proposed another, simpler
> patch (included below) which I'm running for a few days and which
> fixes the sync problem for me.
>   
I've tried the "simpler patch" on my system and it makes no difference 
for me.
Comment 14 Anonymous Emailer 2010-03-04 17:44:06 UTC
Reply-To: maan@systemlinux.org

On 15:55, John Hughes wrote:
> >That's the same patch you posted two weeks ago and on which Linus
> >commented he thinks it is broken. He also proposed another, simpler
> >patch (included below) which I'm running for a few days and which
> >fixes the sync problem for me.
> >  
> I've tried the "simpler patch" on my system and it makes no difference 
> for me.

Does it make no difference to Jens' patch, or no difference to an
unpatched kernel?

BTW, switching from cfq to noop or deadline also improved things a
bit for me.

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe
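
For anyone trying Andre's suggestion: the I/O scheduler can be switched per device at runtime, with no reboot needed. Using sdc purely as an example device name:

	echo deadline > /sys/block/sdc/queue/scheduler

Reading the same file back lists the available schedulers, with the active one shown in brackets.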
Comment 15 John Hughes 2010-03-05 10:44:51 UTC
Andre Noll wrote:
> On 15:55, John Hughes wrote:
>   
>>> That's the same patch you posted two weeks ago and on which Linus
>>> commented he thinks it is broken. He also proposed another, simpler
>>> patch (included below) which I'm running for a few days and which
>>> fixes the sync problem for me.
>>>  
>>>       
>> I've tried the "simpler patch" on my system and it makes no difference 
>> for me.
>>     
>
> Does it make no difference to Jens' patch, or no difference to an
> unpatched kernel?
>   
No difference to an unpatched kernel.  I didn't think of stacking the 
patches.
> BTW, switching from cfq to noop or deadline also improved things a
> bit for me.
>   
Will try that.
