Bug 5900 - Extremely slow sync with anticipatory under certain conditions
Summary: Extremely slow sync with anticipatory under certain conditions
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other (show other bugs)
Hardware: i386 Linux
: P2 normal
Assignee: Nick Piggin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-15 16:34 UTC by Indan
Modified: 2010-04-07 14:07 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.27-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
2 min. delay. No swap (61.79 KB, text/plain)
2007-02-02 09:43 UTC, Indan
Details
SysRq+Alt+t dump from 2.6.21-rc1 after 20-30 seconds of syncing. (61.76 KB, text/plain)
2007-02-21 08:26 UTC, Indan
Details
Output for 2.6.21-rc1 30 seconds later than 'as' (61.15 KB, text/plain)
2007-02-21 08:28 UTC, Indan
Details
Output for 2.6.21-rc1 37 seconds later than 'as2' (61.15 KB, text/plain)
2007-02-21 08:29 UTC, Indan
Details

Description Indan 2006-01-15 16:34:48 UTC
Most recent kernel where this bug did not occur: Unknown

Distribution: Arch Linux

Hardware Environment:

256 Mb RAM, 1300 MHz AMD Duron, Seagate ST31208227AS on SATA (using sata_sil).

Software Environment:

Tried multiple kernel versions, latest test with 2.6.15 from Mercurial.
Interbench 0.29. Tested with ext3, ext2 and vfat.

Problem Description:

Interbench creates a condition where sync calls can take extremely long; more
than one minute. This only happens with the anticipatory IO scheduler, syncs
never take more than a few seconds with noop, deadline or cfq.

The long syncs happen when interbench is reading at max disk speed, so it may be
some kind of write starvation. On the other hand, sync doesn't seem to return
even when there is zero dirty or writeback data according to /proc/meminfo, so
it could be something totally different too.

Kernel config, dmesg and test output can be found at
http://nul.nu/~indan/synctest2/interbench/

The logfiles have interbench, vmstat and partial meminfo output. Files in the
"short", "soso" and "long" directories are logfiles run as normal user without
background writes. The files in "withwrite" had vmstat output redirected to a
file, and the files in "auto" were generated by running the test in a loop.
Unfortunately the testscrip had a bug then, forgetting to delete the meminfo
output file after each test, so those meminfo values are concatenated from
previous runs and practically useless.

Steps to reproduce:

wget http://nul.nu/~indan/synctest2/synctest.tgz
tar xzf synctest.tgz
cd interbench.test/interbench-0.29/
make
cd ..
./test

Or install and compile interbench and apply the patch which can be found at
interbench.test/diff to only run the relevant interbench parts which trigger the
slow syncs and to get sync() timings.

The problem isn't always easy to reproduce, as it sometimes tends to hide for a
while. It seems (not sure, as the results are a bit erratic) that running
interbench as root makes it slightly worse, and also having a concurrent light
write load (e.g. vmstat 2 output redirected to a file) appears to make the syncs
longer.

I'm willing to try out different kernel versions or patches.

Email references:
http://lkml.org/lkml/2005/8/13/85
http://bhhdoa.org.au/pipermail/ck/2006-January/005161.html
Comment 1 Kirill Korotaev 2006-05-31 09:02:26 UTC
I'm not sure this is the same, but probably can help.

We have found that simple write(500bytes) + fsync() operation are 20-100 times
slower when tar is running on the same partition and AS I/O scheduler is used.
noop, deadline, CFQ didn't show such problem.

We also found, that, doing:
# echo 10 > /sys/block/sda/queue/iosched/read_batch_expire
# echo 10 > /sys/block/sda/queue/iosched/read_expire
helps very much and AS becomes as good as other schedulers (even better :) ).

Hope this can help.
Comment 2 Nick Piggin 2007-02-01 18:59:12 UTC
I'll have a look.

Is this still a problem for you in recent kernels? Can you get a sysrq+t task
trace of all processes at the time that a slow sync is occuring?
Comment 3 Indan 2007-02-02 09:42:08 UTC
I tried running the same tests, and the problem is still there with 2.6.20-rc6.
I also tried Kirill's advice, and that didn't help either (but I tried all kind
of different AS options before, without much improvement).

sysrq-t output attached.
Comment 4 Indan 2007-02-02 09:43:34 UTC
Created attachment 10260 [details]
2 min. delay. No swap
Comment 5 Nick Piggin 2007-02-07 15:47:32 UTC
OK, various things are waiting on IO. The AS IO scheduler does prioritise reads
over writes, however it is unusual that it would take so long, because it ensures
not to starve them.

Anyway thanks for the trace, I'll try to work out what's happening.
Comment 6 Indan 2007-02-08 15:46:43 UTC
I wonder, could it be related to some soft write deadlocking triggered by the
anticipatory IO scheduler? Reading your emails on lkml it seems not all is well
in that area. Would it make sense to try out your latest deadlock fix patches? 

(If so, how/weher can I get them? I've git, but no idea how to get only the
changes between two trees other than pulling the whole kernel source with changes.)
Comment 7 Indan 2007-02-21 08:26:38 UTC
Created attachment 10481 [details]
SysRq+Alt+t dump from 2.6.21-rc1 after 20-30 seconds of syncing.
Comment 8 Indan 2007-02-21 08:28:19 UTC
Created attachment 10482 [details]
Output for 2.6.21-rc1 30 seconds later than 'as'
Comment 9 Indan 2007-02-21 08:29:12 UTC
Created attachment 10483 [details]
Output for 2.6.21-rc1 37 seconds later than 'as2'
Comment 10 Indan 2007-04-16 05:13:56 UTC
It seems the problem is known by Al Boldi:
http://lkml.org/lkml/2007/4/16/38

I can confirm that lowering antic_expire does indeed work.

Worst case results of a few runs:

   antic_expire	  sync time
----------------------------------
	0-3	   0-1
	4	   7
	5	   16
	6	   25
	7-10	   30
	20	   373

The runtime of the test, without the syncing, is about 13 seconds.
6 is the default antic_expire value. Every value was run a few times, 
and the worst case is reported. Higher values seem to inhibit more
unstable and varying results.

CCing Al Boldi, in case he has more info.
Comment 11 Natalie Protasevich 2007-06-16 09:49:01 UTC
Any updates on this problem?
Thanks.
Comment 12 Indan 2007-06-16 11:10:09 UTC
(In reply to comment #11)
> Any updates on this problem?
> Thanks.
> 

Not really, still happens with 2.6.22-rc4.
Just ran the interbench test code I'm using to 
trigger this, and still can produce +30 s sync 
times.

Setting read_ahead_kb to 0 seems to "fix" it,
syncing takes still longer than with other
schedulers, but no more than 5 seconds with
antic_expire set to 20.

Which might mean it's caused by interaction
between anticipatory and the read-ahead code.
Or it could merely mean that the extra reading 
generated by read-ahead triggers the problem 
better.

At least for me it's no big problem, as I use 
CFQ, but I guess it does mean that Anticipatory
isn't very reliable to use (at least with the
default nonzero antic_expire value). That said,
it's probably hard to trigger with normal work
loads.
Comment 13 Natalie Protasevich 2007-06-16 11:32:36 UTC
OK, thanks for the report and analysis. I guess we'll see if anyone takes time to fix AS in the future.
Comment 14 Al Boldi 2007-06-17 13:22:21 UTC
Subject: Re:  Extremely slow sync with anticipatory under certain conditions

bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #12 from indan@nul.nu  2007-06-16 11:10 -------
> (In reply to comment #11)
>
> > Any updates on this problem?
> > Thanks.
>
> At least for me it's no big problem, as I use
> CFQ, but I guess it does mean that Anticipatory
> isn't very reliable to use (at least with the
> default nonzero antic_expire value). That said,
> it's probably hard to trigger with normal work
> loads.

But it's really easy to trigger.  Just try syncing a write while the disk is 
being read.  It hangs until the read is complete or another read is queued.


Thanks!

--
Al
Comment 15 Indan 2007-11-17 06:04:23 UTC
Oops, seems I closed this bug somehow a while ago? Weird.

Anyway, it's still present with 2.6.24-rc2, but it seems harder to trigger, with the limited testing I've done.
Comment 16 Indan 2009-04-07 15:23:28 UTC
Can trigger 20+ second latencies easily with 2.6.27-rc4 with the anticipatory scheduler. Worst case summaries:

AS:       25s
AS:       26s  (antic_expire 20 instead of default 6)
AS:       7s   (antic_expire 0)
CFQ:      17s
noop:     6s
deadline: 13s

Keep in mind that this is a totally different system than for the original reports. Much more ram (3.2GB, dunno where the other 800MB went, cheapo mobo), faster hd and cpu. Tested with ext3, but if it is the same problem as earlier it's filesystem independent.

AS looks much better compared to the rest because although it's bad, the others became worse as well and AS doesn't seem to get stuck for minutes anymore. A higher antic_expire has not much effect, but lowering it improves sync latency a lot. Setting it to 0 gives stable latencies around 6 seconds.

All in all too much changed with my system to reproduce the original problem, though it's probably still there.

Interesting titbit: When setting read_ahead_kb to 1024 both cfq and AS get latencies above 30 seconds, deadline gets above 50 and noop gets 6s, but seems to achieve that by starving reads. Setting read_ahead_kb to 0 doesn't seem to change much, so the original problem with AS probably is gone, either because it's fixed or because it won't trigger on this system.
Comment 17 Indan 2010-04-07 14:07:35 UTC
Not very interesting anymore now that anticipatory is gone.

Note You need to log in before you can comment on or make changes to this bug.