Kernel Bug Tracker – Bug 5900
Extremely slow sync with anticipatory under certain conditions
Last modified: 2010-04-07 14:07:35 UTC
Most recent kernel where this bug did not occur: Unknown
Distribution: Arch Linux
256 Mb RAM, 1300 MHz AMD Duron, Seagate ST31208227AS on SATA (using sata_sil).
Tried multiple kernel versions, latest test with 2.6.15 from Mercurial.
Interbench 0.29. Tested with ext3, ext2 and vfat.
Interbench creates a condition where sync calls can take extremely long; more
than one minute. This only happens with the anticipatory IO scheduler, syncs
never take more than a few seconds with noop, deadline or cfq.
The long syncs happen when interbench is reading at max disk speed, so it may be
some kind of write starvation. On the other hand, sync doesn't seem to return
even when there is zero dirty or writeback data according to /proc/meminfo, so
it could be something totally different too.
Kernel config, dmesg and test output can be found at
The logfiles have interbench, vmstat and partial meminfo output. Files in the
"short", "soso" and "long" directories are logfiles run as normal user without
background writes. The files in "withwrite" had vmstat output redirected to a
file, and the files in "auto" were generated by running the test in a loop.
Unfortunately the testscrip had a bug then, forgetting to delete the meminfo
output file after each test, so those meminfo values are concatenated from
previous runs and practically useless.
Steps to reproduce:
tar xzf synctest.tgz
Or install and compile interbench and apply the patch which can be found at
interbench.test/diff to only run the relevant interbench parts which trigger the
slow syncs and to get sync() timings.
The problem isn't always easy to reproduce, as it sometimes tends to hide for a
while. It seems (not sure, as the results are a bit erratic) that running
interbench as root makes it slightly worse, and also having a concurrent light
write load (e.g. vmstat 2 output redirected to a file) appears to make the syncs
I'm willing to try out different kernel versions or patches.
I'm not sure this is the same, but probably can help.
We have found that simple write(500bytes) + fsync() operation are 20-100 times
slower when tar is running on the same partition and AS I/O scheduler is used.
noop, deadline, CFQ didn't show such problem.
We also found, that, doing:
# echo 10 > /sys/block/sda/queue/iosched/read_batch_expire
# echo 10 > /sys/block/sda/queue/iosched/read_expire
helps very much and AS becomes as good as other schedulers (even better :) ).
Hope this can help.
I'll have a look.
Is this still a problem for you in recent kernels? Can you get a sysrq+t task
trace of all processes at the time that a slow sync is occuring?
I tried running the same tests, and the problem is still there with 2.6.20-rc6.
I also tried Kirill's advice, and that didn't help either (but I tried all kind
of different AS options before, without much improvement).
sysrq-t output attached.
Created attachment 10260 [details]
2 min. delay. No swap
OK, various things are waiting on IO. The AS IO scheduler does prioritise reads
over writes, however it is unusual that it would take so long, because it ensures
not to starve them.
Anyway thanks for the trace, I'll try to work out what's happening.
I wonder, could it be related to some soft write deadlocking triggered by the
anticipatory IO scheduler? Reading your emails on lkml it seems not all is well
in that area. Would it make sense to try out your latest deadlock fix patches?
(If so, how/weher can I get them? I've git, but no idea how to get only the
changes between two trees other than pulling the whole kernel source with changes.)
Created attachment 10481 [details]
SysRq+Alt+t dump from 2.6.21-rc1 after 20-30 seconds of syncing.
Created attachment 10482 [details]
Output for 2.6.21-rc1 30 seconds later than 'as'
Created attachment 10483 [details]
Output for 2.6.21-rc1 37 seconds later than 'as2'
It seems the problem is known by Al Boldi:
I can confirm that lowering antic_expire does indeed work.
Worst case results of a few runs:
antic_expire sync time
The runtime of the test, without the syncing, is about 13 seconds.
6 is the default antic_expire value. Every value was run a few times,
and the worst case is reported. Higher values seem to inhibit more
unstable and varying results.
CCing Al Boldi, in case he has more info.
Any updates on this problem?
(In reply to comment #11)
> Any updates on this problem?
Not really, still happens with 2.6.22-rc4.
Just ran the interbench test code I'm using to
trigger this, and still can produce +30 s sync
Setting read_ahead_kb to 0 seems to "fix" it,
syncing takes still longer than with other
schedulers, but no more than 5 seconds with
antic_expire set to 20.
Which might mean it's caused by interaction
between anticipatory and the read-ahead code.
Or it could merely mean that the extra reading
generated by read-ahead triggers the problem
At least for me it's no big problem, as I use
CFQ, but I guess it does mean that Anticipatory
isn't very reliable to use (at least with the
default nonzero antic_expire value). That said,
it's probably hard to trigger with normal work
OK, thanks for the report and analysis. I guess we'll see if anyone takes time to fix AS in the future.
Subject: Re: Extremely slow sync with anticipatory under certain conditions
> ------- Comment #12 from firstname.lastname@example.org 2007-06-16 11:10 -------
> (In reply to comment #11)
> > Any updates on this problem?
> > Thanks.
> At least for me it's no big problem, as I use
> CFQ, but I guess it does mean that Anticipatory
> isn't very reliable to use (at least with the
> default nonzero antic_expire value). That said,
> it's probably hard to trigger with normal work
But it's really easy to trigger. Just try syncing a write while the disk is
being read. It hangs until the read is complete or another read is queued.
Oops, seems I closed this bug somehow a while ago? Weird.
Anyway, it's still present with 2.6.24-rc2, but it seems harder to trigger, with the limited testing I've done.
Can trigger 20+ second latencies easily with 2.6.27-rc4 with the anticipatory scheduler. Worst case summaries:
AS: 26s (antic_expire 20 instead of default 6)
AS: 7s (antic_expire 0)
Keep in mind that this is a totally different system than for the original reports. Much more ram (3.2GB, dunno where the other 800MB went, cheapo mobo), faster hd and cpu. Tested with ext3, but if it is the same problem as earlier it's filesystem independent.
AS looks much better compared to the rest because although it's bad, the others became worse as well and AS doesn't seem to get stuck for minutes anymore. A higher antic_expire has not much effect, but lowering it improves sync latency a lot. Setting it to 0 gives stable latencies around 6 seconds.
All in all too much changed with my system to reproduce the original problem, though it's probably still there.
Interesting titbit: When setting read_ahead_kb to 1024 both cfq and AS get latencies above 30 seconds, deadline gets above 50 and noop gets 6s, but seems to achieve that by starving reads. Setting read_ahead_kb to 0 doesn't seem to change much, so the original problem with AS probably is gone, either because it's fixed or because it won't trigger on this system.
Not very interesting anymore now that anticipatory is gone.