Most recent kernel where this bug did not occur: Unknown
Distribution: Arch Linux
Hardware Environment: 256 MB RAM, 1300 MHz AMD Duron, Seagate ST31208227AS on SATA (using sata_sil).
Software Environment: Tried multiple kernel versions, latest test with 2.6.15 from Mercurial. Interbench 0.29. Tested with ext3, ext2 and vfat.

Problem Description: Interbench creates a condition where sync calls can take extremely long, more than one minute. This only happens with the anticipatory IO scheduler; syncs never take more than a few seconds with noop, deadline or cfq. The long syncs happen when interbench is reading at maximum disk speed, so it may be some kind of write starvation. On the other hand, sync doesn't seem to return even when there is zero dirty or writeback data according to /proc/meminfo, so it could be something totally different too.

Kernel config, dmesg and test output can be found at http://nul.nu/~indan/synctest2/interbench/
The logfiles contain interbench, vmstat and partial meminfo output. Files in the "short", "soso" and "long" directories are logfiles from runs as a normal user without background writes. The files in "withwrite" had vmstat output redirected to a file, and the files in "auto" were generated by running the test in a loop. Unfortunately the test script had a bug at the time and forgot to delete the meminfo output file after each test, so those meminfo values are concatenated from previous runs and practically useless.

Steps to reproduce:
wget http://nul.nu/~indan/synctest2/synctest.tgz
tar xzf synctest.tgz
cd interbench.test/interbench-0.29/
make
cd ..
./test

Or install and compile interbench and apply the patch found at interbench.test/diff, which only runs the relevant interbench parts that trigger the slow syncs and adds sync() timings.

The problem isn't always easy to reproduce, as it sometimes tends to hide for a while. It seems (not sure, as the results are a bit erratic) that running interbench as root makes it slightly worse, and a concurrent light write load (e.g. vmstat 2 output redirected to a file) also appears to make the syncs longer. I'm willing to try out different kernel versions or patches.

Email references:
http://lkml.org/lkml/2005/8/13/85
http://bhhdoa.org.au/pipermail/ck/2006-January/005161.html
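For anyone who wants a quicker (if cruder) way to see the symptom without the full interbench run, the core pattern is just a sustained read plus a sync of some dirty data. A minimal sketch, assuming sda is the test disk and using throwaway file names (these names are placeholders, not part of the original test setup):

echo anticipatory > /sys/block/sda/queue/scheduler   # select the AS elevator for sda
dd if=/dev/sda of=/dev/null bs=1M &                  # sustained sequential read at max disk speed
READER=$!
dd if=/dev/zero of=dirtyfile bs=1M count=64          # create some dirty page cache data
time sync                                            # with AS this is where the long stall shows up
kill $READER; rm -f dirtyfile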
I'm not sure this is the same, but it can probably help. We have found that a simple write(500 bytes) + fsync() operation is 20-100 times slower when tar is running on the same partition and the AS I/O scheduler is used. noop, deadline and CFQ didn't show this problem. We also found that doing:

# echo 10 > /sys/block/sda/queue/iosched/read_batch_expire
# echo 10 > /sys/block/sda/queue/iosched/read_expire

helps very much, and AS becomes as good as the other schedulers (even better :) ). Hope this helps.
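For reference, the small write + fsync case described above can be approximated from the shell without writing any code. A rough sketch, assuming the read load and the test file live on the same sda partition and using /usr as an arbitrary large directory to read:

tar cf /dev/null /usr &                                        # background read load on the same disk
time dd if=/dev/zero of=small.dat bs=500 count=1 conv=fsync    # 500-byte write followed by fsync

With AS and default settings the dd reportedly takes far longer than with noop, deadline or CFQ; with the two iosched values above set to 10, it should behave normally again.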
I'll have a look. Is this still a problem for you in recent kernels? Can you get a sysrq+t task trace of all processes at the time a slow sync is occurring?
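In case the magic-sysrq key combination is awkward to use while the machine is stalled, the same dump can be triggered from a shell via the standard sysrq interface (assuming sysrq support is enabled in the kernel config):

echo 1 > /proc/sys/kernel/sysrq    # enable all sysrq functions
echo t > /proc/sysrq-trigger       # dump the state of all tasks to the kernel log
dmesg > sysrq-t.txt                # save the log for attaching here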
I tried running the same tests, and the problem is still there with 2.6.20-rc6. I also tried Kirill's advice, and that didn't help either (but I had tried all kinds of different AS options before, without much improvement). sysrq-t output attached.
Created attachment 10260 [details] 2 min. delay. No swap
OK, various things are waiting on IO. The AS IO scheduler does prioritise reads over writes; however, it is unusual that it would take this long, because it makes sure not to starve writes. Anyway, thanks for the trace, I'll try to work out what's happening.
I wonder, could it be related to some soft write deadlock triggered by the anticipatory IO scheduler? Reading your emails on lkml, it seems not all is well in that area. Would it make sense to try out your latest deadlock fix patches? (If so, how/where can I get them? I have git, but no idea how to get only the changes between two trees other than pulling the whole kernel source with changes.)
Created attachment 10481 [details] SysRq+Alt+t dump from 2.6.21-rc1 after 20-30 seconds of syncing.
Created attachment 10482 [details] Output for 2.6.21-rc1 30 seconds later than 'as'
Created attachment 10483 [details] Output for 2.6.21-rc1 37 seconds later than 'as2'
It seems the problem is known to Al Boldi: http://lkml.org/lkml/2007/4/16/38

I can confirm that lowering antic_expire does indeed work. Worst-case results of a few runs:

antic_expire    worst-case sync time (seconds)
----------------------------------------------
0-3             0-1
4               7
5               16
6               25
7-10            30
20              373

The runtime of the test, without the syncing, is about 13 seconds. 6 is the default antic_expire value. Every value was run a few times, and the worst case is reported. Higher values seem to exhibit more unstable and varying results.

CCing Al Boldi, in case he has more info.
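For reference, the sweep above can be reproduced with a small loop; a sketch, assuming the test disk is sda and ./test is the script from the tarball in the original report:

for v in 0 1 2 3 4 5 6 7 8 9 10 20; do
    echo $v > /sys/block/sda/queue/iosched/antic_expire   # tunable only exists while AS is the active scheduler
    ./test                                                 # prints sync() timings via the interbench patch
done

Treat the loop as illustrative only; the values above were run a few times by hand rather than scripted exactly like this.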
Any updates on this problem? Thanks.
(In reply to comment #11)
> Any updates on this problem?
> Thanks.

Not really, it still happens with 2.6.22-rc4. I just ran the interbench test code I'm using to trigger this, and it can still produce 30+ second sync times.

Setting read_ahead_kb to 0 seems to "fix" it: syncing still takes longer than with the other schedulers, but no more than 5 seconds with antic_expire set to 20. That might mean it's caused by interaction between anticipatory and the read-ahead code. Or it could merely mean that the extra reading generated by read-ahead triggers the problem more reliably.

At least for me it's no big problem, as I use CFQ, but I guess it does mean that Anticipatory isn't very reliable to use (at least with the default nonzero antic_expire value). That said, it's probably hard to trigger with normal workloads.
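For completeness, both knobs mentioned here are per-device sysfs tunables; a sketch, assuming sda is the disk under test:

cat /sys/block/sda/queue/read_ahead_kb                 # current readahead window in KB
echo 0 > /sys/block/sda/queue/read_ahead_kb            # disable readahead (the "fix" described above)
echo 20 > /sys/block/sda/queue/iosched/antic_expire    # AS-specific, only visible while AS is selected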
OK, thanks for the report and analysis. I guess we'll see if anyone takes time to fix AS in the future.
Subject: Re: Extremely slow sync with anticipatory under certain conditions

bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #12 from indan@nul.nu 2007-06-16 11:10 -------
> (In reply to comment #11)
> > Any updates on this problem?
> > Thanks.
>
> At least for me it's no big problem, as I use CFQ, but I guess it does
> mean that Anticipatory isn't very reliable to use (at least with the
> default nonzero antic_expire value). That said, it's probably hard to
> trigger with normal workloads.

But it's really easy to trigger. Just try syncing a write while the disk is being read. It hangs until the read is complete or another read is queued.

Thanks!

--
Al
Oops, seems I closed this bug somehow a while ago? Weird. Anyway, it's still present with 2.6.24-rc2, but it seems harder to trigger, with the limited testing I've done.
I can trigger 20+ second latencies easily with 2.6.27-rc4 with the anticipatory scheduler. Worst-case summaries:

AS:       25s
AS:       26s (antic_expire 20 instead of the default 6)
AS:        7s (antic_expire 0)
CFQ:      17s
noop:      6s
deadline: 13s

Keep in mind that this is a totally different system than for the original reports: much more RAM (3.2 GB; no idea where the other 800 MB went, cheap motherboard), and a faster disk and CPU. Tested with ext3, but if it is the same problem as earlier it's filesystem independent.

AS looks much better compared to the rest because, although it's bad, the others became worse as well, and AS doesn't seem to get stuck for minutes anymore. A higher antic_expire has little effect, but lowering it improves sync latency a lot. Setting it to 0 gives stable latencies around 6 seconds. All in all, too much has changed with my system to reproduce the original problem, though it's probably still there.

Interesting titbit: when setting read_ahead_kb to 1024, both CFQ and AS get latencies above 30 seconds, deadline gets above 50, and noop gets 6s, but seems to achieve that by starving reads. Setting read_ahead_kb to 0 doesn't seem to change much, so the original problem with AS is probably gone, either because it's fixed or because it won't trigger on this system.
Not very interesting anymore now that anticipatory is gone.