Bug 10756
| Summary: | many pre-mature anticipation timeouts in anticipatory I/O scheduler | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Chuanpeng Li (chuanpengli) |
| Component: | Block Layer | Assignee: | Jens Axboe (axboe) |
| Status: | CLOSED OBSOLETE | | |
| Severity: | normal | CC: | alan, io_other |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.23 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Chuanpeng Li
2008-05-19 23:29:40 UTC
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Mon, 19 May 2008 23:29:41 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10756
>
> Summary: many pre-mature anticipation timeouts in anticipatory I/O scheduler
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.23
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Block Layer
> AssignedTo: axboe@kernel.dk
> ReportedBy: chuanpengli@yahoo.com
> CC: io_other@kernel-bugs.osdl.org
>
> Latest working kernel version: N/A
> Earliest failing kernel version: 2.6.13
> Distribution: www.kernel.org
> Hardware Environment: IBM eServer: dual 2G Xeon processors; IBM 36GB SCSI drive
> Software Environment: Redhat 9: gcc 3.2.2
>
> Problem Description:
> Starting from 2.6.13, the switch of the kernel timer frequency HZ from 1000 to 250
> results in "default_antic_expire = 1 tick". One tick is 4 ms, BUT the
> anticipation timeout can occur anywhere from 0 to 4 ms, because the timer may
> be started any time from 0 to 4 ms before the next system timer interrupt. In
> practice, I observe anticipation timeouts as short as 100 microseconds using
> the LTT trace tool. Compared with HZ=1000, the new frequency (HZ=250) causes
> frequent premature anticipation timeouts and degraded I/O throughput under
> concurrent I/O workloads. I suggest setting "default_antic_expire" to 2 when
> its value is calculated as 1. (see source "block/as-iosched.c")
>
> Steps to reproduce:
> (1) run a concurrent server with an I/O-bound workload, such as a
> micro-benchmark that sequentially reads 256 KB from random locations in
> randomly chosen files.
> (2) I/O throughput at HZ=250 is 10-15% lower than at HZ=1000
> (3) At HZ=250, many anticipation timeouts can be observed using trace
> tools such as LTT.

Interesting.
It's often a bug to do mod_timer(timer, jiffies+1) for this very reason: the timer can expire any time between one jiffy and zero seconds hence, which is a large (infinite) ratio, and that can have unpredictable effects.

A probably-suitable-but-dopey fix might be

--- a/block/as-iosched.c~a
+++ a/block/as-iosched.c
@@ -416,6 +416,9 @@ static void as_antic_waitnext(struct as_

 	timeout = ad->antic_start + ad->antic_expire;

+	if (ad->antic_expire == 1)
+		timeout++;	/* comment goes here */
+
 	mod_timer(&ad->antic_timer, timeout);

 	ad->antic_status = ANTIC_WAIT_NEXT;
_

but a) It is unclear what in there prevents `timeout' from referring to a time which has already passed (say, there was a storm of slow-running interrupts on this CPU) and b) I bet other IO schedulers have the same issue.

Reply-To: jens.axboe@oracle.com

On Mon, May 19 2008, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> [quoted bug report and proposed patch trimmed]

I have another patch pending that just makes sure that the timer addition is always at least 2 for this very reason. CFQ needs a similar patch; it currently makes sure it's at least 1 (but should be 2).