Bug 196737 - CFQ delays some I/Os excessively
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: All Linux
Importance: P1 high
Assignee: Jens Axboe
 
Reported: 2017-08-23 13:01 UTC by Douglas Miller
Modified: 2017-08-23 14:38 UTC
Kernel Version: 4.13-rc6
Regression: No

Attachments
debugging code used to gather statistics (1.15 KB, patch)
2017-08-23 13:01 UTC, Douglas Miller

Description Douglas Miller 2017-08-23 13:01:44 UTC
Created attachment 258065
debugging code used to gather statistics

As a result of investigating the problem described in bugzilla #104891, I found that even the latest 4.13 kernel - in spite of having the vdisktime fix - continues to unfairly schedule many I/Os.

I am running on POWER platforms, using a tester "HTX" (powerpc only, https://github.com/open-power/HTX). This tester will complain about any I/O that takes longer than 10 minutes to complete. While running with CFQ enabled as the scheduler, I see HTX begin to complain almost immediately. I added some debugging code, and found that many I/Os are not even getting submitted to scsi_dispatch_cmd for a very long time.
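
For readers without access to HTX, the kind of check it performs can be sketched in userspace. The following is only an illustration, not HTX code and not the attached debug patch: the record layout, the function name `find_delayed`, and the timestamps are all hypothetical, and only the 10-minute threshold comes from the description above.

```python
from typing import Iterable, List, Optional, Tuple

# Threshold taken from the report: HTX complains after 10 minutes.
THRESHOLD_S = 600.0

# Hypothetical record layout: (label, submit time, completion time or None).
IoRecord = Tuple[str, float, Optional[float]]


def find_delayed(records: Iterable[IoRecord], now: float,
                 threshold_s: float = THRESHOLD_S) -> List[Tuple[str, float]]:
    """Return (label, elapsed seconds) for every I/O exceeding threshold_s.

    An I/O with no completion time is treated as still in flight and is
    measured against `now`.
    """
    delayed = []
    for label, submitted, completed in records:
        end = completed if completed is not None else now
        elapsed = end - submitted
        if elapsed > threshold_s:
            delayed.append((label, elapsed))
    return delayed
```

Feeding such a function the submit and completion timestamps of each request would separate I/Os that are merely slow from those still waiting after the threshold, which is the distinction that matters here.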

My current testing is with an LSI3008 HBA and 4 SATA disks. I have seen this on other (vendor) HBAs as well, with as few as 2 disks.

I am attaching a patch of the debug code I am using.

I have never seen I/Os that permanently hang. Shutting down the tester always results in any delayed I/Os completing immediately, and the OS is never hung.
Comment 1 Douglas Miller 2017-08-23 13:11:08 UTC
In order to build and run HTX (powerpc platforms only):

 1. Clone the HTX repository.
 2. Ensure dependencies are installed. Here is a typical list of packages that may not be installed by default:
  make gcc g++ git libncurses5-dev libcxl-dev libdapl-dev
 3. Run "make".
 4. For Debian/Ubuntu, run "make deb"; otherwise run "make tar".
 5. Install the result (for the tar file, unpack it and execute the install script).

To run:

 1. "su - htx"
 2. select test file "mdt.io"
 3. hit Enter when prompted about log compression
 4. At the menu, select option "2", "h"alt all devices, then choose only the disks to be tested. HTX will not even present disks that appear to have valid data. "q" to return to main menu.
 5. Select option "4" and set continue-on-error (coe) for the disks being tested. "q" to return to main menu.
 6. Select option "1" to start the tests.
 7. Progress may be monitored using option "5" for the status screen.
Comment 2 Douglas Miller 2017-08-23 13:18:09 UTC
Some background: I discovered this because Ubuntu currently sets CFQ as the default scheduler, so all installations/disks will use CFQ unless reconfigured.
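
To confirm which scheduler a given disk is actually using before running the test, the sysfs file /sys/block/<device>/queue/scheduler can be read; the active scheduler is the name shown in square brackets. A minimal sketch follows. The helper names are my own, and the device name passed in (for example "sda") is an assumption about the system under test.

```python
from pathlib import Path


def parse_active_scheduler(line: str) -> str:
    """Return the bracketed name from a line such as 'noop deadline [cfq]'."""
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Fallback: no bracketed entry found, so return the raw text.
    return line.strip()


def active_scheduler(device: str) -> str:
    """Read the active scheduler for a block device such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_active_scheduler(path.read_text())
```

On a disk where this returns "cfq", writing another scheduler name listed in the same file (as root) selects a different scheduler for comparison runs.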