Bug 12033 - Switching SATA from home grown queuing to block level causes 40% performance drop
Summary: Switching SATA from home grown queuing to block level causes 40% performance drop
Status: CLOSED INVALID
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Jeff Garzik
URL:
Keywords:
Depends on:
Blocks: 11808
Reported: 2008-11-14 20:53 UTC by Petr Vandrovec
Modified: 2008-11-15 21:28 UTC
CC: 3 users

See Also:
Kernel Version: 2.6.28-rc4 + patches
Subsystem:
Regression: Yes
Bisected commit-id:



Description Petr Vandrovec 2008-11-14 20:53:19 UTC
Latest working kernel version: 2.6.28-rc1
Earliest failing kernel version: 2.6.28-rc2
Distribution: Debian
Hardware Environment: four 1TB disks behind a Sil3726 PMP connected to a Sil3132
Software Environment: 64bit kernel + 32bit userspace, random debugging enabled in kernel
Problem Description:

It took some time until the conversion of SATA from home grown queuing to block level queuing got stable (bug 11898 was just fixed), but unfortunately, although things are now stable, there is quite a big performance drop - writes to disks behind the PMP are now only 50-70% of their speed before Jens's conversion.

When I revert Jens's fixes, I get writes of ~48MBps to each of the 4 disks - 192MBps total bandwidth (after the sata_sil24 change to PCIe during 2.6.28-rc it was actually 52MBps/disk). All 4 disks are written to concurrently, and the test completes on all 4 disks almost simultaneously. With the default values, each disk gets a completely different bandwidth, and when I watch the LEDs on the disks I see that most of the time I/O goes to only one of them, with the active disk switching every ~2 seconds. The only way to get back at least part of the bandwidth is to allow only 8 queued commands on each disk - then they mostly fit into the 31 commands on the channel, and the starvation code is almost never triggered.

The test just starts 4 concurrent 'dd' runs, one to an ext3 filesystem on each of the 4 disks, writing 4GB of data to each one (see the sketch below).
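
Since x.sh itself is not attached, here is a minimal sketch of what it might look like - the mount points are hypothetical, and bs/count are derived from the dd output below (4000 x 1048576 bytes = 4194304000 bytes):

#!/bin/sh
# start one backgrounded dd per disk; /mnt/disk1..4 are assumed mount points
for n in 1 2 3 4; do
    dd if=/dev/zero of=/mnt/disk$n/testfile bs=1048576 count=4000 &
done
wait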

Default setting:

gwy:~# ./x.sh
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 110.417 s, 38.0 MB/s
gwy:~# 4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 155.827 s, 26.9 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 206.971 s, 20.3 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 206.301 s, 20.3 MB/s

Only 8 requests per drive; there are 4 drives sharing one tag map with 31
entries:

gwy:~# for a in /sys/block/*/queue/nr_requests; do echo 8 > $a; done
gwy:~# ./x.sh
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 103.588 s, 40.5 MB/s
gwy:~# 4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 110.86 s, 37.8 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 105.978 s, 39.6 MB/s
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 107.94 s, 38.9 MB/s
Comment 1 Andrew Morton 2008-11-14 21:06:20 UTC
What additional patches were applied?

Is this a plain old post-2.6.27 regression?
Comment 2 Petr Vandrovec 2008-11-14 21:16:04 UTC
The patch from bug 11898 comment 36, to get rid of the crashes/hangs while running the dd test above. I do not know whether James has already submitted it to you, but it is not present in Linus's kernel yet. To be absolutely sure this is caused by Jens's changes, I'm now building the current Linus tree with these 4 changes reverted (a revert sketch follows the list):

43a49cbdf31e812c0d8f553d433b09b421f5d52c
3070f69b66b7ab2f02d8a2500edae07039c38508
e013e13bf605b9e6b702adffbe2853cfc60e7806
2fca5ccf97d2c28bcfce44f5b07d85e74e3cd18e
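
A sketch of the revert, assuming the four commits still revert cleanly; the reverse-chronological ordering below is my guess:

git revert 2fca5ccf97d2c28bcfce44f5b07d85e74e3cd18e
git revert e013e13bf605b9e6b702adffbe2853cfc60e7806
git revert 3070f69b66b7ab2f02d8a2500edae07039c38508
git revert 43a49cbdf31e812c0d8f553d433b09b421f5d52c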

It looks like the difference between nr_requests 8 and 128 disappears when hddtemp (which sends a non-NCQ SMART command every now and then) is killed.
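
To rule hddtemp out cleanly, stopping the daemon before the test run should be enough, e.g. (the Debian init script path is an assumption):

/etc/init.d/hddtemp stop
./x.sh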
Comment 3 Petr Vandrovec 2008-11-14 21:27:41 UTC
Um, it looks like Tejun has already reverted Jens's changes, and I did not notice after syncing. In that case I'll have to figure out where else part of my bandwidth went...
Comment 4 Petr Vandrovec 2008-11-14 22:51:40 UTC
Sorry - after rerunning the tests on the current git with Tejun's revert, I'm back at ~50MBps/drive.
Comment 5 Tejun Heo 2008-11-15 21:28:28 UTC
Hmm... the performance drop is unexpected. Strange. Jens, maybe this is caused by the delay in freeing tags?
