Created attachment 24931 [details] kernel-log

Performance of sata_nv (measured by hdparm -t) degrades from about 160 MB/s on 2.6.30 to at most 80 MB/s on 2.6.32. The system is a Gigabyte GA-M55S-S3 (nForce 550) with an OCZ Vertex, running 64-bit Linux on an AMD Athlon64 X2 5000+ with 4 GB RAM. Both a self-compiled stock 2.6.32.7 kernel and the Debian 2.6.32-trunk-amd64 kernel give a maximum of 85 MB/s measured with hdparm -t, while the Debian 2.6.30-2-amd64 kernel gives 160 MB/s. The SATA device seems to be set up properly:

ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-8: OCZ-VERTEX, 1.5, max UDMA/133
ata1.00: configured for UDMA/133

Kernel log and lspci output are attached.
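For reference, the numbers above come from the usual buffered-read timing, roughly like this (sketch; /dev/sda is the Vertex here, run a few times and averaged):

hdparm -t /dev/sda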
Does "echo deadline > /sys/block/sda/queue/scheduler" change anything?
Yes, but for the worse. After "echo deadline > /sys/block/sda/queue/scheduler" performance decreases to about 60 MB/s.
How about "echo 1 > /sys/block/sda/device/queue_depth"?
That's better: it gives about 110 MB/s, but that's still well below the 160 MB/s I got with a 2.6.30 kernel.
How about "dd if=/dev/sda of=/dev/null iflag=direct bs=4M count=1024"?
"dd if=/dev/sda of=/dev/null iflag=direct bs=4M count=1024" gives about 200 MB/s on both the 2.6.32 and 2.6.30 kernels. Without "iflag=direct" it's about 80 MB/s on 2.6.32 (I haven't tested that on 2.6.30).
Yeah, well, that pretty much rules out the block layer and the driver. It's probably readahead or something in the VM layer. Andrew, can you please take a look? Thanks.
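Since direct I/O bypasses the page cache entirely, comparing the readahead settings between the fast and slow kernels might be a quick first check (sketch):

blockdev --getra /dev/sda                  # readahead window in 512-byte sectors
cat /sys/block/sda/queue/read_ahead_kb     # the same setting, in KB
blockdev --setra 512 /dev/sda              # temporarily raise it to 256 KB and retest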
CC Mister Readahead. Let's check whether it affects regular files as well or just blockdevs. On a fresh (if possible) filesystem:

dd if=/dev/zero of=foo bs=4M count=1024
echo 3 > /proc/sys/vm/drop_caches
time dd if=foo of=/dev/null bs=4M

Thanks.
I created a new partition (sda4) and put an ext2 filesystem on it. (I'm not sure about the effect of the mapping layer inside the SSD.)

# echo 3 > /proc/sys/vm/drop_caches
# time dd if=foo of=/dev/null bs=4M
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB) copied, 22.7644 s, 189 MB/s

real    0m22.785s
user    0m0.004s
sys     0m7.184s

So this gives about 190 MB/s (fine). Then I tested the block device again:

# echo 3 > /proc/sys/vm/drop_caches
# time dd if=/dev/sda4 of=/dev/null bs=4M count=1024
1024+0 records in
1024+0 records out
4294967296 bytes (4.3 GB) copied, 53.3936 s, 80.4 MB/s

real    0m53.402s
user    0m0.012s
sys     0m18.189s

Tests were made on Debian's 2.6.32-trunk-amd64 kernel.
I tested three kernels and also compared block device reads with normal file reads, and found no big differences. However, I do remember Vladislav or Ronald encountered a similar slow-block-dev problem. I have CCed them in another email:
Vladislav Bolkhovitin <vst@vlnb.net>
Ronald Moesbergen <intercommit@gmail.com>

2.6.33-rc7
# time dd if=/dev/sda8 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.0285 s, 32.2 MB/s
dd if=/dev/sda8 of=/dev/null bs=4M count=100  0.00s user 7.02s system 53% cpu 13.038 total

2.6.32
# time dd if=/dev/sda8 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 13.2839 s, 31.6 MB/s
dd if=/dev/sda8 of=/dev/null bs=4M count=100  0.00s user 5.29s system 39% cpu 13.316 total

2.6.30-1-amd64
$ time dd if=/dev/sda8 of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 12.4972 s, 33.6 MB/s
dd if=/dev/sda8 of=/dev/null bs=4M count=100  0.00s user 2.44s system 19% cpu 12.501 total

$ time dd if=/mm/iso/ghost.gho of=/dev/null bs=4M count=100
100+0 records in
100+0 records out
419430400 bytes (419 MB) copied, 11.4964 s, 36.5 MB/s
dd if=/mm/iso/ghost.gho of=/dev/null bs=4M count=100  0.00s user 0.66s system 5% cpu 11.501 total

Thanks,
Fengguang
I tested with the "readahead: add blk_run_backing_dev" patch (65a80b4c61f5b5f6eb0f5669c8fb120893bfb388) on 2.6.32 as suggested by Wu. It does not help: I got about 200 MB/s from the file and about 90 MB/s from /dev/sda.

I also made some tests comparing with a "conventional" disk:

/dev/sda (SSD):  80 MB/s
file (SSD):     190 MB/s
/dev/sdd (Disk): 90 MB/s
file (Disk):     90 MB/s

Interestingly, reading from the block device is slower on the SSD than on the disk.
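For anyone reproducing, the pattern of these measurements boils down to something like this (sketch; the device and file paths are placeholders for your own setup):

for src in /dev/sdX2 /mnt/ssd/testfile; do
    echo 3 > /proc/sys/vm/drop_caches     # drop the page cache before each run
    dd if="$src" of=/dev/null bs=4M count=250
done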
I have an OCZ Vertex on an AMD Athlon64 X2 5000+, too, with 8 GB RAM. Running in 64-bit mode with a vanilla 2.6.33 kernel gives me similar results for blockdev vs. file access:

/dev/sdb2 (SSD):  70 MB/s
file (SSD):      150 MB/s
/dev/sda2 (disk): 85 MB/s
file (disk):      80 MB/s

All these tests were made with bs=4M, count=250 and with dropped caches. So for me, too, the block device on the SSD is slower than on the disk.
I'm sorry, the values above actually came from a 2.6.33-rc7 kernel. Today I tested the 2.6.33 release, which gave me somewhat different results: instead of 70 MB/s for the blockdev I'm now getting 128 MB/s (and 150 MB/s for the file). Interestingly, I now also get 150 MB/s if the partition is mounted. A shell session (edited for readability) looks like this:

# dd if=/dev/sdb2 of=/dev/null bs=4M count=250        -> 128 MB/s
# mount /dev/sdb2 /mnt/
# dd if=/mnt/test_file of=/dev/null bs=4M count=250   -> 150 MB/s
# dd if=/dev/sdb2 of=/dev/null bs=4M count=250        -> 150 MB/s
# umount /dev/sdb2
# dd if=/dev/sdb2 of=/dev/null bs=4M count=250        -> 128 MB/s

I added "echo 3 > /proc/sys/vm/drop_caches" in between, just to be sure, and thereby discovered that caching on the blockdev only works for a mounted partition. But I guess this is intended?
What's the output of "stat /dev/sdb2" before/after the mount? I happened to notice yesterday that block_read_full_page() submits IO in buffer heads of the bdev blocksize. The blocksize may be enlarged by the fs with sb_set_blocksize() at mount time.
In mounted and unmounted state "stat" gives me the same output:

kuhli ~ # stat /dev/sdb2
  File: "/dev/sdb2"
  Size: 0    Blocks: 0    IO Block: 4096    block special file
Device: eh/14d    Inode: 4109    Links: 1    Device type: 8,12
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Access: 2010-02-27 23:36:46.704302705 +0100
Modify: 2010-02-27 23:36:30.174255460 +0100
Change: 2010-02-27 23:36:30.174255460 +0100
I can confirm the effect of mounted vs. unmounted block devices (2.6.33 Debian kernel):

/dev/sda4 mounted:   220 MB/s
/dev/sda4 unmounted: 142 MB/s
file on /dev/sda4:   197 MB/s

The stat output does not change on mounting/unmounting.
Sorry, it seems that `stat` won't reflect the blocksize (and size) used by the blockdev. The two vars are initialized together in bd_set_size() at blockdev (first) open time. We can do "echo 4096 > /sys/block/sda/queue/logical_block_size" to possibly enlarge the blocksize computed in that function. Another difference may be that the unmounted case will flush data at blockdev file close() time, which may add some delay.
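A more direct way than stat to observe the soft blocksize the blockdev is using might be the BLKBSZGET ioctl via blockdev(8), roughly (sketch; device and mountpoint are examples):

blockdev --getbsz /dev/sdb2    # soft blocksize used for blockdev I/O
mount /dev/sdb2 /mnt
blockdev --getbsz /dev/sdb2    # may jump to 4096 once the fs calls sb_set_blocksize()
umount /mnt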
bd_set_size() will pick some sub-page blocksize if the blockdev size is not a multiple of 4096. So what are your /dev/sdb2 and /dev/sda4 sizes? "sfdisk -l" output?
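To check whether the partition size is a multiple of 4096 without reading through the sfdisk output, something like this should do (sketch):

blockdev --getsize64 /dev/sdb2                        # partition size in bytes
echo $(( $(blockdev --getsize64 /dev/sdb2) % 4096 ))  # 0 means the size is 4K-aligned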
Created attachment 25305 [details] To ensure a larger blocksize and reduce processing overheads.
Hello, I tested your patch today. (I didn't test "echo 4096 > /sys/block/sda/queue/logical_block_size", as that didn't work; the file is read-only.) With the patch and the disk unmounted I get 145 MB/s (128 MB/s before); with the disk mounted I keep getting 150 MB/s. So the patch increased the blockdev speed, but with the disk mounted it's still a bit faster.
I just tested 2.6.37-rc5+ with my rotational disk and am getting slightly faster results for the unmounted case (probably within the noise). Can you verify that on current kernels?
I just realized that my test on a conventional disk is useless, as the comments above seem to suggest the problem only affects 'fast enough' disks (SSDs). But nonetheless, can someone verify that the bug is still present on current kernels?
I just tested with a 2.6.37 kernel and I still get the same behaviour as noted in comment #14.
I can probably see this bug with 2.6.37 and RAID10 of slow disks:

dd if=/dev/md2 of=/dev/null iflag=direct bs=4M count=128
128+0 records in
128+0 records out
536870912 bytes (537 MB) copied, 1.68345 s, 319 MB/s

dd if=/dev/md2 of=/dev/null bs=4M count=128
128+0 records in
128+0 records out
536870912 bytes (537 MB) copied, 2.94772 s, 182 MB/s
I tried to boot the PC without the X server:

dd if=/dev/md2 of=/dev/null bs=4M count=128
-> 354 MB/s

After the X server is started: 182-195 MB/s

My PC has this SATA controller: Intel Corporation 82801IB (ICH9) 4 port SATA AHCI Controller (rev 02)
With the patch from comment #20:

dd if=/dev/md2 of=/dev/null bs=4M count=128
-> 354 MB/s

After the X server is started: 170-185 MB/s

Also, I can't mount an encrypted partition if I boot with the patched kernel:
pam_mount(mount.c:68): crypt_activate_by_passphrase: Operation not permitted
It seems that performance decreases after logging in to KDE, not after the X server starts. If I stop KDE and the X server, performance isn't recovered.
Maybe KDE enables ALPM?
cat /sys/class/scsi_host/host[0-4]/link_power_management_policy
max_performance
max_performance
max_performance
max_performance
max_performance

The performance drop doesn't happen with IceWM.
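For comparison, toggling ALPM by hand looks roughly like this (sketch; host numbers depend on the controller):

for h in /sys/class/scsi_host/host[0-4]; do
    echo min_power > $h/link_power_management_policy
done
# and back to max_performance afterwards the same way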
I found that one of the following daemons, which are started by KDE, is causing it:

/usr/libexec/udisks-daemon
/usr/libexec/upowerd
/usr/libexec/polkitd

After disabling them (by renaming) and rebooting, the performance drop doesn't happen.
Kay, the u* daemons are slowing down IO accesses. Any idea what's going on? wbrana, can you please use udevadm and/or blktrace to see what they're doing? Which commands are those daemons issuing, and how often? Thanks.
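Something along these lines should show what is hitting the devices (sketch; /dev/md2 as in your tests):

udevadm monitor --kernel --udev              # watch uevents while the slowdown appears
blktrace -d /dev/md2 -o - | blkparse -i -    # trace the actual requests during a dd run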
On Wed, Mar 2, 2011 at 09:26, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15241
> --- Comment #32 from Tejun Heo <tj@kernel.org> 2011-03-02 08:25:58 ---
> Kay, u* daemons are slowing down IO accesses. Any idea on what's going on?

The only known thing that udisks might be doing is polling with open() every 2 seconds for devices which mark their media as removable. Does simply killing udisks at runtime change the behavior? What's in the process list for udisks? We change the process name to show which devices are polled.
Killing udisks doesn't change the behavior. There is a process:
udisks-daemon: not polling any devices

Complete output from udevadm monitor:

monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[1299069807.367113] add    /devices/virtual/bdi/253:0 (bdi)
KERNEL[1299069807.367228] add    /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.367349] add    /devices/virtual/bdi/253:0 (bdi)
UDEV  [1299069807.367396] add    /devices/virtual/block/dm-0 (block)
KERNEL[1299069807.367451] change /devices/virtual/block/dm-0 (block)
KERNEL[1299069807.367480] change /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.367608] change /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.369444] change /devices/virtual/block/dm-0 (block)
KERNEL[1299069807.372051] remove /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.373665] remove /devices/virtual/block/dm-0 (block)
KERNEL[1299069807.377309] remove /devices/virtual/bdi/253:0 (bdi)
KERNEL[1299069807.377411] remove /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.377417] remove /devices/virtual/bdi/253:0 (bdi)
UDEV  [1299069807.377520] remove /devices/virtual/block/dm-0 (block)
KERNEL[1299069807.395488] change /devices/virtual/block/md23 (block)
KERNEL[1299069807.395495] add    /devices/virtual/bdi/253:0 (bdi)
KERNEL[1299069807.395702] add    /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.395709] add    /devices/virtual/bdi/253:0 (bdi)
KERNEL[1299069807.395891] change /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.395929] add    /devices/virtual/block/dm-0 (block)
UDEV  [1299069807.507581] change /devices/virtual/block/md23 (block)
UDEV  [1299069807.712631] change /devices/virtual/block/dm-0 (block)
The performance drop also happens when udisks-daemon is disabled but upowerd and polkitd are enabled.
On Wed, Mar 2, 2011 at 15:07, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15241
> perf drop happens also when udisks-daemon is disabled and upowerd and polkitd
> are enabled

Might be stuff called by pm-utils. Things in:
/usr/lib/pm-utils/power.d/sata_alpm
or some other files close to that one. It looks like a mess. :)
On Wed, Mar 2, 2011 at 15:38, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15241

From #udev:
<mbiebl> you can ask him to manually run pm-powersave true|false
<mbiebl> to narrow it down to pm-utils
Strange results.

pm-powersave true:
dd if=/dev/md2 of=/dev/null bs=4M count=128
128+0 records in
128+0 records out
536870912 bytes (537 MB) copied, 1.51789 s, 354 MB/s
cat /sys/class/scsi_host/host[0-4]/link_power_management_policy
min_power
min_power
min_power
min_power
min_power

pm-powersave false:
dd if=/dev/md2 of=/dev/null bs=4M count=128
128+0 records in
128+0 records out
536870912 bytes (537 MB) copied, 2.87367 s, 187 MB/s
cat /sys/class/scsi_host/host[0-4]/link_power_management_policy
max_performance
max_performance
max_performance
max_performance
max_performance
I tried to change /sys/class/scsi_host/host[0-4]/link_power_management_policy from the command line. Performance isn't changed. The performance of /dev/sd[a-d] isn't changed by pm-powersave either.