I use XFS for my /var because it is excellent for read speeds on RAID disks (and I store quite a lot of read-only data on /var/local). However, XFS performance is very bad for temp files, as shown by the micro-benchmark below, where /tmp is tmpfs, /home is reiserfs and /var is xfs:

nodev on /tmp type tmpfs (rw)
/dev/mapper/vg--all-lv--home on /home type reiserfs (rw,noatime,notail)
/dev/mapper/vg--all-lv--var on /var type xfs (rw,noatime,logbsize=262144,logbufs=8,logdev=/dev/sdg5,inode64)

The volume group (vg--all) is on /dev/md4, which is a RAID10 device, but that shouldn't matter: /home is a reiserfs filesystem on the same /dev/md4, and is about the same speed as tmpfs.

time (for i in `seq 1 10000`; do echo this is a sentence | wc > /dev/null; done)

real    0m21.755s
user    0m10.653s
sys     0m19.217s

time (for i in `seq 1 10000`; do echo this is a sentence > /tmp/wc.$$; wc /tmp/wc.$$ > /dev/null; done; rm /tmp/wc.$$)

real    0m11.717s
user    0m3.368s
sys     0m11.125s

time (for i in `seq 1 10000`; do echo this is a sentence > /var/tmp/wc.$$; wc /var/tmp/wc.$$ > /dev/null; done; rm /var/tmp/wc.$$)

real    1m4.887s
user    0m2.604s
sys     0m11.545s

time (for i in `seq 1 10000`; do echo this is a sentence > /home/edwin/wc.$$; wc /home/edwin/wc.$$ > /dev/null; done; rm /home/edwin/wc.$$)

real    0m12.664s
user    0m3.544s
sys     0m11.565s
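For rerunning the comparison against other directories, the four loops above can be folded into one helper; this is just a sketch, and the function name `bench` and the optional iteration count are my additions (the default count and test sentence match the report):

```shell
# bench DIR [N] - time N truncate-write-read cycles of a small file in DIR.
# Sketch of the micro-benchmark above; "bench" and the N parameter are
# hypothetical additions, not from the original report.
bench() {
    dir=$1
    n=${2:-10000}              # the report used 10000 iterations
    time (
        for i in $(seq 1 "$n"); do
            echo "this is a sentence" > "$dir/wc.$$"
            wc "$dir/wc.$$" > /dev/null
        done
        rm "$dir/wc.$$"
    )
}
```

Running e.g. `bench /tmp` vs `bench /var/tmp` reproduces the tmpfs/XFS comparison.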
Linux debian 2.6.31 #14 SMP PREEMPT Fri Sep 11 18:57:10 EEST 2009 x86_64 GNU/Linux

[    0.613536] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[    0.614606] SGI XFS Quota Management subsystem
[   12.806548] XFS mounting filesystem dm-3
[   12.908936] Ending clean XFS mount for filesystem: dm-3
[   12.937266] XFS mounting filesystem dm-0
[   13.045189] Ending clean XFS mount for filesystem: dm-0
[   13.069771] XFS mounting filesystem dm-1
[   13.214466] Ending clean XFS mount for filesystem: dm-1
Forgot to mention, thanks to ssen on #llvm for the testcase.
This is probably due at least partly to the flush-on-close-after-truncate semantics in XFS. Rather than "echo", try a trivial C program or some other utility which will overwrite, rather than truncate & write, and see if you get a difference. IOW:

$ time (for i in `seq 1 10000`; do echo "this is a sentence" > /tmp/wc.$$; wc /tmp/wc.$$ > /dev/null; done; rm /tmp/wc.$$)

real    2m52.457s
user    0m7.137s
sys     0m19.547s

vs:

$ echo "this is a sentence" > content
$ time (for i in `seq 1 10000`; do dd if=content of=/tmp/wc.$$ conv=notrunc &>/dev/null; wc /tmp/wc.$$ > /dev/null; done; rm /tmp/wc.$$)

real    0m56.906s
user    0m13.289s
sys     0m35.788s
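The distinction between the two loops can be seen in the file contents alone. This portable sketch (the file name `f` is an arbitrary choice of mine) shows that the `>` redirect truncates the file on open, while `dd conv=notrunc` overwrites in place and so never puts the inode into the truncated state that triggers the flush on close:

```shell
printf 'AAAAAAAAAA' > f                          # seed a 10-byte file
printf 'BB' > f                                  # ">" truncates on open
wc -c < f                                        # -> 2: old data gone
printf 'AAAAAAAAAA' > f                          # reseed with 10 bytes
printf 'BB' | dd of=f conv=notrunc 2>/dev/null   # overwrite in place
wc -c < f                                        # -> 10: tail of file kept
rm f
```

The overwrite leaves the file as "BBAAAAAAAA": no data was removed, so XFS's "previously truncated this file and removed old data" condition never holds.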
Yes, that is better, but still not as good as reiserfs ;)

time (for i in `seq 1 10000`; do dd if=content of=/var/tmp/wc.$$ conv=notrunc 2>/dev/null; wc /var/tmp/wc.$$ > /dev/null; done; rm /var/tmp/wc.$$)

real    0m24.638s
user    0m7.076s
sys     0m23.297s

However, using a different filename for each iteration does speed it up:

time (for i in `seq 1 10000`; do echo "this is a sentence" > /var/tmp/wc.$i; wc /var/tmp/wc.$i > /dev/null; done; rm /var/tmp/wc.*)

real    0m14.925s
user    0m3.336s
sys     0m12.285s

What kind of flush are you doing on truncate? A metadata flush or a data flush?
"The same as reiserfs on microbenchmarks" is not necessarily the goal of XFS ;) Anyway, from xfs_release():

	/*
	 * If we previously truncated this file and removed old data
	 * in the process, we want to initiate "early" writeout on
	 * the last close. This is an attempt to combat the notorious
	 * NULL files problem which is particularly noticable from a
	 * truncate down, buffered (re-)write (delalloc), followed by
	 * a crash. What we are effectively doing here is
	 * significantly reducing the time window where we'd otherwise
	 * be exposed to that problem.
	 */
	truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
	if (truncated && VN_DIRTY(VFS_I(ip)) && ip->i_delayed_blks > 0)
		xfs_flush_pages(ip, 0, -1, XFS_B_ASYNC, FI_NONE);

This essentially calls filemap_fdatawrite(), which writes the dirty data pages.
Well, with the truncate case XFS spends a lot of time writing files, at only ~8-10 MB/s, while my disk can do 160-200 MB/s on the same partition with a dd. Is the problem that it is flushing lots of small files? Maybe the I/O scheduler could be taught to recognize writes to the same location and ignore the initial writes.

iostat -k 5 with dd if=/dev/zero of=/var/tmp/foo bs=1M count=1000:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             165.20         0.00     64010.40          0     320052
sdb             166.00         0.00     64112.80          0     320564
sdc             164.20         0.00     64197.60          0     320988
sdd             141.00         0.00     62476.00          0     312380
sde             164.60         0.00     64386.40          0     321932
sdf             162.20         0.00     64300.00          0     321500
sdg               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md4            5337.60         0.00    204596.80          0    1022984
dm-0              0.00         0.00         0.00          0          0
dm-1           5334.20         0.00    204583.20          0    1022916
dm-2              3.40         0.00        13.60          0         68
dm-3              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.06    0.00    7.65   37.87    0.00   53.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             149.80         2.40     58213.60         12     291068
sdb             149.60         0.00     58213.60          0     291068
sdc             145.60         0.00     58111.20          0     290556
sdd             145.60         0.80     66371.20          4     331856
sde             145.80         0.00     58177.60          0     290888
sdf             145.20         0.00     57996.00          0     289980
sdg               0.00         0.00         0.00          0          0
md3               0.40         1.60         0.00          8          0
md4            3681.40         1.60    159506.40          8     797532
dm-0              0.20         1.60         0.00          8          0
dm-1           3678.80         0.00    159496.80          0     797484
dm-2              2.40         0.00         9.60          0         48
dm-3              0.00         0.00         0.00          0          0

iostat -k 5 with the seq loop echoing something to the same file:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.60         0.00        15.20          0         76
sdb               1.80         0.80        15.20          4         76
sdc             135.40         0.00       549.60          0       2748
sdd               1.40         0.00        20.00          0        100
sde               1.40         0.00        20.00          0        100
sdf             135.60         0.00       550.40          0       2752
sdg             133.80         0.00      8563.20          0      42816
md3               0.00         0.00         0.00          0          0
md4             146.40         0.00       585.60          0       2928
dm-0              0.00         0.00         0.00          0          0
dm-1            133.80         0.00       535.20          0       2676
dm-2             12.60         0.00        50.40          0        252
dm-3              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.66    0.00    4.71    0.34    0.00   90.28

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.20         0.00         0.10          0          0
sdb               0.20         0.00         0.10          0          0
sdc             131.80         0.00       535.20          0       2676
sdd               0.80         0.00         4.80          0         24
sde               1.00         0.80         4.80          4         24
sdf             132.00         0.00       536.00          0       2680
sdg             131.20         0.00      8396.80          0      41984
md3               0.00         0.00         0.00          0          0
md4             135.20         0.80       540.10          4       2700
dm-0              0.00         0.00         0.00          0          0
dm-1            132.20         0.80       528.10          4       2640
dm-2              3.00         0.00        12.00          0         60
dm-3              0.00         0.00         0.00          0          0
(In reply to comment #6)
> Well, with the truncate case XFS spends a lot of time writing files, at only
> ~8-10 MB/s,

most of which is writing to the log (the log is contiguous, so why isn't it writing faster?), with much less going to actual data, perhaps because there isn't much actual data to write: ~500 KB/s.

> sda               0.20         0.00         0.10          0          0
> sdb               0.20         0.00         0.10          0          0
> sdc             131.80         0.00       535.20          0       2676
> sdd               0.80         0.00         4.80          0         24
> sde               1.00         0.80         4.80          4         24
> sdf             132.00         0.00       536.00          0       2680
> sdg             131.20         0.00      8396.80          0      41984

sdg is the drive holding the log. This disk can do 80 MB/s sustained write speed (measured with dd):

sdg             156.80         0.00     80281.60          0     401408

I would have expected log I/O to be about 40-60 MB/s, but it's only ~8 MB/s.
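A quick sanity check on the iostat figures quoted above suggests why: the values are copied from the sample, and the division is my only addition. The log disk is doing only ~131 writes per second of ~64 kB each, which would point at per-write latency (one log write per file close), not disk bandwidth, as the limit:

```shell
# Values taken from the sdg line of the iostat sample above.
tps=131      # log-device write requests per second
kbps=8396    # log-device kB written per second
echo "average log write: $((kbps / tps)) kB"   # -> average log write: 64 kB
```

At ~131 synchronous log writes per second the observed ~8 MB/s follows directly, however fast the disk's sequential throughput is.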