Bug 14208 - XFS 5x slower than other filesystems when used as /var/tmp
Summary: XFS 5x slower than other filesystems when used as /var/tmp
Status: CLOSED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: XFS Guru
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-09-22 17:33 UTC by Török Edwin
Modified: 2012-06-13 17:06 UTC
2 users

See Also:
Kernel Version: 2.6.31
Tree: Mainline
Regression: No


Attachments

Description Török Edwin 2009-09-22 17:33:44 UTC
I use XFS for my /var because it is excellent for read speeds on RAID disks (and I store quite a lot of readonly data on /var/local).

However, XFS performance is very poor for temp files, as shown by this micro-benchmark, where /tmp is tmpfs, /home is reiserfs, and /var is xfs:

nodev on /tmp type tmpfs (rw)
/dev/mapper/vg--all-lv--home on /home type reiserfs (rw,noatime,notail)
/dev/mapper/vg--all-lv--var on /var type xfs (rw,noatime,logbsize=262144,logbufs=8,logdev=/dev/sdg5,inode64)

/dev/mapper/vg is a VG on /dev/md4, which is a RAID10 device, but that shouldn't matter: /home is a reiserfs filesystem on the same /dev/md4, and it is about the same speed as tmpfs.

time (for i in `seq 1 10000 `; do echo this is a sentence | wc > /dev/null; done)

real    0m21.755s
user    0m10.653s
sys     0m19.217s

time (for i in `seq 1 10000 `; do echo this is a sentence > /tmp/wc.$$; wc /tmp/wc.$$ > /dev/null; done; rm /tmp/wc.$$ )

real    0m11.717s
user    0m3.368s
sys     0m11.125s


time (for i in `seq 1 10000 `; do echo this is a sentence > /var/tmp/wc.$$; wc /var/tmp/wc.$$ > /dev/null; done; rm /var/tmp/wc.$$ )

real    1m4.887s
user    0m2.604s
sys     0m11.545s

time (for i in `seq 1 10000 `; do echo this is a sentence > /home/edwin/wc.$$; wc /home/edwin/wc.$$ > /dev/null; done; rm /home/edwin/wc.$$ )

real    0m12.664s
user    0m3.544s
sys     0m11.565s
Comment 1 Török Edwin 2009-09-22 17:34:37 UTC
Linux debian 2.6.31 #14 SMP PREEMPT Fri Sep 11 18:57:10 EEST 2009 x86_64 GNU/Linux
[    0.613536] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[    0.614606] SGI XFS Quota Management subsystem
[   12.806548] XFS mounting filesystem dm-3
[   12.908936] Ending clean XFS mount for filesystem: dm-3
[   12.937266] XFS mounting filesystem dm-0
[   13.045189] Ending clean XFS mount for filesystem: dm-0
[   13.069771] XFS mounting filesystem dm-1
[   13.214466] Ending clean XFS mount for filesystem: dm-1
Comment 2 Török Edwin 2009-09-22 17:38:35 UTC
Forgot to mention, thanks to ssen on #llvm for the testcase.
Comment 3 Eric Sandeen 2009-09-22 18:06:55 UTC
This is probably due at least partly to the flush-on-close-after-truncate semantics in xfs.

Rather than "echo", try a trivial C program or some other utility which will overwrite, rather than truncate & write, and see if you get a difference.

IOW:

$ time (for i in `seq 1 10000 `; do echo "this is a sentence" > /tmp/wc.$$; wc /tmp/wc.$$ > /dev/null; done; rm /tmp/wc.$$ )

real	2m52.457s
user	0m7.137s
sys	0m19.547s

vs:

$ echo "this is a sentence" > content

$ time (for i in `seq 1 10000 `; do dd if=content of=/tmp/wc.$$ conv=notrunc &>/dev/null; wc /tmp/wc.$$ > /dev/null; done; rm /tmp/wc.$$ )

real	0m56.906s
user	0m13.289s
sys	0m35.788s
Comment 4 Török Edwin 2009-09-22 18:28:11 UTC
Yes, that is better, but still not as good as reiserfs ;)

time (for i in `seq 1 10000 `; do dd if=content of=/var/tmp/wc.$$ conv=notrunc 2>/dev/null; wc /var/tmp/wc.$$ > /dev/null; done; rm /var/tmp/wc.$$ )

real    0m24.638s
user    0m7.076s
sys     0m23.297s

However, using a different filename each time does speed it up:

time (for i in `seq 1 10000 `; do echo "this is a sentence" > /var/tmp/wc.$i; wc /var/tmp/wc.$i > /dev/null; done; rm /var/tmp/wc.* )

real    0m14.925s
user    0m3.336s
sys     0m12.285s

What kind of flush are you doing on truncate? A metadata flush or a data flush?
Comment 5 Eric Sandeen 2009-09-22 18:41:48 UTC
"the same as reiserfs on microbenchmarks" is not necessarily the goal of xfs ;)

Anyway:

xfs_release()

                /*
                 * If we previously truncated this file and removed old data
                 * in the process, we want to initiate "early" writeout on
                 * the last close.  This is an attempt to combat the notorious
                 * NULL files problem which is particularly noticable from a
                 * truncate down, buffered (re-)write (delalloc), followed by
                 * a crash.  What we are effectively doing here is
                 * significantly reducing the time window where we'd otherwise
                 * be exposed to that problem.
                 */
                truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
                if (truncated && VN_DIRTY(VFS_I(ip)) && ip->i_delayed_blks > 0)
                        xfs_flush_pages(ip, 0, -1, XFS_B_ASYNC, FI_NONE);

This essentially calls filemap_fdatawrite, which writes dirty data pages.
Comment 6 Török Edwin 2009-09-22 19:08:43 UTC
Well, with the truncate case XFS spends a lot of time writing files, at only ~8-10 MB/s, while my disk can do 160-200 MB/s on the same partition with dd.

Is the problem that it is flushing lots of small files?
Maybe the I/O scheduler could be taught to recognize writes to the same location
and ignore the initial writes.

iostat -k 5 with dd if=/dev/zero of=/var/tmp/foo bs=1M count=1000
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             165.20         0.00     64010.40          0     320052
sdb             166.00         0.00     64112.80          0     320564
sdc             164.20         0.00     64197.60          0     320988
sdd             141.00         0.00     62476.00          0     312380
sde             164.60         0.00     64386.40          0     321932
sdf             162.20         0.00     64300.00          0     321500
sdg               0.00         0.00         0.00          0          0
md3               0.00         0.00         0.00          0          0
md4            5337.60         0.00    204596.80          0    1022984
dm-0              0.00         0.00         0.00          0          0
dm-1           5334.20         0.00    204583.20          0    1022916
dm-2              3.40         0.00        13.60          0         68
dm-3              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.06    0.00    7.65   37.87    0.00   53.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             149.80         2.40     58213.60         12     291068
sdb             149.60         0.00     58213.60          0     291068
sdc             145.60         0.00     58111.20          0     290556
sdd             145.60         0.80     66371.20          4     331856
sde             145.80         0.00     58177.60          0     290888
sdf             145.20         0.00     57996.00          0     289980
sdg               0.00         0.00         0.00          0          0
md3               0.40         1.60         0.00          8          0
md4            3681.40         1.60    159506.40          8     797532
dm-0              0.20         1.60         0.00          8          0
dm-1           3678.80         0.00    159496.80          0     797484
dm-2              2.40         0.00         9.60          0         48
dm-3              0.00         0.00         0.00          0          0

iostat -k 5 with the seq loop echoing something to same file:

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               1.60         0.00        15.20          0         76
sdb               1.80         0.80        15.20          4         76
sdc             135.40         0.00       549.60          0       2748
sdd               1.40         0.00        20.00          0        100
sde               1.40         0.00        20.00          0        100
sdf             135.60         0.00       550.40          0       2752
sdg             133.80         0.00      8563.20          0      42816
md3               0.00         0.00         0.00          0          0
md4             146.40         0.00       585.60          0       2928
dm-0              0.00         0.00         0.00          0          0
dm-1            133.80         0.00       535.20          0       2676
dm-2             12.60         0.00        50.40          0        252
dm-3              0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.66    0.00    4.71    0.34    0.00   90.28

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.20         0.00         0.10          0          0
sdb               0.20         0.00         0.10          0          0
sdc             131.80         0.00       535.20          0       2676
sdd               0.80         0.00         4.80          0         24
sde               1.00         0.80         4.80          4         24
sdf             132.00         0.00       536.00          0       2680
sdg             131.20         0.00      8396.80          0      41984
md3               0.00         0.00         0.00          0          0
md4             135.20         0.80       540.10          4       2700
dm-0              0.00         0.00         0.00          0          0
dm-1            132.20         0.80       528.10          4       2640
dm-2              3.00         0.00        12.00          0         60
dm-3              0.00         0.00         0.00          0          0
Comment 7 Török Edwin 2009-09-22 19:12:26 UTC
(In reply to comment #6)
> Well, with the truncate case XFS spends a lot of time writing files, at only
> ~8-10 MB/s,

most of which is writing to the log (the log is contiguous, so why isn't it writing faster?), while the actual data is written much more slowly (maybe because there isn't much actual data to write): ~500 KB/s.

> sda               0.20         0.00         0.10          0          0
> sdb               0.20         0.00         0.10          0          0
> sdc             131.80         0.00       535.20          0       2676
> sdd               0.80         0.00         4.80          0         24
> sde               1.00         0.80         4.80          4         24
> sdf             132.00         0.00       536.00          0       2680
> sdg             131.20         0.00      8396.80          0      41984

sdg is the drive holding the log.

This disk can do 80 MB/s sustained write speed (measured with dd):
sdg             156.80         0.00     80281.60          0     401408

I would have expected log I/O to run at about 40-60 MB/s, but it's only 8 MB/s.
