Bug 43013

Summary: creating files >8GB on xfs "/" is very slow
Product: File System
Component: XFS
Reporter: ITec (itec)
Assignee: XFS Guru (xfs-masters)
Status: RESOLVED OBSOLETE
Severity: normal
Priority: P1
CC: itec, sandeen, szg00000, tsenapathy
Hardware: All
OS: Linux
Kernel Version: 3.2.x and 3.3.x
Subsystem:
Regression: No
Bisected commit-id:
Attachments: ver_linux and other output for 3.2.0-rc1
             ver_linux and other output for 3.1.10

Description ITec 2012-03-30 12:36:20 UTC
Created attachment 72755 [details]
ver_linux and other output for 3.2.0-rc1

How to reproduce the issue:
  dd if=/dev/zero of=/tmp/zero1.tmp bs=1024 count=13000000

  Where "/tmp" is on "/"
  and "/" is an xfs

  It does not happen if I create a file on another xfs then "/".
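
  A minimal sketch for making the slowdown visible per chunk: instead of one
  big dd, the same amount is appended in 1GB pieces and each piece is timed
  (the path and the 1GB chunk size are illustrative, not from the original
  reproducer):

    rm -f /tmp/zero1.tmp
    for i in $(seq 1 13); do
        # append one 1GB chunk and print only the elapsed time for it
        /usr/bin/time -f "chunk $i: %es" dd if=/dev/zero of=/tmp/zero1.tmp \
            bs=1024 count=1000000 oflag=append conv=notrunc 2>&1 | tail -n 1
    done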

What happens:

  The first 8GB are written at about 100MB/s and the rest at about 1MB/s.
  After the first 8GB, kworker threads start working heavily.

  top - 18:20:57 up 2:44, 2 users, load average: 1.33, 0.50, 0.42
  Tasks: 110 total, 3 running, 107 sleeping, 0 stopped, 0 zombie
  Cpu(s): 0.0%us, 49.8%sy, 0.0%ni, 50.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
  Mem: 1791092k total, 1723636k used, 67456k free, 8k buffers
  Swap: 0k total, 0k used, 0k free, 1564360k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
   4160 root 20 0 0 0 0 R 47 0.0 0:06.81 kworker/0:1
   4217 root 20 0 0 0 0 R 47 0.0 0:06.12 kworker/1:0
   4216 root 20 0 11500 632 516 D 5 0.0 0:38.42 dd
   3281 root 20 0 17220 1240 904 R 0 0.1 0:11.11 top
   4079 root 20 0 0 0 0 S 0 0.0 0:00.43 kworker/1:2
   4093 root 20 0 0 0 0 S 0 0.0 0:00.21 kworker/0:0
      1 root 20 0 24172 2016 1100 S 0 0.1 0:00.46 init

I tested different kernel versions.

  first affected: kernel-3.2.0-rc1
  last affected:  current
  last unaffected: kernel-3.1.10
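
Since the last unaffected release (3.1.x) and the first affected one (3.2.0-rc1) are known, the offending change could in principle be narrowed down with git bisect against the mainline tree; a rough sketch, assuming the dd reproducer above is used as the good/bad test at each step:

  git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  cd linux
  git bisect start
  git bisect bad v3.2-rc1    # first release known to show the slowdown
  git bisect good v3.1       # last release known to be fine
  # build and boot each suggested revision, run the dd reproducer,
  # then mark the result:
  #   git bisect good    (full speed throughout)
  #   git bisect bad     (throughput collapses after ~8GB)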

Further information: see attachments
Comment 1 ITec 2012-03-30 12:37:26 UTC
Created attachment 72756 [details]
ver_linux and other output for 3.1.10
Comment 2 ITec 2012-03-30 12:48:59 UTC
This bug corresponds to the downstream Ubuntu bug #887793:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/887793
Comment 3 Eric Sandeen 2012-03-30 14:31:03 UTC
Hm, for starters, many other things are happening on / which may be affecting the test.

Secondly, things such as how much free space is on that fs, and how fragmented it is, could also be a factor.

That's just a very rough first guess, but generally perf testing on the root filesystem is not too useful.

It might be interesting to see how fragmented the file is when you complete the test, using xfs_bmap.

Stating how much memory is in the machine would be useful too; the first part of your dd is likely doing buffered IO, and after that things start flushing.
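
A sketch of commands that would collect the information asked for here, run right after the dd finishes (the file path and device are the ones from this report; xfs_bmap and xfs_db ship with xfsprogs):

  xfs_bmap -v /tmp/zero1.tmp                 # extent layout of the test file
  xfs_db -r -c frag /dev/sda3                # overall filesystem fragmentation
  free -m                                    # RAM available for page cache
  grep -E 'Dirty|Writeback' /proc/meminfo    # outstanding dirty/writeback pages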
Comment 4 ITec 2012-03-30 15:14:34 UTC
Yes, I know that there are many things that could affect the test.
But I am not talking about a loss of "some" performance; I am talking about a 98% loss of performance.

But what could cause this?
And why does it happen only on 3.2+?

About fragmentation:

  # xfs_fsr -v /dev/sda3
  / start inode=0
  ino=20294080
  ino=20294080 already fully defragmented.

  # df -m /tmp
  Filesystem     1M-blocks  Used Available Use% Mounted on
  /dev/sda3          15248  2549     12700  17% /

  # dd if=/dev/zero of=/tmp/zero1.tmp bs=1024 count=9000000
  9000000+0 records in
  9000000+0 records out
  9216000000 bytes (9,2 GB) copied, 396,978 s, 23,2 MB/s

  The first 8GB go at about 100MB/s, the rest at about 1.9MB/s.

  # xfs_bmap /tmp/zero1.tmp
  /tmp/zero1.tmp:
        0: [0..4194303]: 26708152..30902455
        1: [4194304..8388607]: 11237112..15431415
        2: [8388608..12076735]: 4123968..7812095
        3: [12076736..15671151]: 19841872..23436287
        4: [15671152..16201615]: 18355464..18885927
        5: [16201616..16547543]: 30902456..31248383
        6: [16547544..16797911]: 3523816..3774183
        7: [16797912..16816479]: 3783176..3801743
        8: [16816480..16835343]: 3804000..3822863
        9: [16835344..16950519]: 15431416..15546591
        10: [16950520..16950551]: 15546600..15546631
        11: [16950552..17065255]: 10757704..10872407
        12: [17065256..17065295]: 10873000..10873039
        13: [17065296..17083887]: 10739112..10757703
        14: [17083888..17102567]: 10720432..10739111
        15: [17102568..17219047]: 10603952..10720431
        16: [17219048..17219079]: 10603920..10603951
        17: [17219080..17241775]: 10581224..10603919
        18: [17241776..17262391]: 10560608..10581223
        19: [17262392..17283671]: 10539328..10560607
        20: [17283672..17302879]: 10018552..10037759
        21: [17302880..17302887]: 10037976..10037983
        22: [17302888..17323039]: 9998400..10018551
        23: [17323040..17342535]: 9966824..9986319
        24: [17342536..17573871]: 18960144..19191479
        25: [17573872..17655255]: 19264640..19346023
        26: [17655256..17688759]: 18919824..18953327
        27: [17688760..17706695]: 18901888..18919823
        28: [17706696..17725951]: 19191480..19210735
        29: [17725952..17745271]: 18142008..18161327
        30: [17745272..17764551]: 18122728..18142007
        31: [17764552..17783799]: 18103480..18122727
        32: [17783800..17861359]: 15546632..15624191
        33: [17861360..17899223]: 25742880..25780743
        34: [17899224..17969279]: 26181368..26251423
        35: [17969280..17999999]: 26399192..26429911

# df -m /tmp
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda3          15248 11338      3911  75% /
Comment 5 ITec 2012-03-30 15:18:43 UTC
Oops! Sorry, accidentally submitted too early!
Comment 6 Eric Sandeen 2012-03-30 15:22:40 UTC
Just so we can see it, do you have numbers for the same test under 3.1.10?
Comment 7 ITec 2012-03-30 15:34:38 UTC
(Second attempt to post this; the first part repeats comment 4 verbatim, so continuing from where it was cut off:)

  # rm /tmp/zero1.tmp

  dd if=/dev/zero of=/tmp/zero1.tmp bs=1024 count=1000000; dd if=/dev/zero of=/tmp/zero2.tmp bs=1024 count=8000000; echo syncing; time sync
  1000000+0 records in
  1000000+0 records out
  1024000000 bytes (1,0 GB) copied, 7,62766 s, 134 MB/s
  8000000+0 records in
  8000000+0 records out
  8192000000 bytes (8,2 GB) copied, 77,9598 s, 105 MB/s
  syncing

  real    0m2.602s
  user    0m0.000s
  sys     0m0.000s

  # xfs_bmap /tmp/zero*
  /tmp/zero1.tmp:
        0: [0..1048575]: 26708152..27756727
        1: [1048576..1999999]: 4123968..5075391
  /tmp/zero2.tmp:
        0: [0..2097151]: 27756728..29853879
        1: [2097152..4194303]: 5172544..7269695
        2: [4194304..8388607]: 11237112..15431415
        3: [8388608..11983023]: 19841872..23436287
        4: [11983024..13377527]: 29853880..31248383
        5: [13377528..13919927]: 7269696..7812095
        6: [13919928..14450391]: 18355464..18885927
        7: [14450392..14790055]: 10532744..10872407
        8: [14790056..15050135]: 18960144..19220223
        9: [15050136..15308935]: 3523816..3782615
        10: [15308936..15501711]: 15431416..15624191
        11: [15501712..15598863]: 5075392..5172543
        12: [15598864..15680247]: 19264640..19346023
        13: [15680248..15750303]: 26181368..26251423
        14: [15750304..15819727]: 26399192..26468615
        15: [15819728..15888935]: 25742880..25812087
        16: [15888936..15955831]: 18886432..18953327
        17: [15955832..15999999]: 18117160..18161327

So creating one 9GB file takes about 400s, but creating a 1GB file plus a separate 8GB file takes only about 90s in total.
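
To see what the kworker threads are actually burning CPU on during the slow phase, something like this could be captured while the dd is running (the PID is the busy kworker from the earlier top output; perf must be installed):

  cat /proc/4160/stack            # kernel stack of the busy kworker
  perf record -a -g -- sleep 10   # sample kernel call chains for 10 seconds
  perf report --sort comm,dso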
Comment 8 ITec 2012-03-30 15:36:03 UTC
The same test under 3.1.10:
Following soon!
Comment 9 ITec 2012-03-30 15:52:17 UTC
The same test under 3.1.10:

  # xfs_fsr -v /dev/sda3
  / start inode=0
  ino=20294080
  ino=20294080 already fully defragmented.

  # df -m /tmp
  Filesystem     1M-blocks  Used Available Use% Mounted on
  /dev/sda3          15248  2553     12696  17% /

  # dd if=/dev/zero of=/tmp/zero1.tmp bs=1024 count=9000000; echo syncing; time sync
  9000000+0 records in
  9000000+0 records out
  9216000000 bytes (9,2 GB) copied, 91,3123 s, 101 MB/s
  syncing

  real    0m3.029s
  user    0m0.004s
  sys     0m0.000s

  # xfs_bmap /tmp/zero*
  /tmp/zero1.tmp:
        0: [0..4194303]: 26708152..30902455
        1: [4194304..8388607]: 11237112..15431415
        2: [8388608..12076735]: 4123968..7812095
        3: [12076736..15671151]: 19841872..23436287
        4: [15671152..16201615]: 18355464..18885927
        5: [16201616..16547543]: 30902456..31248383
        6: [16547544..16777215]: 3523816..3753487
        7: [16777216..17113183]: 10532744..10868711
        8: [17113184..17139023]: 10011920..10037759
        9: [17139024..17399103]: 18960144..19220223
        10: [17399104..17569775]: 15431416..15602087
        11: [17569776..17633927]: 18886432..18950583
        12: [17633928..17715311]: 19264640..19346023
        13: [17715312..17785367]: 26181368..26251423
        14: [17785368..17854791]: 26399192..26468615
        15: [17854792..17923999]: 25742880..25812087
        16: [17924000..17988511]: 3155768..3220279
        17: [17988512..17999999]: 3223528..3235015

  # df -m /tmp
  Filesystem     1M-blocks  Used Available Use% Mounted on
  /dev/sda3          15248 11343      3906  75% /

So under 3.1.10, creating a single 9GB file takes only a bit more than 90s.
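
For a quick comparison of the resulting layout between the two kernels, the extent count alone already shows the difference (same test file path as above):

  xfs_bmap /tmp/zero1.tmp | tail -n +2 | wc -l   # 36 extents on 3.2+, 18 on 3.1.10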
Comment 10 Eric Sandeen 2016-09-22 16:24:39 UTC
I'm batch-closing all xfs bugs which are more than 1 year old, sorry about that.

If you still have this issue on a current kernel, please retest and re-open with this information.

Thanks,
-Eric