Bug 56771 - One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All
OS: Linux
Importance: P1 high
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-17 23:22 UTC by Matt Pursley
Modified: 2022-09-30 14:53 UTC (History)
4 users

See Also:
Kernel Version: 3.9.0-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Matt Pursley 2013-04-17 23:22:36 UTC
Hi All, 

I have an LSI HBA card (LSI SAS 9207-8i) with 12 7200rpm SAS drives attached.  When it's formatted with mdraid6+ext4 I get about 1200MB/s for multiple streaming random reads with iozone.  With btrfs in 3.9.0-rc4 I can also get about 1200MB/s, but only with one stream at a time.  

As soon as I add a second stream (or more), the speed drops to about 750MB/s.  If I add more streams (10, 20, etc.), the total throughput stays at around 750MB/s.  I only see the full 1200MB/s in btrfs when a single read is running at a time (e.g. sequential reads with dd, random reads with iozone, etc.).

This feels like a bug or misconfiguration on my system, since it can read at full speed, but only with one stream running at a time.  The options I have tried varying are "-l 64k" with mkfs.btrfs and "-o thread_pool=16" when mounting, but neither of those options seems to change the behavior.
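For reference, the two knobs mentioned above look like this (device names and mount point are placeholders; `run` just echoes each command, so the sketch is safe to execute as-is):

```shell
# Dry-run wrapper: print commands instead of executing them.
run() { echo "+ $*"; }

# Larger leaf size at mkfs time ("-l 64k" above):
run mkfs.btrfs -l 64k -d raid6 /dev/sdb /dev/sdc /dev/sdd

# More btrfs worker threads at mount time ("-o thread_pool=16" above):
run mount -o thread_pool=16 /dev/sdb /mnt/data
```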



Anyone know of any reason why I would see the speed drop when going from one to more than one stream at a time with btrfs raid6?  We would like to use btrfs (mostly for snapshots), but we do need to get the full 1200MB/s streaming speeds too.





Thanks,
Matt



___
Here's some example output:



Single thread = ~1.1GB/s
_____
kura1 persist # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile bs=640k count=20000
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 7.14139 s, 1.8 GB/s

kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 11.2666 s, 1.2 GB/s

kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 11.5005 s, 1.1 GB/s

____



1 thread = ~1000MB/s ...
___
kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done
vm.drop_caches = 1
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 6.52018 s, 1.0 GB/s
kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done
vm.drop_caches = 1
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 6.55731 s, 999 MB/s
___

2 threads = ~750MB/s combined...
___
# sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k & done
vm.drop_caches = 1
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 17.5068 s, 374 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 17.7599 s, 369 MB/s
___
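Note that the per-stream MB/s figures dd prints have to be combined against the total bytes and the slowest stream's wall time; a small helper for summing runs like the two above (the function name and piped input format are my own):

```shell
# combined_mbps: read dd's "N bytes (...) copied, S s, X MB/s" summary lines
# from stdin and print the combined throughput in MB/s, computed as total
# bytes divided by the slowest stream's elapsed time.
combined_mbps() {
    awk '/bytes .* copied/ {
        bytes += $1                          # field 1: bytes copied
        if ($(NF-3) > secs) secs = $(NF-3)   # elapsed seconds; keep the max
    }
    END { printf "%.0f\n", bytes / secs / 1000000 }'
}
```

For the two-thread run above, 374 + 369 per-stream works out to ~738 MB/s combined, which matches the ~750MB/s plateau.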



20 threads = ~750MB/s combined...
___
# sysctl vm.drop_caches=1 ; for j in {1..20} ; do dd of=/dev/null if=/var/data/persist/testfile_$j bs=640k & done
vm.drop_caches = 1
kura1 scripts # 10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 168.223 s, 39.0 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 168.275 s, 38.9 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 169.466 s, 38.7 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 169.606 s, 38.6 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.503 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.629 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.633 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.744 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.844 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.896 s, 38.3 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.027 s, 38.3 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.135 s, 38.3 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.389 s, 38.2 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.414 s, 38.2 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.674 s, 38.2 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.897 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.956 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.995 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 172.044 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 172.08 s, 38.1 MB/s
____



### Similar results with random reads in iozone...

1 thread = ~1000MB/s
_____
kura1 scripts # for j in {1..1} ; do sysctl vm.drop_caches=1 ; iozone -f /var/data/10GBfolders/folder$j/iozone.DUMMY.1 -c -M -r 5120k -s 2g -i 1 -w -+A 1 | tail -n 5 & done
vm.drop_caches = 1
[1] 22298
kura1 scripts #                                                             random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                  1077376  7288014

iozone test complete.
____



2 threads = ~750 MB/s combined...
___
# for j in {1..2} ; do sysctl vm.drop_caches=1 ; iozone -f /var/data/10GBfolders/folder$j/iozone.DUMMY.1 -c -M -r 5120k -s 2g -i 1 -w -+A 1 | tail -n 5 & done
vm.drop_caches = 1
[1] 22302
vm.drop_caches = 1
[2] 22305
kura1 scripts #                                                             random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                   368864  5090095

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                   366834  5105457

iozone test complete.


20 threads = ~750MB/s combined...
___
# for j in {1..20} ; do sysctl vm.drop_caches=1 ; iozone -f /var/data/10GBfolders/folder$j/iozone.DUMMY.1 -c -M -r 5120k -s 2g -i 1 -w -+A 1 | tail -n 5 & done

                                                             random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    40424  6459500

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    39678  5749776

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    39548  5417189

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38988  5924904

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38484  1963969

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38556  1793398

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38610  1343518

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38346  1394609

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38367  1163930

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38375  1143491

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38647  1046416

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38180  1115287

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38086  1192537

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38356  1120244

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38293  1138119

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    37966  1273741

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    38059  1201688

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    37947  1243573

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    37965  1245834

iozone test complete.
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         2097152    5120                    37840  1354806

iozone test complete.
___



### Typical dstat output while the multi-threaded read runs, then finishes and goes idle...
___
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0  10  28  62   0   0| 716M    0 | 582B  870B|   0     0 |4398    16k
  0  12  28  59   0   0| 728M    0 | 454B  982B|   0     0 |4665    16k
  0  12  25  63   0   0| 761M    0 | 454B 1112B|   0     0 |4661    16k
  0  11  22  66   0   0| 719M    0 | 390B  742B|   0     0 |4519    16k
  0  13  21  65   0   0| 741M    0 | 524B 1036B|   0     0 |4706    16k
  0  17  19  63   0   0| 706M    0 |3302B 3558B|   0     0 |4638    15k
  0  16  17  67   0   0| 721M    0 |  16k   15k|   0     0 |5002    17k
  2  72   7  19   0   0| 514M    0 | 454B  486B|   0     0 |4174  8591
  3  97   0   0   0   0|   0     0 | 788B 2884B|   0     0 |1280   380
  1  38  61   0   0   0|   0     0 |1428B 7460B|   0     0 | 888   346
  0   0 100   0   0   0|   0     0 | 582B  678B|   0     0 |  92   106
  0   0 100   0   0   0|   0     0 |1606B 1766B|   0     0 |  66    59
  0   0 100   0   0   0|   0  4096B| 390B  742B|   0     0 |  90   112
  0   0 100   0   0   0|   0     0 | 454B  486B|   0     0 |  45    65
  0   0 100   0   0   0|   0     0 | 454B  614B|   0     0 |  56    77
___



### Some system info... 
____
## Kernel = 3.9.0-rc4
# uname -a
Linux server 3.9.0-rc4 #4 SMP Fri Apr 5 00:58:28 UTC 2013 x86_64 Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz GenuineIntel GNU/Linux
# grep MemTotal /proc/meminfo
MemTotal:       65975896 kB

___
## 12 2.3 GHz Xeon cores...
kura1 scripts # head -n 26 /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 45
model name	: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
stepping	: 6
microcode	: 0x616
cpu MHz		: 2301.000
cache size	: 15360 KB
physical id	: 0
siblings	: 12
core id		: 0
cpu cores	: 6
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 4600.26
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
___
## Asus Z9PA-U8 MB
# dmidecode --type 1
# dmidecode 2.11
SMBIOS 2.7 present.

Handle 0x0001, DMI type 1, 27 bytes
System Information
	Manufacturer: ASUSTeK COMPUTER INC.
	Product Name: Z9PA-U8 Series
	Version: 1.0X
	Serial Number: To be filled by O.E.M.
	UUID: 598C1800-5BCB-11D9-8F58-3085A9A7CBC7
	Wake-up Type: Power Switch
	SKU Number: SKU
	Family: To be filled by O.E.M.
____
Comment 1 Matt Pursley 2013-04-18 01:50:34 UTC
Here are the results of making and reading back a 13GB file on "mdraid6 + ext4",  "mdraid6 + btrfs", and "btrfsraid6 + btrfs".

Seems to show that:
1) "mdraid6 + ext4" can do ~1100 MB/s for these sequential reads with either one or two files at once.
2) "btrfsraid6 + btrfs" can do ~1100 MB/s for sequential reads with one file at a time, but only ~750 MB/s with two (or more).
3) "mdraid6 + btrfs" can only do ~750 MB/s for these sequential reads with either one or two files at once.


So, it seems like the speed drop is related more to the btrfs file system than to the experimental raid.  
Although it is interesting that btrfs only reaches the full ~1100 MB/s with a single file on btrfsraid6, not on mdraid6.


Anyway, just some more info and reproducible results...


Thanks,
Matt






___ mdraid6 + ext4 ___

kura1 / # mount | grep -i /var/data
/dev/md0 on /var/data type ext4 (rw)

kura1 / # cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath] 
md0 : active raid6 sdm[11] sdl[10] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0]
      29302650880 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
      [>....................]  resync =  0.0% (2731520/2930265088) finish=47268.1min speed=1031K/sec
      
unused devices: <none>



## Create two 13GB testfiles...

kura1 / #  sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile1 bs=640k count=20000 conv=fdatasync
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 47.27 s, 277 MB/s
kura1 / #  sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile2 bs=640k count=20000 conv=fdatasync
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 47.0237 s, 279 MB/s


## Read back one testfile... ~1300 MB/s

kura1 / # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 10.3469 s, 1.3 GB/s
kura1 / # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 10.0073 s, 1.3 GB/s
kura1 / # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 10.69 s, 1.2 GB/s



## Read back the two testfiles at the same time.. ~1100MB/s

kura1 / # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 24.4988 s, 535 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 24.591 s, 533 MB/s

kura1 / # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 24.7013 s, 531 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 24.7016 s, 531 MB/s

kura1 / # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 24.5512 s, 534 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 24.8276 s, 528 MB/s

________________________________


___ mdraid6 + btrfs _______________


kura1 ~ # mount | grep -i /var/data
/dev/md0 on /var/data type btrfs (rw,noatime)

kura1 ~ # cat  /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath] 
md0 : active raid6 sdm[11] sdl[10] sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] sdf[4] sde[3] sdd[2] sdc[1] sdb[0]
      29302650880 blocks super 1.2 level 6, 512k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
      [>....................]  resync =  0.0% (1917184/2930265088) finish=44415.7min speed=1098K/sec
      
unused devices: <none>

kura1 ~ # btrfs filesystem show
failed to open /dev/sr0: No medium found
Label: none  uuid: 5eb756b5-03a1-4d06-8e91-0f683a763a88
	Total devices 1 FS bytes used 448.00KB
	devid    1 size 27.29TB used 2.04GB path /dev/md0

Label: none  uuid: 4546715c-8948-42b3-b529-a1c9cd175c2e
	Total devices 12 FS bytes used 80.74GB
	devid   12 size 2.73TB used 9.35GB path /dev/sdm
	devid   11 size 2.73TB used 9.35GB path /dev/sdl
	devid   10 size 2.73TB used 9.35GB path /dev/sdk
	devid    9 size 2.73TB used 9.35GB path /dev/sdj
	devid    8 size 2.73TB used 9.35GB path /dev/sdi
	devid    7 size 2.73TB used 9.35GB path /dev/sdh
	devid    6 size 2.73TB used 9.35GB path /dev/sdg
	devid    5 size 2.73TB used 9.35GB path /dev/sdf
	devid    4 size 2.73TB used 9.35GB path /dev/sde
	devid    3 size 2.73TB used 9.35GB path /dev/sdd
	devid    2 size 2.73TB used 9.35GB path /dev/sdc
	devid    1 size 2.73TB used 9.37GB path /dev/sdb

Btrfs v0.20-rc1-253-g7854c8b


## Create two 13GB testfiles...

kura1 ~ # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile1 bs=640k count=20000 conv=fdatasync
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 34.2789 s, 382 MB/s
kura1 ~ # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile2 bs=640k count=20000 conv=fdatasync
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 43.2937 s, 303 MB/s



## Read back one testfile... ~750 MB/s

kura1 ~ # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 16.7785 s, 781 MB/s
kura1 ~ # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 18.1361 s, 723 MB/s
kura1 ~ # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 19.1985 s, 683 MB/s


## Read back the two testfiles at the same time.. ~750MB/s

kura1 ~ # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 30.8396 s, 425 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 35.5478 s, 369 MB/s

kura1 ~ # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 34.6504 s, 378 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 35.7795 s, 366 MB/s

kura1 ~ # (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=//dev/null if=/var/data/persist/testfile2 bs=640k) & wait
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 36.9101 s, 355 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 37.7395 s, 347 MB/s


________________________



___ btrfsraid6 + btrfs ___

kura1 ~ # mount | grep -i /var/data
/dev/sdl on /var/data type btrfs (rw,noatime)

kura1 ~ # btrfs filesystem show
failed to open /dev/sr0: No medium found
Label: none  uuid: 4546715c-8948-42b3-b529-a1c9cd175c2e
	Total devices 12 FS bytes used 80.74GB
	devid   12 size 2.73TB used 9.35GB path /dev/sdm
	devid   11 size 2.73TB used 9.35GB path /dev/sdl
	devid   10 size 2.73TB used 9.35GB path /dev/sdk
	devid    9 size 2.73TB used 9.35GB path /dev/sdj
	devid    8 size 2.73TB used 9.35GB path /dev/sdi
	devid    7 size 2.73TB used 9.35GB path /dev/sdh
	devid    6 size 2.73TB used 9.35GB path /dev/sdg
	devid    5 size 2.73TB used 9.35GB path /dev/sdf
	devid    4 size 2.73TB used 9.35GB path /dev/sde
	devid    3 size 2.73TB used 9.35GB path /dev/sdd
	devid    2 size 2.73TB used 9.35GB path /dev/sdc
	devid    1 size 2.73TB used 9.37GB path /dev/sdb

Btrfs v0.20-rc1-253-g7854c8b


## Create two 13GB testfiles...

kura1 data # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile2 bs=640k count=20000 conv=fdatasync
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 21.5018 s, 610 MB/s
kura1 data # sysctl vm.drop_caches=1 ; dd if=/dev/zero of=/var/data/persist/testfile1 bs=640k count=20000 conv=fdatasync
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 21.3389 s, 614 MB/s



## Read back one testfile... ~1100 MB/s

kura1 data # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 11.8312 s, 1.1 GB/s
kura1 data # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 11.7888 s, 1.1 GB/s
kura1 data # sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out

20000+0 records out
13107200000 bytes (13 GB) copied, 41.4113 s, 317 MB/s

kura1 data #  (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile2 bs=640k) & wait
[1] 19482
[2] 19483
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 36.0124 s, 364 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 36.2298 s, 362 MB/s

kura1 data #  (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k) & (sysctl vm.drop_caches=1 ; dd of=/dev/null if=/var/data/persist/testfile2 bs=640k) & wait
[1] 19500
[2] 19501
vm.drop_caches = 1
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 35.4703 s, 370 MB/s
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 35.7789 s, 366 MB/s
[1]-  Done                    ( sysctl vm.drop_caches=1; dd of=/dev/null if=/var/data/persist/testfile1 bs=640k )
[2]+  Done                    ( sysctl vm.drop_caches=1; dd of=/dev/null if=/var/data/persist/testfile2 bs=640k )


_____
Comment 2 Josef Bacik 2013-04-30 17:18:12 UTC
So I tried to reproduce this and my combined values added up to the single-threaded case.  Can you run perf record -ag and see if that shows anything big for the multi-threaded case?  Also, some sysrq+w dumps a few times (spread out) during the multi-threaded run would be good so I can see what is going on.
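A sketch of collecting what this comment asks for (the duration, output file, and use of dmesg are my guesses; `run` echoes instead of executing, since all of these need root):

```shell
# Dry-run wrapper: print commands instead of executing them.
run() { echo "+ $*"; }

# System-wide call-graph profile ("perf record -ag") captured while the
# multi-threaded read workload is running:
run perf record -a -g -o perf.data -- sleep 30
run perf report -i perf.data

# Blocked-task stack dumps (sysrq+w), repeated a few times spread out
# over the run, then read back out of the kernel log:
run sh -c 'echo w > /proc/sysrq-trigger'
run dmesg
```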
Comment 3 Matt Pursley 2013-05-02 19:07:00 UTC
(In reply to comment #2)
> So I tried to reproduce this and my combined values added up to the single
> threaded case.  Can you run perf record -ag and see if that shows anything
> big for the multi-threaded case?  Also some sysrq+w a few times (spread out)
> during the multi-threaded run would be good so I can see what is going on.



Ok, I will try that.. 


Thanks,
Matt
Comment 4 Matt Pursley 2013-05-02 19:08:24 UTC
Also, here are the results of a multi-drive test that I just emailed you.


Thanks Josef,
Matt





---------- Forwarded message ----------
From: Matt Pursley <mpursley@gmail.com>
Date: Thu, May 2, 2013 at 11:51 AM
Subject: Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?
To: Josef Bacik <jbacik@fusionio.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>


Hey Josef,

Were you able to try this multi-thread test on any more drives?


I did a test with 12, 6, 3, and 1 drive.  And it looks like the
multi-thread speed reduction grows as the number of drives in the raid
goes up.

Like this:
- 50% speed reduction with 2 threads on 12 drives
- 25% speed reduction with 2 threads on 6 drives
- 10% speed reduction with 2 threads on 3 drives
- 5% speed reduction with 2 threads on 1 drive
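The percentages above compare the combined 2-thread total against the 1-thread figure; a throwaway helper (the name is mine, and the example numbers are the 12-drive dd results later in this comment):

```shell
# pct_drop ONE TWO -> percent throughput lost going from ONE to TWO.
pct_drop() {
    awk -v one="$1" -v two="$2" 'BEGIN { printf "%.0f\n", (one - two) * 100 / one }'
}

# 12 drives: ~1100 MB/s single-stream vs 338+336 = 674 MB/s combined:
pct_drop 1100 674    # prints 39
```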



I only have 12 slots on my HBA card, but I wonder if 24 drives would
reduce the speed to 25% with 2 threads?

Matt










make btrfs fs...
___

12 drives...
mkfs.btrfs -f -d raid6 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl

6 drives...
mkfs.btrfs -f -d raid6 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

3 drives...
mkfs.btrfs -f -d raid5 /dev/sda /dev/sdb /dev/sdc

1 drive...
mkfs.btrfs -f /dev/sda

mount /dev/sda /tmp/btrfs_test/

___


make zero files...
___
kura1 ~ # for j in {1..2} ; do dd if=/dev/zero of=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M count=10000 conv=fdatasync & done
___


===================

btrfs raid6 on 12 drives with 2 threads = ~650MB/s
___
kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
vm.drop_caches = 1
10485760000 bytes (10 GB) copied, 31.0431 s, 338 MB/s
10485760000 bytes (10 GB) copied, 31.2235 s, 336 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
10485760000 bytes (10 GB) copied, 29.869 s, 351 MB/s
10485760000 bytes (10 GB) copied, 30.5561 s, 343 MB/s

___


btrfs raid6 on 12 drives with 1 thread = ~1100MB/s
___
kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
10485760000 bytes (10 GB) copied, 9.69881 s, 1.1 GB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
10485760000 bytes (10 GB) copied, 9.56475 s, 1.1 GB/s
___


==================

btrfs raid6 on 6 drives with 2 threads = ~500MB/s
___
kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
10485760000 bytes (10 GB) copied, 41.3899 s, 253 MB/s
10485760000 bytes (10 GB) copied, 41.6916 s, 252 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
10485760000 bytes (10 GB) copied, 40.3178 s, 260 MB/s
10485760000 bytes (10 GB) copied, 41.4087 s, 253 MB/s

___



btrfs raid6 on 6 drives with 1 thread =  ~600MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
10485760000 bytes (10 GB) copied, 17.5686 s, 597 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M & done
10485760000 bytes (10 GB) copied, 17.5396 s, 598 MB/s
___


==================

btrfs raid5 on 3 drives with 2 threads = ~300MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 67.636 s, 155 MB/s
10485760000 bytes (10 GB) copied, 70.1783 s, 149 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 69.4945 s, 151 MB/s
10485760000 bytes (10 GB) copied, 70.8279 s, 148 MB/s

___



btrfs raid5 on 3 drives with 1 thread = ~319MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 32.8559 s, 319 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 32.8483 s, 319 MB/s

___


==================


btrfs (no raid) on 1 drive with 2 threads = ~155MB/s
___
kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 134.982 s, 77.7 MB/s
10485760000 bytes (10 GB) copied, 135.237 s, 77.5 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 134.549 s, 77.9 MB/s
10485760000 bytes (10 GB) copied, 135.293 s, 77.5 MB/s


___


btrfs (no raid) on 1 drive with 1 thread =  ~162MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 64.5931 s, 162 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 64.6299 s, 162 MB/s

___



==================
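For anyone reproducing the runs above: the `for ... do dd ... & done` loop returns the prompt before the background dd processes finish, so the timing lines arrive interleaved and the caches are only dropped once at the start. A minimal wrapper, not from the original report (directory layout and file names are assumptions), that waits for all readers and reports total wall-clock time:

```shell
#!/bin/sh
# Hypothetical benchmark helper: run N concurrent dd readers against
# pre-created files testfile_1 .. testfile_N and print elapsed seconds.
run_readers() {
    dir=$1     # directory holding the test files
    n=$2       # number of concurrent readers
    # Dropping caches needs root; ignore the failure when unprivileged.
    sysctl vm.drop_caches=1 >/dev/null 2>&1 || true
    start=$(date +%s)
    i=1
    while [ "$i" -le "$n" ]; do
        # dd's own stats are suppressed; we time the whole run externally.
        dd of=/dev/null if="$dir/testfile_$i" bs=1M 2>/dev/null &
        i=$((i + 1))
    done
    wait                      # without this, the prompt returns before dd finishes
    end=$(date +%s)
    echo $((end - start))
}
```

Usage: `run_readers /tmp/btrfs_test 2` prints the seconds the two-stream read took; aggregate MB/s is then (total bytes read) / (elapsed seconds).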







On Fri, Apr 26, 2013 at 4:21 PM, Matt Pursley <mpursley@gmail.com> wrote:
> Hey Josef,
>
> Thanks for looking into this further!  Those are about the same
> results I was seeing, though I only tested with all 12 drives in
> my jbod, not with just one drive.  I will run a test with a single
> disk and see if I get the same results.
>
> Let me know if you also see the same results with multiple drives in
> your raid...
>
>
> Thanks,
> Matt
>
>
>
>
>
> On Thu, Apr 25, 2013 at 2:10 PM, Josef Bacik <jbacik@fusionio.com> wrote:
>> On Thu, Apr 25, 2013 at 03:01:18PM -0600, Matt Pursley wrote:
>>> Ok, awesome, let me know how it goes..  I don't have the raid
>>> formatted to btrfs right now, but I could probably do that in about 30
>>> minutes or so.
>>>
>>
>> Huh, so I'm getting the full bandwidth: 120 mb/s with one thread and
>> 60 mb/s with two threads.  These are just cheap sata drives tho; I'll
>> try to dig up a box with 3 fusion cards for something a little closer
>> to the speeds you are seeing, and see if that makes a difference.
>> Thanks,
>>
>> Josef
Comment 5 Matt Pursley 2013-05-02 21:22:22 UTC
Ok, here's targz file with a "perf_12drives_raid6_one-threads" and "perf_12drives_raid6_two-threads" tests.
See anything wrong in there?

https://docs.google.com/file/d/0BxdIbDDheBeHcjVzc1pqNEstZ28/edit?usp=sharing




Thanks,
Matt
Comment 6 Josef Bacik 2013-05-03 02:50:27 UTC
Ok, looks like just ye olde lock contention between the completion threads and dd. I'll work up some patches to address the low-hanging fruit and let you test how much they help.
Comment 7 Josef Bacik 2013-07-29 18:30:50 UTC
So Miao just did a whole bunch of work to reduce this lock contention, and it is in btrfs-next. Could you build and test that branch and see whether the performance is better?
Comment 8 David Sterba 2022-09-30 14:53:04 UTC
This is a semi-automated bugzilla cleanup, report is against an old kernel version. If the problem still happens, please open a new bug. Thanks.
