Bug 12565 - btrfs: fstress induces I/O stalls when run on multi-device file systems
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 normal
Assignee: Chris Mason
Reported: 2009-01-28 12:05 UTC by Eric Whitney
Modified: 2009-02-03 11:59 UTC
Kernel Version: 2.6.29-rc2

Description Eric Whitney 2009-01-28 12:05:44 UTC
Latest working kernel version: None reliably. The test passes only rarely in a multi-device fs configuration on 2.6.29-rc2.

Earliest failing kernel version: Prior to mainline merge

Distribution: uname -a: Linux bl465cb.lnx.usa.hp.com 2.6.29-rc2-enw #1 SMP Sat Jan 17 14:19:15 EST 2009 x86_64 GNU/Linux

Hardware Environment: dual socket, quad core Intel and AMD x86_64 systems with backplane RAID

btrfs-show:
Label: none  uuid: e62fdd84-0dbd-4a66-a58b-6bb6399f8f2b
	Total devices 6 FS bytes used 4.40GB
	devid    3 size 68.33GB used 13.00GB path /dev/cciss/c1d2
	devid    5 size 68.33GB used 12.01GB path /dev/cciss/c1d4
	devid    1 size 68.33GB used 12.02GB path /dev/cciss/c1d0
	devid    4 size 68.33GB used 13.00GB path /dev/cciss/c1d3
	devid    2 size 68.33GB used 12.00GB path /dev/cciss/c1d1
	devid    6 size 68.33GB used 12.01GB path /dev/cciss/c1d5

Btrfs v0.18-12-ge3b0f66

Software Environment: autotest client modified to run fstress on btrfs
file system mounted as: /dev/cciss/c1d5 on /mnt type btrfs (rw)

Problem Description: fstress routinely induces I/O stalls when run against a multi-device btrfs file system: I/O appears to cease and fstress stops making forward progress. The behavior is generally reproducible, though occasionally it does not occur and fstress runs to completion. It tends to appear some variable time into the run, typically more than an hour.
Once the bug occurs, iostat reports some variable percentage of iowait and no I/O activity. The bug has also been reproduced using btrfs-unstable as of 28 Jan 09, commit a717531942f488209dded30f6bc648167bcefa72.
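
For anyone who wants to watch for the stall signature described above (iowait accumulating while no I/O completes) without sitting in front of iostat, here is a minimal Python sketch. It is not part of the original test setup. It assumes the standard Linux /proc/stat and /proc/diskstats layouts; the function names and the "cciss/" device-name prefix are illustrative assumptions.

```python
def read_iowait_ticks(stat_text):
    """Return cumulative iowait ticks from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        # Layout: cpu user nice system idle iowait ...
        if fields and fields[0] == "cpu":
            return int(fields[5])
    raise ValueError("no aggregate cpu line found")


def read_completed_ios(diskstats_text, prefix=""):
    """Sum reads and writes completed for devices whose name starts with prefix."""
    total = 0
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major minor name reads_completed ... writes_completed (index 7) ...
        if len(fields) >= 8 and fields[2].startswith(prefix):
            total += int(fields[3]) + int(fields[7])
    return total


def looks_stalled(before, after):
    """before/after are (iowait_ticks, completed_ios) samples taken some seconds apart.

    A stall is suspected when iowait grew but no I/O completed in between.
    """
    return after[0] > before[0] and after[1] == before[1]


def sample(prefix="cciss/"):
    """Take one (iowait_ticks, completed_ios) sample from the running system."""
    with open("/proc/stat") as stat_file:
        stat_text = stat_file.read()
    with open("/proc/diskstats") as disk_file:
        disk_text = disk_file.read()
    return read_iowait_ticks(stat_text), read_completed_ios(disk_text, prefix)
```

Taking two samples with sample() a few seconds apart and passing them to looks_stalled() gives a rough check. A quiet but healthy system can also show no completed I/O, so this is only meaningful while fstress is supposed to be running.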

This bug does not occur on single-device btrfs file systems; a fix Chris made some time ago, prior to the mainline merge, addressed a similar problem in that configuration.

Voluminous sysrq-w backtraces taken after the bug appeared are available:
2.6.29-rc2: http://free.linux.hp.com/~enw/multiiostall-2.6.29-rc2
btrfs-unstable: http://free.linux.hp.com/~enw/multiiostall-unstable

Please note that the I/O errors at mount reported in the -rc2 log snippet have not occurred in other test configurations and are believed to be unrelated.
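
For reference, sysrq-w backtraces like the ones linked above can also be requested without a console keyboard by writing "w" to /proc/sysrq-trigger, which asks the kernel to log the stacks of blocked (uninterruptible) tasks. The following is a minimal Python sketch, not the method used to collect the logs above; the function name and the trigger_path parameter are illustrative. It must be run as root.

```python
def trigger_blocked_task_dump(trigger_path="/proc/sysrq-trigger"):
    """Ask the kernel to log stack traces of blocked (uninterruptible) tasks.

    The traces appear in the kernel log (for example via dmesg), not in
    this function's return value.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("w")
```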

Steps to reproduce:
1) mkfs multidevice fs
2) mount w/o options
3) run fstress (as found in autotest) against the mount point
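
Steps 1 and 2 can be sketched as follows. This is a hedged Python illustration, not the exact commands used in the original testing. It assumes the six devices listed in the btrfs-show output and the /mnt mount point reported above; the helper names are hypothetical. Step 3 is left out because the report does not give the fstress invocation. Note that mkfs.btrfs destroys any existing data on the listed devices, so the sketch only prints the commands unless dry_run is set to False.

```python
import subprocess

# The six devices from the btrfs-show output, c1d0 through c1d5.
DEVICES = [f"/dev/cciss/c1d{n}" for n in range(6)]
MOUNT_POINT = "/mnt"


def build_commands(devices, mount_point):
    """Return the mkfs and mount command lines for a multi-device btrfs."""
    return [
        # Step 1: create one file system spanning all devices (destructive).
        ["mkfs.btrfs", *devices],
        # Step 2: mount with no options, naming the last device as in the report.
        ["mount", "-t", "btrfs", devices[-1], mount_point],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run(build_commands(DEVICES, MOUNT_POINT)) prints the two command lines for review before anything is executed.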
Comment 1 Chris Mason 2009-01-28 12:40:30 UTC
I've reproduced this one here and have a patch in testing.  Thanks!
Comment 2 Eric Whitney 2009-02-03 11:47:45 UTC
Bug reproduced reliably in 2.6.29-rc3 testing (did not include fix).

Numerous tests using btrfs-unstable from 29 Jan (which includes a fix for this bug) on uncompressed filesystems have passed cleanly, so the patch looks good.  Testing included both multi- and single-device filesystems for completeness.

Side note: testing of the fix for #12563 on the same btrfs-unstable base has resulted in occasional I/O stalls on one subset of my test configurations using compressed filesystems, implying that there may be other holes to close.  However, Chris has previously noted that we may not be ready for compressed fstress testing just yet.
Comment 3 Chris Mason 2009-02-03 11:59:42 UTC
Great, I'll close this one down.
