Latest working kernel version: Only rarely passes test in multidevice fs config on 2.6.29-rc2
Earliest failing kernel version: Prior to mainline merge
Distribution:
uname -a: Linux bl465cb.lnx.usa.hp.com 2.6.29-rc2-enw #1 SMP Sat Jan 17 14:19:15 EST 2009 x86_64 GNU/Linux
Hardware Environment: dual socket, quad core Intel and AMD x86_64 systems with backplane RAID

btrfs-show:
Label: none  uuid: e62fdd84-0dbd-4a66-a58b-6bb6399f8f2b
    Total devices 6 FS bytes used 4.40GB
    devid    3 size 68.33GB used 13.00GB path /dev/cciss/c1d2
    devid    5 size 68.33GB used 12.01GB path /dev/cciss/c1d4
    devid    1 size 68.33GB used 12.02GB path /dev/cciss/c1d0
    devid    4 size 68.33GB used 13.00GB path /dev/cciss/c1d3
    devid    2 size 68.33GB used 12.00GB path /dev/cciss/c1d1
    devid    6 size 68.33GB used 12.01GB path /dev/cciss/c1d5
Btrfs v0.18-12-ge3b0f66

Software Environment: autotest client modified to run fstress on btrfs file system mounted as:
/dev/cciss/c1d5 on /mnt type btrfs (rw)

Problem Description:
fstress routinely induces I/O stalls (where I/O appears to cease while fstress fails to make forward progress) when run against a multi-device btrfs file system. This behavior is generally reproducible, though once in a while it does not occur and fstress runs to completion. The behavior tends to appear some (variable) time into the run, typically more than an hour. Once the bug occurs, iostat reports some (variable) percentage of iowait and no I/O activity.

This bug has also been reproduced using btrfs-unstable as of 28 Jan 09, commit a717531942f488209dded30f6bc648167bcefa72.

This bug does not occur on single-device btrfs file systems; a fix Chris made some time ago, prior to the mainline merge, addressed a similar problem in that configuration.
Voluminous sysrq-w backtraces taken after the bug appeared are available:
2.6.29-rc2: http://free.linux.hp.com/~enw/multiiostall-2.6.29-rc2
btrfs-unstable: http://free.linux.hp.com/~enw/multiiostall-unstable

Please note that the I/O errors at mount reported in the -rc2 log snippet have not occurred in other test configurations and are believed to be unrelated.

Steps to reproduce:
1) mkfs multidevice fs
2) mount w/o options
3) run fstress (as found in autotest) against the mount point
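
For anyone who wants to catch the stall without watching iostat by hand, here is a rough sketch of the symptom described above: nonzero iowait over an interval with no completed I/O on the test devices. It is not part of the original test setup. The helper names, the 1% iowait threshold, and the device names passed in are all assumptions for illustration; the field positions follow the documented layout of /proc/stat and /proc/diskstats.

```python
from typing import Iterable, NamedTuple, Tuple


class Sample(NamedTuple):
    iowait: int  # cumulative iowait jiffies (all CPUs)
    total: int   # cumulative jiffies across CPU states
    ios: int     # cumulative completed reads + writes on the watched devices


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (iowait, total) from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # user nice system idle iowait irq softirq steal; guest time is already
    # counted in user, so only the first eight fields are summed.
    return fields[4], sum(fields[:8])


def parse_diskstats(text: str, devices: Iterable[str]) -> int:
    """Sum completed reads and writes for the named devices."""
    wanted = set(devices)
    completed = 0
    for line in text.splitlines():
        parts = line.split()
        if len(parts) > 7 and parts[2] in wanted:
            completed += int(parts[3]) + int(parts[7])
    return completed


def take_sample(devices: Iterable[str]) -> Sample:
    """Read one sample from the live system (Linux only)."""
    with open("/proc/stat") as f:
        iowait, total = parse_cpu_line(f.readline())
    with open("/proc/diskstats") as f:
        ios = parse_diskstats(f.read(), devices)
    return Sample(iowait, total, ios)


def looks_stalled(prev: Sample, curr: Sample, min_iowait_pct: float = 1.0) -> bool:
    """True if the interval shows iowait but no completed I/O.

    The threshold is an arbitrary assumption, not a value from the report.
    """
    elapsed = curr.total - prev.total
    if elapsed <= 0:
        return False
    iowait_pct = 100.0 * (curr.iowait - prev.iowait) / elapsed
    return curr.ios == prev.ios and iowait_pct >= min_iowait_pct
```

Calling take_sample() twice, some tens of seconds apart, with the names the devices have in /proc/diskstats and passing both results to looks_stalled() gives a yes/no answer for that interval. A single idle interval can also look like this if fstress happens to be CPU-bound, so several consecutive hits are a more convincing sign than one.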
I've reproduced this one here and have a patch in testing. Thanks!
The bug reproduced reliably in 2.6.29-rc3 testing (which did not include the fix). Numerous tests using btrfs-unstable from 29 Jan (which includes a fix for this bug) on uncompressed filesystems have passed cleanly, so the patch looks good. Testing covered both multi-device and single-device filesystems for completeness.

Side note: testing the fix for #12563 on the same btrfs-unstable base has produced occasional I/O stalls on one subset of my test configurations using compressed filesystems, implying that there may be other holes to close. However, Chris has previously noted that we may not be ready for compressed fstress testing just yet.
Great, I'll close this one down.