A full write up of this can be found at http://strugglers.net/~andy/blog/2014/08/08/whats-my-btrfs-doing/ I will attempt to report a briefer version without losing nny useful details. I have a four (2TB) device RAID-10 for data and metadata btrfs volume. At a later date to test something I added a smaller (500G) device to the volume, but this device was not in use: ---------8<--------------------------------------------------------------------- $ sudo btrfs filesystem show Label: 'tank' uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb Total devices 5 FS bytes used 1.08TB devid 1 size 1.82TB used 555.03GB path /dev/sdh devid 3 size 1.82TB used 555.03GB path /dev/sdi devid 4 size 1.82TB used 555.03GB path /dev/sdj devid 5 size 465.76GB used 0.00 path /dev/sdk devid 2 size 1.82TB used 555.03GB path /dev/sdg Btrfs v0.20-rc1-358-g194aa4a $ sudo btrfs filesystem df /srv/tank Data, RAID10: total=1.08TB, used=1.08TB System, RAID10: total=64.00MB, used=128.00KB System: total=4.00MB, used=0.00 Metadata, RAID10: total=2.52GB, used=1.34GB ---------8<--------------------------------------------------------------------- One of the 2TB disks started misbehaving and disappeared from OS view: ---------8<--------------------------------------------------------------------- Aug 7 12:17:53 specialbrew kernel: [5392706.325072] btrfs: bdev /dev/sdh errs: wr 9, rd 0, flush 0, corrupt 0, gen 0 Aug 7 12:17:53 specialbrew kernel: [5392706.325228] btrfs: bdev /dev/sdh errs: wr 10, rd 0, flush 0, corrupt 0, gen 0 Aug 7 12:17:53 specialbrew kernel: [5392706.339976] sd 4:3:0:0: [sdh] Stopping disk Aug 7 12:17:53 specialbrew kernel: [5392706.346436] sd 4:3:0:0: [sdh] START_STOP FAILED Aug 7 12:17:53 specialbrew kernel: [5392706.352944] sd 4:3:0:0: [sdh] Aug 7 12:17:53 specialbrew kernel: [5392706.356489] end_request: I/O error, dev sdh, sector 0 Aug 7 12:17:53 specialbrew kernel: [5392706.365413] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Aug 7 12:17:53 specialbrew kernel: [5392706.475838] lost page write due to I/O error on /dev/sdh Aug 7 12:17:53 specialbrew kernel: [5392706.482266] lost page write due to I/O error on /dev/sdh Aug 7 12:17:53 specialbrew kernel: [5392706.488496] lost page write due to I/O error on /dev/sdh ---------8<--------------------------------------------------------------------- I told btrs to forget about that device: ---------8<--------------------------------------------------------------------- $ sudo btrfs device delete missing /srv/tank $ sudo btrfs filesystem show Label: 'tank' uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb Total devices 5 FS bytes used 1.08TB devid 3 size 1.82TB used 555.03GB path /dev/sdi devid 4 size 1.82TB used 555.03GB path /dev/sdj devid 5 size 465.76GB used 0.00 path /dev/sdk devid 2 size 1.82TB used 555.03GB path /dev/sdg *** Some devices missing Btrfs v0.20-rc1-358-g194aa4a ---------8<--------------------------------------------------------------------- I decided to re-plug the drive to see if it still was unhappy. It re-appeared as /dev/sdl and rejoined the volume: ---------8<--------------------------------------------------------------------- $ sudo btrfs filesystem show Label: 'tank' uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb Total devices 5 FS bytes used 1.08TB devid 1 size 1.82TB used 555.04GB path /dev/sdl devid 3 size 1.82TB used 555.03GB path /dev/sdi devid 4 size 1.82TB used 555.03GB path /dev/sdj devid 5 size 465.76GB used 0.00 path /dev/sdk devid 2 size 1.82TB used 555.03GB path /dev/sdg Btrfs v0.20-rc1-358-g194aa4a ---------8<--------------------------------------------------------------------- …but btrfs was still unable to write to it: ---------8<--------------------------------------------------------------------- Aug 7 17:46:46 specialbrew kernel: [5412439.946138] sd 4:3:0:0: [sdl] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) Aug 7 17:46:46 specialbrew kernel: [5412439.946142] sd 4:3:0:0: [sdl] 4096-byte physical blocks Aug 7 17:46:46 specialbrew kernel: [5412439.946247] sd 4:3:0:0: [sdl] Write Protect is off Aug 7 17:46:46 specialbrew kernel: [5412439.946252] sd 4:3:0:0: [sdl] Mode Sense: 00 3a 00 00 Aug 7 17:46:46 specialbrew kernel: [5412439.946294] sd 4:3:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Aug 7 17:46:46 specialbrew kernel: [5412439.952286] sdl: unknown partition table Aug 7 17:46:46 specialbrew kernel: [5412439.990436] sd 4:3:0:0: [sdl] Attached SCSI disk Aug 7 17:46:47 specialbrew kernel: [5412440.471412] btrfs: device label tank devid 1 transid 504721 /dev/sdl Aug 7 17:47:17 specialbrew kernel: [5412470.408079] btrfs: bdev /dev/sdl errs: wr 7464, rd 0, flush 332, corrupt 0, gen 0 Aug 7 17:47:17 specialbrew kernel: [5412470.415931] lost page write due to I/O error on /dev/sdl ---------8<--------------------------------------------------------------------- At this point I accepted that the disk needed to be replaced, however I was still able to read from it without error (confirmed by dd and iostat). I did not expect the replacement disk to arrive until the next day, so I decided I would "btrfs device delete" the failing drive so that there would be no period of time during which I would be at risk of double device failure. The command did not immediately return to the prompt so some time after issuing it I issued a filesystem show in another window: ---------8<--------------------------------------------------------------------- $ sudo btrfs device delete /dev/sdl /srv/tank $ sudo btrfs filesystem show Label: 'tank' uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb Total devices 5 FS bytes used 1.08TB devid 1 size 1.82TB used 555.04GB path /dev/sdl devid 3 size 1.82TB used 556.03GB path /dev/sdi devid 4 size 1.82TB used 556.03GB path /dev/sdj devid 5 size 465.76GB used 26.00GB path /dev/sdk devid 2 size 1.82TB used 556.03GB path /dev/sdg ---------8<--------------------------------------------------------------------- From this I interpreted that some of the data was being balanced onto /dev/sdk, the 500G disk, and that this was taking place at around 1GB per minute so I could expect the process to finish in around 555 minutes. Much later I checked up on things and /dev/sdk now showed as full. The "device delete" had no returned to prompt. There was still a lot of write activity on the other four disks: ---------8<--------------------------------------------------------------------- $ sudo iostat -x -d 5 sd{g,i,j,k,l} Linux 3.13-0.bpo.1-amd64 (specialbrew.localnet) 08/08/14 _x86_64_ (2 CPU) Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdg 6.50 0.89 2.49 1.60 54.30 136.42 93.31 0.43 105.19 73.77 154.12 1.63 0.67 sdk 0.00 0.79 0.00 0.89 0.02 97.93 218.89 0.08 91.43 5.69 91.79 5.70 0.51 sdj 2.26 1.10 0.79 1.38 65.45 136.39 185.57 0.19 86.94 46.38 110.20 5.17 1.12 sdi 8.27 1.34 3.39 1.21 88.11 136.39 97.55 0.60 130.79 46.89 365.87 2.72 1.25 sdl 0.24 0.00 0.01 0.00 1.00 0.00 255.37 0.00 1.40 1.40 0.00 1.08 0.00 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdg 0.00 0.00 0.00 87.20 0.00 4202.40 96.39 0.64 7.39 0.00 7.39 4.43 38.64 sdk 0.00 0.20 0.00 102.40 0.00 3701.60 72.30 2.40 23.38 0.00 23.38 8.63 88.40 sdj 0.00 0.00 0.00 87.20 0.00 4202.40 96.39 0.98 11.28 0.00 11.28 5.20 45.36 sdi 0.00 0.20 0.00 118.00 0.00 4200.80 71.20 1.21 10.24 0.00 10.24 4.45 52.56 sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 $ sudo btrfs filesystem show Label: 'tank' uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb Total devices 5 FS bytes used 1.08TB devid 1 size 1.82TB used 555.04GB path /dev/sdl devid 3 size 1.82TB used 555.29GB path /dev/sdi devid 4 size 1.82TB used 555.29GB path /dev/sdj devid 5 size 465.76GB used 465.76GB path /dev/sdk devid 2 size 1.82TB used 555.29GB path /dev/sdg Btrfs v0.20-rc1-358-g194aa4a ---------8<--------------------------------------------------------------------- Worse still, btrfs now thinks the volume is out of space: ---------8<--------------------------------------------------------------------- $ touch foo touch: cannot touch `foo': No space left on device $ df -h . Filesystem Size Used Avail Use% Mounted on - 7.8T 2.2T 5.2T 30% /srv/tank/backups ---------8<--------------------------------------------------------------------- At this point I took some advice from someone more experiences with btfrs. They explained a few things: - RAID-10 was probably a poor choice if I was going to use disks of different sizes - My problem is that RAID-10 requires four devices of equal size and by attempting to delete one of the 2TB disks I was leaving the volume with three 2TB disks and one 500GB disk, which is why the 500GB disk has filled and the volume now thinks it is full. I could not interrupt the "device delete" so I had to shut down and boot again. In between I removed the suspect disk (at this point, dev/sdl). Upon boot, the btrfs volume would not mount: ---------8<--------------------------------------------------------------------- # mount /srv/tank Aug 8 19:05:37 specialbrew kernel: [ 426.358894] BTRFS: device label tank devid 5 transid 798058 /dev/sdj Aug 8 19:05:37 specialbrew kernel: [ 426.372031] BTRFS info (device sdj): disk space caching is enabled Aug 8 19:05:37 specialbrew kernel: [ 426.379825] BTRFS: failed to read the system array on sdj Aug 8 19:05:37 specialbrew kernel: [ 426.403095] BTRFS: open_ctree failed mount: wrong fs type, bad option, bad superblock on /dev/sdj, ---------8<--------------------------------------------------------------------- After boot, sdj is the 500G disk and the remaining 2TB disks are g, h and i: ---------8<--------------------------------------------------------------------- # btrfs fi sh Label: 'tank' uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb Total devices 5 FS bytes used 1.08TB devid 5 size 465.76GB used 465.76GB path /dev/sdj devid 3 size 1.82TB used 555.29GB path /dev/sdh devid 4 size 1.82TB used 555.29GB path /dev/sdi devid 2 size 1.82TB used 555.29GB path /dev/sdg *** Some devices missing ---------8<--------------------------------------------------------------------- Asking to mount via one of the 2TB disks, e.g. mount /dev/sdh /srv/tank produces the same result and still mentions being unable to read the system array on sdj. Running "btrfs check" against any of the disks produces the following result: ---------8<--------------------------------------------------------------------- # btrfs check /dev/sdj Aug 8 19:13:15 specialbrew kernel: [ 884.840987] BTRFS: device label tank devid 2 transid 798058 /dev/sdg Aug 8 19:13:15 specialbrew kernel: [ 885.058896] BTRFS: device label tank devid 4 transid 798058 /dev/sdi Aug 8 19:13:15 specialbrew kernel: [ 885.091042] BTRFS: device label tank devid 3 transid 798058 /dev/sdh Aug 8 19:13:15 specialbrew kernel: [ 885.097790] BTRFS: device label tank devid 5 transid 798058 /dev/sdj Aug 8 19:13:15 specialbrew kernel: [ 885.129491] BTRFS: device label tank devid 2 transid 798058 /dev/sdg Aug 8 19:13:15 specialbrew kernel: [ 885.137456] BTRFS: device label tank devid 4 transid 798058 /dev/sdi Aug 8 19:13:15 specialbrew kernel: [ 885.145731] BTRFS: device label tank devid 3 transid 798058 /dev/sdh Aug 8 19:13:16 specialbrew kernel: [ 885.151907] BTRFS: device label tank devid 5 transid 798058 /dev/sdj warning, device 1 is missing warning, device 1 is missing warning devid 1 not found already Checking filesystem on /dev/sdj UUID: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb checking extents checking free space cache checking fs roots checking csums checking root refs found 49947638987 bytes used err is 0 total csum bytes: 1160389912 total tree bytes: 1439944704 total fs tree bytes: 150958080 total extent tree bytes: 55762944 btree space waste bytes: 69500665 file data blocks allocated: 1570420359168 referenced 1568123219968 Btrfs v0.20-rc1-358-g194aa4a ---------8<--------------------------------------------------------------------- Attempting to mount with -odegraded hangs: ---------8<--------------------------------------------------------------------- # mount -odegraded /srv/tank Aug 8 19:20:58 specialbrew kernel: [ 1347.388182] BTRFS: device label tank devid 5 transid 798058 /dev/sdj Aug 8 19:20:58 specialbrew kernel: [ 1347.628728] BTRFS info (device sdj): allowing degraded mounts Aug 8 19:20:58 specialbrew kernel: [ 1347.633978] BTRFS info (device sdj): disk space caching is enabled Aug 8 19:20:58 specialbrew kernel: [ 1347.725065] BTRFS: bdev (null) errs: wr 122025014, rd 0, flush 293476, corrupt 0, gen 0 Aug 8 19:20:58 specialbrew kernel: [ 1347.730473] BTRFS: bdev /dev/sdg errs: wr 3, rd 8, flush 0, corrupt 0, gen 0 ---------8<--------------------------------------------------------------------- After a while a hung task timeout will appear: ---------8<--------------------------------------------------------------------- Aug 8 19:24:32 specialbrew kernel: [ 1562.105676] INFO: task btrfs-transacti:5162 blocked for more than 120 seconds. Aug 8 19:24:32 specialbrew kernel: [ 1562.111386] Not tainted 3.14-0.bpo.2-amd64 #1 Aug 8 19:24:32 specialbrew kernel: [ 1562.117124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 8 19:24:32 specialbrew kernel: [ 1562.122939] btrfs-transacti D ffff88011fc14340 0 5162 2 0x00000000 Aug 8 19:24:32 specialbrew kernel: [ 1562.128827] ffff8800d29e4550 0000000000000046 ffff8800d29e4550 ffffffff81813480 Aug 8 19:24:32 specialbrew kernel: [ 1562.134791] 0000000000014340 ffff880061b9dfd8 0000000000014340 ffff8800d29e4550 Aug 8 19:24:32 specialbrew kernel: [ 1562.140874] 0000000000000246 ffff88005c955e50 ffff8800d37dd1e8 0000000000000400 Aug 8 19:24:32 specialbrew kernel: [ 1562.146977] Call Trace: Aug 8 19:24:33 specialbrew kernel: [ 1562.153135] [<ffffffffa02f2fb1>] ? wait_current_trans.isra.34+0xa1/0x100 [btrfs] Aug 8 19:24:33 specialbrew kernel: [ 1562.159442] [<ffffffff810a69a0>] ? __wake_up_sync+0x10/0x10 Aug 8 19:24:33 specialbrew kernel: [ 1562.165887] [<ffffffffa02f4718>] ? start_transaction+0x368/0x560 [btrfs] Aug 8 19:24:33 specialbrew kernel: [ 1562.172415] [<ffffffffa02f1683>] ? transaction_kthread+0x73/0x240 [btrfs] Aug 8 19:24:33 specialbrew kernel: [ 1562.179017] [<ffffffffa02f1610>] ? open_ctree+0x20a0/0x20a0 [btrfs] Aug 8 19:24:33 specialbrew kernel: [ 1562.185657] [<ffffffff81086f3c>] ? kthread+0xbc/0xe0 Aug 8 19:24:33 specialbrew kernel: [ 1562.192217] [<ffffffff81086e80>] ? flush_kthread_worker+0xa0/0xa0 Aug 8 19:24:33 specialbrew kernel: [ 1562.198795] [<ffffffff81504f4c>] ? ret_from_fork+0x7c/0xb0 Aug 8 19:24:33 specialbrew kernel: [ 1562.205376] [<ffffffff81086e80>] ? flush_kthread_worker+0xa0/0xa0 Aug 8 19:26:33 specialbrew kernel: [ 1682.211331] INFO: task mount:5140 blocked for more than 120 seconds. Aug 8 19:26:33 specialbrew kernel: [ 1682.218003] Not tainted 3.14-0.bpo.2-amd64 #1 ---------8<--------------------------------------------------------------------- It is not possible to interrupt this mount, so I rebooted. It *is* possible to mount -oro,recovery,degraded: ---------8<--------------------------------------------------------------------- Aug 8 20:09:52 specialbrew kernel: [ 1157.331457] BTRFS: device label tank devid 3 transid 798058 /dev/sdh Aug 8 20:09:52 specialbrew kernel: [ 1157.626651] BTRFS info (device sdj): enabling auto recovery Aug 8 20:09:52 specialbrew kernel: [ 1157.633645] BTRFS info (device sdj): allowing degraded mounts Aug 8 20:09:52 specialbrew kernel: [ 1157.640624] BTRFS info (device sdj): disk space caching is enabled Aug 8 20:09:52 specialbrew kernel: [ 1157.812040] BTRFS: bdev (null) errs: wr 122025014, rd 0, flush 293476, corrupt 0, gen 0 Aug 8 20:09:52 specialbrew kernel: [ 1157.819138] BTRFS: bdev /dev/sdg errs: wr 3, rd 8, flush 0, corrupt 0, gen 0 ---------8<--------------------------------------------------------------------- As the replacement disk arrived today I have now inserted it, put a filesystem on it and am at present taking a local copy of the data via the read-only mount. Until that completes my only backups are remote and loss of this filesystem would require some effort and time to recover from. Once that backup is complete I am willing to try any procedure that you think may help to get this filesystem mounted. I can also upload a btrfs-image file somewhere if anyone is interested. This system is Debian wheezy using the backports kernel. At the time of the "device delete" it was running kernel package 3.13-0 and from the first reboot it has been running 3.14-0.bpo.2-amd64. I could try a mainline kernel if that might help. In terms of hardware resources available, aside from the new disk that is currently being backed up onto, I have another new disk, the old suspect disk that probably can be read from but not written to, and one spare slot in the disk chassis. Thanks, Andy
A file made from: btrfs-image -c9 -t4 /dev/sdj is now available on request. It's 1.2GB in size.
I have compiled a 3.16 kernel and using this kernel I seem to have been able to successfully: mount -odegraded,recovery /srv/tank (i.e. it's now read-write) I was then able to umount and remount again: mount -odegraded /srv/tank So as far as I can determine so far this bug went away somewhere between 3.14 and 3.16. My next tasks will be to: - Insert a new disk - "dev replace" away the 500G one (sdj) for the new device - "dev resize" the new device to use more than 500G Hopefully at this point thevolume mounts without -odegraded, so then: - convert to raid-1 for easier usage with different-sized devices.
I've ran into what I believe is the same bug under 3.14 as well, but have an additional detail to add that could be important for duplicating this. That detail is that you need to break the RAID by pulling the devid 1 disk. That is, I had two disks (devid 1 and devid 2) in a RAID1 configuration. 1 - I powered off, pulled the first, and powered back on. This leaves me in a state where I can mount the second drive with the degraded option. Any attempt to write to it causes the process to hang and the kernel to eventually spits out a backtrace (see further down) though. 2 - I powered off, put back in the first, pulled the second, and powered back on. Writing to this one is fine though. I even converted it back to to a single setup with the "btrfs rebalance" command (strangely converting the metadata also converted they system data, and I also couldn't do the direct raid1 -> dup conversion but had to go raid1 -> single -> dup instead). Here's the "btrfs filesystem show" and "btrfs filesystem df" outputs from the first (the conversion seems to have produced as strange df "unknown" category) Label: none uuid: 455ffac4-ecd1-4b31-b17a-31e54fb0c7a1 Total devices 1 FS bytes used 128.40GiB devid 1 size 146.54GiB used 132.06GiB path /dev/mapper/Debian-Root Btrfs v3.14.1 Data, single: total=130.00GiB, used=127.81GiB System, DUP: total=32.00MiB, used=16.00KiB Metadata, DUP: total=1.00GiB, used=611.12MiB unknown, single: total=208.00MiB, used=0.00 Here's the "btrfs filesystem show" and "btrfs filesystem df" outputs from the second Label: none uuid: 455ffac4-ecd1-4b31-b17a-31e54fb0c7a1 Total devices 21 FS bytes used 130.61GB devid 2 size 449.26GiB used 133.03GiB path /dev/sdb3 *** Some devices missing Btrfs v3.14.1 Data, RAID1: total=131.00GiB, used=130.00GiB System, RAID1: total=32.00MiB, used=48.00KiB Metadata, RAID1: total=2.00GiB, used=622.98MiB Here is also the backtrace I'm seeing from the kernel (there may be some typos in here as I copied this by hand off the screen) INFO: task btrfs-transacti:193 blocked for more than 120 seconds. Not tainted 3.14-1-amd64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. btrfs-transacti D ffff88040a52a528 0 193 2 0x00000000 ffff88040a52a110 0000000000000046 0000000000014380 ffff88040a6d1fd8 0000000000014380 ffff88040a52a110 ffff88041ed94c10 ffff88041efbb708 0000000000000002 ffffffff8111f380 ffff88040a6d1ad0 ffff88040a6d1bb8 Call Trace: [<ffffffff8111f380>] ? wait_on_page_read+0x60/0x60 [<ffffffff814bd1f4>] ? io_schedule+0x94/0x130 [<ffffffff8111f385>] ? sleep_on_page+0x5/0x10 [<ffffffff814bd564>] ? __wait_on_bit+0x54/0x80 [<ffffffff8111f18f>] ? wait_on_page_bit+0x7f/0x90 [<ffffffff8109e390>] ? autoremove_wake_function+0x30/0x30 [<ffffffff8112x248>] ? pagevec_lookup_tag+0x18/0x20 [<ffffffff8111d270>] ? filemap_fdatawait_range+0xd0/0x160 [<ffffffffa0312e05>] ? btrfs_wait_ordered_range+0x65/0x120 [btrfs] [<ffffffffa03391de>] ? __btrfs_write_out_cache+0x6fe/0x8f0 [btrfs] [<ffffffffa03396a9>] ? btrfs_write_out_cache+0x99/0xd0 [btrfs] [<ffffffffa02eaf8e>] ? btrfs_write_dirty_block_groups+0x58e/0x680 [btrfs] [<ffffffffa0364fed>] ? commit_cowonly_roots+0x14b/0x202 [btrfs] [<ffffffffa02fa53a>] ? btrfs_commit_transaction+0x42a/0x990 [btrfs] [<ffffffffa02fab2b>] ? start_transaction+0x8b.0x550 [btrfs] [<ffffffffa02f638d>] ? transaction_kthread+0x1ad/0x240 [btrfs] [<ffffffffa02f61e0>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs] [<ffffffff8107f8e8>] ? kthread+0xb8/0xd0 [<ffffffff8107f830>] ? kthread_create_on_node+0x170/0x170 [<ffffffff814c800c>] ? ret_from_fork+0x7c/0xb0 [<ffffffff8107f830>] ? kthread_create_on_node+0x170/0x170
I would add that booting a 3.16 kernel restored R/W access to the system for me too in degraded mode when devid 1 is missing.
This is a semi-automated bugzilla cleanup, report is against an old kernel version. If the problem still happens, please open a new bug. Thanks.