Created attachment 134651 [details] dmesg with sysrq-w 0. # mkfs.btrfs -d raid1 -m raid1 /dev/sd[bcd] # mount /dev/sdb /mnt ## default mount options 1. Populate with data until one of three devices is nearly full, but no enospc has occurred. 2. Poweroff, remove /dev/sdb (devid 1), replace with new device 3. Poweron 4. # mount /dev/sdc -o degraded # btrfs device add /dev/sdb /mnt Actual result: Performing full device TRIM (6.00GiB) … ...hang... does not recover after 120 minutes From another shell: # btrfs fi show Label: none uuid: 7387b661-4e72-45e4-a0a1-e6c4ad460164 Total devices 4 FS bytes used 7.90GiB devid 2 size 6.00GiB used 5.99GiB path /dev/sdc devid 3 size 6.00GiB used 5.99GiB path /dev/sdd devid 4 size 6.00GiB used 0.00 path /dev/sdb *** Some devices missing Shows the add has occurred, but inspection of /dev/sdb shows no Btrfs metadata has been written. So this status is not yet committed to disk. Issued sysrq-w and collected dmesg, will attach full output. Partial output of blocked task: [ 360.370589] INFO: task btrfs-transacti:1162 blocked for more than 120 seconds. [ 360.374966] Not tainted 3.15.0-0.rc3.git3.1.fc21.x86_64 #1 [ 360.375911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 360.377151] btrfs-transacti D ffff88008571cd40 6456 1162 2 0x00000080 [ 360.379152] ffff880085769d20 0000000000000046 ffff88008571cd40 00000000001d5dc0 [ 360.379915] ffff880085769fd8 00000000001d5dc0 ffff880099194d40 ffff8800966ea210 [ 360.380839] ffff8800854331e8 ffff880085433000 ffff8800854331e8 0000000000000000 [ 360.381747] Call Trace: [ 360.382674] [<ffffffff817ec479>] schedule+0x29/0x70 [ 360.383594] [<ffffffffa00ca01f>] wait_current_trans.isra.19+0xcf/0x120 [btrfs] [ 360.386590] [<ffffffff810eed10>] ? abort_exclusive_wait+0xb0/0xb0 [ 360.390772] [<ffffffffa00cb988>] start_transaction+0x3f8/0x650 [btrfs] [ 360.391981] [<ffffffffa00cbc97>] btrfs_attach_transaction+0x17/0x20 [btrfs] [ 360.395811] [<ffffffffa00c6b5e>] transaction_kthread+0x1de/0x290 [btrfs] [ 360.396996] [<ffffffffa00c6980>] ? btrfs_cleanup_transaction+0x630/0x630 [btrfs] [ 360.398178] [<ffffffff810c3348>] kthread+0x108/0x120 [ 360.398880] [<ffffffff810dbaa8>] ? sched_clock_cpu+0x98/0xc0 [ 360.399741] [<ffffffff810f8f6d>] ? trace_hardirqs_on_caller+0x15d/0x200 [ 360.400605] [<ffffffff810c3240>] ? insert_kthread_work+0x80/0x80 [ 360.401523] [<ffffffff817fc97c>] ret_from_fork+0x7c/0xb0 [ 360.402495] [<ffffffff810c3240>] ? insert_kthread_work+0x80/0x80 [ 360.403445] 1 lock held by btrfs-transacti/1162: [ 360.407557] #0: (&fs_info->transaction_kthread_mutex){+.+...}, at: [<ffffffffa00c6a08>] transaction_kthread+0x88/0x290 [btrfs] [ 360.409707] INFO: task btrfs:1165 blocked for more than 120 seconds. [ 360.410962] Not tainted 3.15.0-0.rc3.git3.1.fc21.x86_64 #1 [ 360.411851] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 360.412709] btrfs D ffff880089053380 2624 1165 997 0x00000080 [ 360.413599] ffff8800850b57f8 0000000000000046 ffff880089053380 00000000001d5dc0 [ 360.414517] ffff8800850b5fd8 00000000001d5dc0 ffff880099194d40 ffff88009d9d6758 [ 360.415449] ffff88009e3e34a8 0000000000000002 ffffffff811b0850 ffff8800850b5870 [ 360.416307] Call Trace: [ 360.416983] [<ffffffff811b0850>] ? wait_on_page_read+0x60/0x60 [ 360.418657] [<ffffffff817ec80f>] io_schedule+0xaf/0x150 [ 360.419515] [<ffffffff811b085e>] sleep_on_page+0xe/0x20 [ 360.420360] [<ffffffff817ecc43>] __wait_on_bit+0x83/0xa0 [ 360.421205] [<ffffffff811b1ab2>] ? find_get_pages_tag+0x2d2/0x370 [ 360.421857] [<ffffffff811b05e9>] wait_on_page_bit+0x89/0xa0 [ 360.422667] [<ffffffff810eed50>] ? autoremove_wake_function+0x40/0x40 [ 360.423466] [<ffffffff811b06f9>] filemap_fdatawait_range+0xf9/0x1c0 [ 360.424707] [<ffffffffa00e6a3c>] btrfs_wait_ordered_range+0x6c/0x150 [btrfs] [ 360.429249] [<ffffffffa01108a0>] __btrfs_write_out_cache+0x640/0x860 [btrfs] [ 360.429890] [<ffffffffa011144d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs] [ 360.430742] [<ffffffffa00ba1f1>] btrfs_write_dirty_block_groups+0x691/0x730 [btrfs] [ 360.431650] [<ffffffffa0142539>] commit_cowonly_roots+0x17a/0x234 [btrfs] [ 360.432456] [<ffffffffa00caf5b>] btrfs_commit_transaction+0x44b/0xa80 [btrfs] [ 360.433271] [<ffffffffa00fa734>] btrfs_init_new_device+0x884/0xeb0 [btrfs] [ 360.433903] [<ffffffff811dfcae>] ? might_fault+0x5e/0xc0 [ 360.434683] [<ffffffff811dfd09>] ? might_fault+0xb9/0xc0 [ 360.435545] [<ffffffffa010392c>] btrfs_ioctl+0xc4c/0x2aa0 [btrfs] [ 360.436334] [<ffffffff8135e8f2>] ? avc_has_perm+0x142/0x350 [ 360.436919] [<ffffffff810dbaa8>] ? sched_clock_cpu+0x98/0xc0 [ 360.437688] [<ffffffff810dbb26>] ? local_clock+0x16/0x30 [ 360.438589] [<ffffffff810f5a7f>] ? lock_release_holdtime.part.28+0xf/0x200 [ 360.439359] [<ffffffff8135e90a>] ? avc_has_perm+0x15a/0x350 [ 360.439930] [<ffffffff8135e7e4>] ? avc_has_perm+0x34/0x350 [ 360.440638] [<ffffffff81360f01>] ? inode_has_perm.isra.47+0x51/0x90 [ 360.441367] [<ffffffff812513e0>] do_vfs_ioctl+0x2f0/0x520 [ 360.441926] [<ffffffff81251691>] SyS_ioctl+0x81/0xa0 [ 360.443840] [<ffffffff81155eac>] ? __audit_syscall_entry+0x9c/0xf0 [ 360.447746] [<ffffffff817fca29>] system_call_fastpath+0x16/0x1b [ 360.449563] 4 locks held by btrfs/1165: [ 360.450279] #0: (&fs_info->volume_mutex){+.+.+.}, at: [<ffffffffa01038fe>] btrfs_ioctl+0xc1e/0x2aa0 [btrfs] [ 360.450874] #1: (sb_internal){.+.+..}, at: [<ffffffffa00cb9ed>] start_transaction+0x45d/0x650 [btrfs] [ 360.451686] #2: (&fs_info->reloc_mutex){+.+...}, at: [<ffffffffa00cae8d>] btrfs_commit_transaction+0x37d/0xa80 [btrfs] [ 360.452462] #3: (&fs_info->tree_log_mutex){+.+...}, at: [<ffffffffa00caf01>] btrfs_commit_transaction+0x3f1/0xa80 [btrfs] Expected result: Device should be added, or I should get an error message. Regression: This command works: # btrfs replace start 1 /dev/sdb /mnt The missing device is removed from fi show, and the replacement is rebuilt correctly. No hang. Volume works as expected.
Originally reported on Btrfs list with kernel 3.14.1-1-ARCH #1 SMP PREEMPT Mon Apr 14 20:40:47 CEST 2014 x86_64 GNU/Linux; and btrfs-progs 3.14. http://www.spinics.net/lists/linux-btrfs/msg33677.html
This is reproduced on a VirtualBox 4.3.10 VM, so it's strange to see the device trim message on add. Although there is also a device trim message when doing the mkfs. No change with this command, it still hangs. btrfs device add -Kf /dev/sdb /mnt
Created attachment 135041 [details] dmesg selinux=0 I'm seeing what might be a similar problem (similar blocked task messages anyway) with mdadm trying to stop an array. Bug 75451. The similarity made me wonder if this is an selinux problem, so I changed boot param from enforcing=0 to selinux=0. Problem is still reproducible. This dmesg is with kernel-3.15.0-0.rc3.git0.1.fc21.x86_64 (non-debug), selinux=0, and includes sysrq-w.
Created attachment 135051 [details] console messages, hang/failure to power off Bumping importance because clean unmount of other volumes isn't possible, hang at reboot and poweroff. Some subvolumes are failing to unmount.
Created attachment 154921 [details] another dmesg with sysrq-w I experienced the same with 3.14.9-1 on Debian. Accidentally wiped a disk so was missing from the raid1 pair on boot, re-added it and the add command hangs, doing no disk io. I power cycled and reproduce this several times. Attached sysrq-w output. But I upgraded to 3.16.3-2 (userspace tools too) and it's fixed. I added the disk succesfully and was able to remove the missing one.