Bug 81981
Summary: | mount hangs after forced reboot from "device delete" | ||
---|---|---|---|
Product: | File System | Reporter: | Andy Smith (andy-bugzilla.kernel.org) |
Component: | btrfs | Assignee: | Josef Bacik (josef) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, andy-bugzilla.kernel.org, dsterba, szg00000, twhitehead |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.14-0-bp | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Andy Smith
2014-08-08 21:05:18 UTC
A file made from "btrfs-image -c9 -t4 /dev/sdj" is now available on request. It's 1.2GB in size.

I have compiled a 3.16 kernel, and using this kernel I seem to have been able to successfully run:

mount -odegraded,recovery /srv/tank

(i.e. it's now read-write). I was then able to umount and remount again:

mount -odegraded /srv/tank

So as far as I can determine so far, this bug went away somewhere between 3.14 and 3.16. My next tasks will be to:

- Insert a new disk
- "dev replace" away the 500G one (sdj) for the new device
- "dev resize" the new device to use more than 500G

Hopefully at this point the volume mounts without -odegraded, so then:

- Convert to raid-1 for easier usage with different-sized devices.

I've run into what I believe is the same bug under 3.14 as well, but have an additional detail that could be important for reproducing it: you need to break the RAID by pulling the devid 1 disk. That is, I had two disks (devid 1 and devid 2) in a RAID1 configuration.

1 - I powered off, pulled the first, and powered back on. This leaves me in a state where I can mount the second drive with the degraded option. Any attempt to write to it, though, causes the process to hang and the kernel to eventually spit out a backtrace (see further down).

2 - I powered off, put back in the first, pulled the second, and powered back on. Writing to this one is fine. I even converted it back to a single-device setup with the "btrfs balance" command (strangely, converting the metadata also converted the system data, and I also couldn't do the direct raid1 -> dup conversion but had to go raid1 -> single -> dup instead).
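The raid1 -> single -> dup sequence from the second case above could be scripted roughly as follows. This is a minimal sketch, not the commenter's actual commands. The device path /dev/sdX, the mount point /mnt/hypothetical, the DRY_RUN switch and the run helper are hypothetical. The "btrfs device delete missing" step is an assumption about how the missing devid 2 was dropped; the later "btrfs filesystem show" output lists only one device. With DRY_RUN=1, the default, the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical sketch: turn a degraded two-device btrfs RAID1 into a
# single-device filesystem with DUP metadata (raid1 -> single -> dup).
# MNT, the device path and DRY_RUN are assumptions, not from the report.
set -eu

MNT=${MNT:-/mnt/hypothetical}   # hypothetical mount point
DRY_RUN=${DRY_RUN:-1}           # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# 1. Mount the surviving device degraded (/dev/sdX is a placeholder).
run mount -o degraded /dev/sdX "$MNT"

# 2. Convert data and metadata to single. The kernel refuses to lower
#    metadata redundancy without -f. Because no -s filter is given,
#    btrfs-progs applies the -m filter to system chunks as well.
run btrfs balance start -f -dconvert=single -mconvert=single "$MNT"

# 3. Assumed step: forget the missing device so only one device remains.
run btrfs device delete missing "$MNT"

# 4. With a single device, metadata (and system) can be converted to DUP.
run btrfs balance start -mconvert=dup "$MNT"
```

Running it with DRY_RUN=0 and MNT set to the real mount point would execute the steps. Checking "btrfs filesystem df" between steps shows whether each conversion took effect.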
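The replace/resize/convert plan from the first comment might look something like this with btrfs-progs. This is a hedged sketch that assumes the filesystem is mounted at /srv/tank, as in the report. The new disk path /dev/sdX, the DEVID placeholder and the run/DRY_RUN helper are hypothetical. A replacement device takes over the devid of the device it replaced, so DEVID should be set to sdj's devid as shown by "btrfs filesystem show".

```shell
#!/bin/sh
# Hypothetical sketch of the plan: replace /dev/sdj with a new disk, grow
# the new disk to its full size, then convert to RAID1.
# NEW_DEV, NEW_DEVID and DRY_RUN are assumptions, not from the report.
set -eu

MNT=${MNT:-/srv/tank}
OLD_DEV=${OLD_DEV:-/dev/sdj}
NEW_DEV=${NEW_DEV:-/dev/sdX}     # hypothetical new disk
NEW_DEVID=${NEW_DEVID:-DEVID}    # placeholder: set to the devid of $OLD_DEV
DRY_RUN=${DRY_RUN:-1}            # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# 1. Copy everything from the old device to the new one. -B keeps the
#    command in the foreground until the replace finishes.
run btrfs replace start -B "$OLD_DEV" "$NEW_DEV" "$MNT"

# 2. The new device initially uses only the old device's size; grow it.
run btrfs filesystem resize "$NEW_DEVID":max "$MNT"

# 3. Convert data and metadata (and, implicitly, system chunks) to RAID1.
run btrfs balance start -dconvert=raid1 -mconvert=raid1 "$MNT"
```

Per the comment, a plain remount without -odegraded between steps 2 and 3 is a useful check that the volume is healthy before the conversion.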
Here are the "btrfs filesystem show" and "btrfs filesystem df" outputs from the first (the conversion seems to have produced a strange "unknown" category in the df output):

Label: none uuid: 455ffac4-ecd1-4b31-b17a-31e54fb0c7a1
	Total devices 1 FS bytes used 128.40GiB
	devid 1 size 146.54GiB used 132.06GiB path /dev/mapper/Debian-Root

Btrfs v3.14.1

Data, single: total=130.00GiB, used=127.81GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=611.12MiB
unknown, single: total=208.00MiB, used=0.00

Here are the "btrfs filesystem show" and "btrfs filesystem df" outputs from the second:

Label: none uuid: 455ffac4-ecd1-4b31-b17a-31e54fb0c7a1
	Total devices 21 FS bytes used 130.61GB
	devid 2 size 449.26GiB used 133.03GiB path /dev/sdb3
	*** Some devices missing

Btrfs v3.14.1

Data, RAID1: total=131.00GiB, used=130.00GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=2.00GiB, used=622.98MiB

Here is also the backtrace I'm seeing from the kernel (there may be some typos in here, as I copied this by hand off the screen):

INFO: task btrfs-transacti:193 blocked for more than 120 seconds.
      Not tainted 3.14-1-amd64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D ffff88040a52a528 0 193 2 0x00000000
 ffff88040a52a110 0000000000000046 0000000000014380 ffff88040a6d1fd8
 0000000000014380 ffff88040a52a110 ffff88041ed94c10 ffff88041efbb708
 0000000000000002 ffffffff8111f380 ffff88040a6d1ad0 ffff88040a6d1bb8
Call Trace:
 [<ffffffff8111f380>] ? wait_on_page_read+0x60/0x60
 [<ffffffff814bd1f4>] ? io_schedule+0x94/0x130
 [<ffffffff8111f385>] ? sleep_on_page+0x5/0x10
 [<ffffffff814bd564>] ? __wait_on_bit+0x54/0x80
 [<ffffffff8111f18f>] ? wait_on_page_bit+0x7f/0x90
 [<ffffffff8109e390>] ? autoremove_wake_function+0x30/0x30
 [<ffffffff8112x248>] ? pagevec_lookup_tag+0x18/0x20
 [<ffffffff8111d270>] ? filemap_fdatawait_range+0xd0/0x160
 [<ffffffffa0312e05>] ? btrfs_wait_ordered_range+0x65/0x120 [btrfs]
 [<ffffffffa03391de>] ? __btrfs_write_out_cache+0x6fe/0x8f0 [btrfs]
 [<ffffffffa03396a9>] ? btrfs_write_out_cache+0x99/0xd0 [btrfs]
 [<ffffffffa02eaf8e>] ? btrfs_write_dirty_block_groups+0x58e/0x680 [btrfs]
 [<ffffffffa0364fed>] ? commit_cowonly_roots+0x14b/0x202 [btrfs]
 [<ffffffffa02fa53a>] ? btrfs_commit_transaction+0x42a/0x990 [btrfs]
 [<ffffffffa02fab2b>] ? start_transaction+0x8b.0x550 [btrfs]
 [<ffffffffa02f638d>] ? transaction_kthread+0x1ad/0x240 [btrfs]
 [<ffffffffa02f61e0>] ? btrfs_cleanup_transaction+0x510/0x510 [btrfs]
 [<ffffffff8107f8e8>] ? kthread+0xb8/0xd0
 [<ffffffff8107f830>] ? kthread_create_on_node+0x170/0x170
 [<ffffffff814c800c>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff8107f830>] ? kthread_create_on_node+0x170/0x170

I would add that booting a 3.16 kernel restored R/W access to the system for me too, in degraded mode with devid 1 missing.

This is a semi-automated bugzilla cleanup; the report is against an old kernel version. If the problem still happens, please open a new bug. Thanks.