With lastest 4.14.x kernel, BTRFS hangs randomly with this type of error: [Wed Jun 19 09:27:44 2019] INFO: task btrfs-cleaner:733 blocked for more than 1200 seconds. [Wed Jun 19 09:27:44 2019] Tainted: G O 4.14.125 #1 [Wed Jun 19 09:27:44 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 19 09:27:44 2019] btrfs-cleaner D 0 733 2 0x80000000 [Wed Jun 19 09:27:44 2019] Call Trace: [Wed Jun 19 09:27:44 2019] ? __schedule+0x292/0x8b0 [Wed Jun 19 09:27:44 2019] schedule+0x39/0x90 [Wed Jun 19 09:27:44 2019] btrfs_tree_lock+0xc3/0x1c0 [btrfs] [Wed Jun 19 09:27:44 2019] ? wait_woken+0x80/0x80 [Wed Jun 19 09:27:44 2019] do_walk_down+0xe7/0x580 [btrfs] [Wed Jun 19 09:27:44 2019] walk_down_tree+0xac/0xd0 [btrfs] [Wed Jun 19 09:27:44 2019] btrfs_drop_snapshot+0x393/0x880 [btrfs] [Wed Jun 19 09:27:44 2019] ? __schedule+0x29a/0x8b0 [Wed Jun 19 09:27:44 2019] ? btrfs_destroy_inode+0x1a9/0x290 [btrfs] [Wed Jun 19 09:27:44 2019] btrfs_clean_one_deleted_snapshot+0xaf/0x100 [btrfs] [Wed Jun 19 09:27:44 2019] cleaner_kthread+0x10d/0x130 [btrfs] [Wed Jun 19 09:27:44 2019] kthread+0xff/0x140 [Wed Jun 19 09:27:44 2019] ? btree_invalidatepage+0x90/0x90 [btrfs] [Wed Jun 19 09:27:44 2019] ? kthread_create_on_node+0x40/0x40 [Wed Jun 19 09:27:44 2019] ret_from_fork+0x1f/0x40 [Wed Jun 19 09:27:44 2019] INFO: task btrfs-transacti:734 blocked for more than 1200 seconds. [Wed Jun 19 09:27:44 2019] Tainted: G O 4.14.125 #1 [Wed Jun 19 09:27:44 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 19 09:27:44 2019] btrfs-transacti D 0 734 2 0x80000000 [Wed Jun 19 09:27:44 2019] Call Trace: [Wed Jun 19 09:27:44 2019] ? __schedule+0x292/0x8b0 [Wed Jun 19 09:27:44 2019] schedule+0x39/0x90 [Wed Jun 19 09:27:44 2019] wait_current_trans+0x8c/0xd0 [btrfs] [Wed Jun 19 09:27:44 2019] ? wait_woken+0x80/0x80 [Wed Jun 19 09:27:44 2019] start_transaction+0x200/0x430 [btrfs] [Wed Jun 19 09:27:44 2019] transaction_kthread+0x142/0x1b0 [btrfs] [Wed Jun 19 09:27:44 2019] kthread+0xff/0x140 [Wed Jun 19 09:27:44 2019] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs] [Wed Jun 19 09:27:44 2019] ? kthread_create_on_node+0x40/0x40 [Wed Jun 19 09:27:44 2019] ret_from_fork+0x1f/0x40 After debugging, I think that the bug is caused by this commit (kernel 4.14.123): commit d819d97ea025f8c32c12adef0ff55b2c7bf5c853 Author: Josef Bacik <josef@toxicpanda.com> Date: Wed Jan 16 11:00:57 2019 -0500 btrfs: honor path->skip_locking in backref code commit 38e3eebff643db725633657d1d87a3be019d1018 upstream. The previous calls to btrfs_tree_read_lock and btrfs_set_lock_blocking_rw were not removed by the patch : diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index b517ef1477ea..d826fbaf7d50 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1281,9 +1292,14 @@ again: } btrfs_tree_read_lock(eb); btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); + if (!path->skip_locking) { + btrfs_tree_read_lock(eb); + btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK); + } ret = find_extent_in_eb(eb, bytenr, *extent_item_pos, &eie); - btrfs_tree_read_unlock_blocking(eb); + if (!path->skip_locking) + btrfs_tree_read_unlock_blocking(eb); free_extent_buffer(eb); if (ret < 0) goto out; I currently testing a the kernel 4.14.130 with the two lines in excess removed and for now the bug seems gone.
Posted fix for this: https://lore.kernel.org/stable/20190708120130.GA25587@redhat.com/T/#u