Bug 119541

Summary: btrfs-cleaner causes processes to hang for a long time
Product: File System Reporter: mark
Component: btrfsAssignee: Josef Bacik (josef)
Status: RESOLVED OBSOLETE    
Severity: high CC: dsterba
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.4.11 Subsystem:
Regression: No Bisected commit-id:
Attachments: a few sysreq w's during lockups

Description mark 2016-06-02 19:44:59 UTC
I have a laptop with a 500gb btrfs filesystem on an SSD. I am using snapper to generate hourly/daily snapshots - roughly 10 snaps stored over the past week. When the snapshots get deleted and the btrfs-cleaner process runs, the system periodically freezes for long periods of time with the disk reading at ~100mb/sec and kworker or btrfs-cleaner running at 100% OR one of my userspace processes running at 100% of system CPU usage. I reduced the hung task blocker to 20sec to produce the following dump - the system is responsive but hangs the process the moment it tries to IO to the disk during this period. This period of ~20-30sec IO hangs lasts for about 10 minutes. When snapshot deletion is not occurring the system runs really smoothly; no IO issues or anything.

I recently upgraded from ubuntu 14.04 lts on a 4.2.6 kernel which did not have these hangs. This issue occurs on ubuntu 16.04 lts standard kernel which I believe is 4.4.8, and also with this kernel build: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.11-xenial/

Mark

[27914.006195] INFO: task btrfs-transacti:317 blocked for more than 20 seconds.
[27914.006199]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006199] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006200] btrfs-transacti D ffff880270383da8     0   317      2 0x00000000
[27914.006203]  ffff880270383da8 ffff880270383df0 ffffffff81e11500 ffff8800358e3b00
[27914.006205]  ffff880270384000 ffff880277a4d1f0 ffff880277a4d000 ffff880277a4d1f0
[27914.006206]  0000000000000000 ffff880270383dc0 ffffffff81804245 ffff88023e009b00
[27914.006208] Call Trace:
[27914.006214]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006245]  [<ffffffffc0327fb3>] wait_current_trans.isra.21+0xd3/0x120 [btrfs]
[27914.006248]  [<ffffffff810c2860>] ? wake_atomic_t_function+0x60/0x60
[27914.006257]  [<ffffffffc032988b>] start_transaction+0x27b/0x4c0 [btrfs]
[27914.006265]  [<ffffffffc0329b87>] btrfs_attach_transaction+0x17/0x20 [btrfs]
[27914.006274]  [<ffffffffc0324082>] transaction_kthread+0x82/0x240 [btrfs]
[27914.006281]  [<ffffffffc0324000>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[27914.006284]  [<ffffffff8109f428>] kthread+0xd8/0xf0
[27914.006286]  [<ffffffff8109f350>] ? kthread_create_on_node+0x1a0/0x1a0
[27914.006288]  [<ffffffff818086cf>] ret_from_fork+0x3f/0x70
[27914.006289]  [<ffffffff8109f350>] ? kthread_create_on_node+0x1a0/0x1a0
[27914.006301] INFO: task cups-browsed:1472 blocked for more than 20 seconds.
[27914.006302]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006303] cups-browsed    D ffff880075a2fb40     0  1472      1 0x00000000
[27914.006305]  ffff880075a2fb40 0001880000000001 ffffffff81e11500 ffff88027744bb00
[27914.006306]  ffff880075a30000 ffff880277f7a9f0 ffff880277f7a800 ffff880277f7a9f0
[27914.006307]  0000000000000001 ffff880075a2fb58 ffffffff81804245 ffff88023e009b00
[27914.006309] Call Trace:
[27914.006310]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006318]  [<ffffffffc0327fb3>] wait_current_trans.isra.21+0xd3/0x120 [btrfs]
[27914.006320]  [<ffffffff810c2860>] ? wake_atomic_t_function+0x60/0x60
[27914.006327]  [<ffffffffc03298db>] start_transaction+0x2cb/0x4c0 [btrfs]
[27914.006334]  [<ffffffffc0329ae8>] btrfs_start_transaction+0x18/0x20 [btrfs]
[27914.006342]  [<ffffffffc0334f12>] btrfs_setattr+0x182/0x360 [btrfs]
[27914.006344]  [<ffffffff81226bf5>] notify_change+0x235/0x360
[27914.006346]  [<ffffffff81207c05>] do_truncate+0x75/0xc0
[27914.006348]  [<ffffffff8121811a>] path_openat+0x27a/0x1330
[27914.006349]  [<ffffffff8121a3c1>] do_filp_open+0x91/0x100
[27914.006351]  [<ffffffff81227b36>] ? __alloc_fd+0x46/0x190
[27914.006352]  [<ffffffff8120917a>] do_sys_open+0x13a/0x230
[27914.006353]  [<ffffffff8120928e>] SyS_open+0x1e/0x20
[27914.006355]  [<ffffffff81808332>] entry_SYSCALL_64_fastpath+0x16/0x71
[27914.006413] INFO: task kded5:3570 blocked for more than 20 seconds.
[27914.006414]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006415] kded5           D ffff880244203b40     0  3570   3560 0x00000000
[27914.006417]  ffff880244203b40 0001880200000001 ffff880277da1d80 ffff88024fcb2c40
[27914.006418]  ffff880244204000 ffff880075f889f0 ffff880075f88800 ffff880075f889f0
[27914.006419]  0000000000000001 ffff880244203b58 ffffffff81804245 ffff88023e009b00
[27914.006421] Call Trace:
[27914.006422]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006430]  [<ffffffffc0327fb3>] wait_current_trans.isra.21+0xd3/0x120 [btrfs]
[27914.006431]  [<ffffffff810c2860>] ? wake_atomic_t_function+0x60/0x60
[27914.006438]  [<ffffffffc03298db>] start_transaction+0x2cb/0x4c0 [btrfs]
[27914.006445]  [<ffffffffc0329ae8>] btrfs_start_transaction+0x18/0x20 [btrfs]
[27914.006453]  [<ffffffffc0334f12>] btrfs_setattr+0x182/0x360 [btrfs]
[27914.006454]  [<ffffffff81226bf5>] notify_change+0x235/0x360
[27914.006455]  [<ffffffff81207c05>] do_truncate+0x75/0xc0
[27914.006456]  [<ffffffff81207942>] ? do_dentry_open+0x272/0x310
[27914.006458]  [<ffffffff8121811a>] path_openat+0x27a/0x1330
[27914.006459]  [<ffffffff812197e1>] ? filename_lookup+0xf1/0x180
[27914.006461]  [<ffffffff8121a3c1>] do_filp_open+0x91/0x100
[27914.006462]  [<ffffffff81227b36>] ? __alloc_fd+0x46/0x190
[27914.006463]  [<ffffffff8120917a>] do_sys_open+0x13a/0x230
[27914.006465]  [<ffffffff8120928e>] SyS_open+0x1e/0x20
[27914.006466]  [<ffffffff81808332>] entry_SYSCALL_64_fastpath+0x16/0x71
[27914.006477] INFO: task Thread (pooled):32343 blocked for more than 20 seconds.
[27914.006478]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006478] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006479] Thread (pooled) D ffff88022ac37b98     0 32343   3560 0x00000000
[27914.006480]  ffff88022ac37b98 0001880200000001 ffff880277da2c40 ffff8801438b49c0
[27914.006482]  ffff88022ac38000 ffff880075f889f0 ffff880075f88800 ffff880075f889f0
[27914.006483]  0000000000000001 ffff88022ac37bb0 ffffffff81804245 ffff88023e009b00
[27914.006484] Call Trace:
[27914.006485]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006493]  [<ffffffffc0327fb3>] wait_current_trans.isra.21+0xd3/0x120 [btrfs]
[27914.006495]  [<ffffffff810c2860>] ? wake_atomic_t_function+0x60/0x60
[27914.006501]  [<ffffffffc03298db>] start_transaction+0x2cb/0x4c0 [btrfs]
[27914.006508]  [<ffffffffc0329ae8>] btrfs_start_transaction+0x18/0x20 [btrfs]
[27914.006516]  [<ffffffffc03385ca>] btrfs_create+0x4a/0x210 [btrfs]
[27914.006517]  [<ffffffff812158e5>] vfs_create+0xc5/0x120
[27914.006519]  [<ffffffff812190ac>] path_openat+0x120c/0x1330
[27914.006534]  [<ffffffff8118b7b9>] ? unlock_page+0x69/0x70
[27914.006536]  [<ffffffff8118ba34>] ? filemap_map_pages+0x224/0x230
[27914.006538]  [<ffffffff8121a3c1>] do_filp_open+0x91/0x100
[27914.006540]  [<ffffffff81227b36>] ? __alloc_fd+0x46/0x190
[27914.006541]  [<ffffffff8120917a>] do_sys_open+0x13a/0x230
[27914.006543]  [<ffffffff8120928e>] SyS_open+0x1e/0x20
[27914.006545]  [<ffffffff81808332>] entry_SYSCALL_64_fastpath+0x16/0x71
[27914.006612] INFO: task BrowserBlocking:27381 blocked for more than 20 seconds.
[27914.006613]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006614] BrowserBlocking D ffff8801cb17fb98     0 27381      1 0x00000000
[27914.006616]  ffff8801cb17fb98 0001880100000001 ffffffff81e11500 ffff880276783b00
[27914.006617]  ffff8801cb180000 ffff880075f889f0 ffff880075f88800 ffff880075f889f0
[27914.006618]  0000000000000001 ffff8801cb17fbb0 ffffffff81804245 ffff88023e009b00
[27914.006620] Call Trace:
[27914.006621]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006629]  [<ffffffffc0327fb3>] wait_current_trans.isra.21+0xd3/0x120 [btrfs]
[27914.006631]  [<ffffffff810c2860>] ? wake_atomic_t_function+0x60/0x60
[27914.006637]  [<ffffffffc03298db>] start_transaction+0x2cb/0x4c0 [btrfs]
[27914.006643]  [<ffffffffc0329ae8>] btrfs_start_transaction+0x18/0x20 [btrfs]
[27914.006651]  [<ffffffffc03385ca>] btrfs_create+0x4a/0x210 [btrfs]
[27914.006652]  [<ffffffff812158e5>] vfs_create+0xc5/0x120
[27914.006654]  [<ffffffff812190ac>] path_openat+0x120c/0x1330
[27914.006655]  [<ffffffff810aac12>] ? wake_up_q+0x32/0x70
[27914.006657]  [<ffffffff8121a3c1>] do_filp_open+0x91/0x100
[27914.006659]  [<ffffffff81101607>] ? do_futex+0x307/0x540
[27914.006660]  [<ffffffff81209ab9>] ? __vfs_write+0xc9/0x110
[27914.006662]  [<ffffffff81227b36>] ? __alloc_fd+0x46/0x190
[27914.006663]  [<ffffffff8120917a>] do_sys_open+0x13a/0x230
[27914.006678]  [<ffffffff8120928e>] SyS_open+0x1e/0x20
[27914.006679]  [<ffffffff81808332>] entry_SYSCALL_64_fastpath+0x16/0x71
[27914.006681] INFO: task Chrome_CacheThr:27386 blocked for more than 20 seconds.
[27914.006682]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006682] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006683] Chrome_CacheThr D ffff88024af03b20     0 27386      1 0x00000000
[27914.006684]  ffff88024af03b20 ffff88024af03b70 ffff880277da49c0 ffff88006a9f6740
[27914.006686]  ffff88024af04000 ffff880075f889f0 ffff880075f88800 ffff880075f889f0
[27914.006687]  0000000000000001 ffff88024af03b38 ffffffff81804245 ffff88023e009b00
[27914.006688] Call Trace:
[27914.006689]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006696]  [<ffffffffc0327fb3>] wait_current_trans.isra.21+0xd3/0x120 [btrfs]
[27914.006711]  [<ffffffff810c2860>] ? wake_atomic_t_function+0x60/0x60
[27914.006717]  [<ffffffffc032988b>] start_transaction+0x27b/0x4c0 [btrfs]
[27914.006737]  [<ffffffffc0329b27>] btrfs_join_transaction+0x17/0x20 [btrfs]
[27914.006747]  [<ffffffffc032f1a5>] btrfs_dirty_inode+0x35/0xd0 [btrfs]
[27914.006756]  [<ffffffffc032f2bf>] btrfs_update_time+0x7f/0xc0 [btrfs]
[27914.006771]  [<ffffffff81224c3d>] file_update_time+0xbd/0x110
[27914.006778]  [<ffffffffc0339759>] btrfs_page_mkwrite+0xc9/0x380 [btrfs]
[27914.006780]  [<ffffffff810c21a2>] ? __wake_up_common+0x52/0x90
[27914.006782]  [<ffffffff811b9b46>] do_page_mkwrite+0x56/0xb0
[27914.006783]  [<ffffffff811bbef8>] do_wp_page+0x158/0x6c0
[27914.006785]  [<ffffffff8118b8de>] ? filemap_map_pages+0xce/0x230
[27914.006787]  [<ffffffff811bd6ed>] handle_mm_fault+0xcdd/0x1820
[27914.006789]  [<ffffffff8106a537>] __do_page_fault+0x197/0x400
[27914.006790]  [<ffffffff8106a7c2>] do_page_fault+0x22/0x30
[27914.006791]  [<ffffffff8180a4b8>] page_fault+0x28/0x30
[27914.006792] INFO: task Chrome_IOThread:27387 blocked for more than 20 seconds.
[27914.006793]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006794] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006794] Chrome_IOThread D ffff8801dc353de8     0 27387      1 0x00000000
[27914.006796]  ffff8801dc353de8 00000000214ad60e ffff880277da49c0 ffff88006f4149c0
[27914.006797]  ffff8801dc354000 ffff880118b9d2e8 ffff880118b9d300 ffff8801dc353f58
[27914.006798]  ffff88006f4149c0 ffff8801dc353e00 ffffffff81804245 ffff88006f4149c0
[27914.006799] Call Trace:
[27914.006801]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006802]  [<ffffffff818070c0>] rwsem_down_read_failed+0xe0/0x140
[27914.006804]  [<ffffffff813de924>] call_rwsem_down_read_failed+0x14/0x30
[27914.006805]  [<ffffffff81806690>] ? down_read+0x20/0x30
[27914.006807]  [<ffffffff8106a715>] __do_page_fault+0x375/0x400
[27914.006808]  [<ffffffff8106a7c2>] do_page_fault+0x22/0x30
[27914.006808]  [<ffffffff8180a4b8>] page_fault+0x28/0x30
[27914.006810] INFO: task CompositorTileW:27389 blocked for more than 20 seconds.
[27914.006811]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006811] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006812] CompositorTileW D ffff88017e677df8     0 27389      1 0x00000000
[27914.006813]  ffff88017e677df8 0000000000000001 ffff880277da2c40 ffff880277449d80
[27914.006814]  ffff88017e678000 ffff880118b9d300 ffff880118b9d2e8 ffffffff00000000
[27914.006815]  ffffffff00000001 ffff88017e677e10 ffffffff81804245 ffff880277449d80
[27914.006817] Call Trace:
[27914.006818]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006819]  [<ffffffff81806e92>] rwsem_down_write_failed+0x202/0x350
[27914.006821]  [<ffffffff813de953>] call_rwsem_down_write_failed+0x13/0x20
[27914.006822]  [<ffffffff818066cd>] ? down_write+0x2d/0x40
[27914.006823]  [<ffffffff811c4703>] vm_munmap+0x33/0x60
[27914.006824]  [<ffffffff811c56f2>] SyS_munmap+0x22/0x30
[27914.006826]  [<ffffffff81808332>] entry_SYSCALL_64_fastpath+0x16/0x71
[27914.006827] INFO: task Chrome_HistoryT:27427 blocked for more than 20 seconds.
[27914.006828]       Tainted: P        W  OE   4.4.11-040411-generic #201605182255
[27914.006828] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[27914.006829] Chrome_HistoryT D ffff8801cb2fbde8     0 27427      1 0x00000000
[27914.006830]  ffff8801cb2fbde8 0000000000000001 ffff880277da6740 ffff880016570ec0
[27914.006832]  ffff8801cb2fc000 ffff880118b9d2e8 ffff880118b9d300 ffff8801cb2fbf58
[27914.006833]  ffff880016570ec0 ffff8801cb2fbe00 ffffffff81804245 ffff880016570ec0
[27914.006834] Call Trace:
[27914.006835]  [<ffffffff81804245>] schedule+0x35/0x80
[27914.006836]  [<ffffffff818070c0>] rwsem_down_read_failed+0xe0/0x140
[27914.006838]  [<ffffffff81209ab9>] ? __vfs_write+0xc9/0x110
[27914.006839]  [<ffffffff813de924>] call_rwsem_down_read_failed+0x14/0x30
[27914.006841]  [<ffffffff81806690>] ? down_read+0x20/0x30
[27914.006842]  [<ffffffff8106a715>] __do_page_fault+0x375/0x400
[27914.006843]  [<ffffffff8106a7c2>] do_page_fault+0x22/0x30
[27914.006844]  [<ffffffff8180a4b8>] page_fault+0x28/0x30
Comment 1 mark 2016-06-05 07:56:18 UTC
Created attachment 219061 [details]
a few sysreq w's during lockups

Here is the kernel log during the lockups - it was mostly chrome or soffice.bin that got stuck at 100% system CPU, system was frozen for ~40-50sec during each freeze period and I ran sysreq w a few times during those periods
Comment 2 mark 2016-06-26 20:07:39 UTC
Appears this was caused by quotas - disabling quotas caused this hang issue to stop on the 4.4 series kernels.
Comment 3 David Sterba 2022-10-04 13:49:09 UTC
This is a semi-automated bugzilla cleanup, report is against an old kernel version. If the problem still happens, please open a new bug. Thanks.