Bug 116801

Summary: btrfs uses up all available RAM, system unusable
Product: File System Reporter: Konstantin Svist (fry.kun)
Component: btrfsAssignee: Josef Bacik (josef)
Status: RESOLVED OBSOLETE    
Severity: normal CC: dsterba, otto, szg00000
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.4.6-301.fc23.x86_64 Subsystem:
Regression: No Bisected commit-id:
Attachments: htop output from system before it hangs
picture of console when system crashes
picture of kernel messages on another boot
htop after quota disable with healthy btrfs-cleaner

Description Konstantin Svist 2016-04-20 20:20:15 UTC
I had set up my Fedora 23 x64 with dnf-snapper plugin (it takes a snapshot after every dnf transaction). So far, I have ~150 snapshots since ~January.
I tried to remove several oldest snapshots (10-20?), but as soon as btrfs starts cleaning/synching the deletion to disk, system becomes unusable (all programs frozen; eventually mouse cursor doesn't move, either). This continues even after reboot with following symptoms:

* reads/writes are okay until btrfs-cleaner/btrfs-transacti processes kick in (I think ~10sec after mount?)
* when they start, btrfs-cleaner gets into D state (IO wait) and btrfs-transacti is using 100% CPU
* at this point, reads are still okay, but ANY attempted write freezes both reads and writes
* cleaner/transacti keep working until all available memory is depleted (8G total on the machine). The processes' reported memory is 0, but buff/cache is shown using all of it. Trying to flush the caches doesn't accomplish anything
* when RAM is depleted, kernel oom-killer kills random processes and I have to reboot with REISUB

Finally was able to finish the transaction by moving the drive to a system with 32G RAM (used a live USB stick with kernel 4.2.3 and btrfs-progs upgraded to 4.5.1 from rawhide repos)
Comment 1 Konstantin Svist 2016-04-20 20:50:16 UTC
Note: "btrfs balance" on the volume seems to have the same effect -- system freezes
I had to boot to a live USB again, and issue "mount /dev/sda2 /mnt && btrfs balance cancel /mnt"
This froze up in GUI and emergency modes (seems that "balance cancel" is preempted by the running balance and never gets a chance to cancel)
Comment 2 Otto Kekäläinen 2016-10-02 05:35:18 UTC
The description above seem very much similar to the bug I am experiencing, so I'll add to this bug report.

On my machine btrfs-cleaner crashes with these symptoms:

* system starts up normally and everything works for about 1-2 minutes after boot
* some kernel process (most likely btrfs-cleaner) kicks in and starts consuming CPU and RAM (interestingly the process does not show up in the htop process listing, but resouce usage is seen by htop - screenshot attached)
* after 2-3 minutes I can see in syslog that btrfs-cleaner crashed and system becomes completely unresponsive
* about 1 minute later the OOM killer kicks the kernel panics and the system completely freezes, only hard reset can make it reboot (see attached pictures)

This is repeatable on every boot. Sometimes the logs show OOM killer in action before the system freezes, but not always. System is completely unusable until this is fixed.

I noticed that these bug reports seems very much similar, maybe the same thing makes btrfs-cleaner crash in latest kernel 4.4?:
* https://bugzilla.kernel.org/show_bug.cgi?id=119541
* https://bugzilla.kernel.org/show_bug.cgi?id=76451

My system is a Ubuntu 16.04 with latest 4.4.0 from Ubuntu main.

The root filesystem is btrfs, and all block devices are encrypted with luks/dm-crypt.

I use subvolumes, read-only snaphots and quota. This problem occured a few days after a deleted many read-only snapshots (I needed to purge old backups to free some space when less than 10% was available).

Stacktrace from journalctl -f:

loka 02 08:14:58 bup-offsite kernel: INFO: task btrfs-cleaner:3154 blocked for more than 120 seconds.
loka 02 08:14:58 bup-offsite kernel:       Not tainted 4.4.0-38-generic #57-Ubuntu
loka 02 08:14:58 bup-offsite kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
loka 02 08:14:58 bup-offsite kernel: btrfs-cleaner   D ffff880073297c08     0  3154      2 0x00000000
loka 02 08:14:58 bup-offsite kernel:  ffff880073297c08 ffffffff810ab219 ffff880074cf0000 ffff880070451900
loka 02 08:14:58 bup-offsite kernel:  ffff880073298000 ffff8800710574a0 ffff880071057498 ffff880070451900
loka 02 08:14:58 bup-offsite kernel:  ffff88006ff12000 ffff880073297c20 ffffffff8182c5f5 7fffffffffffffff
loka 02 08:14:58 bup-offsite kernel: Call Trace:
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810ab219>] ? ttwu_do_wakeup+0x19/0xe0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8182c5f5>] schedule+0x35/0x80
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8182f715>] schedule_timeout+0x1b5/0x270
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810ac115>] ? wake_up_process+0x15/0x20
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff81098401>] ? insert_work+0x81/0xc0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810abd97>] ? try_to_wake_up+0x47/0x3b0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8182d053>] wait_for_completion+0xb3/0x140
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810ac190>] ? wake_up_q+0x70/0x70
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc0371d97>] btrfs_async_run_delayed_refs+0x127/0x150 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc038dde7>] __btrfs_end_transaction+0x1f7/0x340 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc038e353>] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc0376289>] btrfs_drop_snapshot+0x539/0x830 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc038cb02>] btrfs_clean_one_deleted_snapshot+0xb2/0x100 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc03853cf>] cleaner_kthread+0xcf/0x220 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc0385300>] ? check_leaf+0x360/0x360 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810a08d8>] kthread+0xd8/0xf0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810a0800>] ? kthread_create_on_node+0x1e0/0x1e0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff81830a8f>] ret_from_fork+0x3f/0x70
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810a0800>] ? kthread_create_on_node+0x1e0/0x1e0
loka 02 08:14:58 bup-offsite kernel: INFO: task kworker/u8:6:3416 blocked for more than 120 seconds.
loka 02 08:14:58 bup-offsite kernel:       Not tainted 4.4.0-38-generic #57-Ubuntu
loka 02 08:14:58 bup-offsite kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
loka 02 08:14:58 bup-offsite kernel: kworker/u8:6    D ffff880031cfbcc8     0  3416      2 0x00000000
loka 02 08:14:58 bup-offsite kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
loka 02 08:14:58 bup-offsite kernel:  ffff880031cfbcc8 ffffffff81190bb9 ffffffff81e11500 ffff880033a56400
loka 02 08:14:58 bup-offsite kernel:  ffff880031cfc000 ffff88007153c1f0 ffff88007153c000 ffff88007153c1f0
loka 02 08:14:58 bup-offsite kernel:  0000000000000001 ffff880031cfbce0 ffffffff8182c5f5 ffff88006ff0c000
loka 02 08:14:58 bup-offsite kernel: Call Trace:
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff81190bb9>] ? mempool_free+0x29/0x80
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8182c5f5>] schedule+0x35/0x80
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc038bfd3>] wait_current_trans.isra.21+0xd3/0x120 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810c3d80>] ? wake_atomic_t_function+0x60/0x60
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc038d8a5>] start_transaction+0x275/0x4b0 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc038db37>] btrfs_join_transaction+0x17/0x20 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc0377c58>] delayed_ref_async_start+0x18/0x90 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc03bee9a>] btrfs_scrubparity_helper+0xca/0x2f0 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffffc03bf0fe>] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8109a395>] process_one_work+0x165/0x480
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8109a6fb>] worker_thread+0x4b/0x4c0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8109a6b0>] ? process_one_work+0x480/0x480
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff8109a6b0>] ? process_one_work+0x480/0x480
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810a08d8>] kthread+0xd8/0xf0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810a0800>] ? kthread_create_on_node+0x1e0/0x1e0
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff81830a8f>] ret_from_fork+0x3f/0x70
loka 02 08:14:58 bup-offsite kernel:  [<ffffffff810a0800>] ? kthread_create_on_node+0x1e0/0x1e0


I tried to disable quota, which was a confirmed work-around in #119541, but it didn't have effect in my case.
Comment 3 Otto Kekäläinen 2016-10-02 05:37:45 UTC
Created attachment 240551 [details]
htop output from system before it hangs
Comment 4 Otto Kekäläinen 2016-10-02 05:42:29 UTC
Created attachment 240561 [details]
picture of console when system crashes
Comment 5 Otto Kekäläinen 2016-10-02 05:43:31 UTC
Created attachment 240571 [details]
picture of kernel messages on another boot
Comment 6 Otto Kekäläinen 2016-10-02 06:19:36 UTC
I now tested by adding a 30 GB swap file and by running 'btrfs quota disable /data' again. Now btrfs-cleaner does not crash. I am not sure which one was the cure, but swap usage is zero, so I assume it was the quota group that was the root cause here.

I also noticed that in Ubuntu 16.04 the new default is to hide kernel processes. Attached now a new image that shows a healthy btrfs-cleaner at 5-10% CPU sporadically.
Comment 7 Otto Kekäläinen 2016-10-02 06:20:19 UTC
Created attachment 240581 [details]
htop after quota disable with healthy btrfs-cleaner
Comment 8 Otto Kekäläinen 2016-11-03 07:40:43 UTC
I noticed that on this same system btrfs balance can also trigger a hang:

X kernel: INFO: task btrfs:1955 blocked for more than 120 seconds.
X kernel:       Not tainted 4.4.0-45-generic #66-Ubuntu
X kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
X kernel: btrfs           D ffff88003e937a88     0  1955   1954 0x00000000
X kernel:  ffff88003e937a88 ffff88003e937ab0 ffff88016c488000 ffff88016921a640
X kernel:  ffff88003e938000 ffff8800826e271c ffff88016921a640 00000000ffffffff
X kernel:  ffff8800826e2720 ffff88003e937aa0 ffffffff8182d7c5 ffff8800826e2718
X kernel: Call Trace:
X kernel:  [<ffffffff8182d7c5>] schedule+0x35/0x80
X kernel:  [<ffffffff8182da6e>] schedule_preempt_disabled+0xe/0x10
X kernel:  [<ffffffff8182f6a9>] __mutex_lock_slowpath+0xb9/0x130
X kernel:  [<ffffffff8182f73f>] mutex_lock+0x1f/0x30
X kernel:  [<ffffffffc0466f2b>] btrfs_relocate_block_group+0x1ab/0x290 [btrfs]
X kernel:  [<ffffffffc043c577>] btrfs_relocate_chunk.isra.38+0x47/0xd0 [btrfs]
X kernel:  [<ffffffffc043d9d9>] __btrfs_balance+0x5b9/0xba0 [btrfs]
X kernel:  [<ffffffffc043e250>] btrfs_balance+0x290/0x5f0 [btrfs]
X kernel:  [<ffffffffc044a967>] btrfs_ioctl_balance+0x377/0x390 [btrfs]
X kernel:  [<ffffffffc044c0e0>] btrfs_ioctl+0x550/0x28b0 [btrfs]
X kernel:  [<ffffffff8119e609>] ? pagevec_lru_move_fn+0x109/0x120
X kernel:  [<ffffffff8119d5c0>] ? pagevec_move_tail_fn+0x70/0x70
X kernel:  [<ffffffff8119e6e5>] ? __lru_cache_add+0x45/0x70
X kernel:  [<ffffffff8119ea27>] ? lru_cache_add_active_or_unevictable+0x27/0xa0
X kernel:  [<ffffffff811c04da>] ? handle_mm_fault+0xcaa/0x1820
X kernel:  [<ffffffff8122124f>] do_vfs_ioctl+0x29f/0x490
X kernel:  [<ffffffff8106b504>] ? __do_page_fault+0x1b4/0x400
X kernel:  [<ffffffff812214b9>] SyS_ioctl+0x79/0x90
X kernel:  [<ffffffff818318b2>] entry_SYSCALL_64_fastpath+0x16/0x71


The blance operation is not progressing and always reports just:

$ sudo btrfs balance status /data
Balance on '/data' is running
0 out of about 992 chunks balanced (1 considered), 100% left
Comment 9 David Sterba 2022-10-04 13:45:33 UTC
This is a semi-automated bugzilla cleanup, report is against an old kernel version. If the problem still happens, please open a new bug. Thanks.