Bug 108361

Summary: btrfs-cleaner refuses to freeze
Product: File System
Reporter: Martin Ziegler (ziegler)
Component: btrfs
Assignee: Josef Bacik (josef)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: high
CC: blinxwang, dsterba, evangelos, jikos, lilydjwg, marci_r
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.4-rc2
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Relevant part of syslog

Description Martin Ziegler 2015-11-23 23:35:08 UTC
Created attachment 195261: Relevant part of syslog

Several times a week, suspend fails because the btrfs-cleaner kernel thread refuses to freeze. Suspending works if I try again. This behaviour started with 4.4-rc1.

I use a Lenovo T450s with a 300GB SSD.

The SSD is mounted as btrfs (rw,noatime,nodiratime,compress=lzo,ssd,noacl,space_cache,commit=3600,subvolid=5,subvol=/).

I attach the relevant syslog.
Comment 1 Martin Ziegler 2015-11-23 23:38:43 UTC
Here is the output of btrfs fi df /:

Data, single: total=282.00GiB, used=238.77GiB
System, single: total=64.00MiB, used=48.00KiB
Metadata, single: total=3.00GiB, used=905.77MiB
GlobalReserve, single: total=304.00MiB, used=0.00B
Comment 2 Martin Ziegler 2015-12-09 20:47:57 UTC
The bug is still there in 4.4-rc4.
Comment 3 Martin Ziegler 2016-01-12 23:54:59 UTC
I have bisected the bug between 4.3.0 and 4.4-rc1. The first bad commit was:

 commit 696249132158014d594896df3a81390616069c5c
 Author: Jiri Kosina <jkosina@suse.cz>
 Date:   Mon Oct 26 15:06:19 2015 +0900

 btrfs: clear PF_NOFREEZE in cleaner_kthread()

 cleaner_kthread() kthread calls try_to_freeze() at the beginning of every
 cleanup attempt. This operation can't ever succeed though, as the kthread
 hasn't marked itself as freezable.

 Before (hopefully eventually) kthread freezing gets converted to fileystem
 freezing, we'd rather mark cleaner_kthread() freezable (as my understanding
 is that it can generate filesystem I/O during suspend).

The bug is still present in 4.4.0. 

I can reliably reproduce the bug by performing the following steps in immediate succession:
1. run a substantial incremental backup with btrfs send/receive to an external disk
2. copy some files to the disk with rsync
3. unmount the disk
4. wait until the disk is unmounted
5. suspend
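
The steps above can be sketched as a small helper that only builds the command lines, so they can be reviewed before anything is run. This is an illustration, not the reporter's actual script: all paths are hypothetical placeholders, `systemctl suspend` is an assumption about how suspend is triggered, and a plain `umount` is assumed to cover steps 3 and 4 because it normally returns only once the unmount has finished.

```python
import shlex


def reproduction_plan(parent_snap, new_snap, backup_mount, rsync_src):
    """Return the shell command lines for the reproduction steps, in order.

    All arguments are hypothetical paths supplied by the caller; nothing
    is executed here.
    """
    q = shlex.quote
    return [
        # Step 1: incremental backup with btrfs send/receive to the external disk.
        f"btrfs send -p {q(parent_snap)} {q(new_snap)} | btrfs receive {q(backup_mount)}",
        # Step 2: copy some additional files with rsync.
        f"rsync -a {q(rsync_src)} {q(backup_mount)}",
        # Steps 3 and 4: unmount; assumed to block until the disk is unmounted.
        f"umount {q(backup_mount)}",
        # Step 5: suspend (assumption: a systemd-based system).
        "systemctl suspend",
    ]


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    for line in reproduction_plan(
        "/snapshots/root.old", "/snapshots/root.new", "/mnt/backup", "/home/user/files"
    ):
        print(line)
```

Printing the plan instead of executing it keeps the sketch harmless; the commands need root and real snapshot paths before they do anything useful.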

Next I will check whether this is triggered by my commit interval, which
btrfs does not seem to like:
 BTRFS warning (device sda5): excessive commit interval 3600
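
To check which mounts would draw this warning, the commit interval can be read out of a mount option string. The following is a minimal sketch: the helper names are made up for illustration, the 30-second default is the documented btrfs default, and the 300-second warning threshold is an assumption based on the btrfs mount option documentation rather than anything stated in this report.

```python
DEFAULT_COMMIT_SECONDS = 30   # documented btrfs default
WARN_ABOVE_SECONDS = 300      # assumption: threshold for the "excessive" warning


def parse_mount_options(options):
    """Split a comma-separated mount option string into a dict.

    Options without a value (e.g. "noatime") map to None.
    """
    parsed = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


def commit_interval(options):
    """Return the commit interval in seconds, or the default if unset."""
    value = parse_mount_options(options).get("commit")
    return int(value) if value else DEFAULT_COMMIT_SECONDS


def is_excessive(options):
    """True if the interval exceeds the assumed warning threshold."""
    return commit_interval(options) > WARN_ABOVE_SECONDS


if __name__ == "__main__":
    opts = ("rw,noatime,nodiratime,compress=lzo,ssd,noacl,"
            "space_cache,commit=3600,subvolid=5,subvol=/")
    print(commit_interval(opts), is_excessive(opts))
```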
Comment 4 Evangelos Foutras 2016-01-15 08:08:11 UTC
XFS appears to suffer from the same issue; interestingly enough, it has a similar commit to the one listed above (calling set_freezable()):

http://oss.sgi.com/pipermail/xfs/2016-January/045864.html
Comment 5 Martin Ziegler 2016-01-20 11:20:52 UTC
The following seems to be a workaround: before suspending, I remount the
internal SSD with
 
   mount -o remount,commit=15

I did not try other commit intervals.
Comment 6 Martin Ziegler 2016-01-20 11:22:48 UTC
Sorry, I meant: mount -o remount,commit=15 /
Comment 7 Martin Ziegler 2016-01-20 22:21:05 UTC
What actually worked was running
mount -o remount,commit=15 / ; sync
before suspend.
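
For anyone wanting to automate this workaround, here is a minimal sketch of a pre-suspend helper that runs the same two commands. The function names are hypothetical, the helper must run as root, and how it gets invoked before suspend (for example from a distribution-specific sleep hook) is deliberately left open.

```python
import subprocess


def presuspend_commands(mountpoint="/", commit_seconds=15):
    """Build the two workaround commands as argument lists."""
    return [
        # Remount with a short commit interval, as in the workaround above.
        ["mount", "-o", f"remount,commit={commit_seconds}", mountpoint],
        # Flush outstanding writes before the suspend attempt.
        ["sync"],
    ]


def run_presuspend(run=subprocess.check_call, mountpoint="/", commit_seconds=15):
    """Run the workaround commands in order.

    `run` is injectable so the sequence can be inspected without executing
    anything; the default raises CalledProcessError if a command fails.
    """
    for cmd in presuspend_commands(mountpoint, commit_seconds):
        run(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_presuspend(run=lambda cmd: print(" ".join(cmd)))
```

The remount leaves the shorter commit interval in place after resume; restoring the original value would need a second remount, which this sketch does not attempt.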
Comment 8 David Sterba 2016-01-22 08:44:38 UTC
It seems that the sync is what helps. A remount implies a sync; commit=15 just shortens the interval of the periodic transaction commit (which also implies a sync).
Comment 9 Evangelos Foutras 2016-01-23 10:54:16 UTC
(In reply to Evangelos Foutras from comment #4)
> XFS appears to suffer from the same issue; interestingly enough, it has a
> similar commit to the one listed above (calling set_freezable()):
> 
> http://oss.sgi.com/pipermail/xfs/2016-January/045864.html

The XFS commit has been reverted; it seems to me that btrfs needs to do the same:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3e85286e7522
Comment 10 David Sterba 2016-01-25 10:05:27 UTC
Thanks for your help. https://patchwork.kernel.org/patch/8105741/ is scheduled for 4.5-rc2 and 4.4.x.