Bug 207809

Summary: Bcache does not consistently work with swap file hibernation
Product: IO/Storage
Reporter: Alec Feldman (alecfeldman)
Component: Other
Assignee: io_other
Status: NEW
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.6.12
Subsystem:
Regression: No
Bisected commit-id:

Description Alec Feldman 2020-05-21 02:17:31 UTC
I followed the directions to get a swap file working on btrfs, including using resume=swap_device and resume_offset=swap_file_offset. The physical offset was retrieved using this tool: https://github.com/osandov/osandov-linux/blob/master/scripts/btrfs_map_physical.c. I then divided the physical offset by the page size to get the resume offset. Hibernating into a swap file only works the first time after creation. After every subsequent hibernation, my system boots as if it were not resuming. Hibernating can also cause the backing device to detach from the caching device, leading to an unbootable system. I then have to forcefully recreate the cache in order to fix it. Eventually the same failure occurs again, but the filesystem also becomes corrupted, as I can no longer reattach the cache. This can be reproduced on any setup with bcache, btrfs on the bcache device, and a swap file on the btrfs filesystem. I am not sure if this affects other filesystems.
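For reference, the offset arithmetic described above can be sketched as follows. This is only an illustration and not part of the original setup: the function name resume_offset and the sample physical offset are hypothetical, and it assumes the input is the physical byte offset of the swap file's first extent as reported by btrfs_map_physical.

```python
import os
from typing import Optional


def resume_offset(physical_offset_bytes: int, page_size: Optional[int] = None) -> int:
    """Convert a physical byte offset into a page-unit resume_offset value.

    physical_offset_bytes: assumed to be the physical offset of the swap
    file's first extent (hypothetical input, taken from btrfs_map_physical).
    page_size: defaults to the running system's page size.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    if physical_offset_bytes < 0:
        raise ValueError("physical offset must be non-negative")
    if physical_offset_bytes % page_size != 0:
        # A swap file extent should start on a page boundary; anything
        # else suggests the wrong column or the wrong extent was used.
        raise ValueError("physical offset is not page-aligned")
    return physical_offset_bytes // page_size


if __name__ == "__main__":
    # Hypothetical physical offset, for illustration only.
    print(resume_offset(16400384, 4096))
```

The result is the value passed as resume_offset= on the kernel command line. The alignment check is a design choice of this sketch, intended to catch an offset copied from the wrong extent.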
Comment 1 Alec Feldman 2020-06-24 04:39:57 UTC
From looking deeper into this issue and reading the FAQ on the bcache site, this appears to be a catch-22. During resume, you are not allowed to make any changes to the disk. However, with bcache, this can be tricky: any read you make from a bcache device could result in a write to update the caching device. Currently bcache has no good way of solving this. This means resume must finish before bcache starts. However, if the resume image is on the bcache backing device, the image must be accessed first. Using resume_offset to resume from the swap file on the backing device must therefore be updating the caching device and, in turn, causing data loss. If I have reached the correct conclusion from the data loss that occurred on my system and the limitation listed in the bcache FAQ, then resuming from a swap file on a backing device, at least with the cache enabled, is not possible. Disabling the cache during resume may prevent data loss from occurring.
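As a diagnostic aid for the situation described above, the following sketch reports which cache mode a bcache device is currently using, so it can be checked before hibernating. It is not a fix and makes several assumptions: the helper names are hypothetical, the device is assumed to be bcache0, and the cache_mode attribute under /sys/block/<device>/bcache/ is assumed to list all modes with the active one in square brackets, as described in the kernel's bcache documentation.

```python
from pathlib import Path
from typing import Optional


def active_cache_mode(sysfs_text: str) -> Optional[str]:
    """Return the bracketed entry from a cache_mode listing, if any.

    Assumed input format: "writethrough [writeback] writearound none".
    """
    for token in sysfs_text.split():
        if len(token) > 2 and token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_cache_mode(device: str = "bcache0",
                    sysfs_root: str = "/sys/block") -> Optional[str]:
    """Read the active cache mode for a bcache device from sysfs.

    Returns None if the attribute is missing or unreadable, for example
    when the device is not a bcache device.
    """
    path = Path(sysfs_root) / device / "bcache" / "cache_mode"
    try:
        return active_cache_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    mode = read_cache_mode()
    if mode is None:
        print("no bcache cache_mode attribute found")
    else:
        print("active cache mode:", mode)
```

Whether switching the mode to none, or detaching the cache, before hibernation actually avoids the data loss is untested here; the sketch only makes the current state visible.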