Bug 198505 - Errors from ext4 FS when resuming from single hibernation image
Summary: Errors from ext4 FS when resuming from single hibernation image
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: ARM Linux
Importance: P1 blocking
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-18 11:12 UTC by Omkar Bolla
Modified: 2018-01-18 16:19 UTC

See Also:
Kernel Version: kernel-3.18.49
Subsystem:
Regression: No
Bisected commit-id:



Description Omkar Bolla 2018-01-18 11:12:47 UTC
I am trying to optimize boot time on Android M using hibernation. I am able to hibernate and resume properly, but only if I take a new snapshot image of RAM on every boot and resume from that image. My requirement is to resume every time from a single snapshot image. However, I have a problem with that image from the 2nd boot onwards.

I am getting ext4 errors that look like a block number mismatch, and the file system on that partition is remounted read-only.

I get the errors below every time from the 2nd boot onwards:

[   64.250735] EXT4-fs error (device mmcblk0p24): ext4_mb_generate_buddy:758: group 9, block bitmap and bg descriptor inconsistent: 31166 vs 31165 free clusters
[   64.252813] <3> (3)[1507:PackageManager]Aborting journal on device mmcblk0p24-8.
[   64.267680] <3> (3)[1507:PackageManager]EXT4-fs (mmcblk0p24): Remounting filesystem read-only
[   64.268667] <3> (1)[1:init]init: Starting service 'media'...
[   64.269479] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_free_blocks:4881: Journal has aborted
[   64.272235] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_reserve_inode_write:4999: Journal has aborted
[   64.278537] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_reserve_inode_write:4999: Journal has aborted
[   64.283132] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_ext_remove_space:3035: Journal has aborted
[   64.286970] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_ext_truncate:4669: Journal has aborted
[   64.291775] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_reserve_inode_write:4999: Journal has aborted
[   64.294465] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_truncate:3894: Journal has aborted
[   64.297764] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_reserve_inode_write:4999: Journal has aborted
[   64.300437] <3> (3)[1507:PackageManager]EXT4-fs error (device mmcblk0p24) in ext4_orphan_del:2888: Journal has aborted


This is happening because the mounted partition's metadata in RAM (from the old snapshot image) does not match the actual data on disk.
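The mismatch can be illustrated with a toy model. This is only a sketch, not the kernel code: the function names `count_free` and `check_group`, the 16-cluster group, and the message format are assumptions made for illustration. The idea is that a free-cluster count held in the restored RAM image is compared against a block bitmap that has since changed on disk.

```python
def count_free(bitmap: bytes, clusters: int) -> int:
    """Count free clusters: zero bits among the first `clusters` bits."""
    used = 0
    for i in range(clusters):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            used += 1
    return clusters - used


def check_group(bitmap: bytes, clusters: int, cached_free: int):
    """Return an error string if bitmap and cached count disagree, else None."""
    free = count_free(bitmap, clusters)
    if free != cached_free:
        return f"inconsistent: {free} vs {cached_free} free clusters"
    return None


# State when the snapshot was taken: 3 of 16 clusters used, 13 free.
bitmap_at_snapshot = bytes([0b00000111, 0b00000000])
cached_free = count_free(bitmap_at_snapshot, 16)

# After one normal boot, one more cluster has been allocated on disk.
bitmap_on_disk_now = bytes([0b00001111, 0b00000000])

# Resuming from the old image keeps the stale cached count.
print(check_group(bitmap_at_snapshot, 16, cached_free))
print(check_group(bitmap_on_disk_now, 16, cached_free))
```

The first check passes because both values come from the same moment in time; the second fails because the cached count is older than the bitmap it is compared with.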

Please suggest a way to overcome this problem.
Also, is it possible to refresh the file system's metadata in RAM from the partition while it is mounted and in use by some apps?
Comment 1 Theodore Tso 2018-01-18 16:19:35 UTC
This is not a bug; this is a fundamental problem with your proposed technique.

The issue is that file system metadata will be actively in use, and in memory. Dealing with the block group descriptors is doable (but would require kernel changes). The much bigger problem is going to be with inodes in use by the mounted data partition. If you want to boot from a frozen hibernation image, and reuse it over and over again, this approach is pretty much doomed to failure, I'm afraid. All it takes is for one of the system daemons to have some file opened for writing --- say, a log file --- and if you try to reuse the hibernation image, it's a recipe for file system corruption and user data loss.
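To get a feel for how many processes would be affected, one could list the processes that currently hold an open file or a working directory under the data mount. The following is a rough sketch assuming a Linux `/proc` layout; the function name `pids_pinning` is hypothetical, and it only sees processes the caller has permission to inspect.

```python
import os


def pids_pinning(mount, proc="/proc"):
    """Return PIDs whose cwd or any open fd points inside `mount`."""
    mount = os.path.realpath(mount)
    prefix = mount.rstrip("/") + "/"
    pinned = set()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc, entry)
        links = [os.path.join(base, "cwd")]
        try:
            fd_dir = os.path.join(base, "fd")
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or access denied; cwd is still tried
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if target == mount or target.startswith(prefix):
                pinned.add(int(entry))
                break
    return pinned


if __name__ == "__main__":
    # "/data" is the Android data mount mentioned in this report.
    print(sorted(pids_pinning("/data")))
```

This does not cover memory-mapped files, which would need a look at `/proc/<pid>/maps` as well, so an empty result is not proof that the partition is idle.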

If you can change userspace so that you can unmount the data partition, you could make it work, since in Android the root partition is read-only, and thus guaranteed not to change. But that means forcing all of the system daemons (by system daemons I mean all long-running processes started at boot before you suspend the system in your fundamentally flawed quick boot scheme) to close their open files and not have any processes with their current working directory in the data partition. If you could do that, you could then, after the hibernation, remount the data partition and send a signal to all of the system daemons to reopen any open files and chdir back into /data.
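On the daemon side, the close/reopen cycle described above might look roughly like the sketch below. Everything named here is an assumption for illustration: the class `ReleasableLog`, the choice of SIGUSR1 for "release" and SIGUSR2 for "reacquire", and the example paths. The unmount and remount themselves would be done by some separate coordinator that is not shown.

```python
import os
import signal


class ReleasableLog:
    """A log file that can let go of its partition and take it back later."""

    def __init__(self, path, workdir):
        self.path = path
        self.workdir = workdir
        self.fh = None
        self.acquire()

    def acquire(self, *_):
        # Called at start-up and after the partition has been remounted.
        os.chdir(self.workdir)
        self.fh = open(self.path, "a")

    def release(self, *_):
        # Called before the snapshot: close the file and leave the partition.
        if self.fh is not None:
            self.fh.close()
            self.fh = None
        os.chdir("/")

    def write(self, msg):
        if self.fh is None:
            return False  # partition released; message is dropped
        self.fh.write(msg + "\n")
        self.fh.flush()
        return True


def install_handlers(log):
    # Assumed convention: SIGUSR1 = release, SIGUSR2 = reacquire.
    signal.signal(signal.SIGUSR1, log.release)
    signal.signal(signal.SIGUSR2, log.acquire)
```

A daemon would create one `ReleasableLog("/data/example/daemon.log", "/data/example")` (a made-up path), call `install_handlers` once, and then keep calling `write` as usual. Every long-running process would need equivalent handling for every file it keeps open, which is why this amounts to a substantial userspace change.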

But if you're going to do all of this, you might as well simply fix userspace to have a faster boot sequence. I'll note that my Pixel 2 XL has a very fast boot sequence, as do all of my Chromebooks. So fixing this problem in userspace is definitely the right way to go --- not by trying some dirty hack like what you're proposing.
