Bug 216012 - Data loss on VirtualBox VMs
Summary: Data loss on VirtualBox VMs
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: Intel Linux
Importance: P1 high
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-05-21 17:52 UTC by AA
Modified: 2022-05-25 08:22 UTC
CC List: 1 user

See Also:
Kernel Version: Ubuntu 5.13.0-35-generic
Subsystem:
Regression: No
Bisected commit-id:



Description AA 2022-05-21 17:52:41 UTC
Starting with recent kernel and/or VirtualBox versions, I've begun experiencing data loss and/or filesystem corruption often enough that I've enabled fsck on every reboot. The VM is hosted on Mac OS X.

It's not clear to me what the source of the problem is; it may not be ext4 itself but rather some element of the block device drivers and/or related hypervisor facilities.


Message from syslogd@ubuntu at May 21 15:56:49 ...
 kernel:[1958392.548682] EXT4-fs (dm-0): failed to convert unwritten extents to written extents -- potential data loss!  (inode 259291, error -30)

Message from syslogd@ubuntu at May 21 15:56:50 ...
 kernel:[1958392.830326] EXT4-fs (dm-0): failed to convert unwritten extents to written extents -- potential data loss!  (inode 262111, error -30)
Comment 1 Artem S. Tashkinov 2022-05-22 06:04:23 UTC
There's zero info as to how it's all happening. If you need help you need to let people understand how it can be reproduced.

Lastly, you're using the Ubuntu kernel and bugs against it need to be reported here: https://bugs.launchpad.net/ubuntu
Comment 2 AA 2022-05-22 20:18:56 UTC
I provided the information I had. You're not talking to a complete noob, Artem. If this issue were deterministically reproducible, you can be assured I'd have provided the steps to reproduce it and any additional details.

There is some multi-factor problem going on here, and frankly, I don't even know where to start in order to determine the cause! This is a Linux virtual machine hosted by VirtualBox on Mac OS X.

Since the error is reported by EXT4-fs and I did not immediately find any other errors, I reported the EXT4-fs errors to ext4.

One possibility is that this occurs when the OS X host is under relatively high load. Under 5.x kernel versions on this VirtualBox host I have also seen occasional "CPU X stuck for 22s" messages. However, these messages have not been correlated with the ext4 errors.

If anyone knows the conditions that can cause this ext4 error message, I will try to dig deeper the next time it happens. The combination of factors seems to point to core issues in the VirtualBox drivers and/or virtualization interfaces, but I'm just not sure. Maybe the virtual block device driver surfaces a timeout of some kind to ext4 as an unrecoverable write, via whatever kernel interface is used to write filesystem blocks?

If you still feel Ubuntu is the right place to chase down the issue, I'll go there. I haven't gotten any traction from reporting the CPU stuck issue to Oracle.
Comment 3 Theodore Tso 2022-05-23 21:05:36 UTC
Error -30 is EROFS in this message:

EXT4-fs (dm-0): failed to convert unwritten extents to written extents -- potential data loss!  (inode 259291, error -30)

This typically means that *before* this point, the ext4 file system detected an inconsistency, and the file system was configured to remount itself read-only when an ext4 error is detected. So there would be an earlier "EXT4-fs error" message. For example, you can trigger this behaviour like this:

root@kvm-xfstests:~# tune2fs -e remount-ro /dev/vdc
tune2fs 1.46.4-orphan-file-02827d06 (4-Nov-2021)
Setting error behavior to 2
root@kvm-xfstests:~# mount /dev/vdc /vdc
[   83.142333] EXT4-fs (vdc): mounted filesystem with ordered data mode. Quota mode: none.
root@kvm-xfstests:~# echo test-corruption-handling > /sys/fs/ext4/vdc/trigger_fs_error 
[   91.189272] EXT4-fs error (device vdc): trigger_test_error:126: comm bash: test-corruption-handling
[   91.190375] Aborting journal on device vdc-8.
[   91.193756] EXT4-fs (vdc): Remounting filesystem read-only
root@kvm-xfstests:~# 

Typically, when this happens, 99.9999% of the time it's caused by an I/O error. In a hypervisor situation, that includes a potential hypervisor bug. In any case, without any other evidence to the contrary, it's probably not an ext4 bug. And even if it was, unless you can replicate the bug on an upstream kernel, the proper place to report it is with Canonical. After all, that's why you've paid $$$ for a support contract with Canonical for Ubuntu, right? :-) And depending on Canonical's support contract, they might or might not be willing to track down a VirtualBox bug unless you've paid for a more comprehensive support contract. In any case, upstream developers don't have time to chase down something like this, especially since the probability is extremely high that it's not an upstream kernel issue.
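The reasoning in this comment (error -30 is EROFS, EROFS follows a read-only remount, and the remount follows an earlier "EXT4-fs error" that is usually caused by an I/O error) suggests a simple triage step: decode the error code and look at what the kernel logged *before* the conversion failure. The following is a minimal Python sketch under stated assumptions, not an official tool. It assumes the kernel log has been saved as text (for example from `dmesg` or `journalctl -k`). The helper names `decode_errno` and `explain` are hypothetical, and the "suspect" patterns are a heuristic based on the transcript above, not an exhaustive list of relevant kernel messages.

```python
import errno
import re

# Matches the message quoted in this bug, e.g.
# "EXT4-fs (dm-0): failed to convert unwritten extents to written extents
#  -- potential data loss!  (inode 259291, error -30)"
CONVERT_FAIL = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): failed to convert unwritten extents"
    r".*\(inode (?P<inode>\d+), error (?P<err>-?\d+)\)"
)

# Heuristic (assumed wording): earlier messages that could explain a later EROFS.
SUSPECT = re.compile(
    r"EXT4-fs error|I/O error|Aborting journal|Remounting filesystem read-only"
)


def decode_errno(code: int) -> str:
    """Map a negative kernel errno such as -30 to its symbolic name."""
    return errno.errorcode.get(abs(code), "UNKNOWN")


def explain(lines):
    """For each conversion failure, collect the suspect lines logged before it."""
    suspects, reports = [], []
    for line in lines:
        if SUSPECT.search(line):
            suspects.append(line.strip())
        m = CONVERT_FAIL.search(line)
        if m:
            reports.append({
                "device": m.group("dev"),
                "inode": int(m.group("inode")),
                "error": decode_errno(int(m.group("err"))),
                "earlier_suspects": list(suspects),
            })
    return reports


# The two lines quoted in the bug description, with nothing logged before them.
reporter_log = [
    "kernel:[1958392.548682] EXT4-fs (dm-0): failed to convert unwritten extents "
    "to written extents -- potential data loss!  (inode 259291, error -30)",
    "kernel:[1958392.830326] EXT4-fs (dm-0): failed to convert unwritten extents "
    "to written extents -- potential data loss!  (inode 262111, error -30)",
]

for report in explain(reporter_log):
    print(report["device"], report["inode"], report["error"],
          len(report["earlier_suspects"]))
```

On Linux, this decodes both codes as `EROFS` but finds no earlier suspect lines, because the excerpt in the description contains only the conversion failures. That is the point of the sketch: the lines logged before them (a full `dmesg` or `journalctl -k` capture) are what would show whether an I/O error or an ext4-detected inconsistency came first.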
Comment 4 Artem S. Tashkinov 2022-05-25 08:22:09 UTC
The kernel you're using has a solid ext4 filesystem driver, and there are no similar reports on the net, which suggests you're having an issue with either macOS or VirtualBox or both. I'm sorry to break it to you, but you're on your own.

It makes sense to try this VM on another PC or recreate the VM from scratch.
