Bug 211733 - ext4 file system unrecoverable corruption
Summary: ext4 file system unrecoverable corruption
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: i386 Linux
Importance: P1 high
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-13 15:55 UTC by martrw
Modified: 2021-02-17 16:37 UTC
CC List: 1 user

See Also:
Kernel Version: 5.4.0-65-generic
Subsystem:
Regression: No
Bisected commit-id:



Description martrw 2021-02-13 15:55:11 UTC
Kubuntu 20.04, two-week-old installation
500GB SATA HDD, 50GiB / partition, 180GiB /home partition
Dual boot with Win7 on 50GiB partition

Observation:
I was switching between Win7 and Linux, with multiple reboots in short spans of time (<5 min). From Linux, using Dolphin, I moved ~5MB of document files from the Win7 partition to the Linux /home/xxx/Documents directory and rebooted the system to return to Win7. I made changes in Win7 as needed and booted back into Linux. I noticed that the entire Documents directory was missing, about 50GiB of files. I immediately shut down the system and booted Linux from a duplicate drive containing an image from about two weeks prior. I then made a read-only image of the /home directory from the corrupted drive and placed it on an external 1 GiB backup drive.

Using R-Linux, extundelete, and debugfs, I could find no trace of the Documents directory on either the image or the original /home directory. I can, however, see files I intentionally deleted during normal operation for over a week prior.

fsck and smartctl indicate no disk issues.

I have not tried to reproduce this issue.

This event seems very similar to the one discussed at this link, but I have not been able to locate that particular bug.

https://www.itnews.com.au/news/stable-linux-kernels-hit-by-serious-file-system-bug-320709

I entered a bug report on the bugs.kde.org bug tracker (432762) but was told that the issue is at a lower level than the Dolphin GUI I was using.

Apologies if this is a duplicate, but I could not find a similar issue on this tracker.
Comment 1 Theodore Tso 2021-02-16 18:30:16 UTC
The symptoms may be the same as a news article from 8 or 9 years ago, but that particular bug was solved a *long* time ago.

Unfortunately, there are many different potential causes of data loss. It could be caused by bad partition tables, such that (for example) the Windows 7 partition overlaps the Linux partition, or Windows 7 thinks that it does. It could be caused by hardware problems. It could be caused by the user incorrectly using the GUI. There's no way to tell, given the complete lack of data in the bug report.

It's much like sending a doctor an e-mail complaining of tightness in the chest and trouble breathing, but not giving the doctor any medical history, or any way to take an ECG reading, etc.

You're going to have to reproduce it, and do this with a large number of small checks.   Try copying data from Windows 7 to Linux.  Check to see if the data is there in Linux.  Try rebooting from Linux into Linux, and see if the data is there.  Then try rebooting into Windows and do some things, recording exactly what you are doing, and then try rebooting back into Linux and check the Documents folder.
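One way to make each "is the data still there?" check objective, rather than eyeballing a Dolphin window, is to record a checksum manifest of the Documents directory right after the copy and compare it after every reboot. The Python sketch below is only an illustration of that idea under stated assumptions; the helper names `build_manifest` and `compare_manifests` are hypothetical, not part of any existing tool.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict, List, Tuple


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map the relative path of every regular file under root to its SHA-256."""
    base = Path(root)
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_file():  # skips broken symlinks and special files
                manifest[full.relative_to(base).as_posix()] = file_sha256(full)
    return manifest


def compare_manifests(
    before: Dict[str, str], after: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (missing, changed, added) relative paths."""
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return missing, changed, added
```

For example, build a manifest of ~/Documents right after copying, save it with `json.dump`, and after each reboot rebuild it and call `compare_manifests`. A non-empty `missing` list after a particular reboot is exactly the kind of small, recorded observation that narrows down where the data went.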

Then (using a command line interface, so it's easier to capture the output and report it to a bug tracker), you need to get a printout of the partition table, and/or the Logical Volume and Physical Volume layout if you are using LVM, and also grab the kernel logs to see if there are any errors reported by the file system or device drivers, etc.
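A minimal sketch of gathering that information into one plain-text file that can be pasted into a bug tracker follows. The command list is an assumption for a typical Ubuntu system, not something specified in this thread: `fdisk -l` generally needs root, `pvs`/`lvs` exist only if the LVM tools are installed, and `journalctl -k -b -1` shows the previous boot's kernel messages only if the journal is kept persistently. Missing commands and failures are recorded in the report instead of aborting it.

```python
import shutil
import subprocess
from datetime import datetime, timezone

# Assumed default commands: partition table, LVM layout, previous boot's kernel log.
DEFAULT_COMMANDS = [
    ["lsblk", "-f"],
    ["fdisk", "-l"],
    ["pvs"],
    ["lvs"],
    ["journalctl", "-k", "-b", "-1", "--no-pager"],
]


def run_capture(cmd, timeout=60):
    """Run one command and return a text section with its combined output."""
    header = "$ " + " ".join(cmd)
    if shutil.which(cmd[0]) is None:
        return f"{header}\n[not found: {cmd[0]}]\n"
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"{header}\n[timed out after {timeout}s]\n"
    except OSError as exc:
        return f"{header}\n[could not run: {exc}]\n"
    body = proc.stdout + proc.stderr
    if body and not body.endswith("\n"):
        body += "\n"
    return f"{header}\n{body}[exit status {proc.returncode}]\n"


def collect_report(commands=DEFAULT_COMMANDS):
    """Concatenate the output of every command into one plain-text report."""
    stamp = datetime.now(timezone.utc).isoformat()
    sections = [f"# diagnostics collected {stamp}\n"]
    sections.extend(run_capture(cmd) for cmd in commands)
    return "\n".join(sections)


if __name__ == "__main__":
    print(collect_report())
```

Running it with `sudo python3 collect.py > report.txt` (the file name is hypothetical) right after noticing a problem keeps the evidence from being lost to later reboots.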

If you don't know how to do this, it's much more likely that the problem is user error, and my best suggestion is to find a local Linux users' group and ask for help. Those folks might ask lots of potentially insulting questions, such as making sure that you were cleanly shutting down the system before rebooting from Linux into Windows, or before powering down the computer; but those sorts of questions tend to be less insulting when someone asks you in person than via phone or e-mail tech support, where people are obligated to ask the "are you sure the computer is plugged in" kind of basic questions.

Good luck!
Comment 2 martrw 2021-02-17 13:30:54 UTC
Thank you very much for such a detailed response.  I acknowledge the lack of actionable data in the initial report.  The event was initially anticipated to be a recoverable crisis and so no log data was captured to report.  In hindsight, this was a mistake.

I do not think I will intentionally reproduce the event. Recovery from this event was difficult and I am still not whole. I would have to set up a separate machine with sacrificial data before I would feel safe doing so. However, should such a repetition occur, I will be much more detailed in my report.

I greatly appreciate your patience, insight and attention to detail in your response.
Comment 3 Theodore Tso 2021-02-17 16:37:59 UTC
Free advice?   Before you do anything else, back up *everything* before you even breathe on the system.  You may think it's not going to reproduce again, but if it does, you may end up losing more data.

I tend to keep things very simple. Which is to say, I try not to dual-boot Windows and Linux, and if I do, I use separate HDDs for the Windows and Linux systems. So if I were doing anything like this at all, I'd boot into a Linux system, copy everything from the Windows partition to the Linux partition in a single go, and then be done with it. The KISS (Keep It Simple, Stupid) principle is always a good one to follow, especially with valuable data.
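The single-copy approach can be sketched in Python as a copy followed by a full verification pass. This is a hypothetical illustration, assuming the Windows partition is already mounted in Linux and the destination directory does not yet exist; `copy_and_verify` and `tree_digests` are invented names. Unlike a move, the source is left untouched, so nothing is lost if the verification reports a problem.

```python
import hashlib
import os
import shutil
from pathlib import Path


def tree_digests(root):
    """Map relative path -> SHA-256 for every regular file under root."""
    base = Path(root)
    digests = {}
    for dirpath, _dirs, files in os.walk(base):
        for name in files:
            path = Path(dirpath) / name
            if path.is_file():
                h = hashlib.sha256()
                with path.open("rb") as fh:
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        h.update(chunk)
                digests[path.relative_to(base).as_posix()] = h.hexdigest()
    return digests


def copy_and_verify(src, dst):
    """Copy src into a new directory dst in one pass, then re-read both sides.

    Returns a sorted list of relative paths that are missing from dst or whose
    contents differ; an empty list means every source file arrived intact.
    """
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    source = tree_digests(src)
    copied = tree_digests(dst)
    return sorted(p for p, d in source.items() if copied.get(p) != d)
```

Only once `copy_and_verify` returns an empty list would it make sense to delete anything from the Windows side, and ideally only after a separate backup exists as well.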

And we're only talking about a 500GB HDD.  Getting a second 500GB disk, or for that matter, an external 1TB HDD or even SSD, is cheap, compared to the value of your time.

Backups. Backups. Backups. I've worked at MIT and seen a graduate student lose ten years' worth of research data due to lack of backups. One could perhaps claim that someone who was dumb enough not to make backups doesn't deserve a Ph.D., but regardless, it's still a tragedy, and a totally avoidable one.
