Bug 98501 - md raid0 w/ fstrim causing data loss
Summary: md raid0 w/ fstrim causing data loss
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: MD
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: io_md
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-17 19:41 UTC by Eric Work
Modified: 2015-06-03 13:11 UTC
CC List: 8 users

See Also:
Kernel Version: 3.19.7
Tree: Fedora
Regression: Yes


Attachments
Move reassignment of "sector" in raid0_make_request (779 bytes, patch)
2015-05-18 08:46 UTC, Eric Work
md/raid0: fix restore to sector variable in raid0_make_request (1.65 KB, patch)
2015-05-19 07:00 UTC, Eric Work

Description Eric Work 2015-05-17 19:41:19 UTC
Hardware: 2 x Crucial_CT256MX100SSD1 (MU02)
Software: md raid0 w/ ext4
Kernel: 3.19.7-200.fc21.x86_64

I started to notice data corruption after a recent kernel update on Fedora 21.  Eventually the system failed to boot due to this corruption issue.  Later it turned out to be data loss: files usually had the correct size and timestamps, but their contents were zero-filled.  Sometimes entire directories became empty after fsck.  I first thought maybe it was an SSD firmware bug so I updated the above mentioned SSDs from MU01 to MU02, which was rumored to be problematic and blacklisted already in the kernel with regards to NCQ TRIM.  Since I had to start over I decided to try installing Fedora 22 Final TC4 and updated to kernel 4.0.3-301.fc22.x86_64.  During my own manual post-install process I enabled "discard" in fstab and also ran an initial fstrim.  It didn't take long before the system couldn't execute /bin/sh and the system failed to boot.  This led me to believe it was not fixed by the firmware update or a recent kernel.

So I decided to try and reproduce the problem.  To start I simply reinstalled Fedora 21 without any raid on the first SSD.  Then after updating to the latest kernel I ran fstrim a few times, reinstalled some large packages and everything seemed fine.  So I used my second SSD to create two 50 GB partitions on the same disk and then created an md raid0 array with an rsync copy of my root filesystem.  After doing the following experiment just once I saw a problem.

host$ mount /dev/md0 /mnt  # no discard
host$ mount -o bind /proc /mnt/proc  # same /sys, /proc
host$ chroot /mnt
test$ rpm -Va | tee rpm-verify.old
test$ yum reinstall kernel-devel
host$ fstrim -v /mnt
host$ echo 3 > /proc/sys/vm/drop_caches
test$ rpm -Va | tee rpm-verify.new
test$ diff rpm-verify.old rpm-verify.new

The diff reported that some files had md5sum mismatches now, mostly in the files installed by kernel-devel.  Quickly opening them in od and less showed they now contained only zeros.  I then reinstalled kernel-devel again and the file contents were restored to normal text.  After doing a few variations of this experiment eventually rpm wouldn't run properly.  I ran fsck and it reported *tons* of errors, just like when this first happened.  I then decided to go back to the original install kernel 3.17.4-301.fc21.x86_64 and reformatted + rsynced the filesystem.  I repeated the above experiment and saw no corruption.  I updated back to 3.19.7-200.fc21.x86_64 and on the first attempt it happened again: rpm verify showed differences.  So, thinking that an update had started this whole problem, I reinstalled the second newest kernel 3.19.6-200.fc21.x86_64.  Repeated the above experiment 3 times and no corruption showed up.  Switched back to 3.19.7-200.fc21.x86_64 and right away I had minor corruption that manifested itself as zero-filled files.
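A minimal sketch of how zero-filled files like the ones described above could be listed without relying on rpm. The function names are hypothetical and not part of the original report. Files that are legitimately all zeros (or sparse) will also be reported, so the output is a list of candidates to inspect rather than proof of data loss.

```python
import os


def is_zero_filled(path, block_size=1 << 20):
    """Return True if the file is non-empty and every byte is 0x00."""
    if os.path.getsize(path) == 0:
        return False  # empty files are not evidence of discarded data
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                return True  # reached EOF without seeing a non-zero byte
            if block.count(0) != len(block):
                return False


def find_zero_filled(root):
    """Yield regular files under root whose contents are all zero bytes."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if is_zero_filled(path):
                    yield path
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
```

For example, `list(find_zero_filled("/mnt/usr/src"))` would scan the kernel-devel tree of the test filesystem mounted at /mnt.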

I can't be 100% sure that 3.19.7 is the first kernel with this issue because nothing is printed to dmesg to confirm it until the filesystem is internally very corrupted.  It just seems to be strongly correlated.  I'm not sure if this happens with other raid levels since I only had raid0 before and only tested raid0 now.
Comment 1 Eric Work 2015-05-18 08:46:27 UTC
Created attachment 177181
Move reassignment of "sector" in raid0_make_request

I reverted commit 7595f5425cad83e037639e228ee24d5052510139 and the problem went away.  This commit created a problem by reassigning "sector" at the wrong location.  At the point where this commit was doing the reassignment, "bio" had already advanced.  I have no idea what this code is really doing and what these variables contain, but I did a bit of research with cscope and found that "bio_split" calls some *_iter_advance function, which runs before the reassignment.  I'll need to do some more testing tomorrow to see if this patch fixes the problem while maintaining the goal of the original fix.  This bug is still present in linux-stable.git and Linus' tree.
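The ordering problem described above can be illustrated with a small Python model. This is a simplified sketch, not the kernel code: `Bio`, `bio_split` and `map_pieces` are hypothetical stand-ins, the in-place side effect of the kernel's sector_div is not modeled, and the only behavior carried over from the comment is that splitting advances the original request. The model records which start sector each piece would be mapped with, depending on whether "sector" is re-read before or after the split.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    sector: int  # start sector of the request
    size: int    # length of the request in sectors


def bio_split(bio, sectors):
    """Return the first `sectors` of bio as a new piece and advance bio in place."""
    front = Bio(bio.sector, sectors)
    bio.sector += sectors
    bio.size -= sectors
    return front


def map_pieces(start, size, chunk_sects, restore_after_split):
    """Return (sector used for mapping, piece length) for each chunk-sized piece."""
    bio = Bio(start, size)
    mapped = []
    while True:
        sector = bio.sector
        # Sectors left in the current chunk, counted from the start of the request.
        sectors = chunk_sects - (sector % chunk_sects)
        if not restore_after_split:
            sector = bio.sector  # re-read before the split: still the piece's start
        if sectors < bio.size:
            piece = bio_split(bio, sectors)  # bio now starts at the NEXT chunk
        else:
            piece = bio
        if restore_after_split:
            sector = bio.sector  # re-read after the split: bio has already advanced
        mapped.append((sector, piece.size))
        if piece is bio:
            return mapped
```

With an 8-sector chunk and a 24-sector request starting at sector 0, `map_pieces(0, 24, 8, restore_after_split=False)` gives `[(0, 8), (8, 8), (16, 8)]`, while `restore_after_split=True` gives `[(8, 8), (16, 8), (16, 8)]`: the first piece is mapped one chunk too far and sector 0 is never touched. For a discard request, that would mean trimming a region the filesystem did not ask to have trimmed.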
Comment 2 Eric Work 2015-05-19 07:00:07 UTC
Created attachment 177291
md/raid0: fix restore to sector variable in raid0_make_request

I can confirm with reasonably strong confidence that the attached patch fixes the mentioned regression.  After 3 rounds of the above procedure I see no differences.  Going back to the unpatched kernel again I see a difference after the first round.

The updated patch is now in "git am" format.
Comment 3 Neil Brown 2015-05-20 23:06:58 UTC
Thanks a lot, and sorry for letting that bug in.
Patch will go to Linus shortly.
Comment 4 Eric Work 2015-05-20 23:18:06 UTC
It was a bit of a pain to recover from, but it was an interesting challenge to find and test the fix.  I got really lucky that I had been keeping my Fedora kernel up to date so the versions to check were just a few :-)
Comment 5 Renato Mendes Figueiredo 2015-05-21 03:31:39 UTC
I just lost an entire partition to this bug.
I have fstrim.timer running and unfortunately it ran right before the kernel 3.19.7-200 was updated on my Fedora 21.

After running e2fsck there are a lot of inode corruption messages. Any help on how I can rebuild this partition? Should I replace my journal or something like that?

Well, the first indication, for those looking for this error, is a huge number of errors from ldconfig complaining about the library headers.

p.s.: Yes, I should be sleeping now, but I'm looking for a solution (I didn't lose my home folder, but everything else is lost)

Thank you
Comment 6 Josh Triplett 2015-05-21 03:37:08 UTC
Does turning off "discard" and not running fstrim entirely avoid this bug, as a workaround until stable kernels get this fix?
Comment 7 Renato Mendes Figueiredo 2015-05-21 03:38:18 UTC
Probably yes, Josh. Keep away from all fstrim operations on raid0.
Comment 8 Neil Brown 2015-05-21 03:45:36 UTC
A 4K (or larger) IO request that is not 4K-aligned on the array can still be handled wrongly.
Most filesystems do all their IO 4K-aligned so problems are unlikely.
However if you partition an md/raid0 with a non-4K alignment, you could hit problems fairly easily.
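One way to read this, assuming (as an inference from comment 1, not something stated here) that only requests which must be split at a chunk boundary are mishandled, and that the chunk size is a multiple of 4K: an aligned 4K request always fits inside one chunk, while an unaligned one can straddle two. The sketch below uses a hypothetical helper name and byte offsets relative to the start of the array.

```python
def needs_split(offset_bytes, length_bytes, chunk_bytes):
    """True if the request [offset, offset + length) spans more than one chunk."""
    first_chunk = offset_bytes // chunk_bytes
    last_chunk = (offset_bytes + length_bytes - 1) // chunk_bytes
    return first_chunk != last_chunk


CHUNK = 512 * 1024  # 512 KiB chunk size, chosen only for illustration

# 4K request ending exactly at the chunk boundary: stays inside one chunk.
aligned = needs_split(CHUNK - 4096, 4096, CHUNK)

# The same request shifted by 2K (e.g. a partition starting at a non-4K
# offset on the array): it now straddles the boundary and has to be split.
unaligned = needs_split(CHUNK - 2048, 4096, CHUNK)
```

Here `aligned` is False and `unaligned` is True.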
Comment 9 Renato Mendes Figueiredo 2015-05-21 03:55:11 UTC
Neil,

Do you have any suggestions to recover the partition?

Thank you!
Comment 10 Neil Brown 2015-05-21 04:09:14 UTC
I wish I did.
"DISCARD" was told to discard something that shouldn't have been discarded.
Unless there is some way to revert all recent DISCARDs, which I very much doubt, there is no way to get that discarded data back.
Sorry :-(
Comment 11 Renato Mendes Figueiredo 2015-05-22 01:27:46 UTC
Ok Neil, I'm starting over! :)

How do I follow this bug until it's released?
Comment 12 Eric Work 2015-05-22 23:14:30 UTC
The fix for this bug has been merged into Linus' tree as commit a81157768a00e8cf8a7b43b5ea5cac931262374f
Comment 13 Evangelos Foutras 2015-05-23 02:11:30 UTC
Can you please clarify whether this issue is specific to ext4 file systems (as reported by some news sites) or affects any file system with discard support? (The latter seems more likely since the bug was in the md/raid0 layer.)
Comment 14 Neil Brown 2015-05-23 03:21:07 UTC
Bug is not specific to ext4.  Your analysis is correct.
Comment 15 Mateusz Jończyk 2015-05-23 17:25:59 UTC
From what I see, this bug is limited to RAID0 only.
Is RAID1 safe even on affected kernels? Was there any separate issue concerning RAID1 (or any other RAID levels)?
Comment 16 Eric Work 2015-05-23 18:08:18 UTC
This bug affects systems with kernel 3.19.7+ or 4.0.2+ running any filesystem on top of MD RAID 0 that supports and enables TRIM.  No other RAID levels are affected. I believe Intel fakeraid is also affected. If you don't use fstrim or have the 'discard' option enabled in fstab then you wouldn't be affected.  Removing these TRIM options is also the workaround.  Fedora has already included the fix in their next kernel update for F21 and F22.  As of my last check, Arch Linux had not included the fix.
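As a rough aid for the workaround above, the sketch below lists md devices that are currently mounted with the 'discard' option, given the text of /proc/mounts. The function name is hypothetical. The check is deliberately narrow: it does not look at the RAID level, it does not see filesystems layered on top of md through LVM or dm-crypt, and it knows nothing about fstrim run by hand, from cron, or from fstrim.timer.

```python
def md_mounts_with_discard(mounts_text):
    """Return (device, mountpoint) pairs for /dev/md* devices mounted with 'discard'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mounts line
        device, mountpoint, _fstype, options = fields[:4]
        if device.startswith("/dev/md") and "discard" in options.split(","):
            hits.append((device, mountpoint))
    return hits
```

On a running system the input would be the contents of /proc/mounts, for example `md_mounts_with_discard(open("/proc/mounts").read())`. An empty list only means no md device is mounted with online discard at the moment.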
Comment 17 Evangelos Foutras 2015-05-23 18:21:24 UTC
Arch applied the fix yesterday:

https://www.archlinux.org/news/data-corruption-on-software-raid-0-when-discard-is-used/
Comment 18 Ortwin Glück 2015-06-03 13:11:52 UTC
This bug now also affects "stable" 3.18.14 because the buggy commit went in as:

d2c861b700b0af90da2d60b1b256173628fa6785 md/raid0: fix bug with chunksize not a power of 2.

But this fix did not.

Waving good bye to my raid-0 for the 2nd time.
