Bug 194071

Summary: data loss using fallocate and mmap
Product: File System
Component: ext4
Status: NEW
Severity: high
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.4.0+
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Michael Zimmer (michael)
Assignee: fs_ext4 (fs_ext4)
CC: david, jack
Attachments: Example C program
             [PATCH] ext4: Fix data corruption for mmap writes

Description Michael Zimmer 2017-02-06 10:59:24 UTC
Created attachment 254231 [details]
Example C program

After calling fallocate() on a file that is mapped with a shared mmap() and writing data into the newly allocated region, some of the data is occasionally replaced by zeros (first observed after running for about a week). The address and size of the corrupted data are not reproducible.

The initial failure was debugged and reduced to a C++ program that failed when built with either gcc or clang, and later to the attached C program. The amount allocated in each iteration was reduced to 1 byte because that made the failure occur sooner; the failure was not reproducible with larger power-of-two sizes.
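For readers without the attachment, the access pattern described above can be sketched roughly as follows. This is not the attached program: the function name `grow_and_verify`, its return codes, and the byte pattern are assumptions made for illustration, and the attachment may check the data differently.

```c
#define _GNU_SOURCE          /* needed for fallocate() in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the attached program).
 * Grows the file one byte at a time with fallocate(), writes each new
 * byte through a shared mapping, and re-checks every byte written so far.
 * Returns 0 if all bytes verify, 1 if a byte changed, -1 on a setup or
 * syscall error.
 */
int grow_and_verify(const char *path, size_t iterations, int mode)
{
    assert(path != NULL);
    if (iterations == 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Map the whole target range up front. Bytes beyond the current end
     * of file are only touched after fallocate() has grown the file. */
    unsigned char *map = mmap(NULL, iterations, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int result = 0;
    for (size_t i = 0; i < iterations && result == 0; i++) {
        /* Allocate exactly one more byte at the end of the file. */
        if (fallocate(fd, mode, (off_t)i, 1) != 0) {
            result = -1;
            break;
        }
        /* Write a non-zero value so that a zeroed byte is detectable. */
        map[i] = (unsigned char)(i % 251 + 1);

        /* Re-read everything written so far through the mapping. */
        for (size_t j = 0; j <= i; j++) {
            if (map[j] != (unsigned char)(j % 251 + 1)) {
                result = 1;
                break;
            }
        }
    }

    munmap(map, iterations);
    close(fd);
    unlink(path);
    return result;
}
```

Calling `grow_and_verify("/mnt/ram0/tests_mmap_fallocate", 4096, 0)` in a loop would exercise the default allocation mode; passing `FALLOC_FL_ZERO_RANGE` as `mode` exercises the zeroing variant of the call. A single short run is not expected to trigger the failure, given how rarely it was observed.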

Is this a bug or user error?

OS: Ubuntu 16.04.1 LTS
kernel versions: 4.4.0-38-generic, 4.9.7-040907-generic
block device: Observed on both /dev/ram0 and local SSD
ext4 mount options: (rw,relatime,data=ordered)

The failure could not be reproduced when using the "FALLOC_FL_ZERO_RANGE" flag, nor on a tmpfs RAM disk.

Reproduction steps:
sudo mkdir /mnt/ram0
sudo mkfs.ext4 /dev/ram0
sudo mount /dev/ram0 /mnt/ram0/
gcc -O2 tests_mmap_fallocate.c -o tests_mmap_fallocate_gcc
while sudo rm -f /mnt/ram0/tests_mmap_fallocate && sudo ./tests_mmap_fallocate_gcc; do date && sleep 1; done
...
...
...
Value has been modified
(Also nothing found in /var/log/kern.log)

On a development machine the failure occurs only after several days of running in a loop, but it occurs within minutes on a virtualized Linux machine on a server.

Comment 1 Michael Zimmer 2017-04-26 10:46:17 UTC
Has anyone investigated or been able to reproduce this failure?

Comment 2 Jan Kara 2017-05-25 08:59:38 UTC
Looks like a bug in ext4... I'm investigating...

Comment 3 Jan Kara 2017-05-25 11:29:21 UTC
Created attachment 256719 [details]
[PATCH] ext4: Fix data corruption for mmap writes

This patch fixes the issue for me.

Comment 4 Jan Kara 2017-05-25 11:55:53 UTC
BTW, can I base a testcase for fstests on your example program?

Comment 5 Michael Zimmer 2017-09-05 10:02:27 UTC
Thanks for investigating and making the patch. Sorry that I missed your last comment; feel free to base a testcase on the example program.