Bug 16380

Summary: ext2/ext4 filesystem "hangs" for a few seconds when trying to write a lot of data to a freshly RW mounted FS
Product: File System
Reporter: Artem S. Tashkinov (aros)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED INVALID
Severity: low
CC: florian, pshah.mumbai, tytso
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.x 3.x
Regression: No
Bug Blocks: 16055
Attachments: Video demonstration

Description Artem S. Tashkinov 2010-07-13 23:21:26 UTC
Since I started using kernel 2.6.35-rc5, I've noticed that it behaves strangely with loop devices.

Steps to reproduce:

1) On top of ext4 fs: 
$ dd if=/dev/zero of=ext2.img bs=100M count=20
2) mke2fs ext2.img
3) mount -o loop,noatime ext2.img /mnt/loop
4) Now start copying files (10-100 MB in size) from any media to /mnt/loop using cp -a
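A consolidated sketch of the steps above (the image size and mount point are the reporter's examples; the source path is a placeholder):

$ dd if=/dev/zero of=ext2.img bs=100M count=20   # ~2 GB backing file on the ext4 host fs
$ mke2fs -F ext2.img                             # -F: don't prompt about formatting a regular file
$ sudo mount -o loop,noatime ext2.img /mnt/loop
$ cp -a /path/to/source/. /mnt/loop/             # copy 10-100 MB files and watch for multi-second stalls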

Result: at times the computer stalls for 5-15 seconds with zero I/O activity, yet the load average stays above 1.0. (I estimate I/O activity by watching the GKRellM utility.)

Expected result: a steady copying process.
Comment 1 Florian Mickler 2010-09-08 10:41:49 UTC
Can you bisect the issue?
Comment 2 Florian Mickler 2010-09-30 06:12:56 UTC
Did you have any success debugging this issue further?
Comment 3 Artem S. Tashkinov 2010-09-30 07:37:15 UTC
I have no idea how to debug it; however, this test case is 100% reproducible.

http://www.mediafire.com/?ah63babkaabbfn2
Comment 4 Florian Mickler 2010-09-30 11:42:08 UTC
Look at the manual page for git-bisect to see how to find the patchset that introduced this regression.

Otherwise I fear we may not know what or where to look.

Jens, do you have any idea?
Comment 5 Artem S. Tashkinov 2010-12-03 10:48:26 UTC
In 2.6.36 this issue has become even worse.

After creating a 3 GB loop-mounted ext2 partition in a file on top of an ext4 filesystem (created with dd, so the file is fully allocated), I can only copy files in 300 MB chunks; after copying each chunk the system "hangs" (everything still works, but the copying process completely stalls) for up to 20 seconds, doing nothing, with a load average of 1.0.

Filling this 3 GB looped ext2 partition took roughly 5 minutes, even though I have an HDD capable of 100 MB/s throughput.
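(For scale: at the drive's nominal 100 MB/s, writing 3 GB sequentially should take roughly 30 seconds, so five minutes corresponds to an effective throughput of only about 10 MB/s.)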

Something is terribly broken in the Linux kernel.

"Host" filesystem fragmentation is quite normal (~5%).

The ext4 module handles all filesystems in question.
Comment 6 Artem S. Tashkinov 2010-12-03 10:52:16 UTC
Checking whether the file is heavily fragmented:

$ time dd if=ext2.img of=/dev/zero
6144000+0 records in
6144000+0 records out
3145728000 bytes (3.1 GB) copied, 25.5556 s, 123 MB/s

real    0m25.676s
user    0m1.630s
sys     0m7.596s

It's obviously not fragmented at all. (Maximum HDD throughput here is 135 MB/s at the beginning of the disk.)
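A more direct check, for what it's worth (filefrag ships with e2fsprogs):

$ filefrag ext2.img        # reports the extent count; a handful of extents means the image is effectively contiguous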
Comment 7 Florian Mickler 2010-12-03 19:24:44 UTC
Did you have any luck using git-bisect?
Comment 8 Artem S. Tashkinov 2011-03-22 22:02:54 UTC
This is an ext2 fs bug.
Comment 9 Artem S. Tashkinov 2012-08-04 11:19:36 UTC
It's reproducible with ext4fs as well under Linux 3.5.
Comment 10 Prashant Shah 2012-08-04 11:53:19 UTC
Didn't face this issue with 3.2.0-23-generic (64-bit).
Comment 11 Prashant Shah 2012-08-04 12:16:16 UTC
(In reply to comment #10)
> Didn't face this issue with 3.2.0-23-generic (64-bit).

Sorry, I was able to reproduce it under 3.2.0-23-generic (64-bit).
Comment 12 Theodore Tso 2012-08-04 18:13:20 UTC
I wasn't able to duplicate this with an underlying ext4 file system, with the loop file system being ext2, using a 3.5 kernel on a system with 16 GB of memory.

I tried using a variety of small- and medium-sized files, as well as one very large file, and I didn't see any unexplained stuttering in write bandwidth.  (There were times when cp -r was clearly writing to the in-memory page cache and the writeback hadn't begun yet.  There were also times when we couldn't do any more writing because we were busy reading from the source drive.  But it all looked pretty normal to me.)
Comment 13 Theodore Tso 2012-08-04 18:14:08 UTC
BTW, I was using iostat 1 to measure I/O activity.  Call me old-school; I don't exactly trust GUI tools.  :-)
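For example, extended per-device statistics once per second:

$ iostat -x 1              # from sysstat; the w/s and %util columns show write requests per second and device utilization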
Comment 14 Artem S. Tashkinov 2012-08-04 18:51:24 UTC
(In reply to comment #12)
> I wasn't able to duplicate this with an underlying ext4 file system, with
> the loop file system being ext2, using a 3.5 kernel on a system with 16 GB
> of memory.
>
> I tried using a variety of small- and medium-sized files, as well as one
> very large file, and I didn't see any unexplained stuttering in write
> bandwidth.  (There were times when cp -r was clearly writing to the
> in-memory page cache and the writeback hadn't begun yet.  There were also
> times when we couldn't do any more writing because we were busy reading
> from the source drive.  But it all looked pretty normal to me.)

Hm, I cannot reproduce the problem with a loop device on top of tmpfs either, but I've just checked my 250 GB ext4 partition and the bug is still there:

When I try to copy a 3 GB file (residing in tmpfs, so it's fully cached) to it, there's a four-second delay before a single byte gets written to the destination partition. After that, all subsequent files get written without any delays.
Comment 15 Theodore Tso 2012-08-04 19:51:51 UTC
The delay before the write starts is normal; that's encoded in how the writeback code handles dirty pages.  We don't start writes the instant that pages are dirtied in the buffer cache.

In any case, if you have objections to how the writeback code makes its choices, it's kind of pointless to complain on a bug targeted at the ext4 developers, since we start the writeback when the VM system asks us to start writing pages belonging to a particular inode....

See /usr/src/linux/Documentation/sysctl/vm.txt for more information.
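For illustration, the writeback thresholds that vm.txt describes can be inspected with sysctl; the values below are common defaults and are shown only as an example:

$ sysctl vm.dirty_background_ratio     # background writeback starts once this % of memory is dirty
vm.dirty_background_ratio = 10
$ sysctl vm.dirty_expire_centisecs     # dirty data older than this (centiseconds) is flushed on the next wakeup
vm.dirty_expire_centisecs = 3000
$ sysctl vm.dirty_writeback_centisecs  # the flusher threads wake at this interval (centiseconds)
vm.dirty_writeback_centisecs = 500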
Comment 16 Artem S. Tashkinov 2012-08-04 20:00:28 UTC
(In reply to comment #15)
> The delay before the write starts is normal; that's encoded in how the
> writeback code handles dirty pages.  We don't start writes the instant that
> pages are dirtied in the buffer cache.
> 
> In any case, if you have objections to how the writeback code makes its
> choices, it's kind of pointless to complain on a bug targeted at the ext4
> developers, since we start the writeback when the VM system asks us to
> start writing pages belonging to a particular inode....
> 
> See /usr/src/linux/Documentation/sysctl/vm.txt for more information.

Thanks for letting us know; what I perceived as a regression in ext4fs was probably in fact a change in the VM writeback code.

It's still weird and terribly counterintuitive, as no other OS exhibits this behavior, but now that I know why it happens I'll need to file a different bug report.
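If the multi-second bursts are bothersome, the usual workaround is to cap the amount of dirty data with the byte-based knobs (shown here only as an illustrative example; the values are arbitrary):

$ sudo sysctl -w vm.dirty_background_bytes=67108864   # start background writeback after 64 MB of dirty data
$ sudo sysctl -w vm.dirty_bytes=268435456             # throttle writers after 256 MB of dirty data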
Comment 17 Artem S. Tashkinov 2015-02-07 10:25:33 UTC
Created attachment 166031
Video demonstration

I'm NOT reopening this bug report. I'm merely adding a video clip which pertains to it.