Bug 14972 - [regression] msync() call on ext4 causes disk thrashing
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 high
Assignee: fs_ext4@kernel-bugs.osdl.org
Reported: 2010-01-01 23:09 UTC by Artem S. Tashkinov
Modified: 2012-03-17 09:07 UTC

See Also:
Kernel Version: 2.6.32.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
run this way ./test test.txt 100 (3.03 KB, text/plain)
2010-01-01 23:09 UTC, Artem S. Tashkinov
.config; dmesg; lspci; hdparm -I (27.41 KB, application/octet-stream)
2010-01-01 23:14 UTC, Artem S. Tashkinov
trace pipe output (10.60 KB, application/octet-stream)
2010-01-03 10:31 UTC, Artem S. Tashkinov
blktrace for /dev/sda while running the test application (25.67 KB, application/octet-stream)
2010-01-04 10:05 UTC, Artem S. Tashkinov

Description Artem S. Tashkinov 2010-01-01 23:09:49 UTC
Created attachment 24398
run this way ./test test.txt 100

The attached application causes needless disk thrashing on ext4, but works normally on ext3 (and possibly other filesystems).

msync() on an unchanged mmap'ed file shouldn't cause any disk/HDD activity, yet that is exactly what I observe.
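
For readers who do not have the attachment at hand, the access pattern at issue can be sketched roughly as follows. This is not the attached test program (its source is not reproduced in this report); it is a minimal Python sketch that assumes the core of the test is repeated msync() calls on a shared mapping that is never modified. The function name `msync_unchanged` and its parameters are made up for illustration, and the 50 ms default only mirrors the wait mentioned in comment 2.

```python
import mmap
import time


def msync_unchanged(path, count, interval=0.05):
    """Map `path` and flush the mapping `count` times without ever writing to it.

    Hypothetical helper for illustration only. On Linux, mmap.flush() is
    implemented with msync(), so each iteration issues one msync() on a
    mapping that has no dirty pages. Returns the number of flushes issued.
    """
    calls = 0
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; ACCESS_WRITE gives a shared,
        # writable mapping. The file must exist and must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            for _ in range(count):
                m.flush()
                calls += 1
                time.sleep(interval)
    return calls
```

Calling something like `msync_unchanged("test.txt", 100)` on an ext4 mount and then on an ext3 mount, while watching the disk, is the kind of comparison described above.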
Comment 1 Artem S. Tashkinov 2010-01-01 23:14:38 UTC
Created attachment 24399
.config; dmesg; lspci; hdparm -I
Comment 2 Theodore Tso 2010-01-02 13:19:23 UTC
Regression from what?  Ext3?  Or some earlier kernel version?

What arguments are you giving to this test program of yours?   It looks like it does some number of reads and/or writes to the file, and then calls 10,000 msyncs with 50ms wait between each msync.

I'm not seeing any disk activity as a result.   Was anything else reading or writing to the file or to the file system at the same time?
Comment 3 Artem S. Tashkinov 2010-01-02 15:10:03 UTC
(In reply to comment #2)
> Regression from what?  Ext3?  Or some earlier kernel version?

Regression from ext3.

> 
> What arguments are you giving to this test program of yours?   It looks like it
> does some number of reads and/or writes to the file, and then calls 10,000
> msyncs with 50ms wait between each msync.

While those msync() calls are running, the mmap'ed file is unchanged, but ...

> 
> I'm not seeing any disk activity as a result.   Was anything else reading or
> writing to the file or to the file system at the same time?

... my HDD LED keeps flashing continuously, which certainly means some disk activity. Should I attach a video showing this abnormality on kernel 2.6.32.2 in runlevel 1, with nothing running except bash and this application?
Comment 4 Theodore Tso 2010-01-02 16:47:35 UTC
OK, so that's not technically a regression as far as the kernel bugzilla field is concerned.

>... my HDD led keeps flashing continuously which certainly means some disk
>activity. Should I attach a video showing this abnormality on kernel 2.6.32.2
>on runlevel 1 with no application running except bash and this application?

What would be much more useful would be to install blktrace, and then attach the output of "btrace /dev/sdXX" while this application is running.

When I do the test, I am seeing some excess write barriers which we can optimize away:

254,3    0     1148    31.613064584 10904  Q  WB [test]
254,3    0     1149    31.664270330 10904  Q  WB [test]
254,3    0     1150    31.715259078 10904  Q  WB [test]
254,3    0     1151    31.772662156 10904  Q  WB [test]
254,3    0     1152    31.827932269 10904  Q  WB [test]
254,3    0     1153    31.883122551 10904  Q  WB [test]

but that's not a disaster.   (If there are no pending writes from other applications, this won't cause any extra hard drive activity.)

I'd like to confirm whether you are seeing anything more, since at least for me on my system, empty write barriers don't cause the hard drive activity light to go on.   Maybe they do on your system, though.  I'd like to confirm this because if it's just a matter of extra (unnecessary) write barriers, we would treat this as a much lower priority bug than if there's something else going on.
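
To compare a trace against the excerpt above, the queued barrier writes can be picked out of btrace output with a few lines of Python. This is only a sketch: it assumes the default blkparse column order seen in the excerpt (device, CPU, sequence number, timestamp, PID, action, RWBS), and the names `SAMPLE`, `queued_barrier_times` and `gaps_ms` are made up for illustration.

```python
# The six lines quoted in this comment, copied verbatim.
SAMPLE = """\
254,3    0     1148    31.613064584 10904  Q  WB [test]
254,3    0     1149    31.664270330 10904  Q  WB [test]
254,3    0     1150    31.715259078 10904  Q  WB [test]
254,3    0     1151    31.772662156 10904  Q  WB [test]
254,3    0     1152    31.827932269 10904  Q  WB [test]
254,3    0     1153    31.883122551 10904  Q  WB [test]
"""


def queued_barrier_times(btrace_text):
    """Return timestamps (seconds) of queued (Q) events flagged as barrier writes."""
    stamps = []
    for line in btrace_text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # not an event line in the assumed default format
        action, rwbs = fields[5], fields[6]
        # In the RWBS column, W marks a write and B marks a barrier.
        if action == "Q" and "W" in rwbs and "B" in rwbs:
            stamps.append(float(fields[3]))
    return stamps


def gaps_ms(stamps):
    """Return the spacing between consecutive timestamps, in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(stamps, stamps[1:])]
```

On the excerpt this finds six queued barrier writes spaced roughly 51 to 57 ms apart, which fits one barrier per msync() with a 50 ms wait in between. Anything other than such lines in a trace from the reporter's system would point to additional I/O.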
Comment 5 Artem S. Tashkinov 2010-01-02 16:54:28 UTC
What debug options are needed for blktrace?

blktrace /dev/sda
Invalid debug path /sys/kernel/debug: 2/No such file or directory
Comment 6 Theodore Tso 2010-01-02 17:54:54 UTC
You need to compile the kernel with CONFIG_BLK_DEV_IO_TRACE enabled, and then you need to make sure that the debugfs file system is mounted on /sys/kernel/debug, i.e.:

    mount -t debugfs none /sys/kernel/debug

The following documentation is from 2007, so there are some newer features (and the underlying implementation has moved to using ftrace), but the basic user interface hasn't changed much:

    http://pdfedit.petricek.net/bt/file_download.php?file_id=17&type=bug

More information:

    http://www.gelato.org/pdf/apr2006/gelato_ICE06apr_blktrace_brunelle_hp.pdf
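
Whether debugfs is already mounted can be checked before running blktrace. The following is a small sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options, ...); the function name `debugfs_mountpoints` is made up for illustration.

```python
def debugfs_mountpoints(mounts_text):
    """Return the mount points of all debugfs entries in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields[1] is the mount point, fields[2] the filesystem type
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        found = debugfs_mountpoints(f.read())
    # blktrace expects debugfs at /sys/kernel/debug (see the error in comment 5)
    print("debugfs mounted at:", found if found else "nowhere")
```

An empty result corresponds to the "Invalid debug path /sys/kernel/debug" error from comment 5.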
Comment 7 Artem S. Tashkinov 2010-01-03 10:31:49 UTC
Created attachment 24417
trace pipe output

blktrace produces zero output, so I'm attaching trace pipe output while running this test application.

I tried using blktrace this way:

# mount -t debugfs nul /sys/kernel/debug
# echo blk > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /sys/block/sda/sda3/trace/enable

# blktrace /dev/sda3
BLKTRACESETUP(2) /dev/sda3 failed: 16/Device or resource busy
Thread 0 failed open /sys/kernel/debug/block/(null)/trace0: 2/No such file or directory
Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 1: 1/Operation not permitted
Comment 8 Theodore Tso 2010-01-03 12:11:56 UTC
I'm not sure why you are getting the "Device or resource busy" error from the BLKTRACESETUP ioctl.   Are you sure this is a kernel with the blktrace support configured in?  And I assume the '#' prompt indicates that you are running as root.  You don't have SELinux or some other LSM configured, do you?

Blktrace running alone does not produce output; it produces trace files for each CPU that can be parsed using the blkparse program.  If you look back, you'll see that I suggested running the btrace program after you install the blktrace package.   The btrace program is a convenience script which runs blktrace and blkparse to produce immediate output to standard out.   There are man pages for blktrace, blkparse, and btrace which should have been installed when you installed the binaries for blktrace.

Still, it shouldn't have spit out those error messages....
Comment 9 Artem S. Tashkinov 2010-01-03 12:38:54 UTC
grep BLK.*TRACE /usr/src/linux/.config

CONFIG_BLK_DEV_IO_TRACE=y

Yes, I'm running as root. SELinux is disabled in the kernel. I've no idea what to do :(
Comment 10 Eric Sandeen 2010-01-04 03:35:22 UTC
(In reply to comment #9)
> grep BLK.*TRACE /usr/src/linux/.config
> 
> CONFIG_BLK_DEV_IO_TRACE=y
> 
> Yes, I'm running under root. SeLinux is disabled in the kernel. I've no idea
> what to do :(

Drop the echos.  Just mounting debugfs & running blktrace will suffice:

[root@inode ~]# mount -t debugfs none /sys/kernel/debug/
[root@inode ~]# blktrace /dev/sda
^C=== sda ===
  CPU  0:                    0 events,        0 KiB data
  CPU  1:                    1 events,        1 KiB data
  Total:                     1 events (dropped 0),        1 KiB data

but if you do this first:

[root@inode ~]# echo blk > /sys/kernel/debug/tracing/current_tracer
[root@inode ~]# echo 1 > /sys/block/sda/sda3/trace/enable

you get the -EBUSY:

[root@inode ~]# blktrace /dev/sda
BLKTRACESETUP(2) /dev/sda failed: 16/Device or resource busy
Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
Thread 0 failed open /sys/kernel/debug/block/(null)/trace0: 2/No such file or directory
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 1: 1/Operation not permitted

It's not related to selinux.

-Eric
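
A quick pre-flight check for the situation above can be sketched in Python. It only inspects the two files written by the echo commands in this comment; the function name `ftrace_blk_in_use` is made up for illustration, and since the thread does not say which of the two settings is responsible for the -EBUSY, the sketch treats either one as a warning sign.

```python
def ftrace_blk_in_use(current_tracer_text, trace_enable_text):
    """Return True if the ftrace-based blk tracer appears to be set up.

    `current_tracer_text` is the content of
    /sys/kernel/debug/tracing/current_tracer, and `trace_enable_text` is the
    content of a per-partition file such as /sys/block/sda/sda3/trace/enable.
    Either setting is treated as a possible conflict with blktrace (an
    assumption; the thread only shows that doing both leads to -EBUSY).
    """
    return current_tracer_text.strip() == "blk" or trace_enable_text.strip() == "1"


if __name__ == "__main__":
    # Paths taken from the commands in this comment; adjust the partition as needed.
    with open("/sys/kernel/debug/tracing/current_tracer") as f:
        tracer = f.read()
    with open("/sys/block/sda/sda3/trace/enable") as f:
        enable = f.read()
    if ftrace_blk_in_use(tracer, enable):
        print("blk tracer appears active; blktrace may fail with EBUSY")
```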
Comment 11 Artem S. Tashkinov 2010-01-04 10:05:42 UTC
Created attachment 24427
blktrace for /dev/sda while running the test application

Thank you very much, Eric!

It seems like a lot of documentation needs to be updated.
Comment 12 Sergiy Zuban 2011-03-03 11:25:02 UTC
Due to the poor implementation of some apps/libs, this bug causes continual disk access (the HDD LED is always on) and possibly disk damage. Is there a chance to somehow speed up the release of a fix for this bug (it was reported over a year ago)? I believe the first step is to change the status to CONFIRMED.

Many people noticed unusual disk activity after upgrade to ext4:
https://bbs.archlinux.org/viewtopic.php?pid=692073
http://bugs.winehq.org/show_bug.cgi?id=24044#c13
http://www.justlinux.com/forum/showthread.php?p=891737
Comment 13 Artem S. Tashkinov 2012-01-11 00:07:59 UTC
This issue was fixed in one of the recent kernels (somewhere around 2.6.38).
Comment 14 Sergiy Zuban 2012-03-17 08:46:13 UTC
Can someone else confirm it was really fixed? I'm still getting continual HDD activity in some apps (e.g. Miranda IM, http://bugs.winehq.org/show_bug.cgi?id=24044#c13) on Linux laptop 3.0.0-16-generic #28-Ubuntu SMP Fri Jan 27 17:50:54 UTC 2012 i686 i686 i386 GNU/Linux.
Comment 15 Artem S. Tashkinov 2012-03-17 09:07:57 UTC
(In reply to comment #14)
> can someone else confirm it was really fixed? I'm still getting continual HDD
> activity on Linux laptop 3.0.0-16-generic #28-Ubuntu SMP Fri Jan 27 17:50:54
> UTC 2012 i686 i686 i386 GNU/Linux in some apps (e.g. Miranda IM
> http://bugs.winehq.org/show_bug.cgi?id=24044#c13)

Please try the test program attached to the original report (attachment 24398). If you run it, your HDD LED shouldn't light up; if it does, then this bug is certainly not fixed.

Like I said, "it works for me":

/dev/sdaX on / type ext4 (rw,noatime,nobarrier)
/dev/sdaX on /home type ext4 (rw,noatime,nobarrier)
