Bug 42895

Summary: jbd2 makes the whole system unresponsive
Product: File System    Reporter: Fabio Varesano (fabio.varesano)
Component: ext3    Assignee: fs_ext3 (fs_ext3)
Status: RESOLVED UNREPRODUCIBLE    
Severity: normal CC: huguelectro, ivanovstm2, jack, jcanivet, nicolas.ecarnot, sami.liedes, tytso
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.2.9    Subsystem:
Regression: No    Bisected commit-id:

Description Fabio Varesano 2012-03-09 10:42:56 UTC
I'm experiencing system performance problems when using ext3. It seems that on any read from the disk, the jbd2 process generates a lot of additional I/O and consumes CPU, resulting in system unresponsiveness and slowdowns.

Some more info:

mount http://sprunge.us/PMAV

uname -a
  Linux gamma 3.2.9-1-ARCH #1 SMP PREEMPT Thu Mar 1 09:31:13 CET 2012 x86_64 Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz GenuineIntel GNU/Linux

Let me know additional details you need.

Thanks,

Fabio Varesano
Comment 1 Fabio Varesano 2012-03-09 10:49:20 UTC
It seems that even though the file systems are ext3, the ext4 module is being used, because CONFIG_EXT4_USE_FOR_EXT23 is enabled in the kernel.
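(For anyone who wants to check their own kernel, one of these should work, depending on what your distro exposes:)
zgrep EXT4_USE_FOR /proc/config.gz           # if the kernel was built with CONFIG_IKCONFIG_PROC
grep EXT4_USE_FOR /boot/config-$(uname -r)   # if your distro ships the config in /boot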
Comment 2 Hugues Tranli 2012-05-02 12:10:49 UTC
Same issue on my computer, using ext4 on Arch Linux. I can hear my HDD writing every 3 seconds.
Checked with iotop; it's jbd2-related. I don't remember seeing this on 3.2.
Running the 3.3.3 kernel was just painful. I switched to 3.3.4 and it's a bit better, but I can still hear my HDD very often, especially when I use Chromium, even though its cache is on tmpfs.
I check the HDD's S.M.A.R.T. data occasionally; I hope this will be fixed soon.

Here's my mount output: http://pastebin.archlinux.fr/441560
Comment 3 Theodore Tso 2012-05-02 16:08:48 UTC
It sounds like something in userspace is simply calling fsync() very frequently.  Can you try the following:

echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable 

... then wait for a minute or so, and then capture the output of:

cat /sys/kernel/debug/tracing/trace

That should allow us to see what is causing the excess commits.
Comment 4 Theodore Tso 2012-05-02 16:10:21 UTC
Oh, the above requires that CONFIG_FTRACE be enabled, and that your distribution is mounting the debugfs file system on /sys/kernel/debug.  If not, you may need to recompile your kernel to enable FTRACE, and/or manually mount the debugfs filesystem before you try the above experiment.
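Putting it together, the whole session looks roughly like this (run as root; skip the mount if debugfs is already mounted):
mount -t debugfs none /sys/kernel/debug
echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable
sleep 60    # let normal activity accumulate
cat /sys/kernel/debug/tracing/trace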
Comment 5 Hugues Tranli 2012-05-02 18:49:27 UTC
Ok mate, I tried what you asked. Luckily, FTRACE is enabled in my kernel, and I mounted debugfs.
After around one minute (maybe more), here's what I get: http://pastebin.archlinux.fr/441700

Best Regards.
Comment 6 Theodore Tso 2012-05-02 19:50:49 UTC
As I expected, this is being caused by some application --- it looks like Chrome's DB thread is calling fsync() very aggressively.  If you exit your Chrome browser, it should stop your disk from being used all the time.

I suspect the new kernel and the use of ext4 were just a coincidence.  Since I'm using the very latest Chrome 19 beta and I'm not seeing this, I'm going to guess that it's some Chrome extension, or some page you are visiting, that is causing the large amount of fsync() activity.

You can get a continuous reading of fsync() activity by enabling the above tracepoint, and then in a window, running the command "cat /sys/kernel/debug/tracing/trace_pipe".
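For example, to watch only a suspect process (the task name is the first field of each trace line; the "chrom" pattern here is just an illustration):
cat /sys/kernel/debug/tracing/trace_pipe | grep -i chrom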

If the fsync() activity goes away after you exit the Chrome browser, try starting Chrome back up and see whether you see the heavy fsync() activity with just an empty browser window, before you start opening lots and lots of tabs.  Also try disabling all of your Chrome extensions and see if that quiets things down.

Once you get things quiet, try enabling Chrome extensions one at a time, and opening various web pages one at a time, and see if you can figure out what causes the fsync storm.  It may very well be some Flash-enabled page --- I run with FlashControl to block potential web page malware and other things I don't want (which can often burn CPU time and hence battery life).

Cheers,
Comment 7 Hugues Tranli 2012-05-05 20:42:44 UTC
Ok, I turned off my extensions (I was only using "Silver Bird" and "Adblock"). That's all.
I tried with an empty browser and the result was almost the same as before.
Even with a single tab (this bug report), the fsync() activity is very high.
I am using a Chromium 18.0.1024 build, which seems to be buggy.

Pastebin: http://pastebin.archlinux.fr/443140

I wanted to thank you for your help (especially the debugging tips; I didn't know about them).
Comment 8 Eugene 2012-05-13 15:30:31 UTC
There's a related jbd2 bug in Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/607560 - ext4 jbd2 writing a block every 5-10 seconds, preventing disk spin-down and making noise, even when the PC is idle.
The bug report has been open since August 2010 and is still active; it covers even the newest Ubuntu 12.04.
Plus half a dozen threads on Ubuntu and Arch Linux forums, including the thread for this bug:
https://bbs.archlinux.org/viewtopic.php?id=139087 - Intense disk IO (jbd2 and flush) blocks the user interface.

Personally, I'm not sure this is jbd2's fault. Something else must be going on in the system. The problem is that nobody can point a finger at the cause, except that jbd2 writes to disk.


Following the instructions given by Theodore Ts'o, I don't see excessive use of fsync:
http://paste.ubuntu.com/981398

block_dump of my idle system doesn't show anything out of the ordinary, apart from a lot of flush/jbd2 activity:
http://paste.ubuntu.com/983568/
(sda3 is /, sda6 is /home. NetworkManager writes every 5 min, xfconfd every 10 min; open apps are a terminal, gedit, Xfce's Thunar, and a light screensaver; syslog is disabled.)

When I watch the "Dirty:" field in /proc/meminfo, it shows that during and *following* any system activity, memory is dirtied roughly every 3-15 seconds (accompanied by audible HDD clicks) and flushed to disk. This goes on for a while, then everything quiets down.
"iotop" shows jbd2 writes, plus writes from a process it can't properly identify.


Is there a way to find the cause of the flush and jbd2 writes, like an application name or PID?
Preferably without needing to patch or compile a new kernel, so that as many people as possible can use it.


P.S. My subjective feeling is that any work done on the system (and often idling too), no matter how light, is accompanied by noticeable HDD clicks every ~5 seconds. And opening even relatively light apps like gedit leads to bursts of loud HDD activity.
Comment 9 Hugues Tranli 2012-05-14 10:44:01 UTC
What command did you use to watch block_dump?

Even though I switched to Firefox and reduced the HDD writes, iotop still often shows jbd2/flush, and I would like to see if something is wrong.

About your question, I can't help you, but maybe it's related to a daemon or piece of software running in the background.

What distribution are you using? Pasting the output of "uname -a" and your /etc/fstab would be a good start.

You can try the noatime option in your fstab and see if it helps:
https://wiki.archlinux.org/index.php/Fstab#atime_options
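A typical noatime entry looks something like this (the UUID and the other options are placeholders; adapt them to your own setup):
UUID=xxxxxxxx-xxxx  /  ext4  defaults,noatime,errors=remount-ro  0  1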

Regards.
Comment 10 Jan Kara 2012-05-14 21:43:40 UTC
(In reply to comment #8)
> Is there a way to find the cause of the flush and jbd2 writes, like an
> application name or PID?
> Preferably without needing to patch or compile a new kernel, so that as many
> people as possible can use it.
  If you have tracepoints enabled in your kernel (they usually are) and a relatively recent kernel, you can enable the tracepoint in ext4_mark_inode_dirty() with:
echo 1 >/sys/kernel/debug/tracing/events/ext4/ext4_mark_inode_dirty/enable

Then you can run:
cat /sys/kernel/debug/tracing/trace_pipe
and watch how inodes are dirtied. That should catch practically all cases where JBD2 eventually ends up doing some IO.
Comment 11 Eugene 2012-05-15 14:40:05 UTC
Thanks for the answers guys.

to Hugues Tranli:

>maybe it's related to a daemon/software running in the background.
Well, that's pretty much what everybody wants to find out. But it seems it's not that easy.

I'm on Xubuntu 11.10 with 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 20:45:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux, on an AMD CPU.
cat /etc/fstab and cat /proc/mounts give me the usual default settings:
/ ext4 rw,noatime,errors=remount-ro,user_xattr,barrier=1,data=ordered
except that I changed relatime to noatime. It's almost the same, though, and on a desktop it makes no noticeable difference in disk usage.

Block_dump:
See message 73 in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/607560, but overall it works like this:
echo 1 > /proc/sys/vm/block_dump
Watch the results in syslog (note that syslog itself will frequently write to disk) or by periodically running dmesg. To get cleaner results: disable kernel logging in syslog, enable block_dump, wait a while or do something on the system, then re-enable kernel logging; the block_dump results should then appear in syslog.
I'm not aware of a truly convenient way to watch it in realtime.
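(The closest thing I can suggest is a crude polling loop, run as root; dmesg -c prints the kernel ring buffer and then clears it:)
while true; do dmesg -c; sleep 5; done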


to Jan Kara:

Enabling ext4_mark_inode_dirty gives me a constant flood of events in which xfce4-terminal mostly dirties the same two inodes:
http://paste.ubuntu.com/988896/
Is this expected behavior? It's the default terminal in Xubuntu.
Comment 12 Jan Kara 2012-05-15 14:54:40 UTC
(In reply to comment #11)
> Enabling ext4_mark_inode_dirty gives me a constant flood of events in which
> xfce4-terminal mostly dirties the same two inodes:
> http://paste.ubuntu.com/988896/
> Is this expected behavior? It's the default terminal in Xubuntu.
  Yeah, I forgot about that. Sorry for the extra work. The terminal writes to some pipe or socket inode when it outputs to the screen... Just create a tmpfs if you don't have one (e.g. mkdir /tmp/mem; mount -t tmpfs none /tmp/mem/) and then gather the output with something like "cat /sys/kernel/debug/tracing/trace_pipe >/tmp/mem/dirty_trace".
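Once you've gathered a trace that way, you can get a quick summary of which tasks dirty inodes most often (assuming the usual ftrace layout, where the first field of each line is the task name and PID):
awk '{print $1}' /tmp/mem/dirty_trace | sort | uniq -c | sort -rn | head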

BTW, writing to sockets or named pipes is often a source of filesystem modifications, because the mtime of those inodes changes; postfix causes these relatively frequently, for example, and xfce4-terminal is another such case. It's a bit tricky to avoid, because I'd be reluctant to completely stop updating mtime on socket/pipe inodes, as some strange tool may rely on it. But I've got an idea that may work, so if this is indeed the problem, I can look at it again.
Comment 13 Eugene 2012-05-16 16:34:39 UTC
Thanks, now it works.
I traced the system for around 10 minutes and don't see any previously unseen processes, just normal system activity. But there were a number of HDD writes (around 6 in 10 minutes) that have no corresponding entries in the trace results and are separated from the entries by roughly 30 seconds (flush?). That's not counting the 1-2 writes that closely follow (~3-5 seconds) almost every entry (jbd2?).
http://paste.ubuntu.com/990927/

Can I trace some function that produces trace output when the actual write takes place, but shows the original cause of the write (not just flush or jbd2)? Like block_dump, but more informative?

If that's not possible, then for now I'll presume it's just the way the system works. But it's still a lot of flush/jbd2 activity following writes of even small files:
https://launchpadlibrarian.net/101378473/15mins_almost_idle.txt

For example, in the block_dump above, NetworkManager's "timestamps" file is 4910 bytes long and is followed by ~20 jbd2 writes and then 10 flushes.
Or: xfconfd's "xfce4-panel.xml" is 9467 bytes long and is followed by 8 jbd2 writes and then 20 flushes all around the disk.
Is that normal?
Comment 14 Jan Kara 2012-05-16 19:29:54 UTC
I've looked at the results and they look normal. Different processes dirty some inodes; after a while, the jbd2 journal thread writes these changes to the journal (it checks once every 5 seconds). The journal thread writes somewhat more than was dirtied because there are some descriptor blocks etc.

Also, once every 5 seconds, the flusher wakes up and writes out all dirty data older than 30 seconds.

So the real sources of all the activity are those few processes (gedit, NetworkManager, xfconfd) that do the dirtying; the rest is just triggered by that.
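(For reference, those intervals correspond to tunables; the values below are just the defaults being described:)
cat /proc/sys/vm/dirty_writeback_centisecs   # 500: flusher wakes every 5 s
cat /proc/sys/vm/dirty_expire_centisecs      # 3000: data dirty for 30 s gets written out
mount -o remount,commit=5 /                  # ext4's journal commit interval, in seconds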
Comment 15 Theodore Tso 2012-05-16 20:24:41 UTC
Eugene, one thing that may not be obvious is that what matters is not the number of disk blocks written; it's how often the disk drive seeks, and, more importantly, wakes up.  Once you wake up the hard drive and it has positioned its heads at the right place on the platter, whether you write 8k or 32k or 128k doesn't make much difference in terms of time and power requirements.

So for example, if gedit writes a single file, it might cause multiple jbd2 writes: for the block allocation bitmap, the inode allocation bitmap, the inode table, the block group summary block, and then the data block itself.  But the jbd2 writes are contiguous and happen all at once, during a journal commit (triggered either by an explicit fsync or by the 5-second commit timer, which starts a journal commit 5 seconds after metadata changes have been applied to the file system).   So you might see "lots of jbd2 writes", but that's misleading in terms of the power requirements or the fraction of disk time consumed by journalling.   Of course, if you don't care about consistency after a crash, you can always disable journalling.  But then you might lose data files after a system crash.
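(For completeness, turning the journal off is done offline with tune2fs; this is shown purely as an illustration, since it trades away crash consistency, and /dev/sdXN is a placeholder for the actual device:)
umount /dev/sdXN
tune2fs -O ^has_journal /dev/sdXN
e2fsck -f /dev/sdXN    # a full fsck is recommended after changing features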

The bottom line is that if you want to keep the disk asleep for longer, you have to be absolutely firm about not having programs write to the file system.   For things like gedit, the writes are caused by user activity, and that's fine and good.  But if NetworkManager is dirtying the disk every few seconds or minutes, especially for something useless like some kind of timestamp file, that's a bug that should be filed with extreme prejudice against NetworkManager, since it's guaranteed to keep waking up the disk.
Comment 16 Eugene 2012-05-17 16:34:06 UTC
to Jan Kara, Theodore Ts'o:

It seems there are differences in how various systems behave (maybe specifics of certain hardware configurations and/or distro configuration choices). My older system, on 2.6.28 with ext3, had less HDD activity overall. And it seems many people are misled by similar changes in system behavior into thinking that something is wrong with the system, which in turn leads them to notice the jbd2 process and conclude that it's the main culprit behind the constant disk writes, since information on jbd2 is scarce.

Thank you very much for your help and explanations.
I will try to point people from Launchpad's bug thread here.

Eugene Ivanov
Comment 17 Theodore Tso 2012-05-17 18:34:58 UTC
Eugene,

I suspect this is much more likely a change in the userspace components of the distribution.  Ext3 hasn't changed much since 2.6.38, and everything I've said about how jbd2 works also applies to ext3 and jbd.   So people who are convinced this is a jbd2 problem can certainly try building a kernel with ext3 (for those distros that use the ext4 driver for both the ext3 and ext4 file system formats).

My experience is that when comparing ext3 and ext4, I haven't found a situation where ext4 is slower than ext3.  If you can demonstrate a situation where ext3 gives you one speed and, with exactly the same configuration, ext4 is slower, please let me know and we'll certainly take a look at it.

Something else you might try is dropping a modern (3.2 or 3.3) kernel onto an older distribution.  I'm currently typing this on an Ubuntu LTS 10.04 system running a 3.2 kernel.  That would allow for a controlled experiment.  But all I can say is that if NetworkManager is doing something as crazy as writing to a timestamp file every N seconds, there really is nothing you can do at the kernel level.
Comment 18 Fabio Varesano 2012-05-17 20:02:05 UTC
I'm the one who originally opened this issue. The problem showed up all at once, probably after an upgrade. I'm running Arch Linux, which by default used the ext4 driver even for ext3 partitions. I did try recompiling the kernel to allow mounting the filesystems as native ext3... no difference at all, still slow and constantly working the disk.

Personally, I "fixed" the problem by switching my /home filesystem to reiser3. After that, the system became as fast as it was before.

IMHO this really is kernel stuff...
Comment 19 Sami Liedes 2012-05-17 21:18:25 UTC
I have seen this too; basically, latencies have always been a problem with KDE (at least KDE 4) on ext4, as far as I can tell. Switching to XFS solved the problem for me.

One thing I might be able to add to the discussion:

While this problem manifests without disk crypto, adding dm-crypt to the soup, especially on a computer _without_ hardware AES support, generally makes it *much* worse. I've seen disk accesses, especially fsync()s (per latencytop), take minutes in the worst case under medium disk load; 5-10 seconds is the norm under light I/O load. I can't see much reason why disk crypto should make it that much slower. But since the problem also occurs without crypto, perhaps dm-crypt just makes it worse and more obvious?
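(A crude way to put numbers on this when comparing crypt vs. no-crypt setups is to time a small synchronous write on the filesystem in question; the path is just a placeholder:)
time dd if=/dev/zero of=/path/on/fs/testfile bs=4k count=1 conv=fsync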
Comment 20 Eugene 2012-05-18 14:52:26 UTC
to Theodore Ts'o:

I'm on a freshly formatted pure ext4 fs, and it's at the very least no slower than my ext3 was, no question there. I just used the opportunity to ask for assistance with a bug report related to my own, and to Launchpad's "ext4 jbd2 writing block every 5-10 seconds".

I'm not experiencing slowdowns, just periodic writes. And after monitoring disk activity and making some tweaks to applications (mostly Firefox), it got better. After that, the only thing that confused me was the number of inodes jbd2 writes, and you explained that.
Comment 21 Nicolas Ecarnot 2012-07-30 20:00:49 UTC
Hi,

I eventually found the thread I was looking for, and I feel the need to describe what I'm experiencing.

I have some IT skills, but here I won't go deep into the details.
I'm a very conservative user, and my setup and software usage are *very* stable. The set of apps I use does not change from month to month, and I'm not the kind of person who installs a new app or plugin every day.
That allows me to observe the improvements or flaws of system upgrades as time goes by.

I'm using Xubuntu Precise 12.04 on a 64-bit CPU:
| root@xxxxx:~# uname -a
| Linux xxxxx 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
and I apply the usual upgrades on an almost daily/weekly basis.

For about a week now, every action I take leads to massive disk I/O activity that freezes my whole system (GUI or via ssh, same punishment). The actions I'm talking about are very complex: opening a new tab in Firefox, launching xterm, creating a new mail, closing xterm, clicking to expand a menu (!)...
To post the message you are reading, I had to create an account.
Clicking the "Send" button to submit my form took a solid two minutes to complete. This makes no sense.

Read my lips: my setup was very simple and efficient, and all of a sudden it became unusable.

Like others, I tried playing with iotop, htop and so on. I tried to blame my btrfs home partition, but I've been using it without issue for months. I tried to blame specific apps, and thought I'd found the culprit when I disabled an iPhone-automount-related binary, but the issue has recurred since.
I discovered the recent use of cgroups, but I have no clue whether they deserve blame. From time to time iotop shows me jbd2 activity, or btrfs caching, but this doesn't correlate with 100% of the freeze events.
Googling showed me this is not an Ubuntu-specific issue.
Experts will have to decide whether it's a filesystem-specific issue...

This bug looks quite serious to me and deserves attention. I'm very surprised such a thing isn't more widely known and fixed by now.
Comment 22 Theodore Tso 2012-08-01 02:43:12 UTC
Nicolas,

The problem you are describing is not really related to the other problems (and I use the plural, because this bug has suffered from Ubuntu Launchpad disease --- people glomming onto a bug without understanding whether what they have is the same bug or a different one).

I don't know what is causing your freeze events, but by posting here, you're only going to get the attention of ext4 developers.  The fact that you're not seeing an exact correlation with jbd2 events means it's probably something else.  The fact is that jbd2 activity correlates with *any* change to the file system.  So if something is doing lots of unnecessary file system writes, it will cause jbd2 activity.  That doesn't mean there's a jbd2 bug; it just means userspace is doing something unnecessary.
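(If you want to watch the journal commits themselves rather than the fsync() calls, there is also a jbd2 tracepoint that can be enabled the same way as the ext4 ones above:)
echo 1 > /sys/kernel/debug/tracing/events/jbd2/jbd2_start_commit/enable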

As far as your complaint is concerned, it's not at all obvious whether it's a memory shortage problem, perhaps only in certain cgroup containers, or something else entirely.

I'm certainly not seeing any problems with my Debian wheezy laptop, but I'm using Xfce, not GNOME 3, and I'm not using systemd.
Comment 23 Theodore Tso 2012-08-01 02:45:22 UTC
I'm closing this bug since the original poster never replied to the questions in comment #3 with more information, and in the meantime more people have chimed in with other, unrelated complaints.  Hopefully the ftrace recipe will help some people debug their userspace bugs.