Bug 39072

Summary: jbd2 writes on disk every few seconds
Product: File System Reporter: mkkot (marcin2006)
Component: ext4Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED INVALID    
Severity: normal CC: jcanivet, lars.stokholm, marcin2006, sandeen, tytso
Priority: P1    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.39.2 arch Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Journal tracepoint patch
Output from trace_pipe

Description mkkot 2011-07-09 15:12:46 UTC
If I don't specify in /etc/fstab some large commit value, jbd2 is killing my disk, I think even more frequently than default 5 seconds. Mount says:
/dev/sda6 on / type ext4 (rw,noatime,user_xattr,acl,barrier=1,data=ordered)
In iotop jbd2 constantly appears on top and sysctl vm.block_dump=1, dmesg | grep -e READ -e WRITE shows that jdb2 is writting on the disk every few seconds and only jdb2. I have IBM ThinkPad x30 here, but simmilar thing happens on my PC. I guess it should not write to disk if there isn't really anything to write. It is impossible to spin-down the HDD.

I have a minimal archlinux system here with xfce, running from failsafe entry from GRUB, while pursuing this problem I have killed X, killed all daemons, swapoff, unloaded acpi modules, usb modules and other - nothing helps.

I'm not even sure what logs should I attach as people are complaining on different configurations for a year, it's all there:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/607560
https://bbs.archlinux.org/viewtopic.php?id=113516&p=1
Comment 1 Theodore Tso 2011-07-10 03:48:59 UTC
By enabling the jbd2_run_stats tracepoint, I'm seeing a handful of handles getting started every 5 seconds.

If I go into single user mode, with all user daemons killed off, jbd2 doesn't trigger.   So it's one of the userspace daemons which is causing file system activity.

I don't know why iotop isn't showing the activity, but I'm pretty sure that's a shortcoming of iotop.

I'll add some jbd2 trace points so we can see what is constantly starting jbd2 handles.  But my quick testing seems to indicate to me that jbd2 is functioning correctly, and it's active because something from userspace is ticking the file system.
Comment 2 Lars Stokholm 2011-07-10 20:08:51 UTC
I just wanted to say that I agree
Comment 3 Lars Stokholm 2011-07-10 20:13:38 UTC
Oops. Sorry, I'm very new to this bug list, I committed before I should have.

I just wanted to say that I agree with you Theodore. Going into single user mode helps. In fact just killing off my daemons (syslog-ng, cron and so on) along with X was enough. I didn't have to go to single user mode.
Comment 4 Theodore Tso 2011-07-11 03:15:46 UTC
Created attachment 65212 [details]
Journal tracepoint patch

The following patch should help people figure out which process is responsible for starting an jbd2 transaction handle.   After applying the patch, enable it via:

echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_journal_start/enable
echo 1 > /sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable
cat /sys/kernel/debug/tracing/trace_pipe

... and wait.

(All of this assumes you have ftrace enabled, of course).
Comment 5 Lars Stokholm 2011-07-11 09:14:01 UTC
Created attachment 65282 [details]
Output from trace_pipe
Comment 6 Lars Stokholm 2011-07-11 09:21:47 UTC
As you can see I uploaded the trace_pipe content from a little more than 10 minutes of idling on my part. There's a lot of writes from syslog-ng (as expected).
Comment 7 Theodore Tso 2011-07-11 10:26:45 UTC
Lars, thanks for trying out the tracepoint.  

Marcin, would it be convenient / posible for you to give this patch a try?   And would you be willing to work with some of the other folks at Arch linux and Ubuntu?   More importantly, once you find the guilty parties that are constantly syslogging or constantly modifying files (log files or database files or whatever), can you help with filing bugs upstream with the applications in question?

The reason people with many different applications are seeing this problem is because many application writers are careless with their programming, and so we need as many people as possible investigating problems and complaining to the application programmers.

Thanks!
Comment 8 mkkot 2011-07-11 10:49:19 UTC
Yes, of course, I was already asking for some help with this in archlinux forum. But due to an examu it has to wait until tomorrow, sorry.
Comment 9 mkkot 2011-07-13 19:57:55 UTC
Okay, thanks to this patch most people agreed it's cron which triggers syslog to write. I'm not convinced so since yesterday trying to compile kernel but had some difficulties (first of all, thought it will take 15 minutes on my pc, phew). Need some more time but I'm going to test this patch for sure.
Comment 10 mkkot 2011-07-14 17:02:39 UTC
After 5 and half hour of compilation ;) I have some results:

- command sysctl vm.block_dump=1 (so I can see writes and reads with dmesg | grep -e READ -e WRITE) makes syslog-ng to write every few seconds which triggers jbd2 as well. So I fell in my own trap. Killing syslog helps there.

- console-kit-daemon also triggers jbd2. I'm not sure why but it writes over and over again in text-mode. Killing it helps until you log in within another console, then it starts again. It quiets in X mode, though. Strange.

- Killing console-kit-daemon in text-mode enables possibility to spin-down the disk for a minute or two. Then some pdflush or cron job starts.

So it seems there is nothing wrong with ext4. Anyway, thanks for the patch, it was very helpful to determine what is wrong and solve people's problems.
Comment 11 Eric Sandeen 2011-07-14 17:17:39 UTC
It would be worth filing bugs, or at least asking about, those applications.

If syslog-ng is syncing every few seconds without writing any data, that sounds like a bug.

I collected some of these for fedora, for a while:

"Tracker bug for over-eager apps that won't let disks spin down"
https://bugzilla.redhat.com/show_bug.cgi?id=454582

but I probably need to go hunting again.

-Eric
Comment 12 mkkot 2011-07-14 17:25:48 UTC
It does in special circumstances which is vm.block_dump enabled. For the rest go to: https://lists.balabit.hu/pipermail/syslog-ng/2011-July/016893.html
Comment 13 Eric Sandeen 2011-07-14 17:30:08 UTC
Oh, right, it's a heisenbug; with block_dump enabled you constantly generate log traffic which must be written, which generates log traffic which must be written, which generates log traffic which must be written, which ... ;)