Bug 12224
Summary: | journal activity on inactive partition causes inactive harddrive spinup | ||
---|---|---|---|
Product: | File System | Reporter: | C Sights (csights) |
Component: | ext3 | Assignee: | fs_ext3 (fs_ext3) |
Status: | CLOSED CODE_FIX | ||
Severity: | low | CC: | rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.27.6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 11808 | ||
Attachments: |
patch reverting old change & adding Jan's proposed fix
Updated patch |
Description
C Sights
2008-12-14 11:39:33 UTC
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Sun, 14 Dec 2008 11:39:34 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=12224 > > Summary: journal activity on inactive partition causes inactive > harddrive spinup > Product: File System > Version: 2.5 > KernelVersion: 2.6.27.6 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: low > Priority: P1 > Component: ext3 > AssignedTo: fs_ext3@kernel-bugs.osdl.org > ReportedBy: csights@fastmail.fm > > > Latest working kernel version: 2.6.27.5 > Earliest failing kernel version: 2.6.27.6 Thanks for being this specific. > Distribution: Debian stable with custom compiled vanilla kernel > > Problem Description: > Activity on one partition (hda1) causes kjournald to also write to > another > partition (sda1). This in turn causes the harddrive which contains sda1 to > spinup or not spindown. > I've activated laptop-mode using laptop-mode-tools. hda1 and sda1 are > both > formatted with ext3. > On unaffected kernels (e.g. 2.6.27.5) btrace produces no output. On > affected kernels (e.g. 2.6.27.6, 2.6.27-rc7) btrace produces output as below. I think I've seen other reports of this. Maybe it was you. > /dev/hda1 on / type ext3 (rw,noatime,errors=remount-ro,commit=1209600) > tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755) > proc on /proc type proc (rw,noexec,nosuid,nodev) > sysfs on /sys type sysfs (rw,noexec,nosuid,nodev) > udev on /dev type tmpfs (rw,mode=0755) > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev) > devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620) > debugfs on /sys/kernel/debug type debugfs (rw) > > Steps to reproduce: > 1. start "btrace /dev/sda" > 2. begin compiling kernel > 3. within seconds btrace begins output similar to below. > > > Thanks for all your work! Sorry if I've misplaced blame for this undesirable > change in behavior. > > Using 2.6.27.8, btrace of /dev/sda while compiling on /dev/hda1 OK, that's a huge mess because bugzilla went and wordwrapped it. Let me fix it: > Dec 9 11:15:32 localhost logger: 8,0 0 1 > 0.000000000 1109 A WS 14375 + 8 <- (8,1) 14312 > Dec 9 11:15:32 localhost logger: 8,0 0 2 > 0.000006705 1109 Q WS 14375 + 8 [kjournald] > Dec 9 11:15:32 localhost logger: 8,0 0 3 > 0.000043581 1109 G WS 14375 + 8 [kjournald] > Dec 9 11:15:32 localhost logger: 8,0 0 4 > 0.000065651 1109 P N [kjournald] > Dec 9 11:15:32 localhost logger: 8,0 0 5 > 0.000076267 1109 I W 14375 + 8 [kjournald] > Dec 9 11:15:32 localhost logger: 8,0 0 6 > 0.000300318 1109 D W 14375 + 8 [kjournald] > Dec 9 11:15:32 localhost logger: 8,0 0 7 > 0.000423518 1109 U N [kjournald] 1 > Dec 9 11:15:32 localhost logger: 8,0 0 8 > 0.010028373 3 C W 14375 + 8 [0] > Dec 9 11:15:35 localhost logger: 8,0 0 9 > 3.219049689 1109 A WS 14383 + 8 <- (8,1) 14320 > Dec 9 11:15:35 localhost logger: 8,0 0 10 > 3.219058350 1109 Q WS 14383 + 8 [kjournald] > Dec 9 11:15:35 localhost logger: 8,0 0 11 > 3.219095505 1109 G WS 14383 + 8 [kjournald] > Dec 9 11:15:35 localhost logger: 8,0 0 12 > 3.219115061 1109 P N [kjournald] > Dec 9 11:15:35 localhost logger: 8,0 0 13 > 3.219127074 1109 I W 14383 + 8 [kjournald] > Dec 9 11:15:35 localhost logger: 8,0 0 14 > 3.219344699 1109 D W 14383 + 8 [kjournald] > Dec 9 11:15:35 localhost logger: 8,0 0 15 > 3.219543607 1109 U N [kjournald] 1 > Dec 9 11:15:35 localhost logger: 8,0 0 16 > 3.229477275 3 C W 14383 + 8 [0] I'm not familar with btrace. What are we looking at here? The only likely commit I see in ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.27.6 is "ext3: wait on all pending commits in ext3_sync_fs", but I don't see how it could cause this. bugme-daemon@bugzilla.kernel.org wrote: > OK, that's a huge mess because bugzilla went and wordwrapped it. Let > me fix it: > >> Dec 9 11:15:32 localhost logger: 8,0 0 1 >> 0.000000000 1109 A WS 14375 + 8 <- (8,1) 14312 >> Dec 9 11:15:32 localhost logger: 8,0 0 2 >> 0.000006705 1109 Q WS 14375 + 8 [kjournald] >> Dec 9 11:15:32 localhost logger: 8,0 0 3 >> 0.000043581 1109 G WS 14375 + 8 [kjournald] >> Dec 9 11:15:32 localhost logger: 8,0 0 4 >> 0.000065651 1109 P N [kjournald] >> Dec 9 11:15:32 localhost logger: 8,0 0 5 >> 0.000076267 1109 I W 14375 + 8 [kjournald] >> Dec 9 11:15:32 localhost logger: 8,0 0 6 >> 0.000300318 1109 D W 14375 + 8 [kjournald] >> Dec 9 11:15:32 localhost logger: 8,0 0 7 >> 0.000423518 1109 U N [kjournald] 1 >> Dec 9 11:15:32 localhost logger: 8,0 0 8 >> 0.010028373 3 C W 14375 + 8 [0] >> Dec 9 11:15:35 localhost logger: 8,0 0 9 >> 3.219049689 1109 A WS 14383 + 8 <- (8,1) 14320 >> Dec 9 11:15:35 localhost logger: 8,0 0 10 >> 3.219058350 1109 Q WS 14383 + 8 [kjournald] >> Dec 9 11:15:35 localhost logger: 8,0 0 11 >> 3.219095505 1109 G WS 14383 + 8 [kjournald] >> Dec 9 11:15:35 localhost logger: 8,0 0 12 >> 3.219115061 1109 P N [kjournald] >> Dec 9 11:15:35 localhost logger: 8,0 0 13 >> 3.219127074 1109 I W 14383 + 8 [kjournald] >> Dec 9 11:15:35 localhost logger: 8,0 0 14 >> 3.219344699 1109 D W 14383 + 8 [kjournald] >> Dec 9 11:15:35 localhost logger: 8,0 0 15 >> 3.219543607 1109 U N [kjournald] 1 >> Dec 9 11:15:35 localhost logger: 8,0 0 16 >> 3.229477275 3 C W 14383 + 8 [0] > > I'm not familar with btrace. What are we looking at here? Andrew, it's blktrace/blkparse output, basically. blkparse(1) manpage has the details. maj,min cpu seq time pid act rwbs off+len 8,0 0 1 0.000000000 1109 A WS 14375 + 8 <- (8,1) 14312 ... etc I'm not sure it gives us enough info; all we see is that there was an IO on sdb (from sdb1) I might suggest # sysctl -w vm.block_dump=1 to see if you get any info about inodes on sda1 being dirtied. (Also, I didn't see any info about sda1 in the posted fstab...) -Eric Reply-To: goswin-v-b@web.de Eric Sandeen <sandeen@redhat.com> writes: > maj,min cpu seq time pid act rwbs off+len > 8,0 0 1 0.000000000 1109 A WS 14375 + 8 <- (8,1) 14312 > > ... etc > > I'm not sure it gives us enough info; all we see is that there was an IO > on sdb (from sdb1) > > I might suggest > > # sysctl -w vm.block_dump=1 > > to see if you get any info about inodes on sda1 being dirtied. Like atime/diratime updates? > (Also, I didn't see any info about sda1 in the posted fstab...) > > -Eric MfG Goswin http://bugzilla.kernel.org/show_bug.cgi?id=12224 > I think I've seen other reports of this. > Activity on one partition (hda1) causes kjournald to also write > to another partition (sda1). This in turn causes the harddrive > which contains sda1 to spinup or not spindown. I've activated > laptop-mode using laptop-mode-tools. hda1 and sda1 are both > formatted with ext3. On unaffected kernels (e.g. 2.6.27.5) btrace > produces no output. On affected kernels (e.g. 2.6.27.6, 2.6.27-rc7) > btrace produces output as below. I've looked at the changes between 2.6.27.5 and 2.6.27.6, and I don't see anything that might explain such a regression. Indeed, there's only one patch between .5 and .6 that even touches ext3, and the patch seems pretty innocent. Is /dev/sda1 even mounted? It doesn't included up in your /etc/mtab listing. The only thing I can suggest is to try 2.6.27.6 with the only patch that touches ext3 reverted (i.e., unapply it with patch -R). If the problem still shows up, I would suggest getting the linux-2.6.27.y tree from: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.27.y.git And then doing a git bisection search to try to find the guilty commit. Regards, - Ted commit a0b8bfb34743b6e6c2bb06ad5a270590d90270f7 Author: Arthur Jones <ajones@riverbed.com> Date: Fri Nov 7 00:05:17 2008 +0000 ext3: wait on all pending commits in ext3_sync_fs commit c87591b719737b4e91eb1a9fa8fd55a4ff1886d6 upstream In ext3_sync_fs, we only wait for a commit to finish if we started it, but there may be one already in progress which will not be synced. In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. This can be reproduced with a script created by Eric Sandeen <sandeen@redhat.com>: #!/bin/bash umount /mnt/test2 mount /dev/sdb4 /mnt/test2 rm -f /mnt/test2/* dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512 touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename /mnt/test2/link umount /mnt/test2 mount /dev/sdb4 /mnt/test2 ls /mnt/test2/ umount /mnt/test2 To ensure all commits are synced, we flush all journal commits now when sync_fs'ing ext3. Signed-off-by: Arthur Jones <ajones@riverbed.com> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> diff --git a/fs/ext3/super.c b/fs/ext3/super.c index f38a5af..810bf7c 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -2365,13 +2365,12 @@ static void ext3_write_super (struct super_block * sb) static int ext3_sync_fs(struct super_block *sb, int wait) { - tid_t target; - sb->s_dirt = 0; - if (journal_start_commit(EXT3_SB(sb)->s_journal, &target)) { - if (wait) - log_wait_commit(EXT3_SB(sb)->s_journal, target); - } + if (wait) + ext3_force_commit(sb); + else + journal_start_commit(EXT3_SB(sb)->s_journal, NULL); + return 0; } Dec 14 20:02:18 localhost kernel: kjournald(1109): WRITE block 12520 on sda1 is the first thing I see in the log above, with no indication of anything else on sda1 before that. If you let the system just sit idle, do you get sda1 hits from the sysctl and/or blktrace? Hi, Following Ted's suggestion, I began with 2.6.27.6 and then reversed the patch for commit a0b8bfb34743b6e6c2bb06ad5a270590d90270f7. This "fixed" the problem such that activity on hda1 no longer caused kjournald to write to sda1. > Is /dev/sda1 even mounted? It doesn't included up in your /etc/mtab > listing. Just to confirm, I posted the wrong mtab previously. sda1 is mounted during my tests. This is how it looks: /dev/hda1 on / type ext3 (rw,noatime,errors=remount-ro,commit=1209600) tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755) proc on /proc type proc (rw,noexec,nosuid,nodev) sysfs on /sys type sysfs (rw,noexec,nosuid,nodev) udev on /dev type tmpfs (rw,mode=0755) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev) devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620) /dev/sda1 on /mnt/sda type ext3 (rw,noatime,commit=1209600) debugfs on /sys/kernel/debug type debugfs (rw) > ------- Comment #6 from sandeen@redhat.com http://bugzilla.kernel.org/show_bug.cgi?id=12224 Hi all, You might be interested to know that this problem can be reproduced by using an ext3 image file mounted loopback. (I.e. You don't need a separate harddrive or partition in order to reproduce the problem.) C. Are the laptop-mode tools doing remounts behind your back? The only way I see this is if I do "mount -o remount /"; then I see kjournald activity on other ext3 filesystems. I'll dig into that. On Tuesday 16 December 2008 17:01:27 bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=12224 > ------- Comment #9 from sandeen@redhat.com 2008-12-16 14:01 ------- > Are the laptop-mode tools doing remounts behind your back? > > The only way I see this is if I do "mount -o remount /"; then I see > kjournald activity on other ext3 filesystems. I'll dig into that. I don't think so: I was able to reproduce this bug using an ext3 loopback mounted file. laptop-mode-tools does not touch loopback devices as far as I can tell. Here is my laptop-mode.conf: > # > # The drives that laptop mode controls. > # Separate them by a space, e.g. HD="/dev/hda /dev/hdb". The default is a > # wildcard, which will get you all your IDE and SCSI/SATA drives. > # > HD="/dev/[hs]d[abcdefgh]" And here is the loopback mount: /tmp/ext3Image on /mnt/loop type ext3 (rw,noatime,loop=/dev/loop/0,commit=1209600) (I set the commit=1209600 and noatime by hand on the loopback mounted ext3 image file.) Since I'm not a kernel hacker, I don't know how the code is structured. It feels like the kernel commit code is not fully separated by device (or partition). When the active partition's journal is "synced" it triggers activity on other partitions, even if the other partitions aren't active. C. Handled-By : Eric Sandeen <sandeen@redhat.com> Notify-Also : Theodore Tso <tytso@mit.edu> First-Bad-Commit : c87591b719737b4e91eb1a9fa8fd55a4ff1886d6 In a nutshell, with laptop mode on, we go down this path: static void laptop_timer_fn(unsigned long unused) { pdflush_operation(laptop_flush, 0); } static void laptop_flush(unsigned long unused) { sys_sync(); } which gets us to the new static int ext3_sync_fs(struct super_block *sb, int wait) { sb->s_dirt = 0; if (wait) ext3_force_commit(sb); ... which is then committing empty transactions. Seems like we should be smart enough to skip committing transactions w/ nothing in them...? Care to test the patch at http://patchwork.ozlabs.org/patch/14755/ ? http://bugzilla.kernel.org/show_bug.cgi?id=12224 > > > > ------- Comment #13 from sandeen@redhat.com 2009-01-13 14:17 ------- > Care to test the patch at http://patchwork.ozlabs.org/patch/14755/ ? Sure! I manually patched 2.6.28. btrace showed lots of output as above. However, I didn't wait to see if the observed activity affected the drive spinup/down b/c I thought maybe patching 2.6.28 was not proper. I don't know how to git the right kernel sources to patch. Could you prepare a patch against an RC (or in some other way hold my hand)? Or I could wait for some RC which includes the patch already. Sorry for my gimpiness, C. Sure, I'll gladly attach a patch. You tell me a specific kernel version, I'll attach a patch. :) 2.6.28? http://bugzilla.kernel.org/show_bug.cgi?id=12224 > ------- Comment #15 from sandeen@redhat.com 2009-01-14 12:44 ------- > Sure, I'll gladly attach a patch. You tell me a specific kernel version, > I'll attach a patch. :) 2.6.28? Yes, please! :) Created attachment 19805 [details]
patch reverting old change & adding Jan's proposed fix
Here you go!
http://bugzilla.kernel.org/show_bug.cgi?id=12224 I applied the patch to 2.6.28.... However there is still activity on the inactive partition (sda) if there is activity on the other partition (hda). I've pasted the current output of 'btrace /dev/sda' in case it has changed from before. Jan 14 19:51:29 localhost logger: 8,0 0 1 0.000000000 1138 A WS 12599 + 8 <- (8,1) 12536 Jan 14 19:51:29 localhost logger: 8,0 0 2 0.000006426 1138 Q WS 12599 + 8 [kjournald] Jan 14 19:51:29 localhost logger: 8,0 0 3 0.000044978 1138 G WS 12599 + 8 [kjournald] Jan 14 19:51:29 localhost logger: 8,0 0 4 0.000066489 1138 P N [kjournald] Jan 14 19:51:29 localhost logger: 8,0 0 5 0.000081295 1138 I W 12599 + 8 [kjournald] Jan 14 19:51:29 localhost logger: 8,0 0 6 0.000300597 1138 D W 12599 + 8 [kjournald] Jan 14 19:51:29 localhost logger: 8,0 0 7 0.000492521 1138 U N [kjournald] 1 Jan 14 19:51:29 localhost logger: 8,0 0 8 0.041120321 3 C W 12599 + 8 [0] Jan 14 19:51:51 localhost logger: 8,0 0 9 21.646946652 1138 A WS 12607 + 8 <- (8,1) 12544 Jan 14 19:51:51 localhost logger: 8,0 0 10 21.646955591 1138 Q WS 12607 + 8 [kjournald] Jan 14 19:51:51 localhost logger: 8,0 0 11 21.646991071 1138 G WS 12607 + 8 [kjournald] Jan 14 19:51:51 localhost logger: 8,0 0 12 21.647011464 1138 P N [kjournald] Jan 14 19:51:51 localhost logger: 8,0 0 13 21.647027109 1138 I W 12607 + 8 [kjournald] Jan 14 19:51:51 localhost logger: 8,0 0 14 21.647229369 1138 D W 12607 + 8 [kjournald] Jan 14 19:51:51 localhost logger: 8,0 0 15 21.647339439 1138 U N [kjournald] 1 Jan 14 19:51:51 localhost logger: 8,0 0 16 21.688494962 3 C W 12607 + 8 [0] Jan 14 19:51:55 localhost logger: 8,0 0 17 26.471770149 1138 A WS 12615 + 8 <- (8,1) 12552 Jan 14 19:51:55 localhost logger: 8,0 0 18 26.471779368 1138 Q WS 12615 + 8 [kjournald] Jan 14 19:51:55 localhost logger: 8,0 0 19 26.471812613 1138 G WS 12615 + 8 [kjournald] Jan 14 19:51:55 localhost logger: 8,0 0 20 26.471831889 1138 P N [kjournald] Jan 14 19:51:55 localhost logger: 8,0 0 21 26.471846136 1138 I W 12615 + 8 [kjournald] Jan 14 19:51:55 localhost logger: 8,0 0 22 26.472050632 1138 D W 12615 + 8 [kjournald] Jan 14 19:51:55 localhost logger: 8,0 0 23 26.472162099 1138 U N [kjournald] 1 Jan 14 19:51:55 localhost logger: 8,0 0 24 26.481685099 3 C W 12615 + 8 [0] http://bugzilla.kernel.org/show_bug.cgi?id=12224 > ------- Comment #17 from sandeen@redhat.com 2009-01-14 14:58 ------- > Created an attachment (id=19805) > --> (http://bugzilla.kernel.org/attachment.cgi?id=19805&action=view) > patch reverting old change & adding Jan's proposed fix Does this patch "revert the old change"? Created attachment 19814 [details]
Updated patch
sorry, didn't drive quilt right last time!
http://bugzilla.kernel.org/show_bug.cgi?id=12224 > ------- Comment #20 from sandeen@redhat.com 2009-01-15 08:12 ------- > Created an attachment (id=19814) > --> (http://bugzilla.kernel.org/attachment.cgi?id=19814&action=view) > Updated patch With the latest patch the problem seems to be fixed. btrace /dev/sda output nothing after several minutes of activity on hda. Thanks! C. The fix for this has landed in mainline as commit 02ac59 for ext3, and commit 9eddac for ext4. |