Subject    : pdflush stuck in D state with v2.6.24-rc1-192-gef49c32
Submitter  : Florin Iucha <florin@iucha.net>
References : http://lkml.org/lkml/2007/10/28/65
Handled-By : Trond Myklebust <trond.myklebust@fys.uio.no>
             Fengguang Wu <wfg@mail.ustc.edu.cn>
Reply-To: wfg@mail.ustc.edu.cn I'm now cloning linux-2.6.git onto a reiserfs filesystem and will make extensive use of it. Hopefully that will help reproduce the regression. Fengguang
*** Bug 9441 has been marked as a duplicate of this bug. ***
Patch that partially fixes the problem: http://lkml.org/lkml/2007/11/1/417
* Thomas <gimpel@sonnenkinder.org> wrote:
> I can confirm this issue too on any .24-rc. I'm also using reiserfs on
> a LVM.
>
> And there is one more user on Gentoo forums having the same issue.
> http://forums.gentoo.org/viewtopic-t-612959.html
>
> So you are not alone, florian.

any progress on this issue? Seems a bit stalled.

Ingo
I can no longer reproduce the problem as of 2.6.24-rc3. florin
Reply-To: gimpel@sonnenkinder.org

On Tue, 04.12.07 11:28 Ingo Molnar <mingo@elte.hu> wrote:
>
> * Thomas <gimpel@sonnenkinder.org> wrote:
>
> > I can confirm this issue too on any .24-rc. I'm also using reiserfs
> > on a LVM.
> >
> > And there is one more user on Gentoo forums having the same issue.
> > http://forums.gentoo.org/viewtopic-t-612959.html
> >
> > So you are not alone, florian.
>
> any progress on this issue? Seems a bit stalled.
>
> Ingo

For me the two patches

 * mm-speed-up-writeback-ramp-up-on-clean-systems.patch
 * reiserfs-writeback-fix.patch

solved the issue. IIRC one was from this thread, the other from
http://lkml.org/lkml/2007/10/23/93

So since 2.6.24-rc2-git5 all is fine again. No problems since.

Regards,
Thomas
I think we can close the bug, then?
+1 on closing florin
"Tvrtko A. Ursulin" <tvrtko@ursulin.net> reports that the problem is still present in 2.6.24-rc6. Reopening.
Reply-To: wfg@mail.ustc.edu.cn Hi Tvrtko, I cannot find your report in LKML. Could you resend a copy to me? Thank you, Fengguang
> I cannot find your report in LKML. Could you resend a copy to me?

Tvrtko uses jfs (http://lkml.org/lkml/2007/11/22/15). The question is: on both machines? And is it possible for him to use ext3 or xfs on one of them?

I see pdflush in D state and 50% iowait on a system with 2 Opteron 250 and jfs, but not on the same hardware using xfs. Tested kernels: 2.6.24-rc5 and -rc6.

'echo w > /proc/sysrq-trigger' on the jfs system:

SysRq : Show Blocked State
  task                        PC stack   pid father
pdflush       D 0000000000000000     0   217      2
 ffff81007e59bdb0 0000000000000046 0000000000000000 0000000000000202
 ffff81007f064750 ffff81007fb36ea0 0000000102cad7ee 000000007e59bdc0
 00000000ffffffff 0000000102cad84f 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8045b1ff>] schedule_timeout+0x5f/0xd0
 [<ffffffff80243f80>] process_timeout+0x0/0x10
 [<ffffffff8045b0e8>] io_schedule_timeout+0x28/0x40
 [<ffffffff8027b6b9>] congestion_wait+0x99/0xc0
 [<ffffffff80250240>] autoremove_wake_function+0x0/0x30
 [<ffffffff80275714>] wb_kupdate+0xc4/0x120
 [<ffffffff80275b90>] pdflush+0x0/0x1e0
 [<ffffffff80275b90>] pdflush+0x0/0x1e0
 [<ffffffff80275ca1>] pdflush+0x111/0x1e0
 [<ffffffff80275650>] wb_kupdate+0x0/0x120
 [<ffffffff8024fe6b>] kthread+0x4b/0x80
 [<ffffffff8020d1e8>] child_rip+0xa/0x12
 [<ffffffff8024fe20>] kthread+0x0/0x80
 [<ffffffff8020d1de>] child_rip+0x0/0x12

mmg-amd64:/proc# uname -a
Linux mmg-amd64 2.6.24-rc6 #6 SMP PREEMPT Sat Dec 22 16:24:45 CET 2007 x86_64 GNU/Linux
mmg-amd64:/proc# echo w > /proc/sysrq-trigger
mmg-amd64:/proc# ps aux | grep pdflus
root       217  0.0  0.0      0     0 ?        D    Dec22   0:01 [pdflush]
root       218  0.0  0.0      0     0 ?        S    Dec22   0:07 [pdflush]
mmg-amd64:/proc# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1      0 2986700    292 543624    0    0     5     2   97  150  0  0 50 49
 0  1      0 2986652    292 543624    0    0     0     0  104   36  0  0 50 50
 0  1      0 2986652    292 543624    0    0     0     0  103   34  0  0 50 50
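For anyone who wants to check another machine without eyeballing ps: the "D" in the STAT column above is the task state, which the kernel also exposes as the field right after the parenthesised command name in /proc/<pid>/stat. Below is a minimal sketch in C that pulls that character out of one such line. The helper names stat_line_state() and stat_line_is_blocked() are made up for illustration and do not come from this thread; the only assumption is the "pid (comm) state ..." layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from this thread): return the state character
 * from one line of /proc/<pid>/stat, or '\0' if the line is malformed.
 * Assumes the layout "pid (comm) state ...". The comm field may itself
 * contain spaces or parentheses, so we search for the LAST ')'.
 */
char stat_line_state(const char *line)
{
	const char *close;

	assert(line != NULL);
	close = strrchr(line, ')');
	if (close == NULL || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}

/* Non-zero if the task is in uninterruptible sleep ("D" in ps). */
int stat_line_is_blocked(const char *line)
{
	return stat_line_state(line) == 'D';
}
```

Feeding it the contents of /proc/217/stat on the jfs box would be expected to report a blocked task for as long as pdflush sits in congestion_wait(), but that is an inference from the ps output, not something tested here.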
[lkml seems to have blocked my original report - oh well..]

Yes, I use JFS on both machines; I'm afraid I can't switch to another fs due to time constraints.

My trace from the original email:

pdflush       D 00140f4c     0   140      2
       b02af660 00000046 eba03f10 00140f4c 0000444d 000f22fd b02aa9a9 eb5e092c
       eba03f74 b03cbfe8 b03cbfe8 00140f4c b0124700 eaab7500 b03cbd80 b02aa9a4
       6c666470 00687375 00000000 00000000 0000008c 00000064 b034da20 b02aa20b
Call Trace:
 [<b02aa9a9>] schedule_timeout+0x49/0xc0
 [<b0124700>] process_timeout+0x0/0x10
 [<b02aa9a4>] schedule_timeout+0x44/0xc0
 [<b02aa20b>] __sched_text_start+0xb/0x20
 [<b015a273>] congestion_wait+0x73/0x90
 [<b012f3f0>] autoremove_wake_function+0x0/0x50
 [<b0155625>] wb_kupdate+0x95/0xe0
 [<b0155a80>] pdflush+0x0/0x210
 [<b0155b8a>] pdflush+0x10a/0x210
 [<b0155590>] wb_kupdate+0x0/0xe0
 [<b012f0d2>] kthread+0x42/0x70
 [<b012f090>] kthread+0x0/0x70
 [<b0104b6f>] kernel_thread_helper+0x7/0x18
Reply-To: wfg@mail.ustc.edu.cn

On Sat, Dec 22, 2007 at 11:40:31PM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9291
>
> ------- Comment #12 from tvrtko@ursulin.net 2007-12-22 23:40 -------
>
> [lkml seems to have blocked my original report - oh well..]
>
> Yes, I use JFS on both machines, afraid I can't switch to another fs due to
> time constraints.

Thank you. I tried JFS compiled with:

# Linux kernel version: 2.6.24-rc6
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set

but so far I have not been able to reproduce the bug. I'll return to it tomorrow.

Fengguang
My config and fs configuration:

tvrtko@sol:~$ grep JFS /storage/kernel/linux-2.6.24-rc6/.config
CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
CONFIG_JFS_STATISTICS=y

tvrtko@sol:~$ cat /proc/mounts
rootfs / rootfs rw 0 0
none /sys sysfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw 0 0
/dev/mapper/vg0-root / jfs rw,noatime 0 0
/dev/mapper/vg0-root /dev/.static/dev jfs rw 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /dev/shm tmpfs rw 0 0
devpts /dev/pts devpts rw 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
/dev/sda1 /boot ext2 rw,noatime 0 0
/dev/mapper/vg0-home /home jfs rw,noatime 0 0
/dev/mapper/vg0-storage /storage jfs rw,nosuid,nodev,noatime 0 0
/dev/mapper/vg0-var /var jfs rw,noatime 0 0
/dev/mapper/vg0-cache /var/cache jfs rw,noatime 0 0
/dev/sda2 /tmp jfs rw,nosuid,nodev,noatime 0 0
none /proc/bus/usb usbfs rw 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec 0 0
Created attachment 14153 [details] Complete kernel configuration
Created attachment 14154 [details] config
Tvrtko, are both of your PCs dual-core or dual-processor?

Below are my fstab and the JFS part of my config:

proc      /proc          proc         defaults                    0 0
/dev/sda2 /              jfs          defaults,errors=remount-ro  0 1
/dev/sda6 /home          jfs          defaults                    0 2
/dev/sda5 /tmp           jfs          defaults                    0 2
/dev/sda3 /var           jfs          defaults                    0 2
/dev/sda1 none           swap         sw                          0 0
/dev/hda  /media/cdrom0  udf,iso9660  user,noauto                 0 0
/dev/fd0  /media/floppy0 auto         rw,user,noauto              0 0

CONFIG_JFS_FS=m
CONFIG_JFS_POSIX_ACL=y
CONFIG_JFS_SECURITY=y
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
Created attachment 14155 [details] dmesg
Created attachment 14156 [details] cpuinfo
Created attachment 14157 [details] lspci
Reply-To: wfg@mail.ustc.edu.cn

On Sun, Dec 23, 2007 at 10:35:45AM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9291
>
> ------- Comment #17 from Markus.Rehbach@gmx.de 2007-12-23 10:35 -------
> Tvrtko, are your PCs both 'dual core/processor'?
>
> Below fstab and JFS part of config from me:
>
> proc /proc proc defaults 0 0
> /dev/sda2 / jfs defaults,errors=remount-ro 0 1
> /dev/sda6 /home jfs defaults 0 2
> /dev/sda5 /tmp jfs defaults 0 2
> /dev/sda3 /var jfs defaults 0 2
> /dev/sda1 none swap sw 0 0
> /dev/hda /media/cdrom0 udf,iso9660 user,noauto 0 0
> /dev/fd0 /media/floppy0 auto rw,user,noauto 0 0
>
> CONFIG_JFS_FS=m
> CONFIG_JFS_POSIX_ACL=y
> CONFIG_JFS_SECURITY=y
> # CONFIG_JFS_DEBUG is not set
> # CONFIG_JFS_STATISTICS is not set

Hmm, I just tried JFS on LVM - still OK.
It seems not related to LVM.

Fengguang
Reply-To: wfg@mail.ustc.edu.cn

On Mon, Dec 24, 2007 at 10:25:53AM +0800, Fengguang Wu wrote:
> On Sun, Dec 23, 2007 at 10:35:45AM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=9291
>
> Hmm, I just tried JFS on LVM - still OK.
> It seems not related to LVM.

I can now reproduce the bug on JFS with the following command:

debootstrap --arch i386 etch /mnt/jfs http://debian.ustc.edu.cn/debian

It's a rather compound procedure, but I just cannot trigger the bug through
simple operations like cp/concatenate/truncate ...

The symptoms:

- one pdflush stuck in D state:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       321  0.0  0.0      0     0 ?        D    13:45   0:01 [pdflush]
root     15397  0.0  0.0      0     0 ?        S    14:21   0:00 [pdflush]

- `sync` temporarily breaks pdflush out of the loop. But 5s later
  wb_kupdate() wakes up and pdflush goes D again.

- the loop in wb_kupdate() goes like this:

[ 4188.005005] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.005028] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.105452] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.105473] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.205814] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.205835] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.306080] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.306108] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.406563] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.406585] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.506988] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.507009] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.607438] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.607459] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.707892] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.707926] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.808286] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.808309] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4188.908625] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4188.908646] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4189.009169] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4189.009182] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0
[ 4189.109454] requeue_io 301: inode 0 size 320753664 at 08:18(sdb8)
[ 4189.109476] mm/page-writeback.c 668 wb_kupdate: pdflush(321) 39494 global 16 0 0 wc _M tw 1024 sk 0

Here is the printk for the wb_kupdate lines:

        printk(KERN_DEBUG "%s %d %s: %s(%d) %ld "
                        "global %lu %lu %lu "
                        "wc %c%c tw %ld sk %ld\n",
                        file, line, func,
                        current->comm, current->pid, n,
                        global_page_state(NR_FILE_DIRTY),
                        global_page_state(NR_WRITEBACK),
                        global_page_state(NR_UNSTABLE_NFS),
                        wbc->encountered_congestion ? 'C':'_',
                        wbc->more_io ? 'M':'_',
                        wbc->nr_to_write,
                        wbc->pages_skipped);

The requeue_io lines show that the special inode 0 in JFS is tagged dirty in
the radix tree but does not have any dirty pages to sync.

Any ideas on possible causes?

Thank you,
Fengguang
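To compare a longer capture of these wb_kupdate debug lines mechanically, the printk format quoted above can be turned around into a parser. This is only a sketch: struct wb_sample and parse_wb_kupdate_line() are names invented here, and the scan format is derived purely from the printk string in the comment above, so it would break if the debug patch changed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one wb_kupdate debug line (names are ours). */
struct wb_sample {
	int pid;
	long n;                 /* the "%ld" printed right after comm(pid) */
	unsigned long dirty;    /* NR_FILE_DIRTY */
	unsigned long writeback;/* NR_WRITEBACK */
	unsigned long unstable; /* NR_UNSTABLE_NFS */
	int congested;          /* 'C' seen: wbc->encountered_congestion */
	int more_io;            /* 'M' seen: wbc->more_io */
	long nr_to_write;       /* "tw" */
	long pages_skipped;     /* "sk" */
};

/*
 * Parse one log line, with or without the leading "[ timestamp]" prefix.
 * Returns 0 on success, -1 if the line does not match the printk format.
 */
int parse_wb_kupdate_line(const char *s, struct wb_sample *out)
{
	char file[128], func[64], comm[16];
	char c, m;
	int line, got;

	assert(s != NULL && out != NULL);

	while (*s == ' ')
		s++;
	if (*s == '[') {                /* skip the dmesg timestamp */
		const char *end = strchr(s, ']');
		if (end == NULL)
			return -1;
		s = end + 1;
	}

	got = sscanf(s,
		     "%127s %d %63[^:]: %15[^(](%d) %ld global %lu %lu %lu"
		     " wc %c%c tw %ld sk %ld",
		     file, &line, func, comm, &out->pid, &out->n,
		     &out->dirty, &out->writeback, &out->unstable,
		     &c, &m, &out->nr_to_write, &out->pages_skipped);
	if (got != 13)
		return -1;

	out->congested = (c == 'C');
	out->more_io = (m == 'M');
	return 0;
}
```

Run over the capture above, every sample would come back with more_io set, no congestion, and nr_to_write still at 1024, which is another way of seeing that each pass asks for more I/O without getting any pages written.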
Reply-To: wfg@mail.ustc.edu.cn

On Mon, Dec 24, 2007 at 03:13:56PM +0800, Fengguang Wu wrote:
> On Mon, Dec 24, 2007 at 10:25:53AM +0800, Fengguang Wu wrote:
> > On Sun, Dec 23, 2007 at 10:35:45AM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9291
> >
> > Hmm, I just tried JFS on LVM - still OK.
> > It seems not related to LVM.
>
> I can now reproduce the bug on JFS with the following command:
>
> debootstrap --arch i386 etch /mnt/jfs http://debian.ustc.edu.cn/debian
>
> It's a rather compound procedure, but I just cannot trigger the bug through
> simple operations like cp/concatenate/truncate ...
>
> The symptoms:
>
> - one pdflush stuck in D state:
>
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root       321  0.0  0.0      0     0 ?        D    13:45   0:01 [pdflush]
> root     15397  0.0  0.0      0     0 ?        S    14:21   0:00 [pdflush]

It was confirmed that the source of this bug lies in metapage_writepage():

        if (!mp || !test_bit(META_dirty, &mp->flag))
                continue;

That logic skips the following line:

        set_page_writeback(page);

which should be called to clear the PAGECACHE_TAG_DIRTY tag.

The META_dirty bit could be cleared in several places, e.g. __invalidate_metapages().

Any ideas about a solution?
Reply-To: wfg@mail.ustc.edu.cn

This patch fixes the bug on my system.

It simply clears PAGECACHE_TAG_DIRTY if the page should not be written.
Now pdflush won't keep trying to write these pages.

Also, the redirty call is skipped if the bio is submitted.
The (perhaps imperfect) error handling logic is untouched.

diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index f5cd8d3..0c3ffc4 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -353,7 +353,8 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
 {
 	struct bio *bio = NULL;
 	unsigned int block_offset;	/* block offset of mp within page */
-	struct inode *inode = page->mapping->host;
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
 	unsigned int blocks_per_mp = JFS_SBI(inode->i_sb)->nbperpage;
 	unsigned int len;
 	unsigned int xlen;
@@ -449,9 +450,15 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
 			goto dump_bio;
 
 		submit_bio(WRITE, bio);
-	}
-	if (redirty)
+	} else if (redirty) {
 		redirty_page_for_writepage(wbc, page);
+	} else {
+		write_lock_irq(&mapping->tree_lock);
+		radix_tree_tag_clear(&mapping->page_tree,
+					page_index(page),
+					PAGECACHE_TAG_DIRTY);
+		write_unlock_irq(&mapping->tree_lock);
+	}
 
 	unlock_page(page);
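The control-flow change in the second hunk can be summarised as a small decision table. The sketch below is a userspace model in C, not kernel code: the toy_* names and the action flags are invented here, and it assumes the closing brace removed by the hunk belongs to an `if (bio)` block, which the hunk itself does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical action flags for the tail of metapage_writepage(). */
enum toy_action {
	TOY_SUBMIT    = 1, /* submit_bio(WRITE, bio) */
	TOY_REDIRTY   = 2, /* redirty_page_for_writepage(wbc, page) */
	TOY_CLEAR_TAG = 4  /* radix_tree_tag_clear(..., PAGECACHE_TAG_DIRTY) */
};

/* Before the patch: two independent checks, and no way to clear the tag. */
unsigned int toy_actions_old(bool have_bio, bool redirty)
{
	unsigned int a = 0;

	if (have_bio)
		a |= TOY_SUBMIT;
	if (redirty)
		a |= TOY_REDIRTY;
	return a;
}

/* After the patch: one if / else-if / else chain, exactly one action. */
unsigned int toy_actions_patched(bool have_bio, bool redirty)
{
	unsigned int a;

	if (have_bio)
		a = TOY_SUBMIT;
	else if (redirty)
		a = TOY_REDIRTY;
	else
		a = TOY_CLEAR_TAG;

	/* Invariant of the patched flow: a single action is chosen. */
	assert(a == TOY_SUBMIT || a == TOY_REDIRTY || a == TOY_CLEAR_TAG);
	return a;
}
```

The two rows that differ are the interesting ones: with no bio and no redirty request the old flow did nothing at all (leaving the stale tag behind), and with both a bio and a redirty request the old flow submitted and redirtied, where the patched flow only submits.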
Works for me, too. Nice!
Created attachment 14219 [details] Alternate patch to clear dirty tag This may be a cleaner patch. It passes some simple regression testing.
Yes, seems to work, too. I will not touch a keyboard until 3rd of January, hope Tvrtko can confirm that the patch(es) will work for him. Einen guten Rutsch ins neue Jahr! / Happy New Year!
I only had time to test one patch, so I chose Dave's, and it seems good: it fixes the problem on one of my machines. The other machine is unavailable until 3rd January or a couple of days after, but I don't think it is necessary to test on it, so I am giving my thumbs up for this patch.
Is it necessary to have one of the patches in 2.6.24? AFAIK neither is in -rc7, right?
I'm told this is not a regression from 2.6.23. Can anyone confirm or deny it, please?
Rafael, this regression was introduced early in 2.6.24 (pre -rc1), so it is a regression from .23 as well. I am not seeing it with 2.6.24-rc6, so it is fixed for me. I am only using ext3 and reiserfs. florin
Thanks for the confirmation.
Why is this still marked as open? The bug has been fixed by the following three commits:

Writeback fix:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=5fce25a9df4865bdd5e3dc4853b269dc1677a02a

Reiserfs fix:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c06a018fa5362fa9ed0768bd747c0fab26bc8849

JFS fix:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=29a424f28390752a4ca2349633aaacc6be494db5