Bug 18632

Summary: "INFO: task" dpkg "blocked for more than 120 seconds.
Product: IO/Storage Reporter: Thomas Creutz (thomas.creutz)
Component: Block Layer Assignee: Jens Axboe (axboe)
Status: RESOLVED OBSOLETE    
Severity: normal CC: akpm, alan, bugzilla, bzed, clopez, daaugusto, justincase, kakoimiku, kernel-bugzilla, kernel.org, listposter, Martin, nemesis
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32-24-generic (Ubuntu) Subsystem:
Regression: Yes Bisected commit-id:
Attachments: sysrq+t output
Output of /proc/modules and dmesg with sysrq+t

Description Thomas Creutz 2010-09-15 19:45:19 UTC
For a while now I have had the problem that I cannot install packages with the package manager from Ubuntu. Every attempt results in a blocked dpkg task.

I can't kill the task with kill -9 `pgrep dpkg`

The only solution is to reboot my server. But when I try again to install/upgrade my system, it hangs again.

In dmesg I get this:

[ 1321.499551] INFO: task dpkg:3764 blocked for more than 120 seconds.
[ 1321.499822] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1321.526755] dpkg          D 00000000ffffffff     0  3764   3719 0x00000000
[ 1321.526761]  ffff8802084fbdb8 0000000000000082 0000000000015bc0 0000000000015bc0
[ 1321.526766]  ffff88020860b1a0 ffff8802084fbfd8 0000000000015bc0 ffff88020860ade0
[ 1321.526770]  0000000000015bc0 ffff8802084fbfd8 0000000000015bc0 ffff88020860b1a0
[ 1321.526780] Call Trace:
[ 1321.526791]  [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1321.526799]  [<ffffffff811664be>] bdi_sched_wait+0xe/0x20
[ 1321.526809]  [<ffffffff815429ef>] __wait_on_bit+0x5f/0x90
[ 1321.526816]  [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1321.526824]  [<ffffffff81542a98>] out_of_line_wait_on_bit+0x78/0x90
[ 1321.526834]  [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
[ 1321.526842]  [<ffffffff81166474>] ? bdi_queue_work+0xa4/0xe0
[ 1321.526849]  [<ffffffff8116782f>] bdi_sync_writeback+0x6f/0x80
[ 1321.526857]  [<ffffffff81167860>] sync_inodes_sb+0x20/0x30
[ 1321.526865]  [<ffffffff8116b332>] __sync_filesystem+0x82/0x90
[ 1321.526873]  [<ffffffff8116b419>] sync_filesystems+0xd9/0x130
[ 1321.526881]  [<ffffffff8116b4d1>] sys_sync+0x21/0x40
[ 1321.526890]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[ 1441.509612] INFO: task dpkg:3764 blocked for more than 120 seconds.
[ 1441.523584] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1441.550418] dpkg          D 00000000ffffffff     0  3764   3719 0x00000000
[ 1441.550424]  ffff8802084fbdb8 0000000000000082 0000000000015bc0 0000000000015bc0
[ 1441.550428]  ffff88020860b1a0 ffff8802084fbfd8 0000000000015bc0 ffff88020860ade0
[ 1441.550433]  0000000000015bc0 ffff8802084fbfd8 0000000000015bc0 ffff88020860b1a0
[ 1441.550437] Call Trace:
[ 1441.550447]  [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1441.550455]  [<ffffffff811664be>] bdi_sched_wait+0xe/0x20
[ 1441.550465]  [<ffffffff815429ef>] __wait_on_bit+0x5f/0x90
[ 1441.550472]  [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1441.550480]  [<ffffffff81542a98>] out_of_line_wait_on_bit+0x78/0x90
[ 1441.550490]  [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
[ 1441.550497]  [<ffffffff81166474>] ? bdi_queue_work+0xa4/0xe0
[ 1441.550505]  [<ffffffff8116782f>] bdi_sync_writeback+0x6f/0x80
[ 1441.550513]  [<ffffffff81167860>] sync_inodes_sb+0x20/0x30
[ 1441.550521]  [<ffffffff8116b332>] __sync_filesystem+0x82/0x90
[ 1441.550529]  [<ffffffff8116b419>] sync_filesystems+0xd9/0x130
[ 1441.550537]  [<ffffffff8116b4d1>] sys_sync+0x21/0x40
[ 1441.550546]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

I created a bug report for Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/624877

I hope we can find a solution for this problem.

Regards,
Thomas
Comment 1 Andrew Morton 2010-09-15 19:51:57 UTC
Recategorised to "IO storage/block layer".  Kernel version appears to be 2.6.32.
Comment 2 Thomas Creutz 2010-09-20 18:24:46 UTC
Is there anything I can do to provide you with more information?
Comment 3 Andrew Morton 2010-09-20 18:33:47 UTC
We've tried to fix a lot of things in this area since 2.6.32.  If you can test a more recent kernel that would be good, thanks.
Comment 4 Thomas Creutz 2010-09-20 20:25:13 UTC
OK, I will see if I can copy the mainline image from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.35.4-maverick/ to my server.

but because of the bug I can't install it with dpkg... :-(
Comment 5 Thomas Creutz 2010-09-20 22:39:49 UTC
OK, I booted into the rescue console provided by my hosting provider.

There I could chroot into my server and install the needed packages with dpkg... but without success...

After a reboot and another attempt to upgrade my packages, it got stuck in the dpkg process again...

I also filed a new bug for another issue showing up in dmesg: https://bugzilla.kernel.org/show_bug.cgi?id=18912

Why is it always the dpkg process and no other one?
Comment 6 Douglas Augusto 2010-09-27 02:29:23 UTC
I'm experiencing a similar bug--possibly the same bug--on two different Debian 64-bit machines, both with Linux 2.6.35 (Debian package 2.6.35-1~experimental.3).

The message "task dpkg:PID blocked for more than 120 seconds" normally occurs when there are some heavy disk activity being done by other process (even from normal users), for instance, when I'm making backup. I've noticed that dpkg eventually works, but only after a long time and not necessarily immediately after the high disk activity has finished.

While dpkg is blocked, besides not being able to kill it, I cannot sync the disk or even soft-reboot the machine; those processes also become blocked. Fortunately, most of the other disk activity seems to work normally.

Machine A is an Intel Xeon W3680 with 12GB RAM and 2x2TB WD20EARS disks in RAID10 (f2). This is what the call trace looks like:

   Sep 24 09:41:23 gnu kernel: [157565.846882] INFO: task dpkg:7258 blocked for more than 120 seconds.
   Sep 24 09:41:23 gnu kernel: [157565.846884] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   Sep 24 09:41:23 gnu kernel: [157565.846886] dpkg          D 000000010257d0a1     0  7258  22003 0x00000000
   Sep 24 09:41:23 gnu kernel: [157565.846889]  ffff8803375be630 0000000000000082 ffff880336791e50 0000000000000001
   Sep 24 09:41:23 gnu kernel: [157565.846893]  ffff88033e0e4420 00000000000154c0 00000000000154c0 00000000000154c0
   Sep 24 09:41:23 gnu kernel: [157565.846895]  ffff880336791fd8 00000000000154c0 ffff8803375be630 ffff880336791fd8
   Sep 24 09:41:23 gnu kernel: [157565.846897] Call Trace:
   Sep 24 09:41:23 gnu kernel: [157565.846904]  [<ffffffff813054df>] ? schedule_timeout+0x2d/0xd7
   Sep 24 09:41:23 gnu kernel: [157565.846907]  [<ffffffff81038c8f>] ? enqueue_task_fair+0x21/0x62
   Sep 24 09:41:23 gnu kernel: [157565.846911]  [<ffffffff8100e7d9>] ? sched_clock+0x5/0x8
   Sep 24 09:41:23 gnu kernel: [157565.846914]  [<ffffffff81305354>] ? wait_for_common+0xd1/0x14e
   Sep 24 09:41:23 gnu kernel: [157565.846918]  [<ffffffff8103f630>] ? default_wake_function+0x0/0xf
   Sep 24 09:41:23 gnu kernel: [157565.846921]  [<ffffffff81106882>] ? sync_one_sb+0x0/0x20
   Sep 24 09:41:23 gnu kernel: [157565.846925]  [<ffffffff81102e85>] ? sync_inodes_sb+0x89/0x140
   Sep 24 09:41:23 gnu kernel: [157565.846927]  [<ffffffff81106882>] ? sync_one_sb+0x0/0x20
   Sep 24 09:41:23 gnu kernel: [157565.846929]  [<ffffffff8110684c>] ? __sync_filesystem+0x44/0x7a
   Sep 24 09:41:23 gnu kernel: [157565.846932]  [<ffffffff810eab16>] ? iterate_supers+0x61/0xa0
   Sep 24 09:41:23 gnu kernel: [157565.846934]  [<ffffffff811068e1>] ? sys_sync+0x3f/0x54
   Sep 24 09:41:23 gnu kernel: [157565.846937]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b

After having tried to sync while dpkg is blocked:

   Sep 24 09:49:23 gnu kernel: [158045.141824] INFO: task sync:18651 blocked for more than 120 seconds.
   Sep 24 09:49:23 gnu kernel: [158045.141827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   Sep 24 09:49:23 gnu kernel: [158045.141829] sync          D ffff88033b4f86d0     0 18651   1329 0x00000000
   Sep 24 09:49:23 gnu kernel: [158045.141833]  ffff88033b4f86d0 0000000000000086 ffff8802a4a4fe50 0000000000000001
   Sep 24 09:49:23 gnu kernel: [158045.141836]  ffff88033e22d890 00000000000154c0 00000000000154c0 00000000000154c0
   Sep 24 09:49:23 gnu kernel: [158045.141839]  ffff8802a4a4ffd8 00000000000154c0 ffff88033b4f86d0 ffff8802a4a4ffd8
   Sep 24 09:49:23 gnu kernel: [158045.141842] Call Trace:
   Sep 24 09:49:23 gnu kernel: [158045.141848]  [<ffffffff813054df>] ? schedule_timeout+0x2d/0xd7
   Sep 24 09:49:23 gnu kernel: [158045.141851]  [<ffffffff81038c8f>] ? enqueue_task_fair+0x21/0x62
   Sep 24 09:49:23 gnu kernel: [158045.141855]  [<ffffffff8100e7d9>] ? sched_clock+0x5/0x8
   Sep 24 09:49:23 gnu kernel: [158045.141858]  [<ffffffff81305354>] ? wait_for_common+0xd1/0x14e
   Sep 24 09:49:23 gnu kernel: [158045.141862]  [<ffffffff8103f630>] ? default_wake_function+0x0/0xf
   Sep 24 09:49:23 gnu kernel: [158045.141865]  [<ffffffff81106882>] ? sync_one_sb+0x0/0x20
   Sep 24 09:49:23 gnu kernel: [158045.141869]  [<ffffffff81102e85>] ? sync_inodes_sb+0x89/0x140
   Sep 24 09:49:23 gnu kernel: [158045.141871]  [<ffffffff81106882>] ? sync_one_sb+0x0/0x20
   Sep 24 09:49:23 gnu kernel: [158045.141873]  [<ffffffff8110684c>] ? __sync_filesystem+0x44/0x7a
   Sep 24 09:49:23 gnu kernel: [158045.141877]  [<ffffffff810eab16>] ? iterate_supers+0x61/0xa0
   Sep 24 09:49:23 gnu kernel: [158045.141879]  [<ffffffff811068e1>] ? sys_sync+0x3f/0x54
   Sep 24 09:49:23 gnu kernel: [158045.141883]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b

Machine B is an AMD Phenom II 1090T with 8GB RAM and a 1.5TB WD15EADS disk (no RAID). Call trace:

   Sep 25 20:49:41 phenom kernel: [26640.848742] INFO: task dpkg:23736 blocked for more than 120 seconds.
   Sep 25 20:49:41 phenom kernel: [26640.848744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
   Sep 25 20:49:41 phenom kernel: [26640.848750] dpkg          D 000000010060537e     0 23736  23721 0x00000000
   Sep 25 20:49:41 phenom kernel: [26640.848753]  ffff88020c2a1470 0000000000000086 ffff880001d17e50 0000000000000000
   Sep 25 20:49:41 phenom kernel: [26640.848756]  ffff88020ecba8e0 00000000000154c0 00000000000154c0 00000000000154c0
   Sep 25 20:49:41 phenom kernel: [26640.848758]  ffff880001d17fd8 00000000000154c0 ffff88020c2a1470 ffff880001d17fd8
   Sep 25 20:49:41 phenom kernel: [26640.848760] Call Trace:
   Sep 25 20:49:41 phenom kernel: [26640.848766]  [<ffffffff813054df>] ? schedule_timeout+0x2d/0xd7
   Sep 25 20:49:41 phenom kernel: [26640.848769]  [<ffffffff81038c8f>] ? enqueue_task_fair+0x21/0x62
   Sep 25 20:49:41 phenom kernel: [26640.848771]  [<ffffffff81305354>] ? wait_for_common+0xd1/0x14e
   Sep 25 20:49:41 phenom kernel: [26640.848774]  [<ffffffff8103f630>] ? default_wake_function+0x0/0xf
   Sep 25 20:49:41 phenom kernel: [26640.848776]  [<ffffffff81106882>] ? sync_one_sb+0x0/0x20
   Sep 25 20:49:41 phenom kernel: [26640.848778]  [<ffffffff81102e85>] ? sync_inodes_sb+0x89/0x140
   Sep 25 20:49:41 phenom kernel: [26640.848780]  [<ffffffff81106882>] ? sync_one_sb+0x0/0x20
   Sep 25 20:49:41 phenom kernel: [26640.848781]  [<ffffffff8110684c>] ? __sync_filesystem+0x44/0x7a
   Sep 25 20:49:41 phenom kernel: [26640.848784]  [<ffffffff810eab16>] ? iterate_supers+0x61/0xa0
   Sep 25 20:49:41 phenom kernel: [26640.848785]  [<ffffffff811068e1>] ? sys_sync+0x3f/0x54
   Sep 25 20:49:41 phenom kernel: [26640.848788]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b

-- 
Douglas A. Augusto
Comment 7 Thomas Creutz 2010-09-30 09:37:14 UTC
I could resolve the problem: I had "iommu=soft" and "nomce" defined as boot options.

When I removed these options, it worked fine.
Comment 8 Thomas Creutz 2010-10-02 11:58:54 UTC
My mistake, removing both params did not fix the problem :-(

Today I got it again. Every time with dpkg!?
Comment 9 Thomas Creutz 2010-10-02 12:01:03 UTC
[180480.018397] INFO: task dpkg:20086 blocked for more than 120 seconds.
[180480.018686] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[180480.019102] dpkg          D 0000000000000000     0 20086  20068 0x00000000
[180480.019107]  ffff880066993d38 0000000000000082 0000000000015bc0 0000000000015bc0
[180480.019112]  ffff88023400c890 ffff880066993fd8 0000000000015bc0 ffff88023400c4d0
[180480.019117]  0000000000015bc0 ffff880066993fd8 0000000000015bc0 ffff88023400c890
[180480.019121] Call Trace:
[180480.019130]  [<ffffffff81541add>] schedule_timeout+0x22d/0x300
[180480.019135]  [<ffffffff8105da22>] ? enqueue_entity+0x122/0x1a0
[180480.019140]  [<ffffffff81061e14>] ? check_preempt_wakeup+0x1c4/0x3c0
[180480.019144]  [<ffffffff815416fb>] wait_for_common+0xdb/0x180
[180480.019149]  [<ffffffff8105a124>] ? try_to_wake_up+0x284/0x380
[180480.019153]  [<ffffffff8105a220>] ? default_wake_function+0x0/0x20
[180480.019157]  [<ffffffff8154185d>] wait_for_completion+0x1d/0x20
[180480.019161]  [<ffffffff81165e07>] sync_inodes_sb+0x87/0xb0
[180480.019166]  [<ffffffff8116a692>] __sync_filesystem+0x82/0x90
[180480.019169]  [<ffffffff8116a779>] sync_filesystems+0xd9/0x130
[180480.019173]  [<ffffffff8116a831>] sys_sync+0x21/0x40
[180480.019178]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
Comment 10 Thomas Creutz 2010-10-02 12:02:06 UTC
The current kernel is 2.6.32-25-generic.
Comment 11 Thomas Creutz 2010-10-02 12:04:23 UTC
Is there a filesystem mount option or modprobe param that we could try?
Comment 12 Nathan Grennan 2010-10-26 01:45:26 UTC
I have seen this problem on Ubuntu Lucid 2.6.32-25-generic, Ubuntu Maverick 2.6.35-22-generic, Ubuntu Maverick development 2.6.36-020636-generic, Ubuntu Lucid 2.6.31-10-rt, and Fedora 2.6.34.7-56.fc13.x86_64. I am pretty sure it happens on other kernels too, but not all have CONFIG_DETECT_HUNG_TASK=y. Yes, I was running dpkg on Fedora via a chroot.
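
A quick way to check whether a given kernel would even report these hangs (a minimal sketch; it assumes the distro ships its config under /boot, as Ubuntu and Fedora do):

grep DETECT_HUNG_TASK /boot/config-$(uname -r)
# The 120-second threshold itself is runtime-tunable:
cat /proc/sys/kernel/hung_task_timeout_secs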

I suspect that the problem relates to dpkg's use of fsync.


I have easy steps to reproduce.

1. watch -n1 'dd if=/dev/sda2 of=bigfile bs=1024000 count=30720'

You might have to change the device file for your system. /dev/zero will probably work.

2. watch -n1 'apt-get -y remove iotop ; apt-get -y install iotop'
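
For reference, the two loops above combined into one rough script (a sketch only: /dev/sda2, bigfile and the iotop package are the placeholders from the steps above, and plain while loops stand in for watch so they can run in the background):

#!/bin/sh
# 1. Heavy block I/O in the background.
( while true; do dd if=/dev/sda2 of=bigfile bs=1024000 count=30720; done ) &
IO_PID=$!

# 2. Repeated package remove/install, which makes dpkg sync()/fsync() a lot.
( while true; do apt-get -y remove iotop; apt-get -y install iotop; done ) &
PKG_PID=$!

# Watch the kernel log for "blocked for more than 120 seconds" messages.
watch -n5 'dmesg | tail -n 20'

# Cleanup afterwards: kill $IO_PID $PKG_PID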


Fedora 2.6.34.7-56.fc13.x86_64:
Oct 25 18:17:33 higgs kernel: INFO: task dpkg:3722 blocked for more than 120 seconds.
Oct 25 18:17:33 higgs kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 25 18:17:33 higgs kernel: dpkg          D 0000000000000003     0  3722   3711 0x00000080
Oct 25 18:17:33 higgs kernel: ffff880194947d88 0000000000000082 ffff880194947dc8 ffffffff8103e383
Oct 25 18:17:33 higgs kernel: ffff880194947fd8 ffff88034c9b2ee0 00000000000153c0 ffff880194947fd8
Oct 25 18:17:33 higgs kernel: 00000000000153c0 00000000000153c0 00000000000153c0 00000000000153c0
Oct 25 18:17:33 higgs kernel: Call Trace:
Oct 25 18:17:33 higgs kernel: [<ffffffff8103e383>] ? select_task_rq_fair+0x514/0x89a
Oct 25 18:17:33 higgs kernel: [<ffffffff811291b9>] ? bdi_sched_wait+0x0/0x12
Oct 25 18:17:33 higgs kernel: [<ffffffff811291c7>] bdi_sched_wait+0xe/0x12
Oct 25 18:17:33 higgs kernel: [<ffffffff8144ba60>] __wait_on_bit+0x48/0x7b
Oct 25 18:17:33 higgs kernel: [<ffffffff8144bb01>] out_of_line_wait_on_bit+0x6e/0x79
Oct 25 18:17:33 higgs kernel: [<ffffffff811291b9>] ? bdi_sched_wait+0x0/0x12
Oct 25 18:17:33 higgs kernel: [<ffffffff8106618c>] ? wake_bit_function+0x0/0x33
Oct 25 18:17:33 higgs kernel: [<ffffffff81129e38>] wait_on_bit.clone.1+0x1e/0x20
Oct 25 18:17:33 higgs kernel: [<ffffffff81129e9e>] bdi_sync_writeback+0x64/0x6b
Oct 25 18:17:33 higgs kernel: [<ffffffff81129ec7>] sync_inodes_sb+0x22/0xec
Oct 25 18:17:33 higgs kernel: [<ffffffff8112cf85>] __sync_filesystem+0x4c/0x83
Oct 25 18:17:33 higgs kernel: [<ffffffff8112d064>] sync_filesystems+0xa8/0xf5
Oct 25 18:17:33 higgs kernel: [<ffffffff8112d10b>] sys_sync+0x21/0x33
Oct 25 18:17:33 higgs kernel: [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Comment 13 Thomas Creutz 2010-10-31 18:19:02 UTC
hi!

I have good news on this topic: I installed mainline kernel 2.6.36 (from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.36-maverick/) and now it works like a charm (tested for ~ a week).

I am also now trying the 2.6.36 kernel from https://launchpad.net/~kernel-ppa/+archive/ppa/+packages

So I will see whether the bug is gone there as well.
Comment 14 Nathan Grennan 2010-11-01 19:00:11 UTC
I had no trouble reproducing it on the latest kernel, compiled on Oct 21st, from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.36-maverick/.

Linux version 2.6.36-020636-generic (root@zinc) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #201010210905 SMP Thu Oct 21 09:08:58 UTC 2010


I used my simple dd + apt-get test I mentioned before.

[  361.230737] INFO: task jbd2/sda2-8:311 blocked for more than 120 seconds.
[  361.230740] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.230746] jbd2/sda2-8   D ffff880001ed4cc0     0   311      2 0x00000000
[  361.230749]  ffff8801b51b3d10 0000000000000046 ffff8801b51b3fd8 0000000000014cc0
[  361.230752]  0000000000014cc0 ffff8801b51b3fd8 0000000000014cc0 ffff8801b51b3fd8
[  361.230755]  0000000000014cc0 ffff8801b4ae3158 ffff8801b4ae3160 ffff8801b4ae2dc0
[  361.230757] Call Trace:
[  361.230764]  [<ffffffff8122bde6>] jbd2_journal_commit_transaction+0x1e6/0x11e0
[  361.230769]  [<ffffffff8100989b>] ? __switch_to+0xbb/0x2e0
[  361.230773]  [<ffffffff8105327e>] ? put_prev_entity+0x2e/0x70
[  361.230776]  [<ffffffff81588514>] ? _raw_spin_lock_irqsave+0x34/0x50
[  361.230780]  [<ffffffff81070dfb>] ? lock_timer_base+0x3b/0x70
[  361.230784]  [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40
[  361.230787]  [<ffffffff81230f45>] kjournald2+0xc5/0x240
[  361.230789]  [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40
[  361.230791]  [<ffffffff81230e80>] ? kjournald2+0x0/0x240
[  361.230793]  [<ffffffff81082b27>] kthread+0x97/0xa0
[  361.230796]  [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10
[  361.230798]  [<ffffffff81082a90>] ? kthread+0x0/0xa0
[  361.230800]  [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10
[  361.230803] INFO: task jbd2/sda3-8:816 blocked for more than 120 seconds.
[  361.230804] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.230805] jbd2/sda3-8   D ffff880001e54cc0     0   816      2 0x00000000
[  361.230808]  ffff8801b7d09b00 0000000000000046 ffff8801b7d09fd8 0000000000014cc0
[  361.230810]  0000000000014cc0 ffff8801b7d09fd8 0000000000014cc0 ffff8801b7d09fd8
[  361.230813]  0000000000014cc0 ffff8801b785c838 ffff8801b785c840 ffff8801b785c4a0
[  361.230815] Call Trace:
[  361.230819]  [<ffffffff81586543>] io_schedule+0x73/0xc0
[  361.230823]  [<ffffffff81104005>] sync_page+0x45/0x60
[  361.230824]  [<ffffffff81586e1e>] __wait_on_bit+0x5e/0x90
[  361.230827]  [<ffffffff81103fc0>] ? sync_page+0x0/0x60
[  361.230829]  [<ffffffff811041b5>] wait_on_page_bit+0x75/0x80
[  361.230831]  [<ffffffff81083240>] ? wake_bit_function+0x0/0x40
[  361.230834]  [<ffffffff8110f057>] ? pagevec_lookup_tag+0x27/0x40
[  361.230837]  [<ffffffff81104733>] filemap_fdatawait_range+0xf3/0x180
[  361.230839]  [<ffffffff811047eb>] filemap_fdatawait+0x2b/0x30
[  361.230842]  [<ffffffff8122b81e>] journal_finish_inode_data_buffers+0x5e/0x170
[  361.230845]  [<ffffffff8122c1c5>] jbd2_journal_commit_transaction+0x5c5/0x11e0
[  361.230847]  [<ffffffff81588514>] ? _raw_spin_lock_irqsave+0x34/0x50
[  361.230850]  [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40
[  361.230852]  [<ffffffff81230f45>] kjournald2+0xc5/0x240
[  361.230854]  [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40
[  361.230856]  [<ffffffff81230e80>] ? kjournald2+0x0/0x240
[  361.230858]  [<ffffffff81082b27>] kthread+0x97/0xa0
[  361.230860]  [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10
[  361.230863]  [<ffffffff81082a90>] ? kthread+0x0/0xa0
[  361.230865]  [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10
[  361.230870] INFO: task flush-8:0:1170 blocked for more than 120 seconds.
[  361.230871] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.230873] flush-8:0     D ffff880001e54cc0     0  1170      2 0x00000000
[  361.230875]  ffff8801af695530 0000000000000046 ffff8801af695fd8 0000000000014cc0
[  361.230878]  0000000000014cc0 ffff8801af695fd8 0000000000014cc0 ffff8801af695fd8
[  361.230880]  0000000000014cc0 ffff8801af72b158 ffff8801af72b160 ffff8801af72adc0
[  361.230883] Call Trace:
[  361.230885]  [<ffffffff81586543>] io_schedule+0x73/0xc0
[  361.230889]  [<ffffffff812a7f70>] get_request_wait+0xd0/0x170
[  361.230891]  [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40
[  361.230894]  [<ffffffff812a2029>] ? elv_merge+0x59/0x110
[  361.230896]  [<ffffffff812a809c>] __make_request+0x8c/0x4b0
[  361.230899]  [<ffffffff812a674e>] generic_make_request+0x20e/0x430
[  361.230902]  [<ffffffff811064ad>] ? mempool_alloc+0x6d/0x140
[  361.230904]  [<ffffffff812a69ed>] submit_bio+0x7d/0x100
[  361.230908]  [<ffffffff811801a3>] submit_bh+0xf3/0x140
[  361.230910]  [<ffffffff81181f83>] __block_write_full_page+0x1d3/0x350
[  361.230913]  [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170
[  361.230916]  [<ffffffff811f2910>] ? noalloc_get_block_write+0x0/0x30
[  361.230918]  [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170
[  361.230921]  [<ffffffff81182426>] block_write_full_page_endio+0x116/0x120
[  361.230923]  [<ffffffff811f2910>] ? noalloc_get_block_write+0x0/0x30
[  361.230925]  [<ffffffff81182445>] block_write_full_page+0x15/0x20
[  361.230927]  [<ffffffff811f2f9b>] ext4_writepage+0x10b/0x230
[  361.230930]  [<ffffffff811ef1ea>] mpage_da_submit_io+0x14a/0x1b0
[  361.230932]  [<ffffffff811f6218>] mpage_add_bh_to_extent+0xf8/0x100
[  361.230934]  [<ffffffff811f633e>] __mpage_da_writepage+0x11e/0x180
[  361.230936]  [<ffffffff811f6524>] write_cache_pages_da+0x184/0x2a0
[  361.230939]  [<ffffffff811f68cf>] ext4_da_writepages+0x28f/0x580
[  361.230942]  [<ffffffff810830c7>] ? bit_waitqueue+0x17/0xd0
[  361.230944]  [<ffffffff8110e051>] do_writepages+0x21/0x40
[  361.230946]  [<ffffffff81179a5f>] writeback_single_inode+0xaf/0x260
[  361.230948]  [<ffffffff81179e44>] writeback_sb_inodes+0xe4/0x160
[  361.230951]  [<ffffffff8117a032>] writeback_inodes_wb+0xd2/0xf0
[  361.230953]  [<ffffffff8117a24f>] wb_writeback+0x1ff/0x310
[  361.230955]  [<ffffffff8117a411>] wb_check_old_data_flush+0xb1/0xc0
[  361.230957]  [<ffffffff8117a4a9>] wb_do_writeback+0x89/0xd0
[  361.230959]  [<ffffffff8117a4f0>] ? bdi_writeback_thread+0x0/0x260
[  361.230961]  [<ffffffff8117a5a3>] bdi_writeback_thread+0xb3/0x260
[  361.230963]  [<ffffffff8117a4f0>] ? bdi_writeback_thread+0x0/0x260
[  361.230965]  [<ffffffff81082b27>] kthread+0x97/0xa0
[  361.230968]  [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10
[  361.230970]  [<ffffffff81082a90>] ? kthread+0x0/0xa0
[  361.230972]  [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10
[  361.230976] INFO: task dd:1756 blocked for more than 120 seconds.
[  361.230977] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.230978] dd            D ffff880001ed4cc0     0  1756   1755 0x00000000
[  361.230981]  ffff8801192ef848 0000000000000082 ffff8801192effd8 0000000000014cc0
[  361.230983]  0000000000014cc0 ffff8801192effd8 0000000000014cc0 ffff8801192effd8
[  361.230986]  0000000000014cc0 ffff8801aa580398 ffff8801aa5803a0 ffff8801aa580000
[  361.230988] Call Trace:
[  361.230991]  [<ffffffff81586543>] io_schedule+0x73/0xc0
[  361.230993]  [<ffffffff812a7f70>] get_request_wait+0xd0/0x170
[  361.230996]  [<ffffffff812b7056>] ? cfq_find_rq_fmerge+0x66/0x70
[  361.230999]  [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40
[  361.231001]  [<ffffffff812a20cb>] ? elv_merge+0xfb/0x110
[  361.231003]  [<ffffffff812a809c>] __make_request+0x8c/0x4b0
[  361.231006]  [<ffffffff8105b564>] ? find_busiest_group+0x544/0xa70
[  361.231008]  [<ffffffff812a674e>] generic_make_request+0x20e/0x430
[  361.231010]  [<ffffffff811064ad>] ? mempool_alloc+0x6d/0x140
[  361.231013]  [<ffffffff811088ea>] ? zone_watermark_ok+0x2a/0xf0
[  361.231016]  [<ffffffff81147946>] ? kmem_cache_alloc+0x96/0x120
[  361.231018]  [<ffffffff812a69ed>] submit_bio+0x7d/0x100
[  361.231020]  [<ffffffff811801a3>] submit_bh+0xf3/0x140
[  361.231022]  [<ffffffff8118422f>] block_read_full_page+0x29f/0x300
[  361.231025]  [<ffffffff811878f0>] ? blkdev_get_block+0x0/0x70
[  361.231028]  [<ffffffff8111ce15>] ? __inc_zone_page_state+0x35/0x40
[  361.231030]  [<ffffffff81104349>] ? add_to_page_cache_locked+0xe9/0x150
[  361.231033]  [<ffffffff81187498>] blkdev_readpage+0x18/0x20
[  361.231035]  [<ffffffff8110e56c>] read_pages+0xcc/0xe0
[  361.231037]  [<ffffffff8110e6c8>] __do_page_cache_readahead+0x148/0x160
[  361.231039]  [<ffffffff8110e701>] ra_submit+0x21/0x30
[  361.231041]  [<ffffffff8110ead8>] ondemand_readahead+0xa8/0x1b0
[  361.231044]  [<ffffffff8110ed76>] page_cache_sync_readahead+0x36/0x50
[  361.231046]  [<ffffffff81105436>] do_generic_file_read+0x266/0x430
[  361.231049]  [<ffffffff81104f80>] ? file_read_actor+0x0/0x190
[  361.231051]  [<ffffffff811056eb>] generic_file_aio_read+0xeb/0x220
[  361.231054]  [<ffffffff81156885>] do_sync_read+0xd5/0x120
[  361.231056]  [<ffffffff812c1e5f>] ? rb_erase+0xbf/0x140
[  361.231060]  [<ffffffff812635ba>] ? security_file_permission+0x6a/0x80
[  361.231062]  [<ffffffff81156c09>] vfs_read+0xc9/0x180
[  361.231064]  [<ffffffff81156e45>] sys_read+0x55/0x90
[  361.231066]  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[  361.231068] INFO: task apt-get:1811 blocked for more than 120 seconds.
[  361.231069] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.231070] apt-get       D ffff880001ed4cc0     0  1811   1781 0x00000000
[  361.231073]  ffff88012a7557b8 0000000000000082 ffff88012a755fd8 0000000000014cc0
[  361.231076]  0000000000014cc0 ffff88012a755fd8 0000000000014cc0 ffff88012a755fd8
[  361.231078]  0000000000014cc0 ffff8801b4afb158 ffff8801b4afb160 ffff8801b4afadc0
[  361.231080] Call Trace:
[  361.231083]  [<ffffffff81586543>] io_schedule+0x73/0xc0
[  361.231085]  [<ffffffff812a7f70>] get_request_wait+0xd0/0x170
[  361.231087]  [<ffffffff812b7056>] ? cfq_find_rq_fmerge+0x66/0x70
[  361.231090]  [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40
[  361.231092]  [<ffffffff812a20cb>] ? elv_merge+0xfb/0x110
[  361.231094]  [<ffffffff812a809c>] __make_request+0x8c/0x4b0
[  361.231097]  [<ffffffff812a674e>] generic_make_request+0x20e/0x430
[  361.231099]  [<ffffffff811064ad>] ? mempool_alloc+0x6d/0x140
[  361.231102]  [<ffffffff811513a4>] ? __mem_cgroup_try_charge+0x1f4/0x2b0
[  361.231104]  [<ffffffff812a69ed>] submit_bio+0x7d/0x100
[  361.231107]  [<ffffffff811801a3>] submit_bh+0xf3/0x140
[  361.231109]  [<ffffffff81181f83>] __block_write_full_page+0x1d3/0x350
[  361.231111]  [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170
[  361.231113]  [<ffffffff811f2910>] ? noalloc_get_block_write+0x0/0x30
[  361.231116]  [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170
[  361.231118]  [<ffffffff81182426>] block_write_full_page_endio+0x116/0x120
[  361.231120]  [<ffffffff811f2910>] ? noalloc_get_block_write+0x0/0x30
[  361.231123]  [<ffffffff81182445>] block_write_full_page+0x15/0x20
[  361.231125]  [<ffffffff811f2f9b>] ext4_writepage+0x10b/0x230
[  361.231127]  [<ffffffff811ef1ea>] mpage_da_submit_io+0x14a/0x1b0
[  361.231129]  [<ffffffff811f638e>] __mpage_da_writepage+0x16e/0x180
[  361.231131]  [<ffffffff811f6524>] write_cache_pages_da+0x184/0x2a0
[  361.231134]  [<ffffffff811f68cf>] ext4_da_writepages+0x28f/0x580
[  361.231136]  [<ffffffff81122905>] ? unmap_page_range+0x265/0x2c0
[  361.231139]  [<ffffffff8110e051>] do_writepages+0x21/0x40
[  361.231141]  [<ffffffff8110484e>] __filemap_fdatawrite_range+0x5e/0x70
[  361.231144]  [<ffffffff811048ba>] filemap_write_and_wait_range+0x5a/0x80
[  361.231146]  [<ffffffff8117e583>] vfs_fsync_range+0x63/0xa0
[  361.231148]  [<ffffffff8117e62c>] vfs_fsync+0x1c/0x20
[  361.231151]  [<ffffffff8112c2b7>] sys_msync+0x147/0x1c0
[  361.231153]  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Comment 15 listposter 2011-03-15 21:33:53 UTC
We have this problem all over our busy Ubuntu 10.04 hosts :(
Comment 16 Yill Din 2011-03-23 13:49:02 UTC
I'm hitting this too, but only when I'm transferring files via USB. The problem is the sync() function; dpkg hangs when doing a sync(). If I call the sync command in another console, that command hangs too. Using Ubuntu Maverick (10.10): Linux 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:39:03 UTC 2011 x86_64 GNU/Linux
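
A minimal sketch of that scenario (the file and mount point below are placeholders; any large copy to a slow USB disk should do):

# Start a large copy to a USB-mounted disk, then call sync.
cp /path/to/large-file /media/usbdisk/ &
sync    # with the bug, this blocks until the USB writeback finally drains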
Comment 17 Justin Wheeler 2011-05-06 20:49:12 UTC
A Google search shows a lot of people hitting this bug (myself included).

The regression seems to go all the way back to somewhere around 2.6.18.  Presumably it happens mostly on server-grade hardware, and most people on servers are running the older versions of Linux that come with their stable distributions, which is why it's being noticed more now.

I can trigger this bug by simply doing:

fio -readonly -name iops -rw=randread -bs=512 -runtime=300 -iodepth 24 -filename /dev/cciss/c0d0 -ioengine libaio -direct=1

(But dpkg installs are known to trigger it too.)

This is because almost all of the Hewlett-Packard server management programs will have blocked for > 120s while fio is running, and often even the task 'jbd/cciss!c0d0' is one of the tasks that blocks too.

(I can make dpkg do it too.)

I'm running a bone-stock 2.6.38.5 kernel I've compiled myself, to ensure it's not caused by any kernel patches provided by the distro (Debian squeeze).

If there's anything I can do to assist in testing, let me know.  This is a server I just purchased to put into service shortly, so there's currently no data on it that I am concerned about losing.
Comment 18 Justin Wheeler 2011-05-06 20:53:09 UTC
D'oh.  I accidentally hit submit too soon, and there were some updates I had made to my post (mostly to fix the horrible 'brain dump' text).

Before 'This is because' there is supposed to be 'my screen fills with hung task messages'.
Comment 19 Bernd Zeimetz 2011-05-20 16:40:47 UTC
Easy to reproduce with 2.6.38.6, too - seems to be even worse than with 2.6.32 from squeeze.
Comment 20 Bernd Zeimetz 2011-05-29 11:27:03 UTC
Same for 2.6.39.
I'll attach a sysrq dump of the hung tasks. Interestingly, whenever the machine hangs, cciss_vol_status (the same happens with arrayprobe) is running/hung and seems to cause the blocked access to the cciss device.
Comment 21 Bernd Zeimetz 2011-05-29 11:29:23 UTC
Created attachment 59972 [details]
sysrq+t output

-> stuck cciss_vol_status processes
Comment 22 Thomas Creutz 2011-06-07 18:08:54 UTC
Hmm... I haven't had the problem any more since I updated to kernel 2.6.36.

Currently I use 2.6.38-8-server and have not hit the bug there either...

root@s01:~# LANG=C apt-cache policy linux-image-2.6.38-8-server
linux-image-2.6.38-8-server:
  Installed: 2.6.38-8.42~lucid1
  Candidate: 2.6.38-8.42~lucid1
  Version table:
 *** 2.6.38-8.42~lucid1 0
        500 http://ppa.launchpad.net/kernel-ppa/ppa/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

Is it possible that a config option causes the problem? Can somebody try the same kernel from the kernel-ppa?
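
One hedged way to chase the config-option theory would be to diff the affected kernel's config against the kernel-ppa one that works (a sketch; it assumes the configs live under /boot, as on Ubuntu):

# Compare an affected config with the 2.6.38-8-server one that seems fine.
diff /boot/config-2.6.32-25-generic /boot/config-2.6.38-8-server | less
# Or just look at the I/O-scheduler and hung-task related options:
grep -E 'IOSCHED|DETECT_HUNG_TASK' /boot/config-$(uname -r)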
Comment 23 Carlos Alberto Lopez Perez 2011-06-08 21:37:18 UTC
I saw this bug many times after the upgrade from Debian 5.0 to 6.0 on some AMD64 boxes.

I can reproduce it easily.

1) Run a job that does heavy I/O
# ssh root@192.168.0.2 "dd if=/dev/sda1" | dd of=/dev/mapper/vg-backup
2) Start many other processes that do heavy I/O as well
# For example: open 10 shells and start to execute random commands

* In a few minutes some of these commands will become unresponsive
* And if you continue to execute more commands in another shell, you will trigger the bug
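
A hedged, combined sketch of the two steps above (the host, source device and LV path are the placeholders from the steps; a grep loop stands in for the "random commands"):

# 1) Heavy streaming write into the LVM/RAID1 stack.
ssh root@192.168.0.2 "dd if=/dev/sda1" | dd of=/dev/mapper/vg-backup &

# 2) Ten parallel shells doing arbitrary I/O on the root filesystem.
for i in $(seq 10); do
        ( while true; do grep -r kernel /usr/share/doc >/dev/null 2>&1; done ) &
done
# Within a few minutes some of these hang and dmesg starts reporting
# "blocked for more than 120 seconds".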

Many people are suffering from this bug; there are a lot of people complaining about it on the Internet.

Some interesting discussion:
http://lkml.org/lkml/2010/2/19/240


I am attaching the output of dmesg with sysrq+t when this bug triggers on one of my boxes
Comment 24 Carlos Alberto Lopez Perez 2011-06-08 21:38:42 UTC
Created attachment 61222 [details]
Output of /proc/modules and dmesg with sysrq+t
Comment 25 Andrew Morton 2011-06-08 22:02:44 UTC
(update by email because bugzilla is broken)

On Wed, 8 Jun 2011 21:38:43 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=18632

I always get a bit worried when I see dm-crypt involved - it doesn't
seem terribly well-behaved or reliable.

That being said, some of the other reports here indicate that we are
hitting a sync() livelock, sometimes fsync().  A pile of patches from
Wu Fengguang which could/should address these issues is about to hit
the linux-next tree.
Comment 26 Carlos Alberto Lopez Perez 2011-06-08 22:27:50 UTC
In this case the dm-crypt mount was not being used during the test so you can ignore it.

The "block stack" of the mountpoints where the I/O was happening (/) is the following:

(RAID1 over two SATA disks) -> LVM -> XFS

I find this comment from http://lkml.org/lkml/2010/2/24/2 very interesting:
"""
I guess the different layers doesn't really know that much about each 
other? I can imagine a few different scenarios where one only wants to do 
most of the scheduling on the lvm layer, and then wants to keep the 
queueing to a minimum on the other layers and keep the queue as small as 
possible there, so it can do the proper re-ordering.

Anyone has any thoughts to share on this? I don't have much experience 
with this when it comes to block devices, I'm a network engineer and I'm 
trying to use my experience in QoS/packet schedulers in different layers, 
where for instance when one runs an IP QoS scheduler, one doesn't want a 
lot of buffering on the underlying ATM layer, because it makes the IP 
schedulers job much harder.
"""

Wild question... could this problem be triggered by some queue or lock between the block layers (LVM<->RAID<->[sda|sdb])?

In the first job I write directly to the LVM->RAID1 queue with dd, and in the other jobs I read/write random stuff from the root partition, which is XFS->LVM->RAID1.
Comment 27 Wu Fengguang 2011-06-09 04:03:59 UTC
On Thu, Jun 09, 2011 at 11:32:17AM +0800, Wu Fengguang wrote:
> On Thu, Jun 09, 2011 at 06:02:41AM +0800, Andrew Morton wrote:
> > (udate by email because bugzilla is broken)
> > 
> > On Wed, 8 Jun 2011 21:38:43 GMT
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=18632
> > 
> > I always get a bit worried when I see dm-crypt involved - it doesn't
> > seem terribly well-behaved or reliable.
> > 
> > That being said, some of the other reports here indicate that we are
> > hitting a sync() livelock, sometimes fsync().
> 
> These comments all mention "blocked sync(1) on concurrent heavy writes".
> 
>         Comment #6 From Douglas Augusto
>         Comment #12 From Nathan Grennan 
>         Comment #16 From justincase@yopmail.com 

>         Comment #23 From Carlos Alberto Lopez Perez

Oh it's not sync(1) in Carlos' case, but random commands like man/dd
as well as

        task xfssyncd:451 blocked for more than 120 seconds.

It looks more like XFS related livelock, because I ran into similar
problem in a very simple XFS setup on some plain disk partition.

Thanks,
Fengguang

> Comment #17 From Justin Wheeler is special in that fio is doing
> read-only IO. Is it updating the mtime and the mtime writes are in
> turn starved by the higher-priority continuous reads? (or at least the
> write speed cannot keep up with the new mtime dirties)
> 
> > A pile of patches from
> > Wu Fengguang which could/should address these issues is about to hit
> > the linux-next tree.
> 
> Some livelock fixes will show up in tomorrow's linux-next tree. As for
> now the fixes are available here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git fs-writeback
> 
> Thanks,
> Fengguang
Comment 28 Wu Fengguang 2011-06-09 04:58:29 UTC
On Thu, Jun 09, 2011 at 06:02:41AM +0800, Andrew Morton wrote:
> (udate by email because bugzilla is broken)
> 
> On Wed, 8 Jun 2011 21:38:43 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=18632
> 
> I always get a bit worried when I see dm-crypt involved - it doesn't
> seem terribly well-behaved or reliable.
> 
> That being said, some of the other reports here indicate that we are
> hitting a sync() livelock, sometimes fsync().

These comments all mention "blocked sync(1) on concurrent heavy writes".

        Comment #6 From Douglas Augusto
        Comment #12 From Nathan Grennan 
        Comment #16 From justincase@yopmail.com 
        Comment #23 From Carlos Alberto Lopez Perez

Comment #17 From Justin Wheeler is special in that fio is doing
read-only IO. Is it updating the mtime and the mtime writes are in
turn starved by the higher-priority continuous reads? (or at least the
write speed cannot keep up with the new mtime dirties)

> A pile of patches from
> Wu Fengguang which could/should address these issues is about to hit
> the linux-next tree.

Some livelock fixes will show up in tomorrow's linux-next tree. As for
now the fixes are available here:

git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git fs-writeback

Thanks,
Fengguang
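
For anyone who wants to try those fixes before they appear in linux-next, a sketch (assuming an existing kernel git checkout and your usual build/install workflow):

# Fetch the fs-writeback branch mentioned above into a local kernel tree.
git remote add wfg git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git
git fetch wfg
git checkout -b test-fs-writeback wfg/fs-writeback
# Then build and install the kernel as usual for your distribution.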
Comment 29 Anonymous Emailer 2011-06-09 09:48:28 UTC
Reply-To: hch@infradead.org

On Thu, Jun 09, 2011 at 11:54:26AM +0800, Wu Fengguang wrote:
> Oh it's not sync(1) in Carlos' case, but random commands like man/dd
> as well as
> 
>         task xfssyncd:451 blocked for more than 120 seconds.
> 
> It looks more like XFS related livelock, because I ran into similar
> problem in a very simple XFS setup on some plain disk partition.

Care to explain what issues you saw, and with what setup and kernel
version?  Also usually the task blocked more than 120 seconds display
should come with a stacktrace, how does it look like?
Comment 30 Wu Fengguang 2011-06-09 10:34:16 UTC
On Thu, Jun 09, 2011 at 04:27:18PM +0800, Christoph Hellwig wrote:
> On Thu, Jun 09, 2011 at 11:54:26AM +0800, Wu Fengguang wrote:
> > Oh it's not sync(1) in Carlos' case, but random commands like man/dd
> > as well as
> > 
> >         task xfssyncd:451 blocked for more than 120 seconds.
> > 
> > It looks more like XFS related livelock, because I ran into similar
> > problem in a very simple XFS setup on some plain disk partition.
> 
> Care to explain what issues you saw, and with what setup and kernel
> version?  Also usually the task blocked more than 120 seconds display
> should come with a stacktrace, how does it look like?

I have a sync livelock test script and it sometimes livelocked on XFS
even with the livelock fix patches. Ext4 is always OK.

[ 3581.181253] sync            D ffff8800b6ca15d8  4560  4403   4392 0x00000000
[ 3581.181734]  ffff88006f775bc8 0000000000000046 ffff8800b6ca12b8 00000001b6ca1938
[ 3581.182411]  ffff88006f774000 00000000001d2e40 00000000001d2e40 ffff8800b6ca1280
[ 3581.183088]  00000000001d2e40 ffff88006f775fd8 00000340af111ef2 00000000001d2e40
[ 3581.183765] Call Trace:
[ 3581.184008]  [<ffffffff8109be73>] ? lock_release_holdtime+0xa3/0xab
[ 3581.184392]  [<ffffffff8108cc0d>] ? prepare_to_wait+0x6c/0x79
[ 3581.184756]  [<ffffffff8108cc0d>] ? prepare_to_wait+0x6c/0x79
[ 3581.185120]  [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f
[ 3581.185474]  [<ffffffff8108c97a>] ? wake_up_bit+0x2a/0x2a
[ 3581.185827]  [<ffffffff812f742a>] xfs_sync_inode_data+0x92/0x9d
[ 3581.186198]  [<ffffffff812f76e2>] xfs_inode_ag_walk+0x1a5/0x287
[ 3581.186569]  [<ffffffff812f779b>] ? xfs_inode_ag_walk+0x25e/0x287
[ 3581.186946]  [<ffffffff812f7398>] ? xfs_sync_worker+0x69/0x69
[ 3581.187311]  [<ffffffff812e2354>] ? xfs_perag_get+0x68/0xd0
[ 3581.187669]  [<ffffffff81092175>] ? local_clock+0x41/0x5a
[ 3581.188020]  [<ffffffff8109be73>] ? lock_release_holdtime+0xa3/0xab
[ 3581.188403]  [<ffffffff812e22ec>] ? xfs_check_sizes+0x160/0x160
[ 3581.188773]  [<ffffffff812e2354>] ? xfs_perag_get+0x68/0xd0
[ 3581.189130]  [<ffffffff812e236c>] ? xfs_perag_get+0x80/0xd0
[ 3581.189488]  [<ffffffff812e22ec>] ? xfs_check_sizes+0x160/0x160
[ 3581.189858]  [<ffffffff812f7831>] ? xfs_inode_ag_iterator+0x6d/0x8f
[ 3581.190241]  [<ffffffff812f7398>] ? xfs_sync_worker+0x69/0x69
[ 3581.190606]  [<ffffffff812f780b>] xfs_inode_ag_iterator+0x47/0x8f
[ 3581.190982]  [<ffffffff811611f5>] ? __sync_filesystem+0x7a/0x7a
[ 3581.191352]  [<ffffffff812f7877>] xfs_sync_data+0x24/0x43
[ 3581.191703]  [<ffffffff812f7911>] xfs_quiesce_data+0x2c/0x88
[ 3581.192065]  [<ffffffff812f5556>] xfs_fs_sync_fs+0x21/0x48
[ 3581.192419]  [<ffffffff811611e1>] __sync_filesystem+0x66/0x7a
[ 3581.192783]  [<ffffffff8116120b>] sync_one_sb+0x16/0x18
[ 3581.193128]  [<ffffffff8113e3e3>] iterate_supers+0x72/0xce
[ 3581.193482]  [<ffffffff81161140>] sync_filesystems+0x20/0x22
[ 3581.193842]  [<ffffffff8116127e>] sys_sync+0x21/0x33
[ 3581.194177]  [<ffffffff819016c2>] system_call_fastpath+0x16/0x1b

I just reconfirmed the problem on 3.0-rc2 with/without the livelock
fix patches (writeback fixes and cleanups v5 at http://lkml.org/lkml/2011/6/7/569),
and found that the situation has improved for XFS. It still has a much longer
sync time than ext4, but it no longer gets stuck until dd exits.

root@fat /home/wfg# ./sync-livelock.sh

3.0-rc2, xfs:

        sync time: 20
        sync time: 26
        sync time: 27

3.0-rc2, ext4:
        sync time: 4
        sync time: 4
        sync time: 3

3.0-rc2 with livelock fix patches, xfs:
        sync time: 18
        sync time: 21
        sync time: 14
        sync time: 20
        sync time: 21

Thanks,
Fengguang
---
$ cat ./sync-livelock.sh
#!/bin/sh

umount /dev/sda7
mkfs.xfs -f /dev/sda7
# mkfs.ext4 /dev/sda7
# mkfs.btrfs /dev/sda7
mount /dev/sda7 /fs

echo $((50<<20)) > /proc/sys/vm/dirty_bytes

pid=
for i in `seq 10`
do     
        dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 &
        pid="$pid $!"
done

sleep 1

tic=$(date +'%s')
sync
tac=$(date +'%s')

echo
echo sync time: $((tac-tic))
egrep '(Dirty|Writeback|NFS_Unstable)' /proc/meminfo

pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; }
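
For context, a hedged note on running the script above: it reformats /dev/sda7, so it must point at a scratch partition.

sh ./sync-livelock.sh
# Prints one "sync time: N" line (seconds) per run; "sync NOT livelocked"
# means sync(1) returned while the dd writers were still running.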
Comment 31 Anonymous Emailer 2011-06-09 11:02:40 UTC
Reply-To: hch@infradead.org

On Thu, Jun 09, 2011 at 05:09:06PM +0800, Wu Fengguang wrote:
> I have a sync livelock test script and it sometimes livelocked on XFS
> even with the livelock fix patches. Ext4 is always OK.

This sounds similar to the cfq issue just posted to lkml as
"CFQ: async queue blocks the whole system".

Does this happen with non-CFQ I/O schedulers, too?

> [ 3581.185120]  [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f

This waits for the I/O completion to actually arrive - something that
XFS does correctly in both sync and fsync, but ext4 only does for fsync.
It might have some issues in the way it's implemented, I'll look if
we can do something.  But I suspect cfq delaying async writes too much
is definitively going to cause issues for us here.
Comment 32 Wu Fengguang 2011-06-09 12:17:17 UTC
> > It might have some issues in the way it's implemented, I'll look if
> > we can do something.  But I suspect cfq delaying async writes too much
> > is definitively going to cause issues for us here.
> 
> It's definitely a problem that cfq delays async writes too much.
> However in Carlos's report,
> 
>         https://bugzilla.kernel.org/attachment.cgi?id=61222
> 
> there are no sync(1) or fsync running at all. So it may be indicating
> a different problem.

Sorry, your assumption is still valid: there are several bash processes doing
xfs_read()s, which will generate the required sync IO to livelock async
writes.

Thanks,
Fengguang
Comment 33 Anonymous Emailer 2011-06-09 12:18:07 UTC
Reply-To: hch@infradead.org

On Thu, Jun 09, 2011 at 08:11:17PM +0800, Wu Fengguang wrote:
> > This waits for the I/O completion to actually arrive - something that
> > XFS does correctly in both sync and fsync, but ext4 only does for fsync.
> 
> Will it benefit to flush the disk _once_ at the end of sync?
> (perhaps it's not be as easy in complex storage setups or whatever)

XFS currently does it's own passes over data and metadata in sync. To
some extent they are overkill, but things like the iowait that's
prominent in your trace actually is needed to catch all in-flight
direct I/O.

> It's definitely a problem that cfq delays async writes too much.
> However in Carlos's report,
> 
>         https://bugzilla.kernel.org/attachment.cgi?id=61222
> 
> there are no sync(1) or fsync running at all. So it may be indicating
> a different problem.

If you have a heavy read load we still need to write back timestamps,
and it kinda sounds like he's hitting that.  No guarantee that actuall
is the cause, though.
Comment 34 Wu Fengguang 2011-06-09 12:43:29 UTC
> > It's definitely a problem that cfq delays async writes too much.
> > However in Carlos's report,
> > 
> >         https://bugzilla.kernel.org/attachment.cgi?id=61222
> > 
> > there are no sync(1) or fsync running at all. So it may be indicating
> > a different problem.
> 
> If you have a heavy read load we still need to write back timestamps,
> and it kinda sounds like he's hitting that.  No guarantee that actuall
> is the cause, though.

Right, in the other email I do find the reads.

So Carlos, it's worthwhile to try the deadline scheduler, which will
at least improve read/write workloads (likely the one you are running)
that don't involve sync/fsync.

Thanks,
Fengguang
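
For Carlos (or anyone else testing that suggestion), a minimal sketch; sda is a placeholder for the disk backing the affected filesystem:

# Show the available schedulers (the active one is in brackets).
cat /sys/block/sda/queue/scheduler
# Switch to deadline at runtime (per disk, not persistent across reboots).
echo deadline > /sys/block/sda/queue/scheduler
# To make it the default, elevator=deadline can be added to the kernel
# command line (distribution-specific; verify for your setup).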
Comment 35 Anonymous Emailer 2011-06-09 13:24:43 UTC
Reply-To: hch@infradead.org

Here is a patch that removes the XFS-internal writing of data in sync
and instead trusts the VFS code to get it right, and also does the
ioend_wait under a lock, to protect against new pending I/Os showing up.

It passed a single xfsqa run so far, so at least it won't blow up
completely.

Can you check if that helps with your livelock test case?  (And yes, I
should go and try it myself.  Will do as soon as I'll get a bit more
time).


Index: xfs/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_super.c	2011-06-09 14:17:06.952532698 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_super.c	2011-06-09 14:17:36.742531190 +0200
@@ -991,7 +991,7 @@ xfs_fs_write_inode(
 		 * ->sync_fs call do that for thus, which reduces the number
 		 * of synchronous log foces dramatically.
 		 */
-		xfs_ioend_wait(ip);
+//		xfs_ioend_wait(ip);
 		xfs_ilock(ip, XFS_ILOCK_SHARED);
 		if (ip->i_update_core) {
 			error = xfs_log_inode(ip);
Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2011-06-09 14:16:59.575866405 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2011-06-09 14:22:31.675849591 +0200
@@ -215,6 +215,19 @@ xfs_inode_ag_iterator(
 }
 
 STATIC int
+xfs_wait_ioend_cb(
+	struct xfs_inode	*ip,
+	struct xfs_perag	*pag,
+	int			flags)
+{
+	xfs_ilock(ip, XFS_IOLOCK_SHARED);
+	xfs_ioend_wait(ip);
+	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+
+	return 0;
+}
+
+STATIC int
 xfs_sync_inode_data(
 	struct xfs_inode	*ip,
 	struct xfs_perag	*pag,
@@ -359,14 +372,14 @@ xfs_quiesce_data(
 {
 	int			error, error2 = 0;
 
-	/* push non-blocking */
-	xfs_sync_data(mp, 0);
-	xfs_qm_sync(mp, SYNC_TRYLOCK);
-
-	/* push and block till complete */
-	xfs_sync_data(mp, SYNC_WAIT);
 	xfs_qm_sync(mp, SYNC_WAIT);
 
+	error = xfs_inode_ag_iterator(mp, xfs_wait_ioend_cb, 0);
+	if (error)
+		return XFS_ERROR(error);
+
+	xfs_log_force(mp, XFS_LOG_SYNC);
+
 	/* write superblock and hoover up shutdown errors */
 	error = xfs_sync_fsdata(mp);
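
If someone wants to give it a spin, a sketch of applying it to a tree with the fs/xfs/linux-2.6/ layout the diff assumes (the patch file name is a placeholder):

# From the top of the kernel source tree, with the mail body saved as a file:
patch -p1 --dry-run < xfs-quiesce-data.patch   # check that it applies
patch -p1 < xfs-quiesce-data.patch
make -j"$(nproc)" && sudo make modules_install install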
Comment 36 Wu Fengguang 2011-06-09 13:38:06 UTC
On Thu, Jun 09, 2011 at 07:02:14PM +0800, Christoph Hellwig wrote:
> On Thu, Jun 09, 2011 at 05:09:06PM +0800, Wu Fengguang wrote:
> > I have a sync livelock test script and it sometimes livelocked on XFS
> > even with the livelock fix patches. Ext4 is always OK.
> 
> This sounds similar to the cfq issue just posted to lkml as
> "CFQ: async queue blocks the whole system".

I once ran two dd's doing sequential reads and writes in parallel, and
find the write dd to be completely blocked (note: I can no longer
reproduce it today, on 3.0-rc2). At the time I thought: "Wow, this is
good for typical desktop". But yes, it is livelocking async IOs, which
is bad.

> Does this happen with non-CFQ I/O schedulers, too?

Just tried the deadline scheduler, sync times are still long:

echo deadline > /sys/block/sda/queue/scheduler

sync time: 21
sync time: 22
sync time: 29

Also tried disabling the cfq low latency feature,

echo cfq > /sys/block/sda/queue/scheduler
echo 0   > /sys/block/sda/queue/iosched/low_latency

However the low_latency value seem to have NO much effects in the
sync time (and also don't considerably improve async dd write
bandwidth at the presence of another parallel dd read).

sync time: 19
sync time: 24
sync time: 22

> > [ 3581.185120]  [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f
> 
> This waits for the I/O completion to actually arrive - something that
> XFS does correctly in both sync and fsync, but ext4 only does for fsync.

Will it benefit to flush the disk _once_ at the end of sync?
(perhaps it's not be as easy in complex storage setups or whatever)

> It might have some issues in the way it's implemented, I'll look if
> we can do something.  But I suspect cfq delaying async writes too much
> is definitively going to cause issues for us here.

It's definitely a problem that cfq delays async writes too much.
However in Carlos's report,

        https://bugzilla.kernel.org/attachment.cgi?id=61222

there are no sync(1) or fsync running at all. So it may be indicating
a different problem.

Thanks,
Fengguang
Comment 37 Anonymous Emailer 2011-06-09 14:22:02 UTC
Reply-To: hch@infradead.org

On Thu, Jun 09, 2011 at 10:12:38PM +0800, Tao Ma wrote:
> Just want to say more about the situation here. Actually the flusher is
> too much easier to be blocked by the sync requests. And whenever it is
> blocked, it takes a quite long time to get back(because several cfq
> designs), so do you think we can use WRITE_SYNC for the bdev inodes in
> flusher? AFAICS, in most of the cases when a volume is mounted, the
> writeback for a bdev inode means the metadata writeback. And they are
> very important for a file system and should be written as far as
> possible. I ran my test cases with the change, and now the livelock
> doesn't show up anymore.

It's not a very good guestimate for metadata.  A lot of metadata is
either stored in directories (often happens for directories) or doesn't
use the pagecache writeback functions at all.

The major problem here seems to be that async requests simply starve
sync requests far too much.
Comment 38 Tao Ma 2011-06-09 14:57:17 UTC
On 06/09/2011 08:43 PM, Wu Fengguang wrote:
>>> It's definitely a problem that cfq delays async writes too much.
>>> However in Carlos's report,
>>>
>>>         https://bugzilla.kernel.org/attachment.cgi?id=61222
>>>
>>> there are no sync(1) or fsync running at all. So it may be indicating
>>> a different problem.
>>
>> If you have a heavy read load we still need to write back timestamps,
>> and it kinda sounds like he's hitting that.  No guarantee that actuall
>> is the cause, though.
> 
> Right, in the other email I do find the reads.
> 
> So Carlos, it's worthwhile to try the deadline scheduler, which will
> at least improve read/write workloads (likely the one you are running)
> that don't involve sync/fsync.
Yes, the deadline scheduler surely should help in these cases.

Regards,
Tao
Comment 39 Tao Ma 2011-06-09 15:13:30 UTC
Hi Fengguang and Christoph,
On 06/09/2011 08:11 PM, Wu Fengguang wrote:
> On Thu, Jun 09, 2011 at 07:02:14PM +0800, Christoph Hellwig wrote:
>> On Thu, Jun 09, 2011 at 05:09:06PM +0800, Wu Fengguang wrote:
>>> I have a sync livelock test script and it sometimes livelocked on XFS
>>> even with the livelock fix patches. Ext4 is always OK.
>>
>> This sounds similar to the cfq issue just posted to lkml as
>> "CFQ: async queue blocks the whole system".
Just want to say more about the situation here. Actually the flusher is
too much easier to be blocked by the sync requests. And whenever it is
blocked, it takes a quite long time to get back(because several cfq
designs), so do you think we can use WRITE_SYNC for the bdev inodes in
flusher? AFAICS, in most of the cases when a volume is mounted, the
writeback for a bdev inode means the metadata writeback. And they are
very important for a file system and should be written as far as
possible. I ran my test cases with the change, and now the livelock
doesn't show up anymore.

Regards,
Tao
> 
> I once ran two dd's doing sequential reads and writes in parallel, and
> find the write dd to be completely blocked (note: I can no longer
> reproduce it today, on 3.0-rc2). At the time I thought: "Wow, this is
> good for typical desktop". But yes, it is livelocking async IOs, which
> is bad.
> 
>> Does this happen with non-CFQ I/O schedulers, too?
> 
> Just tried the deadline scheduler, sync times are still long:
> 
> echo deadline > /sys/block/sda/queue/scheduler
> 
> sync time: 21
> sync time: 22
> sync time: 29
> 
> Also tried disabling the cfq low latency feature,
> 
> echo cfq > /sys/block/sda/queue/scheduler
> echo 0   > /sys/block/sda/queue/iosched/low_latency
> 
> However the low_latency value seem to have NO much effects in the
> sync time (and also don't considerably improve async dd write
> bandwidth at the presence of another parallel dd read).
> 
> sync time: 19
> sync time: 24
> sync time: 22
> 
>>> [ 3581.185120]  [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f
>>
>> This waits for the I/O completion to actually arrive - something that
>> XFS does correctly in both sync and fsync, but ext4 only does for fsync.
> 
> Will it benefit to flush the disk _once_ at the end of sync?
> (perhaps it's not be as easy in complex storage setups or whatever)
> 
>> It might have some issues in the way it's implemented, I'll look if
>> we can do something.  But I suspect cfq delaying async writes too much
>> is definitively going to cause issues for us here.
> 
> It's definitely a problem that cfq delays async writes too much.
> However in Carlos's report,
> 
>         https://bugzilla.kernel.org/attachment.cgi?id=61222
> 
> there are no sync(1) or fsync running at all. So it may be indicating
> a different problem.
> 
> Thanks,
> Fengguang
Comment 40 Anonymous Emailer 2011-06-09 15:35:59 UTC
Reply-To: vgoyal@redhat.com

On Thu, Jun 09, 2011 at 10:21:33AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 09, 2011 at 10:12:38PM +0800, Tao Ma wrote:
> > Just want to say more about the situation here. Actually the flusher is
> > too much easier to be blocked by the sync requests. And whenever it is
> > blocked, it takes a quite long time to get back(because several cfq
> > designs), so do you think we can use WRITE_SYNC for the bdev inodes in
> > flusher? AFAICS, in most of the cases when a volume is mounted, the
> > writeback for a bdev inode means the metadata writeback. And they are
> > very important for a file system and should be written as far as
> > possible. I ran my test cases with the change, and now the livelock
> > doesn't show up anymore.
> 
> It's not a very good guestimate for metadata.  A lot of metadata is
> either stored in directories (often happens for directories) or doesn't
> use the pagecache writeback functions at all.
> 
> The major problem here seems to be that async requests simply starve
> sync requests far too much.

You mean sync requests starve async requests?

It is possible that CFQ can starve async requests for a long time in the
presence of sync requests. If that's the case, all the reported issues
should go away with deadline scheduler.

As I mentioned in the other mail, one commit made the dice very heavily
loaded in favor of sync requests.

commit f8ae6e3eb8251be32c6e913393d9f8d9e0609489
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Fri Jan 14 08:41:02 2011 +0100

    block cfq: make queue preempt work for queues from different workload


I will do some quick tests and try to write a small patch where we can
keep track of how many times sync and async workloads have been scheduled
and make sure we don't starve async req completely.

Thanks
Vivek
Comment 41 Tao Ma 2011-06-09 15:54:20 UTC
On 06/09/2011 10:32 PM, Vivek Goyal wrote:
> On Thu, Jun 09, 2011 at 10:21:33AM -0400, Christoph Hellwig wrote:
>> On Thu, Jun 09, 2011 at 10:12:38PM +0800, Tao Ma wrote:
>>> Just want to say more about the situation here. The flusher is far too
>>> easily blocked by sync requests, and whenever it is blocked it takes
>>> quite a long time to get back (because of several cfq design choices).
>>> So do you think we can use WRITE_SYNC for the bdev inodes in the
>>> flusher? AFAICS, in most cases when a volume is mounted, writeback for
>>> a bdev inode means metadata writeback, which is very important for a
>>> file system and should be written out as early as possible. I ran my
>>> test cases with that change, and the livelock doesn't show up anymore.
>>
>> It's not a very good guestimate for metadata.  A lot of metadata is
>> either stored in directories (often happens for directories) or doesn't
>> use the pagecache writeback functions at all.
>>
>> The major problem here seems to be that async requests simply starve
>> sync requests far too much.
> 
> You mean sync requests starve async requests?
> 
> It is possible that CFQ can starve async requests for a long time in the
> presence of sync requests. If that's the case, all the reported issues
> should go away with the deadline scheduler.
> 
> As I mentioned in the other mail, one commit loaded the dice very heavily
> in favor of sync requests.
> 
> commit f8ae6e3eb8251be32c6e913393d9f8d9e0609489
> Author: Shaohua Li <shaohua.li@intel.com>
> Date:   Fri Jan 14 08:41:02 2011 +0100
> 
>     block cfq: make queue preempt work for queues from different workload
> 
> 
> I will do some quick tests and try to write a small patch that keeps
> track of how many times the sync and async workloads have been scheduled
> and makes sure we don't starve async requests completely.
Oh, so you mean the patch is the culprit? I will try to revert it to see
whether the system works better.

Regards,
Tao
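
Testing that revert might look like the following; this is a sketch only
and assumes a kernel git tree that contains the commit quoted above.

    git show --stat f8ae6e3eb8251be32c6e913393d9f8d9e0609489   # confirm the suspect change
    git revert f8ae6e3eb8251be32c6e913393d9f8d9e0609489
    # rebuild, boot the reverted kernel, and re-run the livelock test case
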
Comment 42 Wu Fengguang 2011-06-10 03:22:02 UTC
On Thu, Jun 09, 2011 at 09:23:57PM +0800, Christoph Hellwig wrote:
> Here is a patch that removes the XFS-internal writing of data in sync
> and instead trusts the VFS code to get it right, and also does the
> ioend_wait under a lock, to protect against new pending I/Os showing up.
> 
> It passed a single xfsqa run so far, so at least it won't blow up
> completely.
> 
> Can you check if that helps with your livelock test case?  (And yes, I
> should go and try it myself.  Will do as soon as I'll get a bit more
> time).

My pleasure. It's easier to compare results on the same test bed and
I've made it very convenient to test patches :)

The patch does improve the sync time.

before patch                   after patch
--------------------------------------------------
sync time: 18                  sync time: 18
sync time: 21                  sync time: 15
sync time: 14                  sync time: 14
sync time: 20                  sync time: 16
sync time: 21                  sync time: 14
                               
avg        18.800              avg        15.400 
stddev      2.638              stddev      1.497 

I also did extra test runs and collected sync(1) call traces by running
"echo t > /proc/sysrq-trigger" at random points.  The traces caught in XFS are:

[   78.395664] sync            D 0000000000000002  5296  3024   3003 0x00000000
[   78.396162]  ffff8800b4229de8 0000000000000046 ffffffff819252fa 0000000000000000
[   78.396850]  ffff8800b4228000 00000000001d1a40 00000000001d1a40 ffff8800aefa22f0
[   78.397519]  00000000001d1a40 ffff8800b4229fd8 00000000001d1a40 ffff8800b4229fd8
[   78.398189] Call Trace:
[   78.398431]  [<ffffffff819252fa>] ? schedule+0x374/0x992
[   78.398778]  [<ffffffff812e906f>] ? _xfs_log_force+0x1de/0x226
[   78.399143]  [<ffffffff81092be5>] ? local_clock+0x41/0x5a
[   78.399493]  [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[   78.399875]  [<ffffffff812e906f>] ? _xfs_log_force+0x1de/0x226
[   78.400239]  [<ffffffff812e9074>] _xfs_log_force+0x1e3/0x226
[   78.400599]  [<ffffffff81067c10>] ? try_to_wake_up+0x28b/0x28b
[   78.400964]  [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[   78.401347]  [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[   78.401715]  [<ffffffff812e90cf>] xfs_log_force+0x18/0x3a
[   78.402064]  [<ffffffff813053e8>] xfs_quiesce_data+0x3a/0x96
[   78.402424]  [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[   78.402777]  [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[   78.403138]  [<ffffffff81164787>] sync_one_sb+0x16/0x18
[   78.403482]  [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[   78.403835]  [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[   78.404198]  [<ffffffff811647fa>] sys_sync+0x21/0x33
[   78.404533]  [<ffffffff8192f742>] system_call_fastpath+0x16/0x1b

[  221.106751] sync            D 0000000000000007  5296  3136   3116 0x00000000
[  221.107229]  ffff8800aee7bb40 0000000000000046 ffffffff819252fa 0000000000000000
[  221.107903]  ffff8800aee7a000 00000000001d1a40 00000000001d1a40 ffff88009c6a45e0
[  221.108574]  00000000001d1a40 ffff8800aee7bfd8 00000000001d1a40 ffff8800aee7bfd8
[  221.109246] Call Trace:
[  221.109486]  [<ffffffff819252fa>] ? schedule+0x374/0x992
[  221.109831]  [<ffffffff81928012>] ? rwsem_down_failed_common+0xc7/0x103
[  221.110225]  [<ffffffff81092be5>] ? local_clock+0x41/0x5a
[  221.110574]  [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[  221.110956]  [<ffffffff81928012>] ? rwsem_down_failed_common+0xc7/0x103
[  221.111352]  [<ffffffff8192801e>] rwsem_down_failed_common+0xd3/0x103
[  221.111738]  [<ffffffff81928075>] rwsem_down_read_failed+0x12/0x14
[  221.112116]  [<ffffffff813c81a4>] call_rwsem_down_read_failed+0x14/0x30
[  221.112513]  [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[  221.112855]  [<ffffffff812de62b>] ? xfs_ilock+0x30/0xba
[  221.113199]  [<ffffffff8109153b>] ? down_read_nested+0x74/0x89
[  221.113563]  [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[  221.113904]  [<ffffffff812de648>] xfs_ilock+0x4d/0xba
[  221.114243]  [<ffffffff81304ee0>] xfs_wait_ioend_cb+0x1b/0x36
[  221.114604]  [<ffffffff813051a9>] xfs_inode_ag_walk+0x1a5/0x288
[  221.114973]  [<ffffffff81305263>] ? xfs_inode_ag_walk+0x25f/0x288
[  221.115348]  [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[  221.115722]  [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[  221.116103]  [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[  221.116473]  [<ffffffff812efbfc>] ? xfs_perag_get+0x68/0xd1
[  221.116828]  [<ffffffff812efc14>] ? xfs_perag_get+0x80/0xd1
[  221.117185]  [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[  221.117553]  [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[  221.117936]  [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[  221.118312]  [<ffffffff813052d3>] xfs_inode_ag_iterator+0x47/0x8f
[  221.118689]  [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[  221.119058]  [<ffffffff813053d1>] xfs_quiesce_data+0x23/0x96
[  221.119417]  [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[  221.119771]  [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[  221.120134]  [<ffffffff81164787>] sync_one_sb+0x16/0x18
[  221.120477]  [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[  221.120828]  [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[  221.121188]  [<ffffffff811647fa>] sys_sync+0x21/0x33

[  349.972430] sync            D 0000000000000007  5296  3312   3292 0x00000000
[  349.972907]  ffff88009c77fc08 0000000000000046 ffffffff819252fa 0000000100000070
[  349.973575]  ffff88009c77e000 00000000001d1a40 00000000001d1a40 ffff8800b56b0000
[  349.974250]  00000000001d1a40 ffff88009c77ffd8 00000000001d1a40 ffff88009c77ffd8
[  349.974921] Call Trace:
[  349.975159]  [<ffffffff819252fa>] ? schedule+0x374/0x992
[  349.975504]  [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[  349.975880]  [<ffffffff8108d5c5>] ? prepare_to_wait+0x6c/0x79
[  349.976238]  [<ffffffff812fade3>] xfs_ioend_wait+0x87/0x9f
[  349.976588]  [<ffffffff8108d332>] ? wake_up_bit+0x2a/0x2a
[  349.976932]  [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[  349.977272]  [<ffffffff81304ee8>] xfs_wait_ioend_cb+0x23/0x36
[  349.977631]  [<ffffffff813051a9>] xfs_inode_ag_walk+0x1a5/0x288
[  349.977997]  [<ffffffff81305263>] ? xfs_inode_ag_walk+0x25f/0x288
[  349.978371]  [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[  349.978748]  [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[  349.979129]  [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[  349.979499]  [<ffffffff812efbfc>] ? xfs_perag_get+0x68/0xd1
[  349.979856]  [<ffffffff812efc14>] ? xfs_perag_get+0x80/0xd1
[  349.980211]  [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[  349.980582]  [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[  349.980964]  [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[  349.981338]  [<ffffffff813052d3>] xfs_inode_ag_iterator+0x47/0x8f
[  349.981717]  [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[  349.982085]  [<ffffffff813053d1>] xfs_quiesce_data+0x23/0x96
[  349.982443]  [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[  349.982798]  [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[  349.983159]  [<ffffffff81164787>] sync_one_sb+0x16/0x18
[  349.983504]  [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[  349.983857]  [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[  349.984215]  [<ffffffff811647fa>] sys_sync+0x21/0x33
[  349.984552]  [<ffffffff8192f742>] system_call_fastpath+0x16/0x1b


[  419.417496] sync            D 0000000000000006  5296  3412   3392 0x00000000
[  419.417972]  ffff8800aef6dc08 0000000000000046 ffffffff819252fa 0000000100000070
[  419.418645]  ffff8800aef6c000 00000000001d1a40 00000000001d1a40 ffff8800b704a2f0
[  419.419317]  00000000001d1a40 ffff8800aef6dfd8 00000000001d1a40 ffff8800aef6dfd8
[  419.419987] Call Trace:
[  419.420228]  [<ffffffff819252fa>] ? schedule+0x374/0x992
[  419.420576]  [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[  419.420957]  [<ffffffff8108d5c5>] ? prepare_to_wait+0x6c/0x79
[  419.421321]  [<ffffffff812fade3>] xfs_ioend_wait+0x87/0x9f
[  419.421673]  [<ffffffff8108d332>] ? wake_up_bit+0x2a/0x2a
[  419.422020]  [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[  419.422364]  [<ffffffff81304ee8>] xfs_wait_ioend_cb+0x23/0x36
[  419.422724]  [<ffffffff813051a9>] xfs_inode_ag_walk+0x1a5/0x288
[  419.423093]  [<ffffffff81305263>] ? xfs_inode_ag_walk+0x25f/0x288
[  419.423470]  [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[  419.423844]  [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[  419.424226]  [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[  419.424596]  [<ffffffff812efbfc>] ? xfs_perag_get+0x68/0xd1
[  419.424951]  [<ffffffff812efc14>] ? xfs_perag_get+0x80/0xd1
[  419.425307]  [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[  419.425676]  [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[  419.426058]  [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[  419.426435]  [<ffffffff813052d3>] xfs_inode_ag_iterator+0x47/0x8f
[  419.426812]  [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[  419.427180]  [<ffffffff813053d1>] xfs_quiesce_data+0x23/0x96
[  419.427542]  [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[  419.427895]  [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[  419.428258]  [<ffffffff81164787>] sync_one_sb+0x16/0x18
[  419.428601]  [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[  419.428954]  [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[  419.429315]  [<ffffffff811647fa>] sys_sync+0x21/0x33
[  419.429648]  [<ffffffff8192f742>] system_call_fastpath+0x16/0x1b

Thanks,
Fengguang
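
The traces above come from sysrq-t; a rough sketch of that procedure
(root is required, and the grep pattern is only a crude filter for the
blocked sync tasks):

    echo 1 > /proc/sys/kernel/sysrq     # enable all sysrq functions if restricted
    echo t > /proc/sysrq-trigger        # dump the state and stack of every task
    dmesg | grep -A 30 ' sync .* D '    # pull out the blocked sync(1) back-traces
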
Comment 43 Anonymous Emailer 2011-06-19 16:48:33 UTC
Reply-To: hch@infradead.org

On Fri, Jun 10, 2011 at 11:21:49AM +0800, Wu Fengguang wrote:
> > Can you check if that helps with your livelock test case?  (And yes, I
> > should go and try it myself.  Will do as soon as I'll get a bit more
> > time).
> 
> My pleasure. It's easier to compare results on the same test bed and
> I've made it very convenient to test patches :)

Here's one that should be even better, although in its current form it
breaks a subtle corner case of sync semantics, so it'll need more work
if I go down that route:

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2011-06-17 14:16:18.442399481 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2011-06-18 17:55:44.864025123 +0200
@@ -359,14 +359,16 @@ xfs_quiesce_data(
 {
 	int			error, error2 = 0;
 
-	/* push non-blocking */
-	xfs_sync_data(mp, 0);
 	xfs_qm_sync(mp, SYNC_TRYLOCK);
-
-	/* push and block till complete */
-	xfs_sync_data(mp, SYNC_WAIT);
 	xfs_qm_sync(mp, SYNC_WAIT);
 
+	/* flush all pending size updates and unwritten extent conversions */
+	flush_workqueue(xfsconvertd_workqueue);
+	flush_workqueue(xfsdatad_workqueue);
+
+	/* force out the newly dirtied log buffers */
+	xfs_log_force(mp, XFS_LOG_SYNC);
+
 	/* write superblock and hoover up shutdown errors */
 	error = xfs_sync_fsdata(mp);
 
Index: xfs/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_super.c	2011-06-18 17:51:05.660705925 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_super.c	2011-06-18 17:52:50.107367305 +0200
@@ -929,45 +929,12 @@ xfs_fs_write_inode(
 		 * ->sync_fs call do that for thus, which reduces the number
 		 * of synchronous log foces dramatically.
 		 */
-		xfs_ioend_wait(ip);
 		xfs_ilock(ip, XFS_ILOCK_SHARED);
-		if (ip->i_update_core) {
+		if (ip->i_update_core)
 			error = xfs_log_inode(ip);
-			if (error)
-				goto out_unlock;
-		}
-	} else {
-		/*
-		 * We make this non-blocking if the inode is contended, return
-		 * EAGAIN to indicate to the caller that they did not succeed.
-		 * This prevents the flush path from blocking on inodes inside
-		 * another operation right now, they get caught later by
-		 * xfs_sync.
-		 */
-		if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
-			goto out;
-
-		if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
-			goto out_unlock;
-
-		/*
-		 * Now we have the flush lock and the inode is not pinned, we
-		 * can check if the inode is really clean as we know that
-		 * there are no pending transaction completions, it is not
-		 * waiting on the delayed write queue and there is no IO in
-		 * progress.
-		 */
-		if (xfs_inode_clean(ip)) {
-			xfs_ifunlock(ip);
-			error = 0;
-			goto out_unlock;
-		}
-		error = xfs_iflush(ip, SYNC_TRYLOCK);
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 	}
 
- out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
- out:
 	/*
 	 * if we failed to write out the inode then mark
 	 * it dirty again so we'll try again later.
Comment 44 Wu Fengguang 2011-06-19 18:00:03 UTC
On Sun, Jun 19, 2011 at 11:56:46PM +0800, Christoph Hellwig wrote:
> On Fri, Jun 10, 2011 at 11:21:49AM +0800, Wu Fengguang wrote:
> > > Can you check if that helps with your livelock test case?  (And yes, I
> > > should go and try it myself.  Will do as soon as I'll get a bit more
> > > time).
> > 
> > My pleasure. It's easier to compare results on the same test bed and
> > I've made it very convenient to test patches :)
> 
> Here's one that should be even better, although in its current form it
> breaks a subtle corner case of sync semantics, so it'll need more work
> if I go down that route:

The sync(1)s are much faster now; most invocations are on the same
level as ext4 :)

root@fat /home/wfg# ./sync-livelock.sh

sync time: 3
sync time: 5
sync time: 5
sync time: 6
sync time: 15
sync time: 5
sync time: 5
sync time: 5
sync time: 4
sync time: 10
sync time: 4
sync time: 4

Thanks,
Fengguang

> Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c      2011-06-17 14:16:18.442399481 +0200
> +++ xfs/fs/xfs/linux-2.6/xfs_sync.c   2011-06-18 17:55:44.864025123 +0200
> @@ -359,14 +359,16 @@ xfs_quiesce_data(
>  {
>       int                     error, error2 = 0;
>  
> -     /* push non-blocking */
> -     xfs_sync_data(mp, 0);
>       xfs_qm_sync(mp, SYNC_TRYLOCK);
> -
> -     /* push and block till complete */
> -     xfs_sync_data(mp, SYNC_WAIT);
>       xfs_qm_sync(mp, SYNC_WAIT);
>  
> +     /* flush all pending size updates and unwritten extent conversions */
> +     flush_workqueue(xfsconvertd_workqueue);
> +     flush_workqueue(xfsdatad_workqueue);
> +
> +     /* force out the newly dirtied log buffers */
> +     xfs_log_force(mp, XFS_LOG_SYNC);
> +
>       /* write superblock and hoover up shutdown errors */
>       error = xfs_sync_fsdata(mp);
>  
> Index: xfs/fs/xfs/linux-2.6/xfs_super.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_super.c     2011-06-18 17:51:05.660705925 +0200
> +++ xfs/fs/xfs/linux-2.6/xfs_super.c  2011-06-18 17:52:50.107367305 +0200
> @@ -929,45 +929,12 @@ xfs_fs_write_inode(
>                * ->sync_fs call do that for thus, which reduces the number
>                * of synchronous log foces dramatically.
>                */
> -             xfs_ioend_wait(ip);
>               xfs_ilock(ip, XFS_ILOCK_SHARED);
> -             if (ip->i_update_core) {
> +             if (ip->i_update_core)
>                       error = xfs_log_inode(ip);
> -                     if (error)
> -                             goto out_unlock;
> -             }
> -     } else {
> -             /*
> -              * We make this non-blocking if the inode is contended, return
> -              * EAGAIN to indicate to the caller that they did not succeed.
> -              * This prevents the flush path from blocking on inodes inside
> -              * another operation right now, they get caught later by
> -              * xfs_sync.
> -              */
> -             if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
> -                     goto out;
> -
> -             if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
> -                     goto out_unlock;
> -
> -             /*
> -              * Now we have the flush lock and the inode is not pinned, we
> -              * can check if the inode is really clean as we know that
> -              * there are no pending transaction completions, it is not
> -              * waiting on the delayed write queue and there is no IO in
> -              * progress.
> -              */
> -             if (xfs_inode_clean(ip)) {
> -                     xfs_ifunlock(ip);
> -                     error = 0;
> -                     goto out_unlock;
> -             }
> -             error = xfs_iflush(ip, SYNC_TRYLOCK);
> +             xfs_iunlock(ip, XFS_ILOCK_SHARED);
>       }
>  
> - out_unlock:
> -     xfs_iunlock(ip, XFS_ILOCK_SHARED);
> - out:
>       /*
>        * if we failed to write out the inode then mark
>        * it dirty again so we'll try again later.
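
The sync-livelock.sh harness used for these measurements is not included
in this report. A hypothetical, much-simplified sketch of that kind of
test (a streaming writer plus repeated timed syncs, assuming a scratch
filesystem mounted at /mnt/test) could look like:

    #!/bin/sh
    # hypothetical sketch, not the actual sync-livelock.sh
    dd if=/dev/zero of=/mnt/test/streamer bs=1M count=100000 &   # keep dirty pages flowing
    writer=$!
    for i in 1 2 3 4 5 6 7 8 9 10; do
            start=$(date +%s)
            sync
            echo "sync time: $(( $(date +%s) - start ))"
    done
    kill $writer 2>/dev/null
    rm -f /mnt/test/streamer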