For a while now I have had the problem that I cannot install packages with the package manager on Ubuntu. Every attempt results in a blocked dpkg task. I can't kill the task with kill -9 `pgrep dpkg`; the only solution is to reboot my server. But when I try again to install or upgrade packages, it hangs again. In dmesg I get this:

[ 1321.499551] INFO: task dpkg:3764 blocked for more than 120 seconds.
[ 1321.499822] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1321.526755] dpkg D 00000000ffffffff 0 3764 3719 0x00000000
[ 1321.526761] ffff8802084fbdb8 0000000000000082 0000000000015bc0 0000000000015bc0
[ 1321.526766] ffff88020860b1a0 ffff8802084fbfd8 0000000000015bc0 ffff88020860ade0
[ 1321.526770] 0000000000015bc0 ffff8802084fbfd8 0000000000015bc0 ffff88020860b1a0
[ 1321.526780] Call Trace:
[ 1321.526791] [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1321.526799] [<ffffffff811664be>] bdi_sched_wait+0xe/0x20
[ 1321.526809] [<ffffffff815429ef>] __wait_on_bit+0x5f/0x90
[ 1321.526816] [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1321.526824] [<ffffffff81542a98>] out_of_line_wait_on_bit+0x78/0x90
[ 1321.526834] [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
[ 1321.526842] [<ffffffff81166474>] ? bdi_queue_work+0xa4/0xe0
[ 1321.526849] [<ffffffff8116782f>] bdi_sync_writeback+0x6f/0x80
[ 1321.526857] [<ffffffff81167860>] sync_inodes_sb+0x20/0x30
[ 1321.526865] [<ffffffff8116b332>] __sync_filesystem+0x82/0x90
[ 1321.526873] [<ffffffff8116b419>] sync_filesystems+0xd9/0x130
[ 1321.526881] [<ffffffff8116b4d1>] sys_sync+0x21/0x40
[ 1321.526890] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[ 1441.509612] INFO: task dpkg:3764 blocked for more than 120 seconds.
[ 1441.523584] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1441.550418] dpkg D 00000000ffffffff 0 3764 3719 0x00000000
[ 1441.550424] ffff8802084fbdb8 0000000000000082 0000000000015bc0 0000000000015bc0
[ 1441.550428] ffff88020860b1a0 ffff8802084fbfd8 0000000000015bc0 ffff88020860ade0
[ 1441.550433] 0000000000015bc0 ffff8802084fbfd8 0000000000015bc0 ffff88020860b1a0
[ 1441.550437] Call Trace:
[ 1441.550447] [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1441.550455] [<ffffffff811664be>] bdi_sched_wait+0xe/0x20
[ 1441.550465] [<ffffffff815429ef>] __wait_on_bit+0x5f/0x90
[ 1441.550472] [<ffffffff811664b0>] ? bdi_sched_wait+0x0/0x20
[ 1441.550480] [<ffffffff81542a98>] out_of_line_wait_on_bit+0x78/0x90
[ 1441.550490] [<ffffffff81085470>] ? wake_bit_function+0x0/0x40
[ 1441.550497] [<ffffffff81166474>] ? bdi_queue_work+0xa4/0xe0
[ 1441.550505] [<ffffffff8116782f>] bdi_sync_writeback+0x6f/0x80
[ 1441.550513] [<ffffffff81167860>] sync_inodes_sb+0x20/0x30
[ 1441.550521] [<ffffffff8116b332>] __sync_filesystem+0x82/0x90
[ 1441.550529] [<ffffffff8116b419>] sync_filesystems+0xd9/0x130
[ 1441.550537] [<ffffffff8116b4d1>] sys_sync+0x21/0x40
[ 1441.550546] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b

I created a bug report for Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/624877

I hope we can find a solution for this problem.

Regards,
Thomas
Recategorised to "IO storage/block layer". Kernel version appears to be 2.6.32.
Is there anything I can do to provide you with more information?
We've tried to fix a lot of things in this area since 2.6.32. If you can test a more recent kernel that would be good, thanks.
OK, I will see if I can copy the mainline image from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.35.4-maverick/ to my server. But because of the bug I can't install it with dpkg... :-(
OK, I booted into the rescue console provided by my hosting provider. From there I could chroot into my server and install the needed packages with dpkg... but without success: after a reboot and another attempt to upgrade my packages, it got stuck in the dpkg process again. I also got a new bug listed in dmesg: https://bugzilla.kernel.org/show_bug.cgi?id=18912 Why always the dpkg process and no other one?
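For reference, the rescue-console workaround looks roughly like this (a sketch only; the device name and the package file are placeholders, adjust them for your setup):

# from the provider's rescue system
mount /dev/sda1 /mnt                 # root filesystem of the broken install
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt /bin/bash
dpkg -i /var/cache/apt/archives/<package>.deb   # install the already-downloaded packages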
I'm experiencing a similar bug--possibly the same bug--on two different Debian 64-bit machines, both with Linux 2.6.35 (Debian package 2.6.35-1~experimental.3). The message "task dpkg:PID blocked for more than 120 seconds" normally occurs when there are some heavy disk activity being done by other process (even from normal users), for instance, when I'm making backup. I've noticed that dpkg eventually works, but only after a long time and not necessarily immediately after the high disk activity has finished. During the time dpkg is blocked, besides not being able to kill it, I cannot sync the disk or even soft reboot the machine, otherwise these processes become also blocked. Fortunately, most of the other disk activities seem to work normally. Machine A is an Intel Xeon W3680 with 12GB RAM and 2x2TB WD20EARS disks in RAID10 (f2). This is how the call trace looks like: Sep 24 09:41:23 gnu kernel: [157565.846882] INFO: task dpkg:7258 blocked for more than 120 seconds. Sep 24 09:41:23 gnu kernel: [157565.846884] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 24 09:41:23 gnu kernel: [157565.846886] dpkg D 000000010257d0a1 0 7258 22003 0x00000000 Sep 24 09:41:23 gnu kernel: [157565.846889] ffff8803375be630 0000000000000082 ffff880336791e50 0000000000000001 Sep 24 09:41:23 gnu kernel: [157565.846893] ffff88033e0e4420 00000000000154c0 00000000000154c0 00000000000154c0 Sep 24 09:41:23 gnu kernel: [157565.846895] ffff880336791fd8 00000000000154c0 ffff8803375be630 ffff880336791fd8 Sep 24 09:41:23 gnu kernel: [157565.846897] Call Trace: Sep 24 09:41:23 gnu kernel: [157565.846904] [<ffffffff813054df>] ? schedule_timeout+0x2d/0xd7 Sep 24 09:41:23 gnu kernel: [157565.846907] [<ffffffff81038c8f>] ? enqueue_task_fair+0x21/0x62 Sep 24 09:41:23 gnu kernel: [157565.846911] [<ffffffff8100e7d9>] ? sched_clock+0x5/0x8 Sep 24 09:41:23 gnu kernel: [157565.846914] [<ffffffff81305354>] ? wait_for_common+0xd1/0x14e Sep 24 09:41:23 gnu kernel: [157565.846918] [<ffffffff8103f630>] ? default_wake_function+0x0/0xf Sep 24 09:41:23 gnu kernel: [157565.846921] [<ffffffff81106882>] ? sync_one_sb+0x0/0x20 Sep 24 09:41:23 gnu kernel: [157565.846925] [<ffffffff81102e85>] ? sync_inodes_sb+0x89/0x140 Sep 24 09:41:23 gnu kernel: [157565.846927] [<ffffffff81106882>] ? sync_one_sb+0x0/0x20 Sep 24 09:41:23 gnu kernel: [157565.846929] [<ffffffff8110684c>] ? __sync_filesystem+0x44/0x7a Sep 24 09:41:23 gnu kernel: [157565.846932] [<ffffffff810eab16>] ? iterate_supers+0x61/0xa0 Sep 24 09:41:23 gnu kernel: [157565.846934] [<ffffffff811068e1>] ? sys_sync+0x3f/0x54 Sep 24 09:41:23 gnu kernel: [157565.846937] [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b After having tried to sync while dpkg is blocked: Sep 24 09:49:23 gnu kernel: [158045.141824] INFO: task sync:18651 blocked for more than 120 seconds. Sep 24 09:49:23 gnu kernel: [158045.141827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 24 09:49:23 gnu kernel: [158045.141829] sync D ffff88033b4f86d0 0 18651 1329 0x00000000 Sep 24 09:49:23 gnu kernel: [158045.141833] ffff88033b4f86d0 0000000000000086 ffff8802a4a4fe50 0000000000000001 Sep 24 09:49:23 gnu kernel: [158045.141836] ffff88033e22d890 00000000000154c0 00000000000154c0 00000000000154c0 Sep 24 09:49:23 gnu kernel: [158045.141839] ffff8802a4a4ffd8 00000000000154c0 ffff88033b4f86d0 ffff8802a4a4ffd8 Sep 24 09:49:23 gnu kernel: [158045.141842] Call Trace: Sep 24 09:49:23 gnu kernel: [158045.141848] [<ffffffff813054df>] ? 
schedule_timeout+0x2d/0xd7 Sep 24 09:49:23 gnu kernel: [158045.141851] [<ffffffff81038c8f>] ? enqueue_task_fair+0x21/0x62 Sep 24 09:49:23 gnu kernel: [158045.141855] [<ffffffff8100e7d9>] ? sched_clock+0x5/0x8 Sep 24 09:49:23 gnu kernel: [158045.141858] [<ffffffff81305354>] ? wait_for_common+0xd1/0x14e Sep 24 09:49:23 gnu kernel: [158045.141862] [<ffffffff8103f630>] ? default_wake_function+0x0/0xf Sep 24 09:49:23 gnu kernel: [158045.141865] [<ffffffff81106882>] ? sync_one_sb+0x0/0x20 Sep 24 09:49:23 gnu kernel: [158045.141869] [<ffffffff81102e85>] ? sync_inodes_sb+0x89/0x140 Sep 24 09:49:23 gnu kernel: [158045.141871] [<ffffffff81106882>] ? sync_one_sb+0x0/0x20 Sep 24 09:49:23 gnu kernel: [158045.141873] [<ffffffff8110684c>] ? __sync_filesystem+0x44/0x7a Sep 24 09:49:23 gnu kernel: [158045.141877] [<ffffffff810eab16>] ? iterate_supers+0x61/0xa0 Sep 24 09:49:23 gnu kernel: [158045.141879] [<ffffffff811068e1>] ? sys_sync+0x3f/0x54 Sep 24 09:49:23 gnu kernel: [158045.141883] [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b Machine B is an AMD Phenon II 1090T with 8GB RAM and 1.5TB WD15EADS disk (no RAID). Call trace: Sep 25 20:49:41 phenom kernel: [26640.848742] INFO: task dpkg:23736 blocked for more than 120 seconds. Sep 25 20:49:41 phenom kernel: [26640.848744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 25 20:49:41 phenom kernel: [26640.848750] dpkg D 000000010060537e 0 23736 23721 0x00000000 Sep 25 20:49:41 phenom kernel: [26640.848753] ffff88020c2a1470 0000000000000086 ffff880001d17e50 0000000000000000 Sep 25 20:49:41 phenom kernel: [26640.848756] ffff88020ecba8e0 00000000000154c0 00000000000154c0 00000000000154c0 Sep 25 20:49:41 phenom kernel: [26640.848758] ffff880001d17fd8 00000000000154c0 ffff88020c2a1470 ffff880001d17fd8 Sep 25 20:49:41 phenom kernel: [26640.848760] Call Trace: Sep 25 20:49:41 phenom kernel: [26640.848766] [<ffffffff813054df>] ? schedule_timeout+0x2d/0xd7 Sep 25 20:49:41 phenom kernel: [26640.848769] [<ffffffff81038c8f>] ? enqueue_task_fair+0x21/0x62 Sep 25 20:49:41 phenom kernel: [26640.848771] [<ffffffff81305354>] ? wait_for_common+0xd1/0x14e Sep 25 20:49:41 phenom kernel: [26640.848774] [<ffffffff8103f630>] ? default_wake_function+0x0/0xf Sep 25 20:49:41 phenom kernel: [26640.848776] [<ffffffff81106882>] ? sync_one_sb+0x0/0x20 Sep 25 20:49:41 phenom kernel: [26640.848778] [<ffffffff81102e85>] ? sync_inodes_sb+0x89/0x140 Sep 25 20:49:41 phenom kernel: [26640.848780] [<ffffffff81106882>] ? sync_one_sb+0x0/0x20 Sep 25 20:49:41 phenom kernel: [26640.848781] [<ffffffff8110684c>] ? __sync_filesystem+0x44/0x7a Sep 25 20:49:41 phenom kernel: [26640.848784] [<ffffffff810eab16>] ? iterate_supers+0x61/0xa0 Sep 25 20:49:41 phenom kernel: [26640.848785] [<ffffffff811068e1>] ? sys_sync+0x3f/0x54 Sep 25 20:49:41 phenom kernel: [26640.848788] [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b -- Douglas A. Augusto
I could resolve the problem: I had "iommu=soft" and "nomce" defined as boot options. When I removed these options, it worked fine.
No, false alarm: removing both params did not fix the problem :-( Today I got it again. Every time with dpkg!?
[180480.018397] INFO: task dpkg:20086 blocked for more than 120 seconds.
[180480.018686] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[180480.019102] dpkg D 0000000000000000 0 20086 20068 0x00000000
[180480.019107] ffff880066993d38 0000000000000082 0000000000015bc0 0000000000015bc0
[180480.019112] ffff88023400c890 ffff880066993fd8 0000000000015bc0 ffff88023400c4d0
[180480.019117] 0000000000015bc0 ffff880066993fd8 0000000000015bc0 ffff88023400c890
[180480.019121] Call Trace:
[180480.019130] [<ffffffff81541add>] schedule_timeout+0x22d/0x300
[180480.019135] [<ffffffff8105da22>] ? enqueue_entity+0x122/0x1a0
[180480.019140] [<ffffffff81061e14>] ? check_preempt_wakeup+0x1c4/0x3c0
[180480.019144] [<ffffffff815416fb>] wait_for_common+0xdb/0x180
[180480.019149] [<ffffffff8105a124>] ? try_to_wake_up+0x284/0x380
[180480.019153] [<ffffffff8105a220>] ? default_wake_function+0x0/0x20
[180480.019157] [<ffffffff8154185d>] wait_for_completion+0x1d/0x20
[180480.019161] [<ffffffff81165e07>] sync_inodes_sb+0x87/0xb0
[180480.019166] [<ffffffff8116a692>] __sync_filesystem+0x82/0x90
[180480.019169] [<ffffffff8116a779>] sync_filesystems+0xd9/0x130
[180480.019173] [<ffffffff8116a831>] sys_sync+0x21/0x40
[180480.019178] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
The current kernel is 2.6.32-25-generic.
Is there a filesystem mount option or modprobe parameter we could try?
I have seen this problem on Ubuntu Lucid 2.6.32-25-generic, Ubuntu Maverick 2.6.35-22-generic, Ubuntu Maverick development 2.6.36-020636-generic, Ubuntu Lucid 2.6.31-10-rt, and Fedora 2.6.34.7-56.fc13.x86_64. I am pretty sure it happens on other kernels too, but not all have CONFIG_DETECT_HUNG_TASK=y. Yes, I was running dpkg on Fedora via a chroot. I suspect that the problem relates to dpkg's use of fsync. I have easy steps to reproduce. 1. watch -n1 'dd if=/dev/sda2 of=bigfile bs=1024000 count=30720' You might have to change the device file for your system. /dev/zero will probably work. 2. watch -n1 'apt-get -y remove iotop ; apt-get -y install iotop' Fedora 2.6.34.7-56.fc13.x86_64: Oct 25 18:17:33 higgs kernel: INFO: task dpkg:3722 blocked for more than 120 seconds. Oct 25 18:17:33 higgs kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 25 18:17:33 higgs kernel: dpkg D 0000000000000003 0 3722 3711 0x00000080 Oct 25 18:17:33 higgs kernel: ffff880194947d88 0000000000000082 ffff880194947dc8 ffffffff8103e383 Oct 25 18:17:33 higgs kernel: ffff880194947fd8 ffff88034c9b2ee0 00000000000153c0 ffff880194947fd8 Oct 25 18:17:33 higgs kernel: 00000000000153c0 00000000000153c0 00000000000153c0 00000000000153c0 Oct 25 18:17:33 higgs kernel: Call Trace: Oct 25 18:17:33 higgs kernel: [<ffffffff8103e383>] ? select_task_rq_fair+0x514/0x89a Oct 25 18:17:33 higgs kernel: [<ffffffff811291b9>] ? bdi_sched_wait+0x0/0x12 Oct 25 18:17:33 higgs kernel: [<ffffffff811291c7>] bdi_sched_wait+0xe/0x12 Oct 25 18:17:33 higgs kernel: [<ffffffff8144ba60>] __wait_on_bit+0x48/0x7b Oct 25 18:17:33 higgs kernel: [<ffffffff8144bb01>] out_of_line_wait_on_bit+0x6e/0x79 Oct 25 18:17:33 higgs kernel: [<ffffffff811291b9>] ? bdi_sched_wait+0x0/0x12 Oct 25 18:17:33 higgs kernel: [<ffffffff8106618c>] ? wake_bit_function+0x0/0x33 Oct 25 18:17:33 higgs kernel: [<ffffffff81129e38>] wait_on_bit.clone.1+0x1e/0x20 Oct 25 18:17:33 higgs kernel: [<ffffffff81129e9e>] bdi_sync_writeback+0x64/0x6b Oct 25 18:17:33 higgs kernel: [<ffffffff81129ec7>] sync_inodes_sb+0x22/0xec Oct 25 18:17:33 higgs kernel: [<ffffffff8112cf85>] __sync_filesystem+0x4c/0x83 Oct 25 18:17:33 higgs kernel: [<ffffffff8112d064>] sync_filesystems+0xa8/0xf5 Oct 25 18:17:33 higgs kernel: [<ffffffff8112d10b>] sys_sync+0x21/0x33 Oct 25 18:17:33 higgs kernel: [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
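For what it's worth, the same reproduction can be driven from a single shell; a rough sketch using the commands from the two steps above (run as root, same caveat about /dev/sda2):

# background writer, same as step 1 but in a plain loop instead of watch
( while true; do dd if=/dev/sda2 of=bigfile bs=1024000 count=30720; done ) &
# step 2: cycle a small package until dpkg gets stuck
while true; do apt-get -y remove iotop; apt-get -y install iotop; done
# interrupt with Ctrl-C and `kill %1` when finished; the hung-task messages show up in dmesg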
Hi! I have good news on this topic: I installed mainline kernel 2.6.36 (from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.36-maverick/) and now it works like a charm (tested for about a week). I am also trying the 2.6.36 kernel from https://launchpad.net/~kernel-ppa/+archive/ppa/+packages now, so I will see whether the bug is gone there as well.
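In case it helps someone else: the mainline builds in that directory are plain .deb files, so installing one is roughly the following (a sketch; the file names are placeholders, pick the linux-image and linux-headers packages matching your architecture from the directory listing):

# after downloading the .debs for your architecture into the current directory:
dpkg -i linux-image-*.deb linux-headers-*.deb
reboot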
I had no trouble reproducing it on the latest kernel, compiled on Oct 21st, from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.36-maverick/. Linux version 2.6.36-020636-generic (root@zinc) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #201010210905 SMP Thu Oct 21 09:08:58 UTC 2010 I used my simple dd + apt-get test I mentioned before. [ 361.230737] INFO: task jbd2/sda2-8:311 blocked for more than 120 seconds. [ 361.230740] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 361.230746] jbd2/sda2-8 D ffff880001ed4cc0 0 311 2 0x00000000 [ 361.230749] ffff8801b51b3d10 0000000000000046 ffff8801b51b3fd8 0000000000014cc0 [ 361.230752] 0000000000014cc0 ffff8801b51b3fd8 0000000000014cc0 ffff8801b51b3fd8 [ 361.230755] 0000000000014cc0 ffff8801b4ae3158 ffff8801b4ae3160 ffff8801b4ae2dc0 [ 361.230757] Call Trace: [ 361.230764] [<ffffffff8122bde6>] jbd2_journal_commit_transaction+0x1e6/0x11e0 [ 361.230769] [<ffffffff8100989b>] ? __switch_to+0xbb/0x2e0 [ 361.230773] [<ffffffff8105327e>] ? put_prev_entity+0x2e/0x70 [ 361.230776] [<ffffffff81588514>] ? _raw_spin_lock_irqsave+0x34/0x50 [ 361.230780] [<ffffffff81070dfb>] ? lock_timer_base+0x3b/0x70 [ 361.230784] [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40 [ 361.230787] [<ffffffff81230f45>] kjournald2+0xc5/0x240 [ 361.230789] [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40 [ 361.230791] [<ffffffff81230e80>] ? kjournald2+0x0/0x240 [ 361.230793] [<ffffffff81082b27>] kthread+0x97/0xa0 [ 361.230796] [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10 [ 361.230798] [<ffffffff81082a90>] ? kthread+0x0/0xa0 [ 361.230800] [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10 [ 361.230803] INFO: task jbd2/sda3-8:816 blocked for more than 120 seconds. [ 361.230804] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 361.230805] jbd2/sda3-8 D ffff880001e54cc0 0 816 2 0x00000000 [ 361.230808] ffff8801b7d09b00 0000000000000046 ffff8801b7d09fd8 0000000000014cc0 [ 361.230810] 0000000000014cc0 ffff8801b7d09fd8 0000000000014cc0 ffff8801b7d09fd8 [ 361.230813] 0000000000014cc0 ffff8801b785c838 ffff8801b785c840 ffff8801b785c4a0 [ 361.230815] Call Trace: [ 361.230819] [<ffffffff81586543>] io_schedule+0x73/0xc0 [ 361.230823] [<ffffffff81104005>] sync_page+0x45/0x60 [ 361.230824] [<ffffffff81586e1e>] __wait_on_bit+0x5e/0x90 [ 361.230827] [<ffffffff81103fc0>] ? sync_page+0x0/0x60 [ 361.230829] [<ffffffff811041b5>] wait_on_page_bit+0x75/0x80 [ 361.230831] [<ffffffff81083240>] ? wake_bit_function+0x0/0x40 [ 361.230834] [<ffffffff8110f057>] ? pagevec_lookup_tag+0x27/0x40 [ 361.230837] [<ffffffff81104733>] filemap_fdatawait_range+0xf3/0x180 [ 361.230839] [<ffffffff811047eb>] filemap_fdatawait+0x2b/0x30 [ 361.230842] [<ffffffff8122b81e>] journal_finish_inode_data_buffers+0x5e/0x170 [ 361.230845] [<ffffffff8122c1c5>] jbd2_journal_commit_transaction+0x5c5/0x11e0 [ 361.230847] [<ffffffff81588514>] ? _raw_spin_lock_irqsave+0x34/0x50 [ 361.230850] [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40 [ 361.230852] [<ffffffff81230f45>] kjournald2+0xc5/0x240 [ 361.230854] [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40 [ 361.230856] [<ffffffff81230e80>] ? kjournald2+0x0/0x240 [ 361.230858] [<ffffffff81082b27>] kthread+0x97/0xa0 [ 361.230860] [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10 [ 361.230863] [<ffffffff81082a90>] ? kthread+0x0/0xa0 [ 361.230865] [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10 [ 361.230870] INFO: task flush-8:0:1170 blocked for more than 120 seconds. 
[ 361.230871] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 361.230873] flush-8:0 D ffff880001e54cc0 0 1170 2 0x00000000 [ 361.230875] ffff8801af695530 0000000000000046 ffff8801af695fd8 0000000000014cc0 [ 361.230878] 0000000000014cc0 ffff8801af695fd8 0000000000014cc0 ffff8801af695fd8 [ 361.230880] 0000000000014cc0 ffff8801af72b158 ffff8801af72b160 ffff8801af72adc0 [ 361.230883] Call Trace: [ 361.230885] [<ffffffff81586543>] io_schedule+0x73/0xc0 [ 361.230889] [<ffffffff812a7f70>] get_request_wait+0xd0/0x170 [ 361.230891] [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40 [ 361.230894] [<ffffffff812a2029>] ? elv_merge+0x59/0x110 [ 361.230896] [<ffffffff812a809c>] __make_request+0x8c/0x4b0 [ 361.230899] [<ffffffff812a674e>] generic_make_request+0x20e/0x430 [ 361.230902] [<ffffffff811064ad>] ? mempool_alloc+0x6d/0x140 [ 361.230904] [<ffffffff812a69ed>] submit_bio+0x7d/0x100 [ 361.230908] [<ffffffff811801a3>] submit_bh+0xf3/0x140 [ 361.230910] [<ffffffff81181f83>] __block_write_full_page+0x1d3/0x350 [ 361.230913] [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170 [ 361.230916] [<ffffffff811f2910>] ? noalloc_get_block_write+0x0/0x30 [ 361.230918] [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170 [ 361.230921] [<ffffffff81182426>] block_write_full_page_endio+0x116/0x120 [ 361.230923] [<ffffffff811f2910>] ? noalloc_get_block_write+0x0/0x30 [ 361.230925] [<ffffffff81182445>] block_write_full_page+0x15/0x20 [ 361.230927] [<ffffffff811f2f9b>] ext4_writepage+0x10b/0x230 [ 361.230930] [<ffffffff811ef1ea>] mpage_da_submit_io+0x14a/0x1b0 [ 361.230932] [<ffffffff811f6218>] mpage_add_bh_to_extent+0xf8/0x100 [ 361.230934] [<ffffffff811f633e>] __mpage_da_writepage+0x11e/0x180 [ 361.230936] [<ffffffff811f6524>] write_cache_pages_da+0x184/0x2a0 [ 361.230939] [<ffffffff811f68cf>] ext4_da_writepages+0x28f/0x580 [ 361.230942] [<ffffffff810830c7>] ? bit_waitqueue+0x17/0xd0 [ 361.230944] [<ffffffff8110e051>] do_writepages+0x21/0x40 [ 361.230946] [<ffffffff81179a5f>] writeback_single_inode+0xaf/0x260 [ 361.230948] [<ffffffff81179e44>] writeback_sb_inodes+0xe4/0x160 [ 361.230951] [<ffffffff8117a032>] writeback_inodes_wb+0xd2/0xf0 [ 361.230953] [<ffffffff8117a24f>] wb_writeback+0x1ff/0x310 [ 361.230955] [<ffffffff8117a411>] wb_check_old_data_flush+0xb1/0xc0 [ 361.230957] [<ffffffff8117a4a9>] wb_do_writeback+0x89/0xd0 [ 361.230959] [<ffffffff8117a4f0>] ? bdi_writeback_thread+0x0/0x260 [ 361.230961] [<ffffffff8117a5a3>] bdi_writeback_thread+0xb3/0x260 [ 361.230963] [<ffffffff8117a4f0>] ? bdi_writeback_thread+0x0/0x260 [ 361.230965] [<ffffffff81082b27>] kthread+0x97/0xa0 [ 361.230968] [<ffffffff8100bee4>] kernel_thread_helper+0x4/0x10 [ 361.230970] [<ffffffff81082a90>] ? kthread+0x0/0xa0 [ 361.230972] [<ffffffff8100bee0>] ? kernel_thread_helper+0x0/0x10 [ 361.230976] INFO: task dd:1756 blocked for more than 120 seconds. [ 361.230977] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 361.230978] dd D ffff880001ed4cc0 0 1756 1755 0x00000000 [ 361.230981] ffff8801192ef848 0000000000000082 ffff8801192effd8 0000000000014cc0 [ 361.230983] 0000000000014cc0 ffff8801192effd8 0000000000014cc0 ffff8801192effd8 [ 361.230986] 0000000000014cc0 ffff8801aa580398 ffff8801aa5803a0 ffff8801aa580000 [ 361.230988] Call Trace: [ 361.230991] [<ffffffff81586543>] io_schedule+0x73/0xc0 [ 361.230993] [<ffffffff812a7f70>] get_request_wait+0xd0/0x170 [ 361.230996] [<ffffffff812b7056>] ? cfq_find_rq_fmerge+0x66/0x70 [ 361.230999] [<ffffffff81083200>] ? 
autoremove_wake_function+0x0/0x40 [ 361.231001] [<ffffffff812a20cb>] ? elv_merge+0xfb/0x110 [ 361.231003] [<ffffffff812a809c>] __make_request+0x8c/0x4b0 [ 361.231006] [<ffffffff8105b564>] ? find_busiest_group+0x544/0xa70 [ 361.231008] [<ffffffff812a674e>] generic_make_request+0x20e/0x430 [ 361.231010] [<ffffffff811064ad>] ? mempool_alloc+0x6d/0x140 [ 361.231013] [<ffffffff811088ea>] ? zone_watermark_ok+0x2a/0xf0 [ 361.231016] [<ffffffff81147946>] ? kmem_cache_alloc+0x96/0x120 [ 361.231018] [<ffffffff812a69ed>] submit_bio+0x7d/0x100 [ 361.231020] [<ffffffff811801a3>] submit_bh+0xf3/0x140 [ 361.231022] [<ffffffff8118422f>] block_read_full_page+0x29f/0x300 [ 361.231025] [<ffffffff811878f0>] ? blkdev_get_block+0x0/0x70 [ 361.231028] [<ffffffff8111ce15>] ? __inc_zone_page_state+0x35/0x40 [ 361.231030] [<ffffffff81104349>] ? add_to_page_cache_locked+0xe9/0x150 [ 361.231033] [<ffffffff81187498>] blkdev_readpage+0x18/0x20 [ 361.231035] [<ffffffff8110e56c>] read_pages+0xcc/0xe0 [ 361.231037] [<ffffffff8110e6c8>] __do_page_cache_readahead+0x148/0x160 [ 361.231039] [<ffffffff8110e701>] ra_submit+0x21/0x30 [ 361.231041] [<ffffffff8110ead8>] ondemand_readahead+0xa8/0x1b0 [ 361.231044] [<ffffffff8110ed76>] page_cache_sync_readahead+0x36/0x50 [ 361.231046] [<ffffffff81105436>] do_generic_file_read+0x266/0x430 [ 361.231049] [<ffffffff81104f80>] ? file_read_actor+0x0/0x190 [ 361.231051] [<ffffffff811056eb>] generic_file_aio_read+0xeb/0x220 [ 361.231054] [<ffffffff81156885>] do_sync_read+0xd5/0x120 [ 361.231056] [<ffffffff812c1e5f>] ? rb_erase+0xbf/0x140 [ 361.231060] [<ffffffff812635ba>] ? security_file_permission+0x6a/0x80 [ 361.231062] [<ffffffff81156c09>] vfs_read+0xc9/0x180 [ 361.231064] [<ffffffff81156e45>] sys_read+0x55/0x90 [ 361.231066] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b [ 361.231068] INFO: task apt-get:1811 blocked for more than 120 seconds. [ 361.231069] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 361.231070] apt-get D ffff880001ed4cc0 0 1811 1781 0x00000000 [ 361.231073] ffff88012a7557b8 0000000000000082 ffff88012a755fd8 0000000000014cc0 [ 361.231076] 0000000000014cc0 ffff88012a755fd8 0000000000014cc0 ffff88012a755fd8 [ 361.231078] 0000000000014cc0 ffff8801b4afb158 ffff8801b4afb160 ffff8801b4afadc0 [ 361.231080] Call Trace: [ 361.231083] [<ffffffff81586543>] io_schedule+0x73/0xc0 [ 361.231085] [<ffffffff812a7f70>] get_request_wait+0xd0/0x170 [ 361.231087] [<ffffffff812b7056>] ? cfq_find_rq_fmerge+0x66/0x70 [ 361.231090] [<ffffffff81083200>] ? autoremove_wake_function+0x0/0x40 [ 361.231092] [<ffffffff812a20cb>] ? elv_merge+0xfb/0x110 [ 361.231094] [<ffffffff812a809c>] __make_request+0x8c/0x4b0 [ 361.231097] [<ffffffff812a674e>] generic_make_request+0x20e/0x430 [ 361.231099] [<ffffffff811064ad>] ? mempool_alloc+0x6d/0x140 [ 361.231102] [<ffffffff811513a4>] ? __mem_cgroup_try_charge+0x1f4/0x2b0 [ 361.231104] [<ffffffff812a69ed>] submit_bio+0x7d/0x100 [ 361.231107] [<ffffffff811801a3>] submit_bh+0xf3/0x140 [ 361.231109] [<ffffffff81181f83>] __block_write_full_page+0x1d3/0x350 [ 361.231111] [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170 [ 361.231113] [<ffffffff811f2910>] ? noalloc_get_block_write+0x0/0x30 [ 361.231116] [<ffffffff81181210>] ? end_buffer_async_write+0x0/0x170 [ 361.231118] [<ffffffff81182426>] block_write_full_page_endio+0x116/0x120 [ 361.231120] [<ffffffff811f2910>] ? 
noalloc_get_block_write+0x0/0x30 [ 361.231123] [<ffffffff81182445>] block_write_full_page+0x15/0x20 [ 361.231125] [<ffffffff811f2f9b>] ext4_writepage+0x10b/0x230 [ 361.231127] [<ffffffff811ef1ea>] mpage_da_submit_io+0x14a/0x1b0 [ 361.231129] [<ffffffff811f638e>] __mpage_da_writepage+0x16e/0x180 [ 361.231131] [<ffffffff811f6524>] write_cache_pages_da+0x184/0x2a0 [ 361.231134] [<ffffffff811f68cf>] ext4_da_writepages+0x28f/0x580 [ 361.231136] [<ffffffff81122905>] ? unmap_page_range+0x265/0x2c0 [ 361.231139] [<ffffffff8110e051>] do_writepages+0x21/0x40 [ 361.231141] [<ffffffff8110484e>] __filemap_fdatawrite_range+0x5e/0x70 [ 361.231144] [<ffffffff811048ba>] filemap_write_and_wait_range+0x5a/0x80 [ 361.231146] [<ffffffff8117e583>] vfs_fsync_range+0x63/0xa0 [ 361.231148] [<ffffffff8117e62c>] vfs_fsync+0x1c/0x20 [ 361.231151] [<ffffffff8112c2b7>] sys_msync+0x147/0x1c0 [ 361.231153] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
We have this problem all over our busy Ubuntu 10.04 hosts :(
I'm hitting this too, but only when I'm transferring files via USB. The problem is the sync() function: dpkg hangs when doing a sync(). If I call the sync command in another console, that command hangs too. Using Ubuntu Maverick (10.10): Linux 2.6.35-28-generic #49-Ubuntu SMP Tue Mar 1 14:39:03 UTC 2011 x86_64 GNU/Linux
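A quick way to see that it really is sync() (and not something dpkg-specific) is to time a bare sync from another terminal while the USB transfer is running; a trivial check:

# with the bug present this blocks for minutes rather than seconds
time sync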
A Google search shows a lot of people hitting this bug. (Myself included.) The regression seems to carry all the way back to somewhere around 2.6.18. Presumably it happens mostly on server-grade hardware, and most people on servers are running older versions of Linux that come with their stable distributions, which is why it's being noticed more now.

I can trigger this bug by simply doing:

fio -readonly -name iops -rw=randread -bs=512 -runtime=300 -iodepth 24 -filename /dev/cciss/c0d0 -ioengine libaio -direct=1

(But dpkg installs are known to trigger it too.) This is because almost all of the Hewlett-Packard server management programs will have blocked while fio is running for > 120s, and often even task 'jbd/cciss!c0d0' is one of the tasks that will block too. (I can make dpkg do it too.)

I'm running a bone-stock 2.6.38.5 kernel I've compiled myself, to ensure it's not any kernel patches provided by the distro (Debian squeeze). If there's anything I can do to assist in testing, let me know. This is a server I just purchased to put into service shortly, so there's currently no data on it that I am concerned about losing.
D'oh. I accidentally hit submit too soon, and there were some updates I had made to my post (mostly to fix the horrible 'brain dump' text). Before 'This is because' there was supposed to be 'my screen fills with hung task messages'.
Easy to reproduce with 2.6.38.6, too - seems to be even worse than with 2.6.32 from squeeze.
Same for 2.6.39. I'll attach a sysrq dump of the hung tasks. Interestingly, whenever the machine hangs, cciss_vol_status (the same happens with arrayprobe) is running/hung and seems to be producing the blocked access to the cciss device.
Created attachment 59972 [details] sysrq+t output -> stuck cciss_vol_status processes
Hmm... I haven't hit the problem any more since I updated to kernel 2.6.36. Currently I use 2.6.38-8-server and haven't hit the bug there either...

root@s01:~# LANG=C apt-cache policy linux-image-2.6.38-8-server
linux-image-2.6.38-8-server:
  Installed: 2.6.38-8.42~lucid1
  Candidate: 2.6.38-8.42~lucid1
  Version table:
 *** 2.6.38-8.42~lucid1 0
        500 http://ppa.launchpad.net/kernel-ppa/ppa/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

Is it possible that a config option causes the problem? Can somebody try the same kernel from kernel-ppa?
I saw this bug many times after the upgrade from Debian 5.0 to 6.0 on some AMD64 boxes. I can reproduce it easily:

1) Run a job that does heavy I/O:
   # ssh root@192.168.0.2 "dd if=/dev/sda1" | dd of=/dev/mapper/vg-backup
2) Start many other processes that also do heavy I/O. For example: open 10 shells and start executing random commands (a rough script version of this step is sketched at the end of this comment).
   * Within a few minutes some of these commands will become unresponsive.
   * If you then continue to execute more commands in another shell, you will trigger the bug.

Many people are suffering from this bug; there are a lot of complaints about it on the Internet. Some interesting discussion: http://lkml.org/lkml/2010/2/19/240

I am attaching the output of dmesg with sysrq+t from when this bug triggers on one of my boxes.
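Step 2 ("open 10 shells and run random commands") can be approximated with a small throwaway script; a rough sketch, where the particular commands are arbitrary examples:

#!/bin/sh
# spawn 10 background loops issuing small, mostly synchronous I/O,
# like typing random commands in 10 shells
for i in `seq 10`; do
    ( while true; do
          ls -lR /usr > /dev/null 2>&1
          grep -r conf /etc > /dev/null 2>&1
          man bash > /dev/null 2>&1
      done ) &
done
wait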
Created attachment 61222 [details] Output of /proc/modules and dmesg with sysrq+t
(update by email because bugzilla is broken)

On Wed, 8 Jun 2011 21:38:43 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=18632

I always get a bit worried when I see dm-crypt involved - it doesn't seem terribly well-behaved or reliable.

That being said, some of the other reports here indicate that we are hitting a sync() livelock, sometimes fsync(). A pile of patches from Wu Fengguang which could/should address these issues is about to hit the linux-next tree.
In this case the dm-crypt mount was not being used during the test, so you can ignore it. The "block stack" of the mountpoint where the I/O was happening (/) is the following: (RAID1 over two SATA disks) -> LVM -> XFS

I find this comment from http://lkml.org/lkml/2010/2/24/2 very interesting:

"""
I guess the different layers doesn't really know that much about each other? I can imagine a few different scenarios where one only wants to do most of the scheduling on the lvm layer, and then wants to keep the queueing to a minimum on the other layers and keep the queue as small as possible there, so it can do the proper re-ordering. Anyone has any thoughts to share on this? I don't have much experience with this when it comes to block devices, I'm a network engineer and I'm trying to use my experience in QoS/packet schedulers in different layers, where for instance when one runs an IP QoS scheduler, one doesn't want a lot of buffering on the underlying ATM layer, because it makes the IP schedulers job much harder.
"""

Wild question... could this problem be triggered by some queue or lock between the block layers (LVM <-> RAID <-> [sda|sdb])? In the first job I write directly to the LVM->RAID1 queue with dd, and in the other jobs I read/write random stuff from the root partition, which is XFS->LVM->RAID1.
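As a side note on the layering question: each layer exposes its own request-queue settings in sysfs, so at least the configuration of the layers can be compared; a sketch (the device names are examples for this kind of stack, and MD/DM devices normally report no elevator of their own):

# scheduler and queue depth at each layer: physical disks, MD RAID, LVM (device-mapper)
for dev in sda sdb md0 dm-0; do
    echo "== $dev =="
    cat /sys/block/$dev/queue/scheduler      # active elevator is shown in [brackets]
    cat /sys/block/$dev/queue/nr_requests    # queue depth at this layer
done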
On Thu, Jun 09, 2011 at 11:32:17AM +0800, Wu Fengguang wrote: > On Thu, Jun 09, 2011 at 06:02:41AM +0800, Andrew Morton wrote: > > (udate by email because bugzilla is broken) > > > > On Wed, 8 Jun 2011 21:38:43 GMT > > bugzilla-daemon@bugzilla.kernel.org wrote: > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=18632 > > > > I always get a bit worried when I see dm-crypt involved - it doesn't > > seem terribly well-behaved or reliable. > > > > That being said, some of the other reports here indicate that we are > > hitting a sync() livelock, sometimes fsync(). > > These comments all mention "blocked sync(1) on concurrent heavy writes". > > Comment #6 From Douglas Augusto > Comment #12 From Nathan Grennan > Comment #16 From justincase@yopmail.com > Comment #23 From Carlos Alberto Lopez Perez Oh it's not sync(1) in Carlos' case, but random commands like man/dd as well as task xfssyncd:451 blocked for more than 120 seconds. It looks more like XFS related livelock, because I ran into similar problem in a very simple XFS setup on some plain disk partition. Thanks, Fengguang > Comment #17 From Justin Wheeler is special in that fio is doing > read-only IO. Is it updating the mtime and the mtime writes are in > turn starved by the higher-priority continuous reads? (or at least the > write speed cannot keep up with the new mtime dirties) > > > A pile of patches from > > Wu Fengguang which could/should address these issues is about to hit > > the linux-next tree. > > Some livelock fixes will show up in tomorrow's linux-next tree. As for > now the fixes are available here: > > git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git fs-writeback > > Thanks, > Fengguang
On Thu, Jun 09, 2011 at 06:02:41AM +0800, Andrew Morton wrote: > (udate by email because bugzilla is broken) > > On Wed, 8 Jun 2011 21:38:43 GMT > bugzilla-daemon@bugzilla.kernel.org wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=18632 > > I always get a bit worried when I see dm-crypt involved - it doesn't > seem terribly well-behaved or reliable. > > That being said, some of the other reports here indicate that we are > hitting a sync() livelock, sometimes fsync(). These comments all mention "blocked sync(1) on concurrent heavy writes". Comment #6 From Douglas Augusto Comment #12 From Nathan Grennan Comment #16 From justincase@yopmail.com Comment #23 From Carlos Alberto Lopez Perez Comment #17 From Justin Wheeler is special in that fio is doing read-only IO. Is it updating the mtime and the mtime writes are in turn starved by the higher-priority continuous reads? (or at least the write speed cannot keep up with the new mtime dirties) > A pile of patches from > Wu Fengguang which could/should address these issues is about to hit > the linux-next tree. Some livelock fixes will show up in tomorrow's linux-next tree. As for now the fixes are available here: git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback.git fs-writeback Thanks, Fengguang
Reply-To: hch@infradead.org On Thu, Jun 09, 2011 at 11:54:26AM +0800, Wu Fengguang wrote: > Oh it's not sync(1) in Carlos' case, but random commands like man/dd > as well as > > task xfssyncd:451 blocked for more than 120 seconds. > > It looks more like XFS related livelock, because I ran into similar > problem in a very simple XFS setup on some plain disk partition. Care to explain what issues you saw, and with what setup and kernel version? Also usually the task blocked more than 120 seconds display should come with a stacktrace, how does it look like?
On Thu, Jun 09, 2011 at 04:27:18PM +0800, Christoph Hellwig wrote: > On Thu, Jun 09, 2011 at 11:54:26AM +0800, Wu Fengguang wrote: > > Oh it's not sync(1) in Carlos' case, but random commands like man/dd > > as well as > > > > task xfssyncd:451 blocked for more than 120 seconds. > > > > It looks more like XFS related livelock, because I ran into similar > > problem in a very simple XFS setup on some plain disk partition. > > Care to explain what issues you saw, and with what setup and kernel > version? Also usually the task blocked more than 120 seconds display > should come with a stacktrace, how does it look like? I have a sync livelock test script and it sometimes livelocked on XFS even with the livelock fix patches. Ext4 is always OK. [ 3581.181253] sync D ffff8800b6ca15d8 4560 4403 4392 0x00000000 [ 3581.181734] ffff88006f775bc8 0000000000000046 ffff8800b6ca12b8 00000001b6ca1938 [ 3581.182411] ffff88006f774000 00000000001d2e40 00000000001d2e40 ffff8800b6ca1280 [ 3581.183088] 00000000001d2e40 ffff88006f775fd8 00000340af111ef2 00000000001d2e40 [ 3581.183765] Call Trace: [ 3581.184008] [<ffffffff8109be73>] ? lock_release_holdtime+0xa3/0xab [ 3581.184392] [<ffffffff8108cc0d>] ? prepare_to_wait+0x6c/0x79 [ 3581.184756] [<ffffffff8108cc0d>] ? prepare_to_wait+0x6c/0x79 [ 3581.185120] [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f [ 3581.185474] [<ffffffff8108c97a>] ? wake_up_bit+0x2a/0x2a [ 3581.185827] [<ffffffff812f742a>] xfs_sync_inode_data+0x92/0x9d [ 3581.186198] [<ffffffff812f76e2>] xfs_inode_ag_walk+0x1a5/0x287 [ 3581.186569] [<ffffffff812f779b>] ? xfs_inode_ag_walk+0x25e/0x287 [ 3581.186946] [<ffffffff812f7398>] ? xfs_sync_worker+0x69/0x69 [ 3581.187311] [<ffffffff812e2354>] ? xfs_perag_get+0x68/0xd0 [ 3581.187669] [<ffffffff81092175>] ? local_clock+0x41/0x5a [ 3581.188020] [<ffffffff8109be73>] ? lock_release_holdtime+0xa3/0xab [ 3581.188403] [<ffffffff812e22ec>] ? xfs_check_sizes+0x160/0x160 [ 3581.188773] [<ffffffff812e2354>] ? xfs_perag_get+0x68/0xd0 [ 3581.189130] [<ffffffff812e236c>] ? xfs_perag_get+0x80/0xd0 [ 3581.189488] [<ffffffff812e22ec>] ? xfs_check_sizes+0x160/0x160 [ 3581.189858] [<ffffffff812f7831>] ? xfs_inode_ag_iterator+0x6d/0x8f [ 3581.190241] [<ffffffff812f7398>] ? xfs_sync_worker+0x69/0x69 [ 3581.190606] [<ffffffff812f780b>] xfs_inode_ag_iterator+0x47/0x8f [ 3581.190982] [<ffffffff811611f5>] ? __sync_filesystem+0x7a/0x7a [ 3581.191352] [<ffffffff812f7877>] xfs_sync_data+0x24/0x43 [ 3581.191703] [<ffffffff812f7911>] xfs_quiesce_data+0x2c/0x88 [ 3581.192065] [<ffffffff812f5556>] xfs_fs_sync_fs+0x21/0x48 [ 3581.192419] [<ffffffff811611e1>] __sync_filesystem+0x66/0x7a [ 3581.192783] [<ffffffff8116120b>] sync_one_sb+0x16/0x18 [ 3581.193128] [<ffffffff8113e3e3>] iterate_supers+0x72/0xce [ 3581.193482] [<ffffffff81161140>] sync_filesystems+0x20/0x22 [ 3581.193842] [<ffffffff8116127e>] sys_sync+0x21/0x33 [ 3581.194177] [<ffffffff819016c2>] system_call_fastpath+0x16/0x1b I just reconfirmed the problem on 3.0-rc2 with/without the livelock fix patches (writeback fixes and cleanups v5 at http://lkml.org/lkml/2011/6/7/569), and find that situation has improved for XFS. It still has much longer sync time than ext4, however won't stuck until dd exits. 
root@fat /home/wfg# ./sync-livelock.sh

3.0-rc2, xfs:

sync time: 20
sync time: 26
sync time: 27

3.0-rc2, ext4:

sync time: 4
sync time: 4
sync time: 3

3.0-rc2 with livelock fix patches, xfs:

sync time: 18
sync time: 21
sync time: 14
sync time: 20
sync time: 21

Thanks,
Fengguang

---
$ cat ./sync-livelock.sh
#!/bin/sh

umount /dev/sda7
mkfs.xfs -f /dev/sda7
# mkfs.ext4 /dev/sda7
# mkfs.btrfs /dev/sda7
mount /dev/sda7 /fs

echo $((50<<20)) > /proc/sys/vm/dirty_bytes

pid=
for i in `seq 10`
do
	dd if=/dev/zero of=/fs/zero-$i bs=1M count=1000 &
	pid="$pid $!"
done

sleep 1

tic=$(date +'%s')
sync
tac=$(date +'%s')

echo
echo sync time: $((tac-tic))
egrep '(Dirty|Writeback|NFS_Unstable)' /proc/meminfo

pidof dd > /dev/null && { kill -9 $pid; echo sync NOT livelocked; }
Reply-To: hch@infradead.org On Thu, Jun 09, 2011 at 05:09:06PM +0800, Wu Fengguang wrote: > I have a sync livelock test script and it sometimes livelocked on XFS > even with the livelock fix patches. Ext4 is always OK. This sounds similar to the cfq issue just posted to lkml as "CFQ: async queue blocks the whole system". Does this happen with non-CFQ I/O schedulers, too? > [ 3581.185120] [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f This waits for the I/O completion to actually arrive - something that XFS does correctly in both sync and fsync, but ext4 only does for fsync. It might have some issues in the way it's implemented, I'll look if we can do something. But I suspect cfq delaying async writes too much is definitively going to cause issues for us here.
> > It might have some issues in the way it's implemented, I'll look if
> > we can do something. But I suspect cfq delaying async writes too much
> > is definitively going to cause issues for us here.
>
> It's definitely a problem that cfq delays async writes too much.
> However in Carlos's report,
>
> https://bugzilla.kernel.org/attachment.cgi?id=61222
>
> there are no sync(1) or fsync running at all. So it may be indicating
> a different problem.

Sorry, your assumption is still valid: there are several bash processes doing xfs_read()s, which will generate the required sync IO to livelock async writes.

Thanks,
Fengguang
Reply-To: hch@infradead.org On Thu, Jun 09, 2011 at 08:11:17PM +0800, Wu Fengguang wrote: > > This waits for the I/O completion to actually arrive - something that > > XFS does correctly in both sync and fsync, but ext4 only does for fsync. > > Will it benefit to flush the disk _once_ at the end of sync? > (perhaps it's not be as easy in complex storage setups or whatever) XFS currently does it's own passes over data and metadata in sync. To some extent they are overkill, but things like the iowait that's prominent in your trace actually is needed to catch all in-flight direct I/O. > It's definitely a problem that cfq delays async writes too much. > However in Carlos's report, > > https://bugzilla.kernel.org/attachment.cgi?id=61222 > > there are no sync(1) or fsync running at all. So it may be indicating > a different problem. If you have a heavy read load we still need to write back timestamps, and it kinda sounds like he's hitting that. No guarantee that actuall is the cause, though.
> > It's definitely a problem that cfq delays async writes too much.
> > However in Carlos's report,
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=61222
> >
> > there are no sync(1) or fsync running at all. So it may be indicating
> > a different problem.
>
> If you have a heavy read load we still need to write back timestamps,
> and it kinda sounds like he's hitting that. No guarantee that actuall
> is the cause, though.

Right, in the other email I do find the reads.

So Carlos, it's worthwhile to try the deadline scheduler, which will at least improve read/write workloads (likely the one you are running) that don't involve sync/fsync.

Thanks,
Fengguang
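For reference, switching the elevator is a runtime sysfs write (sda as an example; adding elevator=deadline to the kernel command line makes it the default at boot):

cat /sys/block/sda/queue/scheduler            # the active scheduler is shown in [brackets]
echo deadline > /sys/block/sda/queue/scheduler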
Reply-To: hch@infradead.org

Here is a patch that removes the XFS-internal writing of data in sync and instead trusts the VFS code to get it right, and also does the ioend_wait under a lock, to protect against new pending I/Os showing up.

It passed a single xfsqa run so far, so at least it won't blow up completely.

Can you check if that helps with your livelock test case? (And yes, I should go and try it myself. Will do as soon as I get a bit more time.)

Index: xfs/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_super.c	2011-06-09 14:17:06.952532698 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_super.c	2011-06-09 14:17:36.742531190 +0200
@@ -991,7 +991,7 @@ xfs_fs_write_inode(
 	 * ->sync_fs call do that for thus, which reduces the number
 	 * of synchronous log foces dramatically.
 	 */
-	xfs_ioend_wait(ip);
+//	xfs_ioend_wait(ip);
 	xfs_ilock(ip, XFS_ILOCK_SHARED);
 	if (ip->i_update_core) {
 		error = xfs_log_inode(ip);
Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2011-06-09 14:16:59.575866405 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2011-06-09 14:22:31.675849591 +0200
@@ -215,6 +215,19 @@ xfs_inode_ag_iterator(
 }
 
 STATIC int
+xfs_wait_ioend_cb(
+	struct xfs_inode	*ip,
+	struct xfs_perag	*pag,
+	int			flags)
+{
+	xfs_ilock(ip, XFS_IOLOCK_SHARED);
+	xfs_ioend_wait(ip);
+	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+
+	return 0;
+}
+
+STATIC int
 xfs_sync_inode_data(
 	struct xfs_inode	*ip,
 	struct xfs_perag	*pag,
@@ -359,14 +372,14 @@ xfs_quiesce_data(
 {
 	int			error, error2 = 0;
 
-	/* push non-blocking */
-	xfs_sync_data(mp, 0);
-	xfs_qm_sync(mp, SYNC_TRYLOCK);
-
-	/* push and block till complete */
-	xfs_sync_data(mp, SYNC_WAIT);
 	xfs_qm_sync(mp, SYNC_WAIT);
 
+	error = xfs_inode_ag_iterator(mp, xfs_wait_ioend_cb, 0);
+	if (error)
+		return XFS_ERROR(error);
+
+	xfs_log_force(mp, XFS_LOG_SYNC);
+
 	/* write superblock and hoover up shutdown errors */
 	error = xfs_sync_fsdata(mp);
On Thu, Jun 09, 2011 at 07:02:14PM +0800, Christoph Hellwig wrote: > On Thu, Jun 09, 2011 at 05:09:06PM +0800, Wu Fengguang wrote: > > I have a sync livelock test script and it sometimes livelocked on XFS > > even with the livelock fix patches. Ext4 is always OK. > > This sounds similar to the cfq issue just posted to lkml as > "CFQ: async queue blocks the whole system". I once ran two dd's doing sequential reads and writes in parallel, and find the write dd to be completely blocked (note: I can no longer reproduce it today, on 3.0-rc2). At the time I thought: "Wow, this is good for typical desktop". But yes, it is livelocking async IOs, which is bad. > Does this happen with non-CFQ I/O schedulers, too? Just tried the deadline scheduler, sync times are still long: echo deadline > /sys/block/sda/queue/scheduler sync time: 21 sync time: 22 sync time: 29 Also tried disabling the cfq low latency feature, echo cfq > /sys/block/sda/queue/scheduler echo 0 > /sys/block/sda/queue/iosched/low_latency However the low_latency value seem to have NO much effects in the sync time (and also don't considerably improve async dd write bandwidth at the presence of another parallel dd read). sync time: 19 sync time: 24 sync time: 22 > > [ 3581.185120] [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f > > This waits for the I/O completion to actually arrive - something that > XFS does correctly in both sync and fsync, but ext4 only does for fsync. Will it benefit to flush the disk _once_ at the end of sync? (perhaps it's not be as easy in complex storage setups or whatever) > It might have some issues in the way it's implemented, I'll look if > we can do something. But I suspect cfq delaying async writes too much > is definitively going to cause issues for us here. It's definitely a problem that cfq delays async writes too much. However in Carlos's report, https://bugzilla.kernel.org/attachment.cgi?id=61222 there are no sync(1) or fsync running at all. So it may be indicating a different problem. Thanks, Fengguang
Reply-To: hch@infradead.org On Thu, Jun 09, 2011 at 10:12:38PM +0800, Tao Ma wrote: > Just want to say more about the situation here. Actually the flusher is > too much easier to be blocked by the sync requests. And whenever it is > blocked, it takes a quite long time to get back(because several cfq > designs), so do you think we can use WRITE_SYNC for the bdev inodes in > flusher? AFAICS, in most of the cases when a volume is mounted, the > writeback for a bdev inode means the metadata writeback. And they are > very important for a file system and should be written as far as > possible. I ran my test cases with the change, and now the livelock > doesn't show up anymore. It's not a very good guestimate for metadata. A lot of metadata is either stored in directories (often happens for directories) or doesn't use the pagecache writeback functions at all. The major problem here seems to be that async requests simply starve sync requests far too much.
On 06/09/2011 08:43 PM, Wu Fengguang wrote: >>> It's definitely a problem that cfq delays async writes too much. >>> However in Carlos's report, >>> >>> https://bugzilla.kernel.org/attachment.cgi?id=61222 >>> >>> there are no sync(1) or fsync running at all. So it may be indicating >>> a different problem. >> >> If you have a heavy read load we still need to write back timestamps, >> and it kinda sounds like he's hitting that. No guarantee that actuall >> is the cause, though. > > Right, in the other email I do find the reads. > > So Carlos, it's worthwhile to try the deadline scheduler, which will > at least improve read/write workloads (likely the one you are running) > that don't involve sync/fsync. yes, deadline scheduler surely should help in this cases. Regards, Tao
Hi Fengguang and Christoph, On 06/09/2011 08:11 PM, Wu Fengguang wrote: > On Thu, Jun 09, 2011 at 07:02:14PM +0800, Christoph Hellwig wrote: >> On Thu, Jun 09, 2011 at 05:09:06PM +0800, Wu Fengguang wrote: >>> I have a sync livelock test script and it sometimes livelocked on XFS >>> even with the livelock fix patches. Ext4 is always OK. >> >> This sounds similar to the cfq issue just posted to lkml as >> "CFQ: async queue blocks the whole system". Just want to say more about the situation here. Actually the flusher is too much easier to be blocked by the sync requests. And whenever it is blocked, it takes a quite long time to get back(because several cfq designs), so do you think we can use WRITE_SYNC for the bdev inodes in flusher? AFAICS, in most of the cases when a volume is mounted, the writeback for a bdev inode means the metadata writeback. And they are very important for a file system and should be written as far as possible. I ran my test cases with the change, and now the livelock doesn't show up anymore. Regards, Tao > > I once ran two dd's doing sequential reads and writes in parallel, and > find the write dd to be completely blocked (note: I can no longer > reproduce it today, on 3.0-rc2). At the time I thought: "Wow, this is > good for typical desktop". But yes, it is livelocking async IOs, which > is bad. > >> Does this happen with non-CFQ I/O schedulers, too? > > Just tried the deadline scheduler, sync times are still long: > > echo deadline > /sys/block/sda/queue/scheduler > > sync time: 21 > sync time: 22 > sync time: 29 > > Also tried disabling the cfq low latency feature, > > echo cfq > /sys/block/sda/queue/scheduler > echo 0 > /sys/block/sda/queue/iosched/low_latency > > However the low_latency value seem to have NO much effects in the > sync time (and also don't considerably improve async dd write > bandwidth at the presence of another parallel dd read). > > sync time: 19 > sync time: 24 > sync time: 22 > >>> [ 3581.185120] [<ffffffff812ed520>] xfs_ioend_wait+0x87/0x9f >> >> This waits for the I/O completion to actually arrive - something that >> XFS does correctly in both sync and fsync, but ext4 only does for fsync. > > Will it benefit to flush the disk _once_ at the end of sync? > (perhaps it's not be as easy in complex storage setups or whatever) > >> It might have some issues in the way it's implemented, I'll look if >> we can do something. But I suspect cfq delaying async writes too much >> is definitively going to cause issues for us here. > > It's definitely a problem that cfq delays async writes too much. > However in Carlos's report, > > https://bugzilla.kernel.org/attachment.cgi?id=61222 > > there are no sync(1) or fsync running at all. So it may be indicating > a different problem. > > Thanks, > Fengguang
Reply-To: vgoyal@redhat.com On Thu, Jun 09, 2011 at 10:21:33AM -0400, Christoph Hellwig wrote: > On Thu, Jun 09, 2011 at 10:12:38PM +0800, Tao Ma wrote: > > Just want to say more about the situation here. Actually the flusher is > > too much easier to be blocked by the sync requests. And whenever it is > > blocked, it takes a quite long time to get back(because several cfq > > designs), so do you think we can use WRITE_SYNC for the bdev inodes in > > flusher? AFAICS, in most of the cases when a volume is mounted, the > > writeback for a bdev inode means the metadata writeback. And they are > > very important for a file system and should be written as far as > > possible. I ran my test cases with the change, and now the livelock > > doesn't show up anymore. > > It's not a very good guestimate for metadata. A lot of metadata is > either stored in directories (often happens for directories) or doesn't > use the pagecache writeback functions at all. > > The major problem here seems to be that async requests simply starve > sync requests far too much. You mean sync requests starve async requests? It is possible that CFQ can starve async requests for long time in presence of sync reqeusts. If that's the case, all the reported issues should go away with deadline scheduler. As I mentioned in other mail, one commit made the dias very heavily loaded in favor of sync requests. commit f8ae6e3eb8251be32c6e913393d9f8d9e0609489 Author: Shaohua Li <shaohua.li@intel.com> Date: Fri Jan 14 08:41:02 2011 +0100 block cfq: make queue preempt work for queues from different workload I will do some quick tests and try to write a small patch where we can keep track of how many times sync and async workloads have been scheduled and make sure we don't starve async req completely. Thanks Vivek
On 06/09/2011 10:32 PM, Vivek Goyal wrote: > On Thu, Jun 09, 2011 at 10:21:33AM -0400, Christoph Hellwig wrote: >> On Thu, Jun 09, 2011 at 10:12:38PM +0800, Tao Ma wrote: >>> Just want to say more about the situation here. Actually the flusher is >>> too much easier to be blocked by the sync requests. And whenever it is >>> blocked, it takes a quite long time to get back(because several cfq >>> designs), so do you think we can use WRITE_SYNC for the bdev inodes in >>> flusher? AFAICS, in most of the cases when a volume is mounted, the >>> writeback for a bdev inode means the metadata writeback. And they are >>> very important for a file system and should be written as far as >>> possible. I ran my test cases with the change, and now the livelock >>> doesn't show up anymore. >> >> It's not a very good guestimate for metadata. A lot of metadata is >> either stored in directories (often happens for directories) or doesn't >> use the pagecache writeback functions at all. >> >> The major problem here seems to be that async requests simply starve >> sync requests far too much. > > You mean sync requests starve async requests? > > It is possible that CFQ can starve async requests for long time in > presence of sync reqeusts. If that's the case, all the reported issues > should go away with deadline scheduler. > > As I mentioned in other mail, one commit made the dias very heavily > loaded in favor of sync requests. > > commit f8ae6e3eb8251be32c6e913393d9f8d9e0609489 > Author: Shaohua Li <shaohua.li@intel.com> > Date: Fri Jan 14 08:41:02 2011 +0100 > > block cfq: make queue preempt work for queues from different workload > > > I will do some quick tests and try to write a small patch where we can > keep track of how many times sync and async workloads have been scheduled > and make sure we don't starve async req completely. oh, so you mean the patch is the culprit? I will try to revert it to see whether the system works better. Regards, Tao
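A minimal way to test that, assuming you build your own kernel, is to revert just that commit; a sketch:

# in the source tree of the kernel you are running
git revert f8ae6e3eb8251be32c6e913393d9f8d9e0609489
make -j8 && make modules_install install
# reboot into the rebuilt kernel and re-run the reproduction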
On Thu, Jun 09, 2011 at 09:23:57PM +0800, Christoph Hellwig wrote:
> Here is a patch that removes the XFS-internal writing of data in sync
> and instead trusts the VFS code to get it right, and also does the
> ioend_wait under a lock, to protect against new pending I/Os showing up.
>
> It passed a single xfsqa run so far, so at least it won't blow up
> completely.
>
> Can you check if that helps with your livelock test case? (And yes, I
> should go and try it myself. Will do as soon as I'll get a bit more
> time).

My pleasure. It's easier to compare results on the same test bed, and
I've made it very convenient to test patches :)

The patch does improve the sync time.

    before patch          after patch
    --------------------------------------------------
    sync time: 18         sync time: 18
    sync time: 21         sync time: 15
    sync time: 14         sync time: 14
    sync time: 20         sync time: 16
    sync time: 21         sync time: 14

    avg    18.800         avg    15.400
    stddev  2.638         stddev  1.497

I also did extra test runs and collected sync(1) call traces by doing
"echo t > /proc/sysrq-trigger" at random times. The ones in XFS are:

[ 78.395664] sync D 0000000000000002 5296 3024 3003 0x00000000
[ 78.396162] ffff8800b4229de8 0000000000000046 ffffffff819252fa 0000000000000000
[ 78.396850] ffff8800b4228000 00000000001d1a40 00000000001d1a40 ffff8800aefa22f0
[ 78.397519] 00000000001d1a40 ffff8800b4229fd8 00000000001d1a40 ffff8800b4229fd8
[ 78.398189] Call Trace:
[ 78.398431] [<ffffffff819252fa>] ? schedule+0x374/0x992
[ 78.398778] [<ffffffff812e906f>] ? _xfs_log_force+0x1de/0x226
[ 78.399143] [<ffffffff81092be5>] ? local_clock+0x41/0x5a
[ 78.399493] [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[ 78.399875] [<ffffffff812e906f>] ? _xfs_log_force+0x1de/0x226
[ 78.400239] [<ffffffff812e9074>] _xfs_log_force+0x1e3/0x226
[ 78.400599] [<ffffffff81067c10>] ? try_to_wake_up+0x28b/0x28b
[ 78.400964] [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[ 78.401347] [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[ 78.401715] [<ffffffff812e90cf>] xfs_log_force+0x18/0x3a
[ 78.402064] [<ffffffff813053e8>] xfs_quiesce_data+0x3a/0x96
[ 78.402424] [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[ 78.402777] [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[ 78.403138] [<ffffffff81164787>] sync_one_sb+0x16/0x18
[ 78.403482] [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[ 78.403835] [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[ 78.404198] [<ffffffff811647fa>] sys_sync+0x21/0x33
[ 78.404533] [<ffffffff8192f742>] system_call_fastpath+0x16/0x1b

[ 221.106751] sync D 0000000000000007 5296 3136 3116 0x00000000
[ 221.107229] ffff8800aee7bb40 0000000000000046 ffffffff819252fa 0000000000000000
[ 221.107903] ffff8800aee7a000 00000000001d1a40 00000000001d1a40 ffff88009c6a45e0
[ 221.108574] 00000000001d1a40 ffff8800aee7bfd8 00000000001d1a40 ffff8800aee7bfd8
[ 221.109246] Call Trace:
[ 221.109486] [<ffffffff819252fa>] ? schedule+0x374/0x992
[ 221.109831] [<ffffffff81928012>] ? rwsem_down_failed_common+0xc7/0x103
[ 221.110225] [<ffffffff81092be5>] ? local_clock+0x41/0x5a
[ 221.110574] [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[ 221.110956] [<ffffffff81928012>] ? rwsem_down_failed_common+0xc7/0x103
[ 221.111352] [<ffffffff8192801e>] rwsem_down_failed_common+0xd3/0x103
[ 221.111738] [<ffffffff81928075>] rwsem_down_read_failed+0x12/0x14
[ 221.112116] [<ffffffff813c81a4>] call_rwsem_down_read_failed+0x14/0x30
[ 221.112513] [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[ 221.112855] [<ffffffff812de62b>] ? xfs_ilock+0x30/0xba
[ 221.113199] [<ffffffff8109153b>] ? down_read_nested+0x74/0x89
[ 221.113563] [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[ 221.113904] [<ffffffff812de648>] xfs_ilock+0x4d/0xba
[ 221.114243] [<ffffffff81304ee0>] xfs_wait_ioend_cb+0x1b/0x36
[ 221.114604] [<ffffffff813051a9>] xfs_inode_ag_walk+0x1a5/0x288
[ 221.114973] [<ffffffff81305263>] ? xfs_inode_ag_walk+0x25f/0x288
[ 221.115348] [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[ 221.115722] [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[ 221.116103] [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[ 221.116473] [<ffffffff812efbfc>] ? xfs_perag_get+0x68/0xd1
[ 221.116828] [<ffffffff812efc14>] ? xfs_perag_get+0x80/0xd1
[ 221.117185] [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[ 221.117553] [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[ 221.117936] [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[ 221.118312] [<ffffffff813052d3>] xfs_inode_ag_iterator+0x47/0x8f
[ 221.118689] [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[ 221.119058] [<ffffffff813053d1>] xfs_quiesce_data+0x23/0x96
[ 221.119417] [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[ 221.119771] [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[ 221.120134] [<ffffffff81164787>] sync_one_sb+0x16/0x18
[ 221.120477] [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[ 221.120828] [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[ 221.121188] [<ffffffff811647fa>] sys_sync+0x21/0x33

[ 349.972430] sync D 0000000000000007 5296 3312 3292 0x00000000
[ 349.972907] ffff88009c77fc08 0000000000000046 ffffffff819252fa 0000000100000070
[ 349.973575] ffff88009c77e000 00000000001d1a40 00000000001d1a40 ffff8800b56b0000
[ 349.974250] 00000000001d1a40 ffff88009c77ffd8 00000000001d1a40 ffff88009c77ffd8
[ 349.974921] Call Trace:
[ 349.975159] [<ffffffff819252fa>] ? schedule+0x374/0x992
[ 349.975504] [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[ 349.975880] [<ffffffff8108d5c5>] ? prepare_to_wait+0x6c/0x79
[ 349.976238] [<ffffffff812fade3>] xfs_ioend_wait+0x87/0x9f
[ 349.976588] [<ffffffff8108d332>] ? wake_up_bit+0x2a/0x2a
[ 349.976932] [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[ 349.977272] [<ffffffff81304ee8>] xfs_wait_ioend_cb+0x23/0x36
[ 349.977631] [<ffffffff813051a9>] xfs_inode_ag_walk+0x1a5/0x288
[ 349.977997] [<ffffffff81305263>] ? xfs_inode_ag_walk+0x25f/0x288
[ 349.978371] [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[ 349.978748] [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[ 349.979129] [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[ 349.979499] [<ffffffff812efbfc>] ? xfs_perag_get+0x68/0xd1
[ 349.979856] [<ffffffff812efc14>] ? xfs_perag_get+0x80/0xd1
[ 349.980211] [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[ 349.980582] [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[ 349.980964] [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[ 349.981338] [<ffffffff813052d3>] xfs_inode_ag_iterator+0x47/0x8f
[ 349.981717] [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[ 349.982085] [<ffffffff813053d1>] xfs_quiesce_data+0x23/0x96
[ 349.982443] [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[ 349.982798] [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[ 349.983159] [<ffffffff81164787>] sync_one_sb+0x16/0x18
[ 349.983504] [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[ 349.983857] [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[ 349.984215] [<ffffffff811647fa>] sys_sync+0x21/0x33
[ 349.984552] [<ffffffff8192f742>] system_call_fastpath+0x16/0x1b

[ 419.417496] sync D 0000000000000006 5296 3412 3392 0x00000000
[ 419.417972] ffff8800aef6dc08 0000000000000046 ffffffff819252fa 0000000100000070
[ 419.418645] ffff8800aef6c000 00000000001d1a40 00000000001d1a40 ffff8800b704a2f0
[ 419.419317] 00000000001d1a40 ffff8800aef6dfd8 00000000001d1a40 ffff8800aef6dfd8
[ 419.419987] Call Trace:
[ 419.420228] [<ffffffff819252fa>] ? schedule+0x374/0x992
[ 419.420576] [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[ 419.420957] [<ffffffff8108d5c5>] ? prepare_to_wait+0x6c/0x79
[ 419.421321] [<ffffffff812fade3>] xfs_ioend_wait+0x87/0x9f
[ 419.421673] [<ffffffff8108d332>] ? wake_up_bit+0x2a/0x2a
[ 419.422020] [<ffffffff812de648>] ? xfs_ilock+0x4d/0xba
[ 419.422364] [<ffffffff81304ee8>] xfs_wait_ioend_cb+0x23/0x36
[ 419.422724] [<ffffffff813051a9>] xfs_inode_ag_walk+0x1a5/0x288
[ 419.423093] [<ffffffff81305263>] ? xfs_inode_ag_walk+0x25f/0x288
[ 419.423470] [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[ 419.423844] [<ffffffff8109d329>] ? lock_release_holdtime+0xa3/0xab
[ 419.424226] [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[ 419.424596] [<ffffffff812efbfc>] ? xfs_perag_get+0x68/0xd1
[ 419.424951] [<ffffffff812efc14>] ? xfs_perag_get+0x80/0xd1
[ 419.425307] [<ffffffff812efb94>] ? xfs_check_sizes+0x160/0x160
[ 419.425676] [<ffffffff813052f9>] ? xfs_inode_ag_iterator+0x6d/0x8f
[ 419.426058] [<ffffffff81304ec5>] ? xfs_sync_inode_data+0x9d/0x9d
[ 419.426435] [<ffffffff813052d3>] xfs_inode_ag_iterator+0x47/0x8f
[ 419.426812] [<ffffffff81164771>] ? __sync_filesystem+0x7a/0x7a
[ 419.427180] [<ffffffff813053d1>] xfs_quiesce_data+0x23/0x96
[ 419.427542] [<ffffffff81302f82>] xfs_fs_sync_fs+0x21/0x48
[ 419.427895] [<ffffffff8116475d>] __sync_filesystem+0x66/0x7a
[ 419.428258] [<ffffffff81164787>] sync_one_sb+0x16/0x18
[ 419.428601] [<ffffffff81142d37>] iterate_supers+0x72/0xc8
[ 419.428954] [<ffffffff811646bc>] sync_filesystems+0x20/0x22
[ 419.429315] [<ffffffff811647fa>] sys_sync+0x21/0x33
[ 419.429648] [<ffffffff8192f742>] system_call_fastpath+0x16/0x1b

Thanks,
Fengguang
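The numbers above come from Fengguang's sync-livelock.sh harness, which is referenced but never posted in this thread. A minimal sketch of a test in the same spirit is shown below; the mount point, dirtier size, and iteration count are assumptions, not details taken from the real script:

    #!/bin/bash
    # Rough sketch of a sync-latency test (not the actual sync-livelock.sh):
    # keep a background writer dirtying pages on the filesystem under test
    # and time repeated sync(1) calls.

    mnt=/mnt/xfs                                  # assumed test mount point
    dd if=/dev/zero of="$mnt/dirtier" bs=1M count=100000 &
    writer=$!

    for i in $(seq 1 10); do
            start=$(date +%s)
            sync
            echo "sync time: $(( $(date +%s) - start ))"
    done

    kill "$writer" 2>/dev/null
    wait "$writer" 2>/dev/null
    rm -f "$mnt/dirtier"

With the dirtier running, each sync has to chase freshly dirtied pages, which is exactly the condition under which the per-inode waits in the traces above show up.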
On Fri, Jun 10, 2011 at 11:21:49AM +0800, Wu Fengguang wrote:
> > Can you check if that helps with your livelock test case? (And yes, I
> > should go and try it myself. Will do as soon as I'll get a bit more
> > time).
>
> My pleasure. It's easier to compare results on the same test bed and
> I've made it very convenient to test patches :)

Here's one that should be even better, although in its current form it
breaks a subtle corner case of sync semantics, so it'll need more work
if I go down that way:

Index: xfs/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_sync.c	2011-06-17 14:16:18.442399481 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_sync.c	2011-06-18 17:55:44.864025123 +0200
@@ -359,14 +359,16 @@ xfs_quiesce_data(
 {
 	int error, error2 = 0;
 
-	/* push non-blocking */
-	xfs_sync_data(mp, 0);
 	xfs_qm_sync(mp, SYNC_TRYLOCK);
-
-	/* push and block till complete */
-	xfs_sync_data(mp, SYNC_WAIT);
 	xfs_qm_sync(mp, SYNC_WAIT);
 
+	/* flush all pending size updates and unwritten extent conversions */
+	flush_workqueue(xfsconvertd_workqueue);
+	flush_workqueue(xfsdatad_workqueue);
+
+	/* force out the newly dirtied log buffers */
+	xfs_log_force(mp, XFS_LOG_SYNC);
+
 	/* write superblock and hoover up shutdown errors */
 	error = xfs_sync_fsdata(mp);
 
Index: xfs/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_super.c	2011-06-18 17:51:05.660705925 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_super.c	2011-06-18 17:52:50.107367305 +0200
@@ -929,45 +929,12 @@ xfs_fs_write_inode(
 		 * ->sync_fs call do that for thus, which reduces the number
 		 * of synchronous log foces dramatically.
 		 */
-		xfs_ioend_wait(ip);
 		xfs_ilock(ip, XFS_ILOCK_SHARED);
-		if (ip->i_update_core) {
+		if (ip->i_update_core)
 			error = xfs_log_inode(ip);
-			if (error)
-				goto out_unlock;
-		}
-	} else {
-		/*
-		 * We make this non-blocking if the inode is contended, return
-		 * EAGAIN to indicate to the caller that they did not succeed.
-		 * This prevents the flush path from blocking on inodes inside
-		 * another operation right now, they get caught later by
-		 * xfs_sync.
-		 */
-		if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
-			goto out;
-
-		if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
-			goto out_unlock;
-
-		/*
-		 * Now we have the flush lock and the inode is not pinned, we
-		 * can check if the inode is really clean as we know that
-		 * there are no pending transaction completions, it is not
-		 * waiting on the delayed write queue and there is no IO in
-		 * progress.
-		 */
-		if (xfs_inode_clean(ip)) {
-			xfs_ifunlock(ip);
-			error = 0;
-			goto out_unlock;
-		}
-		error = xfs_iflush(ip, SYNC_TRYLOCK);
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 	}
 
- out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_SHARED);
- out:
 	/*
 	 * if we failed to write out the inode then mark
 	 * it dirty again so we'll try again later.
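For anyone who wants to run the inline patch above through the livelock test, the usual routine applies; the tree location and patch file name below are placeholders, not something Christoph specifies:

    # Save the mail body above as a patch file, then, from a configured
    # kernel source tree:
    cd ~/linux-2.6                                   # assumed tree location
    patch -p1 --dry-run < ~/xfs-quiesce-data.patch   # hypothetical file name
    patch -p1 < ~/xfs-quiesce-data.patch
    make -j"$(nproc)" fs/xfs/                        # quick build check of the touched code

The -p1 strips the leading xfs/ and xfs.orig/ components of the paths in the diff, so it applies relative to the top of the tree.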
On Sun, Jun 19, 2011 at 11:56:46PM +0800, Christoph Hellwig wrote:
> On Fri, Jun 10, 2011 at 11:21:49AM +0800, Wu Fengguang wrote:
> > > Can you check if that helps with your livelock test case? (And yes, I
> > > should go and try it myself. Will do as soon as I'll get a bit more
> > > time).
> >
> > My pleasure. It's easier to compare results on the same test bed and
> > I've made it very convenient to test patches :)
>
> Here's one that should be even better, although in its current form it
> breaks a subtle corner case of sync semantics, so it'll need more work
> if I go down that way:

The sync(1)s are much faster now; in most invocations it's at the same
level as ext4 :)

root@fat /home/wfg# ./sync-livelock.sh
sync time: 3
sync time: 5
sync time: 5
sync time: 6
sync time: 15
sync time: 5
sync time: 5
sync time: 5
sync time: 4
sync time: 10
sync time: 4
sync time: 4

Thanks,
Fengguang
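To put a run like this next to the earlier before/after table, the output can be reduced to the same avg/stddev form. One way to do it, assuming the "sync time: N" lines were captured to a file called sync.log (both the file name and the use of awk are this editor's choice, not part of the thread):

    # Population mean and standard deviation of the reported sync times;
    # this formula reproduces the avg/stddev figures quoted earlier.
    awk '/sync time:/ { s += $3; ss += $3 * $3; n++ }
         END { m = s / n; printf "avg %.3f stddev %.3f\n", m, sqrt(ss / n - m * m) }' sync.log

Applied to the twelve times above it gives roughly avg 5.9 with a stddev of about 3.3, compared with avg 15.4 after the first patch, so the workqueue-flush approach is a clear further improvement even with the occasional 10-15 second outlier.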