Bug 25632
Summary: | I/O requests hanging on LVs on top of software RAID 5+0 | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Travers Carter (tcarter) |
Component: | MD | Assignee: | io_md |
Status: | RESOLVED INSUFFICIENT_DATA | ||
Severity: | high | CC: | alan, blueness, ilya.m, valentyn+bugzilla.kernel.org |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.36 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Travers Carter
2010-12-26 04:45:50 UTC
I should have also added above that there are a number of VMs using raw files on an ext4 filesystem on the VG as well as some using LVs on the same VG directly. Also I first encountered the problem in the RHEL 6 kernel before upgrading to 2.6.36 (see https://bugzilla.redhat.com/show_bug.cgi?id=652086) dmesg output from another instance of the problem: INFO: task jbd2/dm-4-8:1374 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. jbd2/dm-4-8 D 0000000000000000 0 1374 2 0x00000000 ffff88075ed15d20 0000000000000046 ffff88075ed15cd0 ffff88075cb68880 0000000000015c80 ffff88075d560ac0 0000000000015c80 ffff88075d561050 ffff88075ed15fd8 ffff88075d561058 ffff88075ed14010 0000000000015c80 Call Trace: [<ffffffffa01199c4>] jbd2_journal_commit_transaction+0x1c4/0x1480 [jbd2] [<ffffffff8104f402>] ? finish_task_switch+0x42/0xd0 [<ffffffff8100ba6e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff810810c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa011e9c8>] kjournald2+0xb8/0x220 [jbd2] [<ffffffff810810c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa011e910>] ? kjournald2+0x0/0x220 [jbd2] [<ffffffff81080a36>] kthread+0x96/0xa0 [<ffffffff8100bec4>] kernel_thread_helper+0x4/0x10 [<ffffffff810809a0>] ? kthread+0x0/0xa0 [<ffffffff8100bec0>] ? kernel_thread_helper+0x0/0x10 INFO: task flush-253:4:1862 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-253:4 D 0000000000000000 0 1862 2 0x00000000 ffff880753bd3980 0000000000000046 ffff88075c5d84b0 0000000000000000 0000000000015c80 ffff88075dd27540 0000000000015c80 ffff88075dd27ad0 ffff880753bd3fd8 ffff88075dd27ad8 ffff880753bd2010 0000000000015c80 Call Trace: [<ffffffffa0117f6c>] start_this_handle+0x1dc/0x520 [jbd2] [<ffffffff810810c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa01184ab>] jbd2__journal_start+0xbb/0x100 [jbd2] [<ffffffffa0118503>] jbd2_journal_start+0x13/0x20 [jbd2] [<ffffffffa015acb0>] ext4_journal_start_sb+0xf0/0x130 [ext4] [<ffffffff81101456>] ? __pagevec_release+0x26/0x40 [<ffffffffa0145165>] ? ext4_meta_trans_blocks+0x75/0xf0 [ext4] [<ffffffffa0148c38>] ext4_da_writepages+0x268/0x630 [ext4] [<ffffffff810fe480>] ? __writepage+0x0/0x40 [<ffffffff8100ba6e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8100ba6e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff810ff9d1>] do_writepages+0x21/0x40 [<ffffffff8116c3ce>] writeback_single_inode+0x9e/0x2a0 [<ffffffff8116c7ee>] writeback_sb_inodes+0xde/0x180 [<ffffffff8116c92d>] writeback_inodes_wb+0x9d/0x160 [<ffffffff8116cc6b>] wb_writeback+0x27b/0x3f0 [<ffffffff8106f37c>] ? lock_timer_base+0x3c/0x70 [<ffffffff8116cf7d>] wb_do_writeback+0x19d/0x1e0 [<ffffffff8116d072>] bdi_writeback_thread+0xb2/0x270 [<ffffffff8116cfc0>] ? bdi_writeback_thread+0x0/0x270 [<ffffffff8116cfc0>] ? bdi_writeback_thread+0x0/0x270 [<ffffffff81080a36>] kthread+0x96/0xa0 [<ffffffff8100bec4>] kernel_thread_helper+0x4/0x10 [<ffffffff810809a0>] ? kthread+0x0/0xa0 [<ffffffff8100bec0>] ? kernel_thread_helper+0x0/0x10 INFO: task qemu-kvm:14405 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14405 1 0x00000080 ffff880594427958 0000000000000082 ffff880f59f03cc0 ffff880f5b2f64f8 0000000000015c80 ffff88075458a100 0000000000015c80 ffff88075458a690 ffff880594427fd8 ffff88075458a698 ffff880594426010 0000000000015c80 Call Trace: [<ffffffff81173b10>] ? sync_buffer+0x0/0x50 [<ffffffff81494dd3>] io_schedule+0x73/0xc0 [<ffffffff81173b50>] sync_buffer+0x40/0x50 [<ffffffff8149568f>] __wait_on_bit+0x5f/0x90 [<ffffffff81173b10>] ? sync_buffer+0x0/0x50 [<ffffffff81495738>] out_of_line_wait_on_bit+0x78/0x90 [<ffffffff81081100>] ? wake_bit_function+0x0/0x50 [<ffffffff81173b0e>] __wait_on_buffer+0x2e/0x30 [<ffffffff81175b0c>] block_prepare_write+0x3bc/0x5b0 [<ffffffffa0147320>] ? ext4_da_get_block_prep+0x0/0x3e0 [ext4] [<ffffffff81175e17>] __block_write_begin+0x17/0x20 [<ffffffffa014a4c8>] ext4_da_write_begin+0x168/0x280 [ext4] [<ffffffff8113fe78>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70 [<ffffffff810f4e2e>] generic_file_buffered_write+0xfe/0x250 [<ffffffffa0146554>] ? ext4_dirty_inode+0x54/0x60 [ext4] [<ffffffff810f7920>] __generic_file_aio_write+0x240/0x470 [<ffffffff810f7bb5>] generic_file_aio_write+0x65/0xd0 [<ffffffffa013ff1f>] ext4_file_write+0x3f/0xd0 [ext4] [<ffffffff810736ce>] ? send_signal+0x3e/0x90 [<ffffffff81148c8a>] do_sync_write+0xda/0x120 [<ffffffff81073a46>] ? group_send_sig_info+0x56/0x70 [<ffffffff811d9dab>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811d2dff>] ? security_file_permission+0x1f/0x80 [<ffffffff81148f78>] vfs_write+0xc8/0x190 [<ffffffff81149a32>] sys_pwrite64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b INFO: task qemu-kvm:14430 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14430 1 0x00000080 ffff88091abcdcd8 0000000000000082 00000000ffffffea 0000000000000000 0000000000015c80 ffff880f5a288b00 0000000000015c80 ffff880f5a289090 ffff88091abcdfd8 ffff880f5a289098 ffff88091abcc010 0000000000015c80 Call Trace: [<ffffffff81495b32>] __mutex_lock_slowpath+0xf2/0x170 [<ffffffff81495a1b>] mutex_lock+0x2b/0x50 [<ffffffff810f7ba2>] generic_file_aio_write+0x52/0xd0 [<ffffffffa013ff1f>] ext4_file_write+0x3f/0xd0 [ext4] [<ffffffff8105872c>] ? pick_next_task_fair+0xfc/0x130 [<ffffffff81148c8a>] do_sync_write+0xda/0x120 [<ffffffff811d9dab>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811d2dff>] ? security_file_permission+0x1f/0x80 [<ffffffff81148f78>] vfs_write+0xc8/0x190 [<ffffffff81149a32>] sys_pwrite64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b INFO: task qemu-kvm:14455 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14455 1 0x00000080 ffff880cbc0a5cd8 0000000000000082 00000000ffffffea 0000000000000000 0000000000015c80 ffff8807f235ab00 0000000000015c80 ffff8807f235b090 ffff880cbc0a5fd8 ffff8807f235b098 ffff880cbc0a4010 0000000000015c80 Call Trace: [<ffffffff81495b32>] __mutex_lock_slowpath+0xf2/0x170 [<ffffffff810c70a5>] ? call_rcu_sched+0x15/0x20 [<ffffffff81495a1b>] mutex_lock+0x2b/0x50 [<ffffffff810f7ba2>] generic_file_aio_write+0x52/0xd0 [<ffffffffa013ff1f>] ext4_file_write+0x3f/0xd0 [ext4] [<ffffffff8105872c>] ? pick_next_task_fair+0xfc/0x130 [<ffffffff81148c8a>] do_sync_write+0xda/0x120 [<ffffffff811d9dab>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811d2dff>] ? security_file_permission+0x1f/0x80 [<ffffffff81148f78>] vfs_write+0xc8/0x190 [<ffffffff81149a32>] sys_pwrite64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b INFO: task qemu-kvm:14456 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14456 1 0x00000080 ffff8809135f9cd8 0000000000000082 0000000000000005 0000000000000001 0000000000015c80 ffff880f590d4100 0000000000015c80 ffff880f590d4690 ffff8809135f9fd8 ffff880f590d4698 ffff8809135f8010 0000000000015c80 Call Trace: [<ffffffff81057c61>] ? try_to_wake_up+0xe1/0x450 [<ffffffff81495b32>] __mutex_lock_slowpath+0xf2/0x170 [<ffffffff81495a1b>] mutex_lock+0x2b/0x50 [<ffffffff810f7ba2>] generic_file_aio_write+0x52/0xd0 [<ffffffffa013ff1f>] ext4_file_write+0x3f/0xd0 [ext4] [<ffffffff81148c8a>] do_sync_write+0xda/0x120 [<ffffffff810861d4>] ? switch_task_namespaces+0x24/0x60 [<ffffffff811d9dab>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811d2dff>] ? security_file_permission+0x1f/0x80 [<ffffffff81148f78>] vfs_write+0xc8/0x190 [<ffffffff81149a32>] sys_pwrite64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b INFO: task qemu-kvm:14418 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14418 1 0x00000080 ffff8804b2323a18 0000000000000082 ffff8804b2323998 ffffffffa000161d 0000000000015c80 ffff88075e64b4c0 0000000000015c80 ffff88075e64ba50 ffff8804b2323fd8 ffff88075e64ba58 ffff8804b2322010 0000000000015c80 Call Trace: [<ffffffffa000161d>] ? __map_bio+0xad/0x130 [dm_mod] [<ffffffff813963e5>] md_make_request+0x85/0x230 [<ffffffff8104ba8a>] ? update_curr+0x19a/0x1e0 [<ffffffff810810c0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff812013a3>] generic_make_request+0x1e3/0x560 [<ffffffffa000275b>] ? dm_get_live_table+0x4b/0x60 [dm_mod] [<ffffffffa00064b2>] ? linear_merge+0x52/0x60 [dm_mod] [<ffffffff812017a1>] submit_bio+0x81/0x110 [<ffffffff8117b44c>] dio_bio_submit+0xbc/0xc0 [<ffffffff8117c1ca>] __blockdev_direct_IO+0x8ba/0xb70 [<ffffffff81092b74>] ? futex_wait+0x224/0x370 [<ffffffff81179bd7>] blkdev_direct_IO+0x57/0x60 [<ffffffff81178c20>] ? blkdev_get_blocks+0x0/0xc0 [<ffffffff810f6e51>] generic_file_aio_read+0x6c1/0x720 [<ffffffff810922cb>] ? futex_wake+0x10b/0x120 [<ffffffff81148daa>] do_sync_read+0xda/0x120 [<ffffffff811d2e4f>] ? security_file_permission+0x6f/0x80 [<ffffffff81149105>] vfs_read+0xc5/0x190 [<ffffffff81149992>] sys_pread64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b INFO: task qemu-kvm:14407 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14407 1 0x00000080 ffff88059cb53b08 0000000000000086 ffff88059cb53ab8 ffffffffa00047cc 0000000000015c80 ffff88075dd86040 0000000000015c80 ffff88075dd865d0 ffff88059cb53fd8 ffff88075dd865d8 ffff88059cb52010 0000000000015c80 Call Trace: [<ffffffffa00047cc>] ? dm_table_unplug_all+0x5c/0xd0 [dm_mod] [<ffffffff810f56f0>] ? sync_page+0x0/0x50 [<ffffffff81494dd3>] io_schedule+0x73/0xc0 [<ffffffff810f5730>] sync_page+0x40/0x50 [<ffffffff8149568f>] __wait_on_bit+0x5f/0x90 [<ffffffff810f58f3>] wait_on_page_bit+0x73/0x80 [<ffffffff81081100>] ? wake_bit_function+0x0/0x50 [<ffffffff811007f5>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffff810f5cc0>] filemap_fdatawait_range+0x110/0x1a0 [<ffffffff810ff9d1>] ? do_writepages+0x21/0x40 [<ffffffff810f5ddb>] ? __filemap_fdatawrite_range+0x5b/0x60 [<ffffffff810f5e50>] filemap_write_and_wait_range+0x70/0x80 [<ffffffff81170b1a>] vfs_fsync_range+0x5a/0xa0 [<ffffffff81170ba7>] generic_write_sync+0x47/0x50 [<ffffffff810f7bfe>] generic_file_aio_write+0xae/0xd0 [<ffffffffa013ff1f>] ext4_file_write+0x3f/0xd0 [ext4] [<ffffffff810736ce>] ? send_signal+0x3e/0x90 [<ffffffff81148c8a>] do_sync_write+0xda/0x120 [<ffffffff81073a46>] ? group_send_sig_info+0x56/0x70 [<ffffffff811d9dab>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811d2dff>] ? security_file_permission+0x1f/0x80 [<ffffffff81148f78>] vfs_write+0xc8/0x190 [<ffffffff81149a32>] sys_pwrite64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b INFO: task qemu-kvm:14433 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14433 1 0x00000080 ffff8808a39b7b08 0000000000000086 ffff8808a39b7ab8 ffffffffa00047cc 0000000000015c80 ffff880f5b9c9500 0000000000015c80 ffff880f5b9c9a90 ffff8808a39b7fd8 ffff880f5b9c9a98 ffff8808a39b6010 0000000000015c80 Call Trace: [<ffffffffa00047cc>] ? dm_table_unplug_all+0x5c/0xd0 [dm_mod] [<ffffffff810f56f0>] ? sync_page+0x0/0x50 [<ffffffff81494dd3>] io_schedule+0x73/0xc0 [<ffffffff810f5730>] sync_page+0x40/0x50 [<ffffffff8149568f>] __wait_on_bit+0x5f/0x90 [<ffffffff810f58f3>] wait_on_page_bit+0x73/0x80 [<ffffffff81081100>] ? wake_bit_function+0x0/0x50 [<ffffffff811007f5>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffff810f5cc0>] filemap_fdatawait_range+0x110/0x1a0 [<ffffffff810ff9d1>] ? do_writepages+0x21/0x40 [<ffffffff810f5ddb>] ? __filemap_fdatawrite_range+0x5b/0x60 [<ffffffff810f5e50>] filemap_write_and_wait_range+0x70/0x80 [<ffffffff81170b1a>] vfs_fsync_range+0x5a/0xa0 [<ffffffff81170ba7>] generic_write_sync+0x47/0x50 [<ffffffff810f7bfe>] generic_file_aio_write+0xae/0xd0 [<ffffffffa013ff1f>] ext4_file_write+0x3f/0xd0 [ext4] [<ffffffff81148c8a>] do_sync_write+0xda/0x120 [<ffffffff811d9dab>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811d2dff>] ? security_file_permission+0x1f/0x80 [<ffffffff81148f78>] vfs_write+0xc8/0x190 [<ffffffff81149a32>] sys_pwrite64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b INFO: task qemu-kvm:14436 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. qemu-kvm D 0000000000000000 0 14436 1 0x00000080 ffff8809135cfb08 0000000000000086 ffff8809135cfab8 ffffffffa00047cc 0000000000015c80 ffff8807f235b540 0000000000015c80 ffff8807f235bad0 ffff8809135cffd8 ffff8807f235bad8 ffff8809135ce010 0000000000015c80 Call Trace: [<ffffffffa00047cc>] ? dm_table_unplug_all+0x5c/0xd0 [dm_mod] [<ffffffff810f56f0>] ? sync_page+0x0/0x50 [<ffffffff81494dd3>] io_schedule+0x73/0xc0 [<ffffffff810f5730>] sync_page+0x40/0x50 [<ffffffff8149568f>] __wait_on_bit+0x5f/0x90 [<ffffffff810f58f3>] wait_on_page_bit+0x73/0x80 [<ffffffff81081100>] ? wake_bit_function+0x0/0x50 [<ffffffff811007f5>] ? pagevec_lookup_tag+0x25/0x40 [<ffffffff810f5cc0>] filemap_fdatawait_range+0x110/0x1a0 [<ffffffff810ff9d1>] ? do_writepages+0x21/0x40 [<ffffffff810f5ddb>] ? __filemap_fdatawrite_range+0x5b/0x60 [<ffffffff810f5e50>] filemap_write_and_wait_range+0x70/0x80 [<ffffffff81170b1a>] vfs_fsync_range+0x5a/0xa0 [<ffffffff81170ba7>] generic_write_sync+0x47/0x50 [<ffffffff810f7bfe>] generic_file_aio_write+0xae/0xd0 [<ffffffffa013ff1f>] ext4_file_write+0x3f/0xd0 [ext4] [<ffffffff8105872c>] ? pick_next_task_fair+0xfc/0x130 [<ffffffff81148c8a>] do_sync_write+0xda/0x120 [<ffffffff811d9dab>] ? selinux_file_permission+0xfb/0x150 [<ffffffff811d2dff>] ? security_file_permission+0x1f/0x80 [<ffffffff81148f78>] vfs_write+0xc8/0x190 [<ffffffff81149a32>] sys_pwrite64+0x82/0xa0 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b While I'm not 100% sure its the same bug, I think Gentoo is hitting this on 2.6.36.2 and 2.6.37. See http://bugs.gentoo.org/show_bug.cgi?id=350455 Under heavy system loads, when doing a lvm pmove from one pv to another, the system deadlocks. The culprits appear to be jbd and flush on that md. You can see the backtrace obtained by sysrq+s on the gentoo bug above. After moving the VM guests that were using filesystem based disk images on ext4 on top of the LVs directly onto raw LVs just under two weeks ago I have yet to see the problem occur again (it typically occurred at least once a week). So this may actually be an ext4/jbd2 problem or at least an interaction with it rather than a md or LVM issue. Just a note: I'm currently using an older 2.6.32 kernel from Ubuntu with KVM machines running directly on LVs and it *does* hang when I use pvmove. I know 2.6.32 is sort of ancient for a kernel bug (so in a way: please ignore this message), but I wouldn't be entirely sure this is file system related. Please try changing the IO scheduler to "noop" - and re-test. |