Bug 207821
Summary: | NBD hangs indefinitely on stuck requests | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Roman Mamedov (rm+bko) |
Component: | Block Layer | Assignee: | Jens Axboe (axboe) |
Status: | NEW --- | ||
Severity: | normal | CC: | |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.4.32 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Roman Mamedov
2020-05-21 09:55:04 UTC
[3135160.111944] sysrq: Show Blocked State [3135160.112048] task PC stack pid father [3135160.112592] dmcrypt_write/2 D 0 5421 2 0x80004000 [3135160.112596] Call Trace: [3135160.112606] ? __schedule+0x2e4/0x6f0 [3135160.112610] ? wbt_exit+0x30/0x30 [3135160.112612] ? __wbt_done+0x30/0x30 [3135160.112618] ? crypt_iv_lmk_ctr+0xc0/0xc0 [dm_crypt] [3135160.112621] schedule+0x2f/0xa0 [3135160.112623] io_schedule+0x12/0x40 [3135160.112626] rq_qos_wait+0xfa/0x170 [3135160.112629] ? karma_partition+0x230/0x230 [3135160.112631] ? wbt_exit+0x30/0x30 [3135160.112633] wbt_wait+0xaf/0xf0 [3135160.112636] __rq_qos_throttle+0x23/0x30 [3135160.112638] blk_mq_make_request+0x12a/0x5d0 [3135160.112643] ? crypt_iv_lmk_ctr+0xc0/0xc0 [dm_crypt] [3135160.112645] generic_make_request+0xcf/0x310 [3135160.112650] dmcrypt_write+0x145/0x170 [dm_crypt] [3135160.112654] kthread+0x113/0x130 [3135160.112657] ? kthread_park+0x80/0x80 [3135160.112660] ret_from_fork+0x22/0x40 [3135160.112671] btrfs-cleaner D 0 6240 2 0x80004000 [3135160.112674] Call Trace: [3135160.112677] ? __schedule+0x2e4/0x6f0 [3135160.112724] ? __clear_extent_bit+0x2c7/0x430 [btrfs] [3135160.112727] schedule+0x2f/0xa0 [3135160.112730] schedule_preempt_disabled+0xa/0x10 [3135160.112732] __mutex_lock.isra.8+0x2b5/0x4a0 [3135160.112771] btrfs_delete_unused_bgs+0xcf/0x5c0 [btrfs] [3135160.112804] ? __btrfs_btree_balance_dirty+0x60/0x60 [btrfs] [3135160.112838] ? run_delayed_iput_locked+0xab/0xb0 [btrfs] [3135160.112871] ? __btrfs_btree_balance_dirty+0x60/0x60 [btrfs] [3135160.112903] cleaner_kthread+0x118/0x120 [btrfs] [3135160.112906] kthread+0x113/0x130 [3135160.112908] ? kthread_park+0x80/0x80 [3135160.112910] ret_from_fork+0x22/0x40 [3135160.112913] btrfs-transacti D 0 6241 2 0x80004000 [3135160.112915] Call Trace: [3135160.112918] ? __schedule+0x2e4/0x6f0 [3135160.112921] schedule+0x2f/0xa0 [3135160.112923] io_schedule+0x12/0x40 [3135160.112926] wait_on_page_bit+0x15c/0x230 [3135160.112929] ? file_fdatawait_range+0x20/0x20 [3135160.112932] __filemap_fdatawait_range+0x88/0xf0 [3135160.112969] ? __clear_extent_bit+0x2c7/0x430 [btrfs] [3135160.112972] filemap_fdatawait_range+0xe/0x20 [3135160.113005] __btrfs_wait_marked_extents.isra.21+0xce/0x110 [btrfs] [3135160.113039] btrfs_wait_extents+0x15/0x30 [btrfs] [3135160.113073] btrfs_write_and_wait_transaction.isra.25+0x74/0xc0 [btrfs] [3135160.113076] ? __wake_up_common_lock+0x87/0xc0 [3135160.113110] btrfs_commit_transaction+0x71c/0x9b0 [btrfs] [3135160.113113] ? finish_wait+0xb0/0xb0 [3135160.113145] transaction_kthread+0x146/0x180 [btrfs] [3135160.113179] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs] [3135160.113181] kthread+0x113/0x130 [3135160.113183] ? kthread_park+0x80/0x80 [3135160.113185] ret_from_fork+0x22/0x40 [3135160.113188] btrfs D 0 11704 11701 0x00004004 [3135160.113190] Call Trace: [3135160.113193] ? __schedule+0x2e4/0x6f0 [3135160.113196] schedule+0x2f/0xa0 [3135160.113229] wait_for_commit+0x58/0x80 [btrfs] [3135160.113232] ? finish_wait+0xb0/0xb0 [3135160.113265] btrfs_commit_transaction+0x142/0x9b0 [btrfs] [3135160.113304] prepare_to_merge+0x1b0/0x260 [btrfs] [3135160.113340] relocate_block_group+0x1ae/0x630 [btrfs] [3135160.113363] btrfs_relocate_block_group+0x1b2/0x310 [btrfs] [3135160.113383] btrfs_relocate_chunk+0x36/0xa0 [btrfs] [3135160.113403] btrfs_balance+0x7b4/0xef0 [btrfs] [3135160.113405] ? _cond_resched+0x15/0x30 [3135160.113425] btrfs_ioctl_balance+0x2bc/0x370 [btrfs] [3135160.113445] btrfs_ioctl+0x15de/0x2da0 [btrfs] [3135160.113448] ? mem_cgroup_commit_charge+0x74/0x4e0 [3135160.113449] ? mem_cgroup_try_charge+0x7f/0x1b0 [3135160.113450] ? mem_cgroup_throttle_swaprate+0x24/0x14a [3135160.113452] ? __handle_mm_fault+0xadc/0x12c0 [3135160.113454] do_vfs_ioctl+0xb5/0x660 [3135160.113456] ksys_ioctl+0x82/0xa0 [3135160.113457] __x64_sys_ioctl+0x16/0x20 [3135160.113459] do_syscall_64+0x52/0x170 [3135160.113460] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [3135160.113462] RIP: 0033:0x7f00494a8427 [3135160.113465] Code: Bad RIP value. [3135160.113466] RSP: 002b:00007ffcb4982dc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [3135160.113467] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f00494a8427 [3135160.113468] RDX: 00007ffcb4982e68 RSI: 00000000c4009420 RDI: 0000000000000003 [3135160.113468] RBP: 00007ffcb4982e68 R08: 0000000000000003 R09: 0000000000000078 [3135160.113469] R10: fffffffffffffb66 R11: 0000000000000202 R12: 0000000000000003 [3135160.113469] R13: 00007ffcb49848c1 R14: 0000000000000001 R15: 0000000000000000 [3135160.113473] btrfs D 0 23577 23576 0x00000004 [3135160.113474] Call Trace: [3135160.113476] ? __schedule+0x2e4/0x6f0 [3135160.113477] schedule+0x2f/0xa0 [3135160.113496] btrfs_cancel_balance+0xf2/0x170 [btrfs] [3135160.113498] ? finish_wait+0xb0/0xb0 [3135160.113518] btrfs_ioctl+0x290/0x2da0 [btrfs] [3135160.113519] ? mem_cgroup_commit_charge+0x74/0x4e0 [3135160.113521] ? mem_cgroup_try_charge+0x7f/0x1b0 [3135160.113522] ? mem_cgroup_throttle_swaprate+0x24/0x14a [3135160.113523] ? __handle_mm_fault+0xadc/0x12c0 [3135160.113524] do_vfs_ioctl+0xb5/0x660 [3135160.113526] ksys_ioctl+0x82/0xa0 [3135160.113527] __x64_sys_ioctl+0x16/0x20 [3135160.113528] do_syscall_64+0x52/0x170 [3135160.113529] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [3135160.113530] RIP: 0033:0x7f9135b20427 [3135160.113532] Code: Bad RIP value. [3135160.113533] RSP: 002b:00007ffef74c02e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [3135160.113533] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f9135b20427 [3135160.113534] RDX: 0000000000000002 RSI: 0000000040049421 RDI: 0000000000000003 [3135160.113534] RBP: 00007ffef74c08b0 R08: 0000000000000003 R09: 0000000000000078 [3135160.113535] R10: fffffffffffffb66 R11: 0000000000000206 R12: 00007ffef74c18d6 [3135160.113535] R13: 0000000000000001 R14: 000055afd4652105 R15: 0000000000000000 [3135160.113540] sync D 0 6623 1870 0x00004000 [3135160.113541] Call Trace: [3135160.113543] ? __schedule+0x2e4/0x6f0 [3135160.113544] ? __switch_to_asm+0x34/0x70 [3135160.113546] schedule+0x2f/0xa0 [3135160.113547] io_schedule+0x12/0x40 [3135160.113548] wait_on_page_bit+0x15c/0x230 [3135160.113550] ? file_fdatawait_range+0x20/0x20 [3135160.113551] __filemap_fdatawait_range+0x88/0xf0 [3135160.113553] filemap_fdatawait_keep_errors+0x1a/0x40 [3135160.113555] sync_inodes_sb+0x230/0x2d0 [3135160.113557] ? default_file_splice_write+0x20/0x20 [3135160.113558] iterate_supers+0x97/0x100 [3135160.113559] ksys_sync+0x40/0xb0 [3135160.113561] __ia32_sys_sync+0xa/0x10 [3135160.113562] do_syscall_64+0x52/0x170 [3135160.113563] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [3135160.113564] RIP: 0033:0x7f0a1cd69287 [3135160.113565] Code: Bad RIP value. [3135160.113566] RSP: 002b:00007fff6cac4eb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2 [3135160.113567] RAX: ffffffffffffffda RBX: 00007fff6cac4fe8 RCX: 00007f0a1cd69287 [3135160.113567] RDX: 00007f0a1ce36001 RSI: 0000000000000000 RDI: 00007f0a1cdfab3a [3135160.113568] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 [3135160.113568] R10: fffffffffffffbe3 R11: 0000000000000206 R12: 0000000000000001 [3135160.113569] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [3135160.113571] cron-hourly.php D 0 7821 7820 0x00004000 [3135160.113572] Call Trace: [3135160.113574] ? __schedule+0x2e4/0x6f0 [3135160.113575] schedule+0x2f/0xa0 [3135160.113576] io_schedule+0x12/0x40 [3135160.113578] wait_on_page_bit+0x15c/0x230 [3135160.113579] ? file_fdatawait_range+0x20/0x20 [3135160.113581] migrate_pages+0x1cd/0xb70 [3135160.113583] ? isolate_freepages_block+0x380/0x380 [3135160.113584] ? __bpf_trace_mm_compaction_kcompactd_sleep+0x10/0x10 [3135160.113585] compact_zone+0x708/0xcc0 [3135160.113587] compact_node+0x86/0xc0 [3135160.113589] ? cpumask_next+0x16/0x20 [3135160.113590] sysctl_compaction_handler+0x40/0x70 [3135160.113592] proc_sys_call_handler+0x1e2/0x1f0 [3135160.113594] vfs_write+0xb9/0x1c0 [3135160.113595] ksys_write+0x6e/0x100 [3135160.113597] do_syscall_64+0x52/0x170 [3135160.113598] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [3135160.113598] RIP: 0033:0x7fae90dea504 [3135160.113600] Code: Bad RIP value. [3135160.113600] RSP: 002b:00007ffd3d1f0068 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [3135160.113601] RAX: ffffffffffffffda RBX: 00007fae8e4ae300 RCX: 00007fae90dea504 [3135160.113602] RDX: 0000000000000001 RSI: 00007fae8e462bd8 RDI: 0000000000000003 [3135160.113602] RBP: 0000000000000001 R08: 000000000000cff8 R09: 00007fae90e3bb40 [3135160.113603] R10: 79732f636f72702f R11: 0000000000000246 R12: 00007fae8e462bd8 [3135160.113603] R13: 0000000000000000 R14: 00007fae8e462d60 R15: 0000000000000000 [3135160.113609] cron-hourly.php D 0 13721 13720 0x00004000 [3135160.113610] Call Trace: [3135160.113612] ? __schedule+0x2e4/0x6f0 [3135160.113614] schedule+0x2f/0xa0 [3135160.113615] io_schedule+0x12/0x40 [3135160.113616] wait_on_page_bit+0x15c/0x230 [3135160.113618] ? file_fdatawait_range+0x20/0x20 [3135160.113619] migrate_pages+0x1cd/0xb70 [3135160.113620] ? isolate_freepages_block+0x380/0x380 [3135160.113622] ? __bpf_trace_mm_compaction_kcompactd_sleep+0x10/0x10 [3135160.113623] compact_zone+0x708/0xcc0 [3135160.113624] ? check_preempt_wakeup+0x189/0x230 [3135160.113626] compact_node+0x86/0xc0 [3135160.113627] ? cpumask_next+0x16/0x20 [3135160.113628] sysctl_compaction_handler+0x40/0x70 [3135160.113629] proc_sys_call_handler+0x1e2/0x1f0 [3135160.113631] vfs_write+0xb9/0x1c0 [3135160.113632] ksys_write+0x6e/0x100 [3135160.113633] do_syscall_64+0x52/0x170 [3135160.113635] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [3135160.113635] RIP: 0033:0x7fdbcb082504 [3135160.113637] Code: Bad RIP value. [3135160.113637] RSP: 002b:00007ffe4f604e88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [3135160.113638] RAX: ffffffffffffffda RBX: 00007fdbc86ae300 RCX: 00007fdbcb082504 [3135160.113639] RDX: 0000000000000001 RSI: 00007fdbc8662bd8 RDI: 0000000000000003 [3135160.113639] RBP: 0000000000000001 R08: 000000000000cff8 R09: 00007fdbcb0d3b40 [3135160.113640] R10: 79732f636f72702f R11: 0000000000000246 R12: 00007fdbc8662bd8 [3135160.113640] R13: 0000000000000000 R14: 00007fdbc8662d60 R15: 0000000000000000 [3135160.113669] cfdisk D 0 19465 18342 0x00000004 [3135160.113670] Call Trace: [3135160.113672] ? __schedule+0x2e4/0x6f0 [3135160.113673] ? _cond_resched+0x15/0x30 [3135160.113674] ? wbt_exit+0x30/0x30 [3135160.113675] ? __wbt_done+0x30/0x30 [3135160.113676] schedule+0x2f/0xa0 [3135160.113677] io_schedule+0x12/0x40 [3135160.113678] rq_qos_wait+0xfa/0x170 [3135160.113679] ? karma_partition+0x230/0x230 [3135160.113680] ? wbt_exit+0x30/0x30 [3135160.113681] wbt_wait+0xaf/0xf0 [3135160.113683] __rq_qos_throttle+0x23/0x30 [3135160.113684] blk_mq_make_request+0x12a/0x5d0 [3135160.113685] generic_make_request+0xcf/0x310 [3135160.113686] submit_bio+0x4d/0x1e0 [3135160.113688] submit_bio_wait+0x53/0x80 [3135160.113690] blkdev_issue_flush+0x8d/0xc0 [3135160.113692] blkdev_fsync+0x3a/0x50 [3135160.113693] do_fsync+0x43/0x80 [3135160.113694] __x64_sys_fsync+0x10/0x20 [3135160.113695] do_syscall_64+0x52/0x170 [3135160.113696] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [3135160.113697] RIP: 0033:0x7fdb04339214 [3135160.113699] Code: Bad RIP value. [3135160.113699] RSP: 002b:00007ffc9aa33d98 EFLAGS: 00000246 ORIG_RAX: 000000000000004a [3135160.113700] RAX: ffffffffffffffda RBX: 0000000002212490 RCX: 00007fdb04339214 [3135160.113701] RDX: 006e776f6e6b6e55 RSI: 00007fdb04d29b23 RDI: 0000000000000003 [3135160.113701] RBP: 00007fdb04d29b1b R08: 0000000002214060 R09: 0000000000000000 [3135160.113702] R10: fffffffffffffcba R11: 0000000000000246 R12: 0000000002213a80 [3135160.113702] R13: 0000000002212490 R14: 0000000000000000 R15: 0000000000000000 For the record, there is a -t (timeout) option in nbd-client (using the "NBD_SET_TIMEOUT" ioctl), which probably should help in cases like this. But this has to be set beforehand at mount-time, there is no way to adjust it on the fly, at least not without CONFIG_DEBUG_FS. |