Bug 119841 - LVM hangs after kernel upgrade
Summary: LVM hangs after kernel upgrade
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer (show other bugs)
Hardware: x86-64 Linux
: P1 blocking
Assignee: Mike Snitzer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-09 06:06 UTC by filip.havlicek
Modified: 2017-03-21 10:19 UTC (History)
6 users (show)

See Also:
Kernel Version: 4.4.2, 4.4.11, 4.5.5, 4.6.0, 4.6.1, 4.6.2, 4.6.3, 4.7-rc5,4.7,4.8.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
simple scripts to reproduce the problem (952 bytes, text/plain)
2016-06-09 06:06 UTC, filip.havlicek
Details
list of current tasks and their information after hang of dd processes (522.89 KB, text/plain)
2016-06-29 07:39 UTC, filip.havlicek
Details
4.6.3 backport of "block: fix blk_queue_split() resource exhaustion" (15.14 KB, patch)
2016-07-11 20:28 UTC, Mike Snitzer
Details | Diff
kernel messages (109.40 KB, application/gzip)
2016-11-29 02:38 UTC, john
Details

Description filip.havlicek 2016-06-09 06:06:11 UTC
Created attachment 219371 [details]
simple scripts to reproduce the problem

We have problem with LVM after upgrade kernel to version 4.4.2 (from 3.18.12). Processes which use LVM partition are in the state of uninterruptible sleep. This problem occurs while database backup from the remote server is running. Database backup is a simple bash script which uses lvm commands (lvcreate and mount over ssh) and rsync to backup data. LVM hangs after 50-300 backup cycles (sometimes more, sometimes less) with XFS and after much more with ext4 (few thousands of cycles) in the phase of rsync data from the database server to the backup server. There are no error messages in dmesg or somewhere else, LVM just freezes, and we have to reboot the system with signal to /proc/sysrq-trigger.
The same problem occurred with kernels 4.4.11, 4.5.5, 4.6.0, 4.6.1, 4.6.2.

How to reproduce:
  2 servers - database and backup, in our case Debian Wheezy 64bit
  ssh connection without password from the backup to the database server
  run as root

Database server:
  pvcreate /dev/sdaX
  vgcreate lvmtest /dev/sdaX
  lvcreate -L 50G -n data -r 32 lvmtest
  mkfs.xfs /dev/mapper/lvmtest-data
  mkdir /data
  echo "/dev/mapper/lvmtest-data /data xfs defaults,noatime 0 0" >> /etc/fstab
  mount -a
  run script database.sh from attachment in screen or something similar (simulate som activity)

Backup server:
  run script backup.sh from attachment in screen or something similar

After some time (few hundreds of cycles) you will see “dd” processes on the database server in the state of uninterruptible sleep. And it is a problem. It never happened with kernel 3.18.12, but with later kernels (tested on 4.4.2, 4.4.11, 4.5.5, 4.6.0, 4.6.1, 4.6.2) it happened everytime.
Comment 1 Mike Snitzer 2016-06-09 19:24:38 UTC
Is there anything in the kernel log that speaks to errors?
Anything like the snapshot becoming invalidated?

There were some dm-snapshot fixes in the past (e.g. commit 385277bfb57fa) but the kernels you tested should have the fixes (via stable@).
Comment 2 filip.havlicek 2016-06-09 19:36:39 UTC
Unfortunately, no error messages. Just processes in the D state.
Comment 3 filip.havlicek 2016-06-28 08:36:09 UTC
I tested it on 4.6.3 and 4.7-rc5, problem still occurs.
Comment 4 Mike Snitzer 2016-06-28 17:58:41 UTC
(In reply to filip.havlicek from comment #3)
> I tested it on 4.6.3 and 4.7-rc5, problem still occurs.

It'd be best to try bisect'ing to find the point of regression.

Short of that, please at least provide the corresponding stack traces for threads that are hung in the D state.  Once the hang occurs: echo t > /proc/sysrq-trigger

Then either save off relevant parts of dmesg or /var/log/messages.
Comment 5 filip.havlicek 2016-06-29 07:39:16 UTC
Created attachment 221451 [details]
list of current tasks and their information after hang of dd processes

Whole output from 'echo t > /proc/sysrq-trigger' after hang of dd processes is in attachment.
Comment 6 Mike Snitzer 2016-06-29 14:22:27 UTC
(In reply to filip.havlicek from comment #5)
> Created attachment 221451 [details]
> list of current tasks and their information after hang of dd processes
> 
> Whole output from 'echo t > /proc/sysrq-trigger' after hang of dd processes
> is in attachment.

The most relevant stacktraces are:

Jun 29 08:48:23 lvmmongotest kernel: rsync           D ffff880fcd043720     0 23967  23965 0x00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff880fcd043720 ffff88100bf0a1c0 ffff880807746540 ffffffff8134256b
Jun 29 08:48:23 lvmmongotest kernel: 0000000000000000 ffff880fcd043784 ffff880fcd044000 ffffffff00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88080b9e5600 ffff88080b9e5618 fffffffe00000001 ffff880fcd043740
Jun 29 08:48:23 lvmmongotest kernel: Call Trace:
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134256b>] ? xfs_iext_bno_to_ext+0x8b/0x160
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a02ec4>] schedule+0x34/0x90
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a04dba>] rwsem_down_write_failed+0x1aa/0x2f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8141e477>] call_rwsem_down_write_failed+0x17/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81115ec0>] ? mempool_alloc_slab+0x10/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a045af>] down_write+0x1f/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff818538b9>] snapshot_map+0x69/0x380
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183d704>] __map_bio+0x44/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e0b2>] __split_and_process_bio+0x1e2/0x3f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e95f>] dm_make_request+0x6f/0xd0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee471>] generic_make_request+0xf1/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee57d>] submit_bio+0x6d/0x130
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811b14de>] mpage_bio_submit+0x2e/0x40
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811b1ec1>] mpage_readpages+0x161/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134a3a0>] ? xfs_get_blocks_direct+0x20/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134a3a0>] ? xfs_get_blocks_direct+0x20/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8115c224>] ? alloc_pages_current+0x94/0x110
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813491c7>] xfs_vm_readpages+0x47/0xa0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111fb89>] __do_page_cache_readahead+0x179/0x230
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111fd6a>] ondemand_readahead+0x12a/0x280
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183dc76>] ? dm_any_congested+0x36/0x40
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111ff2f>] page_cache_async_readahead+0x6f/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111524e>] generic_file_read_iter+0x48e/0x7b0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8118cfaa>] ? file_update_time+0x3a/0xf0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81354be7>] xfs_file_read_iter+0x157/0x310
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81174bb2>] __vfs_read+0xa2/0xd0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811764f9>] vfs_read+0x89/0x140
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811765fa>] SyS_read+0x4a/0xb0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a05c1b>] entry_SYSCALL_64_fastpath+0x13/0x8f

...

Jun 29 08:48:23 lvmmongotest kernel: dd              D ffff88100aa9f750     0 23986  15703 0x00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88100aa9f750 ffff88100bec8000 ffff881006454ec0 0000000000000296
Jun 29 08:48:23 lvmmongotest kernel: 0000000000000000 0000000000000296 ffff88100aaa0000 ffffffff00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88080b9e5600 ffff88080b9e5618 fffffffe00000001 ffff88100aa9f770
Jun 29 08:48:23 lvmmongotest kernel: Call Trace:
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a02ec4>] schedule+0x34/0x90
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813679b9>] ? xfs_mod_fdblocks+0x49/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a04dba>] rwsem_down_write_failed+0x1aa/0x2f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8141e477>] call_rwsem_down_write_failed+0x17/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81115ec0>] ? mempool_alloc_slab+0x10/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a045af>] down_write+0x1f/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81853c39>] __origin_write+0x69/0x210
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811161b6>] ? mempool_alloc+0x66/0x160
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81854515>] do_origin.isra.14+0x45/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff818549ae>] origin_map+0x5e/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183d704>] __map_bio+0x44/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e0b2>] __split_and_process_bio+0x1e2/0x3f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e95f>] dm_make_request+0x6f/0xd0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee471>] generic_make_request+0xf1/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee57d>] submit_bio+0x6d/0x130
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349b0e>] xfs_submit_ioend_bio.isra.15+0x2e/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349bff>] xfs_submit_ioend+0xef/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134aa5a>] xfs_do_writepage+0x4ea/0x550
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111e216>] write_cache_pages+0x1f6/0x4e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134a570>] ? xfs_aops_discard_page+0x150/0x150
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349c84>] xfs_vm_writepages+0x54/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111f3f9>] do_writepages+0x19/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114632>] __filemap_fdatawrite_range+0xb2/0x100
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114752>] filemap_write_and_wait_range+0x32/0x70
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813545c3>] xfs_file_fsync+0x63/0x1e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a515d>] vfs_fsync_range+0x3d/0xb0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81176429>] ? vfs_write+0x159/0x1a0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a51e7>] vfs_fsync+0x17/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a5223>] do_fsync+0x33/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a558b>] SyS_fsync+0xb/0x10
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a05c1b>] entry_SYSCALL_64_fastpath+0x13/0x8f
Jun 29 08:48:23 lvmmongotest kernel: dd              D ffff88100b473750     0 23989  15703 0x00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88100b473750 ffff88100be87080 ffff881006451680 0000000000000296
Jun 29 08:48:23 lvmmongotest kernel: 0000000000000000 0000000000000296 ffff88100b474000 ffffffff00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88080b9e5600 ffff88080b9e5618 fffffffe00000001 ffff88100b473770
Jun 29 08:48:23 lvmmongotest kernel: Call Trace:
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a02ec4>] schedule+0x34/0x90
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813679b9>] ? xfs_mod_fdblocks+0x49/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a04dba>] rwsem_down_write_failed+0x1aa/0x2f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8141e477>] call_rwsem_down_write_failed+0x17/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81115ec0>] ? mempool_alloc_slab+0x10/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a045af>] down_write+0x1f/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81853c39>] __origin_write+0x69/0x210
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811161b6>] ? mempool_alloc+0x66/0x160
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81854515>] do_origin.isra.14+0x45/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff818549ae>] origin_map+0x5e/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183d704>] __map_bio+0x44/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e0b2>] __split_and_process_bio+0x1e2/0x3f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e95f>] dm_make_request+0x6f/0xd0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee471>] generic_make_request+0xf1/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee57d>] submit_bio+0x6d/0x130
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349b0e>] xfs_submit_ioend_bio.isra.15+0x2e/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349bff>] xfs_submit_ioend+0xef/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134aa5a>] xfs_do_writepage+0x4ea/0x550
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111e216>] write_cache_pages+0x1f6/0x4e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134a570>] ? xfs_aops_discard_page+0x150/0x150
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349c84>] xfs_vm_writepages+0x54/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111f3f9>] do_writepages+0x19/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114632>] __filemap_fdatawrite_range+0xb2/0x100
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114752>] filemap_write_and_wait_range+0x32/0x70
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813545c3>] xfs_file_fsync+0x63/0x1e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a515d>] vfs_fsync_range+0x3d/0xb0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81176429>] ? vfs_write+0x159/0x1a0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a51e7>] vfs_fsync+0x17/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a5223>] do_fsync+0x33/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a558b>] SyS_fsync+0xb/0x10
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a05c1b>] entry_SYSCALL_64_fastpath+0x13/0x8f
Jun 29 08:48:23 lvmmongotest kernel: dd              D ffff880faab7b750     0 23992  15703 0x00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff880faab7b750 ffff88100bec8b40 ffff8810064521c0 0000000000000296
Jun 29 08:48:23 lvmmongotest kernel: 0000000000000000 0000000000000296 ffff880faab7c000 ffffffff00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88080b9e5600 ffff88080b9e5618 fffffffe00000001 ffff880faab7b770
Jun 29 08:48:23 lvmmongotest kernel: Call Trace:
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a02ec4>] schedule+0x34/0x90
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813679b9>] ? xfs_mod_fdblocks+0x49/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a04dba>] rwsem_down_write_failed+0x1aa/0x2f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8141e477>] call_rwsem_down_write_failed+0x17/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81115ec0>] ? mempool_alloc_slab+0x10/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a045af>] down_write+0x1f/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81853c39>] __origin_write+0x69/0x210
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811161b6>] ? mempool_alloc+0x66/0x160
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81854515>] do_origin.isra.14+0x45/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff818549ae>] origin_map+0x5e/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183d704>] __map_bio+0x44/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e0b2>] __split_and_process_bio+0x1e2/0x3f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e95f>] dm_make_request+0x6f/0xd0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee471>] generic_make_request+0xf1/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee57d>] submit_bio+0x6d/0x130
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349b0e>] xfs_submit_ioend_bio.isra.15+0x2e/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349bff>] xfs_submit_ioend+0xef/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134aa5a>] xfs_do_writepage+0x4ea/0x550
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111e216>] write_cache_pages+0x1f6/0x4e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134a570>] ? xfs_aops_discard_page+0x150/0x150
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349c84>] xfs_vm_writepages+0x54/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111f3f9>] do_writepages+0x19/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114632>] __filemap_fdatawrite_range+0xb2/0x100
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114752>] filemap_write_and_wait_range+0x32/0x70
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813545c3>] xfs_file_fsync+0x63/0x1e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a515d>] vfs_fsync_range+0x3d/0xb0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81176429>] ? vfs_write+0x159/0x1a0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a51e7>] vfs_fsync+0x17/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a5223>] do_fsync+0x33/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a558b>] SyS_fsync+0xb/0x10
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a05c1b>] entry_SYSCALL_64_fastpath+0x13/0x8f
Jun 29 08:48:23 lvmmongotest kernel: dd              D ffff88100afab750     0 23995  15703 0x00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88100afab750 ffff88100be86540 ffff881005853840 ffff88102fffb800
Jun 29 08:48:23 lvmmongotest kernel: 000000000000001f ffff88102fffac00 ffff88100afac000 ffffffff00000000
Jun 29 08:48:23 lvmmongotest kernel: ffff88080b9e5600 ffff88080b9e5618 fffffffe00000001 ffff88100afab770
Jun 29 08:48:23 lvmmongotest kernel: Call Trace:
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a02ec4>] schedule+0x34/0x90
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a04dba>] rwsem_down_write_failed+0x1aa/0x2f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134256b>] ? xfs_iext_bno_to_ext+0x8b/0x160
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8141e477>] call_rwsem_down_write_failed+0x17/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81115ec0>] ? mempool_alloc_slab+0x10/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a045af>] down_write+0x1f/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81853c39>] __origin_write+0x69/0x210
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811161b6>] ? mempool_alloc+0x66/0x160
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81854515>] do_origin.isra.14+0x45/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff818549ae>] origin_map+0x5e/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183d704>] __map_bio+0x44/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e0b2>] __split_and_process_bio+0x1e2/0x3f0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8183e95f>] dm_make_request+0x6f/0xd0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee471>] generic_make_request+0xf1/0x190
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813ee57d>] submit_bio+0x6d/0x130
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349b0e>] xfs_submit_ioend_bio.isra.15+0x2e/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349bff>] xfs_submit_ioend+0xef/0x120
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134aa5a>] xfs_do_writepage+0x4ea/0x550
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111e216>] write_cache_pages+0x1f6/0x4e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8134a570>] ? xfs_aops_discard_page+0x150/0x150
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81349c84>] xfs_vm_writepages+0x54/0x80
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff8111f3f9>] do_writepages+0x19/0x30
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114632>] __filemap_fdatawrite_range+0xb2/0x100
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81114752>] filemap_write_and_wait_range+0x32/0x70
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff813545c3>] xfs_file_fsync+0x63/0x1e0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a515d>] vfs_fsync_range+0x3d/0xb0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81176429>] ? vfs_write+0x159/0x1a0
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a51e7>] vfs_fsync+0x17/0x20
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a5223>] do_fsync+0x33/0x60
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff811a558b>] SyS_fsync+0xb/0x10
Jun 29 08:48:23 lvmmongotest kernel: [<ffffffff81a05c1b>] entry_SYSCALL_64_fastpath+0x13/0x8f

I'll see if Mikulas (cc'd) is willing to take a closer look.
Comment 7 Mikulas Patocka 2016-06-29 15:10:33 UTC
It is an old problem, description and fix is here:
https://www.redhat.com/archives/dm-devel/2014-May/msg00089.html
or here:
https://www.redhat.com/archives/dm-devel/2016-June/msg00399.html

Ask the maintainers of the block layer to fix this bug.
Comment 8 filip.havlicek 2016-07-01 09:40:44 UTC
(In reply to Mikulas Patocka from comment #7)
> It is an old problem, description and fix is here:
> https://www.redhat.com/archives/dm-devel/2014-May/msg00089.html
> or here:
> https://www.redhat.com/archives/dm-devel/2016-June/msg00399.html
> 
> Ask the maintainers of the block layer to fix this bug.

After short test with kernel 4.6.3, it looks that fix helps. Thank you.
Should I change component of this bug from LVM2/DM to Block Layer and ask maintainers if is possible to fix it directly in kernel source?
Comment 9 Mike Snitzer 2016-07-01 15:07:22 UTC
(In reply to filip.havlicek from comment #8)
> (In reply to Mikulas Patocka from comment #7)
> > It is an old problem, description and fix is here:
> > https://www.redhat.com/archives/dm-devel/2014-May/msg00089.html
> > or here:
> > https://www.redhat.com/archives/dm-devel/2016-June/msg00399.html
> > 
> > Ask the maintainers of the block layer to fix this bug.
> 
> After short test with kernel 4.6.3, it looks that fix helps. Thank you.
> Should I change component of this bug from LVM2/DM to Block Layer and ask
> maintainers if is possible to fix it directly in kernel source?

I took care of it.  I've also started discussion with Jens.  See:
https://www.redhat.com/archives/dm-devel/2016-June/msg00425.html
Comment 10 filip.havlicek 2016-07-11 07:41:36 UTC
After one week of testing kernel 4.6.3 with fix applied, it's still OK.
Comment 11 Mike Snitzer 2016-07-11 20:05:26 UTC
(In reply to filip.havlicek from comment #10)
> After one week of testing kernel 4.6.3 with fix applied, it's still OK.

We won't be pursuing that fix (from comment#7) because it is considered too gross an implementation (touches cpu scheduler hooks and later variants add the use of a timer).

But I'll backport this candidate patch to the 4.6.3 kernel for you and attach it to this BZ: https://patchwork.kernel.org/patch/9223697/
Comment 12 Mike Snitzer 2016-07-11 20:28:38 UTC
Created attachment 222881 [details]
4.6.3 backport of "block: fix blk_queue_split() resource exhaustion"

This patch wasn't even compile tested.. but please remove your previous patch(es) from comment#7 and apply this.  Then compile and test to see if it also fixes your problem.

Thanks for your help!
Comment 13 filip.havlicek 2016-07-12 06:51:56 UTC
(In reply to Mike Snitzer from comment #12)
> Created attachment 222881 [details]
> 4.6.3 backport of "block: fix blk_queue_split() resource exhaustion"
> 
> This patch wasn't even compile tested.. but please remove your previous
> patch(es) from comment#7 and apply this.  Then compile and test to see if it
> also fixes your problem.
> 
> Thanks for your help!

Thank you too.
Kernel 4.6.3 with patch is compiled. The test is in progress. I will let you know the result.
Comment 14 filip.havlicek 2016-07-13 06:04:09 UTC
(In reply to Mike Snitzer from comment #12)
> Created attachment 222881 [details]
> 4.6.3 backport of "block: fix blk_queue_split() resource exhaustion"
> 
> This patch wasn't even compile tested.. but please remove your previous
> patch(es) from comment#7 and apply this.  Then compile and test to see if it
> also fixes your problem.
> 
> Thanks for your help!

Unfortunately this patch didn't fix our problem. I tried it twice, but both tests failed.
Comment 15 Mike Snitzer 2016-07-13 11:19:06 UTC
(In reply to filip.havlicek from comment #14)
> (In reply to Mike Snitzer from comment #12)
> > Created attachment 222881 [details]
> > 4.6.3 backport of "block: fix blk_queue_split() resource exhaustion"
> > 
> > This patch wasn't even compile tested.. but please remove your previous
> > patch(es) from comment#7 and apply this.  Then compile and test to see if
> it
> > also fixes your problem.
> > 
> > Thanks for your help!
> 
> Unfortunately this patch didn't fix our problem. I tried it twice, but both
> tests failed.

Damn.. certainly unfortunate news.
Comment 16 Mike Snitzer 2016-07-14 18:17:18 UTC
(In reply to Mike Snitzer from comment #15)
> (In reply to filip.havlicek from comment #14)
> > (In reply to Mike Snitzer from comment #12)
> > > Created attachment 222881 [details]
> > > 4.6.3 backport of "block: fix blk_queue_split() resource exhaustion"
> > > 
> > > This patch wasn't even compile tested.. but please remove your previous
> > > patch(es) from comment#7 and apply this.  Then compile and test to see if
> it
> > > also fixes your problem.
> > > 
> > > Thanks for your help!
> > 
> > Unfortunately this patch didn't fix our problem. I tried it twice, but both
> > tests failed.
> 
> Damn.. certainly unfortunate news.

I tried to reproduce using variants of the provided test scripts but unfortunately couldn't (using linux-dm.git's 'for-next' -- which is effectively Linux block and DM for the upcoming 4.8 merge window).

Here are the modified scripts I used (and ran locally in 2 different shells on a single server):

#database.sh
#!/bin/bash

set -xv

#vg=spindle
vg=thin
lvmname=50G
lvmdev=/dev/$vg/$lvmname
snapname=lvmsnapshot
snapdev=/dev/$vg/$snapname
snapmount=/mnt/$snapname
snapdatadir=$snapmount/data

lvcreate -L 50G -n $lvmname -r 32 $vg
mkfs.xfs $lvmdev
mkdir /data
mount -o defaults,noatime $lvmdev /data

dir=/data/data
[ -d "$dir" ] || mkdir $dir
while true; do
    for i in $(seq 0 3); do
        rm $dir/lvmfile${i}old
        mv $dir/lvmfile$i $dir/lvmfile${i}old
        #dd if=/dev/urandom of=$dir/lvmfile$i bs=1M count=300 conv=fsync &
        dd if=/dev/zero of=$dir/lvmfile$i bs=1M count=300 conv=fsync &
    done
    wait
done

#backup.sh
#!/bin/bash

set -xv

ssh="bash -c"
#vg=spindle
vg=thin
lvmname=50G
lvmdev=/dev/$vg/$lvmname
snapname=lvmsnapshot
snapdev=/dev/$vg/$snapname
snapmount=/mnt/$snapname
snapdatadir=$snapmount/data

backupdir=/mnt/dax

#[ -d "$backupdir" ] || mkdir -p $backupdir
echo > counter
$ssh "[ -d "$snapmount" ] || mkdir $snapmount"
while true; do
    date | tee -a counter
    $ssh "lvcreate -s $lvmdev -n $snapname --size 20G"
    $ssh "mount -o ro,nouuid $snapdev $snapmount"
    #rsync -av lvmtestserver:$snapdatadir/ /data/tmp/
    rsync -av $snapdatadir/ $backupdir
    $ssh "umount $snapmount"
    $ssh "lvremove -f $snapdev"
    rm $backupdir/*
    sleep 10
    echo
done
Comment 17 filip.havlicek 2016-07-21 05:31:31 UTC
(In reply to Mike Snitzer from comment #16)
> (In reply to Mike Snitzer from comment #15)
> > (In reply to filip.havlicek from comment #14)
> > > (In reply to Mike Snitzer from comment #12)
> > > > Created attachment 222881 [details]
> > > > 4.6.3 backport of "block: fix blk_queue_split() resource exhaustion"
> > > > 
> > > > This patch wasn't even compile tested.. but please remove your previous
> > > > patch(es) from comment#7 and apply this.  Then compile and test to see
> if it
> > > > also fixes your problem.
> > > > 
> > > > Thanks for your help!
> > > 
> > > Unfortunately this patch didn't fix our problem. I tried it twice, but
> both
> > > tests failed.
> > 
> > Damn.. certainly unfortunate news.
> 
> I tried to reproduce using variants of the provided test scripts but
> unfortunately couldn't (using linux-dm.git's 'for-next' -- which is
> effectively Linux block and DM for the upcoming 4.8 merge window).
> 
> Here are the modified scripts I used (and ran locally in 2 different shells
> on a single server):
> 
> #database.sh
> #!/bin/bash
> 
> set -xv
> 
> #vg=spindle
> vg=thin
> lvmname=50G
> lvmdev=/dev/$vg/$lvmname
> snapname=lvmsnapshot
> snapdev=/dev/$vg/$snapname
> snapmount=/mnt/$snapname
> snapdatadir=$snapmount/data
> 
> lvcreate -L 50G -n $lvmname -r 32 $vg
> mkfs.xfs $lvmdev
> mkdir /data
> mount -o defaults,noatime $lvmdev /data
> 
> dir=/data/data
> [ -d "$dir" ] || mkdir $dir
> while true; do
>     for i in $(seq 0 3); do
>         rm $dir/lvmfile${i}old
>         mv $dir/lvmfile$i $dir/lvmfile${i}old
>         #dd if=/dev/urandom of=$dir/lvmfile$i bs=1M count=300 conv=fsync &
>         dd if=/dev/zero of=$dir/lvmfile$i bs=1M count=300 conv=fsync &
>     done
>     wait
> done
> 
> #backup.sh
> #!/bin/bash
> 
> set -xv
> 
> ssh="bash -c"
> #vg=spindle
> vg=thin
> lvmname=50G
> lvmdev=/dev/$vg/$lvmname
> snapname=lvmsnapshot
> snapdev=/dev/$vg/$snapname
> snapmount=/mnt/$snapname
> snapdatadir=$snapmount/data
> 
> backupdir=/mnt/dax
> 
> #[ -d "$backupdir" ] || mkdir -p $backupdir
> echo > counter
> $ssh "[ -d "$snapmount" ] || mkdir $snapmount"
> while true; do
>     date | tee -a counter
>     $ssh "lvcreate -s $lvmdev -n $snapname --size 20G"
>     $ssh "mount -o ro,nouuid $snapdev $snapmount"
>     #rsync -av lvmtestserver:$snapdatadir/ /data/tmp/
>     rsync -av $snapdatadir/ $backupdir
>     $ssh "umount $snapmount"
>     $ssh "lvremove -f $snapdev"
>     rm $backupdir/*
>     sleep 10
>     echo
> done

Sorry for late answer.
You should run it on 2 servers with ssh. I tried it locally too and problem didn't occur. In production environment we use ssh from backup server to db server.
Comment 18 filip.havlicek 2016-07-29 12:31:36 UTC
Tested on last mainline kernel 4.7 - still the same problem.
Comment 19 Tomas Mozes 2016-10-18 06:36:58 UTC
Filip, do you still have this problem? We also have a problem with system hangs and maybe these are related, so I'm curious to know if anything changed since you reported.
Comment 20 filip.havlicek 2016-10-18 06:59:46 UTC
(In reply to Tomas Mozes from comment #19)
> Filip, do you still have this problem? We also have a problem with system
> hangs and maybe these are related, so I'm curious to know if anything
> changed since you reported.

Yes, we still have this problem (with LVM backup tools - ZRM and own scripts for mongodb backup).
We will test last stable kernels and if problem occurs with them, then we will have to use the patch from comment#7 (which helped, but they don't want it as a part of kernel comment#11).
Comment 21 filip.havlicek 2016-11-11 13:41:05 UTC
It's much worse with 4.8.7. Deadlocks occurs after a few backup cycles.
Comment 22 john 2016-11-29 02:37:02 UTC
also affected by this bug on 4.4.24 (ubuntu 4.4.0-47.68~14.04.1)

filip have you tested the rejected patch ?https://patchwork.kernel.org/patch/7398411/
Comment 23 john 2016-11-29 02:38:35 UTC
Created attachment 246131 [details]
kernel messages
Comment 24 filip.havlicek 2016-11-29 08:07:21 UTC
(In reply to john from comment #22)
> also affected by this bug on 4.4.24 (ubuntu 4.4.0-47.68~14.04.1)
> 
> filip have you tested the rejected patch
> ?https://patchwork.kernel.org/patch/7398411/

I tested this patch: https://www.redhat.com/archives/dm-devel/2016-June/msg00399.html
It's almost the same.
I tested it on 4.4.33 and 4.8.8. Both works.

Now we have 4.8.8 with this patch on one production server. After a bit of additional testing, we will deploy 4.4.X or 4.8.X kernel with this patch to more servers.
Comment 25 john 2016-11-29 10:27:48 UTC
Did you just apply the patch 1/3
https://www.redhat.com/archives/dm-devel/2016-June/msg00399.html

or also 2/3 and 3/3
https://www.redhat.com/archives/dm-devel/2016-June/msg00400.html
https://www.redhat.com/archives/dm-devel/2016-June/msg00401.html

also in the 2nd patch, it only refers to raid1 and raid10. are raid5 and raid6 unaffected?
Comment 26 Mike Snitzer 2016-11-29 19:09:49 UTC
It should be noted that I still have a "v4" of this patchset sitting on this branch:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip

You'll note that it makes use of a timed offload to the workqueue (after 1 sec).  Was meant to address Ming Lei's concerns with reduced plugging due to bio_list flush from a different context (exchange with Ming available in the patchwork patch v3 referenced in comment#22).  BUT Jens disliked this timed offload hack... and that is where this line of work stopped.

But this issue is obviously still impacting users.  We really need to fix block core breakage that was introduced with the initial bio_list rescuer work Kent did rather then to just let users keep hitting this!
Comment 27 john 2016-11-29 23:06:36 UTC
Mike, does the deadlock occur only smp systems ?
Comment 28 filip.havlicek 2016-11-30 07:16:12 UTC
(In reply to john from comment #25)
> Did you just apply the patch 1/3
> https://www.redhat.com/archives/dm-devel/2016-June/msg00399.html
> 
> or also 2/3 and 3/3
> https://www.redhat.com/archives/dm-devel/2016-June/msg00400.html
> https://www.redhat.com/archives/dm-devel/2016-June/msg00401.html
> 
> also in the 2nd patch, it only refers to raid1 and raid10. are raid5 and
> raid6 unaffected?

Just the patch 1/3.

This remains blocking problem for us. It will be great if someone fix this bug in kernel, but until that time we have to deal with it (that's why we have to use the patch from comment#7).
Comment 29 john 2017-01-05 02:47:43 UTC
Hi Jens,

Any progress on this ? It would be great to have lvm snapshot working in the LTS kernel ...
Comment 30 Mike Snitzer 2017-01-05 19:10:58 UTC
(In reply to john from comment #29)
> Hi Jens,
> 
> Any progress on this ? It would be great to have lvm snapshot working in the
> LTS kernel ...

Not speaking for Jens, but any chance you could try this patch that was proposed upstream?: https://marc.info/?l=linux-raid&m=148232453107685&q=p3

(which is from this msg: https://marc.info/?l=linux-raid&m=148232453107685&w=2 )
Comment 31 Mike Snitzer 2017-01-06 18:04:37 UTC
(In reply to Mike Snitzer from comment #30)
> (In reply to john from comment #29)
> > Hi Jens,
> > 
> > Any progress on this ? It would be great to have lvm snapshot working in
> the
> > LTS kernel ...
> 
> Not speaking for Jens, but any chance you could try this patch that was
> proposed upstream?: https://marc.info/?l=linux-raid&m=148232453107685&q=p3
> 
> (which is from this msg:
> https://marc.info/?l=linux-raid&m=148232453107685&w=2 )

Nevermind, based on Mikulas' testing, it unfortunately appears that simple patch doesn't work, see:
https://marc.info/?l=linux-kernel&m=148372148320489&w=2
Comment 32 john 2017-01-08 21:20:53 UTC
Thanks for giving it a try, I'm sure you can appreciate how this regression is an issue in an lts kernel
Comment 33 john 2017-01-25 20:55:49 UTC
Hi Mike,

Do you know if any there's any progress to fix this ?

Thanks _
Comment 34 john 2017-02-23 03:43:21 UTC
Bug 192731 related ?
Comment 35 Mike Snitzer 2017-02-24 16:11:09 UTC
There is a DM-core specific fix that was merged for 4.11 (will be in v4.11-rc1), see:
http://git.kernel.org/linus/d67a5f4b5947aba4bfe9a80a2b86079c215ca755

Please test and report back.
Comment 36 john 2017-02-28 01:55:49 UTC
Thanks Mike, we'll test the backport in 4.4
Comment 37 filip.havlicek 2017-02-28 06:30:04 UTC
Thanks, I will test it during the week and let you know.
Comment 38 filip.havlicek 2017-03-01 08:38:03 UTC
I compiled linux kernel directly from Linus Torvalds tree (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/).
After more than 1000 testing cycles everything is still OK. I think, that problem is solved by this patch.
I will test it one more time on different type of server hardware.
Comment 39 filip.havlicek 2017-03-03 07:26:54 UTC
On the second hardware type was test successful too.
We will wait for release of the kernel 4.11. We will make some tests again, and if it will be successful, we will use this kernel.
Thank you.
Comment 40 Steven Haigh 2017-03-14 01:25:30 UTC
Hi all,

As this seems to have promise, is there a plan / timeframe for inclusion in the 4.4 / 4.9 longterm trees?

It also seems that quite a few Xen users are seeing this as per new bug report:
https://xen.crc.id.au/bugs/view.php?id=81
Comment 41 Mikulas Patocka 2017-03-16 20:41:46 UTC
The patch has been added to 4.4, 4.9 and 4.10 stable branches and it will be in the next release.
Comment 42 filip.havlicek 2017-03-21 10:19:31 UTC
4.4.55 tested and works fine. Thank you. For us, this problem is resolved.

Note You need to log in before you can comment on or make changes to this bug.