Bug 193331 - Hung IO system, tasks blocked with btrfs filesystem and heavy IO
Summary: Hung IO system, tasks blocked with btrfs filesystem and heavy IO
Status: RESOLVED OBSOLETE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: Intel Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-26 21:26 UTC by Cheyenne Wills
Modified: 2018-01-24 07:19 UTC
CC List: 4 users

See Also:
Kernel Version: multiple: 4.4.41 up to 4.9.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
capture of timeout backtrace messages (69.44 KB, application/octet-stream)
2017-01-26 21:26 UTC, Cheyenne Wills
Details
console backtrace from 4.10.1 system (43.97 KB, text/plain)
2017-03-07 18:20 UTC, Cheyenne Wills
Details
full dmesg of my 4.9.18_p14 RT kernel (155.46 KB, text/plain)
2017-04-10 08:36 UTC, Tomas Thiemel
Details
4.12.12 kernel call trace (13.71 KB, text/plain)
2017-10-24 17:43 UTC, Cheyenne Wills
Details
sysrq l output following 4.12.12 stall (711 bytes, text/plain)
2017-10-24 17:44 UTC, Cheyenne Wills
Details
sysrq t output following 4.12.12 stall (107.20 KB, text/plain)
2017-10-24 17:44 UTC, Cheyenne Wills
Details

Description Cheyenne Wills 2017-01-26 21:26:17 UTC
Created attachment 253251 [details]
capture of timeout backtrace messages

Note: this could be a btrfs filesystem issue, but in an online discussion someone suggested starting here.

The common function in the hangs (across the different occasions on which I can reproduce the problem) appears to be blk_flush_plug_list.

Linux kernel: 4.9.5 (gentoo-sources)
btrfs-progs: 4.5

Xen guest (xen-blkfront), which uses blk-mq.
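
One way to confirm that a device is driven through blk-mq (a sketch only; the exact sysfs layout varies by kernel version and the output below is illustrative) is to check for the per-device mq directory in sysfs:

# ls /sys/block/xvdb/mq
0
# cat /sys/block/xvdb/queue/scheduler
none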

15 LUNs in the filesystem.

Initially had a lockup on a 4.4.21 system during heavy IO (it was performing a system backup); the 120-second blocked-task warning occurred. I was able to set up a test environment and recreate the problem using the btrfs scrub command. The problem seems to occur when multiple LUNs are involved.

I've turned on several "kernel hacking" settings to try to gather more information.
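
For context, the "kernel hacking" settings referred to here would typically be the hung-task and lockdep debugging options, along these lines (my assumption of the relevant config, not the reporter's exact settings; the lockdep options are what produce the "Showing all locks held" dump below):

# excerpt of a debug .config (assumed)
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y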

Partial backtrace
===============================
INFO: task btrfs:3268 blocked for more than 120 seconds.
      Tainted: G        W       4.9.5-gentoo #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs           D    0  3268      1 0x00000000
 ffff88045d840000 000000000000dacd ffff8804644381c0 ffff88045612d3c0
 ffff8804663d8218 ffffc90001b7f5d8 ffffffff8178c976 0000000000000246
 ffffe8ff00000002 ffff8804663d8218 0000000000000000 ffff88045612d3c0
Call Trace:
 [<ffffffff8178c976>] ? __schedule+0x2f6/0x830
 [<ffffffff8178cee1>] schedule+0x31/0x80
 [<ffffffff81791156>] schedule_timeout+0x1f6/0x340
 [<ffffffff814368e0>] ? blk_mq_flush_plug_list+0x120/0x130
 [<ffffffff8142b4d9>] ? blk_flush_plug_list+0xc9/0x280
 [<ffffffff8178c645>] io_schedule_timeout+0x65/0xa0
 [<ffffffff81436ff7>] bt_get.isra.13+0x107/0x1a0
 [<ffffffff810c9434>] ? sched_clock_cpu+0x84/0xa0
 [<ffffffff810dd5b0>] ? wake_up_bit+0x30/0x30
 [<ffffffff81437162>] blk_mq_get_tag+0x22/0x90
 [<ffffffff81432edb>] __blk_mq_alloc_request+0x1b/0x220
 [<ffffffff8143341b>] blk_mq_map_request+0x10b/0x1b0
 [<ffffffff81433310>] ? blk_mq_stop_hw_queues+0x40/0x40
 [<ffffffff81435f05>] blk_sq_make_request+0x95/0x280
 [<ffffffff814294fc>] ? blk_queue_enter+0x9c/0x280
 [<ffffffff8142947c>] ? blk_queue_enter+0x1c/0x280
 [<ffffffff814298ab>] generic_make_request+0xcb/0x180
 [<ffffffff814299b7>] submit_bio+0x57/0x110
 [<ffffffff813c3d2d>] scrub_submit+0x2d/0x40
 [<ffffffff813c3f8e>] scrub_add_page_to_rd_bio+0x24e/0x270
 [<ffffffff810e988d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff813c54d7>] scrub_pages+0x1e7/0x400
 [<ffffffff810e988d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff813c2768>] ? scrub_find_csum+0x138/0x150
 [<ffffffff813c6d4b>] scrub_stripe+0x7fb/0x1170
 [<ffffffff810c9434>] ? sched_clock_cpu+0x84/0xa0
 [<ffffffff813c77cd>] scrub_chunk+0x10d/0x160
 [<ffffffff813c7ac4>] scrub_enumerate_chunks+0x2a4/0x640
 [<ffffffff813c99d5>] btrfs_scrub_dev+0x205/0x630
 [<ffffffff810fd7fd>] ? rcu_read_lock_sched_held+0x5d/0x70
 [<ffffffff811b8d03>] ? mnt_want_write_file+0x23/0x50
 [<ffffffff813a2a63>] btrfs_ioctl+0x1b43/0x2700
 [<ffffffff810c9434>] ? sched_clock_cpu+0x84/0xa0
 [<ffffffff810c9347>] ? sched_clock_local+0x17/0x80
 [<ffffffff810e5f52>] ? __lock_is_held+0x52/0x70
 [<ffffffff811a890c>] do_vfs_ioctl+0x8c/0x6a0
 [<ffffffff811b5371>] ? __fget+0x101/0x1f0
 [<ffffffff811b5270>] ? expand_files+0x2a0/0x2a0
 [<ffffffff811a8f5c>] SyS_ioctl+0x3c/0x70
 [<ffffffff817928c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2

Showing all locks held in the system:
3 locks held by kworker/u128:0/6:
 #0:  ("%s-%s""btrfs", name){.+.+.+}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffff813cb2ff>] reada_start_machine_worker+0xaf/0x3a0
3 locks held by kworker/dying/29:
 #0:  (&acct->lock#2){+.+.+.}, at: [<ffffffff8111fffc>] acct_process+0xfc/0x2b0
 #1:  (sb_writers#3){.+.+.+}, at: [<ffffffff8111f7dc>] do_acct_process+0x4bc/0x540
 #2:  (&sb->s_type->i_mutex_key#6){+.+.+.}, at: [<ffffffff8137dfa0>] btrfs_file_write_iter+0x60/0x5f0
3 locks held by kworker/u128:1/33:
 #0:  ("%s-%s""btrfs", name){.+.+.+}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffff813cb2ff>] reada_start_machine_worker+0xaf/0x3a0
3 locks held by kworker/u128:3/127:
 #0:  ("%s-%s""btrfs", name){.+.+.+}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffff813cb2ff>] reada_start_machine_worker+0xaf/0x3a0
2 locks held by khungtaskd/303:
 #0:  (rcu_read_lock){......}, at: [<ffffffff81133d80>] watchdog+0xa0/0x470
 #1:  (tasklist_lock){.+.+..}, at: [<ffffffff810e77cf>] debug_show_all_locks+0x3f/0x1b0
3 locks held by btrfs-transacti/1164:
 #0:  (&fs_info->transaction_kthread_mutex){+.+...}, at: [<ffffffff813609d4>] transaction_kthread+0x54/0x1e0
 #1:  (&fs_info->reloc_mutex){+.+...}, at: [<ffffffff81366172>] btrfs_commit_transaction.part.25+0x352/0xa00
 #2:  (&fs_info->tree_log_mutex){+.+...}, at: [<ffffffff81366203>] btrfs_commit_transaction.part.25+0x3e3/0xa00
1 lock held by btrfs-transacti/1740:
 #0:  (&fs_info->transaction_kthread_mutex){+.+...}, at: [<ffffffff813609d4>] transaction_kthread+0x54/0x1e0
3 locks held by kworker/u128:5/2842:
 #0:  ("%s-%s""btrfs", name){.+.+.+}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffff813cb2ff>] reada_start_machine_worker+0xaf/0x3a0
1 lock held by btrfs/3268:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3269:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3270:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3271:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3272:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3273:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3274:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3275:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3276:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3277:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3278:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3279:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3280:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3281:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3282:
 #0:  (sb_writers#3){.+.+.+}, at: [<ffffffff811b8d03>] mnt_want_write_file+0x23/0x50
1 lock held by btrfs/3283:
 #0:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffff813ca0db>] btrfs_scrub_progress+0x2b/0x150
3 locks held by kworker/u128:6/3284:
 #0:  ("%s-%s""btrfs", name){.+.+.+}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffff813cb2ff>] reada_start_machine_worker+0xaf/0x3a0
3 locks held by kworker/u128:10/3288:
 #0:  ("%s-%s""btrfs", name){.+.+.+}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #1:  ((&work->normal_work)){+.+.+.}, at: [<ffffffff810b4021>] process_one_work+0x141/0x410
 #2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffff813cb2ff>] reada_start_machine_worker+0xaf/0x3a0
1 lock held by ps/3303:
 #0:  (&acct->lock#2){+.+.+.}, at: [<ffffffff8111fffc>] acct_process+0xfc/0x2b0
1 lock held by grep/3304:
 #0:  (&acct->lock#2){+.+.+.}, at: [<ffffffff8111fffc>] acct_process+0xfc/0x2b0



===============================

# cat /proc/cpuinfo  (4 processors, only showing first)

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2637 v3 @ 3.50GHz
stepping	: 2
microcode	: 0x36
cpu MHz		: 3500.077
cache size	: 15360 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid xsaveopt
bugs		:
bogomips	: 7000.07
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:


----------
# btrfs fi show
Label: 'ROOT'  uuid: ec1fab3a-3d9c-4d09-89a8-e8ad6b5bf4e4
    Total devices 1 FS bytes used 9.20GiB
    devid    1 size 28.98GiB used 9.78GiB path /dev/xvda3

Label: 'FTP'  uuid: 7ecefe63-37f1-43d7-8bb1-a111d46c43f7
    Total devices 15 FS bytes used 4.04TiB
    devid    2 size 250.00GiB used 247.00GiB path /dev/xvdk
    devid    3 size 250.00GiB used 247.00GiB path /dev/xvdl
    devid    4 size 500.00GiB used 496.03GiB path /dev/xvde
    devid    5 size 500.00GiB used 496.94GiB path /dev/xvdb
    devid    6 size 100.00GiB used 98.00GiB path /dev/xvdd
    devid    7 size 100.00GiB used 97.00GiB path /dev/xvdg
    devid    9 size 250.00GiB used 247.00GiB path /dev/xvdi
    devid   10 size 250.00GiB used 247.00GiB path /dev/xvdj
    devid   11 size 500.00GiB used 496.00GiB path /dev/xvdm
    devid   12 size 250.00GiB used 247.00GiB path /dev/xvdf
    devid   13 size 500.00GiB used 497.00GiB path /dev/xvdc
    devid   14 size 100.00GiB used 97.00GiB path /dev/xvdh
    devid   15 size 250.00GiB used 247.00GiB path /dev/xvdn
    devid   16 size 500.00GiB used 490.03GiB path /dev/xvdo
    devid   17 size 500.00GiB used 452.06GiB path /dev/xvdp


penguint ~ # btrfs fi usage /media/FTP
Overall:
    Device size:           4.69TiB
    Device unallocated:       97.94GiB
    Device missing:          0.00B
    Used:              4.05TiB
    Free (estimated):        647.80GiB  (min: 598.83GiB)
    Data ratio:               1.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)

Data,single: Size:4.58TiB, Used:4.04TiB
   /dev/xvdb     494.94GiB
   /dev/xvdc     496.00GiB
   /dev/xvdd      98.00GiB
   /dev/xvde     495.00GiB
   /dev/xvdf     247.00GiB
   /dev/xvdg      97.00GiB
   /dev/xvdh      96.00GiB
   /dev/xvdi     247.00GiB
   /dev/xvdj     247.00GiB
   /dev/xvdk     246.00GiB
   /dev/xvdl     247.00GiB
   /dev/xvdm     496.00GiB
   /dev/xvdn     246.00GiB
   /dev/xvdo     489.00GiB
   /dev/xvdp     444.00GiB

Metadata,RAID1: Size:8.00GiB, Used:6.76GiB
   /dev/xvdb       2.00GiB
   /dev/xvdc    1023.00MiB
   /dev/xvde       1.00GiB
   /dev/xvdh       1.00GiB
   /dev/xvdk       1.00GiB
   /dev/xvdn    1023.00MiB
   /dev/xvdo       1.00GiB
   /dev/xvdp       8.00GiB

System,RAID1: Size:64.00MiB, Used:604.00KiB
   /dev/xvde      32.00MiB
   /dev/xvdo      32.00MiB
   /dev/xvdp      64.00MiB

Unallocated:
   /dev/xvdb       3.06GiB
   /dev/xvdc       3.00GiB
   /dev/xvdd       2.00GiB
   /dev/xvde       3.97GiB
   /dev/xvdf       3.00GiB
   /dev/xvdg       3.00GiB
   /dev/xvdh       3.00GiB
   /dev/xvdi       3.00GiB
   /dev/xvdj       3.00GiB
   /dev/xvdk       3.00GiB
   /dev/xvdl       3.00GiB
   /dev/xvdm       4.00GiB
   /dev/xvdn       3.00GiB
   /dev/xvdo       9.97GiB
   /dev/xvdp      47.94GiB

# uname -a
Linux penguint 4.9.5-gentoo #2 SMP Thu Jan 26 09:40:42 MST 2017 x86_64 Intel(R) Xeon(R) CPU E5-2637 v3 @ 3.50GHz GenuineIntel GNU/Linux

# equery list btrfs-progs
 * Searching for btrfs-progs ...
[I-O] [  ] sys-fs/btrfs-progs-4.9:0/0

====
Attached is the kernel message output following the 120-second timeout.

Please let me know if you need additional information.  I can recreate the problem at will by performing a btrfs scrub against the FTP filesystem.
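
For reference, the reproducer is a scrub along the following lines (a sketch, not the exact command line used; the mount point is taken from the btrfs fi usage output above, and -B keeps the scrub in the foreground so the hang is easy to spot):

# btrfs scrub start -B /media/FTP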
Comment 1 Cheyenne Wills 2017-02-03 16:06:30 UTC
Additional information.

I dropped back to a 4.0.5-gentoo kernel and the problem does not seem to happen there.

Also, I had another production lockup (this is an FTP server, and it was under high load at the time of the lockup).


4.4.21-gentoo kernel
===========
Feb  2 15:18:03  INFO: task python2.7:31587 blocked for more than 120 seconds.
Feb  2 15:18:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:18:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:18:03  python2.7       D ffff88020e675000     0 31587  30085 0x00000000
Feb  2 15:18:03   ffff88020e342f00 0000000300000000 ffff88007fe44000 ffff88007fe436d0
Feb  2 15:18:03   7fffffffffffffff ffff8800ea3904c8 ffffe8ffff671080 ffff880205ffa178
Feb  2 15:18:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffffffff8131803a
Feb  2 15:18:03  Call Trace:
Feb  2 15:18:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:18:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:18:03   [<ffffffff8131803a>] ? blk_mq_flush_plug_list+0xde/0xeb
Feb  2 15:18:03   [<ffffffff8130fc76>] ? blk_flush_plug_list+0xab/0x1c1
Feb  2 15:18:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:18:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:18:03   [<ffffffff813188c1>] ? bt_get+0xe9/0x17e
Feb  2 15:18:03   [<ffffffff810a6192>] ? wait_woken+0x66/0x66
Feb  2 15:18:03   [<ffffffff81318a7f>] ? blk_mq_get_tag+0x2a/0x98
Feb  2 15:18:03   [<ffffffff81315606>] ? __blk_mq_alloc_request+0x10/0x1ba
Feb  2 15:18:03   [<ffffffff81316c3e>] ? blk_mq_map_request.isra.48+0xf2/0x12a
Feb  2 15:18:03   [<ffffffff813177e7>] ? blk_sq_make_request+0x67/0x1fe
Feb  2 15:18:03   [<ffffffff8130eaa9>] ? generic_make_request+0xb4/0x153
Feb  2 15:18:03   [<ffffffff8130ec2c>] ? submit_bio+0xe4/0xed
Feb  2 15:18:03   [<ffffffff8129e47f>] ? btrfs_map_bio+0x22b/0x2a8
Feb  2 15:18:03   [<ffffffff8127c33b>] ? btrfs_submit_bio_hook+0xc0/0x14b
Feb  2 15:18:03   [<ffffffff8127db14>] ? btrfs_direct_IO+0x27a/0x27a
Feb  2 15:18:03   [<ffffffff81291376>] ? submit_one_bio+0x65/0x84
Feb  2 15:18:03   [<ffffffff812960f0>] ? extent_readpages+0x165/0x177
Feb  2 15:18:03   [<ffffffff810efdfc>] ? __do_page_cache_readahead+0x147/0x1e7
Feb  2 15:18:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:18:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:18:03   [<ffffffff8113c66a>] ? __generic_file_splice_read+0x2cf/0x4db
Feb  2 15:18:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:18:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:18:03   [<ffffffff8109a1fb>] ? sched_clock_local+0xc/0x6e
Feb  2 15:18:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:18:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:18:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:18:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:18:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:18:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:18:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:18:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:18:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:18:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:18:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:18:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:20:03  INFO: task python2.7:31587 blocked for more than 120 seconds.
Feb  2 15:20:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:20:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:20:03  python2.7       D ffff88020e675000     0 31587  30085 0x00000000
Feb  2 15:20:03   ffff88020e342f00 0000000300000000 ffff88007fe44000 ffff88007fe436d0
Feb  2 15:20:03   7fffffffffffffff ffff8800ea3904c8 ffffe8ffff671080 ffff880205ffa178
Feb  2 15:20:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffffffff8131803a
Feb  2 15:20:03  Call Trace:
Feb  2 15:20:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:20:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:20:03   [<ffffffff8131803a>] ? blk_mq_flush_plug_list+0xde/0xeb
Feb  2 15:20:03   [<ffffffff8130fc76>] ? blk_flush_plug_list+0xab/0x1c1
Feb  2 15:20:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:20:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:20:03   [<ffffffff813188c1>] ? bt_get+0xe9/0x17e
Feb  2 15:20:03   [<ffffffff810a6192>] ? wait_woken+0x66/0x66
Feb  2 15:20:03   [<ffffffff81318a7f>] ? blk_mq_get_tag+0x2a/0x98
Feb  2 15:20:03   [<ffffffff81315606>] ? __blk_mq_alloc_request+0x10/0x1ba
Feb  2 15:20:03   [<ffffffff81316c3e>] ? blk_mq_map_request.isra.48+0xf2/0x12a
Feb  2 15:20:03   [<ffffffff813177e7>] ? blk_sq_make_request+0x67/0x1fe
Feb  2 15:20:03   [<ffffffff8130eaa9>] ? generic_make_request+0xb4/0x153
Feb  2 15:20:03   [<ffffffff8130ec2c>] ? submit_bio+0xe4/0xed
Feb  2 15:20:03   [<ffffffff8129e47f>] ? btrfs_map_bio+0x22b/0x2a8
Feb  2 15:20:03   [<ffffffff8127c33b>] ? btrfs_submit_bio_hook+0xc0/0x14b
Feb  2 15:20:03   [<ffffffff8127db14>] ? btrfs_direct_IO+0x27a/0x27a
Feb  2 15:20:03   [<ffffffff81291376>] ? submit_one_bio+0x65/0x84
Feb  2 15:20:03   [<ffffffff812960f0>] ? extent_readpages+0x165/0x177
Feb  2 15:20:03   [<ffffffff810efdfc>] ? __do_page_cache_readahead+0x147/0x1e7
Feb  2 15:20:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:20:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:20:03   [<ffffffff8113c66a>] ? __generic_file_splice_read+0x2cf/0x4db
Feb  2 15:20:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:20:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:20:03   [<ffffffff8109a1fb>] ? sched_clock_local+0xc/0x6e
Feb  2 15:20:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:20:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:20:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:20:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:20:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:20:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:20:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:20:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:20:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:20:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:20:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:20:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:22:03  INFO: task python2.7:31587 blocked for more than 120 seconds.
Feb  2 15:22:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:22:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:22:03  python2.7       D ffff88020e675000     0 31587  30085 0x00000000
Feb  2 15:22:03   ffff88020e342f00 0000000300000000 ffff88007fe44000 ffff88007fe436d0
Feb  2 15:22:03   7fffffffffffffff ffff8800ea3904c8 ffffe8ffff671080 ffff880205ffa178
Feb  2 15:22:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffffffff8131803a
Feb  2 15:22:03  Call Trace:
Feb  2 15:22:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:22:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:22:03   [<ffffffff8131803a>] ? blk_mq_flush_plug_list+0xde/0xeb
Feb  2 15:22:03   [<ffffffff8130fc76>] ? blk_flush_plug_list+0xab/0x1c1
Feb  2 15:22:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:22:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:22:03   [<ffffffff813188c1>] ? bt_get+0xe9/0x17e
Feb  2 15:22:03   [<ffffffff810a6192>] ? wait_woken+0x66/0x66
Feb  2 15:22:03   [<ffffffff81318a7f>] ? blk_mq_get_tag+0x2a/0x98
Feb  2 15:22:03   [<ffffffff81315606>] ? __blk_mq_alloc_request+0x10/0x1ba
Feb  2 15:22:03   [<ffffffff81316c3e>] ? blk_mq_map_request.isra.48+0xf2/0x12a
Feb  2 15:22:03   [<ffffffff813177e7>] ? blk_sq_make_request+0x67/0x1fe
Feb  2 15:22:03   [<ffffffff8130eaa9>] ? generic_make_request+0xb4/0x153
Feb  2 15:22:03   [<ffffffff8130ec2c>] ? submit_bio+0xe4/0xed
Feb  2 15:22:03   [<ffffffff8129e47f>] ? btrfs_map_bio+0x22b/0x2a8
Feb  2 15:22:03   [<ffffffff8127c33b>] ? btrfs_submit_bio_hook+0xc0/0x14b
Feb  2 15:22:03   [<ffffffff8127db14>] ? btrfs_direct_IO+0x27a/0x27a
Feb  2 15:22:03   [<ffffffff81291376>] ? submit_one_bio+0x65/0x84
Feb  2 15:22:03   [<ffffffff812960f0>] ? extent_readpages+0x165/0x177
Feb  2 15:22:03   [<ffffffff810efdfc>] ? __do_page_cache_readahead+0x147/0x1e7
Feb  2 15:22:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:22:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:22:03   [<ffffffff8113c66a>] ? __generic_file_splice_read+0x2cf/0x4db
Feb  2 15:22:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:22:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:22:03   [<ffffffff8109a1fb>] ? sched_clock_local+0xc/0x6e
Feb  2 15:22:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:22:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:22:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:22:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:22:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:22:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:22:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:22:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:22:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:22:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:22:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:22:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:24:03  INFO: task python2.7:31587 blocked for more than 120 seconds.
Feb  2 15:24:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:24:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:24:03  python2.7       D ffff88020e675000     0 31587  30085 0x00000000
Feb  2 15:24:03   ffff88020e342f00 0000000300000000 ffff88007fe44000 ffff88007fe436d0
Feb  2 15:24:03   7fffffffffffffff ffff8800ea3904c8 ffffe8ffff671080 ffff880205ffa178
Feb  2 15:24:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffffffff8131803a
Feb  2 15:24:03  Call Trace:
Feb  2 15:24:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:24:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:24:03   [<ffffffff8131803a>] ? blk_mq_flush_plug_list+0xde/0xeb
Feb  2 15:24:03   [<ffffffff8130fc76>] ? blk_flush_plug_list+0xab/0x1c1
Feb  2 15:24:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:24:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:24:03   [<ffffffff813188c1>] ? bt_get+0xe9/0x17e
Feb  2 15:24:03   [<ffffffff810a6192>] ? wait_woken+0x66/0x66
Feb  2 15:24:03   [<ffffffff81318a7f>] ? blk_mq_get_tag+0x2a/0x98
Feb  2 15:24:03   [<ffffffff81315606>] ? __blk_mq_alloc_request+0x10/0x1ba
Feb  2 15:24:03   [<ffffffff81316c3e>] ? blk_mq_map_request.isra.48+0xf2/0x12a
Feb  2 15:24:03   [<ffffffff813177e7>] ? blk_sq_make_request+0x67/0x1fe
Feb  2 15:24:03   [<ffffffff8130eaa9>] ? generic_make_request+0xb4/0x153
Feb  2 15:24:03   [<ffffffff8130ec2c>] ? submit_bio+0xe4/0xed
Feb  2 15:24:03   [<ffffffff8129e47f>] ? btrfs_map_bio+0x22b/0x2a8
Feb  2 15:24:03   [<ffffffff8127c33b>] ? btrfs_submit_bio_hook+0xc0/0x14b
Feb  2 15:24:03   [<ffffffff8127db14>] ? btrfs_direct_IO+0x27a/0x27a
Feb  2 15:24:03   [<ffffffff81291376>] ? submit_one_bio+0x65/0x84
Feb  2 15:24:03   [<ffffffff812960f0>] ? extent_readpages+0x165/0x177
Feb  2 15:24:03   [<ffffffff810efdfc>] ? __do_page_cache_readahead+0x147/0x1e7
Feb  2 15:24:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:24:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:24:03   [<ffffffff8113c66a>] ? __generic_file_splice_read+0x2cf/0x4db
Feb  2 15:24:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:24:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:24:03   [<ffffffff8109a1fb>] ? sched_clock_local+0xc/0x6e
Feb  2 15:24:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:24:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:24:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:24:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:24:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:24:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:24:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:24:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:24:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:24:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:24:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:24:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:26:03  INFO: task python2.7:31587 blocked for more than 120 seconds.
Feb  2 15:26:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:26:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:26:03  python2.7       D ffff88020e675000     0 31587  30085 0x00000000
Feb  2 15:26:03   ffff88020e342f00 0000000300000000 ffff88007fe44000 ffff88007fe436d0
Feb  2 15:26:03   7fffffffffffffff ffff8800ea3904c8 ffffe8ffff671080 ffff880205ffa178
Feb  2 15:26:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffffffff8131803a
Feb  2 15:26:03  Call Trace:
Feb  2 15:26:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:26:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:26:03   [<ffffffff8131803a>] ? blk_mq_flush_plug_list+0xde/0xeb
Feb  2 15:26:03   [<ffffffff8130fc76>] ? blk_flush_plug_list+0xab/0x1c1
Feb  2 15:26:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:26:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:26:03   [<ffffffff813188c1>] ? bt_get+0xe9/0x17e
Feb  2 15:26:03   [<ffffffff810a6192>] ? wait_woken+0x66/0x66
Feb  2 15:26:03   [<ffffffff81318a7f>] ? blk_mq_get_tag+0x2a/0x98
Feb  2 15:26:03   [<ffffffff81315606>] ? __blk_mq_alloc_request+0x10/0x1ba
Feb  2 15:26:03   [<ffffffff81316c3e>] ? blk_mq_map_request.isra.48+0xf2/0x12a
Feb  2 15:26:03   [<ffffffff813177e7>] ? blk_sq_make_request+0x67/0x1fe
Feb  2 15:26:03   [<ffffffff8130eaa9>] ? generic_make_request+0xb4/0x153
Feb  2 15:26:03   [<ffffffff8130ec2c>] ? submit_bio+0xe4/0xed
Feb  2 15:26:03   [<ffffffff8129e47f>] ? btrfs_map_bio+0x22b/0x2a8
Feb  2 15:26:03   [<ffffffff8127c33b>] ? btrfs_submit_bio_hook+0xc0/0x14b
Feb  2 15:26:03   [<ffffffff8127db14>] ? btrfs_direct_IO+0x27a/0x27a
Feb  2 15:26:03   [<ffffffff81291376>] ? submit_one_bio+0x65/0x84
Feb  2 15:26:03   [<ffffffff812960f0>] ? extent_readpages+0x165/0x177
Feb  2 15:26:03   [<ffffffff810efdfc>] ? __do_page_cache_readahead+0x147/0x1e7
Feb  2 15:26:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:26:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:26:03   [<ffffffff8113c66a>] ? __generic_file_splice_read+0x2cf/0x4db
Feb  2 15:26:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:26:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:26:03   [<ffffffff8109a1fb>] ? sched_clock_local+0xc/0x6e
Feb  2 15:26:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:26:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:26:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:26:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:26:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:26:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:26:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:26:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:26:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:26:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:26:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:26:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:26:03  INFO: task python2.7:31679 blocked for more than 120 seconds.
Feb  2 15:26:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:26:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:26:03  python2.7       D ffff88020e635000     0 31679  30085 0x00000000
Feb  2 15:26:03   ffff8801fd75a700 00000001fd75a700 ffff88012d018000 ffff88012d017ae0
Feb  2 15:26:03   7fffffffffffffff ffff88012d017b80 0000000000000002 ffffffff81596462
Feb  2 15:26:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffff8800ea598880
Feb  2 15:26:03  Call Trace:
Feb  2 15:26:03   [<ffffffff81596462>] ? bit_wait+0x4b/0x4b
Feb  2 15:26:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:26:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:26:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:26:03   [<ffffffff815390c2>] ? tcp_write_xmit+0x503/0xbfb
Feb  2 15:26:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:26:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:26:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:26:03   [<ffffffff81596474>] ? bit_wait_io+0x12/0x55
Feb  2 15:26:03   [<ffffffff815962fd>] ? __wait_on_bit_lock+0x42/0x83
Feb  2 15:26:03   [<ffffffff810e6aea>] ? __lock_page+0x5c/0x61
Feb  2 15:26:03   [<ffffffff810a61bc>] ? autoremove_wake_function+0x2a/0x2a
Feb  2 15:26:03   [<ffffffff8113c819>] ? __generic_file_splice_read+0x47e/0x4db
Feb  2 15:26:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:26:03   [<ffffffff8109dcca>] ? dequeue_entity+0x731/0x770
Feb  2 15:26:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:26:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:26:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:26:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:26:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:26:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:26:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:26:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:26:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:26:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:26:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:26:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:26:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:28:03  INFO: task python2.7:31587 blocked for more than 120 seconds.
Feb  2 15:28:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:28:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:28:03  python2.7       D ffff88020e675000     0 31587  30085 0x00000000
Feb  2 15:28:03   ffff88020e342f00 0000000300000000 ffff88007fe44000 ffff88007fe436d0
Feb  2 15:28:03   7fffffffffffffff ffff8800ea3904c8 ffffe8ffff671080 ffff880205ffa178
Feb  2 15:28:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffffffff8131803a
Feb  2 15:28:03  Call Trace:
Feb  2 15:28:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:28:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:28:03   [<ffffffff8131803a>] ? blk_mq_flush_plug_list+0xde/0xeb
Feb  2 15:28:03   [<ffffffff8130fc76>] ? blk_flush_plug_list+0xab/0x1c1
Feb  2 15:28:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:28:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:28:03   [<ffffffff813188c1>] ? bt_get+0xe9/0x17e
Feb  2 15:28:03   [<ffffffff810a6192>] ? wait_woken+0x66/0x66
Feb  2 15:28:03   [<ffffffff81318a7f>] ? blk_mq_get_tag+0x2a/0x98
Feb  2 15:28:03   [<ffffffff81315606>] ? __blk_mq_alloc_request+0x10/0x1ba
Feb  2 15:28:03   [<ffffffff81316c3e>] ? blk_mq_map_request.isra.48+0xf2/0x12a
Feb  2 15:28:03   [<ffffffff813177e7>] ? blk_sq_make_request+0x67/0x1fe
Feb  2 15:28:03   [<ffffffff8130eaa9>] ? generic_make_request+0xb4/0x153
Feb  2 15:28:03   [<ffffffff8130ec2c>] ? submit_bio+0xe4/0xed
Feb  2 15:28:03   [<ffffffff8129e47f>] ? btrfs_map_bio+0x22b/0x2a8
Feb  2 15:28:03   [<ffffffff8127c33b>] ? btrfs_submit_bio_hook+0xc0/0x14b
Feb  2 15:28:03   [<ffffffff8127db14>] ? btrfs_direct_IO+0x27a/0x27a
Feb  2 15:28:03   [<ffffffff81291376>] ? submit_one_bio+0x65/0x84
Feb  2 15:28:03   [<ffffffff812960f0>] ? extent_readpages+0x165/0x177
Feb  2 15:28:03   [<ffffffff810efdfc>] ? __do_page_cache_readahead+0x147/0x1e7
Feb  2 15:28:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:28:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:28:03   [<ffffffff8113c66a>] ? __generic_file_splice_read+0x2cf/0x4db
Feb  2 15:28:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:28:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:28:03   [<ffffffff8109a1fb>] ? sched_clock_local+0xc/0x6e
Feb  2 15:28:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:28:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:28:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:28:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:28:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:28:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:28:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:28:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:28:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:28:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:28:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:28:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:28:03  INFO: task python2.7:31679 blocked for more than 120 seconds.
Feb  2 15:28:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:28:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:28:03  python2.7       D ffff88020e635000     0 31679  30085 0x00000000
Feb  2 15:28:03   ffff8801fd75a700 00000001fd75a700 ffff88012d018000 ffff88012d017ae0
Feb  2 15:28:03   7fffffffffffffff ffff88012d017b80 0000000000000002 ffffffff81596462
Feb  2 15:28:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffff8800ea598880
Feb  2 15:28:03  Call Trace:
Feb  2 15:28:03   [<ffffffff81596462>] ? bit_wait+0x4b/0x4b
Feb  2 15:28:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:28:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:28:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:28:03   [<ffffffff815390c2>] ? tcp_write_xmit+0x503/0xbfb
Feb  2 15:28:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:28:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:28:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:28:03   [<ffffffff81596474>] ? bit_wait_io+0x12/0x55
Feb  2 15:28:03   [<ffffffff815962fd>] ? __wait_on_bit_lock+0x42/0x83
Feb  2 15:28:03   [<ffffffff810e6aea>] ? __lock_page+0x5c/0x61
Feb  2 15:28:03   [<ffffffff810a61bc>] ? autoremove_wake_function+0x2a/0x2a
Feb  2 15:28:03   [<ffffffff8113c819>] ? __generic_file_splice_read+0x47e/0x4db
Feb  2 15:28:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:28:03   [<ffffffff8109dcca>] ? dequeue_entity+0x731/0x770
Feb  2 15:28:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:28:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:28:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:28:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:28:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:28:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:28:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:28:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:28:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:28:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:28:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:28:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:28:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:30:03  INFO: task python2.7:31587 blocked for more than 120 seconds.
Feb  2 15:30:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:30:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:30:03  python2.7       D ffff88020e675000     0 31587  30085 0x00000000
Feb  2 15:30:03   ffff88020e342f00 0000000300000000 ffff88007fe44000 ffff88007fe436d0
Feb  2 15:30:03   7fffffffffffffff ffff8800ea3904c8 ffffe8ffff671080 ffff880205ffa178
Feb  2 15:30:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffffffff8131803a
Feb  2 15:30:03  Call Trace:
Feb  2 15:30:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:30:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:30:03   [<ffffffff8131803a>] ? blk_mq_flush_plug_list+0xde/0xeb
Feb  2 15:30:03   [<ffffffff8130fc76>] ? blk_flush_plug_list+0xab/0x1c1
Feb  2 15:30:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:30:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:30:03   [<ffffffff813188c1>] ? bt_get+0xe9/0x17e
Feb  2 15:30:03   [<ffffffff810a6192>] ? wait_woken+0x66/0x66
Feb  2 15:30:03   [<ffffffff81318a7f>] ? blk_mq_get_tag+0x2a/0x98
Feb  2 15:30:03   [<ffffffff81315606>] ? __blk_mq_alloc_request+0x10/0x1ba
Feb  2 15:30:03   [<ffffffff81316c3e>] ? blk_mq_map_request.isra.48+0xf2/0x12a
Feb  2 15:30:03   [<ffffffff813177e7>] ? blk_sq_make_request+0x67/0x1fe
Feb  2 15:30:03   [<ffffffff8130eaa9>] ? generic_make_request+0xb4/0x153
Feb  2 15:30:03   [<ffffffff8130ec2c>] ? submit_bio+0xe4/0xed
Feb  2 15:30:03   [<ffffffff8129e47f>] ? btrfs_map_bio+0x22b/0x2a8
Feb  2 15:30:03   [<ffffffff8127c33b>] ? btrfs_submit_bio_hook+0xc0/0x14b
Feb  2 15:30:03   [<ffffffff8127db14>] ? btrfs_direct_IO+0x27a/0x27a
Feb  2 15:30:03   [<ffffffff81291376>] ? submit_one_bio+0x65/0x84
Feb  2 15:30:03   [<ffffffff812960f0>] ? extent_readpages+0x165/0x177
Feb  2 15:30:03   [<ffffffff810efdfc>] ? __do_page_cache_readahead+0x147/0x1e7
Feb  2 15:30:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:30:03   [<ffffffff810f005d>] ? ondemand_readahead+0x1c1/0x1cf
Feb  2 15:30:03   [<ffffffff8113c66a>] ? __generic_file_splice_read+0x2cf/0x4db
Feb  2 15:30:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:30:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:30:03   [<ffffffff8109a1fb>] ? sched_clock_local+0xc/0x6e
Feb  2 15:30:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:30:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:30:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:30:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:30:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:30:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:30:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:30:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:30:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:30:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:30:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:30:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Feb  2 15:30:03  INFO: task python2.7:31679 blocked for more than 120 seconds.
Feb  2 15:30:03        Tainted: G        W       4.4.21-gentoo #2
Feb  2 15:30:03  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  2 15:30:03  python2.7       D ffff88020e635000     0 31679  30085 0x00000000
Feb  2 15:30:03   ffff8801fd75a700 00000001fd75a700 ffff88012d018000 ffff88012d017ae0
Feb  2 15:30:03   7fffffffffffffff ffff88012d017b80 0000000000000002 ffffffff81596462
Feb  2 15:30:03   ffffffff81595ed5 7fffffffffffffff ffffffff81597d29 ffff8800ea598880
Feb  2 15:30:03  Call Trace:
Feb  2 15:30:03   [<ffffffff81596462>] ? bit_wait+0x4b/0x4b
Feb  2 15:30:03   [<ffffffff81595ed5>] ? schedule+0x6c/0x79
Feb  2 15:30:03   [<ffffffff81597d29>] ? schedule_timeout+0x35/0x1b8
Feb  2 15:30:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:30:03   [<ffffffff815390c2>] ? tcp_write_xmit+0x503/0xbfb
Feb  2 15:30:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:30:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:30:03   [<ffffffff815957a5>] ? io_schedule_timeout+0x5d/0x85
Feb  2 15:30:03   [<ffffffff81596474>] ? bit_wait_io+0x12/0x55
Feb  2 15:30:03   [<ffffffff815962fd>] ? __wait_on_bit_lock+0x42/0x83
Feb  2 15:30:03   [<ffffffff810e6aea>] ? __lock_page+0x5c/0x61
Feb  2 15:30:03   [<ffffffff810a61bc>] ? autoremove_wake_function+0x2a/0x2a
Feb  2 15:30:03   [<ffffffff8113c819>] ? __generic_file_splice_read+0x47e/0x4db
Feb  2 15:30:03   [<ffffffff8113b713>] ? page_cache_pipe_buf_release+0x12/0x12
Feb  2 15:30:03   [<ffffffff8109dcca>] ? dequeue_entity+0x731/0x770
Feb  2 15:30:03   [<ffffffff8109ed05>] ? put_prev_entity+0x60/0x59a
Feb  2 15:30:03   [<ffffffff810079d1>] ? xen_clocksource_read+0x11/0x12
Feb  2 15:30:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:30:03   [<ffffffff810aac61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x1e
Feb  2 15:30:03   [<ffffffff8152c0ea>] ? tcp_poll+0x15d/0x1eb
Feb  2 15:30:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:30:03   [<ffffffff8113c8aa>] ? generic_file_splice_read+0x34/0x56
Feb  2 15:30:03   [<ffffffff8113bab8>] ? splice_direct_to_actor+0xb7/0x18c
Feb  2 15:30:03   [<ffffffff8113c0bb>] ? iter_file_splice_write+0x2aa/0x2aa
Feb  2 15:30:03   [<ffffffff8113bc0b>] ? do_splice_direct+0x7e/0x99
Feb  2 15:30:03   [<ffffffff8111cbbf>] ? do_sendfile+0x176/0x2d2
Feb  2 15:30:03   [<ffffffff8111d3d1>] ? SyS_sendfile64+0x50/0x7c
Feb  2 15:30:03   [<ffffffff8159886e>] ? entry_SYSCALL_64_fastpath+0x12/0x71
Comment 2 Cheyenne Wills 2017-03-07 18:19:01 UTC
Just tried a 4.10.1 kernel. Still getting the same hang (console backtrace attached).

If this is the wrong component, could someone suggest a better place to report this?
Comment 3 Cheyenne Wills 2017-03-07 18:20:06 UTC
Created attachment 255119 [details]
console backtrace from 4.10.1 system
Comment 4 Tomas Thiemel 2017-04-10 08:33:59 UTC
Kernel 4.9.18-rt-rt14 (PREEMPT_RT), same problem

[57681.201181] =============================================

[58008.905418] INFO: task btrfs-transacti:4943 blocked for more than 300 seconds.
[58008.905422]       Not tainted 4.9.18-rt-rt14-xeon #4
[58008.905423] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[58008.905425] btrfs-transacti D    0  4943      2 0x00080000
[58008.905430]  ffff8807f741d780 ffff8807fd2a2f00 000000000000489c ffff8807f9725e00
[58008.905436]  ffff88082f319458 ffffc9001f66fd70 ffffffff8174c892 ffff88082f319458
[58008.905441]  ffff8807f9725e00 ffff8807f9725e00 ffffc9001f66fda0 ffff8807f93b1980
[58008.905446] Call Trace:
[58008.905453]  [<ffffffff8174c892>] ? __schedule+0x1d2/0x550
[58008.905456]  [<ffffffff8174cc57>] schedule+0x47/0xd0
[58008.905461]  [<ffffffff812fd88f>] btrfs_commit_transaction.part.25+0x90f/0xa00
[58008.905464]  [<ffffffff8109fd90>] ? wake_up_bit+0x30/0x30
[58008.905466]  [<ffffffff812fd9b5>] btrfs_commit_transaction+0x35/0x60
[58008.905468]  [<ffffffff812f76e9>] transaction_kthread+0x1a9/0x1e0
[58008.905470]  [<ffffffff812f7540>] ? btrfs_cleanup_transaction+0x520/0x520
[58008.905472]  [<ffffffff810026a5>] ? do_syscall_64+0x75/0x190
[58008.905475]  [<ffffffff8107b1ef>] kthread+0xef/0x110
[58008.905478]  [<ffffffff8107b100>] ? kthread_create_on_node+0x60/0x60
[58008.905481]  [<ffffffff817508d2>] ret_from_fork+0x22/0x30
[58008.905482] 
               Showing all locks held in the system:
[58008.905486] 2 locks held by khungtaskd/304:
[58008.905487]  #0:  (rcu_read_lock){......}, at: [<ffffffff81100a0d>] watchdog+0x9d/0x380
[58008.905493]  #1:  (tasklist_lock){......}, at: [<ffffffff810a483f>] debug_show_all_locks+0x3f/0x1b0
[58008.905503] 2 locks held by btrfs-cleaner/4942:
[58008.905504]  #0:  (&fs_info->cleaner_mutex){......}, at: [<ffffffff812f2fef>] cleaner_kthread+0x8f/0x1b0
[58008.905511]  #1:  (sb_internal){......}, at: [<ffffffff812fdcae>] start_transaction+0x2ce/0x430
[58008.905517] 1 lock held by btrfs-transacti/4943:
[58008.905518]  #0:  (&fs_info->transaction_kthread_mutex){......}, at: [<ffffffff812f7595>] transaction_kthread+0x55/0x1e0
[58008.905529] 1 lock held by qemu-system-x86/6886:
[58008.905530]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905550] 1 lock held by qemu-system-x86/6889:
[58008.905551]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905569] 1 lock held by qemu-system-x86/6892:
[58008.905570]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905584] 1 lock held by qemu-system-x86/6893:
[58008.905584]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905604] 1 lock held by qemu-system-x86/6912:
[58008.905604]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905613] 1 lock held by qemu-system-x86/6913:
[58008.905614]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905626] 1 lock held by qemu-system-x86/6914:
[58008.905627]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905640] 1 lock held by qemu-system-x86/6915:
[58008.905641]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905655] 1 lock held by qemu-system-x86/6916:
[58008.905656]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905670] 1 lock held by qemu-system-x86/6917:
[58008.905670]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905686] 1 lock held by qemu-system-x86/7275:
[58008.905687]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905701] 1 lock held by qemu-system-x86/7277:
[58008.905702]  #0:  (&vcpu->mutex){......}, at: [<ffffffffa0379115>] vcpu_load+0x15/0x90 [kvm]
[58008.905717] 2 locks held by bash/26312:
[58008.905718]  #0:  (&tty->ldisc_sem){......}, at: [<ffffffff8174f4de>] ldsem_down_read+0xe/0x10
[58008.905724]  #1:  (&ldata->atomic_read_lock){......}, at: [<ffffffff814ac95d>] n_tty_read+0xbd/0x920
[58008.905733] 2 locks held by bash/32506:
[58008.905733]  #0:  (&tty->ldisc_sem){......}, at: [<ffffffff8174f4de>] ldsem_down_read+0xe/0x10
[58008.905737]  #1:  (&ldata->atomic_read_lock){......}, at: [<ffffffff814ac95d>] n_tty_read+0xbd/0x920
[58008.905741] 2 locks held by bash/548:
[58008.905741]  #0:  (&tty->ldisc_sem){......}, at: [<ffffffff8174f4de>] ldsem_down_read+0xe/0x10
[58008.905745]  #1:  (&ldata->atomic_read_lock){......}, at: [<ffffffff814ac95d>] n_tty_read+0xbd/0x920
[58008.905750] 3 locks held by rsync/3388:
[58008.905751]  #0:  (sb_writers#8){......}, at: [<ffffffff8119ba81>] do_sys_ftruncate.constprop.17+0xb1/0xe0
[58008.905758]  #1:  (&sb->s_type->i_mutex_key#9){......}, at: [<ffffffff8119b752>] do_truncate+0x62/0xc0
[58008.905764]  #2:  (sb_internal){......}, at: [<ffffffff812fdcae>] start_transaction+0x2ce/0x430
[58008.905772] 3 locks held by rsync/5224:
[58008.905773]  #0:  (sb_writers#8){......}, at: [<ffffffff8119ba81>] do_sys_ftruncate.constprop.17+0xb1/0xe0
[58008.905779]  #1:  (&sb->s_type->i_mutex_key#9){......}, at: [<ffffffff8119b752>] do_truncate+0x62/0xc0
[58008.905785]  #2:  (sb_internal){......}, at: [<ffffffff812fdcae>] start_transaction+0x2ce/0x430
[58008.905793] 3 locks held by rsync/6413:
[58008.905794]  #0:  (sb_writers#8){......}, at: [<ffffffff8119ba81>] do_sys_ftruncate.constprop.17+0xb1/0xe0
[58008.905801]  #1:  (&sb->s_type->i_mutex_key#9){......}, at: [<ffffffff8119b752>] do_truncate+0x62/0xc0
[58008.905807]  #2:  (sb_internal){......}, at: [<ffffffff812fdcae>] start_transaction+0x2ce/0x430
[58008.905813] 3 locks held by rsync/6664:
[58008.905814]  #0:  (sb_writers#8){......}, at: [<ffffffff811c070f>] mnt_want_write+0x1f/0x50
[58008.905820]  #1:  (&type->i_mutex_dir_key#2/1){......}, at: [<ffffffff811abe8a>] filename_create+0x7a/0x160
[58008.905828]  #2:  (sb_internal){......}, at: [<ffffffff812fdcae>] start_transaction+0x2ce/0x430
[58008.905835] 2 locks held by bash/6772:
[58008.905836]  #0:  (&tty->ldisc_sem){......}, at: [<ffffffff8174f4de>] ldsem_down_read+0xe/0x10
[58008.905841]  #1:  (&ldata->atomic_read_lock){......}, at: [<ffffffff814ac95d>] n_tty_read+0xbd/0x920

[58008.905848] =============================================
Comment 5 Tomas Thiemel 2017-04-10 08:36:56 UTC
Created attachment 255805 [details]
full dmesg of my 4.9.18_p14 RT kernel
Comment 6 Tomas Mozes 2017-05-26 10:40:04 UTC
Maybe related: https://bugzilla.kernel.org/show_bug.cgi?id=194967

Observed hangs on different hardware, with both ext4 and xfs, on kernel 4.4. It does not happen with kernel 4.1.

Do you still have this problem?
Comment 7 Cheyenne Wills 2017-05-26 10:47:14 UTC
(In reply to Tomas Mozes from comment #6)
> Maybe related https://bugzilla.kernel.org/show_bug.cgi?id=194967
> 
> Observed hangs on different hardware, both ext4/xfs on kernel 4.4. It does
> not happen with kernel 4.1.
> 
> Do you still have this problem?

I haven't tried anything newer than 4.10.1 yet.
Comment 8 Cheyenne Wills 2017-10-24 17:42:00 UTC
Still having the same problem with the 4.12.12 kernel:

Attaching the output of the initial stall, the output of sysrq-l, and the output of sysrq-t.
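
The sysrq dumps were gathered roughly as follows (a sketch, assuming the magic sysrq interface is enabled):

# echo 1 > /proc/sys/kernel/sysrq     # enable all sysrq functions
# echo l > /proc/sysrq-trigger        # backtrace of all active CPUs (sysrq-l)
# echo t > /proc/sysrq-trigger        # state and stack of every task (sysrq-t)
# dmesg > sysrq-output.txt            # capture the resulting kernel messages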
Comment 9 Cheyenne Wills 2017-10-24 17:43:55 UTC
Created attachment 260367 [details]
4.12.12 kernel call trace

4.12.12 kernel call trace following a btrfs scrub on a multi-LUN filesystem under Xen
Comment 10 Cheyenne Wills 2017-10-24 17:44:21 UTC
Created attachment 260369 [details]
sysrq l output following 4.12.12 stall
Comment 11 Cheyenne Wills 2017-10-24 17:44:51 UTC
Created attachment 260371 [details]
sysrq t output following 4.12.12 stall
Comment 12 Tomas Mozes 2017-10-25 08:18:49 UTC
So it seems like #194967 is not related to this as I cannot reproduce it with kernel 4.12+.
Comment 13 Cheyenne Wills 2017-10-26 19:21:49 UTC
It's possible that 4.13 might have fixed the problem. I am investigating.
Comment 14 Cheyenne Wills 2017-10-26 21:03:39 UTC
It appears that this might have been a Xen block-front/block-back issue of some sort, related to the size of the grant table.

After testing a 4.13 kernel, the btrfs scrub (which takes about 5 minutes) was no longer locking up. However, I started seeing a bunch of Xen messages about hitting a limit on the Xen grant table:

...
xen:grant_table: xen/grant-table: max_grant_frames reached cur=32 extra=1 limit=32 gnttab_free_count=0 req_entries=29
xen:grant_table: xen/grant-table: max_grant_frames reached cur=32 extra=1 limit=32 gnttab_free_count=0 req_entries=29
xen:grant_table: xen/grant-table: max_grant_frames reached cur=32 extra=1 limit=32 gnttab_free_count=0 req_entries=29
xen:grant_table: xen/grant-table: max_grant_frames reached cur=32 extra=1 limit=32 gnttab_free_count=0 req_entries=29
...


I rebooted the Xen host with an increased maximum grant table size, which got rid of the message. While investigating the grant table size I found a hit on a Novell site that discusses a hang under heavy IO caused by a shortage of grant table frames (https://www.novell.com/support/kb/doc.php?id=7018590).

What I suspect at this point is that the problem described on the Novell site was the core problem I was having. Something in the 4.13 kernel "fixed" things so it wouldn't hang, which allowed the system to continue and reach the point where it could generate the above messages about the grant tables.

I ran an additional test with a 4.12 kernel (which had been hanging), but with the host's gnttab_max_frames increased. I wasn't able to reproduce the original hang.

So, to summarize: something in 4.13 "fixed" a hang caused by the limited number of Xen grant frames (allowing the system to report the error instead). By increasing gnttab_max_frames on the Xen host, the guest appears to function correctly under heavy IO load (even with a pre-4.13 kernel).


I'm closing this issue because 1) the hang itself doesn't occur under a 4.13 kernel, and 2) with a prior kernel, increasing gnttab_max_frames appears to fix the hang.
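
For anyone hitting the same thing: gnttab_max_frames is a Xen hypervisor boot parameter, so the change is made on the host. A sketch of what that looks like on a GRUB-based Dom0 (the exact value and config location depend on the Xen version and distribution; 256 is only an example value):

# /etc/default/grub on the Dom0 -- append to the Xen command line:
GRUB_CMDLINE_XEN_DEFAULT="... gnttab_max_frames=256"
# then regenerate the GRUB config (update-grub or grub2-mkconfig) and reboot the host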
Comment 15 Tomas Mozes 2017-10-27 06:05:24 UTC
Thanks for the info.
Comment 16 Tomas Mozes 2018-01-24 07:19:04 UTC
It seems kernel 4.13 added an optimization, but it was not a permanent fix:
https://www.mail-archive.com/xen-devel@lists.xen.org/msg116412.html

So with kernel 4.13 a hang is less likely, but hangs still happen. Xen 4.10 increased the gnttab limit to 64, and we have monitoring via the xen-diag tool, which can inspect the grant table.

Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554
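
The grant-table monitoring mentioned above refers to the xen-diag utility shipped with Xen 4.10; my understanding of its invocation (an assumption -- check the tool's own help output) is:

# xen-diag gnttab_query_size <domid>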
