Bug 199435 - HPSA + P420i resetting logical Direct-Access never complete
Summary: HPSA + P420i resetting logical Direct-Access never complete
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: linux-scsi@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-18 08:19 UTC by Anthony Hausman
Modified: 2020-06-17 09:18 UTC
CC List: 5 users

See Also:
Kernel Version: 4.11.0-14-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Patch to use local work-queue instead of system work-queue (5.88 KB, patch)
2018-04-18 17:26 UTC, Don
Details | Diff
Latest out of box hpsa driver. (83.42 KB, application/x-bzip)
2018-04-21 23:31 UTC, Don
Details
Load on server during reset problem (47.54 KB, image/png)
2018-05-02 08:47 UTC, Anthony Hausman
Details
Patch to correct resets (15.46 KB, patch)
2019-04-23 20:36 UTC, Don
Details | Diff

Description Anthony Hausman 2018-04-18 08:19:04 UTC
I'm using kernel 4.11.0-14-generic with the latest hpsa driver, compiled from the latest commit in Torvalds' GitHub tree:

https://github.com/torvalds/linux/commit/8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef#diff-7a84fb366ebc08b575a832f0aeee3434

I'm using a Smart Array P420i, Firmware Version 8.32.
When a logical-device reset is triggered, it never completes and the server starts to experience heavy load (the load average can rise to 3000).
After the reset, some tasks begin to time out, but I think that is just a consequence of the reset (cmaeventd is the process that checks controller status):

Apr 18 01:28:53 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
Apr 18 01:29:16 kernel: INFO: task cmaeventd:3397 blocked for more than 120 seconds.
Apr 18 01:29:16 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 18 01:29:16 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 18 01:29:16 kernel: cmaeventd       D    0  3397      1 0x00000000
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel:  __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel:  schedule+0x36/0x80
Apr 18 01:29:16 kernel:  scsi_block_when_processing_errors+0xd5/0x110
Apr 18 01:29:16 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel:  sg_open+0x14a/0x5c0
Apr 18 01:29:16 kernel:  ? lookup_fast+0xd8/0x3b0
Apr 18 01:29:16 kernel:  ? refcount_inc+0x9/0x40
Apr 18 01:29:16 kernel:  chrdev_open+0xbf/0x1b0
Apr 18 01:29:16 kernel:  do_dentry_open+0x208/0x310
Apr 18 01:29:16 kernel:  ? cdev_put+0x30/0x30
Apr 18 01:29:16 kernel:  vfs_open+0x4e/0x80
Apr 18 01:29:16 kernel:  path_openat+0x2ac/0x1450
Apr 18 01:29:16 kernel:  do_filp_open+0x99/0x110
Apr 18 01:29:16 kernel:  ? __check_object_size+0x108/0x19e
Apr 18 01:29:16 kernel:  ? __alloc_fd+0x46/0x170
Apr 18 01:29:16 kernel:  do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel:  ? do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel:  ? __put_cred+0x3d/0x50
Apr 18 01:29:16 kernel:  ? SyS_access+0x1e8/0x230
Apr 18 01:29:16 kernel:  SyS_open+0x1e/0x20
Apr 18 01:29:16 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 18 01:29:16 kernel: RIP: 0033:0x7f413c901be0
Apr 18 01:29:16 kernel: RSP: 002b:00007ffc0c1cd5b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
Apr 18 01:29:16 kernel: RAX: ffffffffffffffda RBX: 00000000025f7a40 RCX: 00007f413c901be0
Apr 18 01:29:16 kernel: RDX: 0000000000000008 RSI: 0000000000000002 RDI: 00007ffc0c1cd5f0
Apr 18 01:29:16 kernel: RBP: 0000000002563b40 R08: 0000000000000001 R09: 0000000000000000
Apr 18 01:29:16 kernel: R10: 00007f413c8ea760 R11: 0000000000000246 R12: 00007ffc0c1cd7b0
Apr 18 01:29:16 kernel: R13: 0000000000000001 R14: 00007ffc0c1cd700 R15: 00007ffc0c1cd830
Apr 18 01:29:16 kernel: INFO: task cmaidad:3442 blocked for more than 120 seconds.
Apr 18 01:29:16 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 18 01:29:16 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 18 01:29:16 kernel: cmaidad         D    0  3442      1 0x00000000
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel:  __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel:  schedule+0x36/0x80
Apr 18 01:29:16 kernel:  scsi_block_when_processing_errors+0xd5/0x110
Apr 18 01:29:16 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel:  sg_open+0x14a/0x5c0
Apr 18 01:29:16 kernel:  ? lookup_fast+0xd8/0x3b0
Apr 18 01:29:16 kernel:  ? refcount_inc+0x9/0x40
Apr 18 01:29:16 kernel:  chrdev_open+0xbf/0x1b0
Apr 18 01:29:16 kernel:  do_dentry_open+0x208/0x310
Apr 18 01:29:16 kernel:  ? cdev_put+0x30/0x30
Apr 18 01:29:16 kernel:  vfs_open+0x4e/0x80
Apr 18 01:29:16 kernel:  path_openat+0x2ac/0x1450
Apr 18 01:29:16 kernel:  do_filp_open+0x99/0x110
Apr 18 01:29:16 kernel:  ? ipcperms+0x94/0x100
Apr 18 01:29:16 kernel:  ? __check_object_size+0x108/0x19e
Apr 18 01:29:16 kernel:  ? __alloc_fd+0x46/0x170
Apr 18 01:29:16 kernel:  do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel:  ? do_sys_open+0x12d/0x280
Apr 18 01:29:16 kernel:  ? __put_cred+0x3d/0x50
Apr 18 01:29:16 kernel:  ? SyS_access+0x1e8/0x230
Apr 18 01:29:16 kernel:  SyS_open+0x1e/0x20
Apr 18 01:29:16 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 18 01:29:16 kernel: RIP: 0033:0x7ff5af4cdbe0
Apr 18 01:29:16 kernel: RSP: 002b:00007fff8eac8818 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
Apr 18 01:29:16 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ff5af4cdbe0
Apr 18 01:29:16 kernel: RDX: 0000000000000008 RSI: 0000000000000002 RDI: 00007fff8eac8850
Apr 18 01:29:16 kernel: RBP: 0000000002372870 R08: 0000000000000001 R09: 00007ff5af4b77b8
Apr 18 01:29:16 kernel: R10: 00007ff5af4b6760 R11: 0000000000000246 R12: 0000000002372878
Apr 18 01:29:16 kernel: R13: 0000000000000005 R14: 00007ff5b00018c0 R15: 0000000000000000
Apr 18 01:29:16 kernel: INFO: task jbd2/sdam-8:9965 blocked for more than 120 seconds.
Apr 18 01:29:16 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 18 01:29:16 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 18 01:29:16 kernel: jbd2/sdam-8     D    0  9965      2 0x00000000
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel:  __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel:  schedule+0x36/0x80
Apr 18 01:29:16 kernel:  jbd2_journal_commit_transaction+0x241/0x1830
Apr 18 01:29:16 kernel:  ? update_load_avg+0x84/0x560
Apr 18 01:29:16 kernel:  ? update_load_avg+0x84/0x560
Apr 18 01:29:16 kernel:  ? dequeue_entity+0xed/0x4c0
Apr 18 01:29:16 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel:  ? lock_timer_base+0x7d/0xa0
Apr 18 01:29:16 kernel:  kjournald2+0xca/0x250
Apr 18 01:29:16 kernel:  ? kjournald2+0xca/0x250
Apr 18 01:29:16 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel:  kthread+0x109/0x140
Apr 18 01:29:16 kernel:  ? commit_timeout+0x10/0x10
Apr 18 01:29:16 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 18 01:29:16 kernel:  ret_from_fork+0x25/0x30

The only way to get back to normal is to reboot the server.
Hope this helps somebody. If there is any more info I can provide, just ask for whatever would be useful.
Comment 1 Don 2018-04-18 14:27:07 UTC
Do you see any lockup messages in the console logs?
"Controller lockup detected"...


So the driver you used is from the 4.16 kernel, running on a 4.11 kernel? I have not tested this configuration.

I notice that the driver is still using the system work-queue for monitoring. I will be sending up a patch soon to change this to local work-queues. Perhaps you can test this patch? It may help uncover more information about what is happening.

Also, after you rebooted, were there any lockup entries in the iLO IML log?
Comment 2 Anthony Hausman 2018-04-18 14:38:22 UTC
Unfortunately, I don't have any "Controller lockup detected" message in the syslog.
In the iLO IML log, the last message was about the cache module:

CAUTION: POST Messages - POST Error: 1792-Slot X Drive Array - Valid Data Found in Cache Module. Data will automatically be written to drive array..

There are no lockup entries.

Indeed, we took the driver from the latest kernel and compiled it for 4.11.
I am ready to test the patch you are proposing.
Where can I retrieve it?
Comment 3 Don 2018-04-18 17:26:11 UTC
Created attachment 275437 [details]
Patch to use local work-queue instead of system work-queue

If the driver initiates a re-scan from a system work-queue, the kernel can hang.

This patch has not yet been submitted to linux-scsi; I will be sending it out soon.
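For context, the change described here follows the usual kernel pattern for moving driver work off the shared system work-queue onto a driver-private one. A minimal sketch of that pattern (the names `my_rescan_wq`, `my_rescan_work`, and `my_rescan_worker` are illustrative placeholders, not the actual hpsa identifiers, and this is not the patch itself):

```c
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_rescan_wq;  /* driver-private work-queue */
static struct work_struct my_rescan_work;

static void my_rescan_worker(struct work_struct *work)
{
	/* Re-scan logic runs here. Because it executes on a private queue,
	 * a stalled re-scan cannot block unrelated items on system_wq. */
}

static int __init my_driver_init(void)
{
	/* WQ_MEM_RECLAIM guarantees forward progress under memory pressure,
	 * which storage drivers generally need. */
	my_rescan_wq = alloc_workqueue("my_rescan", WQ_MEM_RECLAIM, 0);
	if (!my_rescan_wq)
		return -ENOMEM;
	INIT_WORK(&my_rescan_work, my_rescan_worker);
	return 0;
}

static void __exit my_driver_exit(void)
{
	/* Flushes pending work and frees the private queue. */
	destroy_workqueue(my_rescan_wq);
}
```

A re-scan would then be submitted with `queue_work(my_rescan_wq, &my_rescan_work)` instead of `schedule_work(&my_rescan_work)`, which targets the shared system queue.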
Comment 4 Don 2018-04-18 17:28:13 UTC
Your stack traces do not show any hpsa driver components, but I do see the reset being issued and never completing.

I'm hoping that the attached patch helps diagnose the issue a little better.
Comment 5 Anthony Hausman 2018-04-19 09:11:42 UTC
Don,

I have applied the patch; it runs, and I am now trying to reproduce the problem.
I'll let you know what the diagnosis shows.
Comment 6 Anthony Hausman 2018-04-19 10:35:28 UTC
I have a stack trace involving a workqueue:

Apr 19 11:22:52 kernel: INFO: task kworker/u129:28:428 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: kworker/u129:28 D    0   428      2 0x00000000
Apr 19 11:22:52 kernel: Workqueue: writeback wb_workfn (flush-67:80)
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? find_get_pages_tag+0x19f/0x2b0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_writepages+0x4e6/0xe20
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_writepages+0x4e6/0xe20
Apr 19 11:22:52 kernel:  ? generic_writepages+0x67/0x90
Apr 19 11:22:52 kernel:  ? sd_init_command+0x30/0xb0
Apr 19 11:22:52 kernel:  do_writepages+0x1e/0x30
Apr 19 11:22:52 kernel:  ? do_writepages+0x1e/0x30
Apr 19 11:22:52 kernel:  __writeback_single_inode+0x45/0x330
Apr 19 11:22:52 kernel:  writeback_sb_inodes+0x26a/0x5f0
Apr 19 11:22:52 kernel:  __writeback_inodes_wb+0x92/0xc0
Apr 19 11:22:52 kernel:  wb_writeback+0x26e/0x320
Apr 19 11:22:52 kernel:  wb_workfn+0x2cf/0x3a0
Apr 19 11:22:52 kernel:  ? wb_workfn+0x2cf/0x3a0
Apr 19 11:22:52 kernel:  process_one_work+0x16b/0x4a0
Apr 19 11:22:52 kernel:  worker_thread+0x4b/0x500
Apr 19 11:22:52 kernel:  kthread+0x109/0x140
Apr 19 11:22:52 kernel:  ? process_one_work+0x4a0/0x4a0
Apr 19 11:22:52 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 19 11:22:52 kernel:  ret_from_fork+0x25/0x30
Apr 19 11:22:52 kernel: INFO: task jbd2/sdbb-8:10556 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: jbd2/sdbb-8     D    0 10556      2 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  ? update_cfs_rq_load_avg.constprop.91+0x227/0x4e0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  jbd2_journal_commit_transaction+0x241/0x1830
Apr 19 11:22:52 kernel:  ? update_load_avg+0x84/0x560
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  ? lock_timer_base+0x7d/0xa0
Apr 19 11:22:52 kernel:  kjournald2+0xca/0x250
Apr 19 11:22:52 kernel:  ? kjournald2+0xca/0x250
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  kthread+0x109/0x140
Apr 19 11:22:52 kernel:  ? commit_timeout+0x10/0x10
Apr 19 11:22:52 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 19 11:22:52 kernel:  ret_from_fork+0x25/0x30
Apr 19 11:22:52 kernel: INFO: task task:14138 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14138  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? autoremove_wake_function+0x40/0x40
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? dquot_file_open+0x3d/0x50
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __mark_inode_dirty+0x176/0x370
Apr 19 11:22:52 kernel:  generic_update_time+0x7b/0xd0
Apr 19 11:22:52 kernel:  ? current_time+0x38/0x80
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  file_update_time+0xb7/0x110
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 19 11:22:52 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 19 11:22:52 kernel:  ? __slab_free+0x9e/0x2e0
Apr 19 11:22:52 kernel:  new_sync_write+0xd3/0x130
Apr 19 11:22:52 kernel:  __vfs_write+0x26/0x40
Apr 19 11:22:52 kernel:  vfs_write+0xb8/0x1b0
Apr 19 11:22:52 kernel:  ? do_sys_open+0x1b4/0x280
Apr 19 11:22:52 kernel:  SyS_pwrite64+0x95/0xb0
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ed23
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a95aabb90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ed23
Apr 19 11:22:52 kernel: RDX: 0000000000000018 RSI: 00007f9a95aabcc0 RDI: 000000000000001a
Apr 19 11:22:52 kernel: RBP: 00007f9a20000f30 R08: 00007f9a95aabc18 R09: 00007f9a95aabb30
Apr 19 11:22:52 kernel: R10: 00000000000000c0 R11: 0000000000000293 R12: 00007f9a20000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab0e3a0 R15: 00007f9a20000a30
Apr 19 11:22:52 kernel: INFO: task task:14159 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14159  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? schedule+0x36/0x80
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ebfd
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a912a2bf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ebfd
Apr 19 11:22:52 kernel: RDX: 0000000000000180 RSI: 0000000000000041 RDI: 00007f9a912a2d50
Apr 19 11:22:52 kernel: RBP: 00007f99f4000f30 R08: 0000000000000000 R09: 0000000000000001
Apr 19 11:22:52 kernel: R10: 00000000000c58da R11: 0000000000000293 R12: 00007f99f4000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab44360 R15: 00007f99f4000a30
Apr 19 11:22:52 kernel: INFO: task task:14163 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14163  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? autoremove_wake_function+0x40/0x40
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? dquot_file_open+0x3d/0x50
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __mark_inode_dirty+0x176/0x370
Apr 19 11:22:52 kernel:  generic_update_time+0x7b/0xd0
Apr 19 11:22:52 kernel:  ? current_time+0x38/0x80
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  file_update_time+0xb7/0x110
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 19 11:22:52 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 19 11:22:52 kernel:  ? __slab_free+0x178/0x2e0
Apr 19 11:22:52 kernel:  new_sync_write+0xd3/0x130
Apr 19 11:22:52 kernel:  __vfs_write+0x26/0x40
Apr 19 11:22:52 kernel:  vfs_write+0xb8/0x1b0
Apr 19 11:22:52 kernel:  ? do_sys_open+0x1b4/0x280
Apr 19 11:22:52 kernel:  SyS_pwrite64+0x95/0xb0
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ed23
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a902a0b90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ed23
Apr 19 11:22:52 kernel: RDX: 0000000000000018 RSI: 00007f9a902a0cc0 RDI: 0000000000000015
Apr 19 11:22:52 kernel: RBP: 00007f99ec000f30 R08: 00007f9a902a0c18 R09: 00007f9a902a0b30
Apr 19 11:22:52 kernel: R10: 00000000000000f0 R11: 0000000000000293 R12: 00007f99ec000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab14ad0 R15: 00007f99ec000a30
Apr 19 11:22:52 kernel: INFO: task task:14190 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14190  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? autoremove_wake_function+0x40/0x40
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? dquot_file_open+0x3d/0x50
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __mark_inode_dirty+0x176/0x370
Apr 19 11:22:52 kernel:  generic_update_time+0x7b/0xd0
Apr 19 11:22:52 kernel:  ? current_time+0x38/0x80
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  file_update_time+0xb7/0x110
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 19 11:22:52 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 19 11:22:52 kernel:  ? __slab_free+0x9e/0x2e0
Apr 19 11:22:52 kernel:  new_sync_write+0xd3/0x130
Apr 19 11:22:52 kernel:  __vfs_write+0x26/0x40
Apr 19 11:22:52 kernel:  vfs_write+0xb8/0x1b0
Apr 19 11:22:52 kernel:  ? do_sys_open+0x1b4/0x280
Apr 19 11:22:52 kernel:  SyS_pwrite64+0x95/0xb0
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ed23
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a8b296b90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ed23
Apr 19 11:22:52 kernel: RDX: 0000000000000018 RSI: 00007f9a8b296cc0 RDI: 0000000000000012
Apr 19 11:22:52 kernel: RBP: 00007f99c4000f30 R08: 00007f9a8b296c18 R09: 00007f9a8b296b30
Apr 19 11:22:52 kernel: R10: 0000000000000120 R11: 0000000000000293 R12: 00007f99c4000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab218f0 R15: 00007f99c4000a30
Apr 19 11:22:52 kernel: INFO: task task:14203 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14203  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? schedule+0x36/0x80
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ebfd
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a89a93bf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ebfd
Apr 19 11:22:52 kernel: RDX: 0000000000000180 RSI: 0000000000000041 RDI: 00007f9a89a93d50
Apr 19 11:22:52 kernel: RBP: 00007f99c0001010 R08: 0000000000000000 R09: 0000000000000001
Apr 19 11:22:52 kernel: R10: 00000000000c9c34 R11: 0000000000000293 R12: 00007f99c0000e80
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab0d8c0 R15: 00007f99c0000a30
Apr 19 11:22:52 kernel: INFO: task task:14207 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14207  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? schedule+0x36/0x80
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ebfd
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a89292bf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ebfd
Apr 19 11:22:52 kernel: RDX: 0000000000000180 RSI: 0000000000000041 RDI: 00007f9a89292d50
Apr 19 11:22:52 kernel: RBP: 00007f99b4000f40 R08: 0000000000000000 R09: 0000000000000001
Apr 19 11:22:52 kernel: R10: 00000000000c2428 R11: 0000000000000293 R12: 00007f99b4000f00
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab07c48 R15: 00007f99b4000a60
Apr 19 11:22:52 kernel: INFO: task task:14213 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14213  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? schedule+0x36/0x80
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ebfd
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a912a2bf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ebfd
Apr 19 11:22:52 kernel: RDX: 0000000000000180 RSI: 0000000000000041 RDI: 00007f9a912a2d50
Apr 19 11:22:52 kernel: RBP: 00007f99f4000f30 R08: 0000000000000000 R09: 0000000000000001
Apr 19 11:22:52 kernel: R10: 00000000000c58da R11: 0000000000000293 R12: 00007f99f4000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab44360 R15: 00007f99f4000a30
Apr 19 11:22:52 kernel: INFO: task task:14163 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14163  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? autoremove_wake_function+0x40/0x40
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? dquot_file_open+0x3d/0x50
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __mark_inode_dirty+0x176/0x370
Apr 19 11:22:52 kernel:  generic_update_time+0x7b/0xd0
Apr 19 11:22:52 kernel:  ? current_time+0x38/0x80
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  file_update_time+0xb7/0x110
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 19 11:22:52 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 19 11:22:52 kernel:  ? __slab_free+0x178/0x2e0
Apr 19 11:22:52 kernel:  new_sync_write+0xd3/0x130
Apr 19 11:22:52 kernel:  __vfs_write+0x26/0x40
Apr 19 11:22:52 kernel:  vfs_write+0xb8/0x1b0
Apr 19 11:22:52 kernel:  ? do_sys_open+0x1b4/0x280
Apr 19 11:22:52 kernel:  SyS_pwrite64+0x95/0xb0
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ed23
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a902a0b90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ed23
Apr 19 11:22:52 kernel: RDX: 0000000000000018 RSI: 00007f9a902a0cc0 RDI: 0000000000000015
Apr 19 11:22:52 kernel: RBP: 00007f99ec000f30 R08: 00007f9a902a0c18 R09: 00007f9a902a0b30
Apr 19 11:22:52 kernel: R10: 00000000000000f0 R11: 0000000000000293 R12: 00007f99ec000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab14ad0 R15: 00007f99ec000a30
Apr 19 11:22:52 kernel: INFO: task task:14190 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14190  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? autoremove_wake_function+0x40/0x40
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? dquot_file_open+0x3d/0x50
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __mark_inode_dirty+0x176/0x370
Apr 19 11:22:52 kernel:  generic_update_time+0x7b/0xd0
Apr 19 11:22:52 kernel:  ? current_time+0x38/0x80
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  file_update_time+0xb7/0x110
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 19 11:22:52 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 19 11:22:52 kernel:  ? __slab_free+0x9e/0x2e0
Apr 19 11:22:52 kernel:  new_sync_write+0xd3/0x130
Apr 19 11:22:52 kernel:  __vfs_write+0x26/0x40
Apr 19 11:22:52 kernel:  vfs_write+0xb8/0x1b0
Apr 19 11:22:52 kernel:  ? do_sys_open+0x1b4/0x280
Apr 19 11:22:52 kernel:  SyS_pwrite64+0x95/0xb0
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ed23
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a8b296b90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ed23
Apr 19 11:22:52 kernel: RDX: 0000000000000018 RSI: 00007f9a8b296cc0 RDI: 0000000000000012
Apr 19 11:22:52 kernel: RBP: 00007f99c4000f30 R08: 00007f9a8b296c18 R09: 00007f9a8b296b30
Apr 19 11:22:52 kernel: R10: 0000000000000120 R11: 0000000000000293 R12: 00007f99c4000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab218f0 R15: 00007f99c4000a30
Apr 19 11:22:52 kernel: INFO: task task:14203 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14203  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? schedule+0x36/0x80
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ebfd
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a89a93bf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ebfd
Apr 19 11:22:52 kernel: RDX: 0000000000000180 RSI: 0000000000000041 RDI: 00007f9a89a93d50
Apr 19 11:22:52 kernel: RBP: 00007f99c0001010 R08: 0000000000000000 R09: 0000000000000001
Apr 19 11:22:52 kernel: R10: 00000000000c9c34 R11: 0000000000000293 R12: 00007f99c0000e80
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab0d8c0 R15: 00007f99c0000a30
Apr 19 11:22:52 kernel: INFO: task task:14207 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14207  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? schedule+0x36/0x80
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ebfd
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a89292bf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ebfd
Apr 19 11:22:52 kernel: RDX: 0000000000000180 RSI: 0000000000000041 RDI: 00007f9a89292d50
Apr 19 11:22:52 kernel: RBP: 00007f99b4000f40 R08: 0000000000000000 R09: 0000000000000001
Apr 19 11:22:52 kernel: R10: 00000000000c2428 R11: 0000000000000293 R12: 00007f99b4000f00
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab07c48 R15: 00007f99b4000a60
Apr 19 11:22:52 kernel: INFO: task task:14213 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14213  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? schedule+0x36/0x80
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? out_of_line_wait_on_bit+0x82/0xb0
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  __ext4_new_inode+0x7b0/0x1420
Apr 19 11:22:52 kernel:  ext4_create+0x110/0x1b0
Apr 19 11:22:52 kernel:  path_openat+0x133b/0x1450
Apr 19 11:22:52 kernel:  do_filp_open+0x99/0x110
Apr 19 11:22:52 kernel:  ? __check_object_size+0x108/0x19e
Apr 19 11:22:52 kernel:  ? __alloc_fd+0x46/0x170
Apr 19 11:22:52 kernel:  do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  ? do_sys_open+0x12d/0x280
Apr 19 11:22:52 kernel:  SyS_open+0x1e/0x20
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ebfd
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a86a8dbf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ebfd
Apr 19 11:22:52 kernel: RDX: 0000000000000180 RSI: 0000000000000041 RDI: 00007f9a86a8dd50
Apr 19 11:22:52 kernel: RBP: 00007f99a8000f30 R08: 0000000000000000 R09: 0000000000000001
Apr 19 11:22:52 kernel: R10: 00000000000c595e R11: 0000000000000293 R12: 00007f99a8000ef0
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9ab2a780 R15: 00007f99a8000a30
Apr 19 11:22:52 kernel: INFO: task task:14238 blocked for more than 120 seconds.
Apr 19 11:22:52 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 19 11:22:52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 11:22:52 kernel: task            D    0 14238  14058 0x00000000
Apr 19 11:22:52 kernel: Call Trace:
Apr 19 11:22:52 kernel:  __schedule+0x3b9/0x8f0
Apr 19 11:22:52 kernel:  schedule+0x36/0x80
Apr 19 11:22:52 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 19 11:22:52 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 19 11:22:52 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 19 11:22:52 kernel:  ? __getblk_gfp+0x2f/0x350
Apr 19 11:22:52 kernel:  start_this_handle+0x103/0x3f0
Apr 19 11:22:52 kernel:  ? dquot_file_open+0x3d/0x50
Apr 19 11:22:52 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 19 11:22:52 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 19 11:22:52 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 19 11:22:52 kernel:  ext4_dirty_inode+0x32/0x70
Apr 19 11:22:52 kernel:  __mark_inode_dirty+0x176/0x370
Apr 19 11:22:52 kernel:  generic_update_time+0x7b/0xd0
Apr 19 11:22:52 kernel:  ? current_time+0x38/0x80
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  file_update_time+0xb7/0x110
Apr 19 11:22:52 kernel:  ? ext4_xattr_security_set+0x30/0x30
Apr 19 11:22:52 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 19 11:22:52 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 19 11:22:52 kernel:  ? __slab_free+0x178/0x2e0
Apr 19 11:22:52 kernel:  new_sync_write+0xd3/0x130
Apr 19 11:22:52 kernel:  __vfs_write+0x26/0x40
Apr 19 11:22:52 kernel:  vfs_write+0xb8/0x1b0
Apr 19 11:22:52 kernel:  ? do_sys_open+0x1b4/0x280
Apr 19 11:22:52 kernel:  SyS_pwrite64+0x95/0xb0
Apr 19 11:22:52 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 19 11:22:52 kernel: RIP: 0033:0x7f9a9968ed23
Apr 19 11:22:52 kernel: RSP: 002b:00007f9a7da7bb90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 19 11:22:52 kernel: RAX: ffffffffffffffda RBX: 00007f9a6001c7b8 RCX: 00007f9a9968ed23
Apr 19 11:22:52 kernel: RDX: 0000000000000018 RSI: 00007f9a7da7bcc0 RDI: 000000000000000c
Apr 19 11:22:52 kernel: RBP: 00007f9964001060 R08: 00007f9a7da7bc18 R09: 00007f9a7da7bb30
Apr 19 11:22:52 kernel: R10: 00000000000000c0 R11: 0000000000000293 R12: 00007f9964001020
Apr 19 11:22:52 kernel: R13: 00007f9a6001c7d0 R14: 00007f9a9aafeb08 R15: 00007f9964000b60

Nothing was reported by hpsa itself.
This stack trace had no visible consequence on the server: the hpsa utilities reported no offline or critical disks, and there was no heavy load.
Comment 7 Anthony Hausman 2018-04-20 13:09:22 UTC
I had a similar stack trace:

Apr 20 14:57:18 kernel: INFO: task jbd2/sdt-8:10890 blocked for more than 120 seconds.
Apr 20 14:57:18 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:57:18 kernel: jbd2/sdt-8      D    0 10890      2 0x00000000
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  jbd2_journal_commit_transaction+0x241/0x1830
Apr 20 14:57:18 kernel:  ? update_load_avg+0x84/0x560
Apr 20 14:57:18 kernel:  ? update_load_avg+0x84/0x560
Apr 20 14:57:18 kernel:  ? dequeue_entity+0xed/0x4c0
Apr 20 14:57:18 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 20 14:57:18 kernel:  ? lock_timer_base+0x7d/0xa0
Apr 20 14:57:18 kernel:  kjournald2+0xca/0x250
Apr 20 14:57:18 kernel:  ? kjournald2+0xca/0x250
Apr 20 14:57:18 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 20 14:57:18 kernel:  kthread+0x109/0x140
Apr 20 14:57:18 kernel:  ? commit_timeout+0x10/0x10
Apr 20 14:57:18 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 20 14:57:18 kernel:  ret_from_fork+0x25/0x30
Apr 20 14:57:18 kernel: INFO: task task:13497 blocked for more than 120 seconds.
Apr 20 14:57:18 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:57:18 kernel: task            D    0 13497  13196 0x00000000
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  rwsem_down_write_failed+0x237/0x3b0
Apr 20 14:57:18 kernel:  ? copy_page_to_iter_iovec+0x97/0x170
Apr 20 14:57:18 kernel:  call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  ? call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  down_write+0x2d/0x40
Apr 20 14:57:18 kernel:  ext4_file_write_iter+0x70/0x3c0
Apr 20 14:57:18 kernel:  ? futex_wake+0x90/0x170
Apr 20 14:57:18 kernel:  new_sync_write+0xd3/0x130
Apr 20 14:57:18 kernel:  __vfs_write+0x26/0x40
Apr 20 14:57:18 kernel:  vfs_write+0xb8/0x1b0
Apr 20 14:57:18 kernel:  SyS_pwrite64+0x95/0xb0
Apr 20 14:57:18 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23
Apr 20 14:57:18 kernel: RSP: 002b:00007fa0801acc90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 20 14:57:18 kernel: RAX: ffffffffffffffda RBX: 00007fa0480009d0 RCX: 00007fa085d92d23
Apr 20 14:57:18 kernel: RDX: 0000000000000200 RSI: 00007fa004000b30 RDI: 000000000000000f
Apr 20 14:57:18 kernel: RBP: 00007fa0801ad060 R08: 00007fa0801acd2c R09: 0000000000000001
Apr 20 14:57:18 kernel: R10: 00000001f86be000 R11: 0000000000000293 R12: 00007fa0040014c0
Apr 20 14:57:18 kernel: R13: 00007fa004000d80 R14: 000000000000002e R15: 00007fa0480009d0
Apr 20 14:57:18 kernel: INFO: task task:13499 blocked for more than 120 seconds.
Apr 20 14:57:18 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:57:18 kernel: task            D    0 13499  13196 0x00000000
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  rwsem_down_write_failed+0x237/0x3b0
Apr 20 14:57:18 kernel:  ? copy_page_to_iter_iovec+0x97/0x170
Apr 20 14:57:18 kernel:  call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  ? call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  down_write+0x2d/0x40
Apr 20 14:57:18 kernel:  ext4_file_write_iter+0x70/0x3c0
Apr 20 14:57:18 kernel:  ? futex_wake+0x90/0x170
Apr 20 14:57:18 kernel:  new_sync_write+0xd3/0x130
Apr 20 14:57:18 kernel:  __vfs_write+0x26/0x40
Apr 20 14:57:18 kernel:  vfs_write+0xb8/0x1b0
Apr 20 14:57:18 kernel:  SyS_pwrite64+0x95/0xb0
Apr 20 14:57:18 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23
Apr 20 14:57:18 kernel: RSP: 002b:00007fa07f9abc90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 20 14:57:18 kernel: RAX: ffffffffffffffda RBX: 00007f9fac008d00 RCX: 00007fa085d92d23
Apr 20 14:57:18 kernel: RDX: 0000000000000200 RSI: 00007fa0080013b0 RDI: 000000000000000f
Apr 20 14:57:18 kernel: RBP: 00007fa07f9ac060 R08: 00007fa07f9abd2c R09: 0000000000000001
Apr 20 14:57:18 kernel: R10: 0000000219541000 R11: 0000000000000293 R12: 00007fa008001140
Apr 20 14:57:18 kernel: R13: 00007fa0080008c0 R14: 000000000000002e R15: 00007f9fac008d00
Apr 20 14:57:18 kernel: INFO: task task:13510 blocked for more than 120 seconds.
Apr 20 14:57:18 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:57:18 kernel: task            D    0 13510  13196 0x00000000
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  rwsem_down_write_failed+0x237/0x3b0
Apr 20 14:57:18 kernel:  ? copy_page_to_iter_iovec+0x97/0x170
Apr 20 14:57:18 kernel:  call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  ? call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  down_write+0x2d/0x40
Apr 20 14:57:18 kernel:  ext4_file_write_iter+0x70/0x3c0
Apr 20 14:57:18 kernel:  ? futex_wake+0x90/0x170
Apr 20 14:57:18 kernel:  new_sync_write+0xd3/0x130
Apr 20 14:57:18 kernel:  __vfs_write+0x26/0x40
Apr 20 14:57:18 kernel:  vfs_write+0xb8/0x1b0
Apr 20 14:57:18 kernel:  SyS_pwrite64+0x95/0xb0
Apr 20 14:57:18 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23
Apr 20 14:57:18 kernel: RSP: 002b:00007fa07a9a1c90 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 20 14:57:18 kernel: RAX: ffffffffffffffda RBX: 00007f9fe0007700 RCX: 00007fa085d92d23
Apr 20 14:57:18 kernel: RDX: 0000000000000200 RSI: 00007f9fe0000b30 RDI: 000000000000000f
Apr 20 14:57:18 kernel: RBP: 00007f9fe0007970 R08: 00007fa07a9a1d2c R09: 0000000000000001
Apr 20 14:57:18 kernel: R10: 00000001b890c000 R11: 0000000000000293 R12: 0000000000000270
Apr 20 14:57:18 kernel: R13: 0000000000000060 R14: 00007f9fe0007ec0 R15: 00007f9fe0000020
Apr 20 14:57:18 kernel: INFO: task task:13585 blocked for more than 120 seconds.
Apr 20 14:57:18 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:57:18 kernel: task            D    0 13585  13196 0x00000000
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  ? __dev_queue_xmit+0x268/0x680
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 20 14:57:18 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 20 14:57:18 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 20 14:57:18 kernel:  start_this_handle+0x103/0x3f0
Apr 20 14:57:18 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 20 14:57:18 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 20 14:57:18 kernel:  ? ext4_dirty_inode+0x32/0x70
Apr 20 14:57:18 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 20 14:57:18 kernel:  ext4_dirty_inode+0x32/0x70
Apr 20 14:57:18 kernel:  __mark_inode_dirty+0x176/0x370
Apr 20 14:57:18 kernel:  generic_update_time+0x7b/0xd0
Apr 20 14:57:18 kernel:  ? current_time+0x38/0x80
Apr 20 14:57:18 kernel:  file_update_time+0xb7/0x110
Apr 20 14:57:18 kernel:  __generic_file_write_iter+0x9d/0x1f0
Apr 20 14:57:18 kernel:  ext4_file_write_iter+0x21a/0x3c0
Apr 20 14:57:18 kernel:  ? futex_wake+0x90/0x170
Apr 20 14:57:18 kernel:  new_sync_write+0xd3/0x130
Apr 20 14:57:18 kernel:  __vfs_write+0x26/0x40
Apr 20 14:57:18 kernel:  vfs_write+0xb8/0x1b0
Apr 20 14:57:18 kernel:  ? SyS_futex+0x7f/0x180
Apr 20 14:57:18 kernel:  SyS_pwrite64+0x95/0xb0
Apr 20 14:57:18 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23
Apr 20 14:57:18 kernel: RSP: 002b:00007fa06c184d70 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 20 14:57:18 kernel: RAX: ffffffffffffffda RBX: 00007f9f640014e0 RCX: 00007fa085d92d23
Apr 20 14:57:18 kernel: RDX: 0000000000000200 RSI: 00007f9f64001750 RDI: 000000000000000f
Apr 20 14:57:18 kernel: RBP: 00007f9f640017b8 R08: 00007fa06c184e0c R09: 0000000000000001
Apr 20 14:57:18 kernel: R10: 0000000166f7b000 R11: 0000000000000293 R12: 0000000000000024
Apr 20 14:57:18 kernel: R13: 0000005d6f4d6d18 R14: 000000000000001c R15: 00007f9f8c005d60
Apr 20 14:57:18 kernel: INFO: task statsync:17472 blocked for more than 120 seconds.
Apr 20 14:57:18 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:57:18 kernel: statsync        D    0 17472  13196 0x00000000
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  rwsem_down_write_failed+0x237/0x3b0
Apr 20 14:57:18 kernel:  call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  ? call_rwsem_down_write_failed+0x17/0x30
Apr 20 14:57:18 kernel:  down_write+0x2d/0x40
Apr 20 14:57:18 kernel:  ext4_file_write_iter+0x70/0x3c0
Apr 20 14:57:18 kernel:  new_sync_write+0xd3/0x130
Apr 20 14:57:18 kernel:  __vfs_write+0x26/0x40
Apr 20 14:57:18 kernel:  vfs_write+0xb8/0x1b0
Apr 20 14:57:18 kernel:  ? SyS_futex+0x7f/0x180
Apr 20 14:57:18 kernel:  SyS_pwrite64+0x95/0xb0
Apr 20 14:57:18 kernel:  entry_SYSCALL_64_fastpath+0x1e/0xad
Apr 20 14:57:18 kernel: RIP: 0033:0x7fa085d92d23
Apr 20 14:57:18 kernel: RSP: 002b:00007fa050ff8020 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Apr 20 14:57:18 kernel: RAX: ffffffffffffffda RBX: 00007fa050ff80c0 RCX: 00007fa085d92d23
Apr 20 14:57:18 kernel: RDX: 0000000000000038 RSI: 00007fa050ff8130 RDI: 000000000000000f
Apr 20 14:57:18 kernel: RBP: 00007fa04c00fcd0 R08: 00007fa050ff80bc R09: 0000000000000001
Apr 20 14:57:18 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007fa04c0008c0
Apr 20 14:57:18 kernel: R13: 00007fa050ff80c0 R14: 00007fa04c01b560 R15: 00007fa04c000998
Apr 20 14:57:18 kernel: INFO: task kworker/u130:6:47634 blocked for more than 120 seconds.
Apr 20 14:57:18 kernel:       Tainted: G           OE   4.11.0-14-generic #20~16.04.1-Ubuntu
Apr 20 14:57:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 14:57:18 kernel: kworker/u130:6  D    0 47634      2 0x00000000
Apr 20 14:57:18 kernel: Workqueue: writeback wb_workfn (flush-65:48)
Apr 20 14:57:18 kernel: Call Trace:
Apr 20 14:57:18 kernel:  __schedule+0x3b9/0x8f0
Apr 20 14:57:18 kernel:  ? blk_queue_bio+0x1df/0x430
Apr 20 14:57:18 kernel:  schedule+0x36/0x80
Apr 20 14:57:18 kernel:  wait_transaction_locked+0x8a/0xd0
Apr 20 14:57:18 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 20 14:57:18 kernel:  add_transaction_credits+0x1c1/0x2a0
Apr 20 14:57:18 kernel:  ? __slab_alloc+0x20/0x40
Apr 20 14:57:18 kernel:  start_this_handle+0x103/0x3f0
Apr 20 14:57:18 kernel:  ? mempool_alloc+0x6e/0x170
Apr 20 14:57:18 kernel:  ? kmem_cache_alloc+0xd7/0x1b0
Apr 20 14:57:18 kernel:  jbd2__journal_start+0xdb/0x1f0
Apr 20 14:57:18 kernel:  ? ext4_writepages+0x4e6/0xe20
Apr 20 14:57:18 kernel:  __ext4_journal_start_sb+0x6d/0x120
Apr 20 14:57:18 kernel:  ext4_writepages+0x4e6/0xe20
Apr 20 14:57:18 kernel:  ? swiotlb_unmap_sg_attrs+0x40/0x70
Apr 20 14:57:18 kernel:  ? scsi_dma_map+0x98/0xc0
Apr 20 14:57:18 kernel:  ? __enqueue_cmd_and_start_io.isra.41+0x77/0x150 [hpsa]
Apr 20 14:57:18 kernel:  ? enqueue_cmd_and_start_io+0x18/0x80 [hpsa]
Apr 20 14:57:18 kernel:  ? hpsa_ciss_submit+0x31b/0x400 [hpsa]
Apr 20 14:57:18 kernel:  do_writepages+0x1e/0x30
Apr 20 14:57:18 kernel:  ? do_writepages+0x1e/0x30
Apr 20 14:57:18 kernel:  __writeback_single_inode+0x45/0x330
Apr 20 14:57:18 kernel:  writeback_sb_inodes+0x26a/0x5f0
Apr 20 14:57:18 kernel:  __writeback_inodes_wb+0x92/0xc0
Apr 20 14:57:18 kernel:  wb_writeback+0x26e/0x320
Apr 20 14:57:18 kernel:  wb_workfn+0x2cf/0x3a0
Apr 20 14:57:18 kernel:  ? wb_workfn+0x2cf/0x3a0
Apr 20 14:57:18 kernel:  process_one_work+0x16b/0x4a0
Apr 20 14:57:18 kernel:  worker_thread+0x4b/0x500
Apr 20 14:57:18 kernel:  kthread+0x109/0x140
Apr 20 14:57:18 kernel:  ? process_one_work+0x4a0/0x4a0
Apr 20 14:57:18 kernel:  ? kthread_create_on_node+0x70/0x70
Apr 20 14:57:18 kernel:  ret_from_fork+0x25/0x30

And once again, it caused no visible damage on the server: no offline or critical disk reported by the hpsa utilities, and no heavy load. But the trace points directly at the hpsa module and the workqueue.
Comment 8 Anthony Hausman 2018-04-21 08:12:48 UTC
So I have reproduced the problem with the patched driver.
At first, one disk returned a lot of "blk_update_request: critical medium error" / "Unrecovered read error" messages, after which the driver triggered a logical reset on every disk.

On the first pass all the resets completed successfully, but at the third reset of the problematic disk the system hung and the reset never completed.

The load on the server is lower this time, but applications still seem to have their I/O stuck.
And the faulty disk is still reported as healthy by the HP utilities (ssacli).

Here is the kernel log:

[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] Unaligned partial completion (resid=32, sector_sz=512)
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 Sense Key : Medium Error [current] 
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 Add. Sense: Unrecovered read error
[Fri Apr 20 20:56:58 2018] sd 0:1:0:15: [sdp] tag#50 CDB: Read(16) 88 00 00 00 00 02 36 46 b5 a8 00 00 04 00 00 00
[Fri Apr 20 20:56:58 2018] blk_update_request: critical medium error, dev sdp, sector 9500538280
[Fri Apr 20 20:57:30 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 20:59:06 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 20:59:06 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] Unaligned partial completion (resid=198, sector_sz=512)
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 Sense Key : Medium Error [current] 
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 Add. Sense: Unrecovered read error
[Fri Apr 20 21:00:05 2018] sd 0:1:0:15: [sdp] tag#7 CDB: Read(16) 88 00 00 00 00 02 36 46 b9 a8 00 00 04 00 00 00
[Fri Apr 20 21:00:05 2018] blk_update_request: critical medium error, dev sdp, sector 9500539304
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] Unaligned partial completion (resid=48, sector_sz=512)
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 Sense Key : Medium Error [current] 
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 Add. Sense: Unrecovered read error
[Fri Apr 20 21:00:56 2018] sd 0:1:0:15: [sdp] tag#2 CDB: Read(16) 88 00 00 00 00 02 36 46 a9 a8 00 00 04 00 00 00
[Fri Apr 20 21:00:56 2018] blk_update_request: critical medium error, dev sdp, sector 9500535208
[Fri Apr 20 21:09:59 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 21:48:43 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 21:48:43 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 21:51:44 2018] hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:05 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:05 2018] hpsa 0000:08:00.0: scsi 0:1:0:0: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:05 2018] hpsa 0000:08:00.0: scsi 0:1:0:1: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:06 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:06 2018] hpsa 0000:08:00.0: scsi 0:1:0:1: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:06 2018] hpsa 0000:08:00.0: scsi 0:1:0:2: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:07 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:07 2018] hpsa 0000:08:00.0: scsi 0:1:0:2: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:07 2018] hpsa 0000:08:00.0: scsi 0:1:0:3: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:08 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:08 2018] hpsa 0000:08:00.0: scsi 0:1:0:3: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:08 2018] hpsa 0000:08:00.0: scsi 0:1:0:4: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:09 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:09 2018] hpsa 0000:08:00.0: scsi 0:1:0:4: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:09 2018] hpsa 0000:08:00.0: scsi 0:1:0:6: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:10 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:10 2018] hpsa 0000:08:00.0: scsi 0:1:0:6: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:10 2018] hpsa 0000:08:00.0: scsi 0:1:0:7: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:11 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:11 2018] hpsa 0000:08:00.0: scsi 0:1:0:7: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:11 2018] hpsa 0000:08:00.0: scsi 0:1:0:8: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:12 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:12 2018] hpsa 0000:08:00.0: scsi 0:1:0:8: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:12 2018] hpsa 0000:08:00.0: scsi 0:1:0:9: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:13 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:13 2018] hpsa 0000:08:00.0: scsi 0:1:0:9: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:13 2018] hpsa 0000:08:00.0: scsi 0:1:0:10: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:14 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:14 2018] hpsa 0000:08:00.0: scsi 0:1:0:10: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:14 2018] hpsa 0000:08:00.0: scsi 0:1:0:11: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:15 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:15 2018] hpsa 0000:08:00.0: scsi 0:1:0:11: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:15 2018] hpsa 0000:08:00.0: scsi 0:1:0:12: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:16 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:16 2018] hpsa 0000:08:00.0: scsi 0:1:0:12: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:16 2018] hpsa 0000:08:00.0: scsi 0:1:0:13: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:17 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:17 2018] hpsa 0000:08:00.0: scsi 0:1:0:13: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:17 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:18 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:18 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:18 2018] hpsa 0000:08:00.0: scsi 0:1:0:16: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:19 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:19 2018] hpsa 0000:08:00.0: scsi 0:1:0:16: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:19 2018] hpsa 0000:08:00.0: scsi 0:1:0:17: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:20 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:20 2018] hpsa 0000:08:00.0: scsi 0:1:0:17: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:20 2018] hpsa 0000:08:00.0: scsi 0:1:0:18: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:21 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:21 2018] hpsa 0000:08:00.0: scsi 0:1:0:18: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:21 2018] hpsa 0000:08:00.0: scsi 0:1:0:19: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:22 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:22 2018] hpsa 0000:08:00.0: scsi 0:1:0:19: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:22 2018] hpsa 0000:08:00.0: scsi 0:1:0:20: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:23 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:23 2018] hpsa 0000:08:00.0: scsi 0:1:0:20: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:23 2018] hpsa 0000:08:00.0: scsi 0:1:0:21: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:24 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:24 2018] hpsa 0000:08:00.0: scsi 0:1:0:21: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:24 2018] hpsa 0000:08:00.0: scsi 0:1:0:22: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:25 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:25 2018] hpsa 0000:08:00.0: scsi 0:1:0:22: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:25 2018] hpsa 0000:08:00.0: scsi 0:1:0:23: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:26 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:26 2018] hpsa 0000:08:00.0: scsi 0:1:0:23: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:26 2018] hpsa 0000:08:00.0: scsi 0:1:0:25: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:27 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:27 2018] hpsa 0000:08:00.0: scsi 0:1:0:25: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:27 2018] hpsa 0000:08:00.0: scsi 0:1:0:26: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:28 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:28 2018] hpsa 0000:08:00.0: scsi 0:1:0:26: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:28 2018] hpsa 0000:08:00.0: scsi 0:1:0:28: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:29 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:29 2018] hpsa 0000:08:00.0: scsi 0:1:0:28: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:29 2018] hpsa 0000:08:00.0: scsi 0:1:0:29: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:30 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:30 2018] hpsa 0000:08:00.0: scsi 0:1:0:29: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:30 2018] hpsa 0000:08:00.0: scsi 0:1:0:30: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:31 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:31 2018] hpsa 0000:08:00.0: scsi 0:1:0:30: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:31 2018] hpsa 0000:08:00.0: scsi 0:1:0:31: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:32 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:32 2018] hpsa 0000:08:00.0: scsi 0:1:0:31: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:32 2018] hpsa 0000:08:00.0: scsi 0:1:0:32: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:33 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:33 2018] hpsa 0000:08:00.0: scsi 0:1:0:32: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:33 2018] hpsa 0000:08:00.0: scsi 0:1:0:33: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:34 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:34 2018] hpsa 0000:08:00.0: scsi 0:1:0:33: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:34 2018] hpsa 0000:08:00.0: scsi 0:1:0:34: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:35 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:35 2018] hpsa 0000:08:00.0: scsi 0:1:0:34: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:35 2018] hpsa 0000:08:00.0: scsi 0:1:0:35: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:36 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:36 2018] hpsa 0000:08:00.0: scsi 0:1:0:35: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:36 2018] hpsa 0000:08:00.0: scsi 0:1:0:36: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:37 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:37 2018] hpsa 0000:08:00.0: scsi 0:1:0:36: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:37 2018] hpsa 0000:08:00.0: scsi 0:1:0:37: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:39 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:39 2018] hpsa 0000:08:00.0: scsi 0:1:0:37: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:39 2018] hpsa 0000:08:00.0: scsi 0:1:0:39: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:40 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:40 2018] hpsa 0000:08:00.0: scsi 0:1:0:39: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:40 2018] hpsa 0000:08:00.0: scsi 0:1:0:40: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:41 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:41 2018] hpsa 0000:08:00.0: scsi 0:1:0:40: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:41 2018] hpsa 0000:08:00.0: scsi 0:1:0:41: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:42 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:42 2018] hpsa 0000:08:00.0: scsi 0:1:0:41: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:42 2018] hpsa 0000:08:00.0: scsi 0:1:0:42: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:43 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:43 2018] hpsa 0000:08:00.0: scsi 0:1:0:42: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:43 2018] hpsa 0000:08:00.0: scsi 0:1:0:43: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:44 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:44 2018] hpsa 0000:08:00.0: scsi 0:1:0:43: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:44 2018] hpsa 0000:08:00.0: scsi 0:1:0:44: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:45 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:45 2018] hpsa 0000:08:00.0: scsi 0:1:0:44: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:45 2018] hpsa 0000:08:00.0: scsi 0:1:0:45: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:46 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:46 2018] hpsa 0000:08:00.0: scsi 0:1:0:45: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:46 2018] hpsa 0000:08:00.0: scsi 0:1:0:47: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:47 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:47 2018] hpsa 0000:08:00.0: scsi 0:1:0:47: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:47 2018] hpsa 0000:08:00.0: scsi 0:1:0:48: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:48 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:48 2018] hpsa 0000:08:00.0: scsi 0:1:0:48: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:48 2018] hpsa 0000:08:00.0: scsi 0:1:0:49: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:49 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:49 2018] hpsa 0000:08:00.0: scsi 0:1:0:49: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:49 2018] hpsa 0000:08:00.0: scsi 0:1:0:50: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:50 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:50 2018] hpsa 0000:08:00.0: scsi 0:1:0:50: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:50 2018] hpsa 0000:08:00.0: scsi 0:1:0:51: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:51 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:51 2018] hpsa 0000:08:00.0: scsi 0:1:0:51: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:51 2018] hpsa 0000:08:00.0: scsi 0:1:0:52: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:52 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:52 2018] hpsa 0000:08:00.0: scsi 0:1:0:52: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:52 2018] hpsa 0000:08:00.0: scsi 0:1:0:54: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:53 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:53 2018] hpsa 0000:08:00.0: scsi 0:1:0:54: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:53 2018] hpsa 0000:08:00.0: scsi 0:1:0:55: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:54 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:54 2018] hpsa 0000:08:00.0: scsi 0:1:0:55: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:54 2018] hpsa 0000:08:00.0: scsi 0:1:0:56: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:55 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:55 2018] hpsa 0000:08:00.0: scsi 0:1:0:56: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:55 2018] hpsa 0000:08:00.0: scsi 0:1:0:57: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:56 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:56 2018] hpsa 0000:08:00.0: scsi 0:1:0:57: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:56 2018] hpsa 0000:08:00.0: scsi 0:1:0:58: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:57 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:57 2018] hpsa 0000:08:00.0: scsi 0:1:0:58: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:57 2018] hpsa 0000:08:00.0: scsi 0:1:0:59: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:14:58 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:14:58 2018] hpsa 0000:08:00.0: scsi 0:1:0:59: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:18:05 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:51:00 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 22:51:00 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 22:53:25 2018] sd 0:1:0:15: [sdp] tag#164 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 22:53:25 2018] sd 0:1:0:15: [sdp] tag#164 Sense Key : Medium Error [current] 
[Fri Apr 20 22:53:25 2018] sd 0:1:0:15: [sdp] tag#164 Add. Sense: Unrecovered read error
[Fri Apr 20 22:53:25 2018] sd 0:1:0:15: [sdp] tag#164 CDB: Read(16) 88 00 00 00 00 02 87 8d b4 e0 00 00 04 00 00 00
[Fri Apr 20 22:53:25 2018] blk_update_request: critical medium error, dev sdp, sector 10864145632
[Fri Apr 20 22:55:11 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 23:12:20 2018] hpsa 0000:08:00.0: device is ready.
[Fri Apr 20 23:12:20 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[Fri Apr 20 23:14:28 2018] sd 0:1:0:15: [sdp] Unaligned partial completion (resid=48, sector_sz=512)
[Fri Apr 20 23:14:28 2018] sd 0:1:0:15: [sdp] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Apr 20 23:14:28 2018] sd 0:1:0:15: [sdp] tag#25 Sense Key : Medium Error [current] 
[Fri Apr 20 23:14:28 2018] sd 0:1:0:15: [sdp] tag#25 Add. Sense: Unrecovered read error
[Fri Apr 20 23:14:28 2018] sd 0:1:0:15: [sdp] tag#25 CDB: Read(16) 88 00 00 00 00 02 87 8d b8 e0 00 00 04 00 00 00
[Fri Apr 20 23:14:28 2018] blk_update_request: critical medium error, dev sdp, sector 10864146656
[Fri Apr 20 23:16:15 2018] hpsa 0000:08:00.0: scsi 0:1:0:15: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
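
An editorial aside, not part of the original report: the failing sector in each blk_update_request line can be cross-checked against the Read(16) CDB bytes in the matching sense entry. Per the SCSI command set, byte 0 of a Read(16) CDB is the opcode (0x88), bytes 2-9 are the big-endian 64-bit starting LBA, and bytes 10-13 are the transfer length in blocks. A minimal sketch (bash arithmetic) decoding the last CDB above:

```shell
# Decode the Read(16) CDB from the last medium-error entry in the log above.
cdb="88 00 00 00 00 02 87 8d b8 e0 00 00 04 00 00 00"
set -- $cdb
[ "$1" = "88" ] || echo "not a Read(16) CDB"
lba=$((16#$3$4$5$6$7$8$9${10}))      # bytes 2-9: big-endian starting LBA
blocks=$((16#${11}${12}${13}${14}))  # bytes 10-13: transfer length in blocks
echo "LBA=$lba blocks=$blocks"
# LBA=10864146656 matches "critical medium error, dev sdp, sector 10864146656";
# blocks=1024 (0x400), i.e. a 512 KiB read at 512-byte sectors.
```

This confirms the medium errors are real read failures at the LBA the block layer reports, not a translation problem in the driver.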
Comment 9 Don 2018-04-21 23:28:04 UTC
When you applied the 4.16 hpsa driver patches, was this patch also applied?

commit 84676c1f21e8ff54befe985f4f14dc1edc10046b
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jan 12 10:53:05 2018 +0800

    genirq/affinity: assign vectors to all possible CPUs
    
    Currently we assign managed interrupt vectors to all present CPUs.  This
    works fine for systems were we only online/offline CPUs.  But in case of
    systems that support physical CPU hotplug (or the virtualized version of
    it) this means the additional CPUs covered for in the ACPI tables or on
    the command line are not catered for.  To fix this we'd either need to
    introduce new hotplug CPU states just for this case, or we can start
    assining vectors to possible but not present CPUs.
    
    Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com>
    Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
    Cc: linux-kernel@vger.kernel.org
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>


The above patch is why the hpsa-fix-selection-of-reply-queue patch was needed.
If it is not applied, I would drop the reply-queue patch, because it may be causing your issues.


There was another patch required for the hpsa-fix-selection-of-reply-queue patch:
scsi-introduce-force-blk-mq.


The errors shown in your logs indicate issues with DMA transfers of your data.
Unaligned partial completion errors are usually problems with the scatter/gather
buffers that describe your data buffers. I would like to rule out running
the 4.16 hpsa driver in a 4.11 kernel as the cause.

Can you try our out-of-box driver?

I'll attach it to this BZ. Compile it with make -f Makefile.alt.
The file name is hpsa-3.4.20-136.tar.bz2



--------

commit 8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef
Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Mar 13 17:42:39 2018 +0800

    scsi: hpsa: fix selection of reply queue
    
    Since commit 84676c1f21e8 ("genirq/affinity: assign vectors to all
    possible CPUs") we could end up with an MSI-X vector that did not have
    any online CPUs mapped. This would lead to I/O hangs since there was no
    CPU to receive the completion.
    
    Retrieve IRQ affinity information using pci_irq_get_affinity() and use
    this mapping to choose a reply queue.
    
    [mkp: tweaked commit desc]
    
    Cc: Hannes Reinecke <hare@suse.de>
    Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
    Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
    Cc: Christoph Hellwig <hch@lst.de>,
    Cc: Don Brace <don.brace@microsemi.com>
    Cc: Kashyap Desai <kashyap.desai@broadcom.com>
    Cc: Laurence Oberman <loberman@redhat.com>
    Cc: Meelis Roos <mroos@linux.ee>
    Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
    Cc: Mike Snitzer <snitzer@redhat.com>
    Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Tested-by: Laurence Oberman <loberman@redhat.com>
    Tested-by: Don Brace <don.brace@microsemi.com>
    Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
    Acked-by: Don Brace <don.brace@microsemi.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>



I believe this patch is also required.

commit cf2a0ce8d1c25c8cc4509874d270be8fc6026cc3
Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Mar 13 17:42:41 2018 +0800

    scsi: introduce force_blk_mq
    
    From scsi driver view, it is a bit troublesome to support both blk-mq
    and non-blk-mq at the same time, especially when drivers need to support
    multi hw-queue.
    
    This patch introduces 'force_blk_mq' to scsi_host_template so that drivers
    can provide blk-mq only support, so driver code can avoid the trouble
    for supporting both.
    
    Cc: Omar Sandoval <osandov@fb.com>,
    Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
    Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
    Cc: Christoph Hellwig <hch@lst.de>,
    Cc: Don Brace <don.brace@microsemi.com>
    Cc: Kashyap Desai <kashyap.desai@broadcom.com>
    Cc: Mike Snitzer <snitzer@redhat.com>
    Cc: Laurence Oberman <loberman@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
Comment 10 Don 2018-04-21 23:31:19 UTC
Created attachment 275473 [details]
Latest out of box hpsa driver.

This tar file contains our latest out-of-box driver.

1. tar xf hpsa-3.4.20-136.tar.bz2
2. cd hpsa-3.4.20/drivers/scsi
3. make -f Makefile.alt

If you are booted from hpsa, you will need to update your initrd and reboot.

If you are using hpsa for non-boot drives, you can:
1. rmmod hpsa
2. insmod ./hpsa.ko
3. update-initramfs -u (if hpsa is in your initrd)
Comment 11 Anthony Hausman 2018-04-22 10:12:57 UTC
The only patch that I'm sure I have is the "scsi: hpsa: fix selection of reply queue" one.
Other than that, I'm using an out-of-the-box 4.11 kernel, so I'm really not sure whether the other patches are present.


Unfortunately, the module does not compile using 4.11.0-14-generic headers.

# make -C /lib/modules/4.11.0-14-generic/build M=$(pwd) --makefile="/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt"
make: Entering directory '/usr/src/linux-headers-4.11.0-14-generic'
make -C /lib/modules/4.4.0-96-generic/build M=/usr/src/linux-headers-4.11.0-14-generic EXTRA_CFLAGS+=-DKCLASS4A modules
make[1]: Entering directory '/usr/src/linux-headers-4.4.0-96-generic'
make[2]: *** No rule to make target 'kernel/bounds.c', needed by 'kernel/bounds.s'.  Stop.
Makefile:1423: recipe for target '_module_/usr/src/linux-headers-4.11.0-14-generic' failed
make[1]: *** [_module_/usr/src/linux-headers-4.11.0-14-generic] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-96-generic'
/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt:96: recipe for target 'default' failed
make: *** [default] Error 2
make: Leaving directory '/usr/src/linux-headers-4.11.0-14-generic'

But if you tell me the principal problem is the 4.11 kernel itself, I can upgrade to the 4.16.3 kernel.

If I do, should I use the out-of-box 3.4.20-136 hpsa driver, or your previous patch on top of 3.4.20-125?
Comment 12 loberman 2018-04-22 18:38:13 UTC
We had a bunch of issues with hpsa, as already mentioned above.
The specific issue that we had to revert was this commit
8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef


I assume your array has a charged battery (capacitor) and the write-back cache is enabled on the P420i.

Are you only seeing this when you have cmaeventd running? That daemon can use pass-through commands and has been known to cause issues.
I am not running any of the HPE ProLiant SPP daemons on my system.

I have not seen this load-related issue (without those daemons running) on my DL380 G7 or DL380 G8 here, so I will work on trying to reproduce and assist.

Thanks
Laurence
Comment 13 loberman 2018-04-22 19:20:26 UTC
Apr 18 01:29:16 kernel: cmaidad         D    0  3442      1 0x00000000
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel:  __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel:  schedule+0x36/0x80
Apr 18 01:29:16 kernel:  scsi_block_when_processing_errors+0xd5/0x110
Apr 18 01:29:16 kernel:  ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel:  sg_open+0x14a/0x5c0

                         ***** Likely a pass-through from the cma* management daemons

Can you try to reproduce with all the HP health daemons disabled?
Comment 14 Anthony Hausman 2018-04-23 09:07:51 UTC
Indeed, I have a charged battery (capacitor) and the write-back cache enabled.
I also run the hp-health component; I have already tried disabling it on the 4.11 kernel and reproduced the load problem without it.

The cma-related call traces show up after the logical drive reset is issued.

Right now, I am testing on a server the kernel 4.16.3-041603-generic with the hpsa module patched to use a local work-queue instead of the system work-queue.

So far, I have not reproduced the problem.
I had a disk with bad blocks (before the upgrade, a read-only badblocks test returned a lot of block errors), but since I upgraded the kernel with the patched hpsa module I get no more errors.

I'm still trying to reproduce the problem by launching a badblocks read-only test on the "ex-faulty" disk.

I'll tell you the result of the test.
Comment 15 Don 2018-04-23 20:36:27 UTC
(In reply to Anthony Hausman from comment #11)
> The only patch that I'm sure that I have is the "scsi: hpsa: fix selection
> of reply queue" one.
> For the I'm using an out of the box 4.11 kernel. So I'm really not sure that
> the other patches are present.
> 
> 
> Unfortunately, the module does not compile using 4.11.0-14-generic headers.
> 
> # make -C /lib/modules/4.11.0-14-generic/build M=$(pwd)
> --makefile="/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt"
> make: Entering directory '/usr/src/linux-headers-4.11.0-14-generic'
> make -C /lib/modules/4.4.0-96-generic/build
> M=/usr/src/linux-headers-4.11.0-14-generic EXTRA_CFLAGS+=-DKCLASS4A modules
> make[1]: Entering directory '/usr/src/linux-headers-4.4.0-96-generic'
> make[2]: *** No rule to make target 'kernel/bounds.c', needed by
> 'kernel/bounds.s'.  Stop.
> Makefile:1423: recipe for target
> '_module_/usr/src/linux-headers-4.11.0-14-generic' failed
> make[1]: *** [_module_/usr/src/linux-headers-4.11.0-14-generic] Error 2
> make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-96-generic'
> /root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt:96: recipe for
> target 'default' failed
> make: *** [default] Error 2
> make: Leaving directory '/usr/src/linux-headers-4.11.0-14-generic'
> 
> But if you tell me the principal problem is using the 4.11 kernel, I can
> upgrade it to use the 4.16.3 kernel.
> 
> If I use it, must I use the out of box 3.4.20-136 hpsa driver or use your
> precedent patch on the last 3.4.20-125?


The 4.16.3 driver should be OK to use.

Were you not able to untar the sources I gave you in /tmp and build with make -f Makefile.alt?

If you copy the source code into the kernel tree, you should be able to do
make modules SUBDIRS=drivers/scsi hpsa.ko
Comment 16 Anthony Hausman 2018-04-25 08:37:42 UTC
Don,

So I'm actually running the kernel 4.16.3 (build 18-04-19) with the hpsa module patched to use a local work-queue instead of the system work-queue.

I have reproduced a reset with no stack trace (which is good news).
The only thing is that two hours passed between the logical reset and its completion, causing a heavy load on the server during that time:

Apr 25 01:31:09 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
Apr 25 03:31:00 kernel: hpsa 0000:08:00.0: device is ready.
Apr 25 03:31:00 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1

The good thing is that after the reset completed, the device was removed:

Apr 25 03:31:45 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: removed Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
Apr 25 03:31:48 kernel: scsi 0:1:0:0: rejecting I/O to dead device

So the question is whether it is normal for the logical reset to take such a long time (and cause trouble on the server)?
Comment 17 Don 2018-04-25 14:50:48 UTC
(In reply to Anthony Hausman from comment #16)
> Don,
> 
> So I'm actually running the kernel 4.16.3 (build 18-04-19) with the hpsa
> modules patch to use local work-queue insead of system work-queue.
> 
> I have a reproduce a reset with no stack trace (which is a good news).
> The only thing is between the resetting logical and the completation, 2
> hours passed and caused an heavy load on the server during this time:
> 
> Apr 25 01:31:09 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical 
> Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
> Apr 25 03:31:00 kernel: hpsa 0000:08:00.0: device is ready.
> Apr 25 03:31:00 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: reset logical 
> completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0
> SSDSmartPathCap- En- Exp=1
> 
> The good thing after the reset has completed, this one is removed:
> 
> Apr 25 03:31:45 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: removed
> Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1

The driver was notified by the P420i that the volume went offline, so the driver removed it from the SML (the SCSI midlayer).

> Apr 25 03:31:48 kernel: scsi 0:1:0:0: rejecting I/O to dead device

There were I/O requests for the device, but the SML detected that it was deleted.

> 
> So the question is if it's normal than the reset logical take such a long
> time (and causing trouble on the server)?

It is not normal.

For a Logical Volume reset, the P420i flushes out any outstanding I/O requests then returns. The SML should block any new requests from coming down while the reset is in progress.

Do you know what process was consuming the CPU cycles?
ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20

Are you using sg_reset to test LV resets? Or does the device have some intermittent issue that is causing the SML to issue the reset operation?

If you turn off the agents, do the resets complete more quickly?

I am wondering if the agents are frequently probing the P420i for changes when the reset is active and the agents are consuming the CPU cycles.
Comment 18 Anthony Hausman 2018-04-25 15:06:42 UTC
Unfortunately, I don't know what process was consuming the CPU cycles at that point.
I'll try to reproduce the problem to gather that information.

I'm not using sg_reset to test the LV reset; I am actually launching a badblocks command on a problematic disk, and the reset is invoked when it begins to fail.

I'll use sg_reset to reproduce the problem and test with/without the agent.
I invoke the agent every 5 minutes to check the controller and disk states.

I'll keep you informed on my tests.

By the way, thank you for your help.
Comment 19 loberman 2018-04-25 15:09:52 UTC
I was concerned about the agents, but Anthony disabled them and still saw this. I have seen this timeout sometimes when the agents probe via pass-through.

I did just bump into this reset on a RHEL 7.5 kernel with no agents, but it recovered almost immediately.
I need to chase that down.
Comment 20 Anthony Hausman 2018-04-26 12:42:59 UTC
So here are all my tests.
With the agent enabled, using the HP disk check commands (hpacucli/ssacli and hpssacli) and launching an sg_reset, the reset completes without problem on the problematic disk:

Apr 26 14:31:20 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
Apr 26 14:31:21 kernel: hpsa 0000:08:00.0: device is ready.
Apr 26 14:31:21 kernel: hpsa 0000:08:00.0: scsi 0:1:0:0: reset logical  completed successfully Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1

The reset only took 1 second.

The "bug" seems to appear only when the disk returns Unrecovered read errors (when using a badblocks read-only test, for example).

I am trying to reproduce it.
Comment 21 Anthony Hausman 2018-05-02 08:46:17 UTC
I have reproduced the problem.
Here are the conditions I used:

Kernel: 4.16.3-041603-generic
hpsa: 3.4.20-125 with patch to use local work-queue instead of system work-queue.

I needed to execute badblocks in a read-only test on a disk that had failed before:

~# while :; do badblocks -v -b 4096 -s /dev/sdt; done

And several days later, the bug appeared.
You'll find a graph of the load in an attachment.
Before the reset, I got an "hpsa_update_device_info: inquiry failed" message and a stack trace from badblocks (the latter seems logical):

Load: 850

[Tue May  1 06:27:37 2018] hpsa 0000:08:00.0: aborted: LUN:000000c000003901 CDB:12000000310000000000000000000000
[Tue May  1 06:27:37 2018] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
[Tue May  1 06:27:37 2018] hpsa 0000:08:00.0: scsi 0:0:50:0: removed Direct-Access     ATA      MB4000GCWDC      PHYS DRV SSDSmartPathCap- En- Exp=0
[Tue May  1 06:28:24 2018] hpsa 0000:08:00.0: aborted: LUN:000000c000003901 CDB:12000000310000000000000000000000
[Tue May  1 06:28:24 2018] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
[Tue May  1 06:29:51 2018] INFO: task badblocks:46824 blocked for more than 120 seconds.
[Tue May  1 06:29:51 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:29:51 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:29:51 2018] badblocks       D    0 46824  48728 0x00000004
[Tue May  1 06:29:51 2018] Call Trace:
[Tue May  1 06:29:51 2018]  __schedule+0x297/0x880
[Tue May  1 06:29:51 2018]  ? iov_iter_get_pages+0xc0/0x2c0
[Tue May  1 06:29:51 2018]  schedule+0x2c/0x80
[Tue May  1 06:29:51 2018]  io_schedule+0x16/0x40
[Tue May  1 06:29:51 2018]  __blkdev_direct_IO_simple+0x1ff/0x360
[Tue May  1 06:29:51 2018]  ? bdget+0x120/0x120
[Tue May  1 06:29:51 2018]  blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:29:51 2018]  ? blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:29:51 2018]  ? current_time+0x32/0x70
[Tue May  1 06:29:51 2018]  ? __atime_needs_update+0x7f/0x190
[Tue May  1 06:29:51 2018]  generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:29:51 2018]  ? __blkdev_direct_IO_simple+0x360/0x360
[Tue May  1 06:29:51 2018]  ? generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:29:51 2018]  ? __wake_up+0x13/0x20
[Tue May  1 06:29:51 2018]  ? tty_ldisc_deref+0x16/0x20
[Tue May  1 06:29:51 2018]  ? tty_write+0x1fb/0x320
[Tue May  1 06:29:51 2018]  blkdev_read_iter+0x35/0x40
[Tue May  1 06:29:51 2018]  __vfs_read+0xfb/0x170
[Tue May  1 06:29:51 2018]  vfs_read+0x8e/0x130
[Tue May  1 06:29:51 2018]  SyS_read+0x55/0xc0
[Tue May  1 06:29:51 2018]  do_syscall_64+0x73/0x130
[Tue May  1 06:29:51 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Tue May  1 06:29:51 2018] RIP: 0033:0x7fe31b97c330
[Tue May  1 06:29:51 2018] RSP: 002b:00007fffcea10258 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue May  1 06:29:51 2018] RAX: ffffffffffffffda RBX: 0000026e19800000 RCX: 00007fe31b97c330
[Tue May  1 06:29:51 2018] RDX: 0000000000040000 RSI: 00007fe31c26e000 RDI: 0000000000000003
[Tue May  1 06:29:51 2018] RBP: 0000000000001000 R08: 0000000026e19800 R09: 00007fffcea10008
[Tue May  1 06:29:51 2018] R10: 00007fffcea10020 R11: 0000000000000246 R12: 0000000000000003
[Tue May  1 06:29:51 2018] R13: 00007fe31c26e000 R14: 0000000000000040 R15: 0000000000040000
[Tue May  1 06:31:52 2018] INFO: task badblocks:46824 blocked for more than 120 seconds.
[Tue May  1 06:31:52 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:31:52 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:31:52 2018] badblocks       D    0 46824  48728 0x00000004
[Tue May  1 06:31:52 2018] Call Trace:
[Tue May  1 06:31:52 2018]  __schedule+0x297/0x880
[Tue May  1 06:31:52 2018]  ? iov_iter_get_pages+0xc0/0x2c0
[Tue May  1 06:31:52 2018]  schedule+0x2c/0x80
[Tue May  1 06:31:52 2018]  io_schedule+0x16/0x40
[Tue May  1 06:31:52 2018]  __blkdev_direct_IO_simple+0x1ff/0x360
[Tue May  1 06:31:52 2018]  ? bdget+0x120/0x120
[Tue May  1 06:31:52 2018]  blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:31:52 2018]  ? blkdev_direct_IO+0x3a2/0x3f0
[Tue May  1 06:31:52 2018]  ? current_time+0x32/0x70
[Tue May  1 06:31:52 2018]  ? __atime_needs_update+0x7f/0x190
[Tue May  1 06:31:52 2018]  generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:31:52 2018]  ? __blkdev_direct_IO_simple+0x360/0x360
[Tue May  1 06:31:52 2018]  ? generic_file_read_iter+0xc6/0xc10
[Tue May  1 06:31:52 2018]  ? __wake_up+0x13/0x20
[Tue May  1 06:31:52 2018]  ? tty_ldisc_deref+0x16/0x20
[Tue May  1 06:31:52 2018]  ? tty_write+0x1fb/0x320
[Tue May  1 06:31:52 2018]  blkdev_read_iter+0x35/0x40
[Tue May  1 06:31:52 2018]  __vfs_read+0xfb/0x170
[Tue May  1 06:31:52 2018]  vfs_read+0x8e/0x130
[Tue May  1 06:31:52 2018]  SyS_read+0x55/0xc0
[Tue May  1 06:31:52 2018]  do_syscall_64+0x73/0x130
[Tue May  1 06:31:52 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Tue May  1 06:31:52 2018] RIP: 0033:0x7fe31b97c330
[Tue May  1 06:31:52 2018] RSP: 002b:00007fffcea10258 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue May  1 06:31:52 2018] RAX: ffffffffffffffda RBX: 0000026e19800000 RCX: 00007fe31b97c330
[Tue May  1 06:31:52 2018] RDX: 0000000000040000 RSI: 00007fe31c26e000 RDI: 0000000000000003
[Tue May  1 06:31:52 2018] RBP: 0000000000001000 R08: 0000000026e19800 R09: 00007fffcea10008
[Tue May  1 06:31:52 2018] R10: 00007fffcea10020 R11: 0000000000000246 R12: 0000000000000003
[Tue May  1 06:31:52 2018] R13: 00007fe31c26e000 R14: 0000000000000040 R15: 0000000000040000
[Tue May  1 06:32:55 2018] hpsa 0000:08:00.0: scsi 0:1:0:19: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- E
n- Exp=1

This time, I ran the ps command you suggested every 30 seconds:

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     1  TS /sbin/init                                          0.0  3680 101792   0   0  0.0 poll_schedule_timeout          init
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0     9  TS [rcu_sched]                                         0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_sched
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2034  TS /usr/sbin/syslog-ng --process-mode=background -f /  0.0 56936 637344   0   0  1.3 ep_poll                        syslog-ng
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty


ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2333  TS cat filer-01-24-1.keys                              0.0   324   4384   0   0  0.0 pipe_wait                      cat
  0  2334  TS /usr/bin/python /usr/local/scality-walker/scality-  0.0 11208 131444   0   0  1.2 wait_woken                     scality-walker.
  0  2740  TS /opt/datadog-agent/embedded/bin/python /opt/datado  0.0 42160 289140   0   0  0.6 poll_schedule_timeout          python


ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2333  TS cat filer-01-24-1.keys                              0.0   324   4384   0   0  0.0 pipe_wait                      cat
  0  2334  TS /usr/bin/python /usr/local/scality-walker/scality-  0.0 11208 131444   0   0  1.2 wait_woken                     scality-walker.
  0  2740  TS /opt/datadog-agent/embedded/bin/python /opt/datado  0.0 42160 289140   0   0  0.6 poll_schedule_timeout          python


One minute later, I got task traces for cmaeventd (which seems logical) and the jbd2 tasks:

Load: 2000

[Tue May  1 06:33:53 2018] INFO: task cmaeventd:3405 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] cmaeventd       D    0  3405      1 0x00000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  scsi_block_when_processing_errors+0xd4/0x110
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  sg_open+0x14c/0x5d0
[Tue May  1 06:33:53 2018]  chrdev_open+0xc4/0x1b0
[Tue May  1 06:33:53 2018]  do_dentry_open+0x1c2/0x310
[Tue May  1 06:33:53 2018]  ? cdev_put.part.3+0x20/0x20
[Tue May  1 06:33:53 2018]  vfs_open+0x4f/0x80
[Tue May  1 06:33:53 2018]  path_openat+0x66e/0x1770
[Tue May  1 06:33:53 2018]  ? unlazy_walk+0x3b/0xb0
[Tue May  1 06:33:53 2018]  ? terminate_walk+0x8e/0xf0
[Tue May  1 06:33:53 2018]  do_filp_open+0x9b/0x110
[Tue May  1 06:33:53 2018]  ? __check_object_size+0xac/0x1a0
[Tue May  1 06:33:53 2018]  ? __check_object_size+0xac/0x1a0
[Tue May  1 06:33:53 2018]  ? __alloc_fd+0x46/0x170
[Tue May  1 06:33:53 2018]  do_sys_open+0x1ba/0x250
[Tue May  1 06:33:53 2018]  ? do_sys_open+0x1ba/0x250
[Tue May  1 06:33:53 2018]  ? SyS_access+0x13d/0x230
[Tue May  1 06:33:53 2018]  SyS_open+0x1e/0x20
[Tue May  1 06:33:53 2018]  do_syscall_64+0x73/0x130
[Tue May  1 06:33:53 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Tue May  1 06:33:53 2018] RIP: 0033:0x7fdfbc6b0be0
[Tue May  1 06:33:53 2018] RSP: 002b:00007ffe1f418728 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[Tue May  1 06:33:53 2018] RAX: ffffffffffffffda RBX: 00000000018b8640 RCX: 00007fdfbc6b0be0
[Tue May  1 06:33:53 2018] RDX: 0000000000000008 RSI: 0000000000000002 RDI: 00007ffe1f418760
[Tue May  1 06:33:53 2018] RBP: 00007ffe1f418760 R08: 0000000000000001 R09: 0000000000000000
[Tue May  1 06:33:53 2018] R10: 00007fdfbc699760 R11: 0000000000000246 R12: 0000000000000002
[Tue May  1 06:33:53 2018] R13: 0000000000000001 R14: 00007ffe1f418870 R15: 00007ffe1f4189a0
[Tue May  1 06:33:53 2018] INFO: task cmaidad:3507 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] cmaidad         D    0  3507      1 0x00000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  ? __find_get_block+0xb6/0x2f0
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  scsi_block_when_processing_errors+0xd4/0x110
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  sg_open+0x14c/0x5d0
[Tue May  1 06:33:53 2018]  chrdev_open+0xc4/0x1b0
[Tue May  1 06:33:53 2018]  do_dentry_open+0x1c2/0x310
[Tue May  1 06:33:53 2018]  ? cdev_put.part.3+0x20/0x20
[Tue May  1 06:33:53 2018]  vfs_open+0x4f/0x80
[Tue May  1 06:33:53 2018]  path_openat+0x66e/0x1770
[Tue May  1 06:33:53 2018]  ? unlazy_walk+0x3b/0xb0
[Tue May  1 06:33:53 2018]  ? terminate_walk+0x8e/0xf0
[Tue May  1 06:33:53 2018]  do_filp_open+0x9b/0x110
[Tue May  1 06:33:53 2018]  ? __check_object_size+0xac/0x1a0
[Tue May  1 06:33:53 2018]  ? __check_object_size+0xac/0x1a0
[Tue May  1 06:33:53 2018]  ? __alloc_fd+0x46/0x170
[Tue May  1 06:33:53 2018]  do_sys_open+0x1ba/0x250
[Tue May  1 06:33:53 2018]  ? do_sys_open+0x1ba/0x250
[Tue May  1 06:33:53 2018]  ? SyS_access+0x13d/0x230
[Tue May  1 06:33:53 2018]  SyS_open+0x1e/0x20
[Tue May  1 06:33:53 2018]  do_syscall_64+0x73/0x130
[Tue May  1 06:33:53 2018]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Tue May  1 06:33:53 2018] RIP: 0033:0x7f5dec322be0
[Tue May  1 06:33:53 2018] RSP: 002b:00007ffee82dccc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[Tue May  1 06:33:53 2018] RAX: ffffffffffffffda RBX: 00000000021c4c60 RCX: 00007f5dec322be0
[Tue May  1 06:33:53 2018] RDX: 0000000000000008 RSI: 0000000000000002 RDI: 00007ffee82dcd00
[Tue May  1 06:33:53 2018] RBP: 00007ffee82dcd00 R08: 0000000000000001 R09: 0000000000000003
[Tue May  1 06:33:53 2018] R10: 00007f5dec30b760 R11: 0000000000000246 R12: 0000000000000002
[Tue May  1 06:33:53 2018] R13: 0000000000000001 R14: 00007ffee82dce10 R15: 00007ffee82dcf40
[Tue May  1 06:33:53 2018] INFO: task jbd2/sdas-8:9924 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] jbd2/sdas-8     D    0  9924      2 0x80000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  jbd2_journal_commit_transaction+0x244/0x1740
[Tue May  1 06:33:53 2018]  ? update_curr+0xf5/0x1d0
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  ? lock_timer_base+0x6b/0x90
[Tue May  1 06:33:53 2018]  kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  kthread+0x121/0x140
[Tue May  1 06:33:53 2018]  ? commit_timeout+0x20/0x20
[Tue May  1 06:33:53 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue May  1 06:33:53 2018]  ret_from_fork+0x35/0x40
[Tue May  1 06:33:53 2018] INFO: task jbd2/sdan-8:9955 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] jbd2/sdan-8     D    0  9955      2 0x80000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  jbd2_journal_commit_transaction+0x244/0x1740
[Tue May  1 06:33:53 2018]  ? update_curr+0xf5/0x1d0
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  ? lock_timer_base+0x6b/0x90
[Tue May  1 06:33:53 2018]  kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  kthread+0x121/0x140
[Tue May  1 06:33:53 2018]  ? commit_timeout+0x20/0x20
[Tue May  1 06:33:53 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue May  1 06:33:53 2018]  ? do_syscall_64+0x73/0x130
[Tue May  1 06:33:53 2018]  ? SyS_exit_group+0x14/0x20
[Tue May  1 06:33:53 2018]  ret_from_fork+0x35/0x40
[Tue May  1 06:33:53 2018] INFO: task jbd2/sdaq-8:9965 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] jbd2/sdaq-8     D    0  9965      2 0x80000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  jbd2_journal_commit_transaction+0x244/0x1740
[Tue May  1 06:33:53 2018]  ? update_curr+0xf5/0x1d0
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  ? lock_timer_base+0x6b/0x90
[Tue May  1 06:33:53 2018]  kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  kthread+0x121/0x140
[Tue May  1 06:33:53 2018]  ? commit_timeout+0x20/0x20
[Tue May  1 06:33:53 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue May  1 06:33:53 2018]  ret_from_fork+0x35/0x40
[Tue May  1 06:33:53 2018] INFO: task jbd2/sdaj-8:10082 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] jbd2/sdaj-8     D    0 10082      2 0x80000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  jbd2_journal_commit_transaction+0x244/0x1740
[Tue May  1 06:33:53 2018]  ? update_curr+0xf5/0x1d0
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  ? lock_timer_base+0x6b/0x90
[Tue May  1 06:33:53 2018]  kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  kthread+0x121/0x140
[Tue May  1 06:33:53 2018]  ? commit_timeout+0x20/0x20
[Tue May  1 06:33:53 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue May  1 06:33:53 2018]  ? do_syscall_64+0x73/0x130
[Tue May  1 06:33:53 2018]  ? SyS_exit_group+0x14/0x20
[Tue May  1 06:33:53 2018]  ret_from_fork+0x35/0x40
[Tue May  1 06:33:53 2018] INFO: task jbd2/sdao-8:10109 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] jbd2/sdao-8     D    0 10109      2 0x80000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  ? bit_wait+0x60/0x60
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  io_schedule+0x16/0x40
[Tue May  1 06:33:53 2018]  bit_wait_io+0x11/0x60
[Tue May  1 06:33:53 2018]  __wait_on_bit+0x4c/0x90
[Tue May  1 06:33:53 2018]  out_of_line_wait_on_bit+0x90/0xb0
[Tue May  1 06:33:53 2018]  ? bit_waitqueue+0x40/0x40
[Tue May  1 06:33:53 2018]  __wait_on_buffer+0x32/0x40
[Tue May  1 06:33:53 2018]  jbd2_journal_commit_transaction+0xf59/0x1740
[Tue May  1 06:33:53 2018]  kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  kthread+0x121/0x140
[Tue May  1 06:33:53 2018]  ? commit_timeout+0x20/0x20
[Tue May  1 06:33:53 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue May  1 06:33:53 2018]  ret_from_fork+0x35/0x40
[Tue May  1 06:33:53 2018] INFO: task jbd2/sdag-8:10135 blocked for more than 120 seconds.
[Tue May  1 06:33:53 2018]       Tainted: G           OE    4.16.3-041603-generic #201804190730
[Tue May  1 06:33:53 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue May  1 06:33:53 2018] jbd2/sdag-8     D    0 10135      2 0x80000000
[Tue May  1 06:33:53 2018] Call Trace:
[Tue May  1 06:33:53 2018]  __schedule+0x297/0x880
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  schedule+0x2c/0x80
[Tue May  1 06:33:53 2018]  jbd2_journal_commit_transaction+0x244/0x1740
[Tue May  1 06:33:53 2018]  ? update_curr+0xf5/0x1d0
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  ? lock_timer_base+0x6b/0x90
[Tue May  1 06:33:53 2018]  kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? kjournald2+0xc8/0x270
[Tue May  1 06:33:53 2018]  ? wait_woken+0x80/0x80
[Tue May  1 06:33:53 2018]  kthread+0x121/0x140
[Tue May  1 06:33:53 2018]  ? commit_timeout+0x20/0x20
[Tue May  1 06:33:53 2018]  ? kthread_create_worker_on_cpu+0x70/0x70
[Tue May  1 06:33:53 2018]  ret_from_fork+0x35/0x40

Some more ps output after that message:

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2333  TS cat filer-01-24-1.keys                              0.0   324   4384   0   0  0.0 pipe_wait                      cat
  0  2427  TS /usr/bin/python /usr/bin/salt-minion KeepAlive Mul  0.0 109868 719424  0   0  0.1 poll_schedule_timeout          /usr/bin/python
  0  3555  TS cmascsid -p 15 -s OK -l /var/log/hp-snmp-agents/cm  0.0   396  12880   0   0  0.0 msgrcv                         cmascsid

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  1865  TS /usr/sbin/sshd -D                                   0.0   836  61392   0   0  0.0 poll_schedule_timeout          sshd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2333  TS cat filer-01-24-1.keys                              0.0   324   4384   0   0  0.0 pipe_wait                      cat
  0  2399  TS /usr/bin/python /usr/sbin/exabgp /etc/exabgp/exabg  0.0  9552  50888   0   0  0.0 poll_schedule_timeout          exabgp

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   436  TS [md0_raid1]                                         0.0     0      0   0   0  0.0 md_thread                      md0_raid1
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2333  TS cat filer-01-24-1.keys                              0.0   324   4384   0   0  0.0 pipe_wait                      cat
  0  2662  TS asynctask-worker [disable] : 1                      0.0 14856 135372   0   0  0.0 poll_schedule_timeout          asynctask-worke

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2034  TS /usr/sbin/syslog-ng --process-mode=background -f /  0.0 56936 637344   0   0  1.3 ep_poll                        syslog-ng
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2333  TS cat filer-01-24-1.keys                              0.0   324   4384   0   0  0.0 pipe_wait                      cat
  0  2471  TS python /etc/exabgp/processes/exasrv.py /etc/exabgp  0.0  6316  35120   0   0  0.0 poll_schedule_timeout          python

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2471  TS python /etc/exabgp/processes/exasrv.py /etc/exabgp  0.0  6316  35120   0   0  0.0 poll_schedule_timeout          python
  0  3275  TS cmahealthd -p 30 -s OK -t OK -i -l /var/log/hp-snm  0.0   972  22236   0   0  0.0 msgrcv                         cmahealthd
  0  3487  TS cmasasd -p 15 -s OK -l /var/log/hp-snmp-agents/cma  0.0   388  10820   0   0  0.0 msgrcv                         cmasasd

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   458  TS [jbd2/md0-8]                                        0.0     0      0   0   0  0.0 kjournald2                     jbd2/md0-8
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  3275  TS cmahealthd -p 30 -s OK -t OK -i -l /var/log/hp-snm  0.0   972  22236   0   0  0.0 msgrcv                         cmahealthd
  0  3487  TS cmasasd -p 15 -s OK -l /var/log/hp-snmp-agents/cma  0.0   388  10820   0   0  0.0 msgrcv                         cmasasd

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   436  TS [md0_raid1]                                         0.0     0      0   0   0  0.0 md_thread                      md0_raid1
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  1462  TS lldpd: monitor                                      0.0  2160  48672   0   0  0.0 skb_wait_for_more_packets      lldpd
  0  1865  TS /usr/sbin/sshd -D                                   0.0   836  61392   0   0  0.0 poll_schedule_timeout          sshd
  0  2033  TS lldpd: 2 neighbors                                  0.0  2488  49000   0   0  0.0 ep_poll                        lldpd
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty

A few minutes later, before the reboot:

ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,psr,pcpu,wchan:30,comm:30 | sort -nk1 | head -20
  0     3  TS [kworker/0:0]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:0
  0     4  TS [kworker/0:0H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:0H
  0     7  TS [mm_percpu_wq]                                      0.0     0      0 -20   0  0.0 rescuer_thread                 mm_percpu_wq
  0     8  TS [ksoftirqd/0]                                       0.0     0      0   0   0  0.0 smpboot_thread_fn              ksoftirqd/0
  0    10  TS [rcu_bh]                                            0.0     0      0   0   0  0.0 rcu_gp_kthread                 rcu_bh
  0    11  FF [migration/0]                                       0.0     0      0   -   0  0.0 smpboot_thread_fn              migration/0
  0    12  FF [watchdog/0]                                        0.0     0      0   -   0  0.0 smpboot_thread_fn              watchdog/0
  0    13  TS [cpuhp/0]                                           0.0     0      0   0   0  0.0 smpboot_thread_fn              cpuhp/0
  0    71  TS [kblockd]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 kblockd
  0    76  FF [watchdogd]                                         0.0     0      0   -   0  0.0 kthread_worker_fn              watchdogd
  0   128  TS [nvme-delete-wq]                                    0.0     0      0 -20   0  0.0 rescuer_thread                 nvme-delete-wq
  0   245  TS [kworker/0:2]                                       0.0     0      0   0   0  0.0 worker_thread                  kworker/0:2
  0   271  TS [raid5wq]                                           0.0     0      0 -20   0  0.0 rescuer_thread                 raid5wq
  0   427  TS [kworker/u129:0]                                    0.0     0      0   0   0  0.0 worker_thread                  kworker/u129:0
  0   477  TS [kworker/0:1H]                                      0.0     0      0 -20   0  0.0 worker_thread                  kworker/0:1H
  0  2080  TS logger -p daemon.info -t docker_daemon_events       0.0   328   4360   0   0  0.0 pipe_wait                      logger
  0  2248  TS /sbin/getty -8 38400 tty6                           0.0   356  15836   0   0  0.0 wait_woken                     getty
  0  2427  TS /usr/bin/python /usr/bin/salt-minion KeepAlive Mul  0.0 109868 719424  0   0  0.1 poll_schedule_timeout          /usr/bin/python
  0  3326  TS cmasm2d -p 30 -l /var/log/hp-snmp-agents/cma.log    0.0   948  24176   0   0  0.0 msgrcv                         cmasm2d
  0  3364  TS cmaperfd -p 30 -s OK -l /var/log/hp-snmp-agents/cm  0.0  1628  22724   0   0  0.0 msgrcv                         cmaperfd

So here it is; I hope we now have enough information to track down this weird behavior.

If you need any other information, I can write a little script to run some diagnostic commands if the reset triggers again without completing.
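The script mentioned above could look something like this (a sketch, not tested on the affected hardware; the log path and the decision to use sysrq-trigger are assumptions — sysrq must be enabled, and the script must run as root):

```shell
#!/bin/sh
# Sketch of a diagnostic collector: watch the kernel log for an hpsa
# "resetting logical" message and dump process state for post-mortem analysis.
LOG=/var/tmp/hpsa-reset-diag.log   # assumed location, adjust as needed

# Return success when a kernel log line announces an hpsa device reset.
matches_reset() {
    case "$1" in
        *"resetting logical"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Append a timestamped snapshot of process state to the log.
capture() {
    {
        date
        # Same ps invocation used in the comments above, without the head filter
        ps -deo psr,pid,cls,cmd:50,pmem,size,vsz,nice,pcpu,wchan:30,comm:30
        # Ask the kernel to log blocked-task backtraces (requires sysrq enabled)
        echo w > /proc/sysrq-trigger
    } >> "$LOG" 2>&1
}

main() {
    dmesg --follow | while read -r line; do
        matches_reset "$line" && capture
    done
}

# Only start the follow loop when invoked with --run, so the functions
# can be sourced or tested without blocking.
[ "${1:-}" = "--run" ] && main
```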
Comment 22 Anthony Hausman 2018-05-02 08:47:06 UTC
Created attachment 275723 [details]
Load on server during reset problem
Comment 23 Anthony Hausman 2018-05-02 12:04:50 UTC
Oh, I forgot to mention that before the hpsa driver took any action, I had several errors on the disk where badblocks ran:
...
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Sense Key : Medium Error [current] 
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Add. Sense: Unrecovered read error
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 CDB: Read(16) 88 00 00 00 00 01 37 0c c5 b0 00 00 00 08 00 00
[Mon Apr 30 22:21:18 2018] print_req_error: critical medium error, dev sdt, sector 5218551216
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] Unaligned partial completion (resid=242, sector_sz=512)
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Sense Key : Medium Error [current] 
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 Add. Sense: Unrecovered read error
[Mon Apr 30 22:21:18 2018] sd 0:1:0:19: [sdt] tag#0 CDB: Read(16) 88 00 00 00 00 01 37 0c c5 b8 00 00 00 08 00 00
[Mon Apr 30 22:21:18 2018] print_req_error: critical medium error, dev sdt, sector 5218551224
[Tue May  1 06:27:37 2018] hpsa 0000:08:00.0: aborted: LUN:000000c000003901 CDB:12000000310000000000000000000000
[Tue May  1 06:27:37 2018] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
[Tue May  1 06:27:37 2018] hpsa 0000:08:00.0: scsi 0:0:50:0: removed Direct-Access     ATA      MB4000GCWDC      PHYS DRV SSDSmartPathCap- En- Exp=0
[Tue May  1 06:28:24 2018] hpsa 0000:08:00.0: aborted: LUN:000000c000003901 CDB:12000000310000000000000000000000
[Tue May  1 06:28:24 2018] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
...
Comment 24 Gaetan Trellu 2018-08-30 15:01:56 UTC
Same behavior here with controllers P440ar and P420i on DL480 G8 and DL480p G8.

Firmware:
  - P440ar: 6.60
  - P420i: 8.32

[128958.979859] hpsa 0000:03:00.0: scsi 0:1:0:9: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[129170.663840] INFO: task scsi_eh_0:446 blocked for more than 120 seconds.
[129170.671251]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.678176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.686930] scsi_eh_0       D    0   446      2 0x80000000
[129170.686934] Call Trace:
[129170.686945]  __schedule+0x3d6/0x8b0
[129170.686947]  schedule+0x36/0x80
[129170.686950]  schedule_timeout+0x1db/0x370
[129170.686954]  ? __dev_printk+0x3c/0x80
[129170.686956]  ? dev_printk+0x56/0x80
[129170.686959]  io_schedule_timeout+0x1e/0x50
[129170.686961]  wait_for_completion_io+0xb4/0x140
[129170.686965]  ? wake_up_q+0x70/0x70
[129170.686972]  hpsa_scsi_do_simple_cmd.isra.56+0xc7/0xf0 [hpsa]
[129170.686975]  hpsa_eh_device_reset_handler+0x3bb/0x790 [hpsa]
[129170.686978]  ? sched_clock_cpu+0x11/0xb0
[129170.686983]  ? scsi_device_put+0x2b/0x30
[129170.686987]  scsi_eh_ready_devs+0x368/0xc10
[129170.686993]  ? __pm_runtime_resume+0x5b/0x80
[129170.686995]  scsi_error_handler+0x4c3/0x5c0
[129170.687000]  kthread+0x105/0x140
[129170.687003]  ? scsi_eh_get_sense+0x240/0x240
[129170.687005]  ? kthread_destroy_worker+0x50/0x50
[129170.687012]  ? do_syscall_64+0x73/0x130
[129170.687015]  ? SyS_exit_group+0x14/0x20
[129170.687017]  ret_from_fork+0x35/0x40
[129170.687021] INFO: task jbd2/sda1-8:636 blocked for more than 120 seconds.
[129170.694649]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.701598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.710343] jbd2/sda1-8     D    0   636      2 0x80000000
[129170.710346] Call Trace:
[129170.710349]  __schedule+0x3d6/0x8b0
[129170.710351]  ? bit_wait+0x60/0x60
[129170.710352]  schedule+0x36/0x80
[129170.710354]  io_schedule+0x16/0x40
[129170.710359]  bit_wait_io+0x11/0x60
[129170.710362]  __wait_on_bit+0x63/0x90
[129170.710367]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.710373]  ? bit_waitqueue+0x40/0x40
[129170.710377]  __wait_on_buffer+0x32/0x40
[129170.710381]  jbd2_journal_commit_transaction+0xdf6/0x1760
[129170.710387]  kjournald2+0xc8/0x250
[129170.710392]  ? kjournald2+0xc8/0x250
[129170.710395]  ? wait_woken+0x80/0x80
[129170.710398]  kthread+0x105/0x140
[129170.710399]  ? commit_timeout+0x20/0x20
[129170.710402]  ? kthread_destroy_worker+0x50/0x50
[129170.710404]  ? do_syscall_64+0x73/0x130
[129170.710407]  ? SyS_exit_group+0x14/0x20
[129170.710412]  ret_from_fork+0x35/0x40
[129170.710423] INFO: task rs:main Q:Reg:2907 blocked for more than 120 seconds.
[129170.718358]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.725305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.734076] rs:main Q:Reg   D    0  2907      1 0x00000000
[129170.734079] Call Trace:
[129170.734082]  __schedule+0x3d6/0x8b0
[129170.734086]  ? bit_waitqueue+0x40/0x40
[129170.734087]  ? bit_wait+0x60/0x60
[129170.734089]  schedule+0x36/0x80
[129170.734091]  io_schedule+0x16/0x40
[129170.734092]  bit_wait_io+0x11/0x60
[129170.734094]  __wait_on_bit+0x63/0x90
[129170.734096]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.734098]  ? bit_waitqueue+0x40/0x40
[129170.734100]  do_get_write_access+0x202/0x410
[129170.734102]  jbd2_journal_get_write_access+0x51/0x70
[129170.734107]  __ext4_journal_get_write_access+0x3b/0x80
[129170.734111]  ext4_reserve_inode_write+0x95/0xc0
[129170.734115]  ? ext4_dirty_inode+0x48/0x70
[129170.734117]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.734119]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.734121]  ext4_dirty_inode+0x48/0x70
[129170.734125]  __mark_inode_dirty+0x184/0x3b0
[129170.734129]  generic_update_time+0x7b/0xd0
[129170.734132]  ? current_time+0x32/0x70
[129170.734134]  file_update_time+0xbe/0x110
[129170.734140]  __generic_file_write_iter+0x9d/0x1f0
[129170.734142]  ext4_file_write_iter+0xc4/0x3f0
[129170.734147]  ? futex_wake+0x90/0x170
[129170.734151]  new_sync_write+0xe5/0x140
[129170.734155]  __vfs_write+0x29/0x40
[129170.734156]  vfs_write+0xb8/0x1b0
[129170.734158]  SyS_write+0x55/0xc0
[129170.734160]  do_syscall_64+0x73/0x130
[129170.734163]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.734165] RIP: 0033:0x7feefa9394bd
[129170.734166] RSP: 002b:00007feef7ce8600 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[129170.734168] RAX: ffffffffffffffda RBX: 00007feeec00d120 RCX: 00007feefa9394bd
[129170.734169] RDX: 0000000000000078 RSI: 00007feeec00d120 RDI: 0000000000000006
[129170.734171] RBP: 0000000000000000 R08: 00000000010d0030 R09: 00007feef7ce8890
[129170.734173] R10: 00007feef7ce8890 R11: 0000000000000293 R12: 00007feeec0027c0
[129170.734174] R13: 00007feef7ce8620 R14: 000000000046a8b4 R15: 0000000000000078
[129170.734194] INFO: task dockerd:10374 blocked for more than 120 seconds.
[129170.741596]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.748540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.757284] dockerd         D    0 10374      1 0x00000000
[129170.757286] Call Trace:
[129170.757289]  __schedule+0x3d6/0x8b0
[129170.757295]  ? bit_wait+0x60/0x60
[129170.757298]  schedule+0x36/0x80
[129170.757300]  io_schedule+0x16/0x40
[129170.757302]  bit_wait_io+0x11/0x60
[129170.757303]  __wait_on_bit+0x63/0x90
[129170.757306]  ? select_idle_sibling+0x1db/0x410
[129170.757307]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.757311]  ? bit_waitqueue+0x40/0x40
[129170.757319]  do_get_write_access+0x202/0x410
[129170.757323]  ? __wake_up_common_lock+0x8e/0xc0
[129170.757327]  jbd2_journal_get_write_access+0x51/0x70
[129170.757331]  __ext4_journal_get_write_access+0x3b/0x80
[129170.757334]  ext4_reserve_inode_write+0x95/0xc0
[129170.757338]  ? ext4_dirty_inode+0x48/0x70
[129170.757340]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.757343]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.757345]  ext4_dirty_inode+0x48/0x70
[129170.757350]  __mark_inode_dirty+0x184/0x3b0
[129170.757358]  generic_update_time+0x7b/0xd0
[129170.757362]  ? current_time+0x32/0x70
[129170.757365]  file_update_time+0xbe/0x110
[129170.757368]  __generic_file_write_iter+0x9d/0x1f0
[129170.757371]  ext4_file_write_iter+0xc4/0x3f0
[129170.757374]  new_sync_write+0xe5/0x140
[129170.757376]  __vfs_write+0x29/0x40
[129170.757378]  vfs_write+0xb8/0x1b0
[129170.757379]  SyS_write+0x55/0xc0
[129170.757382]  do_syscall_64+0x73/0x130
[129170.757385]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.757393] RIP: 0033:0x5632bfc51d40
[129170.757396] RSP: 002b:000000c420edaaf0 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
[129170.757403] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00005632bfc51d40
[129170.757405] RDX: 000000000000012d RSI: 000000c420bb0800 RDI: 000000000000001e
[129170.757407] RBP: 000000c420edab48 R08: 0000000000000000 R09: 0000000000000000
[129170.757409] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffffff
[129170.757410] R13: 0000000000000083 R14: 0000000000000082 R15: 0000000000000100
[129170.757448] INFO: task log:6889 blocked for more than 120 seconds.
[129170.764622]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.771584] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.780349] log             D    0  6889      1 0x00000000
[129170.780352] Call Trace:
[129170.780356]  __schedule+0x3d6/0x8b0
[129170.780360]  ? bit_wait+0x60/0x60
[129170.780361]  schedule+0x36/0x80
[129170.780363]  io_schedule+0x16/0x40
[129170.780365]  bit_wait_io+0x11/0x60
[129170.780366]  __wait_on_bit+0x63/0x90
[129170.780368]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.780373]  ? bit_waitqueue+0x40/0x40
[129170.780377]  do_get_write_access+0x202/0x410
[129170.780380]  jbd2_journal_get_write_access+0x51/0x70
[129170.780385]  __ext4_journal_get_write_access+0x3b/0x80
[129170.780387]  ext4_reserve_inode_write+0x95/0xc0
[129170.780389]  ? ext4_dirty_inode+0x48/0x70
[129170.780391]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.780393]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.780395]  ext4_dirty_inode+0x48/0x70
[129170.780397]  __mark_inode_dirty+0x184/0x3b0
[129170.780401]  generic_update_time+0x7b/0xd0
[129170.780405]  ? current_time+0x32/0x70
[129170.780409]  file_update_time+0xbe/0x110
[129170.780413]  __generic_file_write_iter+0x9d/0x1f0
[129170.780417]  ext4_file_write_iter+0xc4/0x3f0
[129170.780421]  ? futex_wake+0x90/0x170
[129170.780423]  new_sync_write+0xe5/0x140
[129170.780425]  __vfs_write+0x29/0x40
[129170.780426]  vfs_write+0xb8/0x1b0
[129170.780428]  SyS_write+0x55/0xc0
[129170.780430]  do_syscall_64+0x73/0x130
[129170.780433]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.780434] RIP: 0033:0x7f5c82b984bd
[129170.780435] RSP: 002b:00007f5c80a615c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[129170.780436] RAX: ffffffffffffffda RBX: 0000000000000082 RCX: 00007f5c82b984bd
[129170.780437] RDX: 0000000000000082 RSI: 00007f5c80a615f0 RDI: 0000000000000003
[129170.780438] RBP: 00007f5c80a615f0 R08: 0000000000000000 R09: 0000000000000011
[129170.780439] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
[129170.780440] R13: 00007f5c80a617e0 R14: 0000000000000056 R15: 0000000000000081
[129170.780453] INFO: task log:6964 blocked for more than 120 seconds.
[129170.787377]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.794341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.803204] log             D    0  6964      1 0x00000000
[129170.803207] Call Trace:
[129170.803210]  __schedule+0x3d6/0x8b0
[129170.803216]  ? find_next_bit+0xb/0x10
[129170.803218]  ? bit_wait+0x60/0x60
[129170.803219]  schedule+0x36/0x80
[129170.803221]  io_schedule+0x16/0x40
[129170.803223]  bit_wait_io+0x11/0x60
[129170.803224]  __wait_on_bit+0x63/0x90
[129170.803226]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.803228]  ? bit_waitqueue+0x40/0x40
[129170.803230]  do_get_write_access+0x202/0x410
[129170.803234]  jbd2_journal_get_write_access+0x51/0x70
[129170.803237]  __ext4_journal_get_write_access+0x3b/0x80
[129170.803239]  ext4_reserve_inode_write+0x95/0xc0
[129170.803241]  ? ext4_dirty_inode+0x48/0x70
[129170.803243]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.803244]  ? _cond_resched+0x1a/0x50
[129170.803247]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.803250]  ext4_dirty_inode+0x48/0x70
[129170.803252]  __mark_inode_dirty+0x184/0x3b0
[129170.803254]  ? block_write_end+0x33/0x80
[129170.803256]  generic_write_end+0x87/0xe0
[129170.803258]  ext4_da_write_end+0x117/0x290
[129170.803260]  ? copyin+0x29/0x30
[129170.803263]  generic_perform_write+0xff/0x1b0
[129170.803266]  __generic_file_write_iter+0x1a6/0x1f0
[129170.803269]  ext4_file_write_iter+0xc4/0x3f0
[129170.803271]  ? futex_wake+0x90/0x170
[129170.803273]  new_sync_write+0xe5/0x140
[129170.803275]  __vfs_write+0x29/0x40
[129170.803277]  vfs_write+0xb8/0x1b0
[129170.803279]  SyS_write+0x55/0xc0
[129170.803281]  do_syscall_64+0x73/0x130
[129170.803284]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.803286] RIP: 0033:0x7f875f3864bd
[129170.803287] RSP: 002b:00007f875d24f540 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[129170.803289] RAX: ffffffffffffffda RBX: 000000000000010a RCX: 00007f875f3864bd
[129170.803290] RDX: 000000000000010a RSI: 00007f875d24f570 RDI: 0000000000000003
[129170.803291] RBP: 00007f875d24f570 R08: 0000000000000000 R09: 0000000000000011
[129170.803292] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
[129170.803293] R13: 00007f875d24f7e0 R14: 00000000000000de R15: 0000000000000109
[129170.803304] INFO: task log:6976 blocked for more than 120 seconds.
[129170.810258]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.817202] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.826044] log             D    0  6976      1 0x00000000
[129170.826047] Call Trace:
[129170.826050]  __schedule+0x3d6/0x8b0
[129170.826057]  ? bit_wait+0x60/0x60
[129170.826060]  schedule+0x36/0x80
[129170.826062]  io_schedule+0x16/0x40
[129170.826063]  bit_wait_io+0x11/0x60
[129170.826065]  __wait_on_bit+0x63/0x90
[129170.826066]  ? ttwu_do_wakeup+0x1e/0x150
[129170.826071]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.826079]  ? bit_waitqueue+0x40/0x40
[129170.826082]  do_get_write_access+0x202/0x410
[129170.826084]  jbd2_journal_get_write_access+0x51/0x70
[129170.826087]  __ext4_journal_get_write_access+0x3b/0x80
[129170.826089]  ext4_reserve_inode_write+0x95/0xc0
[129170.826094]  ? ext4_dirty_inode+0x48/0x70
[129170.826100]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.826104]  ? _cond_resched+0x1a/0x50
[129170.826107]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.826109]  ext4_dirty_inode+0x48/0x70
[129170.826112]  __mark_inode_dirty+0x184/0x3b0
[129170.826115]  ? block_write_end+0x33/0x80
[129170.826116]  generic_write_end+0x87/0xe0
[129170.826120]  ext4_da_write_end+0x117/0x290
[129170.826125]  ? copyin+0x29/0x30
[129170.826133]  generic_perform_write+0xff/0x1b0
[129170.826135]  __generic_file_write_iter+0x1a6/0x1f0
[129170.826137]  ext4_file_write_iter+0xc4/0x3f0
[129170.826139]  ? futex_wake+0x90/0x170
[129170.826142]  new_sync_write+0xe5/0x140
[129170.826150]  __vfs_write+0x29/0x40
[129170.826153]  vfs_write+0xb8/0x1b0
[129170.826155]  SyS_write+0x55/0xc0
[129170.826157]  do_syscall_64+0x73/0x130
[129170.826159]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.826161] RIP: 0033:0x7fea6f6044bd
[129170.826168] RSP: 002b:00007fea6d4cd530 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[129170.826173] RAX: ffffffffffffffda RBX: 000000000000010f RCX: 00007fea6f6044bd
[129170.826174] RDX: 000000000000010f RSI: 00007fea6d4cd560 RDI: 0000000000000003
[129170.826175] RBP: 00007fea6d4cd560 R08: 0000000000000000 R09: 0000000000000011
[129170.826176] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
[129170.826177] R13: 00007fea6d4cd7e0 R14: 00000000000000e3 R15: 000000000000010e
[129170.826188] INFO: task log:6997 blocked for more than 120 seconds.
[129170.833188]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.840109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.848851] log             D    0  6997      1 0x00000000
[129170.848853] Call Trace:
[129170.848857]  __schedule+0x3d6/0x8b0
[129170.848859]  ? bit_wait+0x60/0x60
[129170.848860]  schedule+0x36/0x80
[129170.848862]  io_schedule+0x16/0x40
[129170.848864]  bit_wait_io+0x11/0x60
[129170.848865]  __wait_on_bit+0x63/0x90
[129170.848867]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.848869]  ? bit_waitqueue+0x40/0x40
[129170.848871]  do_get_write_access+0x202/0x410
[129170.848875]  jbd2_journal_get_write_access+0x51/0x70
[129170.848877]  __ext4_journal_get_write_access+0x3b/0x80
[129170.848879]  ext4_reserve_inode_write+0x95/0xc0
[129170.848882]  ? ext4_dirty_inode+0x48/0x70
[129170.848884]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.848886]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.848889]  ext4_dirty_inode+0x48/0x70
[129170.848892]  __mark_inode_dirty+0x184/0x3b0
[129170.848894]  generic_update_time+0x7b/0xd0
[129170.848896]  ? current_time+0x32/0x70
[129170.848898]  file_update_time+0xbe/0x110
[129170.848901]  __generic_file_write_iter+0x9d/0x1f0
[129170.848903]  ext4_file_write_iter+0xc4/0x3f0
[129170.848905]  ? futex_wake+0x90/0x170
[129170.848908]  new_sync_write+0xe5/0x140
[129170.848910]  __vfs_write+0x29/0x40
[129170.848912]  vfs_write+0xb8/0x1b0
[129170.848914]  SyS_write+0x55/0xc0
[129170.848916]  do_syscall_64+0x73/0x130
[129170.848918]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.848919] RIP: 0033:0x7fcaf66d04bd
[129170.848921] RSP: 002b:00007fcaf4599560 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[129170.848924] RAX: ffffffffffffffda RBX: 00000000000000de RCX: 00007fcaf66d04bd
[129170.848925] RDX: 00000000000000de RSI: 00007fcaf4599590 RDI: 0000000000000003
[129170.848926] RBP: 00007fcaf4599590 R08: 0000000000000000 R09: 0000000000000011
[129170.848927] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
[129170.848928] R13: 00007fcaf45997e0 R14: 00000000000000b2 R15: 00000000000000dd
[129170.848940] INFO: task log:7127 blocked for more than 120 seconds.
[129170.855864]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.862786] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.871674] log             D    0  7127      1 0x00000000
[129170.871677] Call Trace:
[129170.871679]  __schedule+0x3d6/0x8b0
[129170.871681]  ? bit_wait+0x60/0x60
[129170.871683]  schedule+0x36/0x80
[129170.871685]  io_schedule+0x16/0x40
[129170.871686]  bit_wait_io+0x11/0x60
[129170.871691]  __wait_on_bit+0x63/0x90
[129170.871695]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.871699]  ? bit_waitqueue+0x40/0x40
[129170.871703]  do_get_write_access+0x202/0x410
[129170.871706]  jbd2_journal_get_write_access+0x51/0x70
[129170.871709]  __ext4_journal_get_write_access+0x3b/0x80
[129170.871711]  ext4_reserve_inode_write+0x95/0xc0
[129170.871713]  ? ext4_dirty_inode+0x48/0x70
[129170.871715]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.871717]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.871720]  ext4_dirty_inode+0x48/0x70
[129170.871721]  __mark_inode_dirty+0x184/0x3b0
[129170.871725]  generic_update_time+0x7b/0xd0
[129170.871729]  ? current_time+0x32/0x70
[129170.871734]  file_update_time+0xbe/0x110
[129170.871740]  __generic_file_write_iter+0x9d/0x1f0
[129170.871744]  ext4_file_write_iter+0xc4/0x3f0
[129170.871746]  ? futex_wake+0x90/0x170
[129170.871748]  new_sync_write+0xe5/0x140
[129170.871750]  __vfs_write+0x29/0x40
[129170.871751]  vfs_write+0xb8/0x1b0
[129170.871753]  SyS_write+0x55/0xc0
[129170.871755]  do_syscall_64+0x73/0x130
[129170.871758]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.871760] RIP: 0033:0x7f10989804bd
[129170.871763] RSP: 002b:00007f1096849560 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[129170.871769] RAX: ffffffffffffffda RBX: 00000000000000de RCX: 00007f10989804bd
[129170.871772] RDX: 00000000000000de RSI: 00007f1096849590 RDI: 0000000000000003
[129170.871775] RBP: 00007f1096849590 R08: 0000000000000000 R09: 0000000000000011
[129170.871778] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
[129170.871781] R13: 00007f10968497e0 R14: 00000000000000b2 R15: 00000000000000dd
[129170.871792] INFO: task log:7150 blocked for more than 120 seconds.
[129170.878715]       Not tainted 4.15.0-33-generic #36~16.04.1-Ubuntu
[129170.885639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[129170.894451] log             D    0  7150      1 0x00000000
[129170.894455] Call Trace:
[129170.894460]  __schedule+0x3d6/0x8b0
[129170.894462]  ? bit_wait+0x60/0x60
[129170.894463]  schedule+0x36/0x80
[129170.894465]  io_schedule+0x16/0x40
[129170.894467]  bit_wait_io+0x11/0x60
[129170.894468]  __wait_on_bit+0x63/0x90
[129170.894470]  out_of_line_wait_on_bit+0x8e/0xb0
[129170.894473]  ? bit_waitqueue+0x40/0x40
[129170.894475]  do_get_write_access+0x202/0x410
[129170.894477]  jbd2_journal_get_write_access+0x51/0x70
[129170.894481]  __ext4_journal_get_write_access+0x3b/0x80
[129170.894484]  ext4_reserve_inode_write+0x95/0xc0
[129170.894485]  ? ext4_dirty_inode+0x48/0x70
[129170.894487]  ext4_mark_inode_dirty+0x53/0x1d0
[129170.894490]  ? __ext4_journal_start_sb+0x6d/0x120
[129170.894492]  ext4_dirty_inode+0x48/0x70
[129170.894495]  __mark_inode_dirty+0x184/0x3b0
[129170.894498]  generic_update_time+0x7b/0xd0
[129170.894500]  ? current_time+0x32/0x70
[129170.894502]  file_update_time+0xbe/0x110
[129170.894505]  __generic_file_write_iter+0x9d/0x1f0
[129170.894507]  ext4_file_write_iter+0xc4/0x3f0
[129170.894509]  ? futex_wake+0x90/0x170
[129170.894513]  new_sync_write+0xe5/0x140
[129170.894515]  __vfs_write+0x29/0x40
[129170.894517]  vfs_write+0xb8/0x1b0
[129170.894518]  SyS_write+0x55/0xc0
[129170.894521]  do_syscall_64+0x73/0x130
[129170.894523]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[129170.894524] RIP: 0033:0x7f5174f514bd
[129170.894526] RSP: 002b:00007f5172e1a560 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[129170.894530] RAX: ffffffffffffffda RBX: 00000000000000de RCX: 00007f5174f514bd
[129170.894531] RDX: 00000000000000de RSI: 00007f5172e1a590 RDI: 0000000000000003
[129170.894532] RBP: 00007f5172e1a590 R08: 0000000000000000 R09: 0000000000000011
[129170.894533] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
[129170.894534] R13: 00007f5172e1a7e0 R14: 00000000000000b2 R15: 00000000000000dd
Comment 25 Gaetan Trellu 2018-09-07 02:41:10 UTC
More logs.
[    5.272077] HP HPSA Driver (v 3.4.14-0)
[    5.340589] hpsa 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[    5.352372] hpsa 0000:03:00.0: MSI-X capable controller
[    5.358775] hpsa 0000:03:00.0: Logical aborts not supported
[    5.410577] scsi host6: hpsa
[    5.620173] hpsa 0000:03:00.0: scsi 6:3:0:0: added RAID              HP       P440ar           controller SSDSmartPathCap- En- Exp=1
[    5.633345] hpsa 0000:03:00.0: scsi 6:0:0:0: masked Direct-Access     ATA      TK0120GDJXT      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.651921] hpsa 0000:03:00.0: scsi 6:0:1:0: masked Direct-Access     ATA      TK0120GDJXT      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.682879] ata6.00: ATA-9: VR0120GEJXL, 4IWTHPG0, max UDMA/100
[    5.682891] ata5.00: ATA-9: VR0120GEJXL, 4IWTHPG0, max UDMA/100
[    5.800257] hpsa 0000:03:00.0: scsi 6:0:2:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.813417] hpsa 0000:03:00.0: scsi 6:0:3:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.826488] hpsa 0000:03:00.0: scsi 6:0:4:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.839558] hpsa 0000:03:00.0: scsi 6:0:5:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.852628] hpsa 0000:03:00.0: scsi 6:0:6:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.865698] hpsa 0000:03:00.0: scsi 6:0:7:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.878769] hpsa 0000:03:00.0: scsi 6:0:8:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.891839] hpsa 0000:03:00.0: scsi 6:0:9:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.904910] hpsa 0000:03:00.0: scsi 6:0:10:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.918076] hpsa 0000:03:00.0: scsi 6:0:11:0: masked Direct-Access     ATA      MB3000GCWDB      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.931242] hpsa 0000:03:00.0: scsi 6:0:12:0: masked Direct-Access     ATA      TK0120GDJXT      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.944442] hpsa 0000:03:00.0: scsi 6:0:13:0: masked Direct-Access     ATA      TK0120GDJXT      PHYS DRV SSDSmartPathCap- En- Exp=0
[    5.957609] hpsa 0000:03:00.0: scsi 6:0:14:0: masked Enclosure         HPE      12G SAS Exp Card enclosure SSDSmartPathCap- En- Exp=0
[    5.970871] hpsa 0000:03:00.0: scsi 6:1:0:0: added Direct-Access     HP       LOGICAL VOLUME   RAID-1(+0) SSDSmartPathCap+ En+ Exp=1
[    5.984038] hpsa 0000:03:00.0: scsi 6:1:0:1: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1
[    5.996822] hpsa 0000:03:00.0: scsi 6:1:0:2: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1
[    6.009606] hpsa 0000:03:00.0: scsi 6:1:0:3: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.022391] hpsa 0000:03:00.0: scsi 6:1:0:4: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.035176] hpsa 0000:03:00.0: scsi 6:1:0:5: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.047960] hpsa 0000:03:00.0: scsi 6:1:0:6: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.060759] hpsa 0000:03:00.0: scsi 6:1:0:7: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.073545] hpsa 0000:03:00.0: scsi 6:1:0:8: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.086329] hpsa 0000:03:00.0: scsi 6:1:0:9: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.099113] hpsa 0000:03:00.0: scsi 6:1:0:10: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.111991] hpsa 0000:03:00.0: scsi 6:1:0:11: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.124869] hpsa 0000:03:00.0: scsi 6:1:0:12: added Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
[    6.138251] scsi 6:0:0:0: RAID              HP       P440ar           6.60 PQ: 0 ANSI: 5
[    6.147610] scsi 6:1:0:0: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.156967] scsi 6:1:0:1: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.171837] scsi 6:1:0:2: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.181197] scsi 6:1:0:3: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.190653] scsi 6:1:0:4: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.200015] scsi 6:1:0:5: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.200420] scsi 6:1:0:6: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.200813] scsi 6:1:0:7: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.201205] scsi 6:1:0:8: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.201599] scsi 6:1:0:9: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.201999] scsi 6:1:0:10: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.202395] scsi 6:1:0:11: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.202789] scsi 6:1:0:12: Direct-Access     HP       LOGICAL VOLUME   6.60 PQ: 0 ANSI: 5
[    6.205267] scsi 4:0:0:0: Direct-Access     ATA      VR0120GEJXL      HPG0 PQ: 0 ANSI: 5
[    6.205610] scsi 5:0:0:0: Direct-Access     ATA      VR0120GEJXL      HPG0 PQ: 0 ANSI: 5
[   15.324913] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[   16.681743] hpwdt 0000:01:00.0: HP Watchdog Timer Driver: NMI decoding initialized, allow kernel dump: ON (default = 1/ON)
[   16.681825] hpwdt 0000:01:00.0: HP Watchdog Timer Driver: 1.3.3, timer margin: 30 seconds (nowayout=0).
[   35.467951] hpsa 0000:03:00.0: Acknowledging event: 0xc0000000 (HP SSD Smart Path configuration change)
[   35.636446] hpsa 0000:03:00.0: scsi 6:1:0:0: updated Direct-Access     HP       LOGICAL VOLUME   RAID-1(+0) SSDSmartPathCap+ En+ Exp=1
[   35.636452] hpsa 0000:03:00.0: scsi 6:1:0:1: updated Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1
[   35.636457] hpsa 0000:03:00.0: scsi 6:1:0:2: updated Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap+ En+ Exp=1
Comment 26 Gaetan Trellu 2018-10-15 14:12:57 UTC
Moved from Ubuntu 16.04.5 to CentOS 7.5 with hpsa kernel module (kmod-hpsa-3.4.20-141.rhel7u5.x86_64.rpm) from HPE website.

Running without a kernel panic for more than a week.
Comment 27 Gaetan Trellu 2018-12-06 17:25:21 UTC
Compiling the hpsa kernel module from SourceForge on Ubuntu 16.04 with kernel 4.4 solved the issue for us.

Steps:
# apt-get install dkms build-essential
# tar xjvf hpsa-3.4.20-141.tar.bz2
# cd hpsa-3.4.20/drivers/
# cp -a scsi /usr/src/hpsa-3.4.20.141
# dkms add -m hpsa -v 3.4.20.141
# dkms build -m hpsa -v 3.4.20.141
# dkms install -m hpsa -v 3.4.20.141

Link: https://sourceforge.net/projects/cciss/
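A hedged post-install check to confirm the running kernel actually picked up the DKMS build rather than the in-tree driver (the expected version string is an assumption taken from the tarball name above):

```shell
#!/bin/sh
# Expected out-of-box release, assumed from the hpsa-3.4.20-141 tarball name.
expected="3.4.20-141"

# Compare a reported hpsa version against the expected out-of-box release.
is_expected_version() {
    [ "$1" = "$expected" ]
}

# On the target machine one would feed it the live value, e.g.:
#   is_expected_version "$(modinfo -F version hpsa)" && echo "out-of-box driver loaded"
```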
Comment 28 Anthony Hausman 2019-01-23 15:16:01 UTC
I have actually compiled the hpsa driver 3.4.20.141 into the kernel 4.19.13.

I still see the same behavior: a heavy load (3000) and all disks on the controller unavailable.

But this time it's not the reset that triggers the bug; here are the logs I have.

First of all, one disk returns a lot of critical medium errors:
```
[Wed Jan 23 15:55:34 2019] print_req_error: critical medium error, dev sdt, sector 13836632
[Wed Jan 23 15:55:34 2019] sd 3:1:0:19: [sdt] Unaligned partial completion (resid=52, sector_sz=512)
[Wed Jan 23 15:55:35 2019] sd 3:1:0:19: [sdt] Unaligned partial completion (resid=48, sector_sz=512)
[Wed Jan 23 15:55:35 2019] sd 3:1:0:19: [sdt] Unaligned partial completion (resid=32, sector_sz=512)
[Wed Jan 23 15:55:35 2019] sd 3:1:0:19: [sdt] Unaligned partial completion (resid=52, sector_sz=512)
[Wed Jan 23 15:55:52 2019] sd 3:1:0:19: [sdt] Unaligned partial completion (resid=32, sector_sz=512)
[Wed Jan 23 15:55:52 2019] scsi_io_completion_action: 5 callbacks suppressed
[Wed Jan 23 15:55:52 2019] sd 3:1:0:19: [sdt] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jan 23 15:55:52 2019] sd 3:1:0:19: [sdt] tag#23 Sense Key : Medium Error [current] 
[Wed Jan 23 15:55:52 2019] sd 3:1:0:19: [sdt] tag#23 Add. Sense: Unrecovered read error
[Wed Jan 23 15:55:52 2019] sd 3:1:0:19: [sdt] tag#23 CDB: Read(16) 88 00 00 00 00 00 00 d3 21 58 00 00 00 08 00 00
[Wed Jan 23 15:55:52 2019] print_req_error: 5 callbacks suppressed
[Wed Jan 23 15:55:52 2019] print_req_error: critical medium error, dev sdt, sector 13836632
```
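A small helper to pull the unique (device, sector) pairs out of print_req_error lines like the ones above, e.g. to cross-check them against a badblocks list (the log format is copied from this report; the helper name is mine and nothing here is hpsa-specific):

```shell
#!/bin/sh
# Extract "device sector" pairs from kernel medium-error messages, deduplicated.
medium_error_sectors() {
    sed -n 's/.*critical medium error, dev \([a-z]*\), sector \([0-9]*\).*/\1 \2/p' | sort -u
}

# Usage on a live box:  dmesg | medium_error_sectors
```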

After this, hpsa shows some failed inquiries:
```
[Wed Jan 23 15:57:07 2019] hpsa 0000:08:00.0: aborted: NULL_SDEV_PTR TAG:0x00000000:00000770 LUN:000000c000003901 CDB:12000000310000000000000000000000
[Wed Jan 23 15:57:07 2019] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
[Wed Jan 23 15:57:08 2019] hpsa 0000:08:00.0:           removed scsi 3:0:50:0: Direct-Access     ATA      MB4000GCWDC      PHYS DRV SSDSmartPathCap- En- Exp=0 qd=14
[Wed Jan 23 15:57:31 2019] hpsa 0000:08:00.0: aborted: NULL_SDEV_PTR TAG:0x00000000:000015c0 LUN:000000c000003901 CDB:12000000310000000000000000000000
[Wed Jan 23 15:57:31 2019] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
[Wed Jan 23 15:57:54 2019] hpsa 0000:08:00.0: aborted: NULL_SDEV_PTR TAG:0x00000000:00000e70 LUN:000000c000003901 CDB:12000000310000000000000000000000
[Wed Jan 23 15:57:54 2019] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.

[Wed Jan 23 15:59:04 2019] hpsa 0000:08:00.0: aborted: NULL_SDEV_PTR TAG:0x00000000:00002650 LUN:000000c000003901 CDB:12000000310000000000000000000000
[Wed Jan 23 15:59:04 2019] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
[Wed Jan 23 15:59:28 2019] hpsa 0000:08:00.0: aborted: NULL_SDEV_PTR TAG:0x00000000:00001400 LUN:000000c000003901 CDB:12000000310000000000000000000000
[Wed Jan 23 15:59:28 2019] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
[Wed Jan 23 15:59:51 2019] hpsa 0000:08:00.0: aborted: NULL_SDEV_PTR TAG:0x00000000:00001400 LUN:000000c000003901 CDB:12000000310000000000000000000000
[Wed Jan 23 15:59:51 2019] hpsa 0000:08:00.0: hpsa_update_device_info: inquiry failed, device will be skipped.
```


And following this, the call trace:
```
[Wed Jan 23 16:00:19 2019] INFO: task task:12406 blocked for more than 120 seconds.
[Wed Jan 23 16:00:19 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:00:19 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:00:19 2019] task            D    0 12406  12384 0x00000000
[Wed Jan 23 16:00:19 2019] Call Trace:
[Wed Jan 23 16:00:19 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:00:19 2019]  ? bit_wait+0x50/0x50
[Wed Jan 23 16:00:19 2019]  schedule+0x28/0x80
[Wed Jan 23 16:00:19 2019]  io_schedule+0x12/0x40
[Wed Jan 23 16:00:19 2019]  bit_wait_io+0xd/0x50
[Wed Jan 23 16:00:19 2019]  __wait_on_bit+0x44/0x80
[Wed Jan 23 16:00:19 2019]  out_of_line_wait_on_bit+0x91/0xb0
[Wed Jan 23 16:00:19 2019]  ? init_wait_var_entry+0x40/0x40
[Wed Jan 23 16:00:19 2019]  __ext4_get_inode_loc+0x1a4/0x3f0
[Wed Jan 23 16:00:19 2019]  ext4_iget+0x8f/0xbb0
[Wed Jan 23 16:00:19 2019]  ? d_alloc_parallel+0x9d/0x4a0
[Wed Jan 23 16:00:19 2019]  ext4_lookup+0xda/0x200
[Wed Jan 23 16:00:19 2019]  __lookup_slow+0x97/0x150
[Wed Jan 23 16:00:19 2019]  lookup_slow+0x35/0x50
[Wed Jan 23 16:00:19 2019]  walk_component+0x1c4/0x340
[Wed Jan 23 16:00:19 2019]  link_path_walk.part.33+0x2a6/0x510
[Wed Jan 23 16:00:19 2019]  ? path_init+0x190/0x310
[Wed Jan 23 16:00:19 2019]  path_openat+0xdd/0x1540
[Wed Jan 23 16:00:19 2019]  ? get_futex_key+0x2ed/0x3d0
[Wed Jan 23 16:00:19 2019]  do_filp_open+0x9b/0x110
[Wed Jan 23 16:00:19 2019]  ? __check_object_size+0xb1/0x1a0
[Wed Jan 23 16:00:19 2019]  ? do_sys_open+0x1bd/0x250
[Wed Jan 23 16:00:19 2019]  do_sys_open+0x1bd/0x250
[Wed Jan 23 16:00:19 2019]  do_syscall_64+0x55/0x110
[Wed Jan 23 16:00:19 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Wed Jan 23 16:00:19 2019] RIP: 0033:0x7f1c109a5bfd
[Wed Jan 23 16:00:19 2019] Code: Bad RIP value.
[Wed Jan 23 16:00:19 2019] RSP: 002b:00007f1c075b7d90 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[Wed Jan 23 16:00:19 2019] RAX: ffffffffffffffda RBX: 00007f1bdc000af0 RCX: 00007f1c109a5bfd
[Wed Jan 23 16:00:19 2019] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f1c075b7dc0
[Wed Jan 23 16:00:19 2019] RBP: 0000000000000000 R08: 0000564865cae4aa R09: 0000000000000000
[Wed Jan 23 16:00:19 2019] R10: 0000000000000004 R11: 0000000000000293 R12: 00007f1bdc0008c0
[Wed Jan 23 16:00:19 2019] R13: 00007f1c075b7dc0 R14: 00007f1c075b8f50 R15: 00007f1bd0000c78
[Wed Jan 23 16:01:41 2019] hpsa 0000:08:00.0:     logical_reset scsi 3:1:0:19: Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1 qd=0
[Wed Jan 23 16:02:20 2019] INFO: task jbd2/sdac-8:9669 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] jbd2/sdac-8     D    0  9669      2 0x80000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? __wake_up_common_lock+0x89/0xc0
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  jbd2_journal_commit_transaction+0x246/0x1740
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? lock_timer_base+0x67/0x80
[Wed Jan 23 16:02:20 2019]  ? kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  ? __wake_up_common+0x74/0x120
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? commit_timeout+0x10/0x10
[Wed Jan 23 16:02:20 2019]  kthread+0x113/0x130
[Wed Jan 23 16:02:20 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Jan 23 16:02:20 2019]  ret_from_fork+0x35/0x40
[Wed Jan 23 16:02:20 2019] INFO: task jbd2/sdab-8:9684 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] jbd2/sdab-8     D    0  9684      2 0x80000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? bit_wait+0x50/0x50
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  io_schedule+0x12/0x40
[Wed Jan 23 16:02:20 2019]  bit_wait_io+0xd/0x50
[Wed Jan 23 16:02:20 2019]  __wait_on_bit+0x44/0x80
[Wed Jan 23 16:02:20 2019]  out_of_line_wait_on_bit+0x91/0xb0
[Wed Jan 23 16:02:20 2019]  ? init_wait_var_entry+0x40/0x40
[Wed Jan 23 16:02:20 2019]  jbd2_journal_commit_transaction+0xd0d/0x1740
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? commit_timeout+0x10/0x10
[Wed Jan 23 16:02:20 2019]  kthread+0x113/0x130
[Wed Jan 23 16:02:20 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Jan 23 16:02:20 2019]  ret_from_fork+0x35/0x40
[Wed Jan 23 16:02:20 2019] INFO: task jbd2/sdaa-8:9792 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] jbd2/sdaa-8     D    0  9792      2 0x80000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? bit_wait+0x50/0x50
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  io_schedule+0x12/0x40
[Wed Jan 23 16:02:20 2019]  bit_wait_io+0xd/0x50
[Wed Jan 23 16:02:20 2019]  __wait_on_bit+0x44/0x80
[Wed Jan 23 16:02:20 2019]  out_of_line_wait_on_bit+0x91/0xb0
[Wed Jan 23 16:02:20 2019]  ? init_wait_var_entry+0x40/0x40
[Wed Jan 23 16:02:20 2019]  jbd2_journal_commit_transaction+0xd0d/0x1740
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  ? __wake_up_common+0x74/0x120
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? commit_timeout+0x10/0x10
[Wed Jan 23 16:02:20 2019]  kthread+0x113/0x130
[Wed Jan 23 16:02:20 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Jan 23 16:02:20 2019]  ret_from_fork+0x35/0x40
[Wed Jan 23 16:02:20 2019] INFO: task jbd2/sdb-8:9796 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] jbd2/sdb-8      D    0  9796      2 0x80000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? bit_wait+0x50/0x50
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  io_schedule+0x12/0x40
[Wed Jan 23 16:02:20 2019]  bit_wait_io+0xd/0x50
[Wed Jan 23 16:02:20 2019]  __wait_on_bit+0x44/0x80
[Wed Jan 23 16:02:20 2019]  out_of_line_wait_on_bit+0x91/0xb0
[Wed Jan 23 16:02:20 2019]  ? init_wait_var_entry+0x40/0x40
[Wed Jan 23 16:02:20 2019]  jbd2_journal_commit_transaction+0xd0d/0x1740
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  ? __wake_up_common+0x74/0x120
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? commit_timeout+0x10/0x10
[Wed Jan 23 16:02:20 2019]  kthread+0x113/0x130
[Wed Jan 23 16:02:20 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Jan 23 16:02:20 2019]  ret_from_fork+0x35/0x40
[Wed Jan 23 16:02:20 2019] INFO: task jbd2/sdf-8:9939 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] jbd2/sdf-8      D    0  9939      2 0x80000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? __wake_up_common_lock+0x89/0xc0
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  jbd2_journal_commit_transaction+0x246/0x1740
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x34/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? lock_timer_base+0x67/0x80
[Wed Jan 23 16:02:20 2019]  ? kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  ? __wake_up_common+0x74/0x120
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? commit_timeout+0x10/0x10
[Wed Jan 23 16:02:20 2019]  kthread+0x113/0x130
[Wed Jan 23 16:02:20 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Jan 23 16:02:20 2019]  ret_from_fork+0x35/0x40
[Wed Jan 23 16:02:20 2019] INFO: task jbd2/sdc-8:9947 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] jbd2/sdc-8      D    0  9947      2 0x80000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? bit_wait+0x50/0x50
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  io_schedule+0x12/0x40
[Wed Jan 23 16:02:20 2019]  bit_wait_io+0xd/0x50
[Wed Jan 23 16:02:20 2019]  __wait_on_bit+0x44/0x80
[Wed Jan 23 16:02:20 2019]  out_of_line_wait_on_bit+0x91/0xb0
[Wed Jan 23 16:02:20 2019]  ? init_wait_var_entry+0x40/0x40
[Wed Jan 23 16:02:20 2019]  jbd2_journal_commit_transaction+0xd0d/0x1740
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? __switch_to_asm+0x40/0x70
[Wed Jan 23 16:02:20 2019]  ? kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  kjournald2+0xbd/0x270
[Wed Jan 23 16:02:20 2019]  ? __wake_up_common+0x74/0x120
[Wed Jan 23 16:02:20 2019]  ? wait_woken+0x80/0x80
[Wed Jan 23 16:02:20 2019]  ? commit_timeout+0x10/0x10
[Wed Jan 23 16:02:20 2019]  kthread+0x113/0x130
[Wed Jan 23 16:02:20 2019]  ? kthread_create_worker_on_cpu+0x70/0x70
[Wed Jan 23 16:02:20 2019]  ret_from_fork+0x35/0x40
[Wed Jan 23 16:02:20 2019] INFO: task task:10018 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] task            D    0 10018  10007 0x00000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? futex_wait_queue_me+0xd3/0x120
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  rwsem_down_write_failed+0x15e/0x350
[Wed Jan 23 16:02:20 2019]  ? call_rwsem_down_write_failed+0x13/0x20
[Wed Jan 23 16:02:20 2019]  call_rwsem_down_write_failed+0x13/0x20
[Wed Jan 23 16:02:20 2019]  down_write+0x29/0x40
[Wed Jan 23 16:02:20 2019]  ext4_file_write_iter+0x96/0x3e0
[Wed Jan 23 16:02:20 2019]  ? __sys_sendto+0xac/0x140
[Wed Jan 23 16:02:20 2019]  __vfs_write+0x112/0x1a0
[Wed Jan 23 16:02:20 2019]  vfs_write+0xad/0x1a0
[Wed Jan 23 16:02:20 2019]  ksys_pwrite64+0x71/0x90
[Wed Jan 23 16:02:20 2019]  do_syscall_64+0x55/0x110
[Wed Jan 23 16:02:20 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Wed Jan 23 16:02:20 2019] RIP: 0033:0x7ffa3666ad23
[Wed Jan 23 16:02:20 2019] Code: Bad RIP value.
[Wed Jan 23 16:02:20 2019] RSP: 002b:00007ffa32a88a50 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[Wed Jan 23 16:02:20 2019] RAX: ffffffffffffffda RBX: 0000000000000200 RCX: 00007ffa3666ad23
[Wed Jan 23 16:02:20 2019] RDX: 0000000000000200 RSI: 00007ffa32a88b30 RDI: 0000000000000013
[Wed Jan 23 16:02:20 2019] RBP: 0000000000000200 R08: 00007ffa32a88b08 R09: 00007ffa32a889f0
[Wed Jan 23 16:02:20 2019] R10: 00000003c2275000 R11: 0000000000000293 R12: 0000000000000000
[Wed Jan 23 16:02:20 2019] R13: 00000003c2275000 R14: 0000000000000013 R15: 00007ffa32a88b30
[Wed Jan 23 16:02:20 2019] INFO: task task:10021 blocked for more than 120 seconds.
[Wed Jan 23 16:02:20 2019]       Not tainted 4.19.13-dailymotion #1
[Wed Jan 23 16:02:20 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jan 23 16:02:20 2019] task            D    0 10021  10007 0x00000000
[Wed Jan 23 16:02:20 2019] Call Trace:
[Wed Jan 23 16:02:20 2019]  ? __schedule+0x2b7/0x880
[Wed Jan 23 16:02:20 2019]  ? futex_wait_queue_me+0xd3/0x120
[Wed Jan 23 16:02:20 2019]  schedule+0x28/0x80
[Wed Jan 23 16:02:20 2019]  rwsem_down_write_failed+0x15e/0x350
[Wed Jan 23 16:02:20 2019]  ? call_rwsem_down_write_failed+0x13/0x20
[Wed Jan 23 16:02:20 2019]  call_rwsem_down_write_failed+0x13/0x20
[Wed Jan 23 16:02:20 2019]  down_write+0x29/0x40
[Wed Jan 23 16:02:20 2019]  ext4_file_write_iter+0x96/0x3e0
[Wed Jan 23 16:02:20 2019]  ? __sys_sendto+0xac/0x140
[Wed Jan 23 16:02:20 2019]  __vfs_write+0x112/0x1a0
[Wed Jan 23 16:02:20 2019]  vfs_write+0xad/0x1a0
[Wed Jan 23 16:02:20 2019]  ksys_pwrite64+0x71/0x90
[Wed Jan 23 16:02:20 2019]  do_syscall_64+0x55/0x110
[Wed Jan 23 16:02:20 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Wed Jan 23 16:02:20 2019] RIP: 0033:0x7ffa3666ad23
[Wed Jan 23 16:02:20 2019] Code: Bad RIP value.
[Wed Jan 23 16:02:20 2019] RSP: 002b:00007ffa31285a50 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[Wed Jan 23 16:02:20 2019] RAX: ffffffffffffffda RBX: 0000000000000200 RCX: 00007ffa3666ad23
[Wed Jan 23 16:02:20 2019] RDX: 0000000000000200 RSI: 00007ffa31285b30 RDI: 0000000000000013
[Wed Jan 23 16:02:20 2019] RBP: 0000000000000200 R08: 00007ffa31285b08 R09: 00007ffa312859f0
[Wed Jan 23 16:02:20 2019] R10: 00000003c2276000 R11: 0000000000000293 R12: 0000000000000000
[Wed Jan 23 16:02:20 2019] R13: 00000003c2276000 R14: 0000000000000013 R15: 00007ffa31285b30
```
Comment 29 Gaetan Trellu 2019-01-24 16:24:18 UTC
The issue came back too...

The only way so far to avoid the crashes has been to switch the card from RAID to HBA mode, but at a performance cost.
Comment 30 Don 2019-03-22 19:43:52 UTC
I have been re-writing the eh_reset path recently.

I found a race condition between the completion handler and the reset handler.

I have an update that I have been aggressively testing that seems to be holding up.

I am testing using 7 volumes: 2 SAS HBAs, 2 SATA HBAs, 1 Smart Path enabled R5, and 2 LVs.

My tests consist of issuing resets (sg_reset -d) to all volumes repeatedly while doing:
1. mke2fs to all 7 in parallel
2. mount all volumes in parallel
3. rsync data to all volumes in parallel
4. umount all volumes in parallel
5. fsck all volumes in parallel.

Before my update, I was having the above issue.

I intend to run the above tests over the weekend, then submit the update internally for code review/testing before I send the update to kernel.org.

Thanks,
Don Brace
Comment 31 Anthony Hausman 2019-03-25 09:08:22 UTC
(In reply to Don from comment #30)
> I have been re-writing the eh_reset path recently.
> 
> I found a race condition between the completion handler and the reset
> handler.
> 
> I have an update that I have been aggressively testing that seems to be
> holding up.
> 
> I am testing using 7 volumes: 2 SAS HBAs, 2 SATA HBAs, 1 Smart Path enabled
> R5, and 2 LVs.
> 
> My tests consist of issuing resets (sg_reset -d) to all volumes repeatedly
> while doing:
> 1. mke2fs to all 7 in parallel
> 2. mount all volumes in parallel
> 3. rsync data to all volumes in parallel
> 4. umount all volumes in parallel
> 5. fsck all volumes in parallel.
> 
> Before my update, I was having the above issue.
> 
> I intend to run the above tests over the weekend, then submit the update
> internally for code review/testing before I send the update to kernel.org.
> 
> Thanks,
> Don Brace

Super news, Don.
Please let me know when the patch/update is available; I'm interested in testing it.

Thanks and regards,
Comment 32 Don 2019-04-23 20:36:22 UTC
Created attachment 282479 [details]
Patch to correct resets

    hpsa: correct device resets
    
    - Correct a race condition that occurs between the
      reset handler and the completion handler. There
      are times when the wait_event condition is never
      met due to this race condition and the reset never
      completes.
    
      The reset_pending field is NULL initially.
    
      t  Reset Handler Thread     Completion Thread
      -- --------------------     -----------------
      t1                          if (c->reset_pending)
      t2 c->reset_pending = dev;     if (atomic_dec_and_test(counter))
      t3 atomic_inc(counter)             wake_up_all(event_sync_wait_queue)
      t4
      t5 wait_event(...counter == 0)
    
    Kernel.org Bugzilla:
               https://bugzilla.kernel.org/show_bug.cgi?id=199435
               Bug 199435 - HPSA + P420i resetting logical Direct-Access
                            never complete
    

Here is the patch I am preparing to send up to kernel.org. I have been testing this patch for some time now and I feel it is ready.
Comment 33 Anthony Hausman 2019-05-07 08:39:23 UTC
Hi Don,

Did you send it to kernel.org?
Any idea when the patch will be available in the kernel?

Is the patch compatible with the latest longterm kernel release, 4.19?

Thanks in advance for your response.
Comment 34 Don 2019-05-07 14:36:37 UTC
(In reply to Anthony Hausman from comment #33)
> Hi Don,
> 
> Did you send it to kernel.org?
> Any idea when the patch will be available in the kernel?
> 
> Is the patch compatible with the latest longterm kernel release, 4.19?
> 
> Thanks in advance for your response.

I will be sending the patch up today.

It will apply cleanly if all of the patches I am sending up are applied.
Otherwise some minor porting will be needed to apply it.

The patches I am sending up are:

hpsa-correct-simple-mode
hpsa-use-local-workqueues-instead-of-system-workqueues
hpsa-check-for-tag-collision
hpsa-wait-longer-for-ptraid-commands
hpsa-do-not-complete-cmds-for-deleted-devices
hpsa-correct-device-resets
hpsa-update-driver-version
Comment 35 Andrey Voronkov 2020-06-17 08:48:55 UTC
Hi guys, I'm from the future. It's 2020 already!
I have a similar (exactly the same?) problem with an HPSA P410, already on two of my nodes, with kernel 5.7.1-1.el7.elrepo.x86_64.

Here are the logs:

2020-06-16T14:59:00.8117 warning kern kernel  [679613.058375] hpsa 0000:06:00.0: scsi 0:1:0:0: resetting logical  Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
2020-06-16T14:59:23.3999 info kern kernel  [679635.647794] libceph: osd0 down
2020-06-16T14:59:23.3999 info kern kernel  [679635.648599] libceph: osd6 down
2020-06-16T14:59:24.4468 warning kern kernel  [679636.694762] rbd: rbd1: encountered watch error: -107
2020-06-16T14:59:24.4886 warning kern kernel  [679636.736747] rbd: rbd2: encountered watch error: -107
2020-06-16T14:59:28.4377 info kern kernel  [679640.685700] libceph: osd5 down
2020-06-16T14:59:36.6272 warning kern kernel  [679648.874179] hpsa 0000:06:00.0: Controller lockup detected: 0x0015002f after 30
2020-06-16T14:59:36.6272 warning kern kernel  [679648.875554] hpsa 0000:06:00.0: controller lockup detected: LUN:0000004000000000 CDB:01040000000000000000000000000000
2020-06-16T14:59:36.6272 warning kern kernel  [679648.875591] hpsa 0000:06:00.0: failed 15 commands in fail_all
2020-06-16T14:59:36.6272 warning kern kernel  [679648.876650] hpsa 0000:06:00.0: Controller lockup detected during reset wait
2020-06-16T14:59:36.6272 warning kern kernel  [679648.876655] hpsa 0000:06:00.0: scsi 0:1:0:0: reset logical  failed Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap- En- Exp=1
2020-06-16T14:59:36.6272 info kern kernel  [679648.876667] sd 0:1:0:2: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6272 info kern kernel  [679648.876672] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6348 info kern kernel  [679648.883168] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6348 info kern kernel  [679648.884214] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6357 info kern kernel  [679648.885286] sd 0:1:0:1: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6367 info kern kernel  [679648.886297] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6377 info kern kernel  [679648.887301] sd 0:1:0:2: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6395 info kern kernel  [679648.888269] sd 0:1:0:2: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6395 info kern kernel  [679648.889193] sd 0:1:0:3: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6419 info kern kernel  [679648.890076] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6419 info kern kernel  [679648.891496] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6447 info kern kernel  [679648.893012] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6447 info kern kernel  [679648.894114] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6466 info kern kernel  [679648.895182] sd 0:1:0:0: Device offlined - not ready after error recovery
2020-06-16T14:59:36.6466 info kern kernel  [679648.896204] sd 0:1:0:2: [sdc] tag#477 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=66s
2020-06-16T14:59:36.6489 info kern kernel  [679648.897223] sd 0:1:0:2: [sdc] tag#477 CDB: Read(10) 28 00 00 ed 13 90 00 00 08 00
2020-06-16T14:59:36.6489 err kern kernel  [679648.898309] blk_update_request: I/O error, dev sdc, sector 15537040 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
2020-06-16T14:59:36.6523 info kern kernel  [679648.899489] sd 0:1:0:0: [sda] tag#469 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=70s
2020-06-16T14:59:36.6523 err kern kernel  [679648.899956] sd 0:1:0:0: rejecting I/O to offline device
2020-06-16T14:59:36.6523 info kern kernel  [679648.900659] sd 0:1:0:0: [sda] tag#469 CDB: Read(10) 28 00 14 78 2f e0 00 00 10 00
2020-06-16T14:59:36.6524 err kern kernel  [679648.901820] blk_update_request: I/O error, dev sda, sector 929142240 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
2020-06-16T14:59:36.6537 err kern kernel  [679648.903092] blk_update_request: I/O error, dev sda, sector 343420896 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
2020-06-16T14:59:36.6537 info kern kernel  [679648.903138] sd 0:1:0:0: [sda] tag#470 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=70s
Comment 36 Andrey Voronkov 2020-06-17 08:54:51 UTC
/usr/sbin/hpssacli ctrl all show detail

Smart Array P410 in Slot 1
   Bus Interface: PCI
   Slot: 1
   Serial Number: PACCRID122902CV
   Cache Serial Number: PBCDH0CRH2K24K
   Controller Status: OK
   Hardware Revision: C
   Firmware Version: 6.64
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Status Details: The current array controller had valid data stored in its battery/capacitor backed write cache the last time it was reset or was powered up.  This indicates that the system may not have been shut down gracefully.  The array controller has automatically written, or has attempted to write, this data to the drives.  This message will continue to be displayed until the next reset or power-cycle of the array controller.
   Cache Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 512 MB
   Total Cache Memory Available: 400 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Number of Ports: 2 Internal only
   Driver Name: hpsa
   Driver Version: 3.4.20
   Driver Supports HPE SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:06:00.0
   Sanitize Erase Supported: False
   Primary Boot Volume: logicaldrive 1 (600508B1001C3DAA9705279AD5D8DABA)
   Secondary Boot Volume: None
Comment 37 Andrey Voronkov 2020-06-17 09:16:48 UTC
Seems like related: https://bugzilla.kernel.org/show_bug.cgi?id=199435
Comment 38 Andrey Voronkov 2020-06-17 09:18:01 UTC
Sorry ^ - wrong thread. Created a new bug, as the controller differs in my case: https://bugzilla.kernel.org/show_bug.cgi?id=208215