Bug 208045

Summary: ARM ubuntu 18.04 as the iscsi server, using initiator login, the kernel crashes
Product: IO/Storage Reporter: JiangYu (lnsyyj)
Component: OtherAssignee: scsi_drivers-other
Status: NEW ---    
Severity: high CC: bostroesser, lnsyyj
Priority: P1    
Hardware: ARM   
OS: Linux   
Kernel Version: 4.19.118, 5.4.44, 5.6.16 Subsystem:
Regression: No Bisected commit-id:
Attachments: Patch 1/3
Patch 2/3
Patch 3/3
Patch 2/3, correctly marked as patch
Patch 4

Description JiangYu 2020-06-04 06:17:27 UTC
[Mon Jun  1 09:16:22 2020] scsi host5: iSCSI Initiator over TCP/IP
[Mon Jun  1 09:16:22 2020] scsi host6: iSCSI Initiator over TCP/IP
[Mon Jun  1 09:16:22 2020] scsi host7: iSCSI Initiator over TCP/IP
[Mon Jun  1 09:16:22 2020] Unable to handle kernel paging request at virtual address ffff7e00008fa940
[Mon Jun  1 09:16:22 2020] Mem abort info:
[Mon Jun  1 09:16:22 2020]   ESR = 0x96000004
[Mon Jun  1 09:16:22 2020]   Exception class = DABT (current EL), IL = 32 bits
[Mon Jun  1 09:16:22 2020]   SET = 0, FnV = 0
[Mon Jun  1 09:16:22 2020]   EA = 0, S1PTW = 0
[Mon Jun  1 09:16:22 2020] Data abort info:
[Mon Jun  1 09:16:22 2020]   ISV = 0, ISS = 0x00000004
[Mon Jun  1 09:16:22 2020]   CM = 0, WnR = 0
[Mon Jun  1 09:16:22 2020] swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082681c2b
[Mon Jun  1 09:16:22 2020] [ffff7e00008fa940] pgd=0000000000000000
[Mon Jun  1 09:16:22 2020] Internal error: Oops: 96000004 [#1] SMP
[Mon Jun  1 09:16:22 2020] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod cfg80211 target_core_user uio target_core_mod openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas hid_generic usbhid hid ast ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops
[Mon Jun  1 09:16:22 2020]  cfbcopyarea fb bcache font crc32_ce crc64 drm megaraid_sas igb ixgbe ahci drm_panel_orientation_quirks libahci i2c_designware_platform i2c_designware_core i2c_algo_bit mdio i2c_core aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[Mon Jun  1 09:16:22 2020] Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
[Mon Jun  1 09:16:22 2020] CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic #202004230533
[Mon Jun  1 09:16:22 2020] Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
[Mon Jun  1 09:16:22 2020] pstate: 80400005 (Nzcv daif +PAN -UAO)
[Mon Jun  1 09:16:22 2020] pc : flush_dcache_page+0x18/0x40
[Mon Jun  1 09:16:22 2020] lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
[Mon Jun  1 09:16:22 2020] sp : ffff000015123a80
[Mon Jun  1 09:16:22 2020] x29: ffff000015123a80 x28: 0000000000000000 
[Mon Jun  1 09:16:22 2020] x27: 0000000000001000 x26: ffff000023ea5000 
[Mon Jun  1 09:16:22 2020] x25: ffffcfa25bbe08b8 x24: 0000000000000078 
[Mon Jun  1 09:16:22 2020] x23: ffff7e0000000000 x22: ffff000023ea5001 
[Mon Jun  1 09:16:22 2020] x21: ffffcfa24b79c000 x20: 0000000000000fff 
[Mon Jun  1 09:16:22 2020] x19: ffff7e00008fa940 x18: 0000000000000000 
[Mon Jun  1 09:16:22 2020] x17: 0000000000000000 x16: ffff2d047e709138 
[Mon Jun  1 09:16:22 2020] x15: 0000000000000000 x14: 0000000000000000 
[Mon Jun  1 09:16:22 2020] x13: 0000000000000000 x12: ffff2d047fbd0a40 
[Mon Jun  1 09:16:22 2020] x11: 0000000000000000 x10: 0000000000000030 
[Mon Jun  1 09:16:22 2020] x9 : 0000000000000000 x8 : ffffc9a254820a00 
[Mon Jun  1 09:16:22 2020] x7 : 00000000000013b0 x6 : 000000000000003f 
[Mon Jun  1 09:16:22 2020] x5 : 0000000000000040 x4 : ffffcfa25bbe08e8 
[Mon Jun  1 09:16:22 2020] x3 : 0000000000001000 x2 : 0000000000000078 
[Mon Jun  1 09:16:22 2020] x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18 
[Mon Jun  1 09:16:22 2020] Call trace:
[Mon Jun  1 09:16:22 2020]  flush_dcache_page+0x18/0x40
[Mon Jun  1 09:16:22 2020]  is_ring_space_avail+0x68/0x2f8 [target_core_user]
[Mon Jun  1 09:16:22 2020]  queue_cmd_ring+0x1f8/0x680 [target_core_user]
[Mon Jun  1 09:16:22 2020]  tcmu_queue_cmd+0xe4/0x158 [target_core_user]
[Mon Jun  1 09:16:22 2020]  __target_execute_cmd+0x30/0xf0 [target_core_mod]
[Mon Jun  1 09:16:22 2020]  target_execute_cmd+0x294/0x390 [target_core_mod]
[Mon Jun  1 09:16:22 2020]  transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
[Mon Jun  1 09:16:22 2020]  transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
[Mon Jun  1 09:16:22 2020]  iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
[Mon Jun  1 09:16:22 2020]  iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
[Mon Jun  1 09:16:22 2020]  iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
[Mon Jun  1 09:16:22 2020]  iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
[Mon Jun  1 09:16:22 2020]  iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
[Mon Jun  1 09:16:22 2020]  kthread+0x130/0x138
[Mon Jun  1 09:16:22 2020]  ret_from_fork+0x10/0x18
[Mon Jun  1 09:16:22 2020] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260) 
[Mon Jun  1 09:16:22 2020] ---[ end trace 1e451c73f4266776 ]---
[Mon Jun  1 09:16:32 2020] ------------[ cut here ]------------
[Mon Jun  1 09:16:32 2020] WARNING: CPU: 0 PID: 7495 at kernel/kthread.c:391 __kthread_bind_mask+0xb0/0xb8
[Mon Jun  1 09:16:32 2020] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod cfg80211 target_core_user uio target_core_mod openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas hid_generic usbhid hid ast ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops
[Mon Jun  1 09:16:32 2020]  cfbcopyarea fb bcache font crc32_ce crc64 drm megaraid_sas igb ixgbe ahci drm_panel_orientation_quirks libahci i2c_designware_platform i2c_designware_core i2c_algo_bit mdio i2c_core aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[Mon Jun  1 09:16:32 2020] CPU: 0 PID: 7495 Comm: iscsi_ttx Tainted: G      D           4.19.118-0419118-generic #202004230533
[Mon Jun  1 09:16:32 2020] Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
[Mon Jun  1 09:16:32 2020] pstate: 40400005 (nZcv daif +PAN -UAO)
[Mon Jun  1 09:16:33 2020] pc : __kthread_bind_mask+0xb0/0xb8
[Mon Jun  1 09:16:33 2020] lr : __kthread_bind_mask+0xb0/0xb8
[Mon Jun  1 09:16:33 2020] sp : ffff00001511bca0
[Mon Jun  1 09:16:33 2020] x29: ffff00001511bca0 x28: 0000000000000000 
[Mon Jun  1 09:16:33 2020] x27: ffffcfa05e811838 x26: ffff2d047f4e0a18 
[Mon Jun  1 09:16:33 2020] x25: ffff2d040bf06750 x24: 0000000000000008 
[Mon Jun  1 09:16:33 2020] x23: ffff2d047fbc8708 x22: ffff2d040bef25d8 
[Mon Jun  1 09:16:33 2020] x21: ffff2d047f202368 x20: 0000000000000040 
[Mon Jun  1 09:16:33 2020] x19: ffffcfa2080ee3c0 x18: 0000000000000010 
[Mon Jun  1 09:16:33 2020] x17: 0000000000000000 x16: 0000000000000000 
[Mon Jun  1 09:16:33 2020] x15: ffffffffffffffff x14: ffff2d047fbc8708 
[Mon Jun  1 09:16:33 2020] x13: ffff2d04ffd878b7 x12: ffff2d047fd878bf 
[Mon Jun  1 09:16:33 2020] x11: ffff2d047fbed000 x10: ffff00001511b980 
[Mon Jun  1 09:16:33 2020] x9 : 00000000ffffffd0 x8 : ffff2d047ed8e3f8 
[Mon Jun  1 09:16:33 2020] x7 : 5d20657265682074 x6 : ffffc9a27ff33158 
[Mon Jun  1 09:16:33 2020] x5 : ffffc9a27ff33158 x4 : 0000000000000000 
[Mon Jun  1 09:16:33 2020] x3 : ffffc9a27ff3bf88 x2 : 5879f7f686901900 
[Mon Jun  1 09:16:33 2020] x1 : 0000000000000000 x0 : 0000000000000024 
[Mon Jun  1 09:16:33 2020] Call trace:
[Mon Jun  1 09:16:33 2020]  __kthread_bind_mask+0xb0/0xb8
[Mon Jun  1 09:16:33 2020]  kthread_unpark+0x8c/0x90
[Mon Jun  1 09:16:33 2020]  kthread_stop+0x60/0x198
[Mon Jun  1 09:16:33 2020]  iscsit_close_connection+0x428/0x998 [iscsi_target_mod]
[Mon Jun  1 09:16:33 2020]  iscsit_take_action_for_connection_exit+0xc0/0x190 [iscsi_target_mod]
[Mon Jun  1 09:16:33 2020]  iscsi_target_tx_thread+0x180/0x200 [iscsi_target_mod]
[Mon Jun  1 09:16:33 2020]  kthread+0x130/0x138
[Mon Jun  1 09:16:33 2020]  ret_from_fork+0x10/0x18
[Mon Jun  1 09:16:33 2020] ---[ end trace 1e451c73f4266777 ]---
[Mon Jun  1 09:16:37 2020]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295230129, last ping 4295231424, now 4295232704
[Mon Jun  1 09:16:38 2020]  connection2:0: detected conn error (1022)
[Mon Jun  1 09:16:38 2020]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295230126, last ping 4295231424, now 4295232708
[Mon Jun  1 09:16:38 2020]  connection1:0: detected conn error (1022)
[Mon Jun  1 09:16:38 2020]  connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295230144, last ping 4295230144, now 4295232712
[Mon Jun  1 09:16:38 2020]  connection3:0: detected conn error (1022)
[Mon Jun  1 09:16:55 2020] iSCSI Login timeout on Network Portal 192.168.1.10:3260
[Mon Jun  1 09:17:44 2020] scsi 6:0:0:0: timing out command, waited 82s
[Mon Jun  1 09:17:44 2020] scsi 7:0:0:0: timing out command, waited 82s
[Mon Jun  1 09:17:44 2020] scsi 5:0:0:0: timing out command, waited 82s
[Mon Jun  1 09:18:38 2020]  session3: session recovery timed out after 120 secs
[Mon Jun  1 09:18:38 2020]  session1: session recovery timed out after 120 secs
[Mon Jun  1 09:18:38 2020]  session2: session recovery timed out after 120 secs
Comment 1 JiangYu 2020-06-04 06:22:29 UTC
Hello everyone, I'm using ARM ubuntu server 18.04 and I want to use ceph + tcmu. I tried to give feedback to the tcmu-runner team, but the problem may be in the kernel.

The link is as follows:
https://github.com/open-iscsi/tcmu-runner/issues/627

I have tried to upgrade the kernel to 5.4.44 and 5.6.16 on ARM ubuntu, the same problem
Comment 2 Bodo Stroesser 2020-06-09 13:48:33 UTC
Maybe I can help.


I already sent two patches to linux-scsi and target-devel, which I think fix possible crashes on ARM. Patch1 already was queued by Martin K. Petersen in his for-next branch. The second was sent as "PATCH RFC" and there is no comment yet.


Based on these two patches I prepared a third patch that might fix the described crash. I'm going to attach all three patches here.

Please test with all three patches applied. I think they should work with kernel 4.19 also.

Regards,
Bodo
Comment 3 Bodo Stroesser 2020-06-09 13:49:54 UTC
Created attachment 289575 [details]
Patch 1/3
Comment 4 Bodo Stroesser 2020-06-09 13:50:20 UTC
Created attachment 289577 [details]
Patch 2/3
Comment 5 Bodo Stroesser 2020-06-09 13:51:14 UTC
Created attachment 289579 [details]
Patch 3/3
Comment 6 Bodo Stroesser 2020-06-09 13:53:26 UTC
Created attachment 289581 [details]
Patch 2/3, correctly marked as patch
Comment 7 JiangYu 2020-06-11 10:10:59 UTC
Hi Bodo,

I tested these three patches. When logging in to the backend target, the kernel did not crash and worked well.
thank you very much!
Comment 8 Bodo Stroesser 2020-06-16 09:54:33 UTC
I will submit patches 2 and 3 to linux-scsi and target-devel.

Patch 1 already made it into 5.8-rc1
Comment 9 JiangYu 2020-06-17 07:48:34 UTC
I used the patch of 4.19.118 for testing and it worked well.
Comment 10 JiangYu 2020-06-22 02:33:01 UTC
[Fri Jun 19 11:19:46 2020] scsi host5: iSCSI Initiator over TCP/IP
[Fri Jun 19 11:19:46 2020] iSCSI Initiator Node: iqn.1993-08.org.debian:01:b27579df472 is not authorized to access iSCSI target portal group: 1.
[Fri Jun 19 11:19:47 2020] iSCSI Login negotiation failed.
[Fri Jun 19 11:21:33 2020] scsi host5: iSCSI Initiator over TCP/IP
[Fri Jun 19 11:23:32 2020] scsi host5: iSCSI Initiator over TCP/IP
[Fri Jun 19 11:23:32 2020] scsi 5:0:0:0: Direct-Access     LIO-ORG  TCMU device      0002 PQ: 0 ANSI: 5
[Fri Jun 19 11:23:32 2020] sd 5:0:0:0: Attached scsi generic sg11 type 0
[Fri Jun 19 11:23:32 2020] sd 5:0:0:0: [sdk] 2147483648 512-byte logical blocks: (1.10 TB/1.00 TiB)
[Fri Jun 19 11:23:32 2020] sd 5:0:0:0: [sdk] Write Protect is off
[Fri Jun 19 11:23:32 2020] sd 5:0:0:0: [sdk] Mode Sense: 2f 00 00 00
[Fri Jun 19 11:23:32 2020] sd 5:0:0:0: [sdk] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[Fri Jun 19 11:23:32 2020] sd 5:0:0:0: [sdk] Optimal transfer size 524288 bytes
[Fri Jun 19 11:23:32 2020] sd 5:0:0:0: [sdk] Attached SCSI disk
[Fri Jun 19 18:17:11 2020] EXT4-fs (sdk): mounted filesystem with ordered data mode. Opts: (null)
[Mon Jun 22 00:00:14 2020] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
[Mon Jun 22 00:00:14 2020] Mem abort info:
[Mon Jun 22 00:00:14 2020]   ESR = 0x96000004
[Mon Jun 22 00:00:14 2020]   Exception class = DABT (current EL), IL = 32 bits
[Mon Jun 22 00:00:14 2020]   SET = 0, FnV = 0
[Mon Jun 22 00:00:14 2020]   EA = 0, S1PTW = 0
[Mon Jun 22 00:00:14 2020] Data abort info:
[Mon Jun 22 00:00:14 2020]   ISV = 0, ISS = 0x00000004
[Mon Jun 22 00:00:14 2020]   CM = 0, WnR = 0
[Mon Jun 22 00:00:14 2020] user pgtable: 4k pages, 48-bit VAs, pgdp = 000000002c314fa8
[Mon Jun 22 00:00:14 2020] [0000000000000000] pgd=0000000000000000
[Mon Jun 22 00:00:14 2020] Internal error: Oops: 96000004 [#1] SMP
[Mon Jun 22 00:00:14 2020] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache target_core_pscsi target_core_file target_core_iblock iscsi_target_mod cfg80211 target_core_user uio target_core_mod openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nls_iso8859_1 ipmi_ssif joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel sunrpc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas ast ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb
[Mon Jun 22 00:00:15 2020]  font drm ixgbe igb drm_panel_orientation_quirks i2c_algo_bit mdio bcache crc32_ce crc64 hid_generic i2c_designware_platform ahci i2c_designware_core megaraid_sas libahci i2c_core usbhid hid aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[Mon Jun 22 00:00:15 2020] Process tp_librbd (pid: 5707, stack limit = 0x00000000d11d520b)
[Mon Jun 22 00:00:15 2020] CPU: 52 PID: 5707 Comm: tp_librbd Not tainted 4.19.118-maxcn #1
[Mon Jun 22 00:00:15 2020] Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
[Mon Jun 22 00:00:15 2020] pstate: 20400005 (nzCv daif +PAN -UAO)
[Mon Jun 22 00:00:15 2020] pc : flush_dcache_page+0x18/0x40
[Mon Jun 22 00:00:15 2020] lr : tcmu_handle_completions+0xc4/0x4a0 [target_core_user]
[Mon Jun 22 00:00:15 2020] sp : ffff00002245bc10
[Mon Jun 22 00:00:15 2020] x29: ffff00002245bc10 x28: ffffcd70fb93d580 
[Mon Jun 22 00:00:15 2020] x27: ffff0000223c5000 x26: ffff363403dcf680 
[Mon Jun 22 00:00:15 2020] x25: ffffc771da0393e0 x24: 0000000000000000 
[Mon Jun 22 00:00:15 2020] x23: ffff000021bc5000 x22: ffffc771da038000 
[Mon Jun 22 00:00:15 2020] x21: ffff0000223c4fd0 x20: 00000000007fffd0 
[Mon Jun 22 00:00:15 2020] x19: 0000000000000000 x18: 0000000000000000 
[Mon Jun 22 00:00:15 2020] x17: 0000000000000000 x16: ffff36340eba9e00 
[Mon Jun 22 00:00:15 2020] x15: 0000000000000000 x14: 0000000000000000 
[Mon Jun 22 00:00:15 2020] x13: 0000000000000000 x12: 0000000000000000 
[Mon Jun 22 00:00:15 2020] x11: 0000000000000000 x10: 0000000000000000 
[Mon Jun 22 00:00:15 2020] x9 : 0000000000000000 x8 : 00000000000013e0 
[Mon Jun 22 00:00:15 2020] x7 : 0000000000000000 x6 : ffff00002245bcf8 
[Mon Jun 22 00:00:15 2020] x5 : ffff00002245bcf8 x4 : 0000000000000000 
[Mon Jun 22 00:00:15 2020] x3 : ffff36340fab0000 x2 : ffffb91200000000 
[Mon Jun 22 00:00:15 2020] x1 : 0000000000000000 x0 : ffff363403dc8f6c 
[Mon Jun 22 00:00:15 2020] Call trace:
[Mon Jun 22 00:00:15 2020]  flush_dcache_page+0x18/0x40
[Mon Jun 22 00:00:15 2020]  tcmu_handle_completions+0xc4/0x4a0 [target_core_user]
[Mon Jun 22 00:00:15 2020]  tcmu_irqcontrol+0x34/0x58 [target_core_user]
[Mon Jun 22 00:00:15 2020]  uio_write+0xb8/0x138 [uio]
[Mon Jun 22 00:00:15 2020]  __vfs_write+0x60/0x190
[Mon Jun 22 00:00:15 2020]  vfs_write+0xac/0x1b0
[Mon Jun 22 00:00:15 2020]  ksys_write+0x74/0xf0
[Mon Jun 22 00:00:15 2020]  __arm64_sys_write+0x24/0x30
[Mon Jun 22 00:00:15 2020]  el0_svc_common+0x88/0x180
[Mon Jun 22 00:00:15 2020]  el0_svc_handler+0x38/0x78
[Mon Jun 22 00:00:15 2020]  el0_svc+0x8/0xc
[Mon Jun 22 00:00:15 2020] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260) 
[Mon Jun 22 00:00:15 2020] ---[ end trace cdb72dbc3b2a8038 ]---
[Mon Jun 22 00:01:16 2020] ABORT_TASK: Found referenced iSCSI task_tag: 42
[Mon Jun 22 00:01:16 2020] ------------[ cut here ]------------
[Mon Jun 22 00:01:16 2020] WARNING: CPU: 7 PID: 746959 at kernel/workqueue.c:2919 __flush_work+0x260/0x290
[Mon Jun 22 00:01:16 2020] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache target_core_pscsi target_core_file target_core_iblock iscsi_target_mod cfg80211 target_core_user uio target_core_mod openvswitch nsh nf_nat_ipv6 nf_nat_ipv4 nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nls_iso8859_1 ipmi_ssif joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel sunrpc ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas ast ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb
[Mon Jun 22 00:01:16 2020]  font drm ixgbe igb drm_panel_orientation_quirks i2c_algo_bit mdio bcache crc32_ce crc64 hid_generic i2c_designware_platform ahci i2c_designware_core megaraid_sas libahci i2c_core usbhid hid aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[Mon Jun 22 00:01:16 2020] CPU: 7 PID: 746959 Comm: kworker/u128:0 Tainted: G      D           4.19.118-maxcn #1
[Mon Jun 22 00:01:16 2020] Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
[Mon Jun 22 00:01:16 2020] Workqueue: tmr-user target_tmr_work [target_core_mod]
[Mon Jun 22 00:01:16 2020] pstate: 40400005 (nZcv daif +PAN -UAO)
[Mon Jun 22 00:01:16 2020] pc : __flush_work+0x260/0x290
[Mon Jun 22 00:01:16 2020] lr : __flush_work+0x260/0x290
[Mon Jun 22 00:01:16 2020] sp : ffff0000167cbbd0
[Mon Jun 22 00:01:16 2020] x29: ffff0000167cbbd0 x28: 0000000000000000 
[Mon Jun 22 00:01:16 2020] x27: ffffccef4ebc68b8 x26: ffff0000167cbce0 
[Mon Jun 22 00:01:16 2020] x25: ffff3634100b6a40 x24: 0000000000000000 
[Mon Jun 22 00:01:16 2020] x23: ffff363410252000 x22: 0000000000000000 
[Mon Jun 22 00:01:16 2020] x21: ffff363410098000 x20: ffffcd6fad1b1550 
[Mon Jun 22 00:01:16 2020] x19: ffffcd6fad1b1550 x18: ffffffffffffffff 
[Mon Jun 22 00:01:16 2020] x17: 0000000000000000 x16: 0000000000000000 
[Mon Jun 22 00:01:16 2020] x15: ffff363410098708 x14: ffff3634902578af 
[Mon Jun 22 00:01:16 2020] x13: ffff3634102578be x12: ffff3634100bd000 
[Mon Jun 22 00:01:16 2020] x11: 0000000005f5e0ff x10: ffff363410099168 
[Mon Jun 22 00:01:16 2020] x9 : ffff36340fc7d018 x8 : ffff36340f24d010 
[Mon Jun 22 00:01:16 2020] x7 : 5d20657265682074 x6 : ffffc771fffe2158 
[Mon Jun 22 00:01:16 2020] x5 : ffffc771fffe2158 x4 : 0000000000000000 
[Mon Jun 22 00:01:16 2020] x3 : ffffc771fffeaf88 x2 : 203f2659feac9b00 
[Mon Jun 22 00:01:16 2020] x1 : 0000000000000000 x0 : 0000000000000024 
[Mon Jun 22 00:01:16 2020] Call trace:
[Mon Jun 22 00:01:16 2020]  __flush_work+0x260/0x290
[Mon Jun 22 00:01:16 2020]  __cancel_work_timer+0x134/0x1a8
[Mon Jun 22 00:01:16 2020]  cancel_work_sync+0x24/0x30
[Mon Jun 22 00:01:16 2020]  core_tmr_abort_task+0xfc/0x1b0 [target_core_mod]
[Mon Jun 22 00:01:16 2020]  target_tmr_work+0x108/0x1d8 [target_core_mod]
[Mon Jun 22 00:01:16 2020]  process_one_work+0x1f0/0x428
[Mon Jun 22 00:01:16 2020]  worker_thread+0x44/0x488
[Mon Jun 22 00:01:16 2020]  kthread+0x134/0x138
[Mon Jun 22 00:01:16 2020]  ret_from_fork+0x10/0x18
[Mon Jun 22 00:01:16 2020] ---[ end trace cdb72dbc3b2a8039 ]---
[Mon Jun 22 00:01:31 2020]  connection3:0: detected conn error (1021)
[Mon Jun 22 00:01:31 2020]  connection3:0: detected conn error (1021)
[Mon Jun 22 00:01:48 2020] iSCSI Login timeout on Network Portal 192.168.1.201:3260
[Mon Jun 22 00:03:31 2020]  session3: session recovery timed out after 120 secs
[Mon Jun 22 00:03:31 2020] sd 5:0:0:0: Device offlined - not ready after error recovery
[Mon Jun 22 00:03:31 2020] sd 5:0:0:0: [sdk] tag#57 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[Mon Jun 22 00:03:31 2020] sd 5:0:0:0: [sdk] tag#57 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00
[Mon Jun 22 00:03:31 2020] print_req_error: I/O error, dev sdk, sector 1727496192
[Mon Jun 22 00:03:31 2020] sd 5:0:0:0: rejecting I/O to offline device
[Mon Jun 22 00:03:31 2020] print_req_error: I/O error, dev sdk, sector 1727430656
[Mon Jun 22 00:03:31 2020] sd 5:0:0:0: rejecting I/O to offline device
[Mon Jun 22 00:03:31 2020] print_req_error: I/O error, dev sdk, sector 1727463424
[Mon Jun 22 10:07:20 2020] sd 5:0:0:0: rejecting I/O to offline device
[Mon Jun 22 10:07:20 2020] print_req_error: I/O error, dev sdk, sector 1074326448
[Mon Jun 22 10:07:20 2020] Aborting journal on device sdk-8.
[Mon Jun 22 10:07:20 2020] sd 5:0:0:0: rejecting I/O to offline device
[Mon Jun 22 10:07:20 2020] print_req_error: I/O error, dev sdk, sector 1074003968
[Mon Jun 22 10:07:21 2020] Buffer I/O error on dev sdk, logical block 134250496, lost sync page write
[Mon Jun 22 10:07:21 2020] JBD2: Error -5 detected when updating journal superblock for sdk-8.
[Mon Jun 22 10:25:43 2020] sd 5:0:0:0: rejecting I/O to offline device
[Mon Jun 22 10:25:43 2020] print_req_error: I/O error, dev sdk, sector 0
[Mon Jun 22 10:25:43 2020] Buffer I/O error on dev sdk, logical block 0, lost sync page write
[Mon Jun 22 10:25:43 2020] EXT4-fs (sdk): I/O error while writing superblock
[Mon Jun 22 10:25:43 2020] EXT4-fs error (device sdk): ext4_journal_check_start:61: Detected aborted journal
[Mon Jun 22 10:25:43 2020] EXT4-fs (sdk): Remounting filesystem read-only
[Mon Jun 22 10:25:43 2020] sd 5:0:0:0: rejecting I/O to offline device
[Mon Jun 22 10:25:43 2020] print_req_error: I/O error, dev sdk, sector 0
[Mon Jun 22 10:25:43 2020] Buffer I/O error on dev sdk, logical block 0, lost sync page write
[Mon Jun 22 10:25:43 2020] EXT4-fs (sdk): I/O error while writing superblock


Hello, after a few days of testing, I found another problem. When I use the initiator login target, and format the ext4 file system, and write some data, there will be an error as reported above.
Comment 11 Bodo Stroesser 2020-06-22 10:38:40 UTC
Ah, I think I know what's going wrong. I think this happens only during commmand completion if there is a padding at the end of the tcmu command ring which is shorter than sizeof(struct tcmu_cmd_entry).

Unfortunately I missed the problem at this place while I fixed a very similar problem during creation of a padding entry at the end of the ring :(


I'll come up with a patch soon.
Comment 12 Bodo Stroesser 2020-06-22 12:05:18 UTC
Created attachment 289787 [details]
Patch 4
Comment 13 Bodo Stroesser 2020-06-22 12:06:29 UTC
I attached patch 4. Please test it.
Comment 14 JiangYu 2020-06-24 04:56:55 UTC
I upgraded patch4 on kernel 4.19.118 and tested the read and write for a day. No problems encountered again.

Thank you!
Comment 15 Bodo Stroesser 2020-06-24 08:54:49 UTC
I just sent the patch to linux-scsi and target-devel.