Bug 195285 - qla2xxx FW immediately crashing after target start
Summary: qla2xxx FW immediately crashing after target start
Status: RESOLVED CODE_FIX
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: QLOGIC QLA2XXX
Hardware: x86-64 Linux
Importance: P1 high
Assignee: scsi_drivers-qla2xxx

Reported: 2017-04-08 00:11 UTC by Anthony
Modified: 2017-10-18 03:26 UTC

Kernel Version: 4.9.10-200.fc25.x86_64
Regression: No


Attachments
Patch to address target configuration for ISP25xx (1.32 KB, application/mbox)
2017-05-18 18:10 UTC, himanshu.madhani@cavium.com

Description Anthony 2017-04-08 00:11:28 UTC
The system always becomes unresponsive after target start, with these messages:

qla2xxx [0000:07:00.0]-00fb:1: QLogic QLE2564 - PCI-Express Quad Channel 8Gb Fibre Channel HBA.
qla2xxx [0000:07:00.0]-00fc:1: ISP2532: PCIe (5.0GT/s x8) @ 0000:07:00.0 hdma+ host#=1 fw=8.06.00 (90d5).
qla2xxx [0000:07:00.1]-001a: : MSI-X vector count: 32.
qla2xxx [0000:07:00.1]-001d: : Found an ISP2532 irq 103 iobase 0xffffb830c62d5000.
qla2xxx [0000:07:00.1]-504b:2: RISC paused -- HCCR=40, Dumping firmware.
qla2xxx [0000:07:00.1]-8033:2: Unable to reinitialize FCE (258).
qla2xxx [0000:07:00.1]-8034:2: Unable to reinitialize EFT (258).
qla2xxx [0000:07:00.1]-00af:2: Performing ISP error recovery - ha=ffff88a2624e0000.
qla2xxx [0000:07:00.1]-504b:2: RISC paused -- HCCR=40, Dumping firmware.
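
For triage, driver messages like the ones above can be tallied with a short script. This is a minimal sketch in Python, assuming the line layout seen in this report (qla2xxx [PCI address]-message id:host number: text). The function names are hypothetical and are not part of any qla2xxx tooling.

```python
import re
from collections import Counter

# Layout assumed from the log lines in this report:
#   qla2xxx [<pci address>]-<4 hex digit message id>:<host number or blank>: <text>
LINE_RE = re.compile(
    r"qla2xxx \[(?P<pci>[0-9a-fA-F:.]+)\]-(?P<msg_id>[0-9a-fA-F]{4}):"
    r"(?P<host>\d+| ): (?P<text>.*)"
)


def parse_line(line):
    """Return a dict for a qla2xxx driver message, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    host = m.group("host").strip()
    return {
        "pci": m.group("pci"),
        "msg_id": m.group("msg_id"),
        # Early probe messages carry no host number yet ("-001a: : ...").
        "host": int(host) if host else None,
        "text": m.group("text").strip(),
    }


def count_risc_pauses(lines):
    """Count 'RISC paused' messages per PCI function."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["text"].startswith("RISC paused"):
            counts[rec["pci"]] += 1
    return counts
```

Feeding the journal output through count_risc_pauses shows which PCI function keeps pausing, which here is the port being brought up in target mode.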

Also tried kernel 4.11-rc5:

Apr 07 23:39:58 : ------------[ cut here ]------------
Apr 07 23:39:58 : WARNING: CPU: 0 PID: 1468 at lib/dma-debug.c:519 add_dma_entry+0x176/0x180
Apr 07 23:39:58 : DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000013e77000
Apr 07 23:39:58 : Modules linked in: vhost_net vhost tap tun ebtable_filter ebtables ip6table_filter ip6_tables tcm_qla2xxx target_core_user uio targ
Apr 07 23:39:58 :  nvme_core scsi_transport_sas
Apr 07 23:39:58 : CPU: 0 PID: 1468 Comm: qemu-system-x86 Tainted: G        W I     4.11.0-0.rc5.git3.1.fc27.x86_64 #1
Apr 07 23:39:58 : Hardware name: HP ProLiant DL180 G6  , BIOS O20 07/01/2013
Apr 07 23:39:58 : Call Trace:
Apr 07 23:39:58 :  dump_stack+0x8e/0xd1
Apr 07 23:39:58 :  __warn+0xcb/0xf0
Apr 07 23:39:58 :  warn_slowpath_fmt+0x5a/0x80
Apr 07 23:39:58 :  ? active_cacheline_read_overlap+0x2e/0x60
Apr 07 23:39:58 :  add_dma_entry+0x176/0x180
Apr 07 23:39:58 :  debug_dma_map_sg+0x11a/0x170
Apr 07 23:39:58 :  nvme_queue_rq+0x513/0x950 [nvme]
Apr 07 23:39:58 :  blk_mq_try_issue_directly+0xbb/0x110
Apr 07 23:39:58 :  blk_mq_make_request+0x3a9/0xa70
Apr 07 23:39:58 :  ? blk_queue_enter+0xa3/0x2c0
Apr 07 23:39:58 :  ? blk_queue_enter+0x39/0x2c0
Apr 07 23:39:58 :  ? generic_make_request+0xf9/0x3b0
Apr 07 23:39:58 :  generic_make_request+0x126/0x3b0
Apr 07 23:39:58 :  ? iov_iter_get_pages+0xc9/0x330
Apr 07 23:39:58 :  submit_bio+0x73/0x150
Apr 07 23:39:58 :  ? submit_bio+0x73/0x150
Apr 07 23:39:58 :  ? bio_iov_iter_get_pages+0xe0/0x120
Apr 07 23:39:58 :  blkdev_direct_IO+0x1f7/0x3e0
Apr 07 23:39:58 :  ? SYSC_io_destroy+0x1d0/0x1d0
Apr 07 23:39:58 :  ? __atime_needs_update+0x7f/0x1a0
Apr 07 23:39:58 :  generic_file_read_iter+0x2e5/0xad0
Apr 07 23:39:58 :  ? generic_file_read_iter+0x2e5/0xad0
Apr 07 23:39:58 :  ? rw_copy_check_uvector+0x8a/0x180
Apr 07 23:39:58 :  blkdev_read_iter+0x35/0x40
Apr 07 23:39:58 :  aio_read+0xeb/0x150
Apr 07 23:39:58 :  ? sched_clock+0x9/0x10
Apr 07 23:39:58 :  ? sched_clock_cpu+0x11/0xc0
Apr 07 23:39:58 :  ? __might_fault+0x3e/0x90
Apr 07 23:39:58 :  ? __might_fault+0x3e/0x90
Apr 07 23:39:58 :  do_io_submit+0x5f8/0x920
Apr 07 23:39:58 :  ? do_io_submit+0x5f8/0x920
Apr 07 23:39:58 :  SyS_io_submit+0x10/0x20
Apr 07 23:39:58 :  ? SyS_io_submit+0x10/0x20
Apr 07 23:39:58 :  entry_SYSCALL_64_fastpath+0x1f/0xc2
Apr 07 23:39:58 : RIP: 0033:0x7f73766216a7
Apr 07 23:39:58 : RSP: 002b:00007ffc9aac6108 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
Apr 07 23:39:58 : RAX: ffffffffffffffda RBX: 000055617d90b900 RCX: 00007f73766216a7
Apr 07 23:39:58 : RDX: 00007ffc9aac6120 RSI: 0000000000000002 RDI: 00007f7377800000
Apr 07 23:39:58 : RBP: 0000000000000258 R08: 00007ffc9aac6440 R09: 000055617d9a2000
Apr 07 23:39:58 : R10: 0000556188f93cf0 R11: 0000000000000246 R12: 0000000000000280
Apr 07 23:39:58 : R13: 0000000000000130 R14: 0000000000000001 R15: 0000000000000011
Apr 07 23:39:58 : ---[ end trace 81f169903702b67d ]---
Comment 1 himanshu.madhani@cavium.com 2017-04-10 19:27:01 UTC
Hi Anthony, 

(In reply to Anthony from comment #0)

We are working internally to reproduce this issue. We'll report what we find from the reproduction.

Thanks,
Himanshu
Comment 2 Anthony 2017-04-11 08:09:37 UTC
All my test and production environments are the same:
On the target:
RAID controller (HP SmartArray / Adaptec 65xx)
BCache in writeback mode on an Intel NVMe SSD
QLE2564 in target mode

On the initiator:
QLE2562

1-meter multimode (MM) optical patch cables, without an FC switch
Comment 3 Anthony 2017-05-16 06:14:32 UTC
Two cards are installed in a new test machine: QLE2462 and QLE2560.
The QLE2560 spams errors (without a target being started):

scsi host8: qla2xxx
qla2xxx [0000:05:00.1]-00fb:8: QLogic QLE2462 - PCI-Express Dual Channel 4Gb Fibre Channel HBA.
qla2xxx [0000:05:00.1]-00fc:8: ISP2432: PCIe (2.5GT/s x4) @ 0000:05:00.1 hdma+ host#=8 fw=8.06.00 (9496).
qla2xxx [0000:09:00.0]-001a: : MSI-X vector count: 32.
qla2xxx [0000:09:00.0]-001d: : Found an ISP2532 irq 16 iobase 0xffffba4886335000.
qla2xxx [0000:09:00.0]-504b:9: RISC paused -- HCCR=40, Dumping firmware.
qla2xxx [0000:09:00.0]-d001:9: Firmware dump saved to temp buffer (9/ffffba488c001000), dump status flags (0x3f).
qla2xxx [0000:09:00.0]-1005:9: Cmd 0x59 aborted with timeout since ISP Abort is pending
scsi host9: qla2xxx
qla2xxx [0000:09:00.0]-00fb:9: QLogic QLE2560 - PCI-Express Single Channel 8Gb Fibre Channel HBA.
qla2xxx [0000:09:00.0]-00fc:9: ISP2532: PCIe (5.0GT/s x8) @ 0000:09:00.0 hdma+ host#=9 fw=8.06.00 (90d5).
qla2xxx [0000:05:00.0]-500a:7: LOOP UP detected (4 Gbps).
qla2xxx [0000:05:00.1]-500a:8: LOOP UP detected (4 Gbps).
qla2xxx [0000:09:00.0]-00af:9: Performing ISP error recovery - ha=ffff98315ee30000.
qla2xxx [0000:09:00.0]-504b:9: RISC paused -- HCCR=40, Dumping firmware.
qla2xxx [0000:09:00.0]-d009:9: Firmware has been previously dumped (ffffba488c001000) -- ignoring request.
qla2xxx [0000:09:00.0]-504b:9: RISC paused -- HCCR=40, Dumping firmware.
Comment 4 Anthony 2017-05-16 14:45:22 UTC
I'm trying to find a working setting, and some investigation shows that
with the parameter ql2xmqsupport=0 the target starts working.
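
One way to check and persist this setting is sketched below in Python. It assumes the parameter is readable under /sys/module/qla2xxx/parameters/ (an assumption, not confirmed in this report) and that module options are set through a modprobe.d file. The helper names are hypothetical.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the current value of a module parameter as a string.

    Returns None if the module is not loaded or the parameter is not
    exposed under sysfs (assumed location: <root>/<module>/parameters/<param>).
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


def modprobe_options_line(module, **params):
    """Build an 'options' line in modprobe.d syntax for the given parameters."""
    opts = " ".join(f"{name}={value}" for name, value in params.items())
    return f"options {module} {opts}"


if __name__ == "__main__":
    print("current:", read_module_param("qla2xxx", "ql2xmqsupport"))
    # Line to place in a file under /etc/modprobe.d/ (the file name is up to the admin).
    print(modprobe_options_line("qla2xxx", ql2xmqsupport=0))
```

The generated line takes effect the next time the module is loaded. If qla2xxx is loaded from the initramfs, the initramfs may also need to be rebuilt.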
Comment 5 loberman 2017-05-16 15:19:08 UTC
OK, I default to MQ so will look at testing without it later.
Comment 6 himanshu.madhani@cavium.com 2017-05-18 18:09:51 UTC
Hi Anthony, Laurence, 

Can you try the attached patch to see if it works for you?

If yes, I'll send it to the SCSI mailing list for inclusion upstream.

Thanks,
Himanshu
Comment 7 himanshu.madhani@cavium.com 2017-05-18 18:10:40 UTC
Created attachment 256619
Patch to address target configuration for ISP25xx
Comment 8 loberman 2017-05-18 18:11:51 UTC
Absolutely, and thanks
Regards
Laurence
Comment 9 Anthony 2017-05-19 10:25:45 UTC
The patch works fine on 4.12.0-0.rc1 with ql2xmqsupport enabled.
Comment 10 himanshu.madhani@cavium.com 2017-05-19 16:11:42 UTC
Hi Anthony, 
(In reply to Anthony from comment #9)
> patch work fine on 4.12.0-0.rc1 with ql2xmqsupport enabled

Thanks for the validation. I'll send this patch to the SCSI tree with proper tags.

-Himanshu
Comment 11 loberman 2017-05-19 16:43:38 UTC
It's working fine for me too now.
Thanks!!
Laurence
Comment 12 himanshu.madhani@cavium.com 2017-05-19 16:44:41 UTC
(In reply to loberman from comment #11)

Thanks, Laurence. I appreciate your effort in testing this out.
Comment 13 Anthony 2017-10-18 03:26:50 UTC
The patch is upstream now. Please close the bug.
Comment 14 Himanshu.Madhani 2017-10-18 03:26:59 UTC
I am OOO. I will respond to your message when I am back at work.

Thanks,
Himanshu
