Bug 10486 - kernel 2.6.24 4G FC arrays don't work properly with qla2xxx driver
Status: CLOSED CODE_FIX
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: QLOGIC QLA2XXX
Hardware: All Linux
Priority/Severity: P1 blocking
Assignee: scsi_drivers-qla2xxx
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-19 14:46 UTC by Andrej Filipcic
Modified: 2008-04-29 14:12 UTC

See Also:
Kernel Version: 2.6.24+
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
2.6.23 dmesg (28.09 KB, application/octet-stream)
2008-04-22 04:02 UTC, Andrej Filipcic
2.6.25 dmesg (16.37 KB, application/octet-stream)
2008-04-22 04:03 UTC, Andrej Filipcic
Correct relogin logic (729 bytes, patch)
2008-04-24 14:18 UTC, Andrew Vasquez

Description Andrej Filipcic 2008-04-19 14:46:03 UTC
Latest working kernel version: 2.6.23
Earliest failing kernel version: 2.6.24
Distribution: gentoo
Hardware Environment: opteron
Software Environment: x86_64
Problem Description:

With 2.6.24 and 2.6.25 the qla2xxx module misbehaves: 4G FC
arrays (Infortrend) do not work properly, while 2G FC arrays are OK.

Detecting 4G array gives:
 sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
 sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
 sd 8:0:0:0: [sdc] Write Protect is off
 sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
 sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
 sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
 sd 8:0:0:0: [sdc] Write Protect is off
 sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
 sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
 sd 8:0:0:0: [sdc] Attached SCSI disk

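As a sanity check on the reported size, the sketch below redoes the arithmetic from the sd log lines above: 44918798336 sectors of 512 bytes, expressed in decimal megabytes, which matches the figure sd prints. It also illustrates why the kernel moves on to READ CAPACITY(16): READ CAPACITY(10) carries the last LBA in a 32-bit field, with 0xFFFFFFFF signalling a device too large to describe. The function names are illustrative only and are not part of any kernel or driver API.

```python
# Values copied from the sd log lines in the report.
SECTORS = 44918798336
SECTOR_SIZE = 512  # bytes

# READ CAPACITY(10) returns the last LBA in 32 bits; 0xFFFFFFFF is the
# "too large, use READ CAPACITY(16)" indication.
RC10_OVERFLOW_LBA = 0xFFFFFFFF


def capacity_bytes(sectors, sector_size):
    """Total device size in bytes."""
    return sectors * sector_size


def needs_read_capacity_16(sectors):
    """True if the last LBA cannot be represented by READ CAPACITY(10)."""
    last_lba = sectors - 1
    return last_lba >= RC10_OVERFLOW_LBA


size = capacity_bytes(SECTORS, SECTOR_SIZE)
size_mb = round(size / 10**6)  # decimal megabytes

print(size)                             # 22998424748032
print(size_mb)                          # 22998425
print(needs_read_capacity_16(SECTORS))  # True
```

So the "Very big device" message and the ~23 TB size are expected for this LUN and are not themselves a symptom of the failure.
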
and later on, generated by vgscan:

 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 0
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 44918798208
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 44918798320
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 0
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 8
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 0
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 44918798114
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 44918798282
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 34
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 42
 sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdc, sector 34

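To see which parts of the device the failed reads touch, the error lines above can be summarised with a small script. This is only a sketch: the regular expression is written against the end_request lines exactly as they appear in this report, and the 256-sector window is an arbitrary choice made for illustration. The Result lines are omitted from the sample for brevity.

```python
import re

# end_request lines copied from the log above (Result lines omitted).
LOG = """\
 end_request: I/O error, dev sdc, sector 0
 end_request: I/O error, dev sdc, sector 44918798208
 end_request: I/O error, dev sdc, sector 44918798320
 end_request: I/O error, dev sdc, sector 0
 end_request: I/O error, dev sdc, sector 8
 end_request: I/O error, dev sdc, sector 0
 end_request: I/O error, dev sdc, sector 44918798114
 end_request: I/O error, dev sdc, sector 44918798282
 end_request: I/O error, dev sdc, sector 34
 end_request: I/O error, dev sdc, sector 42
 end_request: I/O error, dev sdc, sector 34
"""

TOTAL_SECTORS = 44918798336  # from the sd capacity line
ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(log):
    """Map each device name to the sorted list of distinct failing sectors."""
    found = {}
    for dev, sector in ERROR_RE.findall(log):
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


def near_edges(sectors, total, window=256):
    """True if every sector lies within `window` sectors of either end."""
    return all(s < window or total - s <= window for s in sectors)


sectors = failed_sectors(LOG)["sdc"]
print(sectors)
print(near_edges(sectors, TOTAL_SECTORS))
```

Every failing sector in this sample lies within 256 sectors of the start or the end of the device. That would be consistent with a tool probing for on-disk signatures rather than with damage at one location, though the report itself does not confirm that interpretation.
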

2.6.23 (gentoo-sources) works fine. I am not sure whether this is a regression
in the driver or something else. The controller is
QLogic QLE220 - PCI-Express to 4Gb FC, Single Channel
ISP5432: PCIe (2.5Gb/s x4)
and the firmware version is 4.02.02 (from sys-block/qla-fc-firmware-20071207).
There is a newer firmware (4.03.02) on the QLogic site, although I doubt the
firmware is the cause of the problem.

To be honest, I have not checked the 2.6.25 release but rather rc8; however, there have been no qla2xxx-related patches since then.


Steps to reproduce: boot 2.6.24 or 2.6.25, run vgscan
Comment 1 Daniel Drake 2008-04-19 15:04:35 UTC
This is a 2.6.24 regression

Downstream bug:
https://bugs.gentoo.org/show_bug.cgi?id=214883
Comment 2 Anonymous Emailer 2008-04-20 15:21:22 UTC
Reply-To: akpm@linux-foundation.org

> On Sat, 19 Apr 2008 14:46:04 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10486
> 
>            Summary: kernel 2.6.25  4G FC arrays don't work properly with
>                     qla2xxx driver
>            Product: SCSI Drivers
>            Version: 2.5
>      KernelVersion: 2.6.25
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: blocking
>           Priority: P1
>          Component: QLOGIC QLA2XXX
>         AssignedTo: scsi_drivers-qla2xxx@kernel-bugs.osdl.org
>         ReportedBy: andrej.filipcic@ijs.si
> 
Comment 3 Andrew Vasquez 2008-04-21 11:35:28 UTC
Could you provide some details on the configuration: size of the LUNs in
question, storage type?  Are you stating that the report-luns information
is now incorrect with your 4Gb storage?

Detecting 4G array gives:
 sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
 sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
 sd 8:0:0:0: [sdc] Write Protect is off
 sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
 sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
 sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
 sd 8:0:0:0: [sdc] Write Protect is off
 sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
 sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
 sd 8:0:0:0: [sdc] Attached SCSI disk

So you don't have a 22TB storage device?  I'm not entirely clear on
what the problem is here...  BTW: the driver is simply a command pass-through;
LUN discovery is handled by the midlayer.

Can you provide full logs, from driver load to failure?  Also, what is 'vgscan'?
Comment 4 Andrej Filipcic 2008-04-21 12:02:13 UTC
The LUN information is correct. The LUNs are ~23TB in size, and the Fibre Channel connection speed is 4Gb/s. But as soon as vgscan is run, there are read errors from the device (e.g. /dev/sdc); no operation on the device file succeeds. vgscan just reads /dev/sdc to scan for possible LVM groups, since they are easier to handle in fstab.

In short, all external disks are detected and reported correctly. Device files of disks with a 2Gb/s connection can be read, while device files with a 4Gb/s connection are unusable (read errors from /dev/sdc).
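Since any read of the device fails, vgscan is not strictly needed to reproduce the problem. The following is a minimal sketch of a single-read probe; probe_read is a hypothetical helper name, /dev/sdc is the device path from this report, and reading a block device normally requires root.

```python
import os


def probe_read(path, offset=0, length=512):
    """Attempt one read of `length` bytes at `offset`.

    Returns (True, None) on success, or (False, reason) if the open or
    the read fails or returns fewer bytes than requested.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return False, "open failed: %s" % exc.strerror
    try:
        data = os.pread(fd, length, offset)
    except OSError as exc:
        # An I/O error from the block layer surfaces here as EIO.
        return False, "read failed: %s" % exc.strerror
    finally:
        os.close(fd)
    if len(data) != length:
        return False, "short read: %d bytes" % len(data)
    return True, None
```

Calling probe_read("/dev/sdc") on an affected kernel would be expected to return a read failure, and (True, None) on 2.6.23, assuming the behaviour described above.
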
Comment 5 Andrew Vasquez 2008-04-21 12:47:56 UTC
Could you send the resultant kernel messages (from driver load to failure)
with the driver loaded with the ql2xextended_error_logging module parameter
set to 1?

   $ modprobe qla2xxx ql2xextended_error_logging=1
Comment 6 Andrej Filipcic 2008-04-21 13:15:44 UTC
It will take a couple of days. It is a production server and I have to find a free slot.
Comment 7 Andrew Vasquez 2008-04-21 18:35:26 UTC
Ok, so in the interim can you provide some datapoints from the 
FC transport tree:

  $ cat /sys/class/fc_host/host*/*
  $ cat /sys/class/fc_remote_ports/rport-*/*

At least then we could take a look and see if there's some issue
on the FC side (a small glimpse at least).

BTW:  are there any other 'failure' messages in your logs during
driver load time?  Can you provide the logs without error-logging
enabled?
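The output of a plain cat does not show which value belongs to which attribute, so a labelled dump may be easier to read. The sketch below walks the two sysfs class directories named above and prints each attribute with its name. It makes no assumption about which attributes exist; dump_attrs is a hypothetical helper, and unreadable or write-only attributes are skipped.

```python
import os


def dump_attrs(class_dir):
    """Return {entry: {attribute: value}} for each entry under class_dir."""
    result = {}
    if not os.path.isdir(class_dir):
        return result
    for entry in sorted(os.listdir(class_dir)):
        entry_path = os.path.join(class_dir, entry)
        if not os.path.isdir(entry_path):  # follows sysfs symlinks
            continue
        attrs = {}
        for name in sorted(os.listdir(entry_path)):
            attr_path = os.path.join(entry_path, name)
            if not os.path.isfile(attr_path):
                continue  # skip subdirectories and links to them
            try:
                with open(attr_path, "r", errors="replace") as fh:
                    attrs[name] = fh.read().strip()
            except OSError:
                continue  # write-only or otherwise unreadable attribute
        result[entry] = attrs
    return result


for class_dir in ("/sys/class/fc_host", "/sys/class/fc_remote_ports"):
    for entry, attrs in dump_attrs(class_dir).items():
        for name, value in attrs.items():
            print("%s/%s: %s" % (entry, name, value))
```

On a machine without FC hardware the script simply prints nothing.
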
Comment 8 Andrej Filipcic 2008-04-22 04:02:51 UTC
Created attachment 15838
2.6.23 dmesg
Comment 9 Andrej Filipcic 2008-04-22 04:03:31 UTC
Created attachment 15839
2.6.25 dmesg
Comment 10 Andrej Filipcic 2008-04-22 04:08:26 UTC
Above, there are 2 files containing dmesg logs after loading qla2xxx with extended logging enabled. The 4G FC disks, which do not work, are all A24F-G2430 (4x 23TB + 1x 50TB). All the other disks work properly with 2.6.25. Of course, all of them work with 2.6.23.

The "Buffer I/O error on device sdc, logical block 0" message in 2.6.25 is generated by "fdisk /dev/sdc".
Comment 11 Andrej Filipcic 2008-04-23 01:03:08 UTC
FYI, I have tested the 2.6.25 kernel with the qla2xxx driver copied from 2.6.23 (version 8.02.00-k3) and it works perfectly.
Comment 12 Andrew Vasquez 2008-04-24 14:18:11 UTC
Created attachment 15901
Correct relogin logic

Ok, I've reproduced this issue locally.  Could you try the
attached patch within your configuration?  I'll queue this
fix along with my latest batch of updates to linux-scsi. 

Unfortunately, as it turns out, this particular issue was
resolved some time back in the QLogic standard driver but
was missed for upstream inclusion (argg..).
Comment 13 Andrew Vasquez 2008-04-24 14:31:04 UTC
BTW: you can discard the bits of the patch which modify the top-level
Makefile (residual cleanup of an earlier merge).

The important bits are in drivers/scsi/qla2xxx/qla_os.c.
Comment 14 Andrej Filipcic 2008-04-24 14:39:27 UTC
The patch is fine. Everything works now. I guess it should be scheduled for the next 2.6.24 and 2.6.25 patch releases.

Thanks
Comment 15 Andrew Vasquez 2008-04-24 15:40:24 UTC
Patch submitted to stable@kernel.org and linux-scsi for
upstream inclusion:

http://article.gmane.org/gmane.linux.scsi/41175
Comment 16 Andrew Vasquez 2008-04-29 14:08:58 UTC
Can this bugzilla be closed out now that the patch is upstream and
pending in stable?

Side note:  Andrej, the HBA you have (QLE220) has a hard limit
on the total number of concurrent logins (8 target ports [PLOGI/PRLI]).
In many of your logs, more than 8 ports are discovered during
SNS scans, and the firmware will fail PLOGI requests once the upper
limit is reached. Fortunately your storage ports appear to be returned
early in the list, but don't depend on this ordering:

   scsi(9): Trying Fabric Login w/loop id 0x0087 for port 020100.
   qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31)  ioparam=1c/20100.
   scsi(9): Retrying 15 login again loop_id 0x87
   scsi(9): Trying Fabric Login w/loop id 0x0088 for port 020200.
   qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31)  ioparam=1c/20200.
   scsi(9): Retrying 15 login again loop_id 0x88
   scsi(9): Trying Fabric Login w/loop id 0x008a for port 020400.
   qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31)  ioparam=1c/20400.
   scsi(9): Retrying 15 login again loop_id 0x8a

Please keep this in mind when connecting more target-ports
to this HBA.
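To spot this condition in a log, the failed fabric logins can be extracted with a short script. This is a sketch under stated assumptions: the regular expression is written against the three message formats quoted above only, failed_logins and exceeds_login_limit are hypothetical helper names, and the limit of 8 is the figure given in this comment, not a value read from the driver.

```python
import re

# Log excerpt copied from the comment above.
LOG = """\
   scsi(9): Trying Fabric Login w/loop id 0x0087 for port 020100.
   qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31)  ioparam=1c/20100.
   scsi(9): Retrying 15 login again loop_id 0x87
   scsi(9): Trying Fabric Login w/loop id 0x0088 for port 020200.
   qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31)  ioparam=1c/20200.
   scsi(9): Retrying 15 login again loop_id 0x88
   scsi(9): Trying Fabric Login w/loop id 0x008a for port 020400.
   qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31)  ioparam=1c/20400.
   scsi(9): Retrying 15 login again loop_id 0x8a
"""

QLE220_MAX_LOGINS = 8  # concurrent target-port logins, per the comment above

FAIL_RE = re.compile(
    r"qla24xx_login_fabric\(\d+\): failed to complete IOCB -- "
    r"completion status \(\d+\)\s+ioparam=[0-9a-f]+/([0-9a-f]+)\."
)


def failed_logins(log):
    """Sorted, distinct port IDs (6 hex digits) whose fabric login failed."""
    ports = {"%06x" % int(port, 16) for port in FAIL_RE.findall(log)}
    return sorted(ports)


def exceeds_login_limit(port_count, limit=QLE220_MAX_LOGINS):
    """True if more ports were discovered than the HBA can log in to."""
    return port_count > limit


print(failed_logins(LOG))
```

The port ID in the failure line lacks its leading zero (20100 for port 020100), so the sketch normalises it to six hex digits before reporting.
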
Comment 17 Andrej Filipcic 2008-04-29 14:12:07 UTC
Yes, I guess it can be closed. Thanks for the hard-limit info.
