Latest working kernel version: 2.6.23
Earliest failing kernel version: 2.6.24
Distribution: gentoo
Hardware Environment: opteron
Software Environment: x86_64
Problem Description:

With 2.6.24 and 2.6.25 there are problems with the qla2xxx module. 4G FC arrays (Infortrend) do not work properly, while 2G FC arrays are OK.

Detecting the 4G array gives:

sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
sd 8:0:0:0: [sdc] Write Protect is off
sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
sd 8:0:0:0: [sdc] Write Protect is off
sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdc: sdc1
sd 8:0:0:0: [sdc] Attached SCSI disk

and later on, generated by vgscan:

sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 0
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 44918798208
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 44918798320
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 0
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 8
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 0
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 44918798114
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 44918798282
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 34
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 42
sd 8:0:0:0: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
end_request: I/O error, dev sdc, sector 34

2.6.23 (gentoo-sources) works fine. I am not sure whether this is a regression in the driver or something else.

The controller is a QLogic QLE220 (PCI-Express to 4Gb FC, single channel; ISP5432: PCIe (2.5Gb/s x4)), and the firmware version is 4.02.02 (from sys-block/qla-fc-firmware-20071207). A newer firmware (4.03.02) is available on the QLogic site, although I doubt the firmware is the cause of the problems.

To be honest, I have not checked the 2.6.25 release but rather rc8; however, there have been no qla2xxx-related patches since then.

Steps to reproduce: boot 2.6.24 or 2.6.25, run vgscan
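The numbers in the sd log lines can be cross-checked with a little arithmetic. A minimal sketch, assuming a shell with 64-bit arithmetic (bash or dash on x86_64): it converts the sector count to decimal megabytes (rounding to nearest reproduces the logged figure; the kernel's own rounding code is not reproduced here), and checks why the sd driver fell back to READ CAPACITY(16): READ CAPACITY(10) returns the last LBA in a 32-bit field and reports 0xFFFFFFFF when the real value does not fit. The function names are hypothetical.

```sh
#!/bin/sh
# Sketch: re-derive the capacity figures from the report's sd log lines.
# Assumes 64-bit shell arithmetic (true for bash and dash on x86_64).

# Decimal megabytes (10^6 bytes), rounded to nearest, for N 512-byte sectors.
capacity_mb() {
    echo $(( ($1 * 512 + 500000) / 1000000 ))
}

# READ CAPACITY(10) can only report a last LBA up to 0xFFFFFFFE; anything
# larger comes back as 0xFFFFFFFF and forces READ CAPACITY(16).
# Prints "yes" if a device with N sectors needs the 16-byte command.
needs_rc16() {
    if [ $(( $1 - 1 >= 4294967295 )) -eq 1 ]; then echo yes; else echo no; fi
}

sectors=44918798336                           # "44918798336 512-byte hardware sectors"
echo "size: $(capacity_mb "$sectors") MB"     # prints "size: 22998425 MB"
echo "READ CAPACITY(16) needed: $(needs_rc16 "$sectors")"   # prints "... yes"

# How far two of the failing sectors are from the end of the device
for s in 44918798208 44918798320; do
    echo "sector $s is $(( sectors - s )) sectors before the end"   # 128, then 16
done
```

The failing reads land at both ends of the device (sectors 0 and 8 as well as 128 and 16 sectors before the end), which suggests the errors are not limited to LBAs above 2^32.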
This is a 2.6.24 regression. Downstream bug: https://bugs.gentoo.org/show_bug.cgi?id=214883
Reply-To: akpm@linux-foundation.org

> On Sat, 19 Apr 2008 14:46:04 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10486
>
> Summary: kernel 2.6.25 4G FC arrays don't work properly with qla2xxx driver
> Product: SCSI Drivers
> Version: 2.5
> KernelVersion: 2.6.25
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: blocking
> Priority: P1
> Component: QLOGIC QLA2XXX
> AssignedTo: scsi_drivers-qla2xxx@kernel-bugs.osdl.org
> ReportedBy: andrej.filipcic@ijs.si
Could you provide some details on the configuration? What size are the LUNs in question, and what type of storage is it? Are you stating that the REPORT LUNS information is now incorrect with your 4Gb storage?

> Detecting 4G array gives:
> sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
> sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
> sd 8:0:0:0: [sdc] Write Protect is off
> sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
> sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 8:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
> sd 8:0:0:0: [sdc] 44918798336 512-byte hardware sectors (22998425 MB)
> sd 8:0:0:0: [sdc] Write Protect is off
> sd 8:0:0:0: [sdc] Mode Sense: 8f 00 00 08
> sd 8:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sdc: sdc1
> sd 8:0:0:0: [sdc] Attached SCSI disk

So you don't have a 22TB storage device? I'm not entirely clear on what the problem is here...

BTW: the driver is simply a command passthrough; LUN discovery is handled by the midlayer...

Can you provide full logs, from driver load to failure? Also, what is 'vgscan'?
The LUN information is correct. The LUNs are ~23TB in size, and the Fibre Channel connection speed is 4Gb/s. But as soon as vgscan is run, there are read errors from the device (e.g. /dev/sdc); no operation on the device file succeeds. vgscan just reads /dev/sdc to scan for possible LVM volume groups; we use LVM because the volumes are easier to handle in fstab.

In short, all external disks are detected and reported correctly. The device files of disks with a 2Gb/s connection can be used (read), while the device files of disks with a 4Gb/s connection are unusable (read errors from /dev/sdc).
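Since vgscan only needs to read the start of each device, the failure can be reproduced by hand without LVM. A minimal sketch, assuming the LVM2 on-disk label (magic string "LABELONE") sits in one of the first four 512-byte sectors, which is where LVM2 looks for it; the helper names are hypothetical.

```sh
#!/bin/sh
# Sketch: approximate by hand the first read vgscan makes on a device.
# Assumption: the LVM2 label magic "LABELONE" is in one of the first
# four 512-byte sectors.

# Exit status 0 if the first four sectors can be read at all.
# dd exits non-zero on a read error such as the end_request I/O errors above.
can_read_start() {
    dd if="$1" of=/dev/null bs=512 count=4 2>/dev/null
}

# Exit status 0 if the LVM2 label magic appears in the first four sectors.
# NUL bytes are stripped so that line-based grep implementations can match.
has_lvm_label() {
    dd if="$1" bs=512 count=4 2>/dev/null | tr -d '\000' | grep -q LABELONE
}
```

For example, `can_read_start /dev/sdc || echo "read failed"` run as root should hit the same errors as vgscan on an affected kernel, while a working 2G device passes both checks if it carries an LVM label.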
Could you send the resulting kernel messages (from driver load to failure) with the driver loaded with the ql2xextended_error_logging module parameter set to 1?

$ modprobe qla2xxx ql2xextended_error_logging=1
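The request above amounts to a short sequence: reload the driver with the parameter, reproduce the failure, and save the kernel log. A sketch under stated assumptions: it needs root, the unload only succeeds if nothing on the HBA's LUNs is in use, the 30-second pause is an arbitrary guess, and the helper names and output file name are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch: reload qla2xxx with extended error logging and capture the kernel
# log from driver load through the vgscan failure.
# Needs root; "modprobe -r" fails if the HBA's LUNs are still in use.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

capture_qla_log() {
    out=${1:-qla2xxx-extended.log}                     # hypothetical default file name
    run modprobe -r qla2xxx                            # unload the current driver
    run modprobe qla2xxx ql2xextended_error_logging=1  # reload with verbose logging
    run sleep 30        # arbitrary pause for fabric login and LUN scan (assumption)
    run vgscan          # the reproducer from the report
    if [ -n "${DRY_RUN:-}" ]; then
        echo "dmesg > $out"
    else
        dmesg > "$out"  # may also contain older messages from before the reload
    fi
}
```

Calling `capture_qla_log /tmp/qla.log` with DRY_RUN=1 set shows the sequence for review; unset DRY_RUN to execute it during a maintenance window.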
It will take a couple of days. It is a production server and I have to find a free time slot.
OK, so in the interim, can you provide some data points from the FC transport tree:

$ cat /sys/class/fc_host/host*/*
$ cat /sys/class/fc_remote_ports/rport-*/*

At least then we could take a look and see if there's some issue on the FC side (a small glimpse, at least).

BTW: are there any other 'failure' messages in your logs during driver load? Can you provide the logs without error logging enabled?
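Plain `cat` over these directories prints bare values with no attribute names, which makes the output hard to read back. A minimal sketch that prints each attribute as "name: value" for every FC host and remote port; the helper name is hypothetical, and attributes that cannot be read are skipped or print an empty value.

```sh
#!/bin/sh
# Sketch: like "cat /sys/class/fc_host/host*/*", but labels each value with
# its attribute name.

# Print "name: value" for every readable regular file directly under $1.
dump_attrs() {
    for f in "$1"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue   # skip subdirectories such as power/
        printf '%s: %s\n' "${f##*/}" "$(cat "$f" 2>/dev/null)"
    done
}

for d in /sys/class/fc_host/host* /sys/class/fc_remote_ports/rport-*; do
    [ -d "$d" ] || continue    # glob did not match: no FC hosts or remote ports
    echo "== $d"
    dump_attrs "$d"
done
```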
Created attachment 15838: 2.6.23 dmesg
Created attachment 15839: 2.6.25 dmesg
The two attachments above contain dmesg logs after loading qla2xxx with extended error logging enabled. The 4G FC disks that do not work are all A24F-G2430 (4x 23TB + 1x 50TB). All the other disks work properly with 2.6.25, and of course all of them work with 2.6.23. The "Buffer I/O error on device sdc, logical block 0" message in the 2.6.25 log is generated by "fdisk /dev/sdc".
FYI, I have tested the 2.6.25 kernel with the qla2xxx driver (version 8.02.00-k3) copied from 2.6.23, and it works perfectly.
Created attachment 15901: Correct relogin logic

OK, I've reproduced this issue locally. Could you try the attached patch within your configuration? I'll queue this fix along with my latest batch of updates to linux-scsi. Unfortunately, as it turns out, this particular issue was resolved some time back in the QLogic standard driver but was missed for upstream inclusion (argg..).
BTW: you can discard the bits of the patch which modify the top-level Makefile (residual cleanup of an earlier merge). The important bits are in drivers/scsi/qla2xxx/qla_os.c.
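One way to drop the Makefile part is to filter the patch by path before applying it. A minimal awk sketch, assuming a patch whose per-file sections each start with a `diff ` header line (as git-generated patches do); the function name keep_paths is hypothetical.

```sh
#!/bin/sh
# Sketch: keep only the per-file sections of a unified diff whose "diff "
# header line contains the given path substring. Reads the patch on stdin.
# Assumes every file section starts with a "diff ..." header line.
keep_paths() {
    awk -v want="$1" '
        /^diff / { keep = (index($0, want) > 0) }   # new file section: decide once
        keep                                        # print lines of kept sections
    '
}
```

For example, `keep_paths drivers/scsi/qla2xxx/ < relogin.patch | patch -p1` (patch file name hypothetical). If the tree is managed with git, `git apply --include='drivers/scsi/qla2xxx/*'` does the same filtering.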
The patch is fine; everything works now. I guess it should be scheduled for the next 2.6.24 and 2.6.25 patch releases. Thanks.
Patch submitted to stable@kernel.org and linux-scsi for upstream inclusion: http://article.gmane.org/gmane.linux.scsi/41175
Can this bugzilla be closed out now that the patch is pending upstream and in stable?

Side note: Andrej, the HBA you have (QLE220) has a hard limit on the total number of concurrent logins (8 target ports [PLOGI/PRLI]). In many of your logs, more than 8 ports are discovered during SNS scans (fortunately your storage ports appear to be returned early in the list, but don't depend on this ordering), and the firmware will fail PLOGI requests once the upper limit is reached:

scsi(9): Trying Fabric Login w/loop id 0x0087 for port 020100.
qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31) ioparam=1c/20100.
scsi(9): Retrying 15 login again loop_id 0x87
scsi(9): Trying Fabric Login w/loop id 0x0088 for port 020200.
qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31) ioparam=1c/20200.
scsi(9): Retrying 15 login again loop_id 0x88
scsi(9): Trying Fabric Login w/loop id 0x008a for port 020400.
qla24xx_login_fabric(9): failed to complete IOCB -- completion status (31) ioparam=1c/20400.
scsi(9): Retrying 15 login again loop_id 0x8a

Please keep this in mind when connecting more target ports to this HBA.
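To check how close a fabric is to the 8-port login limit, the extended-logging messages can be summarized. A minimal sketch that reads a kernel log on stdin; the patterns are taken only from the lines quoted above, and the assumption that the hex value after the slash in "ioparam=1c/20100" is the refused port ID rests on those lines alone (port 020100 pairs with ioparam=1c/20100). The helper names are hypothetical.

```sh
#!/bin/sh
# Sketch: summarize fabric logins from a qla2xxx log with
# ql2xextended_error_logging=1. Both helpers read the log on stdin.
# Per the comment above, the QLE220 allows 8 concurrent target-port logins.

# List distinct port IDs whose fabric login the firmware refused.
# Assumption: the value after "ioparam=XX/" is the port ID, as in the quoted log.
failed_login_ports() {
    sed -n 's|.*login_fabric.*failed to complete IOCB.*ioparam=[0-9a-fA-F]*/\([0-9a-fA-F]*\).*|\1|p' | sort -u
}

# Count distinct port IDs the driver attempted to log in to.
attempted_login_ports() {
    sed -n 's|.*Trying Fabric Login .* for port \([0-9a-fA-F]*\).*|\1|p' | sort -u | wc -l | tr -d ' '
}
```

For example, `dmesg | attempted_login_ports` gives a count to compare against the limit of 8, and `dmesg | failed_login_ports` lists the ports that were turned away.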
Yes, I guess it can be closed. Thanks for the info on the hard limit.