Bug 5795
Summary: | kernel boot oops | ||
---|---|---|---|
Product: | SCSI Drivers | Reporter: | Luckey (sunjw) |
Component: | QLOGIC QLA2XXX | Assignee: | Andrew Vasquez (andrew.vasquez) |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | high | CC: | bunk, protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.15-rc7 smp, 64GB-mem, with patch "Fix Fibre Channel boot oop | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
Luckey
2005-12-30 00:29:38 UTC
Do older kernels (e.g. 2.6.14.x) work?

I have not encountered the same problem under kernel 2.6.14.2, but when I run "rmmod qla2300;modprobe qla2300" under 2.6.14.2, the kernel sometimes oopses too. What's more, dm-multipath over qla2300 fails to do path failback. I've posted that bug at: http://bugzilla.kernel.org/show_bug.cgi?id=5775

It seems that if I reboot two hosts that share the SAN storage through their HBA cards at the same time, one of them fails to boot with the result shown above. So I suspect that the kernel oopses during boot when some event happens on the FC loop at the same time.

Could you try running with the following posted patch: http://marc.theaimsgroup.com/?l=linux-scsi&m=113630956721415&w=2

I have another problem now. Host A and host B are connected to the same FC switch with qla2300 HBA cards. On A, "fdisk -l" shows devices such as "/dev/sda". But if I reboot host B, the device is removed for some reason. The messages are:

    kernel: rport-0:0-0: blocked FC remote port time out: removing target and saving binding
    kernel: rport-1:0-0: blocked FC remote port time out: removing target and saving binding

My problem is that the device does not come back automatically unless I reload the driver module "qla2300". But after I reload module "qla2300" on host A, the device on host B disappears. What's the problem? Thanks for any reply. My kernel has been patched with the patch above.

Now I've updated my kernel to 2.6.15-git6. I configured the driver as:

    <*> QLogic QLA2XXX Fibre Channel Support
    [ ] Use firmware-loader modules (DEPRECATED)

but after I reboot my host into the new kernel, I cannot find the device.
The logs are:

    Jan 11 16:53:07 nd10 kernel: QLogic Fibre Channel HBA Driver
    Jan 11 16:53:07 nd10 kernel: ACPI: PCI Interrupt 0000:07:01.0[A] -> GSI 96 (level, low) -> IRQ 169
    Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Found an ISP2312, irq 169, iobase 0xf8816000
    Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Configuring PCI space...
    Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Configure NVRAM parameters...
    Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Verifying loaded RISC code...
    Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Firmware image unavailable.
    Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Failed to initialize adapter
    Jan 11 16:53:07 nd10 kernel: Trying to free free IRQ169
    Jan 11 16:53:07 nd10 kernel: ACPI: PCI interrupt for device 0000:07:01.0 disabled
    Jan 11 16:53:07 nd10 kernel: ACPI: PCI interrupt for device 0000:07:01.0 disabled
    Jan 11 16:53:07 nd10 kernel: ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 (level, low) -> IRQ 177
    Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Found an ISP2312, irq 177, iobase 0xf8816000
    Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Configuring PCI space...
    Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Configure NVRAM parameters...
    Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Verifying loaded RISC code...
    Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Firmware image unavailable.
    Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Failed to initialize adapter
    Jan 11 16:53:08 nd10 kernel: Trying to free free IRQ177

If I configure the driver as:

    <M> QLogic QLA2XXX Fibre Channel Support
    [*] Use firmware-loader modules (DEPRECATED)
    <M> Build QLogic ISP2100 firmware-module
    <M> Build QLogic ISP2200 firmware-module
    <M> Build QLogic ISP2300 firmware-module
    <M> Build QLogic ISP2322 firmware-module
    <M> Build QLogic ISP63xx firmware-module
    <M> Build QLogic ISP24xx firmware-module

the problem described in comment 5 still occurs.
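The Kconfig list above builds one firmware module per ISP family. Without those modules, the driver instead needs an image file for each chip it finds. The minimal Python sketch below pulls the chip model out of qla2xxx "Found an ..." log lines like the ones above and maps it to an image file name. The mapping table is an assumption. It is taken from the file names used by later qla2xxx drivers. The thread itself only shows the reporter installing ql2300_fw.bin for an ISP2312. FIRMWARE_BY_ISP, firmware_for and required_images are hypothetical names.

```python
import re

# Assumed chip -> image mapping, based on file names used by later qla2xxx
# drivers; verify against the driver source of the kernel you actually run.
FIRMWARE_BY_ISP = {
    "ISP2100": "ql2100_fw.bin",
    "ISP2200": "ql2200_fw.bin",
    "ISP2300": "ql2300_fw.bin",
    "ISP2312": "ql2300_fw.bin",  # the ISP2312 shares the 2300 image
    "ISP2322": "ql2322_fw.bin",
}

# Matches e.g. "qla2xxx 0000:07:01.0: Found an ISP2312, irq 169, ..."
FOUND_RE = re.compile(r"qla2xxx (\S+): Found an (ISP\w+),")


def firmware_for(isp_name: str) -> str:
    """Return the image file name assumed to be requested for an ISP chip."""
    if isp_name.startswith("ISP24"):
        return "ql2400_fw.bin"  # assumed: all ISP24xx parts share one image
    try:
        return FIRMWARE_BY_ISP[isp_name]
    except KeyError:
        raise ValueError(f"no known firmware image for {isp_name}") from None


def required_images(log_text: str) -> dict:
    """Map each PCI address found in a qla2xxx boot log to its image file."""
    return {addr: firmware_for(isp) for addr, isp in FOUND_RE.findall(log_text)}
```

Fed the boot log above, required_images reports ql2300_fw.bin for both 0000:07:01.0 and 0000:07:01.1. The regex is deliberately loose, so it also works on lines that carry a syslog prefix ("Jan 11 ... kernel:").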
However, with kernel 2.6.11.3 I have not encountered this problem. What can I do for my system? After I run "fdisk -l", the logs are:

    Jan 11 17:57:47 nd10 kernel: rport-10:0-0: blocked FC remote port time out: removing target and saving binding
    Jan 11 17:57:48 nd10 kernel: 10:0:0:0: SCSI error: return code = 0x10000
    Jan 11 17:57:48 nd10 kernel: end_request: I/O error, dev sda, sector 0
    Jan 11 17:57:48 nd10 kernel: Buffer I/O error on device sda, logical block 0
    Jan 11 17:57:48 nd10 kernel: 10:0:0:0: rejecting I/O to dead device
    Jan 11 17:57:48 nd10 kernel: Buffer I/O error on device sda, logical block 0
    Jan 11 17:58:23 nd10 kernel: rport-11:0-0: blocked FC remote port time out: removing target and saving binding
    Jan 11 17:58:23 nd10 kernel: 11:0:0:0: SCSI error: return code = 0x10000
    Jan 11 17:58:23 nd10 kernel: end_request: I/O error, dev sdb, sector 0
    Jan 11 17:58:23 nd10 kernel: Buffer I/O error on device sdb, logical block 0
    Jan 11 17:58:23 nd10 kernel: 11:0:0:0: rejecting I/O to dead device
    Jan 11 17:58:23 nd10 kernel: Buffer I/O error on device sdb, logical block 0

> if I select the driver as:
> <M> QLogic QLA2XXX Fibre Channel Support
> [*] Use firmware-loader modules (DEPRECATED)
> <M> Build QLogic ISP2100 firmware-module
> <M> Build QLogic ISP2200 firmware-module
> <M> Build QLogic ISP2300 firmware-module
> <M> Build QLogic ISP2322 firmware-module
> <M> Build QLogic ISP63xx firmware-module
> <M> Build QLogic ISP24xx firmware-module

The default behaviour for post-2.6.15 kernels is to load the firmware image from user space via hotplug.
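Roughly, the user-space firmware path works like this:

1. The driver calls request_firmware().
2. The kernel creates a request directory under /sys/class/firmware/ containing a `loading` and a `data` file.
3. The kernel raises a hotplug event whose environment carries FIRMWARE (the image name) and DEVPATH.
4. The helper writes 1 to `loading`, copies the image into `data`, then writes 0. Writing -1 aborts the request instead.

The minimal Python sketch below models such a helper, assuming images live in /lib/firmware. load_firmware and its sysfs_root/fw_dirs parameters are hypothetical. They exist only so the sketch can run against a fake sysfs tree. On a real system this job is done by the hotplug or udev firmware helper, not a script like this.

```python
import os
import shutil


def load_firmware(env, sysfs_root="/sys", fw_dirs=("/lib/firmware",)):
    """Answer one firmware request via the sysfs 'loading'/'data' protocol.

    env is the hotplug environment: FIRMWARE names the image the driver
    asked for (e.g. ql2300_fw.bin) and DEVPATH locates the request under
    sysfs. Returns True if an image was supplied, False if aborted.
    """
    req_dir = os.path.join(sysfs_root, env["DEVPATH"].lstrip("/"))
    loading = os.path.join(req_dir, "loading")

    def set_loading(value):
        with open(loading, "w") as f:
            f.write(value)

    for fw_dir in fw_dirs:
        image = os.path.join(fw_dir, env["FIRMWARE"])
        if os.path.isfile(image):
            set_loading("1")  # begin a load (kernel discards any partial one)
            with open(image, "rb") as src, \
                    open(os.path.join(req_dir, "data"), "wb") as dst:
                shutil.copyfileobj(src, dst)  # stream the image into the kernel
            set_loading("0")  # finish: request_firmware() gets the image
            return True

    set_loading("-1")  # abort: the driver sees the request fail
    return False
```

For example, a call with FIRMWARE="ql2300_fw.bin" and the hypothetical DEVPATH "/class/firmware/0000:07:01.0" would serve an image copied to /lib/firmware. One consequence of this design: if the request is issued before any helper or the filesystem holding /lib/firmware is available, it fails. That can happen when the driver is built in, or loaded from an initrd that lacks the image. This would be consistent with the boot-time "Firmware image unavailable" lines followed by a successful rmmod/modprobe later, though the thread does not confirm it.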
I've updated the Kconfig help and added a URL from which to retrieve the images: http://marc.theaimsgroup.com/?l=linux-scsi&m=113720091605197&w=2

With regard to the other issues you are seeing, could you apply the rollup patches recently posted: http://marc.theaimsgroup.com/?l=linux-scsi&m=113720091630672&w=2

You can download the group here: ftp://ftp.qlogic.com/outgoing/linux/patches/8.x/8.01.04k

Thanks for your patience.

I've compiled kernel 2.6.15-git11, which should include those patches, and I've put the file ql2300_fw.bin at /lib/firmware/ql2300_fw.bin. But I still encounter the same problems. The kernel boot logs are:

    QLogic Fibre Channel HBA Driver
    ACPI: PCI Interrupt 0000:07:01.0[A] -> GSI 96 (level, low) -> IRQ 169
    qla2xxx 0000:07:01.0: Found an ISP2312, irq 169, iobase 0xf8818000
    qla2xxx 0000:07:01.0: Configuring PCI space...
    qla2xxx 0000:07:01.0: Configure NVRAM parameters...
    qla2xxx 0000:07:01.0: Verifying loaded RISC code...
    qla2xxx 0000:07:01.0: Firmware image unavailable.
    qla2xxx 0000:07:01.0: Failed to initialize adapter
    Trying to free free IRQ169
    ACPI: PCI interrupt for device 0000:07:01.0 disabled
    ACPI: PCI interrupt for device 0000:07:01.0 disabled
    ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 (level, low) -> IRQ 177

After system init finishes, I run "rmmod qla2xxx;modprobe qla2xxx", and then I get devices such as /dev/sda. But when I reboot the other host, B, host A loses its devices again.
The system logs after I run "fdisk -l" are:

    rport-6:0-0: blocked FC remote port time out: removing target and saving binding
    6:0:0:0: SCSI error: return code = 0x10000
    end_request: I/O error, dev sda, sector 0
    Buffer I/O error on device sda, logical block 0
    6:0:0:0: rejecting I/O to dead device
    Buffer I/O error on device sda, logical block 0
    rport-7:0-0: blocked FC remote port time out: removing target and saving binding
    7:0:0:0: SCSI error: return code = 0x10000
    end_request: I/O error, dev sdb, sector 0
    Buffer I/O error on device sdb, logical block 0
    7:0:0:0: rejecting I/O to dead device
    Buffer I/O error on device sdb, logical block 0

Is this problem still present in 2.6.22+? Thanks.

Please reopen this bug if it's still present with kernel 2.6.22.
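The recurring "blocked FC remote port time out: removing target and saving binding" message comes from the FC transport class. When the driver reports a remote port gone, the transport blocks it and starts a dev_loss_tmo timer. If the port is reported back in time, it is simply unblocked. If the timer expires, the SCSI target and its /dev/sdX devices are removed, but the port's binding is kept so a later re-report can reuse it.

The Python sketch below is a deliberately simplified state model of that behaviour. RemotePortModel, its method names and the state strings are hypothetical. The real transport tracks more states than the three shown here.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified, hypothetical states; the real FC transport distinguishes more.
ONLINE = "online"
BLOCKED = "blocked"
REMOVED = "target removed, binding saved"


@dataclass
class RemotePortModel:
    dev_loss_tmo: float  # seconds a port may stay blocked before removal
    state: str = ONLINE
    blocked_since: Optional[float] = None

    def port_lost(self, now: float) -> None:
        """Driver reports the port gone: block I/O and start the timer."""
        if self.state == ONLINE:
            self.state = BLOCKED
            self.blocked_since = now

    def expire(self, now: float) -> None:
        """Timer check: a port blocked past dev_loss_tmo loses its target.

        This transition is where the "blocked FC remote port time out"
        message would be logged.
        """
        if self.state == BLOCKED and now - self.blocked_since >= self.dev_loss_tmo:
            self.state = REMOVED
            self.blocked_since = None

    def port_returned(self, now: float) -> str:
        """Driver reports the port back; returns what the transport does."""
        self.expire(now)
        if self.state == BLOCKED:
            self.state = ONLINE
            self.blocked_since = None
            return "unblocked: queued I/O resumes"
        if self.state == REMOVED:
            self.state = ONLINE
            return "binding reused: target rescanned"
        return "no-op: port was already online"
```

In this model, devices come back after a timeout only when the driver reports the port again. The reporter's need to reload qla2300/qla2xxx is consistent with that report never arriving, though the thread ends without a confirmed cause.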