Most recent kernel where this bug did not occur: none
Distribution: AS4
Hardware Environment: SAN shared storage, HBA card driven by qla2300, CPU 2*Xeon 2GHz, mem 2GB
Software Environment:
Problem Description: The system boot log was:
Creating /dev Sqla2300 0000:07:01.0: Configure NVRAM parameters... tarting udev Loading sd_mod.ko module Loading scsi_transport_fc.ko module Loading qla2xxx.ko qla2300 0000:07:01.0: Verifying loaded RISC code... module Loading qla2300.ko modulqla2300 0000:07:01.0: Waiting for LIP to complete... e qla2300 0000:07:01.0: LIP reset occured (f8f7). qla2300 0000:07:01.0: LIP occured (f8f7). qla2300 0000:07:01.0: LOOP UP detected (2 Gbps). qla2300 0000:07:01.0: Topology - (Loop), Host Loop address 0x0 scsi0 : qla2xxx qla2300 0000:07:01.0: QLogic Fibre Channel HBA Driver: 8.01.03-k QLogic QLA2342 - 133MHz PCI-X to 2Gb FC, Dual Channel ISP2312: PCI-X (133 MHz) @ 0000:07:01.0 hdma-, host#=0, fw=3.03.18 IPX ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 (level, low) -> IRQ 177 qla2300 0000:07:01.1: Found an ISP2312, irq 177, iobase 0xf881a000 qla2300 0000:07:01.1: Configuring PCI space... qla2300 0000:07:01.1: Configure NVRAM parameters... qla2300 0000:07:01.1: Verifying loaded RISC code... qla2300 0000:07:01.1: Waiting for LIP to complete... qla2300 0000:07:01.1: LIP reset occured (f8f7). qla2300 0000:07:01.1: LIP occured (f8f7). 
Vendor: TOYOU Model: NetStor DA9220F Rev: 342R Type: Direct-Access ANSI SCSI revision: 03 SCSI device sda: 999950336 512-byte hdwr sectors (511975 MB) SCSI device sda: drive cache: write back qla2300 0000:07:01.1: Topology - (Loop), Host Loop address 0x0 scsi1 : qla2xxx qla2300 0000:07:01.1: QLogic Fibre Channel HBA Driver: 8.01.03-k QLogic QLA2342 - 133MHz PCI-X to 2Gb FC, Dual Channel ISP2312: PCI-X (133 MHz) @ 0000:07:01.1 hdma-, host#=1, fw=3.03.18 IPX Loading dm-mod.kdevice-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm- devel@redhat.com o module sd 0:0:0:0: Attached scsi generic sg0 type 0 Creating root device Vendor: TOYOU Model: NetStor DA9220F Rev: 342R Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdb: 999950336 512-byte hdwr sectors (511975 MB) Mounting root fiSCSI device sdb: drive cache: write back lesystem SCSI device sdb: 999950336 512-byte hdwr sectors (511975 MB) kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. 
Switching to newSCSI device sdb: drive cache: write back root sdb: sd 1:0:0:0: Attached scsi disk sdb sd 1:0:0:0: Attached scsi generic sg1 type 0 unmounting old /proc unmounting old /sys Unable to handle kernel NULL pointer dereference at virtual address 0000002c printing eip: f885542a *pde = 003b1001 Oops: 0000 [#1] SMP Modules linked in: dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod CPU: 1 EIP is at qla2x00_queuecommand+0x72/0x122 [qla2xxx] eax: 00000000 ebx: c2331380 ecx: c2015fe0 edx: 00000002 esi: f7dd82a8 edi: 00000000 ebp: c021b96f esp: f79a3d10 ds: 007b es: 007b ss: 0068 Process scsi_wq_1 (pid: 835, threadinfo=f79a2000 task=f7fe1030) Stack: 00000287 f7dd8000 c2331380 00000000 c021b7ba c2331380 c021b96f c233939c c233939c f79adc00 f7dd8000 c2309044 c0220c02 c2331380 c2331380 c2331380 c2309044 00000000 c2339418 c233939c c01b6c83 c2309044 f79a3de4 c01b77de Call Trace: [<c021b7ba>] scsi_dispatch_cmd+0x1be/0x23c [<c021b96f>] scsi_done+0x0/0x1c [<c0220c02>] scsi_request_fn+0x25f/0x2d6 [<c01b6c83>] generic_unplug_device+0x16/0x23 [<c01b77de>] blk_execute_rq+0x80/0xa7 [<c01b7997>] blk_end_sync_rq+0x0/0x22 [<c01b872a>] blk_rq_bio_prep+0x63/0x81 [<c021fb3a>] scsi_execute+0xb2/0xcd [<c021fbab>] scsi_execute_req+0x56/0x78 [<c022237c>] scsi_report_lun_scan+0x1a2/0x3af [<c01bd53f>] kobject_put+0x16/0x19 [<c01bd51f>] kobject_release+0x0/0xa [<c0222073>] scsi_probe_and_add_lun+0x1e3/0x1ec [<c022276c>] __scsi_scan_target+0xaf/0xe4 [<c02227f2>] scsi_scan_target+0x51/0x61 [<f882e669>] fc_scsi_scan_rport+0x1f/0x26 [scsi_transport_fc] [<c0127c2a>] worker_thread+0x170/0x1de [<f882e64a>] fc_scsi_scan_rport+0x0/0x26 [scsi_transport_fc] [<c0116f3d>] default_wake_function+0x0/0x12 [<c0116f3d>] default_wake_function+0x0/0x12 [<c0127aba>] worker_thread+0x0/0x1de [<c012b1b7>] kthread+0x7c/0xa6 [<c012b13b>] kthread+0x0/0xa6 [<c0101ab5>] kernel_thread_helper+0x5/0xb Code: 83 ea 40 8b 52 24 31 c0 83 fa 02 74 0f 83 fa 04 b8 00 00 02 00 74 05 b8 00 00 01 00 85 c0 74 0b 89 83 3c 01 00 00 
e9 a INIT:

Steps to reproduce: Boot the system with the SMP kernel. The oops occurs on some boots but not on all of them.
Do older kernels (e.g. 2.6.14.x) work?
I have not encountered the same problem under kernel 2.6.14.2, but when I run "rmmod qla2300;modprobe qla2300" under 2.6.14.2, the kernel sometimes oopses too. Moreover, dm-multipath over qla2300 fails to fail back to a restored path. I've posted that bug at: http://bugzilla.kernel.org/show_bug.cgi?id=5775
It seems that if I reboot two hosts that share the SAN storage through their HBA cards at the same time, one of them fails to boot with the oops above. So I suspect that the kernel oopses during boot when some event happens on the FC loop at the same time.
Could you try to run with the following posted patch: http://marc.theaimsgroup.com/?l=linux-scsi&m=113630956721415&w=2
I have another problem now. Host A and host B are connected to the same FC switch with qla2300 HBA cards. On host A, "fdisk -l" shows devices such as "/dev/sda", but if I reboot host B, the device is removed from host A. The messages are:
kernel: rport-0:0-0: blocked FC remote port time out: removing target and saving binding
kernel: rport-1:0-0: blocked FC remote port time out: removing target and saving binding
My problem is that the device does not come back automatically unless I reload the driver module "qla2300". But after I reload "qla2300" on host A, the device on host B disappears. What is causing this? Thanks for any reply.
My kernel has been patched with the patch above.
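As a hedged aside on reloading the driver: the SCSI mid-layer exposes a per-host sysfs `scan` attribute that accepts a "channel target lun" triple, with `-` meaning "all". The C sketch below only writes `- - -` to such a file to request a rescan. The helper name `request_scsi_rescan` and the example path `/sys/class/scsi_host/host0/scan` are assumptions for illustration, and whether a manual rescan recovers a target that the FC transport has already removed on this 2.6.15-era kernel is untested.

```c
#include <stdio.h>

/* Hypothetical helper: ask the SCSI mid-layer to rescan every channel,
 * target and LUN on one host by writing the wildcard triple "- - -"
 * to that host's sysfs "scan" attribute.
 * Returns 0 on success, -1 on any I/O error. */
static int request_scsi_rescan(const char *scan_path)
{
    FILE *f = fopen(scan_path, "w");
    if (f == NULL)
        return -1;

    /* Format is "channel target lun"; "-" acts as a wildcard. */
    int ok = fputs("- - -\n", f) >= 0;

    /* sysfs reports write errors at flush/close time, so check fclose. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, `request_scsi_rescan("/sys/class/scsi_host/host0/scan")` would target the first HBA port; the host number has to match the `scsiN` shown in the boot log (scsi0 and scsi1 here).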
Now I've updated my kernel to 2.6.15.git6. I configured the driver as:
<*> QLogic QLA2XXX Fibre Channel Support
[ ] Use firmware-loader modules (DEPRECATED)
but after I reboot my host into the new kernel, I cannot find the device. The logs are:
Jan 11 16:53:07 nd10 kernel: QLogic Fibre Channel HBA Driver
Jan 11 16:53:07 nd10 kernel: ACPI: PCI Interrupt 0000:07:01.0[A] -> GSI 96 (level, low) -> IRQ 169
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Found an ISP2312, irq 169, iobase 0xf8816000
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Configuring PCI space...
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Configure NVRAM parameters...
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Verifying loaded RISC code...
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Firmware image unavailable.
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Failed to initialize adapter
Jan 11 16:53:07 nd10 kernel: Trying to free free IRQ169
Jan 11 16:53:07 nd10 kernel: ACPI: PCI interrupt for device 0000:07:01.0 disabled
Jan 11 16:53:07 nd10 kernel: ACPI: PCI interrupt for device 0000:07:01.0 disabled
Jan 11 16:53:07 nd10 kernel: ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 (level, low) -> IRQ 177
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Found an ISP2312, irq 177, iobase 0xf8816000
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Configuring PCI space...
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Configure NVRAM parameters...
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Verifying loaded RISC code...
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Firmware image unavailable.
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Failed to initialize adapter
Jan 11 16:53:08 nd10 kernel: Trying to free free IRQ177
If I instead configure the driver as:
<M> QLogic QLA2XXX Fibre Channel Support
[*] Use firmware-loader modules (DEPRECATED)
<M> Build QLogic ISP2100 firmware-module
<M> Build QLogic ISP2200 firmware-module
<M> Build QLogic ISP2300 firmware-module
<M> Build QLogic ISP2322 firmware-module
<M> Build QLogic ISP63xx firmware-module
<M> Build QLogic ISP24xx firmware-module
the problem described in comment 5 still exists. However, with kernel 2.6.11.3 I have not encountered this problem. What can I do for my system?
After I run "fdisk -l", the logs are:
Jan 11 17:57:47 nd10 kernel: rport-10:0-0: blocked FC remote port time out: removing target and saving binding
Jan 11 17:57:48 nd10 kernel: 10:0:0:0: SCSI error: return code = 0x10000
Jan 11 17:57:48 nd10 kernel: end_request: I/O error, dev sda, sector 0
Jan 11 17:57:48 nd10 kernel: Buffer I/O error on device sda, logical block 0
Jan 11 17:57:48 nd10 kernel: 10:0:0:0: rejecting I/O to dead device
Jan 11 17:57:48 nd10 kernel: Buffer I/O error on device sda, logical block 0
Jan 11 17:58:23 nd10 kernel: rport-11:0-0: blocked FC remote port time out: removing target and saving binding
Jan 11 17:58:23 nd10 kernel: 11:0:0:0: SCSI error: return code = 0x10000
Jan 11 17:58:23 nd10 kernel: end_request: I/O error, dev sdb, sector 0
Jan 11 17:58:23 nd10 kernel: Buffer I/O error on device sdb, logical block 0
Jan 11 17:58:23 nd10 kernel: 11:0:0:0: rejecting I/O to dead device
Jan 11 17:58:23 nd10 kernel: Buffer I/O error on device sdb, logical block 0
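The `return code = 0x10000` in these messages is the SCSI mid-layer result word, which packs four bytes: status (bits 0-7), message (bits 8-15), host (bits 16-23) and driver (bits 24-31). The small C sketch below, using the hypothetical names `decode_scsi_result` and `host_byte_name`, shows that 0x10000 decodes to host byte 0x01, `DID_NO_CONNECT`, with every other byte zero. That fits I/O being failed because the remote port was removed, rather than the array returning a SCSI error. The numeric `DID_*` values are copied from the kernel's `include/scsi/scsi.h`; the list is not exhaustive.

```c
#include <stdint.h>

/* Host-byte codes as defined in include/scsi/scsi.h (subset). */
enum host_byte {
    DID_OK         = 0x00,
    DID_NO_CONNECT = 0x01,
    DID_BUS_BUSY   = 0x02,
    DID_TIME_OUT   = 0x03,
    DID_BAD_TARGET = 0x04,
    DID_ABORT      = 0x05,
    DID_PARITY     = 0x06,
    DID_ERROR      = 0x07
};

/* The four bytes packed into a mid-layer result word. */
struct scsi_result {
    uint8_t status; /* bits  0-7:  status byte from the target */
    uint8_t msg;    /* bits  8-15: message byte */
    uint8_t host;   /* bits 16-23: DID_* code set by the HBA driver */
    uint8_t driver; /* bits 24-31: DRIVER_* code */
};

static struct scsi_result decode_scsi_result(uint32_t result)
{
    struct scsi_result r = {
        .status = result & 0xff,
        .msg    = (result >> 8) & 0xff,
        .host   = (result >> 16) & 0xff,
        .driver = (result >> 24) & 0xff,
    };
    return r;
}

/* Map a host byte to its symbolic name; unknown codes give "other". */
static const char *host_byte_name(uint8_t host)
{
    switch (host) {
    case DID_OK:         return "DID_OK";
    case DID_NO_CONNECT: return "DID_NO_CONNECT";
    case DID_BUS_BUSY:   return "DID_BUS_BUSY";
    case DID_TIME_OUT:   return "DID_TIME_OUT";
    case DID_BAD_TARGET: return "DID_BAD_TARGET";
    case DID_ABORT:      return "DID_ABORT";
    case DID_PARITY:     return "DID_PARITY";
    case DID_ERROR:      return "DID_ERROR";
    default:             return "other";
    }
}
```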
>if I select the driver as:
><M> QLogic QLA2XXX Fibre Channel Support
> [*] Use firmware-loader modules (DEPRECATED)
> <M> Build QLogic ISP2100 firmware-module
> <M> Build QLogic ISP2200 firmware-module
> <M> Build QLogic ISP2300 firmware-module
> <M> Build QLogic ISP2322 firmware-module
> <M> Build QLogic ISP63xx firmware-module
> <M> Build QLogic ISP24xx firmware-module

The default behaviour for post-2.6.15 kernels is to load the firmware image from user space via hotplug. I've updated the Kconfig help and added a URL for retrieving the images: http://marc.theaimsgroup.com/?l=linux-scsi&m=113720091605197&w=2 With regard to the other issues you are seeing, could you apply the recently posted rollup patches: http://marc.theaimsgroup.com/?l=linux-scsi&m=113720091630672&w=2 You can download the whole set here: ftp://ftp.qlogic.com/outgoing/linux/patches/8.x/8.01.04k Thanks for your patience.
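The hotplug-based loading described above follows the firmware_class handshake: when the driver requests an image, the kernel creates a per-request directory under `/sys/class/firmware/`, and a user-space agent writes `1` to its `loading` file, copies the image into `data`, then writes `0` (or `-1` to abort when the image cannot be found). Below is a minimal C sketch of that agent side, assuming the request directory and the image path (for example `/lib/firmware/ql2300_fw.bin`) are already known; `load_firmware` and `write_attr` are hypothetical names, not part of any real tool. One corollary, offered as an assumption rather than a diagnosis: the agent can only deliver the image if the file is reachable at the moment the adapter is probed, so an image that exists only on the root filesystem may be invisible to a driver that probes earlier (built in, or loaded from an initrd).

```c
#include <stdio.h>

/* Write a short string to a sysfs attribute file. Returns 0 on success. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical agent for one firmware request:
 *   echo 1  > <devdir>/loading
 *   cat fw  > <devdir>/data
 *   echo 0  > <devdir>/loading   (or -1 to abort)
 * Returns 0 if the image was delivered, -1 otherwise. */
static int load_firmware(const char *devdir, const char *fw_path)
{
    char loading[512], data[512];
    snprintf(loading, sizeof loading, "%s/loading", devdir);
    snprintf(data, sizeof data, "%s/data", devdir);

    FILE *in = fopen(fw_path, "rb");
    if (in == NULL) {
        write_attr(loading, "-1\n"); /* tell the kernel: no such image */
        return -1;
    }
    if (write_attr(loading, "1\n") != 0) {
        fclose(in);
        return -1;
    }

    /* Copy the image into the data attribute in fixed-size chunks. */
    FILE *out = fopen(data, "wb");
    int ok = out != NULL;
    char buf[4096];
    size_t n;
    while (ok && (n = fread(buf, 1, sizeof buf, in)) > 0)
        ok = fwrite(buf, 1, n, out) == n;
    if (ok && ferror(in))
        ok = 0;
    if (out != NULL && fclose(out) != 0)
        ok = 0;
    fclose(in);

    /* Finish the request: 0 commits the image, -1 aborts it. */
    write_attr(loading, ok ? "0\n" : "-1\n");
    return ok ? 0 : -1;
}
```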
I've compiled kernel 2.6.15-git11, which should include those patches, and I've put the file ql2300_fw.bin at /lib/firmware/ql2300_fw.bin. But I encounter the same problems again. The kernel boot log is:
QLogic Fibre Channel HBA Driver
ACPI: PCI Interrupt 0000:07:01.0[A] -> GSI 96 (level, low) -> IRQ 169
qla2xxx 0000:07:01.0: Found an ISP2312, irq 169, iobase 0xf8818000
qla2xxx 0000:07:01.0: Configuring PCI space...
qla2xxx 0000:07:01.0: Configure NVRAM parameters...
qla2xxx 0000:07:01.0: Verifying loaded RISC code...
qla2xxx 0000:07:01.0: Firmware image unavailable.
qla2xxx 0000:07:01.0: Failed to initialize adapter
Trying to free free IRQ169
ACPI: PCI interrupt for device 0000:07:01.0 disabled
ACPI: PCI interrupt for device 0000:07:01.0 disabled
ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 (level, low) -> IRQ 177
After system init finishes, I run "rmmod qla2xxx;modprobe qla2xxx" and then get devices such as /dev/sda. But when I reboot host B, host A loses its devices again. The system log after I run "fdisk -l" is:
rport-6:0-0: blocked FC remote port time out: removing target and saving binding
6:0:0:0: SCSI error: return code = 0x10000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
6:0:0:0: rejecting I/O to dead device
Buffer I/O error on device sda, logical block 0
rport-7:0-0: blocked FC remote port time out: removing target and saving binding
7:0:0:0: SCSI error: return code = 0x10000
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
7:0:0:0: rejecting I/O to dead device
Buffer I/O error on device sdb, logical block 0
Is this problem still present in 2.6.22+? Thanks.
Please reopen this bug if it's still present with kernel 2.6.22.