Bug 5795 - kernel boot oops
Summary: kernel boot oops
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: QLOGIC QLA2XXX
Hardware: i386 Linux
Importance: P2 high
Assignee: Andrew Vasquez
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-30 00:29 UTC by Luckey
Modified: 2007-09-16 13:24 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.15-rc7 smp, 64GB-mem, with patch "Fix Fibre Channel boot oop
Subsystem:
Regression: ---
Bisected commit-id:


Description Luckey 2005-12-30 00:29:38 UTC
Most recent kernel where this bug did not occur:
none
Distribution:
AS4
Hardware Environment: 
SAN shared storage, HBA card driven by qla2300, CPU 2*Xeon 2GHz, mem 2GB

Software Environment:
Problem Description:
The system boot log was:
Creating /dev
Sqla2300 0000:07:01.0: Configure NVRAM parameters...
tarting udev
Loading sd_mod.ko module
Loading scsi_transport_fc.ko module
Loading qla2xxx.ko qla2300 0000:07:01.0: Verifying loaded RISC code...
module
Loading qla2300.ko modulqla2300 0000:07:01.0: Waiting for LIP to complete...
e
qla2300 0000:07:01.0: LIP reset occured (f8f7).
qla2300 0000:07:01.0: LIP occured (f8f7).
qla2300 0000:07:01.0: LOOP UP detected (2 Gbps).
qla2300 0000:07:01.0: Topology - (Loop), Host Loop address 0x0
scsi0 : qla2xxx
qla2300 0000:07:01.0:
 QLogic Fibre Channel HBA Driver: 8.01.03-k
  QLogic QLA2342 - 133MHz PCI-X to 2Gb FC, Dual Channel
  ISP2312: PCI-X (133 MHz) @ 0000:07:01.0 hdma-, host#=0, fw=3.03.18 IPX
ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 (level, low) -> IRQ 177
qla2300 0000:07:01.1: Found an ISP2312, irq 177, iobase 0xf881a000
qla2300 0000:07:01.1: Configuring PCI space...
qla2300 0000:07:01.1: Configure NVRAM parameters...
qla2300 0000:07:01.1: Verifying loaded RISC code...
qla2300 0000:07:01.1: Waiting for LIP to complete...
qla2300 0000:07:01.1: LIP reset occured (f8f7).
qla2300 0000:07:01.1: LIP occured (f8f7).
  Vendor: TOYOU     Model: NetStor DA9220F   Rev: 342R
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 999950336 512-byte hdwr sectors (511975 MB)
SCSI device sda: drive cache: write back


qla2300 0000:07:01.1: Topology - (Loop), Host Loop address 0x0
scsi1 : qla2xxx
qla2300 0000:07:01.1:
 QLogic Fibre Channel HBA Driver: 8.01.03-k
  QLogic QLA2342 - 133MHz PCI-X to 2Gb FC, Dual Channel
  ISP2312: PCI-X (133 MHz) @ 0000:07:01.1 hdma-, host#=1, fw=3.03.18 IPX
Loading dm-mod.kdevice-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-
devel@redhat.com
o module
sd 0:0:0:0: Attached scsi generic sg0 type 0
Creating root device
  Vendor: TOYOU     Model: NetStor DA9220F   Rev: 342R
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sdb: 999950336 512-byte hdwr sectors (511975 MB)
Mounting root fiSCSI device sdb: drive cache: write back
lesystem
SCSI device sdb: 999950336 512-byte hdwr sectors (511975 MB)
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Switching to newSCSI device sdb: drive cache: write back
 root
 sdb:
sd 1:0:0:0: Attached scsi disk sdb
sd 1:0:0:0: Attached scsi generic sg1 type 0
unmounting old /proc
unmounting old /sys
Unable to handle kernel NULL pointer dereference at virtual address 0000002c
 printing eip:
f885542a
*pde = 003b1001
Oops: 0000 [#1]
SMP
Modules linked in: dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod
CPU:    1


EIP is at qla2x00_queuecommand+0x72/0x122 [qla2xxx]
eax: 00000000   ebx: c2331380   ecx: c2015fe0   edx: 00000002
esi: f7dd82a8   edi: 00000000   ebp: c021b96f   esp: f79a3d10
ds: 007b   es: 007b   ss: 0068
Process scsi_wq_1 (pid: 835, threadinfo=f79a2000 task=f7fe1030)
Stack: 00000287 f7dd8000 c2331380 00000000 c021b7ba c2331380 c021b96f c233939c
       c233939c f79adc00 f7dd8000 c2309044 c0220c02 c2331380 c2331380 c2331380
       c2309044 00000000 c2339418 c233939c c01b6c83 c2309044 f79a3de4 c01b77de
Call Trace:
 [<c021b7ba>] scsi_dispatch_cmd+0x1be/0x23c
 [<c021b96f>] scsi_done+0x0/0x1c
 [<c0220c02>] scsi_request_fn+0x25f/0x2d6
 [<c01b6c83>] generic_unplug_device+0x16/0x23
 [<c01b77de>] blk_execute_rq+0x80/0xa7
 [<c01b7997>] blk_end_sync_rq+0x0/0x22
 [<c01b872a>] blk_rq_bio_prep+0x63/0x81
 [<c021fb3a>] scsi_execute+0xb2/0xcd
 [<c021fbab>] scsi_execute_req+0x56/0x78
 [<c022237c>] scsi_report_lun_scan+0x1a2/0x3af
 [<c01bd53f>] kobject_put+0x16/0x19
 [<c01bd51f>] kobject_release+0x0/0xa
 [<c0222073>] scsi_probe_and_add_lun+0x1e3/0x1ec
 [<c022276c>] __scsi_scan_target+0xaf/0xe4
 [<c02227f2>] scsi_scan_target+0x51/0x61
 [<f882e669>] fc_scsi_scan_rport+0x1f/0x26 [scsi_transport_fc]
 [<c0127c2a>] worker_thread+0x170/0x1de
 [<f882e64a>] fc_scsi_scan_rport+0x0/0x26 [scsi_transport_fc]
 [<c0116f3d>] default_wake_function+0x0/0x12
 [<c0116f3d>] default_wake_function+0x0/0x12
 [<c0127aba>] worker_thread+0x0/0x1de
 [<c012b1b7>] kthread+0x7c/0xa6
 [<c012b13b>] kthread+0x0/0xa6
 [<c0101ab5>] kernel_thread_helper+0x5/0xb
Code: 83 ea 40 8b 52 24 31 c0 83 fa 02 74 0f 83 fa 04 b8 00 00 02 00 74 05 b8 
00 00 01 00 85 c0 74 0b 89 83 3c 01 00 00 e9 a
INIT:

Steps to reproduce:
Boot the system with the SMP kernel. The oops is intermittent; on some boots it does not occur.
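For triage, the faulting location can be pulled out of the oops text mechanically. The following is a minimal Python sketch, assuming the "EIP is at symbol+0xoffset/0xsize [module]" line format shown in the log above; the function name parse_eip_line is hypothetical and not part of any kernel tooling. The faulting address 0000002c is a small offset from zero, which is consistent with reading a structure member through a NULL pointer, but the report alone does not show which pointer was NULL.

```python
import re

# Matches lines such as:
#   EIP is at qla2x00_queuecommand+0x72/0x122 [qla2xxx]
# The trailing "[module]" part is optional (built-in symbols have none).
EIP_RE = re.compile(
    r"EIP is at (?P<symbol>\w+)\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)(?:\s+\[(?P<module>\w+)\])?"
)


def parse_eip_line(line):
    """Return symbol, offset, size and module from an 'EIP is at' line.

    Returns None when the line does not match the assumed format.
    """
    m = EIP_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes into the function
        "size": int(m.group("size"), 16),      # total function size
        "module": m.group("module"),           # None for built-in code
    }


if __name__ == "__main__":
    print(parse_eip_line("EIP is at qla2x00_queuecommand+0x72/0x122 [qla2xxx]"))
```

The symbol and offset can then be matched against a disassembly of the module built from the same source tree to find the faulting instruction.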
Comment 1 Adrian Bunk 2005-12-31 03:00:31 UTC
Do older kernels (e.g. 2.6.14.x) work?
Comment 2 Luckey 2005-12-31 23:39:14 UTC
I have not encountered this problem under kernel 2.6.14.2,
but when I run "rmmod qla2300; modprobe qla2300" under 2.6.14.2,
the kernel sometimes oopses too. What's more, dm-multipath over qla2300
fails to do path failback. I've reported that bug at:
http://bugzilla.kernel.org/show_bug.cgi?id=5775
Comment 3 Luckey 2006-01-02 21:18:17 UTC
It seems that if I reboot two hosts that share the SAN storage through
the HBA card at the same time, one of them fails to boot with the oops above.
So I suspect that the kernel oopses during boot when some other event
happens on the FC loop at the same time.
Comment 4 Andrew Vasquez 2006-01-03 10:48:24 UTC
Could you try to run with the following posted patch:

http://marc.theaimsgroup.com/?l=linux-scsi&m=113630956721415&w=2
Comment 5 Luckey 2006-01-10 23:48:54 UTC
I have another problem now.
Host A and host B are connected to the same FC switch with qla2300 HBA cards.
On A, I can see devices such as "/dev/sda" with the command "fdisk -l",
but if I reboot host B, the device is removed for some reason.
The messages are:
kernel:  rport-0:0-0: blocked FC remote port time out: removing target and 
saving binding
kernel:  rport-1:0-0: blocked FC remote port time out: removing target and 
saving binding

My problem is that the device does not come back automatically unless I reload
the driver module "qla2300". But after I reload module "qla2300" on host A,
the device on host B disappears.

What's the problem? Thanks for any reply.
Comment 6 Luckey 2006-01-10 23:50:14 UTC
My kernel has been patched with the patch above.
Comment 7 Luckey 2006-01-11 01:18:58 UTC
Now I've updated my kernel to 2.6.15-git6. I selected the driver as:
<*> QLogic QLA2XXX Fibre Channel Support
[ ]     Use firmware-loader modules (DEPRECATED)
But after I reboot my host into the new kernel, I cannot find the device.
The logs are:
Jan 11 16:53:07 nd10 kernel: QLogic Fibre Channel HBA Driver
Jan 11 16:53:07 nd10 kernel: ACPI: PCI Interrupt 0000:07:01.0[A] -> GSI 96 
(level, low) -> IRQ 169
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Found an ISP2312, irq 169, 
iobase 0xf8816000
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Configuring PCI space...
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Configure NVRAM 
parameters...
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Verifying loaded RISC 
code...
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Firmware image unavailable.
Jan 11 16:53:07 nd10 kernel: qla2xxx 0000:07:01.0: Failed to initialize adapter
Jan 11 16:53:07 nd10 kernel: Trying to free free IRQ169
Jan 11 16:53:07 nd10 kernel: ACPI: PCI interrupt for device 0000:07:01.0 
disabled
Jan 11 16:53:07 nd10 kernel: ACPI: PCI interrupt for device 0000:07:01.0 
disabled
Jan 11 16:53:07 nd10 kernel: ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 
(level, low) -> IRQ 177
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Found an ISP2312, irq 177, 
iobase 0xf8816000
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Configuring PCI space...
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Configure NVRAM 
parameters...
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Verifying loaded RISC 
code...
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Firmware image unavailable.
Jan 11 16:53:08 nd10 kernel: qla2xxx 0000:07:01.1: Failed to initialize adapter
Jan 11 16:53:08 nd10 kernel: Trying to free free IRQ177


If I select the driver as:
 <M> QLogic QLA2XXX Fibre Channel Support
 [*]     Use firmware-loader modules (DEPRECATED)
 <M>       Build QLogic ISP2100 firmware-module
 <M>       Build QLogic ISP2200 firmware-module
 <M>       Build QLogic ISP2300 firmware-module
 <M>       Build QLogic ISP2322 firmware-module
 <M>       Build QLogic ISP63xx firmware-module
 <M>       Build QLogic ISP24xx firmware-module

The problem from comment 5 still exists.
However, with kernel 2.6.11.3 I have not encountered this problem.

What can I do for my system?
Comment 8 Luckey 2006-01-11 01:52:10 UTC
After I type the command "fdisk -l", the logs are:
Jan 11 17:57:47 nd10 kernel:  rport-10:0-0: blocked FC remote port time out: 
removing target and saving binding
Jan 11 17:57:48 nd10 kernel:  10:0:0:0: SCSI error: return code = 0x10000
Jan 11 17:57:48 nd10 kernel: end_request: I/O error, dev sda, sector 0
Jan 11 17:57:48 nd10 kernel: Buffer I/O error on device sda, logical block 0
Jan 11 17:57:48 nd10 kernel:  10:0:0:0: rejecting I/O to dead device
Jan 11 17:57:48 nd10 kernel: Buffer I/O error on device sda, logical block 0
Jan 11 17:58:23 nd10 kernel:  rport-11:0-0: blocked FC remote port time out: 
removing target and saving binding
Jan 11 17:58:23 nd10 kernel:  11:0:0:0: SCSI error: return code = 0x10000
Jan 11 17:58:23 nd10 kernel: end_request: I/O error, dev sdb, sector 0
Jan 11 17:58:23 nd10 kernel: Buffer I/O error on device sdb, logical block 0
Jan 11 17:58:23 nd10 kernel:  11:0:0:0: rejecting I/O to dead device
Jan 11 17:58:23 nd10 kernel: Buffer I/O error on device sdb, logical block 0
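
To see at a glance which remote ports were dropped, logs like the ones above can be scanned for the timeout message. This is a minimal Python sketch, assuming the "rport-H:C-N: blocked FC remote port time out" wording seen in this report; the function name timed_out_rports is hypothetical. As far as I know, the message is logged by the FC transport class when a blocked remote port does not come back before its device-loss timeout expires, which would match the "rejecting I/O to dead device" lines that follow it.

```python
import re

# Matches the start of messages such as:
#   rport-10:0-0: blocked FC remote port time out: removing target and ...
# Only the part before the line wrap is required, so wrapped syslog
# lines (as pasted in this report) still match.
RPORT_TIMEOUT_RE = re.compile(
    r"rport-(?P<host>\d+):(?P<channel>\d+)-(?P<number>\d+): "
    r"blocked FC remote port time out"
)


def timed_out_rports(log_text):
    """Return the rport names that timed out, in order of first appearance."""
    seen = []
    for m in RPORT_TIMEOUT_RE.finditer(log_text):
        name = "rport-{}:{}-{}".format(
            m.group("host"), m.group("channel"), m.group("number")
        )
        if name not in seen:  # report each remote port once
            seen.append(name)
    return seen


if __name__ == "__main__":
    sample = (
        "Jan 11 17:57:47 nd10 kernel:  rport-10:0-0: blocked FC remote port time out: \n"
        "removing target and saving binding\n"
        "Jan 11 17:58:23 nd10 kernel:  rport-11:0-0: blocked FC remote port time out: \n"
        "removing target and saving binding\n"
    )
    print(timed_out_rports(sample))
```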

Comment 9 Andrew Vasquez 2006-01-13 17:35:50 UTC
>if I select the driver as:
><M> QLogic QLA2XXX Fibre Channel Support
> [*]     Use firmware-loader modules (DEPRECATED)
> <M>       Build QLogic ISP2100 firmware-module
> <M>       Build QLogic ISP2200 firmware-module
> <M>       Build QLogic ISP2300 firmware-module
> <M>       Build QLogic ISP2322 firmware-module
> <M>       Build QLogic ISP63xx firmware-module
> <M>       Build QLogic ISP24xx firmware-module

The default behaviour for post-2.6.15 kernels is to load the firmware image from
user-space via hotplug.  I've updated the Kconfig help and added a URL to
retrieve the images:

http://marc.theaimsgroup.com/?l=linux-scsi&m=113720091605197&w=2

With regard to the other issues you are seeing, could you apply the rollup patches
recently posted:

http://marc.theaimsgroup.com/?l=linux-scsi&m=113720091630672&w=2

You can download the group here:

ftp://ftp.qlogic.com/outgoing/linux/patches/8.x/8.01.04k

Thanks for your patience.
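
Because the firmware is now requested from user-space, a "Firmware image unavailable." message can simply mean the image file is not where the hotplug firmware agent looks. The following is a minimal Python sketch for checking that, under stated assumptions: the image name ql2300_fw.bin and the directory /lib/firmware are the ones the reporter uses elsewhere in this thread, the directories actually searched depend on the distribution's hotplug/udev setup, and the function name find_firmware is hypothetical.

```python
import os

# Assumption: /lib/firmware is the search directory on this system.
# Other distributions may use additional or different locations.
DEFAULT_FIRMWARE_DIRS = ("/lib/firmware",)


def find_firmware(image_name, search_dirs=DEFAULT_FIRMWARE_DIRS):
    """Return the full path of the first matching firmware file, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, image_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = find_firmware("ql2300_fw.bin")
    if path is None:
        print("ql2300_fw.bin not found in", ", ".join(DEFAULT_FIRMWARE_DIRS))
    else:
        print("found", path)
```

Note that this only confirms the file exists. If the driver is built into the kernel or loaded from an initrd, the request may be made before the root filesystem and the hotplug agent are available, in which case the file being present on disk is not sufficient.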
Comment 10 Luckey 2006-01-16 01:02:55 UTC
I've compiled kernel 2.6.15-git11,
which should include those patches,
and I've put the file ql2300_fw.bin at /lib/firmware/ql2300_fw.bin.
But I encounter the problems again. The kernel boot logs are:
 QLogic Fibre Channel HBA Driver
 ACPI: PCI Interrupt 0000:07:01.0[A] -> GSI 96 (level, low) -> IRQ 169
 qla2xxx 0000:07:01.0: Found an ISP2312, irq 169, iobase 0xf8818000
 qla2xxx 0000:07:01.0: Configuring PCI space...
 qla2xxx 0000:07:01.0: Configure NVRAM parameters...
 qla2xxx 0000:07:01.0: Verifying loaded RISC code...
 qla2xxx 0000:07:01.0: Firmware image unavailable.
 qla2xxx 0000:07:01.0: Failed to initialize adapter
 Trying to free free IRQ169
 ACPI: PCI interrupt for device 0000:07:01.0 disabled
 ACPI: PCI interrupt for device 0000:07:01.0 disabled
 ACPI: PCI Interrupt 0000:07:01.1[B] -> GSI 97 (level, low) -> IRQ 177

After system init finished, I ran the command "rmmod qla2xxx; modprobe qla2xxx"
and then got the devices such as /dev/sda.

But when I reboot the other host, B, host A loses its devices again.
The system logs after I run the command "fdisk -l" are:

 rport-6:0-0: blocked FC remote port time out: removing target and saving 
binding
 6:0:0:0: SCSI error: return code = 0x10000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
 6:0:0:0: rejecting I/O to dead device
Buffer I/O error on device sda, logical block 0
 rport-7:0-0: blocked FC remote port time out: removing target and saving 
binding
 7:0:0:0: SCSI error: return code = 0x10000
end_request: I/O error, dev sdb, sector 0
Buffer I/O error on device sdb, logical block 0
 7:0:0:0: rejecting I/O to dead device
Buffer I/O error on device sdb, logical block 0
Comment 11 Natalie Protasevich 2007-07-22 17:13:18 UTC
Is this problem still present in 2.6.22+?
Thanks.
Comment 12 Adrian Bunk 2007-09-16 13:24:41 UTC
Please reopen this bug if it's still present with kernel 2.6.22.
