Bug 5712
Summary: | mptscsih: ioc0: task abort messages at boot. | ||
---|---|---|---|
Product: | Drivers | Reporter: | Warren Howard (warren_h) |
Component: | Other | Assignee: | Eric Moore (Eric.Moore) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.14.3 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
jd: additional info
jd: additional info(2)
jd: additional info(3) /proc/config.gz
jd: additional info; 2.6.16-rc1/fusionmpt ver. 3.03.06
jd: kernel/mpt-driver hangs (2.6.16-rc1/fusionmpt ver. 3.03.06) |
Description
Warren Howard
2005-12-08 05:49:31 UTC
Begin forwarded message:
Date: Thu, 8 Dec 2005 05:57:42 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 5712] New: mptscsih: ioc0: task abort messages at boot.

http://bugzilla.kernel.org/show_bug.cgi?id=5712
Summary: mptscsih: ioc0: task abort messages at boot.
Kernel Version: 2.6.14.3

Is it possible to send me the entire dmesg boot log?
Thanks,
Eric Moore

I have the same kind of problem:
[3715614.463289] mptbase: ioc1: IOCStatus(0x0003): Invalid SGL
[3715614.463304] mptbase: ioc1: LogInfo(0x11070000): F/W: DMA Error
[3715614.725935] mptbase: ioc1: LogInfo(0x11070000): F/W: DMA Error
[3715614.725941] mptbase: ioc1: IOCStatus(0x004b): SCSI IOC Terminated
[3715614.725955] mptbase: ioc1: LogInfo(0x11070000): F/W: DMA Error
[3715614.725959] mptbase: ioc1: IOCStatus(0x004b): SCSI IOC Terminated
[3715614.725969] mptbase: ioc1: LogInfo(0x11070000): F/W: DMA Error
[3715614.725973] mptbase: ioc1: IOCStatus(0x004b): SCSI IOC Terminated
[3715614.725983] mptbase: ioc1: LogInfo(0x11070000): F/W: DMA Error
...
I got these errors when I copied ("cp -a") a large directory (==2500 subdirectories, 12520 files) to a disk array (raid6) which is behind a 2-channel LSI21320. The LSI21320 isn't the primary HBA, so it has no effect on the boot stage.
----------------------
#uname -a
Linux unknown 2.6.13-rc5-noide-mpt #1 SMP Fri Aug 5 18:03:54 EEST 2005 ppc64 POWER5 (gr) CHRP IBM,9123-710 GNU/Linux
#btw. (kernel is pristine 2.6.13-rc5)

Created attachment 7080 [details]
jd: additional info
Created attachment 7081 [details]
jd: additional info(2)
btw. target filesystem is normal ext2:
#mount
...
/dev/md/2 on /ahost/archive type ext2 (rw,noatime)
Created attachment 7082 [details]
jd: additional info(3) /proc/config.gz
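
When comparing logs between kernel or driver versions, it can help to count how often each mptbase message occurs instead of reading the raw dmesg output. The following is a minimal sketch, assuming the messages keep the `mptbase: iocN: IOCStatus(0x...): text` / `LogInfo(0x...): text` shape quoted above; the function name `tally_mpt_messages` and the `SAMPLE` excerpt are illustrative only and not part of any driver tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   [3715614.463289] mptbase: ioc1: IOCStatus(0x0003): Invalid SGL
LINE_RE = re.compile(
    r"mptbase: (?P<ioc>ioc\d+): (?P<kind>IOCStatus|LogInfo)"
    r"\((?P<code>0x[0-9a-fA-F]+)\): (?P<text>.*)"
)


def tally_mpt_messages(lines):
    """Count (ioc, kind, code, text) tuples found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            key = (m["ioc"], m["kind"], m["code"].lower(), m["text"].strip())
            counts[key] += 1
    return counts


# Short excerpt in the format quoted in this report (one message per line).
SAMPLE = """\
[3715614.463289] mptbase: ioc1: IOCStatus(0x0003): Invalid SGL
[3715614.463304] mptbase: ioc1: LogInfo(0x11070000): F/W: DMA Error
[3715614.725935] mptbase: ioc1: LogInfo(0x11070000): F/W: DMA Error
[3715614.725941] mptbase: ioc1: IOCStatus(0x004b): SCSI IOC Terminated
""".splitlines()

if __name__ == "__main__":
    # Print the most frequent messages first.
    for key, n in tally_mpt_messages(SAMPLE).most_common():
        print(n, *key)
```

To use it on a real log, pass an open file (for example the saved output of dmesg) instead of `SAMPLE`.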
Hmm, you're on ppc. There have been several endian fixes since the 3.03.02 driver (which is the driver version in the 2.6.13-rc5 kernel). I'm wondering whether you've tried the latest kernel yet, e.g. 2.6.15?
Eric Moore

You're right; I didn't notice the version change until yesterday, so I definitely have to try a newer kernel first...

Dear Eric,
I've tried to reproduce this bug using the 2.6.15.1 kernel and I can't. The system boots normally with the same kernel options that were producing errors with the 2.6.14.3 kernel. Except for some problems with the framebuffer (my framebuffer choices from 2.6.14.3 are not producing the desired results with 2.6.15.1), the system boots and works smoothly with the 2.6.15.1 kernel. Please let me know if you would like me to provide any more information.
Regards,
Warren.

Most likely the endianness issue was the problem. If you diff the 3.02.02 driver against the 3.02.05 driver and then grep the diff for "cpu", you will find a lot of matches. Perhaps you could take the 3.02.05 driver and compile/test it in the older kernel. I suggest that you disable the CONFIG_FUSION_SAS driver, as I doubt it would compile.
Eric Moore

Created attachment 7121 [details]
jd: additional info; 2.6.16-rc1/fusionmpt ver. 3.03.06
Still whining the same, although only "once", and it lasted only about 0.3
seconds.
Still whining with 2.6.16-rc1/fusionmpt ver. 3.03.06. It is now almost 100% certain that some messages will appear if the machine is acting as an NFS server and a big enough file (about >=2GB?) is dumped over NFS to disk (raid6 group). The machine has 4GB of physical memory.

It seems that the machine's load must be at least 3...4 before the error messages appear; it doesn't matter whether the write target is raid0/raid6/whatnot. Write speed to disk has no effect either; if the load is high enough (caused by nfsd, for example), the errors appear even if the write speed is <= 10MB/s ...

Single-file dump speed (/dev/zero -> tst.dmp file) is now about >= 400MB/s to the raid0 group (4 disks) and about 110MB/s to the raid6 group (10 disks) when there is no load other than the dd/cat "dumper".

Created attachment 7428 [details]
jd: kernel/mpt-driver hangs (2.6.16-rc1/fusionmpt ver. 3.03.06)
2.6.16-rc1 now hangs under a "medium load" of 3-4 (see attachment).
If I use one of the machine's internal disks, which are behind
the machine's built-in adapter, then everything works OK, even
though the load is somewhere between 7 and 10.
The system seems to be more stable when I use only one disk
behind the LSI22320. With only one disk there are
about 100-400 interrupts per second (50-200 /channel/s),
but with raid6/6xdisks there are about 2500-3500 irq/s before it
hangs.
If I use a separate disk for each of the (4x) NFS clients, then the system seems
to be quite stable too; with 4 disks there are also 100-400 irq/s.
So there seems to be some correlation with the irq/s value; if the value
is >=2000, then it is almost certain that the kernel will hang eventually.
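
The irq/s figures above can be obtained by taking two snapshots of /proc/interrupts and dividing the counter difference by the elapsed time. The following is a rough sketch of that calculation, assuming the usual layout of a CPU header line followed by one `label: count count ... description` line per interrupt; the function names and the sample snapshots in the test are hypothetical, and the 2000 irq/s threshold is only the value observed in this report, not a documented limit.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: total_count_across_cpus} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())          # header line: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        # The first ncpu fields are per-CPU counters; the rest is description.
        fields = rest.split()[:ncpu]
        totals[label.strip()] = sum(int(f) for f in fields if f.isdigit())
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for each IRQ present in both snapshots."""
    a, b = parse_interrupts(before), parse_interrupts(after)
    return {irq: (b[irq] - a[irq]) / seconds for irq in b if irq in a}


def busy_irqs(rates, threshold=2000):
    """IRQ labels whose rate is at or above the threshold (value from this report)."""
    return sorted(irq for irq, rate in rates.items() if rate >= threshold)


def sample_live(seconds=5.0, path="/proc/interrupts"):
    """Take two snapshots on a live Linux system and return the rates."""
    with open(path) as f:
        before = f.read()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return irq_rates(before, after, seconds)
```

On the affected machine, `busy_irqs(sample_live())` would list the interrupt lines running at 2000 irq/s or more during the sampling window; which label belongs to the LSI adapter has to be read from the description column of /proc/interrupts.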
Any updates on this problem?
Thanks.

Since the problem appears to be resolved, closing the bug.