Bug 14242 - MPT SAS Fails on heavy operations
Summary: MPT SAS Fails on heavy operations
Status: CLOSED OBSOLETE
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 blocking
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-09-27 00:32 UTC by Denys Fedoryshchenko
Modified: 2012-06-13 17:14 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.31, 2.6.31.1
Subsystem:
Regression: Yes
Bisected commit-id:



Description Denys Fedoryshchenko 2009-09-27 00:32:41 UTC
While the MPT SAS controller worked fine on 2.6.30.5, on 2.6.31 it fails under heavy operations and starts spitting errors to dmesg (they vary). Filesystems also stop, and I am unable to reboot the box properly (only via SysRq or a hard reset).

x86, Sun Fire X4100, 8 GB RAM, PAE kernel enabled, module loaded with default options

I upgraded the BIOS and the LSI controller BIOS to the latest versions; it didn't fix the bug.
I cannot bisect, because this is a loaded server and a semi-embedded system. But I can test patches or revert specific commits if you point me to the exact commit.

http://www.nuclearcat.com/files/dmesg.ok from 2.6.30.5 kernel
http://www.nuclearcat.com/files/dmesg.fail from 2.6.31.1 kernel
http://www.nuclearcat.com/files/config.gz config from 2.6.31.1 kernel

Let me know if you need any additional information.
Comment 1 Andrew Morton 2009-09-30 22:49:58 UTC
Reassigned to scsi, cc'ed Eric.
Comment 2 Denys Fedoryshchenko 2009-09-30 23:29:24 UTC
If I just copy the fusion directory from the previous kernel, it works.
Most probably the changes that trigger this are the following (just a diff between the kernels):


+static void
+mpt_add_sge_64bit(void *pAddr, u32 flagslength, dma_addr_t dma_addr)
+{
+       SGESimple64_t *pSge = (SGESimple64_t *) pAddr;
+       pSge->Address.Low = cpu_to_le32
+                       (lower_32_bits((unsigned long)(dma_addr)));
+       pSge->Address.High = cpu_to_le32
+                       (upper_32_bits((unsigned long)dma_addr));
+       pSge->FlagsLength = cpu_to_le32
+                       ((flagslength | MPT_SGE_FLAGS_64_BIT_ADDRESSING));
+}

-       } else {
-               SGESimple32_t *pSge = (SGESimple32_t *) pAddr;
-               pSge->FlagsLength = cpu_to_le32(flagslength);
-               pSge->Address = cpu_to_le32(dma_addr);
+/**
+ *     mpt_add_sge_64bit_1078 - Place a simple 64 bit SGE at address pAddr (1078 workaround).
+ *     @pAddr: virtual address for SGE
+ *     @flagslength: SGE flags and data transfer length
+ *     @dma_addr: Physical address
+ *
+ *     This routine places a MPT request frame back on the MPT adapter's
+ *     FreeQ.
+ **/
+static void
+mpt_add_sge_64bit_1078(void *pAddr, u32 flagslength, dma_addr_t dma_addr)
+{
+       SGESimple64_t *pSge = (SGESimple64_t *) pAddr;
+       u32 tmp;
+
+       pSge->Address.Low = cpu_to_le32
+                       (lower_32_bits((unsigned long)(dma_addr)));
+       tmp = (u32)(upper_32_bits((unsigned long)dma_addr));
+

The following patch, which is upstream but not in the latest stable kernel, seems to fix my issue. It should probably be pushed to the stable kernels?

commit  c55b89fba9872ebcd5ac15cdfdad29ffb89329f0

[SCSI] mptsas : PAE Kernel more than 4 GB kernel panic

This patch is solving problem for PAE kernel DMA operation.
On PAE system dma_addr and unsigned long will have different
values.
Now dma_addr is not type casted using unsigned long.
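
The truncation described in the commit message can be illustrated with a small userspace sketch. This is not the driver code: the typedefs `dma_addr64_t` and `ulong32_t` and the helper names are hypothetical stand-ins, chosen on the assumption that a 32-bit PAE build has a 64-bit dma_addr_t and a 32-bit unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-ins for a 32-bit PAE build, where dma_addr_t is
 * 64 bits wide but unsigned long is only 32 bits. Both names are
 * hypothetical and exist only for this sketch. */
typedef uint64_t dma_addr64_t;  /* models dma_addr_t */
typedef uint32_t ulong32_t;     /* models unsigned long */

static_assert(sizeof(ulong32_t) < sizeof(dma_addr64_t),
              "the sketch models an unsigned long narrower than dma_addr_t");

/* Simplified userspace equivalent of the kernel's upper_32_bits(). */
static uint32_t upper_32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Mirrors the pattern in the quoted diff: the address is cast through
 * "unsigned long" first, which drops bits 32..63 before the shift. */
static uint32_t high_word_with_cast(dma_addr64_t dma_addr)
{
    return upper_32((ulong32_t)dma_addr);
}

/* Mirrors the pattern after the fix: no intermediate cast, so the
 * high word of the address is preserved. */
static uint32_t high_word_without_cast(dma_addr64_t dma_addr)
{
    return upper_32(dma_addr);
}
```

Under these assumptions, a buffer address above 4 GB such as 0x123456000 yields a high word of 0 with the cast and 1 without it, so the controller would be handed an address in the wrong part of memory. Addresses below 4 GB give the same result either way, which would be consistent with the failures only showing up under heavy load on a machine with 8 GB of RAM.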
