Bug 121531 - Adaptec 7805H SAS HBA (pm80xx): hangs when writing >80MB at once
Summary: Adaptec 7805H SAS HBA (pm80xx): hangs when writing >80MB at once
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: linux-scsi@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-07-06 15:43 UTC by Martin von Wittich
Modified: 2016-11-06 19:16 UTC
CC List: 3 users

See Also:
Kernel Version: 3.16.0-4-amd64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
- dd loop output, writing 64 - 128 MB to a disk (2.83 KB, text/plain), 2016-07-06 15:43 UTC, Martin von Wittich
- dmidecode output (17.13 KB, text/plain), 2016-07-06 15:44 UTC, Martin von Wittich
- lspci -vnn output (23.08 KB, text/plain), 2016-07-06 15:44 UTC, Martin von Wittich
- modinfo pm80xx output (2.75 KB, text/plain), 2016-07-06 15:44 UTC, Martin von Wittich
- smartctl -a /dev/sdb output (2.67 KB, text/plain), 2016-07-06 15:45 UTC, Martin von Wittich
- smartctl -a /dev/sdc output (3.81 KB, text/plain), 2016-07-06 15:45 UTC, Martin von Wittich
- uname -a, cat /proc/version output (299 bytes, text/plain), 2016-07-06 15:46 UTC, Martin von Wittich
- dmesg output after booting, before writing to the disks (63.98 KB, text/plain), 2016-07-06 15:47 UTC, Martin von Wittich
- dmesg output a few seconds after writing >128 MB to one disk (110.36 KB, text/plain), 2016-07-06 15:48 UTC, Martin von Wittich
- dmesg output another few seconds later, completely filled with pm80xx errors (125.49 KB, text/plain), 2016-07-06 15:49 UTC, Martin von Wittich
- pm8001_defs.h patch (553 bytes, patch), 2016-10-25 15:21 UTC, Chloé Desoutter
- pm8001_defs.h patch [2] (628 bytes, patch), 2016-10-26 12:52 UTC, Chloé Desoutter

Description Martin von Wittich 2016-07-06 15:43:09 UTC
Created attachment 222171 [details]
dd loop output, writing 64 - 128 MB to a disk

One of our customers attempted to install our Debian 8-based distribution on a Fujitsu PRIMERGY TX150 S8 server with an Adaptec 7805H SAS HBA. Unfortunately, the system tended to lock up during use; almost all services stopped responding, but it was still possible to run simple commands via SSH, e.g. "ssh server 'cat /proc/loadavg'" or "ssh server dmesg". Everything that required write access (like actually logging in via SSH, or using the web interface) just seemed to hang. Load average was extremely high (>100) and dmesg reported a lot of sas/pm80xx errors:

[11748.246360] sas: trying to find task 0xffff88082fcc7d40
[11748.246362] sas: sas_scsi_find_task: aborting task 0xffff88082fcc7d40
[11748.246572] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[11748.246574] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[11748.246576] sas: task done but aborted
[11748.246581] sas: sas_scsi_find_task: task 0xffff88082fcc7d40 is done
[11748.246583] sas: sas_eh_handle_sas_errors: task 0xffff88082fcc7d40 is done
[11748.246585] sas: trying to find task 0xffff88082fcc7c00
[11748.246587] sas: sas_scsi_find_task: aborting task 0xffff88082fcc7c00
[11748.246829] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[11748.246831] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[11748.246832] sas: task done but aborted
[11748.246837] sas: sas_scsi_find_task: task 0xffff88082fcc7c00 is done
[11748.246839] sas: sas_eh_handle_sas_errors: task 0xffff88082fcc7c00 is done
[11748.246841] sas: trying to find task 0xffff88082fcc7ac0
[11748.246844] sas: sas_scsi_find_task: aborting task 0xffff88082fcc7ac0
[11748.247055] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[11748.247057] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[11748.247059] sas: task done but aborted
[11748.247064] sas: sas_scsi_find_task: task 0xffff88082fcc7ac0 is done
[11748.247067] sas: sas_eh_handle_sas_errors: task 0xffff88082fcc7ac0 is done
[11748.247069] sas: trying to find task 0xffff88082fcc7840
[11748.247070] sas: sas_scsi_find_task: aborting task 0xffff88082fcc7840
[11748.247366] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[11748.247368] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[11748.247370] sas: task done but aborted
[11748.247375] sas: sas_scsi_find_task: task 0xffff88082fcc7840 is done
[11748.247377] sas: sas_eh_handle_sas_errors: task 0xffff88082fcc7840 is done
[11748.247379] sas: trying to find task 0xffff88082ff72e00
[11748.247380] sas: sas_scsi_find_task: aborting task 0xffff88082ff72e00
[11748.247591] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[11748.247593] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[11748.247595] sas: task done but aborted
[11748.247600] sas: sas_scsi_find_task: task 0xffff88082ff72e00 is done
[11748.247601] sas: sas_eh_handle_sas_errors: task 0xffff88082ff72e00 is done
[11748.247603] sas: trying to find task 0xffff88082ff72400
[11748.247605] sas: sas_scsi_find_task: aborting task 0xffff88082ff72400

At first we believed the underlying cause to be a hardware problem, but the problem persisted after the HBA and the backplane were replaced (the disks were ruled out as a possible cause because the selftests reported no errors).

To isolate the issue, I ran the following tests in a live system on the affected server:

1) "smartctl -t long" on both disks; both reported "Completed", so the disks seem to be OK.

2) "dd if=/dev/sdX of=/dev/null bs=1M" on both disks; both completed successfully, with an average speed of ~150 MB/s. Reading seems to be fine too.

3) "dd if=/dev/zero of=/dev/sdX bs=1M" on both disks. It stopped responding, and dmesg started spewing lots of sas/pm80xx errors. So apparently writing to the disks causes the problem.

To track it down further, I tried to repeatedly write 64 MB to one disk - this works without problems:

root@unassigned:~# for i in $(seq 1 8); do dd if=/dev/zero of=/dev/sdc bs=1M count=64; done
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.482716 s, 139 MB/s
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.482339 s, 139 MB/s
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.474302 s, 141 MB/s
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.464919 s, 144 MB/s
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.465673 s, 144 MB/s
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.465525 s, 144 MB/s
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.473932 s, 142 MB/s
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.472965 s, 142 MB/s

Then I tried to write increasing amounts of data to the disk; this reproducibly slows down at around 80 MB. A few seconds later, dmesg starts spewing error messages.

root@unassigned:~# for i in $(seq 64 128); do dd if=/dev/zero of=/dev/sdc bs=1M count=$i; done
[...]
75+0 records in
75+0 records out
78643200 bytes (79 MB) copied, 0.595394 s, 132 MB/s
76+0 records in
76+0 records out
79691776 bytes (80 MB) copied, 33.6425 s, 2.4 MB/s
77+0 records in
77+0 records out
80740352 bytes (81 MB) copied, 0.631928 s, 128 MB/s
78+0 records in
78+0 records out
81788928 bytes (82 MB) copied, 0.621007 s, 132 MB/s
79+0 records in
79+0 records out
82837504 bytes (83 MB) copied, 0.651981 s, 127 MB/s
80+0 records in
80+0 records out
83886080 bytes (84 MB) copied, 0.674202 s, 124 MB/s
81+0 records in
81+0 records out
84934656 bytes (85 MB) copied, 33.7179 s, 2.5 MB/s
82+0 records in
82+0 records out
[...]

It seems to alternate between ~130 MB/s and 1-3 MB/s, and then completely hangs after 96 records. See dd-loop.txt for the full output. The errors in dmesg:

[ 2645.124944] sas: Enter sas_scsi_recover_host busy: 146 failed: 146
[ 2645.124963] sas: trying to find task 0xffff88083658b200
[ 2645.124966] sas: sas_scsi_find_task: aborting task 0xffff88083658b200
[ 2647.457375] sas: task done but aborted
[ 2647.457382] sas: task done but aborted
[ 2647.457385] sas: task done but aborted
[ 2647.457833] sas: task done but aborted
[ 2647.457840] sas: task done but aborted
[ 2647.457843] sas: task done but aborted
[ 2647.457851] sas: task done but aborted
[ 2647.457853] sas: task done but aborted
[ 2647.457856] sas: task done but aborted
[ 2647.457860] sas: task done but aborted
[ 2647.457863] sas: task done but aborted 
[ 2647.457865] sas: task done but aborted
[ 2647.457867] sas: task done but aborted
[ 2647.457869] sas: task done but aborted
[ 2647.457872] sas: task done but aborted
[ 2647.457874] sas: task done but aborted 
[ 2647.457876] sas: task done but aborted
[ 2647.457879] sas: task done but aborted 
[ 2647.457881] sas: task done but aborted
[ 2647.457883] sas: task done but aborted 
[ 2647.457885] sas: task done but aborted
[ 2647.458125] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[ 2647.458130] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[ 2647.458135] sas: task done but aborted 
[ 2647.458156] sas: sas_scsi_find_task: task 0xffff88083658b200 is done
[ 2647.458159] sas: sas_eh_handle_sas_errors: task 0xffff88083658b200 is done
[ 2647.458162] sas: trying to find task 0xffff880837ad30c0
[ 2647.458164] sas: sas_scsi_find_task: aborting task 0xffff880837ad30c0
[ 2647.458166] sas: sas_scsi_find_task: task 0xffff880837ad30c0 is done
[ 2647.458168] sas: sas_eh_handle_sas_errors: task 0xffff880837ad30c0 is done
[ 2647.458170] sas: trying to find task 0xffff880837ad3200
[ 2647.458172] sas: sas_scsi_find_task: aborting task 0xffff880837ad3200
[ 2647.458174] sas: sas_scsi_find_task: task 0xffff880837ad3200 is done
[ 2647.458176] sas: sas_eh_handle_sas_errors: task 0xffff880837ad3200 is done
[ 2647.458178] sas: trying to find task 0xffff880838dcfa80
[ 2647.458179] sas: sas_scsi_find_task: aborting task 0xffff880838dcfa80
[ 2647.458181] sas: sas_scsi_find_task: task 0xffff880838dcfa80 is done
[ 2647.458183] sas: sas_eh_handle_sas_errors: task 0xffff880838dcfa80 is done
[ 2647.458198] sas: trying to find task 0xffff880838d31700
[ 2647.458200] sas: sas_scsi_find_task: aborting task 0xffff880838d31700
[ 2647.458605] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[ 2647.458611] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[ 2647.458616] sas: task done but aborted
[ 2647.458638] sas: sas_scsi_find_task: task 0xffff880838d31700 is done
[ 2647.458641] sas: sas_eh_handle_sas_errors: task 0xffff880838d31700 is done
[ 2647.458644] sas: trying to find task 0xffff880838ca6e80
[ 2647.458646] sas: sas_scsi_find_task: aborting task 0xffff880838ca6e80
[ 2647.459184] pm80xx mpi_ssp_completion 1514:sas IO status 0x1
[ 2647.459190] pm80xx mpi_ssp_completion 1523:SAS Address of IO Failure Drive:5000c50062c1b09d
[ 2647.459194] sas: task done but aborted
[ 2647.459217] sas: sas_scsi_find_task: task 0xffff880838ca6e80 is done
[ 2647.459220] sas: sas_eh_handle_sas_errors: task 0xffff880838ca6e80 is done
[ 2647.459222] sas: trying to find task 0xffff88083658b480
[ 2647.459225] sas: sas_scsi_find_task: aborting task 0xffff88083658b480
[...]

To finally rule out a hardware issue, I installed Windows 10 onto one of the disks and copied the Windows 10 installation image (~5 GB) from a USB stick onto the first disk, then formatted the second disk and copied the image onto it as well. That worked without problems, so I'm fairly sure this has to be a bug in the Linux driver.

I'll attach full dmesg copies, dmidecode/lspci/smartctl/uname output after filing this bug.
Comment 1 Martin von Wittich 2016-07-06 15:44:11 UTC
Created attachment 222181 [details]
dmidecode output
Comment 2 Martin von Wittich 2016-07-06 15:44:39 UTC
Created attachment 222191 [details]
lspci -vnn output
Comment 3 Martin von Wittich 2016-07-06 15:44:58 UTC
Created attachment 222201 [details]
modinfo pm80xx output
Comment 4 Martin von Wittich 2016-07-06 15:45:16 UTC
Created attachment 222211 [details]
smartctl -a /dev/sdb output
Comment 5 Martin von Wittich 2016-07-06 15:45:29 UTC
Created attachment 222221 [details]
smartctl -a /dev/sdc output
Comment 6 Martin von Wittich 2016-07-06 15:46:06 UTC
Created attachment 222231 [details]
uname -a, cat /proc/version output
Comment 7 Martin von Wittich 2016-07-06 15:47:28 UTC
Created attachment 222241 [details]
dmesg output after booting, before writing to the disks
Comment 8 Martin von Wittich 2016-07-06 15:48:04 UTC
Created attachment 222251 [details]
dmesg output a few seconds after writing >128 MB to one disk
Comment 9 Martin von Wittich 2016-07-06 15:49:30 UTC
Created attachment 222261 [details]
dmesg output another few seconds later, completely filled with pm80xx errors
Comment 10 Martin von Wittich 2016-07-06 16:03:18 UTC
I forgot to mention: the issue is also reproducible on an Ubuntu 16.04 live system with Linux 4.4, by running "dd if=/dev/zero of=/dev/sdX bs=1M count=128".
Comment 11 Jack Wang 2016-07-06 16:15:26 UTC
Have you tried the version from Microsemi? They have a lot of changes that are not
upstream:
http://storage.microsemi.com/en-us/support/sas/sas/asa-7805h/
Comment 12 Martin von Wittich 2016-07-07 11:38:45 UTC
@Jack: No, I hadn't tried that yet, but I will now:

I downloaded the driver from the homepage and installed the adaptec/debian_7.4/x64/pm80xx-1.4.0-11068-debian64.deb package on our dev server. After DKMS had compiled the pm80xx.ko module, I unloaded the pm80xx module on the installer live system and replaced /lib/modules/3.16.0-4-amd64/kernel/drivers/scsi/pm8001/pm80xx.ko with the newly compiled module and loaded it again. It seems to work better... I've been able to write 1024 MB successfully:

root@(none):~# dd if=/dev/zero of=/dev/sdc bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.51408 s, 165 MB/s

I'm now installing Debian onto the machine, I'll report back how that turns out.
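
For reference, the module-swap procedure described above amounts roughly to the following, collapsed onto one machine (in the comment the .deb was installed on a dev server and the resulting pm80xx.ko copied to the live system); the DKMS build output location is a placeholder, the .deb and module paths are the ones mentioned above:

dpkg -i pm80xx-1.4.0-11068-debian64.deb    # DKMS builds pm80xx.ko for the running kernel
modprobe -r pm80xx                         # unload the in-tree module
cp <dkms-build-dir>/pm80xx.ko /lib/modules/3.16.0-4-amd64/kernel/drivers/scsi/pm8001/pm80xx.ko
modprobe pm80xx                            # load the replacement module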
Comment 13 Martin von Wittich 2016-07-07 15:36:11 UTC
Yup, the Debian installation worked and the system seems to be running fine so far. I manually installed the pm80xx-1.4.0-11068-debian64.deb package in the target system while I was still in the installer by chrooting to /target; then I added "pm80xx" to /etc/initramfs/modules (without that, update-initramfs wouldn't include the pm80xx module; I'm not really sure why).
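
A rough sketch of those installer-side steps (the .deb filename and the modules path are taken from this comment; the package location on the install medium is a placeholder):

chroot /target
dpkg -i /path/to/pm80xx-1.4.0-11068-debian64.deb   # install the DKMS driver package into the target system
echo pm80xx >> /etc/initramfs/modules              # make sure the module ends up in the initramfs
update-initramfs -u                                # regenerate the initramfs with the new module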

After booting the system, I've successfully written ~500 GB of /dev/zero data into a file on an MD RAID consisting of both disks. No error messages in dmesg either.

Can you include those missing changes in the official kernel, or how else can we resolve this bug? We'll ask the customer if we can keep the server for an additional two weeks for testing, so if you need me to test builds, let me know.
Comment 14 Jack Wang 2016-07-07 16:34:31 UTC
The changes are huge; it's hard to do without help from the MicroSemi/PMC side.
And I don't have hardware to test with.

I've asked a developer from MicroSemi to upstream their changes, but
sadly got no reply.

Comment 15 Jack Wang 2016-07-08 07:47:20 UTC
Hi Viswas,

Thanks for the update.
Good to know MicroSemi is still working on it!

Could you update the MAINTAINERS file with your team's current working email addresses?

Regards,
Jack

2016-07-08 6:47 GMT+02:00 Viswas G <gviswas@gmail.com>:
> Patch set for pm80xx is pending for the last 3 quarters.
> We will submit those soon with all the bug fixes and performance
> tuning changes.
>
> Regards,
> Viswas G
Comment 16 Chloé Desoutter 2016-10-25 15:20:05 UTC
Hello,

I'm affected by this bug as well; it heavily impaired the use of this controller on my filer under constant writes. After a few hours it would freeze with no possible recovery.
I've been researching the cause of this bug by comparing the Microsemi source tree with what's in the kernel.

The SCSI queue depth is not the culprit, as changing it did not fix the issue. In the Microsemi driver it is 128, in the kernel tree it is 508, but this changes nothing in the end (except that I noticed a slight performance loss when it is set to 128).
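
(For anyone who wants to repeat that queue-depth experiment: the per-device SCSI queue depth can be checked and changed at runtime via sysfs; /dev/sdc here is just an example device.)

cat /sys/block/sdc/device/queue_depth          # current queue depth
echo 128 > /sys/block/sdc/device/queue_depth   # try the Microsemi driver's value of 128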

However, the MPI queue size is set much higher in the kernel tree than in the Microsemi tree.

In the Microsemi tree this is managed by the MAX_IB_QUEUE_ELEMENTS and MAX_OB_QUEUE_ELEMENTS defines. The events queue seems to be split evenly between reads and writes. The total queue length is 512. There is an equal number of inbound and outbound queues there.

In the kernel tree, this is handled by the PM8001_MPI_QUEUE define (value: 1024). There is 1 inbound queue and 4 outbound queues.

I noticed that the value PM8001_MPI_QUEUE = 1024 causes crashes of the driver on a "PMC-Sierra PM8001 SAS HBA" as reported earlier. Changing this value to 512 results in a much more stable driver. I suspect that setting the MPI queue too large results in commands being lost when too much data gets queued and the controller cannot keep up with the writes.

I will attach the following patch:

--- linux/drivers/scsi/pm8001/pm8001_defs.h.orig	2016-10-25 15:15:40.470112331 +0000
+++ linux/drivers/scsi/pm8001/pm8001_defs.h	2016-10-24 19:13:46.533108727 +0000
@@ -76,7 +76,7 @@ enum port_type {
 
 /* driver compile-time configuration */
 #define	PM8001_MAX_CCB		 512	/* max ccbs supported */
-#define PM8001_MPI_QUEUE         1024   /* maximum mpi queue entries */
+#define PM8001_MPI_QUEUE         512   /* maximum mpi queue entries */
 #define	PM8001_MAX_INB_NUM	 1
 #define	PM8001_MAX_OUTB_NUM	 1
 #define	PM8001_MAX_SPCV_INB_NUM		1
Comment 17 Chloé Desoutter 2016-10-25 15:21:38 UTC
Created attachment 242701 [details]
pm8001_defs.h patch

PM8001_MPI_QUEUE: 1024 → 512 (more stable)
Comment 18 Chloé Desoutter 2016-10-25 23:03:27 UTC
Actually, further analysis of the MicroSemi driver leads me to think that the proper, recommended value is 256.

$ egrep '#define\s+MAX_[IO]B_QUEUE_ELEMENTS' *.h
pm8001_sas.h:#define    MAX_IB_QUEUE_ELEMENTS   256
pm8001_sas.h:#define    MAX_OB_QUEUE_ELEMENTS   256

So I shall test with this value for PM8001_MPI_QUEUE and see whether I achieve real stability under constant workloads.
Comment 19 Chloé Desoutter 2016-10-25 23:11:00 UTC
Said parameter was introduced by commit 99c72ebceb4dda445b4b74c6f46035feec95a2b3.

The rationale is OK, but the flaw is that sometimes the controller will crash completely, so there needs to be another way.

I suggest we set PM8001_MPI_QUEUE to 256 and find another way to mitigate the performance degradation, because we cannot afford crashed SAS controllers.
Comment 20 Chloé Desoutter 2016-10-26 12:51:34 UTC
Currently testing with PM8001_MPI_QUEUE = 256.

Prospective patch attached.
Comment 21 Chloé Desoutter 2016-10-26 12:52:30 UTC
Created attachment 242801 [details]
pm8001_defs.h patch [2]

Set PM8001_MPI_QUEUE to 256, as in the MicroSemi driver.
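
A minimal sketch for testing a change like this, assuming a configured kernel source tree matching the running kernel and that no filesystems are currently mounted from disks behind the HBA (the source tree path is an assumption):

cd /usr/src/linux                      # configured kernel source tree
# edit drivers/scsi/pm8001/pm8001_defs.h: PM8001_MPI_QUEUE 1024 -> 256
make M=drivers/scsi/pm8001 modules     # rebuild only the pm8001 directory
rmmod pm80xx                           # unload the running module
insmod drivers/scsi/pm8001/pm80xx.ko   # load the freshly built one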
Comment 22 Chloé Desoutter 2016-10-27 21:57:40 UTC
(In reply to Chloé Desoutter from comment #20)
> Currently testing w/ PM8001_MPI_QUEUE = 256.
> 
> Prospective patch attached.

I confirm that after 36 hours of intensive workload, I see no visible performance loss on a PM8001 and that there have been no data errors since then.
Comment 23 Chloé Desoutter 2016-11-01 22:09:06 UTC
With 256, I can still trigger crashes after a long while under heavy workload, quite randomly.

Out of curiosity I checked the pmspcv driver from FreeBSD, and they use a lower value still:


#define MPI_MAX_INBOUND_QUEUES          64     /**< Maximum number of inbound queues */
#define MPI_MAX_OUTBOUND_QUEUES         64     /**< Maximum number of outbound queues */

                                               /**< Max # of memory chunks supported */
#define MPI_MAX_MEM_REGIONS             (MPI_MAX_INBOUND_QUEUES + MPI_MAX_OUTBOUND_QUEUES) + 4
#define MPI_LOGSIZE                     4096  /**< default size */

So I'll try this value and give feedback on stability and performance.
