Created attachment 165301
dmesg for 3.4.0, 3.4.1, and 3.4.1 with pci=realloc=off
Up through 3.4.0, mpt2sas loads successfully. Starting with 3.4.1, the card is found but mpt2sas fails during the load; it loads if pci=realloc=off is on the kernel command line. The same problem exists in 3.19-rc6.
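To confirm whether the workaround is actually in effect on a running system, one can look for realloc=off among the pci= options on the kernel command line. This is a minimal sketch that relies only on /proc/cmdline and the comma-separated pci= option syntax. The function name realloc_disabled is hypothetical.

```shell
#!/bin/sh
# Succeeds (returns 0) if the kernel command line contains pci=realloc=off.
# pci= takes a comma-separated option list, so the option may also appear
# combined with others, e.g. "pci=nocrs,realloc=off".
# Reads /proc/cmdline unless a command line string is passed as $1.
realloc_disabled() {
    cmdline=${1:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            pci=*)
                opts=${word#pci=}
                # Split the pci= option list on commas.
                old_ifs=$IFS
                IFS=,
                for opt in $opts; do
                    if [ "$opt" = "realloc=off" ]; then
                        IFS=$old_ifs
                        return 0
                    fi
                done
                IFS=$old_ifs
                ;;
        esac
    done
    return 1
}
```

For example, `realloc_disabled && echo "PCI realloc disabled"` checks the running kernel's command line.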
Created attachment 167611
dmesg with 3.19 and LSI FW 18
Created attachment 168351
Hi Paul, can you please run "sudo lspci -vvv" and attach the output? The output attached to comment #2 was not collected as root, so it doesn't contain the SR-IOV information I was looking for.
I think your 3.4.0 and 3.4.1 kernels are from here:
Inside those packages are /boot/config-3.4.0-030400-generic and /boot/config-3.4.1-030401-generic. The 3.4.0 config does not have CONFIG_PCI_REALLOC_ENABLE_AUTO set, while the 3.4.1 config has CONFIG_PCI_REALLOC_ENABLE_AUTO=y.
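The config difference can be checked directly in the two packaged config files. A minimal sketch: the helper name realloc_auto_setting is hypothetical, and it treats anything other than an explicit `=y` line (a "is not set" comment, or no line at all) as not set.

```shell
#!/bin/sh
# For each kernel config file given as an argument, print whether
# CONFIG_PCI_REALLOC_ENABLE_AUTO is enabled ("y") or not ("not set").
realloc_auto_setting() {
    for cfg in "$@"; do
        if [ ! -r "$cfg" ]; then
            echo "$cfg: unreadable"
            continue
        fi
        if grep -q '^CONFIG_PCI_REALLOC_ENABLE_AUTO=y$' "$cfg"; then
            echo "$cfg: y"
        else
            echo "$cfg: not set"
        fi
    done
}
```

Usage with the two files named above: `realloc_auto_setting /boot/config-3.4.0-030400-generic /boot/config-3.4.1-030401-generic`.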
Created attachment 172371
lspci from machine with LSI SAS2008 card running P17 FW
See also downstream bug report https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1363313 and email thread at http://lkml.kernel.org/r/54C81B4E.firstname.lastname@example.org
The LSI SAS2008 has an SR-IOV capability. The BIOS on your platform assigns space for the regular BARs, but not for the SR-IOV BARs:
pci 0000:01:00.0: reg 10: [io 0xce00-0xceff]
pci 0000:01:00.0: reg 14: [mem 0xfbdfc000-0xfbdfffff 64bit] # BAR 1
pci 0000:01:00.0: reg 1c: [mem 0xfbd80000-0xfbdbffff 64bit] # BAR 3
pci 0000:01:00.0: reg 30: [mem 0x00000000-0x0007ffff pref]
pci 0000:01:00.0: reg 174: [mem 0x00000000-0x00003fff 64bit]
pci 0000:01:00.0: reg 17c: [mem 0x00000000-0x0003ffff 64bit]
Starting with your 3.4.1 kernel, Linux tries to assign space for the SR-IOV BARs, and in the process, it moves the regular BARs as well:
pci 0000:01:00.0: BAR 1: assigned [mem 0xeff40000-0xeff43fff 64bit]
pci 0000:01:00.0: BAR 3: assigned [mem 0xefb00000-0xefb3ffff 64bit]
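To spot the same pattern in another dmesg log, one can compare the address each regular BAR had at enumeration ("reg NN:") with its later "BAR N: assigned" address. Standard BAR n sits at config offset 0x10 + 4n, which is why BAR 1 pairs with reg 14 and BAR 3 with reg 1c above. This is a rough sketch that assumes the 3.4-era message format quoted in this comment (newer kernels may word these lines differently); moved_bars is a hypothetical name.

```shell
#!/bin/sh
# Read dmesg text on stdin and report regular BARs of device $1 whose
# "BAR N: assigned" address differs from the address seen at enumeration.
moved_bars() {
    dev=$1
    log=$(mktemp) || return 1
    cat > "$log"
    grep "pci $dev: BAR [0-5]: assigned \[mem " "$log" |
    while read -r line; do
        n=${line#*BAR }        # e.g. "1: assigned [mem ..."
        n=${n%%:*}             # BAR index, e.g. "1"
        new=${line#*\[mem }    # e.g. "0xeff40000-0xeff43fff 64bit]"
        new=${new%%-*}         # new start address
        # Config-space offset of standard BAR n, in lowercase hex.
        off=$(printf '%x' $((0x10 + 4 * n)))
        old=$(grep "pci $dev: reg $off: \[mem " "$log" | head -n 1 |
              sed 's/.*\[mem \(0x[0-9a-f]*\)-.*/\1/')
        if [ -n "$old" ] && [ "$old" != "$new" ]; then
            echo "BAR $n (reg $off): $old -> $new"
        fi
    done
    rm -f "$log"
}
```

Typical use would be `dmesg | moved_bars 0000:01:00.0`; for the lines quoted above it reports that both BAR 1 and BAR 3 were moved.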
Yinghai's theory is that the LSI device has firmware that caches the BAR values, and it doesn't notice when Linux changes the BARs. That seems like a plausible theory to me. We tried to test that theory with , a patch intended to reset the LSI device after changing the BARs. That failed, but I don't think we really know why.
I'd like to try that again, like this:
# echo 0000:01:00.0 > /sys/bus/pci/drivers/mpt2sas/unbind
# echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
# echo 0000:01:00.0 > /sys/bus/pci/drivers/mpt2sas/bind
This would be on a kernel that exhibits the problem. If this doesn't help, it might also be interesting to try it on a kernel that *doesn't* exhibit the problem -- the driver should release the device, then claim it again after the reset without incident.
It looks like the mpt2sas driver has been replaced by mpt3sas, so if you test on a newer kernel than 3.4.x, you might have to replace "mpt2sas" above by "mpt3sas".
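The three commands above could be wrapped in a small function that reads the bound driver's name from the device's `driver` symlink in sysfs before unbinding, so the same steps should work whether the device is bound to mpt2sas or mpt3sas. This is only a sketch: reset_and_rebind and the SYSFS_ROOT override (used to exercise it against a fake directory tree) are hypothetical, and on a real system it has to run as root against /sys.

```shell
#!/bin/sh
# Unbind PCI device $1 from its current driver, reset it, and bind it again.
# SYSFS_ROOT (hypothetical) defaults to /sys; set it only for testing.
reset_and_rebind() {
    bdf=$1
    sysfs=${SYSFS_ROOT:-/sys}
    dev="$sysfs/bus/pci/devices/$bdf"

    # Read the driver name while it is still bound; unbind removes the link.
    if [ ! -L "$dev/driver" ]; then
        echo "no driver bound to $bdf" >&2
        return 1
    fi
    drv=$(basename "$(readlink "$dev/driver")")
    drv_dir="$sysfs/bus/pci/drivers/$drv"

    echo "$bdf" > "$drv_dir/unbind" || return 1

    rc=0
    # The reset file exists only if the kernel found a usable reset method.
    if ! echo 1 > "$dev/reset"; then
        echo "reset of $bdf failed" >&2
        rc=1
    fi

    # Rebind even if the reset failed, so the device is not left unbound.
    echo "$bdf" > "$drv_dir/bind" || return 1
    echo "rebound $bdf to $drv"
    return $rc
}
```

On the affected machine this would be run as `reset_and_rebind 0000:01:00.0`.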
Created attachment 242991
test patch #1
This is the patch Yinghai proposed (I edited it slightly and rebased it to v4.9-rc1). This is basically what you (Paul) tested at , so I expect it to work. If it does, I'd like to get it merged and resolve this bug.
I have a machine here still running this LSI card. It is on Ubuntu kernel 3.13.0-100, with no pci=realloc=off entry on the boot command line and LSI firmware P19, and it works correctly. I've archived my emails about this, but I remember the problem was limited to a single earlier firmware version; I just don't remember which one.
On 10/27/2016 04:21 PM, email@example.com wrote:
> --- Comment #7 from Bjorn Helgaas <firstname.lastname@example.org> ---
>  http://lkml.kernel.org/r/54D7AE67.email@example.com
If I understand correctly, the problem occurs with LSI FW 18, but not with P19.
The workaround proposed in comment #7 would only be needed for FW 18. It hasn't been tested and it's a little messy, so I'm not going to apply it. If upgrading to LSI FW P19 is sufficient, we may not need the workaround at all.
Please reopen if you disagree.