Bug 81431
Created attachment 144791 [details]
v3.16rc7 Unsuccessful bridge window reallocation
No fix; regression affects 3.16
Created attachment 149561 [details]
v3.17rc4 Unsuccessful bridge window reallocation
Still affects v3.17rc4.
I'm unable to do bisect runs at present, but this seems to be a pretty significant regression, and PCI experts ought to be able to point a finger at suspect commits.
Created attachment 149581 [details]
diff -u of dmesg logs between 3.15.7 and 3.17rc4
Diff between 3.15.7 and 3.17rc4 with time-stamps removed using:
diff -u <(sed 's/^[\[ [:digit:]\.\]*] //' /var/log/dmesg) <(sed 's/^[\[ [:digit:]\.\]*] //' /var/log/dmesg.0)
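For anyone without bash process substitution to hand, the same comparison can be sketched in Python. This is only an illustration of the idea (strip the leading time-stamp from every line, then take a unified diff); the function names are made up for this sketch and the sample lines are log lines quoted elsewhere in this report.

```python
import difflib
import re

# Matches a leading dmesg time-stamp such as "[    0.304561] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\] ")


def strip_timestamps(lines):
    """Return the lines with any leading dmesg time-stamp removed."""
    return [TIMESTAMP.sub("", line) for line in lines]


def diff_dmesg(old_lines, new_lines, old_name="old", new_name="new"):
    """Unified diff of two dmesg logs, ignoring time-stamps."""
    return list(difflib.unified_diff(
        strip_timestamps(old_lines),
        strip_timestamps(new_lines),
        fromfile=old_name, tofile=new_name, lineterm=""))


# Sample lines (hypothetical two-line "logs") built from this report.
old = ["[    0.304564] pci 0000:00:1c.4: BAR 14: assigned [mem 0xe0000000-0xe6ffffff]"]
new = ["[    0.281237] pci 0000:00:1c.4: BAR 14: can't assign mem (size 0x9000000)"]
for line in diff_dmesg(old, new, "3.15", "3.16-rc7"):
    print(line)
```

Lines that differ only in their time-stamp produce no diff output, which is the point of stripping them first.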
Created attachment 179761 [details]
v4.1rc7 dmesg pci=realloc,use_crs
Created attachment 179771 [details]
v4.1rc7 dmesg pci=realloc,nocrs
Still affects v4.1rc7
Regression caused by:
[5b28541552ef5eeffc41d6936105f38c2508e566] PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources
git bisect log
# bad: [7171511eaec5bf23fb06078f59784a3a0626b38f] Linux 3.16-rc1
# good: [8e56aed0b0579b667489bcb1d94c223726f0eaa1] PCI: hotplug: Remove unnecessary "dev->bus" test
git bisect start '7171511' '8e56aed' '--' 'drivers/pci/'
# bad: [d785260e2f57d87de5c059de2dabc3cd31b745f0] Merge branch 'pci/host-generic' into next
git bisect bad d785260e2f57d87de5c059de2dabc3cd31b745f0
# bad: [e5558d1a516fa6924fa8d53152b665d4c26f142e] Merge branches 'dma-api', 'pci/virtualization', 'pci/msi', 'pci/misc' and 'pci/resource' into next
git bisect bad e5558d1a516fa6924fa8d53152b665d4c26f142e
# good: [518a6a34f645897ec3440e5cbcf53ced3493ee1c] Merge branches 'pci/hotplug', 'pci/msi', 'pci/virtualization' and 'pci/misc' into next
git bisect good 518a6a34f645897ec3440e5cbcf53ced3493ee1c
# bad: [5b28541552ef5eeffc41d6936105f38c2508e566] PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources
git bisect bad 5b28541552ef5eeffc41d6936105f38c2508e566
# good: [31e9dd2565a6e27a3e698d7e3adf929db8d6c767] PCI: Don't set BAR to zero if dma_addr_t is too small
git bisect good 31e9dd2565a6e27a3e698d7e3adf929db8d6c767
# good: [d739a099d0248c78d374b1b610cdb679c7bc052d] PCI: Don't add disabled subtractive decode bus resources
git bisect good d739a099d0248c78d374b1b610cdb679c7bc052d
# good: [14c8530dbc1b7cd5020c44b391e34bdb731fd098] PCI: Support BAR sizes up to 8GB
git bisect good 14c8530dbc1b7cd5020c44b391e34bdb731fd098
# first bad commit: [5b28541552ef5eeffc41d6936105f38c2508e566] PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources
(In reply to TJ from comment #5)
> Created attachment 179761 [details]
> v4.1rc7 dmesg pci=realloc,use_crs
It has 3.15-7 instead of v4.1-rc7. Please attach correct log.
Please check if git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-v4.1-rc8 works or not.
Created attachment 180241 [details]
v4.1.0-rc8 with CONFIG_PCI_DEBUG=y
I've just done another run with v4.1.0-rc8 and CONFIG_PCI_DEBUG=y which might be more useful.
@Yinghai Lu: Our comments on v4.1-rc8 crossed. To avoid confusion: the v4.1.0-rc8 dmesg log is for Linus' tree and doesn't include your branch 'for-pci-v4.1-rc8'. Your branch is building now and I'll upload a log and progress report once it has been tested.
v3.15-6
[ 0.304561] pci 0000:00:1c.4: res[14]=[mem 0x01000000-0x06ffffff] get_res_add_size add_size 1000000
[ 0.304564] pci 0000:00:1c.4: BAR 14: assigned [mem 0xe0000000-0xe6ffffff]
v3.16-rc7:
[ 0.281232] pci 0000:00:1c.4: res[14]=[mem 0x01000000-0x08ffffff] get_res_add_size add_size 1000000
[ 0.281237] pci 0000:00:1c.4: BAR 14: can't assign mem (size 0x9000000)
[ 0.281313] pci 0000:00:1c.4: BAR 14: can't assign mem (size 0x8000000)
must+optional: 0x6000000+0x1000000 changed to 0x8000000+0x1000000, so it requests 0x2000000 more for "must" after that commit.
Created attachment 180251 [details]
v4.1.0-rc8 + for-pci-v4.1-rc8 with CONFIG_PCI_DEBUG=y
That did it, thanks.
Which commit (or commits) are required to fix this? I'd like to get them cherry-picked to the Debian and Ubuntu kernels.
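For reference, the must+optional figures in the analysis above can be re-derived from the quoted res[14] ranges. The sketch below is only arithmetic on the numbers printed in those log lines; the helper name is made up, and it assumes (as the analysis does) that add_size is printed in hex.

```python
def window_size(start, end):
    """Size in bytes of an inclusive [start, end] memory range."""
    return end - start + 1


# "add_size 1000000" in both log lines, read as hex (an assumption that
# matches the 0x1000000 "optional" figure in the analysis).
ADD_SIZE = 0x1000000

# v3.15:     res[14]=[mem 0x01000000-0x06ffffff]
must_315 = window_size(0x01000000, 0x06FFFFFF)
# v3.16-rc7: res[14]=[mem 0x01000000-0x08ffffff]
must_316 = window_size(0x01000000, 0x08FFFFFF)

print(hex(must_315), hex(must_315 + ADD_SIZE))  # 0x6000000 0x7000000
print(hex(must_316), hex(must_316 + ADD_SIZE))  # 0x8000000 0x9000000
print(hex(must_316 - must_315))                 # 0x2000000

# The window actually assigned on v3.15, [mem 0xe0000000-0xe6ffffff],
# is exactly must + optional for that kernel.
assigned_315 = window_size(0xE0000000, 0xE6FFFFFF)
print(assigned_315 == must_315 + ADD_SIZE)      # True
```

The two failing requests on v3.16-rc7 (size 0x9000000, then size 0x8000000) correspond to must+optional and then must alone.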
Created attachment 180271 [details]
correct check with old size
Created attachment 180281 [details]
correct alignment
Please check those two patches one by one.
Created attachment 180321 [details]
correct alignment v2 for multiple bridges
Please use v2.
Created attachment 180331 [details]
correct alignment v3 for multiple bridges
Created attachment 180351 [details]
v4.1.0-rc8 + c15a69d PCI: get correct bridge mmio size ...
v4.1.0-rc8 + c15a69d
PCI: get correct bridge mmio size with old size checking
Didn't work
Created attachment 180361 [details]
v4.1.0-rc8 + 39c93b1 PCI: Optimize bus mem sizing ...
v4.1.0-rc8 + 39c93b1
PCI: Optimize bus mem sizing to small size
Doesn't work.
Created attachment 180371 [details]
v4.1.0-rc8 + c15a69d + 39c93b1
v4.1.0-rc8 + c15a69d + 39c93b1
PCI: get correct bridge mmio size with old size checking
PCI: Optimize bus mem sizing to small size
Doesn't work.
Just noticed your for-pci-v4.1-rc8 branch has been rebased and commit hashes have changed, so the commit hashes here don't always match yours. These are the current hashes of the applied patches:
6aebe85 PCI: get correct bridge mmio size with old size checking
98674c2 Optimize bus mem sizing to small size
Currently doing a build with 7ebfda8372fb8 added on top of the other 2 commits.
7ebfda8372fb8 PCI: don't release fixed resource for pci=realloc
Created attachment 180421 [details]
dmesg v4.1-rc8 + c15a69d + 39c93b1 + 7ebfda8
v4.1-rc8 + c15a69d + 39c93b1 + 7ebfda8
PCI: get correct bridge mmio size with old size checking
PCI: Optimize bus mem sizing to small size
PCI: don't release fixed resource for pci=realloc
Doesn't work.
Please check git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-v4.1-rc8 again. I dropped
PCI: Optimize bus mem sizing to small size v3
from the branch.
I wonder if the first and third patches in the branch would fix the problem.
You can try:
v4.1-rc8 + b39e7316 (PCI: get correct bridge mmio size with old size checking) + 725530f4 (PCI: check pref compatible bit for mem64 resource of pcie device)
or
v4.1-rc8 + b39e7316 (PCI: get correct bridge mmio size with old size checking) + a9a729bf (PCI: get new realloc size for bridge that does not have children)
They should make your system work again.
Never mind: a9a729bf (PCI: get new realloc size for bridge that does not have children) may not help.
v4.1-rc8 + b39e7316 + 725530f4 doesn't work.
I'm currently trying a build checked out at e26677c PCI: Don't shrink too much for hotplug bridge. If this works I'll reverse bisect to find the sweet spot.
I've done tests of many combinations from your branch. The first good commit is
v4.1-rc8 -> a9a729b PCI: get new realloc size for bridge that does not have children
I've not been able to identify a minimal subset of commits that fixes it so far, having tried various combinations which all fail:
v4.1-rc8 -> 3b7ccb3
v4.1-rc8 -> 5698a97
v4.1-rc8 -> 3d3184a
v4.1-rc8 + b39e7316 + a9a729bf
v4.1-rc8 + b39e7316 + b2bbf93
I reordered the patch sequence in the branch. That should make it easier for you to find the patches that solve the problem.
Created attachment 180541 [details]
dmesg v4.1-rc8 + b39e731 + 725530f + a9a729b
I've identified the minimal set of commits required:
b39e731 PCI: get correct bridge mmio size with old size checking
725530f PCI: check pref compatible bit for mem64 resource of pcie device
a9a729b PCI: get new realloc size for bridge that does not have children
Thanks very much for addressing this issue.
I don't see anything merged into mainline so far. Will these patches be making it into v4.2?
Not for v4.2. I should submit them for v4.3 after v4.2-rc1 is released, and they will be marked for stable.
BTW, please check that git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-v4.2-rc1 is still working on your setup.
Thanks
Please check git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-v4.2-rc1, as one patch is dropped:
PCI: get correct bridge mmio size with old size checking
Sorry for the delay in getting back to this issue.
Testing latest mainline HEAD + for-pci-v4.2-rc1 failed.
mainline HEAD @ 1c4c715 Merge tag 'ext4_for_linus_stable'
I was unable to grab the dmesg output because I accidentally built a defconfig which didn't include the dm_crypt/cryptsetup modules in the initrd. I'm running another build and will report back later.
Created attachment 182081 [details]
dmesg v4.2-rc1 + for-pci-v4.2-rc1
Log of failed v4.2-rc1 + for-pci-v4.2-rc1
Can you post the output from "cat /proc/iomem"?
But the allocation result is the same as that in comment 32. Both fail to assign some ROM BARs at the end.
Apologies, it is working! I was misled because the regions were reported as 'disabled' (by lspci), the nvidia driver is not buildable against v4.2 due to an EXPORT_SYMBOL_GPL issue so its tell-tale messages were not available, and the nouveau driver loaded but didn't attach to the external GPUs. It turned out I had to explicitly do "modprobe nouveau modeset=1".
If you really want to get rid of the ROM BAR assignment problem, you can boot with "pci=realloc,assign_pref_bars".
I hate to resurrect an old thread, but Yinghai - I appear to be having the same issue with PCI through a Thunderbolt connection. I have pulled down your for4.2-rc1 tree and was going to use it, but wanted to check whether it has been committed upstream yet, and/or whether there is a better, more recent revision to use before I get started?
It is not upstream yet. It could be v4.4, as I have to post it after 4.3-rc1. So please try
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-pci-v4.3
Created attachment 198991 [details]
boot dmesg 4.4.0-rc8 + for-pci-v4.5-next
This has been broken since August 2014 - can we *please* get the fixes into mainline?!
The for-pci-v4.5-next branch of patches applied to the current v4.4-rc8 master (I'm not sure what has been revised since I reported a solution on 2015-07-07) causes an even worse failure than before. Now lspci cannot report the devices due to:
0c:00.0 PCI bridge [0604]: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 [10de:05be] (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: pcieport
A hot-plug of the device reports, amongst other things:
pcieport 0000:0c:00.0: Refused to change power state, currently in D3
pcieport 0000:0d:00.0: Refused to change power state, currently in D3
pcieport 0000:0d:02.0: Refused to change power state, currently in D3
pci_bus 0000:0d: busn_res: [bus 0d] is released
I attach boot dmesg, lspci, and hotplug kern.log.
Created attachment 199001 [details]
lspci -xxvvvnnk 4.4.0-rc8 + for-pci-v4.5-next
Created attachment 199011 [details]
hotplug kern.log 4.4.0-rc8 + for-pci-v4.5-next
Created attachment 199021 [details]
working boot dmesg 4.4.0-rc8 + for-pci-v4.5-next
Ignore my last report about it being broken with v4.4 + for-pci-v4.5-next. I can't pinpoint the cause but it seems there was some kind of hardware glitch that survived multiple reboots and power downs of the external PCIe/NVS420 device.
$ uname -r
4.4.0-rc8+
$ lspci -tvvvvnn
-[0000:00]-+-00.0 Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub [8086:2a00]
           +-01.0-[01]----00.0 NVIDIA Corporation G84M [GeForce 8600M GT] [10de:0407]
           +-1a.0 Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 [8086:2834]
           +-1a.1 Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 [8086:2835]
           +-1a.7 Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 [8086:283a]
           +-1b.0 Intel Corporation 82801H (ICH8 Family) HD Audio Controller [8086:284b]
           +-1c.0-[09]----00.0 Marvell Technology Group Ltd. 88E8040 PCI-E Fast Ethernet Controller [11ab:4354]
           +-1c.1-[0b]----00.0 Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection [8086:4229]
           +-1c.4-[0c-0f]----00.0-[0d-0f]--+-00.0-[0e]----00.0 NVIDIA Corporation G98 [Quadro NVS 420] [10de:06f8]
           |                               \-02.0-[0f]----00.0 NVIDIA Corporation G98 [Quadro NVS 420] [10de:06f8]
           +-1d.0 Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 [8086:2830]
           +-1d.1 Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 [8086:2831]
           +-1d.2 Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 [8086:2832]
           +-1d.7 Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 [8086:2836]
           +-1e.0-[03]--+-09.0 Ricoh Co Ltd R5C832 IEEE 1394 Controller [1180:0832]
           |            +-09.1 Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter [1180:0822]
           |            +-09.2 Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter [1180:0592]
           |            \-09.3 Ricoh Co Ltd xD-Picture Card Controller [1180:0852]
           +-1f.0 Intel Corporation 82801HM (ICH8M) LPC Interface Controller [8086:2815]
           +-1f.1 Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) IDE Controller [8086:2850]
           +-1f.2 Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] [8086:2829]
           \-1f.3 Intel Corporation 82801H (ICH8 Family) SMBus Controller [8086:283e]
Created attachment 199041 [details]
failing boot dmesg 4.4.0-rc8 + for-pci-v4.5-next
Unfortunately I was too optimistic. It seems it will work about 1 boot in 15 or so, and requires a complete power off in order to have a chance of working after a reboot.
When it fails none of the bridge windows for the GPUs is activated, leaving just:
0c:00.0 PCI bridge [0604]: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 [10de:05be] (rev a3)
I've also tried adding "hpmemsize=600M" to try and get hotplug to work to avoid the power/reboot cycling, and also triggering a rescan of the 0c:00.0 bridge via sysfs, but those don't help either.
This is still a regression affecting 4.15-rc9. The patches by Yinghai Lu seem to have gone AWOL.
Created attachment 273847 [details]
dmesg v4.15-rc9 failing
Created attachment 273883 [details]
dmesg 4.15.0-rc9 PCI debug (boot, before device insertion)
Attaching a couple of logs gathered with PCI_DEBUG=y and loglevel debug.
First log is the system booting.
Second log is when the device is probed (ExpressCard/34 inserted)
Created attachment 273885 [details]
dmesg 4.15.0-rc9 PCI debug (device insertion)
Created attachment 144781 [details]
v3.15.7 Successful bridge window reallocation
Up until v3.15.7 booting with "pci=realloc,use_crs" successfully modifies the PCI bridge windows to allow an external ExpressCard <> ViDock4 + Nvidia Quadro NVS420 with its 2 GPUs to be configured. With v3.16rc6 and rc7 it fails to do this, leaving the NVS420 inoperable. I am attaching 2 dmesg captures. They are mainline builds packaged by Ubuntu. I've been using these builds for several years; this is the first time I've found a regression of this kind.