Bug 216795 - PCI resource allocation mismatch with BIOS
Summary: PCI resource allocation mismatch with BIOS
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: Intel Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
Reported: 2022-12-09 11:03 UTC by Mika Westerberg
Modified: 2023-02-07 06:18 UTC

Kernel Version: v6.1-rc8
Regression: No


Attachments
Dmesg from the system (126.28 KB, text/plain)
2022-12-09 11:03 UTC, Mika Westerberg
lspci -vv output before remove/rescan (196.84 KB, text/plain)
2022-12-09 11:03 UTC, Mika Westerberg
/proc/iomem before remove/rescan (4.60 KB, text/plain)
2022-12-09 11:04 UTC, Mika Westerberg
lspci -vv output after remove/rescan (196.63 KB, text/plain)
2022-12-09 11:04 UTC, Mika Westerberg
/proc/iomem after remove/rescan (4.21 KB, text/plain)
2022-12-09 11:05 UTC, Mika Westerberg
Hack patch to mimic BIOS resource allocation (3.88 KB, patch)
2023-02-07 06:18 UTC, Mika Westerberg

Description Mika Westerberg 2022-12-09 11:03:06 UTC
Created attachment 303384 [details]
Dmesg from the system

The device in question is a GPU with an integrated PCIe switch connected
to a root port of a system:

0000:50:02.0 Root Port
  0000:51:00.0 Switch Upstream Port
    0000:52:01.0 Switch Downstream Port
      0000:53:00.0 GPU Endpoint

The GPU has SR-IOV capability and the BIOS allocates resources for its
VF BARs (see the attached dumps). However, if part of the topology is
removed through sysfs and then rescanned, the resource allocation fails,
which leaves the GPU without any resources assigned.

The real use case is in data centers: if the GPU hangs, it is reset
through a Secondary Bus Reset, which avoids rebooting the whole system.
The steps below are the minimal ones needed to reproduce the problem on
current Linux mainline (v6.1-rc8).

The expectation is that the rescan results in a resource allocation
similar to the one done by the BIOS. What happens instead is that Linux
seems to allocate "bigger" windows that then do not fit into the
BIOS-allocated resources above the Downstream Port.
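
To make the "does not fit" part concrete, here is a small arithmetic
sketch using only the sizes that appear in the lspci and dmesg output of
this report. The variable names and the round_up() helper are made up
for illustration. That rounding up to a 16G boundary reproduces the
Linux request is just an observation from the numbers; the report does
not confirm that this is how the kernel arrives at the size.

```python
# Sizes copied from the lspci and dmesg output in this report.
MIB = 1 << 20

# Prefetchable window the BIOS programmed into 52:01.0 (lspci.before).
bios_window = 0x205E1FFFFFFF - 0x201C00000000 + 1

# What has to live inside that window (sizes from the dmesg lines).
bar0 = 0x01000000        # BAR 0, 16M
bar2 = 0x400000000       # BAR 2, 16G
vf_bar0 = 0x1F000000     # BAR 7, VF BAR0 space for 31 VFs
vf_bar2 = 0x3E00000000   # BAR 9, VF BAR2 space for 31 VFs
needed = bar0 + bar2 + vf_bar0 + vf_bar2

# What Linux asked for on 52:01.0 BAR 15 (the two "no space" lines).
linux_request_with_add = 0x8200000000   # first attempt
linux_request = 0x4400000000            # second attempt


def round_up(value, alignment):
    """Round value up to the next multiple of alignment (hypothetical helper)."""
    return (value + alignment - 1) // alignment * alignment


print(f"BIOS window        : {bios_window:#x} ({bios_window // MIB}M)")
print(f"Sum of the BARs    : {needed:#x}")
print(f"Linux request      : {linux_request:#x}")
print(f"Shortfall          : {linux_request - bios_window:#x}")
print(f"Sum rounded to 16G : {round_up(needed, bar2):#x}")
```

The sum of the four BARs is exactly the size of the BIOS window, so the
BIOS layout has no slack at all, and even the smaller of the two Linux
requests is larger than the window.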

Steps
-----
1. Boot the system up
2. Take lspci and iomem dumps

# lspci -vv > lspci.before
# cp /proc/iomem iomem.before

3. Remove the Switch Downstream Port and the GPU Endpoint

# echo 1 > /sys/bus/pci/devices/0000:50:02.0/0000:51:00.0/0000:52:01.0/remove

4. Rescan from the Switch Upstream Port

# echo 1 > /sys/bus/pci/devices/0000:50:02.0/0000:51:00.0/rescan

5. Take the dumps

# lspci -vv > lspci.after
# cp /proc/iomem iomem.after
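
The steps above could also be scripted, for example like the sketch
below. The device addresses and sysfs paths are the ones from this
report; the function names and the sysfs_devices parameter are made up
for illustration (the parameter only exists so that the path can be
pointed somewhere else). It needs to run as root and assumes lspci is
installed.

```python
import subprocess
from pathlib import Path

ROOT_PORT = "0000:50:02.0"
UPSTREAM_PORT = "0000:51:00.0"
DOWNSTREAM_PORT = "0000:52:01.0"


def snapshot(tag, out_dir=Path(".")):
    """Steps 2 and 5: save lspci -vv and /proc/iomem as lspci.<tag>, iomem.<tag>."""
    with open(out_dir / f"lspci.{tag}", "w") as f:
        subprocess.run(["lspci", "-vv"], stdout=f, check=True)
    (out_dir / f"iomem.{tag}").write_text(Path("/proc/iomem").read_text())


def remove_and_rescan(sysfs_devices=Path("/sys/bus/pci/devices")):
    """Steps 3 and 4: remove the Downstream Port, rescan from the Upstream Port."""
    upstream = sysfs_devices / ROOT_PORT / UPSTREAM_PORT
    # Same effect as: echo 1 > .../0000:52:01.0/remove
    (upstream / DOWNSTREAM_PORT / "remove").write_text("1\n")
    # Same effect as: echo 1 > .../0000:51:00.0/rescan
    (upstream / "rescan").write_text("1\n")


# Usage, as root on the affected system:
#   snapshot("before")
#   remove_and_rescan()
#   snapshot("after")
```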

BIOS assigned resources (lspci.before)
--------------------------------------
52:01.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode])
        ...
        Bus: primary=52, secondary=53, subordinate=54, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: bb800000-bb9fffff [size=2M]
        Prefetchable memory behind bridge: 0000201c00000000-0000205e1fffffff [size=270848M]

53:00.0 Display controller: Intel Corporation Device 56c0 (rev 08)
        ...
        Region 0: Memory at 205e1f000000 (64-bit, prefetchable) [size=16M]
        Region 2: Memory at 201c00000000 (64-bit, prefetchable) [size=16G]
        Expansion ROM at bb800000 [disabled] [size=2M]
        ...
        Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 31, Total VFs: 31, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 56c0
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 0000205e00000000 (64-bit, prefetchable)
                Region 2: Memory at 0000202000000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0

Linux assigned resources (lspci.after)
--------------------------------------
52:01.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode])
        ...
        Bus: primary=52, secondary=53, subordinate=54, sec-latency=0
        I/O behind bridge: [disabled]
        Memory behind bridge: bb800000-bb9fffff [size=2M]
        Prefetchable memory behind bridge: [disabled]

53:00.0 Display controller: Intel Corporation Device 56c0 (rev 08)
        ...
        Region 0: Memory at <ignored> (64-bit, prefetchable)
        Region 2: Memory at <ignored> (64-bit, prefetchable)
        ...
        Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 31, Total VFs: 31, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: 56c0
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 0000205e00000000 (64-bit, prefetchable)
                Region 2: Memory at 0000202000000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0

Relevant lines in dmesg
-----------------------
[  131.882092] i915 0000:53:00.0: PME# disabled
[  131.882115] i915 0000:53:00.0: vgaarb: pci_notify
[  131.997587] pci 0000:53:00.0: vgaarb: pci_notify
[  131.997646] pcieport 0000:52:01.0: PME# disabled
[  131.997658] pcieport 0000:52:01.0: vgaarb: pci_notify
[  131.997675] pci 0000:52:01.0: vgaarb: pci_notify
[  131.997690] pci 0000:53:00.0: vgaarb: pci_notify
[  131.997788] pci 0000:53:00.0: vgaarb: pci_notify
[  131.997811] pci 0000:53:00.0: device released
[  131.997820] pci_bus 0000:53: busn_res: [bus 53-54] is released
[  131.997868] pci 0000:52:01.0: vgaarb: pci_notify
[  131.997953] pcieport 0000:51:00.0: saving config space at offset 0x0 (reading 0x4fa08086)
[  131.997960] pcieport 0000:51:00.0: saving config space at offset 0x4 (reading 0x110147)
[  131.997966] pcieport 0000:51:00.0: saving config space at offset 0x8 (reading 0x6040001)
[  131.997970] pcieport 0000:51:00.0: saving config space at offset 0xc (reading 0x10008)
[  131.997975] pcieport 0000:51:00.0: saving config space at offset 0x10 (reading 0x2000000c)
[  131.997980] pcieport 0000:51:00.0: saving config space at offset 0x14 (reading 0x205e)
[  131.997985] pcieport 0000:51:00.0: saving config space at offset 0x18 (reading 0x545251)
[  131.997989] pcieport 0000:51:00.0: saving config space at offset 0x1c (reading 0x1f1)
[  131.997993] pcieport 0000:51:00.0: saving config space at offset 0x20 (reading 0xbb90bb80)
[  131.997998] pcieport 0000:51:00.0: saving config space at offset 0x24 (reading 0x1ff10001)
[  131.998002] pcieport 0000:51:00.0: saving config space at offset 0x28 (reading 0x201c)
[  131.998007] pcieport 0000:51:00.0: saving config space at offset 0x2c (reading 0x205e)
[  131.998011] pcieport 0000:51:00.0: saving config space at offset 0x30 (reading 0x0)
[  131.998015] pcieport 0000:51:00.0: saving config space at offset 0x34 (reading 0x40)
[  131.998020] pcieport 0000:51:00.0: saving config space at offset 0x38 (reading 0x0)
[  131.998024] pcieport 0000:51:00.0: saving config space at offset 0x3c (reading 0x301ff)
[  131.998072] pcieport 0000:51:00.0: PME# enabled
[  131.998122] pci 0000:52:01.0: vgaarb: pci_notify
[  131.998140] pci 0000:52:01.0: device released
[  132.009340] pcieport 0000:50:02.0: saving config space at offset 0x0 (reading 0x347a8086)
[  132.009353] pcieport 0000:50:02.0: saving config space at offset 0x4 (reading 0x100547)
[  132.009359] pcieport 0000:50:02.0: saving config space at offset 0x8 (reading 0x6040004)
[  132.009363] pcieport 0000:50:02.0: saving config space at offset 0xc (reading 0x10000)
[  132.009368] pcieport 0000:50:02.0: saving config space at offset 0x10 (reading 0x20800004)
[  132.009372] pcieport 0000:50:02.0: saving config space at offset 0x14 (reading 0x205e)
[  132.009377] pcieport 0000:50:02.0: saving config space at offset 0x18 (reading 0x545150)
[  132.009381] pcieport 0000:50:02.0: saving config space at offset 0x1c (reading 0x200000f0)
[  132.009385] pcieport 0000:50:02.0: saving config space at offset 0x20 (reading 0xbb90bb80)
[  132.009390] pcieport 0000:50:02.0: saving config space at offset 0x24 (reading 0x20710001)
[  132.009394] pcieport 0000:50:02.0: saving config space at offset 0x28 (reading 0x201c)
[  132.009398] pcieport 0000:50:02.0: saving config space at offset 0x2c (reading 0x205e)
[  132.009402] pcieport 0000:50:02.0: saving config space at offset 0x30 (reading 0x0)
[  132.009406] pcieport 0000:50:02.0: saving config space at offset 0x34 (reading 0x40)
[  132.009411] pcieport 0000:50:02.0: saving config space at offset 0x38 (reading 0x0)
[  132.009415] pcieport 0000:50:02.0: saving config space at offset 0x3c (reading 0x201ff)
[  132.009453] pcieport 0000:50:02.0: PME# enabled
[  150.136581] pci_bus 0000:51: scanning bus
[  150.148686] pcieport 0000:50:02.0: restoring config space at offset 0x2c (was 0x205e, writing 0x205e)
[  150.148700] pcieport 0000:50:02.0: restoring config space at offset 0x28 (was 0x201c, writing 0x201c)
[  150.148708] pcieport 0000:50:02.0: restoring config space at offset 0x24 (was 0x20710001, writing 0x20710001)
[  150.148783] pcieport 0000:50:02.0: PME# disabled
[  150.160911] pcieport 0000:51:00.0: restoring config space at offset 0x2c (was 0x205e, writing 0x205e)
[  150.160925] pcieport 0000:51:00.0: restoring config space at offset 0x28 (was 0x201c, writing 0x201c)
[  150.160932] pcieport 0000:51:00.0: restoring config space at offset 0x24 (was 0x1ff10001, writing 0x1ff10001)
[  150.160967] pcieport 0000:51:00.0: PME# disabled
[  150.160976] pcieport 0000:51:00.0: scanning [bus 52-54] behind bridge, pass 0
[  150.160988] pci_bus 0000:52: scanning bus
[  150.161024] pci 0000:52:01.0: [8086:4fa4] type 01 class 0x060400
[  150.161219] pci 0000:52:01.0: PME# supported from D0 D3hot D3cold
[  150.161228] pci 0000:52:01.0: PME# disabled
[  150.161372] pci 0000:52:01.0: vgaarb: pci_notify
[  150.161466] pci 0000:52:01.0: scanning [bus 53-54] behind bridge, pass 0
[  150.161536] pci_bus 0000:53: scanning bus
[  150.161565] pci 0000:53:00.0: [8086:56c0] type 00 class 0x038000
[  150.161597] pci 0000:53:00.0: reg 0x10: [mem 0x205e1f000000-0x205e1fffffff 64bit pref]
[  150.161620] pci 0000:53:00.0: reg 0x18: [mem 0x201c00000000-0x201fffffffff 64bit pref]
[  150.161656] pci 0000:53:00.0: reg 0x30: [mem 0xffe00000-0xffffffff pref]
[  150.161707] pci 0000:53:00.0: ASPM: overriding L1 acceptable latency from 0x0 to 0x7
[  150.161787] pci 0000:53:00.0: PME# supported from D0 D3hot
[  150.161794] pci 0000:53:00.0: PME# disabled
[  150.161832] pci 0000:53:00.0: reg 0x344: [mem 0x205e00000000-0x205e00ffffff 64bit pref]
[  150.161837] pci 0000:53:00.0: VF(n) BAR0 space: [mem 0x205e00000000-0x205e1effffff 64bit pref] (contains BAR0 for 31 VFs)
[  150.161854] pci 0000:53:00.0: reg 0x34c: [mem 0x202000000000-0x2021ffffffff 64bit pref]
[  150.161858] pci 0000:53:00.0: VF(n) BAR2 space: [mem 0x202000000000-0x205dffffffff 64bit pref] (contains BAR2 for 31 VFs)
[  150.162112] pci 0000:53:00.0: vgaarb: pci_notify
[  150.162173] pci_bus 0000:53: fixups for bus
[  150.162177] pci 0000:52:01.0: PCI bridge to [bus 53-54]
[  150.162187] pci 0000:52:01.0:   bridge window [mem 0xbb800000-0xbb9fffff]
[  150.162198] pci 0000:52:01.0:   bridge window [mem 0x201c00000000-0x205e1fffffff 64bit pref]
[  150.162202] pci_bus 0000:53: bus scan returning with max=53
[  150.162210] pci 0000:52:01.0: scanning [bus 53-54] behind bridge, pass 1
[  150.162219] pci_bus 0000:52: bus scan returning with max=54
[  150.162225] pcieport 0000:51:00.0: scanning [bus 52-54] behind bridge, pass 1
[  150.162233] pci_bus 0000:51: bus scan returning with max=54
[  150.162240] pci 0000:52:01.0: bridge window [mem 0x200000000-0x45ffffffff 64bit pref] to [bus 53-54] add_size 3e00000000 add_align 200000000
[  150.162259] pci 0000:52:01.0: BAR 15: no space for [mem size 0x8200000000 64bit pref]
[  150.162265] pci 0000:52:01.0: BAR 15: failed to assign [mem size 0x8200000000 64bit pref]
[  150.162270] pci 0000:52:01.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
[  150.162278] pci 0000:52:01.0: BAR 15: no space for [mem size 0x4400000000 64bit pref]
[  150.162282] pci 0000:52:01.0: BAR 15: failed to assign [mem size 0x4400000000 64bit pref]
[  150.162286] pci 0000:52:01.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
[  150.162295] pci 0000:53:00.0: BAR 2: no space for [mem size 0x400000000 64bit pref]
[  150.162299] pci 0000:53:00.0: BAR 2: failed to assign [mem size 0x400000000 64bit pref]
[  150.162304] pci 0000:53:00.0: BAR 9: no space for [mem size 0x3e00000000 64bit pref]
[  150.162308] pci 0000:53:00.0: BAR 9: failed to assign [mem size 0x3e00000000 64bit pref]
[  150.162313] pci 0000:53:00.0: BAR 0: no space for [mem size 0x01000000 64bit pref]
[  150.162316] pci 0000:53:00.0: BAR 0: failed to assign [mem size 0x01000000 64bit pref]
[  150.162321] pci 0000:53:00.0: BAR 7: no space for [mem size 0x1f000000 64bit pref]
[  150.162325] pci 0000:53:00.0: BAR 7: failed to assign [mem size 0x1f000000 64bit pref]
[  150.162329] pci 0000:53:00.0: BAR 6: assigned [mem 0xbb800000-0xbb9fffff pref]
[  150.162336] pci 0000:53:00.0: BAR 2: no space for [mem size 0x400000000 64bit pref]
[  150.162340] pci 0000:53:00.0: BAR 2: failed to assign [mem size 0x400000000 64bit pref]
[  150.162345] pci 0000:53:00.0: BAR 0: no space for [mem size 0x01000000 64bit pref]
[  150.162348] pci 0000:53:00.0: BAR 0: failed to assign [mem size 0x01000000 64bit pref]
[  150.162352] pci 0000:53:00.0: BAR 6: assigned [mem 0xbb800000-0xbb9fffff pref]
[  150.162357] pci 0000:53:00.0: BAR 9: no space for [mem size 0x3e00000000 64bit pref]
[  150.162361] pci 0000:53:00.0: BAR 9: failed to assign [mem size 0x3e00000000 64bit pref]
[  150.162365] pci 0000:53:00.0: BAR 7: no space for [mem size 0x1f000000 64bit pref]
[  150.162369] pci 0000:53:00.0: BAR 7: failed to assign [mem size 0x1f000000 64bit pref]
[  150.162374] pci 0000:52:01.0: PCI bridge to [bus 53-54]
[  150.162382] pci 0000:52:01.0:   bridge window [mem 0xbb800000-0xbb9fffff]
[  150.162418] pcieport 0000:52:01.0: vgaarb: pci_notify
[  150.162426] pcieport 0000:52:01.0: runtime IRQ mapping not provided by arch
[  150.162545] pcieport 0000:52:01.0: saving config space at offset 0x0 (reading 0x4fa48086)
[  150.162559] pcieport 0000:52:01.0: saving config space at offset 0x4 (reading 0x100143)
[  150.162565] pcieport 0000:52:01.0: saving config space at offset 0x8 (reading 0x6040000)
[  150.162570] pcieport 0000:52:01.0: saving config space at offset 0xc (reading 0x10008)
[  150.162574] pcieport 0000:52:01.0: saving config space at offset 0x10 (reading 0x0)
[  150.162579] pcieport 0000:52:01.0: saving config space at offset 0x14 (reading 0x0)
[  150.162584] pcieport 0000:52:01.0: saving config space at offset 0x18 (reading 0x545352)
[  150.162589] pcieport 0000:52:01.0: saving config space at offset 0x1c (reading 0x200000f0)
[  150.162594] pcieport 0000:52:01.0: saving config space at offset 0x20 (reading 0xbb90bb80)
[  150.162598] pcieport 0000:52:01.0: saving config space at offset 0x24 (reading 0x1fff1)
[  150.162603] pcieport 0000:52:01.0: saving config space at offset 0x28 (reading 0x0)
[  150.162607] pcieport 0000:52:01.0: saving config space at offset 0x2c (reading 0x0)
[  150.162612] pcieport 0000:52:01.0: saving config space at offset 0x30 (reading 0x0)
[  150.162616] pcieport 0000:52:01.0: saving config space at offset 0x34 (reading 0x40)
[  150.162621] pcieport 0000:52:01.0: saving config space at offset 0x38 (reading 0x0)
[  150.162625] pcieport 0000:52:01.0: saving config space at offset 0x3c (reading 0x300ff)
[  150.162766] pcieport 0000:52:01.0: vgaarb: pci_notify
[  150.162856] i915 0000:53:00.0: vgaarb: pci_notify
[  150.162868] i915 0000:53:00.0: runtime IRQ mapping not provided by arch
[  150.163121] i915 0000:53:00.0: vgaarb: pci_notify
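
The raw config space values in the log can be decoded to cross-check the
lspci output. The sketch below follows the standard Type 1 (bridge)
header layout, where the dword at offset 0x24 holds the prefetchable
memory base in its low 16 bits and the limit in its high 16 bits, and
offsets 0x28/0x2c hold the upper 32 bits of base and limit. The function
name is made up for illustration; the register values are the ones
logged above.

```python
def decode_pref_window(reg_24, base_upper, limit_upper):
    """Decode a bridge prefetchable window from config offsets 0x24, 0x28, 0x2c.

    Returns (base, limit), or None if base > limit (window disabled).
    """
    base_lo = reg_24 & 0xFFFF
    limit_lo = (reg_24 >> 16) & 0xFFFF
    # Bits 15:4 of each half map to address bits 31:20; bits 3:0 are type flags.
    base = ((base_lo & 0xFFF0) << 16) | (base_upper << 32)
    # The low 20 bits of the limit are implicitly all ones.
    limit = ((limit_lo & 0xFFF0) << 16) | 0xFFFFF | (limit_upper << 32)
    if base > limit:
        return None
    return base, limit


# 51:00.0 as saved before the rescan: 0x24=0x1ff10001, 0x28=0x201c, 0x2c=0x205e
upstream = decode_pref_window(0x1FF10001, 0x201C, 0x205E)

# 52:01.0 as saved after the rescan: 0x24=0x1fff1, 0x28=0x0, 0x2c=0x0
downstream_after = decode_pref_window(0x1FFF1, 0x0, 0x0)

print("51:00.0:", [hex(x) for x in upstream])
print("52:01.0 after rescan:", downstream_after)
```

The Upstream Port values decode to the same range that lspci shows for
the Downstream Port before the rescan, and the Downstream Port values
after the rescan decode to base > limit, which is what lspci reports as
"[disabled]".
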
Comment 1 Mika Westerberg 2022-12-09 11:03:57 UTC
Created attachment 303385 [details]
lspci -vv output before remove/rescan
Comment 2 Mika Westerberg 2022-12-09 11:04:30 UTC
Created attachment 303386 [details]
/proc/iomem before remove/rescan
Comment 3 Mika Westerberg 2022-12-09 11:04:58 UTC
Created attachment 303387 [details]
lspci -vv output after remove/rescan
Comment 4 Mika Westerberg 2022-12-09 11:05:25 UTC
Created attachment 303388 [details]
/proc/iomem after remove/rescan
Comment 5 Mika Westerberg 2023-01-30 09:49:11 UTC
Hi, any updates on this? Any additional information we can provide? Thanks!
Comment 6 Mika Westerberg 2023-02-07 06:18:38 UTC
Created attachment 303705 [details]
Hack patch to mimic BIOS resource allocation

Attaching a hack patch that changes the Linux allocation to mimic what the BIOS does.
