Bug 193951 - PCIe hotplug power control via sysfs broken
Summary: PCIe hotplug power control via sysfs broken
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-04 07:25 UTC by Lukas Wunner
Modified: 2017-02-05 06:28 UTC
CC List: 2 users

See Also:
Kernel Version: 4.10
Tree: Mainline
Regression: Yes


Attachments
Yinghai Lu's report as attachment (w/o line wrapping) (4.58 KB, text/plain)
2017-02-04 07:31 UTC, Lukas Wunner
Fix reported to be working by Yinghai Lu (3.48 KB, patch)
2017-02-05 06:23 UTC, Lukas Wunner
Problem case #2: Skylake machine (v4.10 log) (2.78 KB, text/plain)
2017-02-05 06:27 UTC, Lukas Wunner
Problem case #2: Skylake machine (v4.10 log with 68db9bc reverted) (4.51 KB, text/plain)
2017-02-05 06:28 UTC, Lukas Wunner

Description Lukas Wunner 2017-02-04 07:25:29 UTC
Opening this bugzilla entry as requested by Bjorn Helgaas.

Yinghai Lu reports:

4.9 is working:

sca05-0a81e0db:~ # uname -a
Linux sca05-0a81e0db 4.9.0-yh #28 SMP Thu Feb 2 18:19:00 PST 2017 x86_64 x86_64 x86_64 GNU/Linux

sca05-0a81e0db:~ # echo 0 > /sys/bus/pci/slots/8/power
[  130.641527] mlx4_core 0000:65:00.0: PME# disabled
[  132.114003] iommu: Removing device 0000:65:00.0 from group 172
[  132.133504] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command 0x11f1 (issued 70480 msec ago)
[  132.216228] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down
[  132.222477] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event ignored; already powering off
sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power
[  175.771846] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up
[  175.777898] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event ignored; already powering on
[  175.956632] pci 0000:65:00.0: [15b3:1003] type 00 class 0x0c0600
[  175.963581] pci 0000:65:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit]
[  175.971312] pci 0000:65:00.0: reg 0x18: [mem 0x00000000-0x07ffffff 64bit pref]
[  175.980100] pci 0000:65:00.0: calling quirk_broken_intx_masking+0x0/0x20
[  175.987590] calling  quirk_broken_intx_masking+0x0/0x20 @ 16793 for 0000:65:00.0
[  175.995855] pci fixup quirk_broken_intx_masking+0x0/0x20 returned after 0 usecs for 0000:65:00.0
[  176.006876] pci 0000:65:00.0: reg 0x134: [mem 0x00000000-0x07ffffff 64bit pref]
[  176.015045] pci 0000:65:00.0: VF(n) BAR2 space: [mem 0x00000000-0x1ffffffff 64bit pref] (contains BAR2 for 64 VFs)
[  176.031852] iommu: Adding device 0000:65:00.0 to group 172
[  176.038263] pci 0000:65:00.0: BAR 2: assigned [mem 0x387800000000-0x387807ffffff 64bit pref]
[  176.047817] pci 0000:65:00.0: BAR 9: assigned [mem 0x387808000000-0x387a07ffffff 64bit pref]
[  176.057363] pci 0000:65:00.0: BAR 0: assigned [mem 0xc0000000-0xc00fffff 64bit]
[  176.065657] pcieport 0000:60:03.2: PCI bridge to [bus 65-67]
[  176.071983] pcieport 0000:60:03.2:   bridge window [io  0xa000-0xafff]
[  176.079277] pcieport 0000:60:03.2:   bridge window [mem 0xc0000000-0xc3ffffff]
[  176.087348] pcieport 0000:60:03.2:   bridge window [mem 0x387800000000-0x387bffffffff 64bit pref]
[  176.097267] pcieport 0000:60:03.2: Max Payload Size set to  256/ 256 (was  256), Max Read Rq  128
[  176.107253] pci 0000:65:00.0: Max Payload Size set to  256/ 256 (was  128), Max Read Rq  512
[  176.116910] mlx4_core: Initializing 0000:65:00.0
[  176.122103] mlx4_core 0000:65:00.0: enabling device (0140 -> 0142)
[  176.129142] mlx4_core 0000:65:00.0: enabling bus mastering
[  182.909586] mlx4_core 0000:65:00.0: Old device ETS support detected
[  182.916585] mlx4_core 0000:65:00.0: Consider upgrading device FW.
[  183.725530] mlx4_core 0000:65:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[  183.734471] mlx4_core 0000:65:00.0: PCIe link width is x8, device supports x8
[  184.073280] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
[  184.080870] <mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
[  184.202450] RDS/IB: mlx4_0: FMR supported and preferred

sca05-0a81e0db:~ # uname -a
Linux sca05-0a81e0db 4.10.0-rc1-yh #29 SMP Thu Feb 2 18:45:03 PST 2017 x86_64 x86_64 x86_64 GNU/Linux

sca05-0a81e0db:~ # echo 0 > /sys/bus/pci/slots/8/power
[  141.838027] mlx4_core 0000:65:00.0: PME# disabled
[  143.279434] iommu: Removing device 0000:65:00.0 from group 172
[  143.292329] pcieport 0000:60:03.2: PME# enabled
[  143.297431] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command 0x11f1 (issued 81476 msec ago)
[  143.337545] pcieport 0000:60:03.2: PME# disabled
[  143.380359] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down
[  143.386735] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event ignored; already powering off
[  143.445483] pcieport 0000:60:03.2: PME# enabled
[  143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up
[  143.999004] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event queued; currently getting powered off
[  144.025590] pcieport 0000:60:03.2: PME# disabled
[  144.133548] pcieport 0000:60:03.2: PME# enabled
[  144.333603] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled
sca05-0a81e0db:~ # [  144.357483] pcieport 0000:60:03.2: PME# disabled
[  144.465566] pcieport 0000:60:03.2: PME# enabled

sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power
[  221.041664] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled
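The reproduction steps above can be wrapped in a small helper that writes to the slot's power file and reads the state back. This is only a sketch, not part of the original report: SLOTS_DIR and SETTLE_SECS are hypothetical knobs introduced here (the first so the function can be pointed at a scratch directory, the second because the 4.10 log suggests the slot is re-enabled roughly a second after the write), and it assumes the power file reads back 1 once the slot has been re-enabled.

```shell
#!/bin/sh
# Sketch: set a hotplug slot's power state via sysfs and verify it sticks.
# SLOTS_DIR is an assumption of this sketch; it defaults to the real sysfs path.
SLOTS_DIR="${SLOTS_DIR:-/sys/bus/pci/slots}"

# set_slot_power SLOT VALUE
# Writes VALUE (0 or 1) to the slot's power file, waits SETTLE_SECS seconds
# (hypothetical knob, default 2), then prints "ok" if the file still reads
# VALUE, or a mismatch message otherwise.
set_slot_power() {
    slot="$1"
    want="$2"
    file="$SLOTS_DIR/$slot/power"
    if [ ! -w "$file" ]; then
        echo "no writable power file for slot $slot" >&2
        return 2
    fi
    echo "$want" > "$file" || return 2
    # Give pciehp time to process any queued Link Up/Down events.
    sleep "${SETTLE_SECS:-2}"
    got="$(cat "$file")"
    if [ "$got" = "$want" ]; then
        echo "ok"
    else
        echo "mismatch: wanted $want, got $got"
        return 1
    fi
}
```

On the affected machine, `set_slot_power 8 0` would be expected to report a mismatch on 4.10-rc1 and "ok" on 4.9, but that has not been verified here.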

After reverting

From 68db9bc814362e7f24371c27d12a4f34477d9356 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas@wunner.de>
Date: Fri, 28 Oct 2016 10:52:06 +0200
Subject: PCI: pciehp: Add runtime PM support for PCIe hotplug ports

the hotplug works again.
Comment 1 Lukas Wunner 2017-02-04 07:31:27 UTC
Created attachment 254021
Yinghai Lu's report as attachment (w/o line wrapping)
Comment 2 Lukas Wunner 2017-02-04 08:20:52 UTC
Yinghai Lu reports that acquiring a runtime PM reference in pciehp_enable_slot() (drivers/pci/hotplug/pciehp_ctrl.c) does not solve the issue, but notes that an extra Link Up event is signaled with commit 68db9bc81436 applied. Perhaps this is caused by PME being enabled when the port is runtime suspended to D3?
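The extra Link Up event shows up in the 4.10-rc1 log as "Link Up event queued; currently getting powered off". A captured log can be scanned for that message to tell the two cases apart. The function name below is hypothetical and the match string is taken verbatim from the log in the description; other kernel versions may word the message differently.

```shell
#!/bin/sh
# Sketch: scan a kernel log on stdin for a Link Up event that pciehp queued
# while the slot was being powered off. Prints each matching line and exits
# 0 if at least one was found, 1 otherwise.
find_link_up_during_poweroff() {
    awk '/Link Up event queued; currently getting powered off/ {
             print
             found = 1
         }
         END { exit !found }'
}
```

Typical use would be `dmesg | find_link_up_during_poweroff` right after writing 0 to the slot's power file.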
Comment 3 Lukas Wunner 2017-02-05 06:23:13 UTC
Created attachment 254171
Fix reported to be working by Yinghai Lu
Comment 4 Lukas Wunner 2017-02-05 06:27:36 UTC
Created attachment 254181
Problem case #2: Skylake machine (v4.10 log)

The issue on the first problematic machine was caused by PME being enabled on runtime suspend and a fix was found.

However, a second machine still has trouble even with the fix: it fails to train the link on runtime resume.
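To check whether runtime PM of the hotplug port is involved on a given machine, the port's runtime PM state can be inspected through the standard power/control and power/runtime_status sysfs attributes, and runtime suspend can be forbidden by writing "on" to power/control. This is a diagnostic sketch only, not something tried in this bug; the function names and the DEVICES_DIR override are hypothetical, and whether pinning the port active avoids the link training failure is untested.

```shell
#!/bin/sh
# Sketch: inspect and pin the runtime PM state of a PCI device via sysfs.
# DEVICES_DIR is an assumption of this sketch; it defaults to the real path.
DEVICES_DIR="${DEVICES_DIR:-/sys/bus/pci/devices}"

# show_port_pm BDF
# Prints the runtime PM policy ("auto" or "on") and the current status
# (for example "active" or "suspended") of the device.
show_port_pm() {
    dir="$DEVICES_DIR/$1/power"
    if [ ! -d "$dir" ]; then
        echo "no such device: $1" >&2
        return 2
    fi
    echo "control=$(cat "$dir/control") status=$(cat "$dir/runtime_status")"
}

# forbid_port_suspend BDF
# Writes "on" to power/control, which keeps the device from being
# runtime suspended until "auto" is written back.
forbid_port_suspend() {
    echo on > "$DEVICES_DIR/$1/power/control"
}
```

For the port in the description this would be `show_port_pm 0000:60:03.2`, followed by `forbid_port_suspend 0000:60:03.2` before repeating the slot power test.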
Comment 5 Lukas Wunner 2017-02-05 06:28:29 UTC
Created attachment 254191
Problem case #2: Skylake machine (v4.10 log with 68db9bc reverted)
