Created attachment 93291 [details] lspci Distribution: Debian Sid AMD64 Hardware: Lenovo X220, i5-2520M, 8GB Hotplugging the expresscard no longer works after 3.5. Bisected to commit 71a83bd727cc31c5fe960c3758cb396267ff710e "PCI/PM: add runtime PM support to PCIe port" "Because runtime suspend is broken for some chipsets, a black list is used to disable runtime PM support for these chipsets." I'm not sure how the blacklist is supposed to look, so I reversed the test and the problem is gone. Here's a diff against 3.7.7: diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index ebb3e5f..b427aad 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -225,7 +225,7 @@ static int __devinit pcie_portdrv_probe(struct pci_dev *dev, * it by default. */ dev->d3cold_allowed = false; - if (!pci_match_id(port_runtime_pm_black_list, dev)) + if (pci_match_id(port_runtime_pm_black_list, dev)) pm_runtime_put_noidle(&dev->dev); return 0;
Created attachment 93301 [details] reverse the blacklist test
Created attachment 93401 [details] dmesg_good.txt
Created attachment 93411 [details] dmesg_fail.txt
Created attachment 94101 [details] Debug patch for acpihp hotplug
(In reply to comment #4) > Created an attachment (id=94101) [details] > Debug patch for acpihp hotplug Hi, ALLurGroceries, Can you try the above patch?
(In reply to comment #5) > (In reply to comment #4) > > Created an attachment (id=94101) [details] [details] > > Debug patch for acpihp hotplug > > Hi, ALLurGroceries, > > Can you try the above patch? Hi Huang, I tried your patch but I am not sure what it did. I don't get more debug output from acpiphp or pci_hotplug on a hotplug event: [ 70.038722] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 70.038736] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 70.038750] acpiphp_glue: acpiphp_check_bridge: 0 enabled, 0 disabled
(In reply to comment #6) > Hi Huang, > > I tried your patch but I am not sure what it did. I don't get more debug > output > from acpiphp or pci_hotplug on a hotplug event: > > [ 70.038722] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify > on > \_SB_.PCI0.EXP4 > [ 70.038736] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating > slots > under \_SB_.PCI0.EXP4 > [ 70.038750] acpiphp_glue: acpiphp_check_bridge: 0 enabled, 0 disabled I tried hotplugging again and this time it worked! I'm not sure what happened. Plug: [ 281.211439] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 281.211449] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 281.211509] pci 0000:05:00.0: [197b:2360] type 00 class 0x010601 [ 281.211578] pci 0000:05:00.0: reg 10: [io 0x0000-0x0007] [ 281.211604] pci 0000:05:00.0: reg 14: [io 0x0000-0x0003] [ 281.211630] pci 0000:05:00.0: reg 18: [io 0x0000-0x0007] [ 281.211656] pci 0000:05:00.0: reg 1c: [io 0x0000-0x0003] [ 281.211681] pci 0000:05:00.0: reg 20: [io 0x0000-0x000f] [ 281.211708] pci 0000:05:00.0: reg 24: [mem 0x00000000-0x00001fff] [ 281.211828] pci 0000:05:00.0: PME# supported from D3hot [ 281.219542] pci 0000:05:00.0: BAR 5: assigned [mem 0xf1c00000-0xf1c01fff] [ 281.219563] pci 0000:05:00.0: BAR 4: assigned [io 0x4000-0x400f] [ 281.219576] pci 0000:05:00.0: BAR 0: assigned [io 0x4010-0x4017] [ 281.219588] pci 0000:05:00.0: BAR 2: assigned [io 0x4018-0x401f] [ 281.219601] pci 0000:05:00.0: BAR 1: assigned [io 0x4020-0x4023] [ 281.219613] pci 0000:05:00.0: BAR 3: assigned [io 0x4024-0x4027] [ 281.219641] pci 0000:05:00.0: no hotplug settings from platform [ 281.219860] ahci 0000:05:00.0: enabling device (0000 -> 0003) [ 281.235240] ahci 0000:05:00.0: AHCI 0001.0000 32 slots 1 ports 3 Gbps 0x1 impl SATA mode [ 281.235260] ahci 0000:05:00.0: flags: 64bit ncq pm led clo pmp pio slum part [ 281.235924] scsi9 : ahci [ 281.236061] ata10: SATA max UDMA/133 abar m8192@0xf1c00000 port 0xf1c00100 irq 19 [ 281.236130] acpiphp_glue: acpiphp_check_bridge: 1 enabled, 0 disabled [ 281.554635] ata10: SATA link down (SStatus 0 SControl 300) Unplug: [ 358.759492] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 358.759505] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 359.262150] ata10: failed to stop engine (-5) [ 359.262602] acpiphp_glue: acpiphp_check_bridge: 0 enabled, 1 disabled
I need to see the output of lspci -vvv from the affected machine and the output of $ find /sys/devices/pci0000\:00/ -name control -print -exec cat {} \;
And the output of acpidump from it pretty please. Do subsequent attempts to hotplug/unplug work with the patch applied?
Created attachment 94141 [details] pci
Created attachment 94151 [details] acpidump
(In reply to comment #8) > I need to see the output of lspci -vvv from the affected machine and the > output > of I attached lspci -vvv in the initial report: https://bugzilla.kernel.org/attachment.cgi?id=93291 > $ find /sys/devices/pci0000\:00/ -name control -print -exec cat {} \; https://bugzilla.kernel.org/attachment.cgi?id=94141 (In reply to comment #9) > And the output of acpidump from it pretty please. https://bugzilla.kernel.org/attachment.cgi?id=94151 > Do subsequent attempts to hotplug/unplug work with the patch applied? Yes, with the one card I have on hand now: 05:00.0 SATA controller: JMicron Technology Corp. JMB360 AHCI Controller (rev 02) Should I try more cards? I have one or two more that are not USB bridged (those worked fine without any patches, if I remember correctly).
Thanks! One card should be enough. Does the first hotplug attempt always reliably fail (with the patch applied)?
(In reply to comment #13) > Does the first hotplug attempt always reliably fail (with the patch applied)? Hm, no. I tried around 20+ insert/eject cycles and had no problems. I rebooted a few times and the first hotplug didn't fail. I was ready to chalk it up to user error when just now I hotplugged again and it failed. Let me know what else I can do to attack this. Here's my dmesg output (failed hotplug at 55.822641): [ 52.024976] ata7: failed to stop engine (-5) [ 52.025393] acpiphp_glue: acpiphp_check_bridge: 0 enabled, 1 disabled [ 55.822621] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 55.822632] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 55.822641] acpiphp_glue: acpiphp_check_bridge: 0 enabled, 0 disabled [ 82.389043] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 82.389054] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 82.389063] acpiphp_glue: acpiphp_check_bridge: 0 enabled, 0 disabled [ 86.733002] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 86.733013] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 86.733073] pci 0000:05:00.0: [197b:2360] type 00 class 0x010601 [ 86.733143] pci 0000:05:00.0: reg 10: [io 0x0000-0x0007] [ 86.733169] pci 0000:05:00.0: reg 14: [io 0x0000-0x0003] [ 86.733195] pci 0000:05:00.0: reg 18: [io 0x0000-0x0007] [ 86.733221] pci 0000:05:00.0: reg 1c: [io 0x0000-0x0003] [ 86.733248] pci 0000:05:00.0: reg 20: [io 0x0000-0x000f] [ 86.733275] pci 0000:05:00.0: reg 24: [mem 0x00000000-0x00001fff] [ 86.733395] pci 0000:05:00.0: PME# supported from D3hot [ 86.741022] pci 0000:05:00.0: BAR 5: assigned [mem 0xf1c00000-0xf1c01fff] [ 86.741044] pci 0000:05:00.0: BAR 4: assigned [io 0x4000-0x400f] [ 86.741056] pci 0000:05:00.0: BAR 0: assigned [io 0x4010-0x4017] [ 86.741069] pci 0000:05:00.0: BAR 2: assigned [io 0x4018-0x401f] [ 86.741081] pci 0000:05:00.0: BAR 1: assigned [io 0x4020-0x4023] [ 86.741093] pci 0000:05:00.0: BAR 3: assigned [io 0x4024-0x4027] [ 86.741121] pci 0000:05:00.0: no hotplug settings from platform [ 86.741342] ahci 0000:05:00.0: enabling device (0000 -> 0003) [ 86.756768] ahci 0000:05:00.0: AHCI 0001.0000 32 slots 1 ports 3 Gbps 0x1 impl SATA mode [ 86.756773] ahci 0000:05:00.0: flags: 64bit ncq pm led clo pmp pio slum part [ 86.757333] scsi7 : ahci [ 86.757461] ata8: SATA max UDMA/133 abar m8192@0xf1c00000 port 0xf1c00100 irq 19 [ 86.757530] acpiphp_glue: acpiphp_check_bridge: 1 enabled, 0 disabled [ 87.079951] ata8: SATA link down (SStatus 0 SControl 300)
Please remove the debug patch, do # echo on > /sys/devices/pci0000:00/0000:00:1c.3/power/control and see if hotplug works after that.
(In reply to comment #15) > Please remove the debug patch, do > > # echo on > /sys/devices/pci0000:00/0000:00:1c.3/power/control > > and see if hotplug works after that. It works.
OK, thanks. Why exactly do you have 'auto' in that file? The kernel doesn't put it in there by default, so either you or your distro did that.
(In reply to comment #17) > Why exactly do you have 'auto' in that file? > > The kernel doesn't put it in there by default, so either you or your distro > did > that. Where would it be configured?
Somewhere in the init scripts (or systemd configuration nowadays). The distro I use doesn't do that, so I don't know really. :-) Do you use powertop? Anyway, you can make your init scripts write "on" to /sys/devices/pci0000:00/0000:00:1c.3/power/control during boot as a workaround for the time being.
(In reply to comment #18) > Where would it be configured? I found it. It's configured by laptop mode tools. I changed its config file and now it is not managing runtime PM. But power consumption is higher.
Yes, it is. So here's a tradeoff: either you can enable the laptop mode's power saving, but then you need to wake up the port manually (like in comment #15) before inserting the hot-plug device, or you can disable it, but power consumption will be higher. Can you please check if the number of ACPI interrupts (in /proc/interrupts) increases when you try to insert the removable device when there's 'auto' in the 'power/control' file of the port? Just do "cat /proc/interrupts | grep acpi" before and after inserting the card and see if the number in the second column changes.
(In reply to comment #21) > Can you please check if the number of ACPI interrupts (in /proc/interrupts) > increases when you try to insert the removable device when there's 'auto' in > the 'power/control' file of the port? > > Just do "cat /proc/interrupts | grep acpi" before and after inserting the > card > and see if the number in the second column changes. On 3.7.9, unpatched from a fresh boot: cat /sys/devices/pci0000:00/0000:00:1c.3/power/control auto Before hotplug: cat /proc/interrupts | grep acpi 9: 663 0 0 0 IO-APIC-fasteoi acpi After hotplug: cat /proc/interrupts | grep acpi 9: 712 0 0 0 IO-APIC-fasteoi acpi
Can you also look into the gpe* files in /sys/firmware/acpi/interrupts/ and see which of the numbers increase when you insert the card (in the configuration like in comment #22)?
(In reply to comment #14) > (In reply to comment #13) > > Does the first hotplug attempt always reliably fail (with the patch > applied)? > > Hm, no. I tried around 20+ insert/eject cycles and had no problems. > > I rebooted a few times and the first hotplug didn't fail. > > I was ready to chalk it up to user error when just now I hotplugged again and > it failed. Let me know what else I can do to attack this. > > Here's my dmesg output (failed hotplug at 55.822641): [snip] With the debug patch, could you find any pattern which will trigger the failing? Or just randomly? Is it possible for you to try more times? And can you provide the model number of your express card?
(In reply to comment #23) > Can you also look into the gpe* files in /sys/firmware/acpi/interrupts/ and > see > which of the numbers increase when you insert the card (in the configuration > like in comment #22)? -/sys/firmware/acpi/interrupts/gpe01: 2 enabled +/sys/firmware/acpi/interrupts/gpe01: 3 enabled -/sys/firmware/acpi/interrupts/gpe11: 777 enabled +/sys/firmware/acpi/interrupts/gpe11: 816 enabled -/sys/firmware/acpi/interrupts/gpe_all: 780 +/sys/firmware/acpi/interrupts/gpe_all: 820 (In reply to comment #24) > [snip] > > With the debug patch, could you find any pattern which will trigger the > failing? Or just randomly? Is it possible for you to try more times? And > can > you provide the model number of your express card? As far as I can tell it fails at random. I cannot reliably trigger it. The expresscard I've been using is labeled "I/Ocrest e-Sata / USB2.0", the chipset is JMicron Technology Corp. JMB360
Created attachment 94341 [details] Disable runtime power management for hotplugable root port
(In reply to comment #26) > Created an attachment (id=94341) [details] > Disable runtime power management for hotplugable root port Hi, ALLurGroceries, Could you try the above patch?
(In reply to comment #27) > (In reply to comment #26) > > Created an attachment (id=94341) [details] [details] > > Disable runtime power management for hotplugable root port > > Hi, ALLurGroceries, > > Could you try the above patch? Very sorry for the long delay, I hadn't had an opportunity where I could reboot until now. Hotplugging works with your patch. [ 60.766010] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 60.766021] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 60.766082] pci 0000:05:00.0: [197b:2360] type 00 class 0x010601 [ 60.766153] pci 0000:05:00.0: reg 10: [io 0x0000-0x0007] [ 60.766179] pci 0000:05:00.0: reg 14: [io 0x0000-0x0003] [ 60.766205] pci 0000:05:00.0: reg 18: [io 0x0000-0x0007] [ 60.766231] pci 0000:05:00.0: reg 1c: [io 0x0000-0x0003] [ 60.766258] pci 0000:05:00.0: reg 20: [io 0x0000-0x000f] [ 60.766285] pci 0000:05:00.0: reg 24: [mem 0x00000000-0x00001fff] [ 60.766406] pci 0000:05:00.0: PME# supported from D3hot [ 60.773741] acpiphp_glue: bus exists... trim [ 60.773900] acpiphp_glue: acpi_bus_trim return 0 [ 60.774111] pci 0000:05:00.0: BAR 5: assigned [mem 0xf1c00000-0xf1c01fff] [ 60.774129] pci 0000:05:00.0: BAR 4: assigned [io 0x4000-0x400f] [ 60.774143] pci 0000:05:00.0: BAR 0: assigned [io 0x4010-0x4017] [ 60.774157] pci 0000:05:00.0: BAR 2: assigned [io 0x4018-0x401f] [ 60.774170] pci 0000:05:00.0: BAR 1: assigned [io 0x4020-0x4023] [ 60.774183] pci 0000:05:00.0: BAR 3: assigned [io 0x4024-0x4027] [ 60.774212] pci 0000:05:00.0: no hotplug settings from platform [ 60.774515] ahci 0000:05:00.0: enabling device (0000 -> 0003) [ 60.789727] ahci 0000:05:00.0: AHCI 0001.0000 32 slots 1 ports 3 Gbps 0x1 impl SATA mode [ 60.789746] ahci 0000:05:00.0: flags: 64bit ncq pm led clo pmp pio slum part [ 60.790629] scsi6 : ahci [ 60.790864] ata7: SATA max UDMA/133 abar m8192@0xf1c00000 port 0xf1c00100 irq 19 [ 60.790921] acpiphp_glue: acpiphp_check_bridge: 1 enabled, 0 disabled [ 61.108940] ata7: SATA link down (SStatus 0 SControl 300)
To summarize, we use acpiphp on this machine because the BIOS did not grant us permission to use PCIe native hotplug. acpiphp relies on an SCI generated by the root port (00:1c.3 in this case) when a hotplug event occurs. If runtime power management turns off the root port, the SCI doesn't happen and we don't notice the hotplug event. The comment #26 patch avoids this problem by noticing that there is a hotplug slot registered on bus 05 (the secondary bus of 00:1c.3), and keeping us from turning off 00:1c.3. acpiphp registered this slot because the ACPI EXP4 device for 00:1c.3 contains an _ADR method and a SLOT device with an _RMV method. However, I don't think this is a complete solution because acpiphp does not claim all root ports that can generate hotplug SCI events. Bug #54981 is one example. Root port 00:1c.4 on this machine is another. The BIOS clearly expects SCI interrupts from EXP5 (the 00:1c.4 root port, which leads to the SDXC/MMC Host Controller at 0d:00.0), but acpiphp doesn't claim it because it has no _RMV method, so we may still turn it off.
(In reply to comment #29) > To summarize, we use acpiphp on this machine because the BIOS did not grant > us > permission to use PCIe native hotplug. acpiphp relies on an SCI generated by > the root port (00:1c.3 in this case) when a hotplug event occurs. If runtime > power management turns off the root port, the SCI doesn't happen and we don't > notice the hotplug event. In comment #3 dmesg_fail.txt, when card is hot-added: [ 38.893118] acpiphp_glue: _handle_hotplug_event_bridge: Bus check notify on \_SB_.PCI0.EXP4 [ 38.893121] acpiphp_glue: _handle_hotplug_event_bridge: re-enumerating slots under \_SB_.PCI0.EXP4 [ 38.893125] acpiphp_glue: acpiphp_check_bridge: 0 enabled, 0 disabled It appears that the SCI is generated when hotplug event occurs even if the root port is in low power state. But because the configuration space of subordinate devices is not accessible when the root port is in low power state, the hotplugged device is not detected. > The comment #26 patch avoids this problem by noticing that there is a hotplug > slot registered on bus 05 (the secondary bus of 00:1c.3), and keeping us from > turning off 00:1c.3. acpiphp registered this slot because the ACPI EXP4 > device > for 00:1c.3 contains an _ADR method and a SLOT device with an _RMV method. > > However, I don't think this is a complete solution because acpiphp does not > claim all root ports that can generate hotplug SCI events. Bug #54981 is one > example. In #54981, if acpiphp is used, PCIe hotplug does not work regardless we put the root port into low power state or not. If pciehp is used, the pciehp will register the hotplug slots, so that we will prevent putting the root port into low power state. So I think the comment #26 patch is a practical solution. > Root port 00:1c.4 on this machine is another. The BIOS clearly > expects SCI interrupts from EXP5 (the 00:1c.4 root port, which leads to the > SDXC/MMC Host Controller at 0d:00.0), but acpiphp doesn't claim it because it > has no _RMV method, so we may still turn it off. Although I can not verify it on every machine, I think in general the SCI interrupt can be generated for the device under a root port even if we put the root port into low power state. In my x220, a Renesas PCIe express card (cold plugged) can generate SCI when new USB device is plugged even if the root port 00:1c.3 is in low power state. I can not test 00:1c.4, because it appears that runtime PM for SDXC/MMC HC at 0d:00.0 does not work for me. But I think it should work.
> https://bugzilla.kernel.org/show_bug.cgi?id=53811 > It appears that the SCI is generated when hotplug event occurs even if the > root > port is in low power state. But because the configuration space of > subordinate > devices is not accessible when the root port is in low power state, the > hotplugged device is not detected. If the problem is with config space access, not with the SCI, why can't you just put the root port back in D0 before scanning its secondary bus? That seems like a more direct fix. >> However, I don't think this is a complete solution because acpiphp does not >> claim all root ports that can generate hotplug SCI events. Bug #54981 is >> one >> example. > > In #54981, if acpiphp is used, PCIe hotplug does not work regardless we put > the > root port into low power state or not. If pciehp is used, the pciehp will > register the hotplug slots, so that we will prevent putting the root port > into > low power state. So I think the comment #26 patch is a practical solution. My point here is not about power; only that acpiphp cannot reliably identify all slots, so we can't rely on slot presence as a way to decide what to do about power. On the machine in bug #54981, we must use acpiphp because the BIOS doesn't let us use pciehp.
(In reply to comment #31) > > https://bugzilla.kernel.org/show_bug.cgi?id=53811 > > > It appears that the SCI is generated when hotplug event occurs even if the > root > > port is in low power state. But because the configuration space of > subordinate > > devices is not accessible when the root port is in low power state, the > > hotplugged device is not detected. > > If the problem is with config space access, not with the SCI, why > can't you just put the root port back in D0 before scanning its > secondary bus? That seems like a more direct fix. comment #4 patch does exactly like that. ALLurGroceries tried that patch and it works for him in comment #7. But in comment #14, ALLurGroceries reported that it does not work reliably enough. So I write comment #26 patch to fix the issue in a more conservative way, and it works reliable for ALLurGroceries. > >> However, I don't think this is a complete solution because acpiphp does > not > >> claim all root ports that can generate hotplug SCI events. Bug #54981 is > one > >> example. > > > > In #54981, if acpiphp is used, PCIe hotplug does not work regardless we put > the > > root port into low power state or not. If pciehp is used, the pciehp will > > register the hotplug slots, so that we will prevent putting the root port > into > > low power state. So I think the comment #26 patch is a practical solution. > > My point here is not about power; only that acpiphp cannot reliably > identify all slots, so we can't rely on slot presence as a way to > decide what to do about power. On the machine in bug #54981, we must > use acpiphp because the BIOS doesn't let us use pciehp. If acpiphp cannot identify the slots, acpiphp cannot work regardless whether the bridge is put into low power state. If acpiphp can identify the slots, with the help of comment #26 patch, acpiphp can work properly and the root port will not be put into low power state. So I think the point is to make acpiphp identify all slots. And comment #26 patch is a valid fix.
>> If the problem is with config space access, not with the SCI, why >> can't you just put the root port back in D0 before scanning its >> secondary bus? That seems like a more direct fix. > > comment #4 patch does exactly like that. ALLurGroceries tried that patch and > it works for him in comment #7. But in comment #14, ALLurGroceries reported > that it does not work reliably enough. So I write comment #26 patch to fix > the > issue in a more conservative way, and it works reliable for ALLurGroceries. The comment #4 patch adds "pm_runtime_get_sync(&bridge->pci_dev->dev)" before re-enumerating. If that is supposed to keep the bridge in D0 but doesn't, we need to figure out why. I don't know how PM works, but maybe Rafael will chime in. > If acpiphp cannot identify the slots, acpiphp cannot work regardless whether > the bridge is put into low power state. If acpiphp can identify the slots, > with the help of comment #26 patch, acpiphp can work properly and the root > port > will not be put into low power state. So I think the point is to make > acpiphp > identify all slots. And comment #26 patch is a valid fix. Please look at the bug #54981 DSDT and explain either (1) why we should use pciehp rather than acpiphp on that machine, or (2) how we can make acpiphp identify the ExpressCard slot. My contention is that the "slot identification" model currently used by acpiphp is fundamentally broken and that we should change acpiphp so it doesn't depend on identifying slots. I don't believe it is possible to identify all possible slots by looking at the DSDT, and therefore, I don't want to make changes like the comment #26 patch that depend on identifying slots.
On Friday, April 05, 2013 10:56:30 AM Bjorn Helgaas wrote: > >> If the problem is with config space access, not with the SCI, why > >> can't you just put the root port back in D0 before scanning its > >> secondary bus? That seems like a more direct fix. > > > > comment #4 patch does exactly like that. ALLurGroceries tried that patch > and > > it works for him in comment #7. But in comment #14, ALLurGroceries > reported > > that it does not work reliably enough. So I write comment #26 patch to fix > the > > issue in a more conservative way, and it works reliable for ALLurGroceries. > > The comment #4 patch adds "pm_runtime_get_sync(&bridge->pci_dev->dev)" > before re-enumerating. If that is supposed to keep the bridge in D0 > but doesn't, we need to figure out why. I don't know how PM works, > but maybe Rafael will chime in. Quite frankly, this looks like unreliable hardware to me in the face of comment #14. It looks like with that patch applied the problem is randomly reproducible, which suggests that we probably don't wait for a sufficiently long time during the PCIe port resume. Or there's a race I'm not seeing at the moment. So it might be a good idea to repeat the experiment with a similar patch, but with increased timeouts (and the "put" part need not be "sync").
(In reply to comment #34) > On Friday, April 05, 2013 10:56:30 AM Bjorn Helgaas wrote: > > >> If the problem is with config space access, not with the SCI, why > > >> can't you just put the root port back in D0 before scanning its > > >> secondary bus? That seems like a more direct fix. > > > > > > comment #4 patch does exactly like that. ALLurGroceries tried that patch > and > > > it works for him in comment #7. But in comment #14, ALLurGroceries > reported > > > that it does not work reliably enough. So I write comment #26 patch to > fix the > > > issue in a more conservative way, and it works reliable for > ALLurGroceries. > > > > The comment #4 patch adds "pm_runtime_get_sync(&bridge->pci_dev->dev)" > > before re-enumerating. If that is supposed to keep the bridge in D0 > > but doesn't, we need to figure out why. I don't know how PM works, > > but maybe Rafael will chime in. > > Quite frankly, this looks like unreliable hardware to me in the face of > comment #14. It looks like with that patch applied the problem is randomly > reproducible, which suggests that we probably don't wait for a sufficiently > long time during the PCIe port resume. Or there's a race I'm not seeing at > the moment. > > So it might be a good idea to repeat the experiment with a similar patch, > but with increased timeouts (and the "put" part need not be "sync"). OK. I will try this.
Created attachment 97651 [details] Debug patch 3 for acpihp hotplug Hi, ALLurGroceries, Could you try this new patch? The comment #26 patch is considered not good enough.
(In reply to comment #36) > Created an attachment (id=97651) [details] > Debug patch 3 for acpihp hotplug > > Hi, ALLurGroceries, > > Could you try this new patch? The comment #26 patch is considered not good > enough. I have tested it for a day and I haven't gotten a hotplug failure yet. However, I am getting random hard lockups with 3.9-rc6. I haven't tested rc6 without your patch, so I will do that now and report back.
(In reply to comment #37) > I am getting random hard lockups with 3.9-rc6. I haven't tested rc6 > without your patch, so I will do that now and report back. I am still getting lockups even without the patch. But hotplugging works just fine without the patch applied on 3.9-rc6.
Created attachment 98541 [details] Debug patch 4 for acpihp hotplug
(In reply to comment #38) > (In reply to comment #37) > > I am getting random hard lockups with 3.9-rc6. I haven't tested rc6 > > without your patch, so I will do that now and report back. > > I am still getting lockups even without the patch. But hotplugging works just > fine without the patch applied on 3.9-rc6. Thanks a lot! Oh, I checked 3.9-rc6, and found it fixed the issue in another way, disable runtime PM for PCIe port totally. But I still hope I can fix the bug without disabling runtime PM for PCIe port totally. So some people can benefit from the power saving from runtime PM support of PCIe port. Finally, I got a eSATA expresscard as follow: 05:00.0 SATA controller: JMicron Technology Corp. JMB362 SATA Controller (rev 10) The label on card is: "Orico PNE2-EC54 EXPRESS TO eSATA 2 PORT" It appears the brand is different from yours, but the chipset is similar. I tested it on my x220 with a) 3.9-rc5 b) 3.9-rc5 + comment #4 patch c) 3.9-rc5 + comment #39 patch a) has same bug as you. But b) and c) works good for me for tens hotadd/hotremove. So is it possible for you to test b) another time and c)?
(In reply to comment #40) > a) 3.9-rc5 > b) 3.9-rc5 + comment #4 patch > c) 3.9-rc5 + comment #39 patch > > a) has same bug as you. But b) and c) works good for me for tens > hotadd/hotremove. > > So is it possible for you to test b) another time and c)? Sure, I will try rc5 with comment #39 patch. I get this error compiling rc5 with comment #4 patch (it says it cleanly applies with fuzz): CC [M] drivers/pci/hotplug/acpiphp_glue.o drivers/pci/hotplug/acpiphp_glue.c: In function ‘_handle_hotplug_event_func’: drivers/pci/hotplug/acpiphp_glue.c:1242:23: error: ‘bridge’ undeclared (first use in this function) drivers/pci/hotplug/acpiphp_glue.c:1242:23: note: each undeclared identifier is reported only once for each function it appears in make[4]: *** [drivers/pci/hotplug/acpiphp_glue.o] Error 1
Created attachment 98681 [details] Debug patch 5 for acpiphp hotplug
(In reply to comment #41) > (In reply to comment #40) > > a) 3.9-rc5 > > b) 3.9-rc5 + comment #4 patch > > c) 3.9-rc5 + comment #39 patch > > > > a) has same bug as you. But b) and c) works good for me for tens > > hotadd/hotremove. > > > > So is it possible for you to test b) another time and c)? > > Sure, I will try rc5 with comment #39 patch. I get this error compiling rc5 > with comment #4 patch (it says it cleanly applies with fuzz): > CC [M] drivers/pci/hotplug/acpiphp_glue.o > drivers/pci/hotplug/acpiphp_glue.c: In function ‘_handle_hotplug_event_func’: > drivers/pci/hotplug/acpiphp_glue.c:1242:23: error: ‘bridge’ undeclared (first > use in this function) > drivers/pci/hotplug/acpiphp_glue.c:1242:23: note: each undeclared identifier > is > reported only once for each function it appears in > make[4]: *** [drivers/pci/hotplug/acpiphp_glue.o] Error 1 Sorry, I have updated comment #4 patch with comment #42 patch, which can be applied to 3.9-rc5 cleanly. Could you use comment #42 patch in test b) above?
(In reply to comment #43) > Sorry, I have updated comment #4 patch with comment #42 patch, which can be > applied to 3.9-rc5 cleanly. Could you use comment #42 patch in test b) > above? Both of the patches you asked me to test work fine on rc5. I tested 50 consecutive insert/ejects each in addition to normal use. I hope my expresscard spring isn't getting tired... :P
(In reply to comment #44) > (In reply to comment #43) > > Sorry, I have updated comment #4 patch with comment #42 patch, which can be > > applied to 3.9-rc5 cleanly. Could you use comment #42 patch in test b) > above? > > Both of the patches you asked me to test work fine on rc5. I tested 50 > consecutive insert/ejects each in addition to normal use. I hope my > expresscard > spring isn't getting tired... :P Thanks a lot!
Hi, ALLurGroceries, as you have tested, kernel v3.9-rc6 has fixed this bug in another way, that should be: commit de7d5f7 "PCI/PM: Disable runtime PM of PCIe ports". The commit has been merged by kernel v3.9. So, I close the bug report, please reopen if not fixed.