Currently, pciehp_resume() calls pciehp_enable_slot() to add the device whenever there is a device in the slot. But if the device was already present before suspend, it is not necessary to add it again. In that case the log shows spurious messages like:
    Apr 19 19:06:50 thinkpad kernel: pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
    Apr 19 19:06:51 thinkpad kernel: pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00
    Apr 19 23:18:51 thinkpad kernel: pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
    Apr 19 23:18:51 thinkpad kernel: pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00
The discussion can also be found here: http://comments.gmane.org/gmane.linux.kernel.pci/19876
I am curious whether the patch [1] can be integrated into the kernel.
[1] http://us.generation-nt.com/patch-2-2-pci-pciehp-avoid-add-device-already-exist-during-pciehp-resume-help-211812872.html
Steps to reproduce: resume from hibernation or suspend; the messages should then show up in your journalctl logs.
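The difference between the reported behaviour and the requested one can be sketched in plain user-space C. This is only a model under stated assumptions: `struct model_slot` and both function names are hypothetical stand-ins, not the kernel's `struct slot` or the actual patch, and the "devices already on the bus" count is an assumed way of expressing "the device was present before suspend".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the slot state; NOT the kernel's struct slot. */
struct model_slot {
    bool adapter_present;  /* what an adapter-status query would report */
    int  devices_on_bus;   /* devices already enumerated below the port */
};

/* Behaviour described in the report: try to add whenever a card is present. */
bool should_enable_reported(const struct model_slot *s)
{
    return s->adapter_present;
}

/* Behaviour the report asks for: skip the add if the device is still known. */
bool should_enable_requested(const struct model_slot *s)
{
    return s->adapter_present && s->devices_on_bus == 0;
}
```

For a card that stayed in the slot across suspend (`adapter_present` true, `devices_on_bus` 1), the first function asks for a hot-add and the second does not, which is the case that produces the "already exists" message.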
Was there any progress made on this? I've been seeing this quite a bit lately and am unsure how to make it stop. Granted, it's primarily an annoyance upon resume from suspend, but it bothers me to think something is not functioning properly. This is on Arch Linux.
    $ uname -a
    Linux 4.5.0-1-ARCH #1 SMP PREEMPT Tue Mar 15 09:41:03 CET 2016 x86_64 GNU/Linux
Message:
    $ dmesg |grep hot-add
    [ 43.105083] pciehp 0000:3d:03.0:pcie24: Device 0000:60:00.0 already exists at 0000:60:00, cannot hot-add
    [ 43.208376] pciehp 0000:3d:03.0:pcie24: Device 0000:60:00.0 already exists at 0000:60:00, cannot hot-add
    [ 2223.875613] pciehp 0000:3d:03.0:pcie24: Device 0000:60:00.0 already exists at 0000:60:00, cannot hot-add
    [ 2223.982274] pciehp 0000:3d:03.0:pcie24: Device 0000:60:00.0 already exists at 0000:60:00, cannot hot-add
The device:
    $ sudo lspci -v
    60:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5249 PCI Express Card Reader (rev 01)
        Subsystem: Hewlett-Packard Company Device 2253
        Physical Slot: 0-2
        Flags: bus master, fast devsel, latency 0, IRQ 10
        Memory at cc000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [148] Device Serial Number 00-00-00-01-00-4c-e0-00
        Capabilities: [158] Latency Tolerance Reporting
        Capabilities: [160] L1 PM Substates
        Kernel modules: rtsx_pci
I can open a new bug if that's recommended; obviously the kernel version is quite a bit ahead of when this was reported.
No, it's fine. Try changing the function pciehp_enable_slot() from:
    int pciehp_enable_slot(struct slot *p_slot)
    {
        u8 getstatus = 0;
        int rc;
        struct controller *ctrl = p_slot->ctrl;
        pciehp_get_adapter_status(p_slot, &getstatus);
        if (!getstatus) {
            ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot));
            return -ENODEV;
        }
        if (MRL_SENS(p_slot->ctrl)) {
            pciehp_get_latch_status(p_slot, &getstatus);
            if (getstatus) {
                ctrl_info(ctrl, "Latch open on slot(%s)\n", slot_name(p_slot));
                return -ENODEV;
            }
        }
        if (POWER_CTRL(p_slot->ctrl)) {
            pciehp_get_power_status(p_slot, &getstatus);
            if (getstatus) {
                ctrl_info(ctrl, "Already enabled on slot(%s)\n", slot_name(p_slot));
                return -EINVAL;
            }
        }
        pciehp_get_latch_status(p_slot, &getstatus);
        rc = board_added(p_slot);
        if (rc)
            pciehp_get_latch_status(p_slot, &getstatus);
        return rc;
    }
to:
    int pciehp_enable_slot(struct slot *p_slot)
    {
        u8 getstatus = 0;
        int rc;
        struct controller *ctrl = p_slot->ctrl;
        pciehp_get_adapter_status(p_slot, &getstatus);
        if (!getstatus) {
            ctrl_info(ctrl, "No adapter on slot(%s)\n", slot_name(p_slot));
            return -ENODEV;
        }
        if (MRL_SENS(p_slot->ctrl)) {
            pciehp_get_latch_status(p_slot, &getstatus);
            if (!getstatus) {
                ctrl_info(ctrl, "Latch open on slot(%s)\n", slot_name(p_slot));
                return -ENODEV;
            }
        }
        if (POWER_CTRL(p_slot->ctrl)) {
            pciehp_get_power_status(p_slot, &getstatus);
            if (!getstatus) {
                ctrl_info(ctrl, "Already enabled on slot(%s)\n", slot_name(p_slot));
                return -EINVAL;
            }
        }
        pciehp_get_latch_status(p_slot, &getstatus);
        rc = board_added(p_slot);
        if (rc)
            pciehp_get_latch_status(p_slot, &getstatus);
        return rc;
    }
I wasn't entirely sure how to do what you suggested, but here's what I inferred:
- set up a custom kernel per the Arch build system [1]
- grepped for pciehp_enable_slot to track down the right file to edit
- edited src/linux-4.5/drivers/pci/hotplug/pciehp_ctrl.c
- compile/install
- reboot into new kernel
Here's what I see in dmesg early on:
    [ 4.306599] pciehp 0000:00:1c.0:pcie04: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ LLActRep+
    [ 4.306624] pciehp 0000:00:1c.0:pcie04: service driver pciehp loaded
    [ 4.306631] pciehp 0000:3d:02.0:pcie24: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
    [ 4.306662] pciehp 0000:3d:02.0:pcie24: service driver pciehp loaded
    [ 4.306671] pciehp 0000:3d:03.0:pcie24: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl- LLActRep+
    [ 4.306701] pciehp 0000:3d:03.0:pcie24: service driver pciehp loaded
    [ 4.306705] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
After a suspend/resume, I get the same:
    [ 1099.204710] pciehp 0000:3d:03.0:pcie24: Device 0000:60:00.0 already exists at 0000:60:00, cannot hot-add
    [ 1099.308013] pciehp 0000:3d:03.0:pcie24: Device 0000:60:00.0 already exists at 0000:60:00, cannot hot-add
I also see these right after resume:
    [ 1099.102449] PM: early resume of devices complete after 0.942 msecs
    [ 1099.102557] e1000e 0000:00:19.0: System wakeup disabled by ACPI
    [ 1099.102893] pciehp 0000:3d:02.0:pcie24: Card not present on Slot(0-1)
    [ 1099.102915] pciehp 0000:3d:03.0:pcie24: slot(0-2): Link Up event
Let me know if the process looks correct, or if I can verify that the changes are correct. I happened to note that the error sections in the file you wanted me to edit don't match the dmesg output... it's "already exists at", not "already enabled on".
I grepped the Linux source for the hot-add error, and found this section in drivers/pci/hotplug/pciehp_pci.c:
    dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
    if (dev) {
        ctrl_err(ctrl, "Device %s already exists at %04x:%02x:00, cannot hot-add\n",
                 pci_name(dev), pci_domain_nr(parent), parent->number);
        pci_dev_put(dev);
        ret = -EEXIST;
        goto out;
    }
Does that interact with pciehp_ctrl, or is there a modification to that section that might solve the issue? Sorry I'm not more of a C guru to know, and thanks for the assistance!
[1] https://wiki.archlinux.org/index.php/Kernels/Arch_Build_System
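One thing that can be read off the "Slot #0" lines quoted above: every slot reports PwrCtrl- and MRL-. Assuming those +/- flags describe the same capabilities that the POWER_CTRL() and MRL_SENS() tests in pciehp_enable_slot() look at (an assumption, not something confirmed in this thread), the two inner checks that the suggested edit inverts would be skipped on this hardware, which would fit the unchanged result. A small, hypothetical helper for reading such a flag out of a log line:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the capability token `flag` is followed by '+' in a
 * pciehp "Slot #N ..." log line; false if it is followed by '-', is
 * missing, or `flag` is empty. Helper name and behaviour are illustrative.
 */
bool slot_flag_set(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    if (len == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        /* Only accept a match that starts a token and ends in +/-. */
        bool at_token_start = (p == line) || (p[-1] == ' ');

        if (at_token_start && (p[len] == '+' || p[len] == '-'))
            return p[len] == '+';
        p += len;
    }
    return false;
}
```

Called on the 0000:3d:03.0 line with "PwrCtrl" or "MRL" it returns false, and with "HotPlug" or "Surprise" it returns true.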
I tried applying a patch related to pciehp_pci.c, but that's still producing the errors. [1] It is from a long time ago and Arch doesn't ship with it, so I'm not sure whether it was ever merged, but it references this issue specifically.
[1] http://www.spinics.net/lists/linux-pci/msg37049.html
Not that I can tell. I would, however, like to see your dmesg log, as this seems fixed in the unconfigure path; it seems you're hitting it in the pciehp_configure_device path. Can you change the function pciehp_configure_device to this:
    int pciehp_configure_device(struct slot *p_slot)
    {
        struct pci_dev *dev;
        struct pci_dev *bridge = p_slot->ctrl->pcie->port;
        struct pci_bus *parent = bridge->subordinate;
        int num, ret = 0;
        struct controller *ctrl = p_slot->ctrl;
        link_active = pciehp_check_link_active(ctrl);
        pci_lock_rescan_remove();
        dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
        if (dev) {
            ctrl_err(ctrl, "Device %s already exists at %04x:%02x:00, cannot hot-add\n",
                     pci_name(dev), pci_domain_nr(parent), parent->number);
            pci_dev_put(dev);
            ret = -EEXIST;
            goto out;
        }
        num = pci_scan_slot(parent, PCI_DEVFN(0, 0));
        if (num == 0) {
            ctrl_err(ctrl, "No new device found\n");
            ret = -ENODEV;
            goto out;
        }
        list_for_each_entry(dev, &parent->devices, bus_list)
            if (pci_is_bridge(dev))
                pci_hp_add_bridge(dev);
        pci_assign_unassigned_bridge_resources(bridge);
        pcie_bus_configure_settings(parent);
        pci_bus_add_devices(parent);
    out:
        pci_unlock_rescan_remove();
        return ret;
    }
I hit send by mistake; here is the actual fix:
Not that I can tell. I would, however, like to see your dmesg log, as this seems fixed in the unconfigure path; it seems you're hitting it in the pciehp_configure_device path. Can you change the function pciehp_configure_device to this:
    int pciehp_configure_device(struct slot *p_slot)
    {
        struct pci_dev *dev;
        struct pci_dev *bridge = p_slot->ctrl->pcie->port;
        struct pci_bus *parent = bridge->subordinate;
        int num, ret = 0;
        struct controller *ctrl = p_slot->ctrl;
        link_active = pciehp_check_link_active(ctrl);
        pci_lock_rescan_remove();
        dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
        if (dev && link_active) {
            ctrl_err(ctrl, "Device %s already exists at %04x:%02x:00, cannot hot-add\n",
                     pci_name(dev), pci_domain_nr(parent), parent->number);
            pci_dev_put(dev);
            ret = -EEXIST;
            goto out;
        }
        num = pci_scan_slot(parent, PCI_DEVFN(0, 0));
        if (num == 0) {
            ctrl_err(ctrl, "No new device found\n");
            ret = -ENODEV;
            goto out;
        }
        list_for_each_entry(dev, &parent->devices, bus_list)
            if (pci_is_bridge(dev))
                pci_hp_add_bridge(dev);
        pci_assign_unassigned_bridge_resources(bridge);
        pcie_bus_configure_settings(parent);
        pci_bus_add_devices(parent);
    out:
        pci_unlock_rescan_remove();
        return ret;
    }
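The two versions of the guard differ only in one case: the device is already on the bus but the link is reported as down. A minimal model of just that condition, with hypothetical function names and plain booleans standing in for the results of pci_get_slot() and pciehp_check_link_active():

```c
#include <errno.h>
#include <stdbool.h>

/* First version of the guard: refuse whenever function 0 already exists. */
int guard_original(bool dev_exists)
{
    return dev_exists ? -EEXIST : 0;
}

/* Second version: refuse only if the device exists AND the link is active. */
int guard_with_link_check(bool dev_exists, bool link_active)
{
    return (dev_exists && link_active) ? -EEXIST : 0;
}
```

Since the resume log quoted earlier shows a "Link Up event" for slot(0-2), one would expect both guards to return -EEXIST in that situation. That is an inference from this model, not a statement about how the real driver behaves.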
I'm getting build failures. Could you specify which files to edit? I started with a fresh kernel source and changed the above in pciehp_pci.c. Then I tried having both the last change (in pciehp_pci.c) and the change two posts up from you in pciehp_ctrl.c, and am still getting a build failure for target 'drivers'. Let me know if I didn't understand the suggestion properly. Here's dmesg output from boot through one suspend/resume cycle:
- http://pastebin.com/rJZSG2xH
Thanks!
Actually, for the build error, change this line:
    link_active = pciehp_check_link_active(ctrl);
to:
    bool link_active = false;
    link_active = pciehp_check_link_active(ctrl);
That was probably your build error, as I didn't declare that variable. Then let's see if the patch just needs to be in both the config and unconfig paths.
Cool. Building. For clarity, I have a fresh src tree and have only applied the above to pciehp_pci.c, not the original change to pciehp_ctrl.c. Is that ok, or do I need both?
Tried just the latest mod as well as both together. Still getting the issue. Latest dmesg here (can't discern any difference by diff-ing them, though). - http://pastebin.com/zFPXdvd1
Created attachment 215531: Test Patch
I just uploaded a patch that may fix your issue.
Getting this on build:
    drivers/pci/hotplug/pciehp_core.c: In function ‘pciehp_resume’:
    drivers/pci/hotplug/pciehp_core.c:306:19: error: ‘pbus_’ undeclared (first use in this function)
      if (list_empty(&pbus_>devices))
                      ^
    drivers/pci/hotplug/pciehp_core.c:306:19: note: each undeclared identifier is reported only once for each function it appears in
    drivers/pci/hotplug/pciehp_core.c:306:25: error: ‘devices’ undeclared (first use in this function)
      if (list_empty(&pbus_>devices))
                            ^
    drivers/pci/hotplug/pciehp_core.c:309:23: error: ‘pbus’ undeclared (first use in this function)
      else if(!list_empty(&pbus->devices)){
                           ^
    scripts/Makefile.build:258: recipe for target 'drivers/pci/hotplug/pciehp_core.o' failed
    make[3]: *** [drivers/pci/hotplug/pciehp_core.o] Error 1
    scripts/Makefile.build:407: recipe for target 'drivers/pci/hotplug' failed
    make[2]: *** [drivers/pci/hotplug] Error 2
    scripts/Makefile.build:407: recipe for target 'drivers/pci' failed
    make[1]: *** [drivers/pci] Error 2
    make[1]: *** Waiting for unfinished jobs....
    Makefile:950: recipe for target 'drivers' failed
    make: *** [drivers] Error 2
Yikes, I forgot the line that declares that variable. Below is a patch that should fix it.
Created attachment 215551: Test Patch
Hmmm. Still getting an error:
    drivers/pci/hotplug/pciehp_core.c: In function ‘pciehp_resume’:
    drivers/pci/hotplug/pciehp_core.c:306:19: error: ‘pbus_’ undeclared (first use in this function)
      if (list_empty(&pbus_>devices))
                      ^
    drivers/pci/hotplug/pciehp_core.c:306:19: note: each undeclared identifier is reported only once for each function it appears in
    drivers/pci/hotplug/pciehp_core.c:306:25: error: ‘devices’ undeclared (first use in this function)
      if (list_empty(&pbus_>devices))
Sorry I don't know C :( I did try to look at other definitions, e.g. slot, and it features an initialization line:
    struct slot *slot = ctrl->slot;
Followed by:
    slot = ctrl->slot;
Do there need to be various initializations/re-definitions? Sorry... trying to leverage my Arduino knowledge and somehow make sense of the error. For reference, the patch appears to be applying (just ruling out obvious errors on my part):
    $ grep pbus drivers/pci/hotplug/pciehp_core.c
    struct pci_bus *pbus = dev->port->subordinate;
No, there don't, but we can clean it up afterwards; it's more important that your issue is fixed first :).
I'm stuck, I guess. It's still throwing the error about pbus not being declared despite the addition in the second patch. Scratch that: while I was typing I took one more look and caught what I think was the error. I think this:
    if (list_empty(&pbus_>devices))
should have matched this:
    else if(!list_empty(&pbus->devices))
It's building after correcting the "_" to a "-". Will post back soon on results.
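Why one stray character produced two "undeclared" errors: in C an underscore is a valid identifier character, so `&pbus_>devices` is read as `&pbus_ > devices`, a comparison between two names that do not exist, rather than as a member access through `pbus`. The sketch below shows the correct `->` form on a minimal list. The types and helper names are hypothetical, only modelled on the kernel's `struct list_head`; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Minimal circular list head, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct pci_bus; only the member used here. */
struct model_bus {
    struct list_head devices;
};

/* An empty list is a head that points at itself. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

bool list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Link `entry` in directly after `head`. */
void list_add_after(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* `pbus->devices` selects the member through the pointer; `&` takes its address. */
bool bus_has_no_devices(const struct model_bus *pbus)
{
    return list_is_empty(&pbus->devices);
}
```

The column numbers in the compiler output match this reading: column 19 is where `pbus_` starts and column 25 is where `devices` starts.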
It built! Perhaps some slight progress, as I've always gotten the hot-add errors in pairs, whereas this time I only got one (same device). Full dmesg through one suspend/resume cycle here:
- http://pastebin.com/EUr8C8RU
Just checking: are these the lines you're talking about?
    [ 31.681019] pciehp 0000:3d:03.0:pcie24: Device 0000:60:00.0 already exists at 0000:60:00, cannot hot-add
    [ 31.681020] pciehp 0000:3d:03.0:pcie24: Cannot add device at 0000:60:00
Can you use the device now? That looks to me like a warning message added by my patch. If so, then I guess this bug is closed (sorry about the second message).
Yes, those are the lines I'm referring to. From the original post, these warnings are what I thought this whole bug was about (as in, let's make them go away)? I don't understand when you say they were "added by your patch." I've had them the whole time (see my original comment 1, as well as comment 3). Now there is only one set of them instead of the typical two at a time, but it's still there and has not changed. Let me know if I'm not understanding what this bug report is about. Also, I don't want to use the device; I want to fix why Linux thinks there's some issue with it every time I resume my system. Thanks!
There isn't an issue now. What was happening is that on resume the card was being checked against fields shared between different devices in the system. This seems fixed to me, as it is now correctly complaining because the card was already set up at this line:
    [ 31.580730] pciehp 0000:3d:03.0:pcie24: slot(0-2): Link Up event
If you do want to remove the message, we can try getting around it in the driver's probe method, but otherwise it seems fine to me now.
I still don't get it, but I'm no kernel expert so I trust your word! Perhaps I misunderstood the point of the very first fix... the symptom of the issue seemed to be the hot-add lines in dmesg, which is what I experience. All three of my dmesg's posted above have the same "pciehp dev: slot(0-2): link up event" lines, so I'm left wondering if there was ever an issue? Do I even need the patch if this was correctly occurring before applying it? Sorry for being dense, just trying to understand!
Here's the issue. You basically were starting to reinsert the card again after the first link up and therefore got a pair of messages. This was due to a check on a shared piece of information when linking the card(s), therefore we need to link the card up first and if it already exists then just warn the user that this link up already occurred. Does that make sense to you?
I think so. So the bug was entirely about the messages coming in pairs, not the message itself. Is that accurate? If so, I'm good with your patch and will ignore them from here out.
Yes, that's exactly it. If the device is not working after resume, please report back. As for now, I don't have the permissions, so closing this ticket is up to you.
Whoops -- didn't realize my comment re-opened. Thanks so much for your persistence, assistance, and the patch. Closing.
Sorry, are you sure it's in my control? I don't see any option to close... I see "Status:NEW" but no apparent place to change it. I thought clicking the hyperlinked status would let me close, but I just get a cheat sheet of potential states. Wondering if this is up to the OP?
Maybe. Anyhow, don't worry about it; it seems to be fixed.
Still an issue in Linux kernel 4.6.4:
    [181422.128540] pciehp 0000:00:1c.1:pcie04: Device 0000:03:00.0 already exists at 0000:03:00, cannot hot-add
    [181422.128541] pciehp 0000:00:1c.1:pcie04: Cannot add device at 0000:03:00
Is this just a reminder, or are you still getting this error with my patch applied and the system is not working? If that is the case, please send me your dmesg and lspci output.
(In reply to bastienphilbert from comment #32)
> Is this just a reminder or are you getting this error with my patch applied
> still and the system is not working. If that is the case please sent me your
> dmesg and lspci output.
Hi. No, I haven't applied the patch yet. I misread comment #28 as saying that this was fixed in the mainline kernel. I will recompile my kernel now with the patch applied.
It doesn't compile:
      CC      drivers/pci/hotplug/pciehp_core.o
    drivers/pci/hotplug/pciehp_core.c: In function ‘pciehp_resume’:
    drivers/pci/hotplug/pciehp_core.c:306:19: error: ‘pbus_’ undeclared (first use in this function)
      if (list_empty(&pbus_>devices))
                      ^~~~~
    drivers/pci/hotplug/pciehp_core.c:306:19: note: each undeclared identifier is reported only once for each function it appears in
    drivers/pci/hotplug/pciehp_core.c:306:25: error: ‘devices’ undeclared (first use in this function)
      if (list_empty(&pbus_>devices))
                            ^~~~~~~
    make[3]: *** [scripts/Makefile.build:292: drivers/pci/hotplug/pciehp_core.o] Error 1
    make[2]: *** [scripts/Makefile.build:440: drivers/pci/hotplug] Error 2
    make[1]: *** [scripts/Makefile.build:440: drivers/pci] Error 2
    make: *** [Makefile:963: drivers] Error 2
After applying the patch, change the "_>" on those lines to "->" and it should build. Then test it and see if the issue is indeed fixed for you.
(In reply to bastienphilbert from comment #35)
> Change those lines to -> after applying the patch and it should build. Then
> test it and see if the issue is indeed fixed for you.
Thank you very much. The patch gets rid of the error and now I have a clean dmesg output again. Do you plan on merging this into the mainline kernel at some point? Thanks again!
I have a question. Sorry in advance if this is information I should already know. Is this warning purely a "cosmetic" issue, or is some functionality broken, a counter wrong, or memory affected as a result? Thank you in advance.
@Hussam: It wasn't really clear to me, either. I didn't care enough to compile custom patched kernels on Arch Linux with every new release, and it *sounded* more cosmetic so I just let it go. Guessing they'll merge this at some point. Sorry I can't help understand more, as I don't entirely understand myself!
OK, I am rebuilding the Arch Linux stock 4.8.2 kernel with this patch to check whether it still does what it is supposed to. The difficulty with non-merged patches is whether they will still apply and work in new major kernel versions.