Bug 61891

Summary: Cannot switch off Radeon 6400M with vgaswitcheroo
Product: ACPI
Reporter: madcatx
Component: Config-Hotplug
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED CODE_FIX
Severity: normal
CC: airlied, alexdeucher, lenb, lorenzo.stanco, madcatx, matttbe, mike, q, rjw, rjw, samsagax, sanjay.ankur, thad.fisch, tiagdtd-lava, trufanovan
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.12-rc1
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
dmesg
Error on hard freeze
dmesg from 3.12 final
Acpidump for HP ProBook4730s
ACPIPHP / radeon: Avoid removing devices that are not really gone
ACPIPHP / radeon: Avoid removing devices that are not really gone, v2
Dmesg with fix v2 applied.
ACPIPHP / radeon: Avoid removing devices that are not really gone, v3
Dmesg with patch 3
Dmesg from SysRescueCD
dmesg with v3 fix applied on clean 3.13-rc5
Output of "ls -lR /sys/devices/LNXSYSTM\:00/"
Output of "ls -lR /sys/devices/LNXSYSTM\:00/"
ACPIPHP / radeon: Debug switcheroo problem
Dmesg with reserved 26
Dmesg with runpm off - so switcheroo is enabled properly
Dmesg with "Debug" patch applied
ACPIPHP / radeon: Debug (and possibly fix) switcheroo problem
Seems to work :D
Dmesg with "v4" fix applied
ACPIPHP / radeon: Fix VGA switcheroo problem related to hotplug events
Works great
7970m+3.13rc5+acpiphp+rafael's patch
Also acpidump from 7970M
Patch to fix the missed ignore_hotplug flag on some radeon pci devices
PCI / hotplug: Propagate the "ignore hotplug" setting to parent
PCI / hotplug / ACPI: Check ignore_hotplug for devices without ACPI companions
dmesg

Description madcatx 2013-09-22 14:36:33 UTC
Created attachment 109251 [details]
dmesg

I use systemd's tmpfiles to power off the Radeon DIS early during boot to save power. This works fine with kernel 3.11.1 but it breaks with 3.12-rc1. I removed the systemd rule and tried to power the card off manually (echo OFF > /sys/kernel/debug/vgaswitcheroo/switch). I got a kernel warning and vga_switcheroo died. See the attached file for full dmesg dump.
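
To confirm whether a power-off request took effect, the state can be read back from the same debugfs file. The sketch below assumes the file lists one client per line in the form `index:ID:active:power:pci-address` (for example `1:DIS: :Off:0000:01:00.0`). That layout and the helper name `dis_power_state` are assumptions for illustration and are not shown anywhere in this report.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the power field of the first "DIS" client found in `text` into `out`.
 * Assumed line layout (not taken from this report):
 *     <index>:<ID>:<active>:<power>:<pci address>
 * Returns 0 on success, -1 if no DIS line is found or `out` is too small.
 */
int dis_power_state(const char *text, char *out, size_t out_len)
{
    const char *line = text;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *id = memchr(line, ':', len);   /* colon ending <index> */

        if (id && strncmp(id + 1, "DIS:", 4) == 0) {
            const char *active = id + 5;           /* start of <active> */
            const char *pwr = memchr(active, ':', len - (size_t)(active - line));

            if (pwr) {
                pwr++;                             /* start of <power> */
                const char *end = memchr(pwr, ':', len - (size_t)(pwr - line));

                if (end) {
                    size_t n = (size_t)(end - pwr);

                    if (n >= out_len)
                        return -1;
                    memcpy(out, pwr, n);
                    out[n] = '\0';
                    return 0;
                }
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

A caller would read the contents of /sys/kernel/debug/vgaswitcheroo/switch into a buffer and pass it in. Under the assumed layout, a result of `Off` (or `DynOff` with runtime PM) would indicate that the discrete GPU is powered down.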
Comment 1 Alex Deucher 2013-09-22 14:38:41 UTC
Can you bisect?
Comment 2 Dave Airlie 2013-09-22 21:07:20 UTC
Weird. I'm guessing something in ACPI is causing a hot unplug we haven't seen before, or did we hook up the radeon release method and are just seeing this now?
Comment 3 madcatx 2013-09-22 23:44:12 UTC
Created attachment 109261 [details]
Error on hard freeze
Comment 4 madcatx 2013-09-22 23:46:54 UTC
Bisecting git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git revealed that the first bad commit is "bbd34fcdd1b201e996235731a7c98fd5197d9e51". While bisecting I was getting hard freezes from which the only escape was a hard reset; see the attached image for the error message.
Comment 5 madcatx 2013-09-26 01:09:34 UTC
Kernel 3.12-rc2 seems to fix the issue for me. (The kernel still crashes with the patches for powerxpress dynamic power switching, but that's obviously another story).
Comment 6 madcatx 2013-11-05 20:27:05 UTC
Created attachment 113521 [details]
dmesg from 3.12 final
Comment 7 madcatx 2013-11-05 20:28:10 UTC
I just updated to 3.12 final and it is happening again. The problem seemed to be fixed in 3.12-rc2 and I switched back to stable releases then. New dmesg log attached...
Comment 8 Alex Deucher 2013-11-05 21:01:03 UTC
Can you bisect?
Comment 9 Matthieu Baerts 2013-11-23 13:29:56 UTC
Hello,

As Alex said on FreeDesktop's bugzilla, it looks like this bug might be a duplicate of this one: https://bugs.freedesktop.org/show_bug.cgi?id=70687

There, we can find another call trace of this crash and a bisect. Maybe it can help :-)
Comment 10 Alex Deucher 2013-12-13 14:32:58 UTC
see also:
https://bugs.freedesktop.org/show_bug.cgi?id=71930
Comment 11 madcatx 2013-12-15 23:38:16 UTC
I tried to reset my git repo to the 3.12-rc2 tag to see if I could do another round of bisecting, but it turns out that even 3.12-rc2 was broken. I don't know how I missed that. It seems that the previously pinpointed commit is the cause. Unfortunately the commit won't revert cleanly no matter what I try...
Comment 12 Alex Deucher 2013-12-24 19:48:43 UTC
See also:
https://bugzilla.kernel.org/show_bug.cgi?id=65761
https://bugs.freedesktop.org/show_bug.cgi?id=70687

From:
https://bugs.freedesktop.org/show_bug.cgi?id=70687

"I seem to have discovered the root of the issue.

I've just built 3.13-rc5 kernel which has the dynamic powering of the discrete gpu and all hell broke loose.

I've narrowed the error down to the pci hotplug driver. My machine loads the shpchp pci hotplug driver, from what I can see in lsmod output. But the catch is that there is another pci hotplug driver, the acpi pci hotplug one, which seems to be what makes all hell break loose here. Disabling it seems to fix everything for me, at least on kernel 3.13.

# CONFIG_HOTPLUG_PCI_ACPI is not set

This kernel config option is the culprit for this, and that also can be seen from my backtrace:

[   22.731998]  [<ffffffff81343cb1>] ? acpiphp_check_bridge+0x72/0x88

So the issue behind this is that the acpi pci hotplug driver conflicts with the shpchp one that my machine uses. And since it is a builtin driver and can't be built as a module, it is always loaded. The other possibility is that this machine doesn't support acpi hotplug, but does support shpc pci hotplug. We need a kernel workaround so that acpi pci hotplug is disabled and out of the way when shpc pci hotplug is enabled."

Rafael, any ideas?
Comment 13 madcatx 2013-12-27 09:49:46 UTC
"acpiphp.disable=1" in the kernel bootline fixes the problem for me. The Radeon is reported as off in vgaswitcheroo and the laptop draws less power. The DIS even powers up and down correctly with DRI_PRIME.

Acpidump provided
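
For anyone scripting a check that the workaround is active, the running kernel's command line can be read from /proc/cmdline and searched for the parameter. The following is a minimal sketch; the helper name `cmdline_has_param` is made up for illustration, and it only checks that the text is present as a whole word, not that the kernel acted on it.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `param` appears in `cmdline` as a whole space-separated
 * word, so "acpiphp.disable=1" does not match "acpiphp.disable=10".
 */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return false;

    while ((p = strstr(p, param)) != NULL) {
        bool starts_word = (p == cmdline) || p[-1] == ' ';
        char after = p[plen];
        bool ends_word = after == '\0' || after == ' ' || after == '\n';

        if (starts_word && ends_word)
            return true;
        p++;    /* keep searching after this partial match */
    }
    return false;
}
```

Calling it with the contents of /proc/cmdline and `"acpiphp.disable=1"` tells you whether the boot line carried the option.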
Comment 14 madcatx 2013-12-27 09:50:31 UTC
Created attachment 119691 [details]
Acpidump for HP ProBook4730s
Comment 15 Rafael J. Wysocki 2013-12-27 22:59:45 UTC
Yes, this most likely is related to PCI hotplug, because ACPIPHP now handles devices it didn't try to handle before.  This means that if there are ACPI hotplug events for those devices, it will try to handle them.

What happens is probably that there is a bus check or device check causing ACPIPHP to rescan the bus and during that bus rescan it finds a device that doesn't respond (no wonder), so it decides that the device has gone and tries to remove it.

The solution might be to tell ACPIPHP somehow that the device in question didn't really go away.  Or to ignore that device entirely.

I guess we may use a flag in struct acpi_device set for the graphics adapter's ACPI companion by the radeon driver during probe.  Or something like that.
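
The failure mode and the proposed remedy described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct mock_device`, its fields, and `rescan_bus()` are hypothetical names and do not correspond to the real ACPIPHP code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device visited during a bus rescan. */
struct mock_device {
    const char *name;
    bool responds;       /* does the device still answer on the bus? */
    bool power_removed;  /* set by the GPU driver before it cuts power */
    bool removed;        /* result: has the rescan torn the device down? */
};

/*
 * Rescan the bus: a device that no longer responds is treated as gone and
 * removed, unless it is flagged as deliberately powered off.
 * Returns the number of devices removed by this pass.
 */
size_t rescan_bus(struct mock_device *devs, size_t count)
{
    size_t removed = 0;

    for (size_t i = 0; i < count; i++) {
        if (devs[i].removed || devs[i].responds)
            continue;
        if (devs[i].power_removed)
            continue;   /* silent because it is off, not because it is gone */
        devs[i].removed = true;
        removed++;
    }
    return removed;
}
```

Without the `power_removed` check, the powered-off GPU would be indistinguishable from an unplugged card, which matches the behaviour described in this comment.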
Comment 16 Rafael J. Wysocki 2013-12-27 23:07:54 UTC
madcatx@atlas.cz: Can you please check if reverting commit ab1225901da2 makes any difference for you?
Comment 17 Rafael J. Wysocki 2013-12-27 23:15:33 UTC
Can anyone please point me to the switcheroo code removing power from the radeon device?
Comment 18 Alex Deucher 2013-12-27 23:17:59 UTC
See drivers/gpu/drm/radeon/radeon_atpx_handler.c
radeon_atpx_set_discrete_state() is the specific function that calls the ACPI method to power off the dGPU.
Comment 19 Rafael J. Wysocki 2013-12-27 23:23:21 UTC
Thanks.  And where is atpx->handler set?
Comment 20 Rafael J. Wysocki 2013-12-27 23:25:07 UTC
OK, I see.  The method is called "ATPX" and I suppose it is device-specific?
Comment 21 Alex Deucher 2013-12-27 23:26:17 UTC
ATPX is the AMD specific switching interface for AMD/AMD and AMD/Intel PowerXpress laptops.
Comment 22 Alex Deucher 2013-12-27 23:27:52 UTC
Nvidia/Nvidia and Nvidia/Intel laptops use a different interface (called DSM I think).
Comment 23 madcatx 2013-12-27 23:41:50 UTC
I created a patch file for commit ab1225901da2 ("Revert ACPI hotplug...") and applied it with -R onto 3.13-rc5 source, but it didn't change anything.
Comment 24 Rafael J. Wysocki 2013-12-27 23:43:23 UTC
Created attachment 119791 [details]
ACPIPHP / radeon: Avoid removing devices that are not really gone

Maybe something like this helps (for radeon).

Totally untested, may kill your hamster pet.
Comment 25 Rafael J. Wysocki 2013-12-27 23:47:33 UTC
No, it won't help, sorry.
Comment 26 Rafael J. Wysocki 2013-12-28 00:01:52 UTC
Created attachment 119801 [details]
ACPIPHP / radeon: Avoid removing devices that are not really gone, v2

Please try this one instead (hamster pet disclaimer still applies).
Comment 27 madcatx 2013-12-28 00:45:21 UTC
I applied it onto 3.13-rc5; unfortunately this does not fix the problem (dmesg attached). KDM also failed to start with this patch applied. I didn't try to start X manually...
Comment 28 madcatx 2013-12-28 00:45:55 UTC
Created attachment 119811 [details]
Dmesg with fix v2 applied.
Comment 29 Rafael J. Wysocki 2013-12-28 01:00:26 UTC
Created attachment 119821 [details]
ACPIPHP / radeon: Avoid removing devices that are not really gone, v3

This is a slightly modified version of the patch that should give us a bit more debug information.

Please apply it instead of the previous one, retest and attach dmesg.
Comment 30 Mike Lothian 2013-12-28 01:31:16 UTC
Created attachment 119831 [details]
Dmesg with patch 3

My whole system stutters when the card is being powered up / down

This is with runpm=1
Comment 31 Mike Lothian 2013-12-28 01:36:54 UTC
Yikes, after testing this patch the laptop complains that there is no boot disk attached to the system.

I'm trying sysrescuecd now
Comment 32 Mike Lothian 2013-12-28 02:10:05 UTC
Created attachment 119841 [details]
Dmesg from SysRescueCD

I'm thinking either something has happened to the SSD or the controller

Also when X starts the screen goes blank and I don't know if the system remains responsive or not
Comment 33 Mike Lothian 2013-12-28 02:22:06 UTC
The drive shows up fine on another system, so it looks like something's happened to the controller or it's not being initialized properly. Is there anything obvious in the above dmesg that might suggest the problem?
Comment 34 Mike Lothian 2013-12-28 02:57:06 UTC
Looks like leaving the laptop unplugged with the battery out for a wee while sorted the issue (phew)

Are there any other patches that need testing?
Comment 35 madcatx 2013-12-28 10:36:14 UTC
Created attachment 119861 [details]
dmesg with v3 fix applied on clean 3.13-rc5
Comment 36 Rafael J. Wysocki 2013-12-28 12:14:34 UTC
Thanks!  Evidently, the power_removed flag is set for a wrong device (i.e. not the one the hotplug events are signaled for).

Please attach the output of "ls -lR /sys/devices/LNXSYSTM\:00/" from your system.
Comment 37 madcatx 2013-12-28 12:18:16 UTC
Created attachment 119871 [details]
Output of "ls -lR /sys/devices/LNXSYSTM\:00/"
Comment 38 Rafael J. Wysocki 2013-12-28 12:21:20 UTC
@Mike: I have no idea why the system behaves like that with the patch applied, sorry about that.

At this point I need to figure out how this whole thing is supposed to work and that doesn't appear to be straightforward.
Comment 39 Mike Lothian 2013-12-28 12:22:52 UTC
Created attachment 119881 [details]
Output of "ls -lR /sys/devices/LNXSYSTM\:00/"

In case mine is different
Comment 40 Rafael J. Wysocki 2013-12-28 12:40:00 UTC
(In reply to madcatx from comment #37)
> Created attachment 119871 [details]
> Output of "ls -lR /sys/devices/LNXSYSTM\:00/"

Thanks!

Please do:
cat /sys/devices/LNXSYSTM\:00/device\:00/PNP0A08\:00/LNXVIDEO\:01/path
cat /sys/devices/LNXSYSTM\:00/device\:00/PNP0A08\:00/device\:01/LNXVIDEO\:00/path

and post the output.
Comment 41 Rafael J. Wysocki 2013-12-28 12:45:32 UTC
(In reply to Mike Lothian from comment #39)
> Created attachment 119881 [details]
> Output of "ls -lR /sys/devices/LNXSYSTM\:00/"
> 
> In case mine is different

It is different, but the layout seems to be analogous.

In your case the files of interest are:
/sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:2f/LNXVIDEO:00/path
/sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:01/path

In both cases LNXVIDEO:00 and LNXVIDEO:01 seem to be the Radeon and the Intel graphics, respectively.

If that's the case, the attached dmesg output means that the power_removed flag in the patch is set for the Intel graphics, but the hotplug event is generated for the Radeon.  I'm not sure why at the moment.
Comment 42 Rafael J. Wysocki 2013-12-28 12:53:17 UTC
OK, so GFX0 is the Intel graphics and that's the one having the ATPX method.
Comment 43 madcatx 2013-12-28 12:55:31 UTC
"cat /sys/devices/LNXSYSTM\:00/device\:00/PNP0A08\:00/LNXVIDEO\:01/path"
\_SB_.PCI0.GFX0

"cat /sys/devices/LNXSYSTM\:00/device\:00/PNP0A08\:00/device\:01/LNXVIDEO\:00/path"
\_SB_.PCI0.PEGP.DGFX
Comment 44 Mike Lothian 2013-12-28 13:03:40 UTC
These are my two:

cat /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:2f/LNXVIDEO:00/path
\_SB_.PCI0.PEG0.PEGP

cat /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:01/path
\_SB_.PCI0.GFX0
Comment 45 Rafael J. Wysocki 2013-12-28 13:24:37 UTC
Created attachment 119891 [details]
ACPIPHP / radeon: Debug switcheroo problem

OK, thanks!  Everything is consistent at least. :-)

This patch doesn't fix anything yet, it just should set the no_hotplug flag for the radeon device (which ACPIPHP will be able to use later) during switcheroo detection.

Please apply it, boot the kernel and send dmesg.
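
One way to read comments 41 to 45 is that the ACPI node carrying the ATPX method (GFX0, the Intel graphics) and the node that receives the hotplug events (the Radeon) are different, so the flag has to go on the node that does not have ATPX. The sketch below models that selection. `struct mock_acpi_node` and `detect_switcheroo()` are hypothetical names, and the logic is an illustration rather than the contents of the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ACPI companion of a graphics device. */
struct mock_acpi_node {
    const char *path;   /* ACPI path as printed by the sysfs "path" file */
    bool has_atpx;      /* does this node provide the ATPX method? */
    bool no_hotplug;    /* hotplug events for this node are to be ignored */
};

/*
 * Find the node providing ATPX and mark every other graphics node as
 * no_hotplug. Returns the ATPX node, or NULL (leaving all flags untouched)
 * if none of the nodes has the method.
 */
struct mock_acpi_node *detect_switcheroo(struct mock_acpi_node *nodes,
                                         size_t count)
{
    struct mock_acpi_node *handler = NULL;

    for (size_t i = 0; i < count && !handler; i++)
        if (nodes[i].has_atpx)
            handler = &nodes[i];

    if (!handler)
        return NULL;

    for (size_t i = 0; i < count; i++)
        if (&nodes[i] != handler)
            nodes[i].no_hotplug = true;

    return handler;
}
```

With the two paths from comment 43, the handler would be `\_SB_.PCI0.GFX0` and the flag would land on `\_SB_.PCI0.PEGP.DGFX`.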
Comment 46 Mike Lothian 2013-12-28 13:28:29 UTC
It's not applying cleanly:

patching file drivers/gpu/drm/radeon/radeon_atpx_handler.c
patching file include/acpi/acpi_bus.h
Hunk #1 FAILED at 163.
1 out of 1 hunk FAILED -- saving rejects to file include/acpi/acpi_bus.h.rej

Does it need to be applied in conjunction with one of the other patches?
Comment 47 madcatx 2013-12-28 13:29:20 UTC
Should we apply it over the "v3" fix attempt or on a clean source?
Comment 48 Mike Lothian 2013-12-28 13:30:57 UTC
Looks like the issue is

u32 reserved:27;

In your patch it's 25 changing to 24
Comment 49 Mike Lothian 2013-12-28 13:35:26 UTC
Created attachment 119901 [details]
Dmesg with reserved 26

This is the dmesg with reserved 26 - I'm guessing the number is decremented every time you add a new option
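
The guess about the reserved count can be illustrated as follows. The sketch assumes the flags live in a single 32-bit word, so every new 1-bit flag has to be paid for by shrinking the reserved field by one. The struct names and the `existing` field are placeholders, not the real definitions from include/acpi/acpi_bus.h.

```c
/* Before the patch: 5 bits already in use plus 27 reserved bits = 32. */
struct flags_before {
    unsigned int existing:5;    /* placeholder for the flags already present */
    unsigned int reserved:27;
};

/* After adding one flag: reserved shrinks to 26 so the total stays 32. */
struct flags_after {
    unsigned int existing:5;
    unsigned int no_hotplug:1;  /* the new flag */
    unsigned int reserved:26;
};

/* Both layouts must still fit in one machine word of the same size. */
_Static_assert(sizeof(struct flags_before) == sizeof(unsigned int),
               "flags_before must occupy exactly one word");
_Static_assert(sizeof(struct flags_after) == sizeof(unsigned int),
               "flags_after must occupy exactly one word");
```

This is why a patch prepared against a tree with other flags already added (25 changing to 24) does not apply to a clean tree where the count is still 27.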
Comment 50 Rafael J. Wysocki 2013-12-28 13:39:58 UTC
(In reply to madcatx from comment #47)
> Should we apply it over the "v3" fix attempt or on a clean source?

Clean source, but I forgot I had some more patches applied on top of Linus' tree.

Mike did that right.
Comment 51 Mike Lothian 2013-12-28 13:40:36 UTC
Created attachment 119911 [details]
Dmesg with runpm off - so switcheroo is enabled properly
Comment 52 madcatx 2013-12-28 14:01:25 UTC
Created attachment 119921 [details]
Dmesg with "Debug" patch applied
Comment 53 Rafael J. Wysocki 2013-12-28 14:03:59 UTC
Created attachment 119931 [details]
ACPIPHP / radeon: Debug (and possibly fix) switcheroo problem

OK, thanks!

This patch contains the ACPIPHP part too, so hopefully it will help.

Please apply instead of the previous one (should apply cleanly on top of 3.13-rc5), retest and report back (please attach dmesg after a single attempt to switch graphics in any case).
Comment 54 Mike Lothian 2013-12-28 14:20:17 UTC
Created attachment 119941 [details]
Seems to work :D

That seems to work for me with radeon.runpm=1

So the system successfully powers up the card only when DRI_PRIME=1 is set when running an application (after xrandr --setprovideroffloadsink radeon Intel)

Thanks for this!

Do you think it'll land in 3.13?
Comment 55 madcatx 2013-12-28 14:26:56 UTC
Created attachment 119951 [details]
Dmesg with "v4" fix applied

Brilliant! This seems to work for me too. I can finally drop below 10 Watts again:) I'll do some more testing and report back if I come across anything odd. Thanks for taking care of this.
Comment 56 Rafael J. Wysocki 2013-12-28 14:59:10 UTC
Created attachment 119961 [details]
ACPIPHP / radeon: Fix VGA switcheroo problem related to hotplug events

Thanks for testing!

That was a debug-only version of the patch, though, because the no_hotplug flag also needs to be checked in trim_stale_devices().  The attached one is a candidate for the final version (in addition to extending the ACPIPHP changes, I removed the debug output from it and added a comment to radeon_atpx_detect() explaining what's going on).

Please test this one and report back.
Comment 57 Mike Lothian 2013-12-28 15:08:28 UTC
Created attachment 119971 [details]
Works great

Powering up and down automatically just fine

Feel free to add my Tested-by
Comment 58 Alex Deucher 2013-12-28 15:32:31 UTC
Looks like we need a similar patch for DSM on nvidia laptops.  See bug 64891.
Comment 59 madcatx 2013-12-28 15:59:58 UTC
Everything seems fine with the hopefully-final version. Good job!
Comment 60 Rafael J. Wysocki 2013-12-28 16:36:20 UTC
OK, I'll send the patch to mailing lists later today.  Many thanks to everyone involved!

@Alex: I'll have a look at that one too.
Comment 61 Jack 2013-12-28 17:22:11 UTC
Hey guys

I'm running into an issue with my 7970M/Intel muxless in which the discrete GPU doesn't actually power down once I've started X.

With acpiphp disabled I don't get the errors that were indicated in https://bugzilla.kernel.org/show_bug.cgi?id=65761; however, vgaswitcheroo/switch remains on DynPwr and never goes to DynOff (until I kill X, anyway)

I tried Rafael's patch in the hopes that this might resolve the issue, but it doesn't seem to have done so -- still stuck in DynPwr when X is started.

Is this patch specific to certain Radeon models, i.e., would it not work with radeonsi? My first guess was that it should work, but I don't know all that much so I figured I'd ask. :)

Thanks!
Comment 62 Jack 2013-12-28 17:25:40 UTC
Created attachment 119991 [details]
7970m+3.13rc5+acpiphp+rafael's patch

This is the dmesg output from my 7970M with 3.13rc5 + acpiphp.disable + rafael's latest patch from this bug report
Comment 63 Jack 2013-12-28 17:26:53 UTC
Also both radeon.dpm=1 and radeon.runpm=1 were set in grub
Comment 64 Jack 2013-12-28 17:31:08 UTC
Created attachment 120001 [details]
Also acpidump from 7970M
Comment 65 Rafael J. Wysocki 2013-12-28 21:00:46 UTC
(In reply to Jack from comment #61)
> Hey guys
> 
> I'm running into an issue with my 7970M/Intel muxless in which the discrete
> GPU doesn't actually power down once I've started X.

The patch in this entry only fixes the problem where ACPIPHP is involved.  Moreover, I suppose that the failing removal of radeon may also play a role here, so that bug should be addressed first.  Please continue to use bug #65761 to track the issues you have reported.
Comment 66 Rafael J. Wysocki 2013-12-28 22:01:39 UTC
Patch submitted to mailing lists: https://patchwork.kernel.org/patch/3414401/
Comment 67 Rafael J. Wysocki 2013-12-31 02:01:53 UTC
Mike, madcatx, since the problem w/ nouveau in bug #64891 is slightly different, I modified the patch slightly and the current version is at:

https://bugzilla.kernel.org/attachment.cgi?id=120381&action=diff

Can you please double check if it still fixes the problem for you?
Comment 68 Mike Lothian 2013-12-31 03:39:44 UTC
Hi Rafael, yes the patch still works - I've not done any HDMI testing at all though - would you like me to hook my laptop up to the TV? 

Lastly I think I can see a bug / warning appear whilst the system is shutting down - unfortunately due to systemd being so quick I don't actually see the error - everything is compiled in on my system if that makes a difference
Comment 69 madcatx 2013-12-31 09:49:31 UTC
Everything is looking fine with 3.13-rc6 and the latest fix.

@Mike: Perhaps journald will have the problem logged?
Comment 70 Rafael J. Wysocki 2013-12-31 12:37:26 UTC
(In reply to Mike Lothian from comment #68)
> Hi Rafael, yes the patch still works - I've not done any HDMI testing at all
> though - would you like me to hook my laptop up to the TV? 

No, thanks, that's fine.

Thanks for testing!
Comment 71 Len Brown 2014-11-03 22:59:55 UTC
Unclear if this issue is resolved or not,
because parts of patch above were applied, then reverted:

commit f91ce35e471ae17552ce7bfe355cfd997e3ad781
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Wed Sep 10 15:30:08 2014 -0600
Follows: v3.17-rc2
Precedes: v3.17-rc6

    ACPIPHP / radeon / nouveau: Remove acpi_bus_no_hotplug()
    
    Revert parts of f244d8b623da ("ACPIPHP / radeon / nouveau: Fix VGA
    switcheroo problem related to hotplug").
    
    A previous commit 5493b31f0b55 ("PCI: Add pci_ignore_hotplug() to ignore
    hotplug events for a device") added equivalent functionality implemented in
    a different way for both acpiphp and pciehp.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Dave Airlie <airlied@redhat.com>
    Acked-by: Rajat Jain <rajatxjain@gmail.com>

Is 3.17 working?
Comment 72 Joaquín Aramendía 2014-12-01 13:11:15 UTC
Len: I had some issues starting from v3.16.4 that introduced a reimplementation of Rafael's patch.
Could you test if #86011 is affecting you and if the patch in comments fixes it?
Comment 73 Alex Deucher 2015-02-03 15:36:38 UTC
This appears to have been broken again in 3.19.  See:
https://bugs.freedesktop.org/show_bug.cgi?id=88927
Comment 74 Rafael J. Wysocki 2015-02-03 20:45:20 UTC
On Tuesday, February 03, 2015 03:36:38 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=61891
> 
> --- Comment #73 from Alex Deucher <alexdeucher@gmail.com> ---
> This appears to have been broken again in 3.19.  See:
> https://bugs.freedesktop.org/show_bug.cgi?id=88927

This appears to be a different bug, in the PCI core somewhere this time.

Please open a new entry for it.
Comment 75 madcatx 2015-02-03 21:01:36 UTC
Does this work in 3.18? I might be able to borrow an Intel/AMD machine and bisect as long as there is a reasonable number of commits to go through.
Comment 76 Alex Deucher 2015-02-03 21:06:31 UTC
(In reply to Rafael J. Wysocki from comment #74)
> On Tuesday, February 03, 2015 03:36:38 PM
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=61891
> > 
> > --- Comment #73 from Alex Deucher <alexdeucher@gmail.com> ---
> > This appears to have been broken again in 3.19.  See:
> > https://bugs.freedesktop.org/show_bug.cgi?id=88927
> 
> This appears to be a different bug, in the PCI core somewhere this time.
> 
> Please open a new entry for it.

Looks like there already is one:
https://bugzilla.kernel.org/show_bug.cgi?id=89731
Comment 77 tiagdtd-lava 2015-02-15 11:34:22 UTC
Created attachment 166951 [details]
Patch to fix the missed ignore_hotplug flag on some radeon pci devices

I still think that the bug https://bugs.freedesktop.org/show_bug.cgi?id=88927 is related to the original issue.

I'm the original reporter of the freedesktop bug and I did some more testing on my machine. As far as I can tell the acpiphp_glue.c:slot_no_hotplug function doesn't go deep enough to check the flag 'ignore_hotplug' set by the radeon driver.

Other functions go through the pci_dev->subordinate devices as well. I made a small patch to try this approach for the slot_no_hotplug function and it fixed this problem on my machine.

I'm no kernel developer and I don't have any knowledge on the pci driver system, but maybe my small patch can help to make a proper fix for this problem.
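
The approach described in this comment, looking below a device for the flag instead of only at the device itself, can be sketched with a small mock tree. The types `mock_pci_dev` and `mock_pci_bus` and the function name are hypothetical; this is an illustration of the idea and not the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_pci_bus;

/* Hypothetical stand-in for a PCI device. */
struct mock_pci_dev {
    bool ignore_hotplug;               /* set by the GPU driver */
    struct mock_pci_bus *subordinate;  /* bus behind a bridge, else NULL */
    struct mock_pci_dev *next;         /* next device on the same bus */
};

/* Hypothetical stand-in for a PCI bus: a linked list of devices. */
struct mock_pci_bus {
    struct mock_pci_dev *devices;
};

/*
 * Return true if `dev` or any device reachable through its subordinate
 * buses has ignore_hotplug set.
 */
bool dev_or_children_ignore_hotplug(const struct mock_pci_dev *dev)
{
    if (dev->ignore_hotplug)
        return true;
    if (!dev->subordinate)
        return false;

    for (const struct mock_pci_dev *child = dev->subordinate->devices;
         child; child = child->next)
        if (dev_or_children_ignore_hotplug(child))
            return true;

    return false;
}
```

In this model a bridge whose own flag is clear still counts as "ignore hotplug" when the GPU behind it has the flag set, which is the situation the comment describes.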
Comment 78 Rafael J. Wysocki 2015-03-06 22:55:46 UTC
Created attachment 169601 [details]
PCI / hotplug: Propagate the "ignore hotplug" setting to parent

Can you please check if this patch helps too?
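
The alternative named in the attachment title works in the opposite direction: the flag is pushed upward when it is set, instead of being searched for downward when an event arrives. A minimal model follows. `struct mock_pdev` and `mock_ignore_hotplug()` are hypothetical names, and marking only the immediate parent is an assumption, since the thread does not say how far the setting is propagated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device with a link to its upstream bridge. */
struct mock_pdev {
    bool ignore_hotplug;
    struct mock_pdev *parent_bridge;   /* NULL for a device on the root bus */
};

/*
 * Mark `dev` as "ignore hotplug" and propagate the setting to its
 * immediate upstream bridge (assumed scope: one level only).
 */
void mock_ignore_hotplug(struct mock_pdev *dev)
{
    dev->ignore_hotplug = true;
    if (dev->parent_bridge)
        dev->parent_bridge->ignore_hotplug = true;
}
```

With this in place, code that only inspects the bridge's own flag sees the setting without having to walk the devices behind it.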
Comment 79 Len Brown 2015-04-07 14:55:43 UTC
no response in a month.
please re-open if this is still an unresolved issue.
Comment 80 tiagdtd-lava 2015-04-12 12:18:02 UTC
(In reply to Rafael J. Wysocki from comment #78)
> Created attachment 169601 [details]
> PCI / hotplug: Propagate the "ignore hotplug" setting to parent
> 
> Can you please check if this patch helps too?

Sorry for the late answer.
Yes, this is still an issue (just tried the latest kernel version 4.0.0-rc7).

I tried your patch (attachment 169601 [details]) and it also fixes the problems on my machine.
Comment 82 Lorenzo S. 2015-05-01 21:29:59 UTC
I also tried the last patch from Rafael J. Wysocki, on kernel 3.19.0-15-generic on Xubuntu 15.04. Looks like it's working for me too, my laptop is an Acer Aspire TimelineX 4820TG with mixed integrated Intel graphics and ATI Mobility Radeon HD5470.
Comment 83 Rafael J. Wysocki 2015-05-19 22:29:27 UTC
Created attachment 177371 [details]
PCI / hotplug / ACPI: Check ignore_hotplug for devices without ACPI companions

Maybe we don't need to propagate ignore_hotplug to parents after all.

Anyone with a reproducer, can you please check if this patch helps too?
Comment 84 Lorenzo S. 2015-05-20 06:43:11 UTC
> Anyone with a reproducer, can you please check if this patch helps too?

I will ASAP. Should I try your last patch alone, or should I try it on top of 169601, which I'm currently using?
Comment 85 Rafael J. Wysocki 2015-05-20 22:07:07 UTC
Alone, please.  Applies on top of 4.1-rc4, not sure about earlier kernels.
Comment 86 Lorenzo S. 2015-05-23 00:36:46 UTC
(In reply to Rafael J. Wysocki from comment #85)
> Alone, please.  Applies on top of 4.1-rc4, not sure about earlier kernels.

Unfortunately I'm only able to test on top of the Ubuntu kernel, so I tested the patch alone on top of 3.19.0-18. It does not work; the symptoms are the same as with the unpatched kernel (originally described here: https://bugs.freedesktop.org/show_bug.cgi?id=88927).

I'm reverting to the kernel with patch 169601, which hasn't given me any problems since I first tried it at the beginning of May.

Let's wait for tiagdtd-lava's test; maybe he's able to try on top of the latest kernel rc.
Comment 87 Rafael J. Wysocki 2015-05-23 00:53:35 UTC
Thanks for the testing, this should be fine.

I believe we should just use the patch from Comment #78 then.
Comment 88 Alexander Trufanov 2015-06-11 08:05:49 UTC
Hi, 
The patch proposed at Comment #78 helped me a lot.
I don't want to update my Linux kernel until the fix is included in it.
Does anyone know if the currently distributed linux-image-3.19.0-20-generic contains this fix? If not, when is this expected to happen? If that is unknown, what should I track to notice when it does?
Thanks.
Comment 89 Alex Deucher 2015-07-02 18:00:45 UTC
Ping.  Rafael, any chance you can send the fix upstream?
Comment 90 Thaddaeus Tintenfisch 2015-07-06 11:31:44 UTC
Created attachment 182001 [details]
dmesg

I am encountering a similar acpiphp bug with my Mobility Radeon HD 4330 [RV710] after applying the patch from a report which I filed against the radeon driver:
https://bugs.freedesktop.org/show_bug.cgi?id=61529

The patch from this report (comment #78) does not resolve the problem in my case.
Comment 91 Len Brown 2015-07-21 16:03:11 UTC
the patch from comment #78 shipped in Linux 4.2-rc1:

commit 0824965140fff1bf640a987dc790d1594a8e0699
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Apr 13 16:23:36 2015 +0200

    PCI: Propagate the "ignore hotplug" setting to parent

--
re: comment #90
if I understand it, that is a similar bug, but not the same as this,
and that one is fixed by a patch in the referenced radeon bug report.
So I'm closing this bug -- please re-open if I mis-understood.
Comment 92 Len Brown 2015-07-21 17:16:15 UTC
*** Bug 67461 has been marked as a duplicate of this bug. ***