Bug 86121 - Regression: NVIDIA backlight not working after latest vgaarb change
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-12 14:55 UTC by Petri Hodju
Modified: 2016-10-28 20:52 UTC

See Also:
Kernel Version: 3.16.5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kernel boot logs & config (81.07 KB, application/gzip), 2014-10-12 14:55 UTC, Petri Hodju
dmesg log (good - 7babfd7f) (72.25 KB, text/plain), 2014-10-13 17:08 UTC, Bjorn Helgaas
dmesg log (bad - ce027dac5) (71.25 KB, text/plain), 2014-10-13 17:12 UTC, Bjorn Helgaas
'lspci -v' of the system (10.61 KB, text/x-log), 2014-10-15 14:20 UTC, Petri Hodju
more details on backlight control registration (86.54 KB, application/gzip), 2014-10-16 21:06 UTC, Petri Hodju
3.16.4 vs. 3.16.5 with only EFIFB and with EFIFB + SYSFB (364.70 KB, application/gzip), 2014-10-25 20:34 UTC, Petri Hodju
dmesg and lspci logs after boot and suspend/resume (102.70 KB, application/gzip), 2014-12-22 20:34 UTC, Petri Hodju
loading gmux/i915/nvidia & gmux/i915/nouveau (202.55 KB, application/gzip), 2015-01-06 13:23 UTC, Petri Hodju
Lock IO+MEM of boot_vga in apple-gmux (2.05 KB, patch), 2015-01-08 20:35 UTC, Bruno Prémont
dmesg logs with IO+MEM locked to boot GPU (167.80 KB, application/gzip), 2015-02-04 04:44 UTC, Petri Hodju
Lock IO+MEM of (first) IO decoding GPU (3.57 KB, patch), 2015-02-09 20:44 UTC, Bruno Prémont
working backlight (125.50 KB, application/gzip), 2015-02-11 21:23 UTC, Petri Hodju
loading apple_gmux multiple times (1.59 KB, application/gzip), 2015-02-17 19:57 UTC, Petri Hodju

Description Petri Hodju 2014-10-12 14:55:58 UTC
Created attachment 153311 [details]
kernel boot logs & config

I have experienced a regression in backlight control with the NVIDIA binary blob driver and kernel 3.16.5. I bisected between 3.16.4 and 3.16.5 and found that the regression appears with commit ce027dac ("vgaarb: Don't default exclusively to first video device with mem+io").

I have attached kernel boot logs with a working backlight (commit 7babfd7f) and a non-working one (commit ce027dac), plus the config used.

I'm running Ubuntu 14.10.
Comment 1 Bjorn Helgaas 2014-10-13 17:08:08 UTC
Created attachment 153451 [details]
dmesg log (good - 7babfd7f)

Extracted from attachment#153311 [details] (timestamps removed).
Comment 2 Bjorn Helgaas 2014-10-13 17:12:03 UTC
Created attachment 153461 [details]
dmesg log (bad - ce027dac5)

Extracted from attachment#153311 [details] (timestamps removed).

Diffs that look relevant (-good +bad):

@@ -692,9 +692,11 @@
 ACPI: PCI Interrupt Link [LNKH] (IRQs 1 3 4 5 6 7 11 12 14 15) *0, disabled.
 ACPI: Enabled 3 GPEs in block 00 to 3F
 ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
+vgaarb: setting as boot device: PCI:0000:00:02.0
 vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
 vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=none,locks=none
 vgaarb: loaded
+vgaarb: overriding boot device: PCI:0000:01:00.0
 vgaarb: bridge control possible 0000:01:00.0
 vgaarb: no bridge control possible 0000:00:02.0
 SCSI subsystem initialized
@@ -801,7 +803,6 @@
 UDP hash table entries: 4096 (order: 5, 131072 bytes)
 UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
 NET: Registered protocol family 1
-pci 0000:00:02.0: Boot video device
 pci 0000:00:14.0: enabling device (0000 -> 0002)
 pci 0000:00:14.0: can't derive routing for PCI INT A
 pci 0000:00:14.0: PCI INT A: no GSI
Comment 3 Bruno Prémont 2014-10-13 17:30:41 UTC
Could you please provide the lspci output for your system as well as the content of /sys/bus/pci/devices/*/boot_vga in both the good (3.16.4) and the bad case?
Looking at your logs, you have two GPUs in there: one is the integrated Intel graphics, I guess, and the other is the NVIDIA discrete GPU.
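
The boot_vga values asked for above could be collected with a short shell sketch along these lines. The function name show_boot_vga and its optional root-directory argument are illustrative assumptions, not part of any existing tool; only the sysfs path comes from the request itself.

```shell
# Print "<pci-address> <boot_vga value>" for every PCI device that exposes
# a boot_vga attribute. The optional argument (a hypothetical convenience)
# overrides the sysfs root so the function can be tried on a copied tree.
show_boot_vga() {
    root="${1:-/sys/bus/pci/devices}"
    for f in "$root"/*/boot_vga; do
        [ -r "$f" ] || continue          # glob did not match or file unreadable
        dev=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling show_boot_vga with no argument on the good and on the bad kernel and comparing the two outputs should show which GPU each kernel treats as the boot device.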

According to your kernel logs (the lines added in the "bad" log), your boot framebuffer is on the discrete NVIDIA GPU, while the good case just states that the Intel GPU is the boot GPU.

The binary blob does not log anything useful, though. Could you share your Xorg log for both cases (it may contain more information)?


Note, your kernel config is enabling both CONFIG_FB_EFI and CONFIG_X86_SYSFB.

Please try 3.16.4 with CONFIG_FB_EFI=y and CONFIG_X86_SYSFB unset - you should get the same result as with the bad config. Unsetting both should get you back to a working state (for you, the working state is probably boot_vga=1 for the Intel GPU and the bad state is boot_vga=1 for the NVIDIA GPU).
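
To confirm which of these options a given kernel was actually built with, its config file can be inspected. This is a minimal sketch: the helper name config_state is hypothetical, and the config location (for example /boot/config-$(uname -r) on Ubuntu) is an assumption about where the distribution installs it.

```shell
# Report the value of a kernel config option (y, m, or a literal value),
# or say that it is not set, given the path to a .config-style file.
config_state() {
    opt="$1"; cfg="$2"
    # Match only the exact option name followed by '='.
    val=$(sed -n "s/^${opt}=\(.*\)$/\1/p" "$cfg" | head -n 1)
    if [ -n "$val" ]; then
        printf '%s=%s\n' "$opt" "$val"
    else
        printf '%s is not set\n' "$opt"
    fi
}
```

For example, config_state CONFIG_X86_SYSFB /boot/config-$(uname -r) would print either the option's value or that it is not set.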

Note, you might need to access your system via ssh/network if Xorg does not start properly while not having any framebuffer console.

Check if loading i915 makes any difference.
Comment 4 Petri Hodju 2014-10-15 14:20:10 UTC
Created attachment 153871 [details]
'lspci -v' of the system
Comment 5 Petri Hodju 2014-10-15 14:34:20 UTC
As you suspected, boot_vga is 1 for 00:02.0 (Intel) and 0 for 01:00.0 (NVIDIA) in the good case, and vice versa in the bad case. In both cases everything else works just fine except the backlight control. I'm running the cases again and will attach the Xorg logs shortly. One remark: I have explicitly set the BusID to PCI:1:0:0 in xorg.conf to get X running correctly with the NVIDIA GPU. I will also do one run each of the bad and good cases without the BusID set and attach those Xorg logs too.
Comment 6 Bruno Prémont 2014-10-15 18:09:07 UTC
I would expect that the NVIDIA binary driver has a better chance of selecting/using the GPU with 3.16.5 (without specifying BusID).

Looking at your lspci output, i915 is loaded and active (though I didn't see it in your kernel logs).
Does loading it (and having X able to switch between GPUs using GMUX or however the optimus equivalent is called on Apple systems) improve or influence the way your backlight controls work?

Did you check for registration of backlights on the kernel side (see /sys/class/backlight/) and for any backlight properties reported by X via randr?
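
A small sketch for the kernel-side check, assuming the usual brightness and max_brightness attributes of the backlight class. The function name list_backlights and its optional directory argument are hypothetical.

```shell
# List each registered backlight device with its current and maximum
# brightness. The optional argument overrides the class directory
# (a convenience for trying this on a copied tree).
list_backlights() {
    dir="${1:-/sys/class/backlight}"
    found=0
    for d in "$dir"/*/; do
        [ -r "${d}brightness" ] || continue   # no devices, or not a backlight
        found=1
        printf '%s brightness=%s max=%s\n' "$(basename "$d")" \
            "$(cat "${d}brightness")" "$(cat "${d}max_brightness")"
    done
    [ "$found" -eq 1 ] || echo "no backlight devices registered"
}
```

On the X side, xrandr --prop lists the output properties, which may or may not include a backlight property depending on the driver.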

It would be good to know how backlight shows up with 3.16.4 and how it does with 3.16.5 (and how exactly it fails there).
Comment 7 Petri Hodju 2014-10-16 21:06:08 UTC
Created attachment 154021 [details]
more details on backlight control registration

I did some rmmod / modprobe steps with both the good case (3.16.4+-7babfd7f) and a new bad case (3.16.6-acfaf475) and checked what dmesg prints and what shows up under /sys/class/backlight/.

In the good case, after removing the apple_gmux module, generic ACPI backlight controls were loaded for both the Intel and NVIDIA cards (see 06-sys-class-backlight.log in the 3.16.4+ case), and using 'acpi_video0' I was also able to control the backlight just fine. Reloading apple_gmux brought back the gmux_backlight control, which then continued to work correctly.

In the bad case, removing the apple_gmux module left /sys/class/backlight empty. Reloading it printed 'gmux device not present' even though the device was successfully reported during the initial boot (apple_gmux: Found gmux version 3.2.19 [indexed]).
Comment 8 Bruno Prémont 2014-10-19 09:00:55 UTC
Looking at attachment #154021 [details] it seems that some ACPI/VIDEO and GMUX behavior changes are triggered, though what exactly and why is hard to guess.


Did you try running 3.16.4 with CONFIG_X86_SYSFB unset (also unsetting CONFIG_FB_SIMPLE) and CONFIG_FB_EFI=y? If this config behaves the same as 3.16.5, then at least the patch identified in the bisection did not inherently break something but just surfaced/forced a problematic configuration.



In order to get a better idea of what effectively happens, please report all the resources associated to GMUX, PNP device 'APP000B' (dumping /proc/iomem and /proc/ioports should be sufficient if APP000B can be identified there).


Also insert a pr_debug() call into the gmux_is_indexed() function in drivers/platform/x86/apple-gmux.c and let all pr_debug() calls in there reach dmesg by making sure CONFIG_DYNAMIC_DEBUG=y and following the guidelines in https://www.kernel.org/doc/Documentation/dynamic-debug-howto.txt
Probably this should do it (with $debugfs being the debugfs mount point):
 echo 'file apple-gmux.c +p' > $debugfs/dynamic_debug/control
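
The $debugfs variable used above is not set by anything by default. One way to fill it in is to look up the debugfs mount point in the mounts table; this is a sketch, and the function name find_debugfs and its optional file argument are hypothetical.

```shell
# Print the mount point of debugfs by scanning a mounts table
# (/proc/mounts by default). Fields: device, mount point, fs type, ...
find_debugfs() {
    awk '$3 == "debugfs" { print $2; exit }' "${1:-/proc/mounts}"
}
```

With this, debugfs=$(find_debugfs) can precede the echo command. If nothing is printed, debugfs is probably not mounted; it is commonly mounted at /sys/kernel/debug (mount -t debugfs none /sys/kernel/debug).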

The interesting details in gmux_is_indexed() are (especially when loading apple_gmux):
  gmux_data->iostart
  inb(gmux_data->iostart + 0xcc)
  inb(gmux_data->iostart + 0xcd)
at start of function and
  val in case it's different from 0x55aa

From the error printed by apple_gmux on second modprobe it seems gmux state does not match expected state anymore (e.g. val differs from 0x55aa).


It would also be good to understand why the acpi_video0 backlight device (and, it seems, the whole ACPI VIDEO interface) does not work/show up as expected. A dump of the ACPI DSDT might give us a clue on this:
  cat /sys/firmware/acpi/tables/DSDT > dsdt.dat
and optionally decompile it:
  iasl -d dsdt.dat
to produce a dsdt.dsl file.
Comment 9 Petri Hodju 2014-10-25 20:34:03 UTC
Created attachment 155101 [details]
3.16.4 vs. 3.16.5 with only EFIFB and with EFIFB + SYSFB

Looks like you were right that 3.16.4 using only the EFIFB framebuffer doesn't work, but using EFIFB + SYSFB does. With 3.16.5 the backlight doesn't work either way. In the failing cases it seems that gmux_is_indexed() gets the value 0xffff from inb(gmux_data->iostart + {0xcc,0xcd}) when trying to load the module a second time.

I hope that the dumps in the attachment give some clue as to what might be wrong. I'm happy to provide any further information that could help.
Comment 10 Bruno Prémont 2014-10-25 23:15:46 UTC
(In reply to Petri Hodju from comment #9)
> Created attachment 155101 [details]
> 3.16.4 vs. 3.16.5 with only EFIFB and with EFIFB + SYSFB
> 
> Looks like you were right that 3.16.4 using only the EFIFB framebuffer
> doesn't work, but using EFIFB + SYSFB does. With 3.16.5 backlight doesn't
> work either way.  In the failing cases it seems that gmux_is_indexed() gets
> value 0xffff from inb(gmux_data->iostart + {0xcc,0xcd}) when trying to load
> the module second time.
> 
> I hope that the dumps in the attachment gives some clue what might be wrong.
> I'm happy to provide any further information that could help.

Thanks for these details and for confirming my assumption that my (two) patches just force the problematic setup and do not change the behavior originally performed by efifb (when it is not disabled by SYSFB).


From the inb readings in apple_gmux initialization when it fails, it seems something (probably vgaarb) disabled/blocked IO for the gmux (0x0700).

The 0xffff readings of inb indicate that nothing is responding (anymore) at the 0x0700+0xcc IO port being read; probably vgaarb has somehow disabled the corresponding device or disabled IO on the parent PCI bridge.

Comparing lspci -vvv output could reveal a difference between different boot stages (or between good/bad kernels). If you can capture this lspci output before loading any graphics/gmux module and again after loading them, the differences might point out what is needed to get/keep the IO port active.
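
The before/after comparison suggested above could be sketched as follows. The function names and the lspci-<label>.txt file naming are assumptions made for illustration; lspci -vvv usually needs root to show the full register dump.

```shell
# Save the verbose PCI configuration to lspci-<label>.txt in the
# current directory.
snapshot_lspci() {
    lspci -vvv > "lspci-$1.txt"
}

# Show what changed between two saved snapshots (unified diff).
# Returns 0 when the snapshots are identical.
compare_snapshots() {
    diff -u "lspci-$1.txt" "lspci-$2.txt"
}
```

A possible sequence is snapshot_lspci before, loading the graphics/gmux modules, snapshot_lspci after, then compare_snapshots before after. Lines worth watching are the devices' Control flags (I/O+/-, Mem+/-) and the bridge's BridgeCtl flags (VGA+/-).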


I will have to study the DSDT in the hope of finding some hint as to why things go wrong (at least the ACPI backlight devices not showing up properly after rmmod apple_gmux), though I guess it's related to the IO port!


If you can get your system to actively switch between both GPUs (with 3.16.4 and using SYSFB or disabling EFIFB) it would be interesting to see if/how switching the GPU affects ability to operate backlight and access gmux IO port. When doing so, check kernel log for further messages about vgaarb changing settings.
I expect it could degrade experience on 3.16.4 (like causing apple_gmux to not initialize anymore) or start working on 3.16.5.
Comment 11 Petri Hodju 2014-12-22 20:34:13 UTC
Created attachment 161641 [details]
dmesg and lspci logs after boot and suspend/resume

One more observation on this. I am now using 3.18.0, and after boot the apple_gmux backlight control is loaded but not working. If I then suspend and resume, the backlight works after wakeup. There is a difference in the lspci -vvv output before and after suspend.
Comment 12 Bruno Prémont 2014-12-22 21:08:53 UTC
(In reply to Petri Hodju from comment #11)
> Created attachment 161641 [details]
> dmesg and lspci logs after boot and suspend/resume
> 
> One more observation on this. I used now 3.18.0 and after boot the
> apple_gmux backlight control is loaded but not working. I then do
> suspend/resume and after wakeup the backlight is working. There is
> difference in the lspci -vvv before and after suspend.

The biggest difference is that VGA bits on the bridge have changed, thus "reverting" the vgaarb GPU switch triggered by gmux while loading nvidia driver.

So on your system backlight does not work when VGA-arbitration enables VGA routing to your nvidia GPU on the bridge the nvidia GPU is connected to.

The proper fix for you would probably be to inhibit vga arbitration actions (enabling VGA on bridge to nvidia).


On your system, could you check what happens, step by step, checking
backlight function via backlight node under sysfs:
- boot (blacklisting i915 and nouveau)
  backlight should be working
- load gmux
  backlight should be working
- load i915
  backlight should be working
- load nvidia
  backlight should stop working (or possibly only once X loads)

Redo the same steps using nouveau instead of nvidia.
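
The step-by-step sequence above could be scripted roughly like this. It is only a sketch: load_steps, MODPROBE and BACKLIGHT_DIR are names introduced here (hypothetical), the overrides exist so the loop can be dry-run without root, and listing the directory shows registration only, not whether the control actually works.

```shell
# Load modules one at a time and list the registered backlight devices
# after each step. Run as root on a real system.
load_steps() {
    for mod in "$@"; do
        # MODPROBE can be overridden (e.g. with "true") for a dry run.
        "${MODPROBE:-modprobe}" "$mod" || echo "loading $mod failed"
        echo "== after $mod =="
        ls "${BACKLIGHT_DIR:-/sys/class/backlight}"
    done
}
```

For example, load_steps apple_gmux i915 nvidia after booting with those modules blacklisted. To check function rather than registration, write a value to the device's brightness file after each step and watch the panel.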
Comment 13 Petri Hodju 2015-01-06 13:23:18 UTC
Created attachment 162591 [details]
loading gmux/i915/nvidia & gmux/i915/nouveau

Please find attached logs of loading the modules in the order you requested. I blacklisted nouveau, i915, nvidia and apple_gmux and booted directly into console mode so that no desktop environment was loaded.

The end results are mostly as you expected. The only noteworthy difference is that straight after boot /sys/class/backlight/ was empty. After loading gmux the backlight started to work, and it kept working after loading i915. After loading nvidia the backlight stopped working, as you expected. Using nouveau instead of nvidia kept the backlight working.

Then I removed gmux from the blacklist and booted to the console. In this case the gmux backlight was visible under /sys/class/backlight but was not working.
Comment 14 Bruno Prémont 2015-01-08 18:22:03 UTC
The fact that using nouveau does not kill the backlight controls and also does not trigger any vga-arbitration switches is good news.
This probably means that the backlight is in fact controlled by the Intel GPU, though announced to the OS via the gmux PNP device (not via legacy VGA IO but still via the Intel GPU's IO).

A quick search for VGA arbitration and nvidia revealed the following link:
  https://devtalk.nvidia.com/default/topic/545560/vfio-vga-arbitration-lock/

Maybe that patch would help, though I haven't looked at the sources of nvidia driver.

I'm thinking about letting gmux lock vga_default_device as owning device so nvidia would fail at switching things around. This way I should avoid affecting any third-parties (desktops) that need to do this kind of arbitration with some IGP.

Hopefully I can more or less test that on my AMD APU with secondary NVIDIA GPU and using nvidia blob.
Comment 15 Bruno Prémont 2015-01-08 20:35:24 UTC
Created attachment 162861 [details]
Lock IO+MEM of boot_vga in apple-gmux

Petri, could you test this patch (just compile-tested here)?

I'm interested in dmesg for loading nvidia after apple-gmux with this patch applied.

Possible outcomes I envision:
- no change
- it prevents decode changes toward nvidia GPU and thus keeps backlight working
- loading/initialization of nvidia never completes (could happen if nvidia calls vga_get() instead of vga_tryget())

For dmesg output to be more verbose, please follow guidelines in Documentation/dynamic-debug-howto.txt to enable dynamic debugging and have it generate debug output for both vgaarb and apple-gmux (that is, files: drivers/platform/x86/apple-gmux.c and drivers/gpu/vga/vgaarb.c).
Enable dynamic debug before loading gmux.
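
Enabling debug output for both files could look like the following sketch. The helper name enable_dyndbg is hypothetical, and the control file path in the usage note assumes debugfs is mounted at /sys/kernel/debug.

```shell
# Enable pr_debug() output for each given source file by writing one
# dynamic-debug query per file to the control file (first argument).
enable_dyndbg() {
    control="$1"; shift
    for src in "$@"; do
        # Each write to the control file is handled as a separate query.
        echo "file $src +p" > "$control"
    done
}
```

For example, as root: enable_dyndbg /sys/kernel/debug/dynamic_debug/control apple-gmux.c vgaarb.c, run before loading gmux.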
Comment 16 Petri Hodju 2015-02-04 04:44:21 UTC
Created attachment 165781 [details]
dmesg logs with IO+MEM locked to boot GPU

Sorry about the delay here. With the patch things took a step back, I'm afraid. Loading apple_gmux now fails with iostart[{0xcc,0xcd}] reading 0xffff. As you mentioned earlier, it looks like the Intel-side IO is shut down by something after VGA arbitration switches at boot to NVIDIA instead of Intel.
Comment 17 Bruno Prémont 2015-02-09 20:44:06 UTC
Created attachment 166231 [details]
Lock IO+MEM of (first) IO decoding GPU

(In reply to Petri Hodju from comment #16)
> Created attachment 165781 [details]
> dmesg logs with IO+MEM locked to boot GPU
> 
> Sorry about the delay here. With the patch things took a step back I'm
> afraid. Loading of the apple_gmux fails now with iostart[{0xcc,0xcd}] giving
> out 0xffff. It's like you mentioned earlier that looks like the intel side
> IO is shutdown by someone after the vga arbitration on boot changes to use
> Nvidia instead of Intel.

Oops, I passed the wrong pdev to vga_tryget()!
In our case boot_vga is the dGPU (nvidia) and I wanted to pass the iGP (intel).

Attached is a new patch which should get that right. With this patch we just ensure gmux continues working, but we don't fix things in case of detection failure.
Comment 18 Petri Hodju 2015-02-11 21:23:17 UTC
Created attachment 166511 [details]
working backlight

Seems like that was it! Now the gmux backlight control keeps working on my system even after loading the nvidia module in console mode. Thanks for your effort on this! Please find attached dmesg logs showing the progress while loading: apple_gmux -> i915 -> nvidia. The patch applied cleanly over 3.19, from which the logs are captured.
Comment 19 Bruno Prémont 2015-02-14 10:21:48 UTC
Good news.

Could you also verify that rmmod and modprobe of apple-gmux a few times in a row works? If so I will send it to Bjorn and CC maintainers as well as dri-devel to get some more testing and then get it applied.

If you modprobe nvidia while apple-gmux is not loaded, trying to load apple-gmux at a later time (even after unloading nvidia) would probably fail. As I don't know which device handles backlight IO on all Apple Macs, I'm reluctant to randomly try locking IO for graphics devices (especially as it might confuse running GPU drivers).
Comment 20 Petri Hodju 2015-02-17 19:57:37 UTC
Created attachment 167391 [details]
loading apple_gmux multiple times

Things seem to work as you expected. When loading / unloading the apple_gmux multiple times in a row, the gmux is able to lock to the correct I/O and the backlight keeps functioning correctly. The backlight also keeps working if the gmux is loaded at the time nvidia gets loaded. As you expected, if I unload the gmux and then load nvidia, the backlight stops functioning and won't recover even by unloading nvidia and loading the gmux again. This behaviour is understandable as you described. I've attached here dmesg logs to show how things worked out.
Comment 21 Bjorn Helgaas 2015-03-09 20:31:32 UTC
Bruno, if you sent this patch to me, I lost it.  Can you resend it and make sure linux-pci is cc'd?
Comment 22 Bruno Prémont 2015-03-09 21:54:55 UTC
Bjorn, I've sent it to Darren Hart as he is the maintainer of apple_gmux (you were CCed).

Resent, 8bit encoded for Darren and linux-pci CCed.
Comment 23 Bjorn Helgaas 2016-10-28 20:52:39 UTC
4eebd5a4e726 ("apple-gmux: lock iGP IO to protect from vgaarb changes") appeared in v4.1 and mentions this bugzilla.  I'm assuming that change fixed this bug, so I'm closing it.  Please reopen if this is incorrect.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4eebd5a4e726
