Bug 58091 - Backlight control regression on System 76 laptop with Nvidia GPU
Summary: Backlight control regression on System 76 laptop with Nvidia GPU
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Video (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Aaron Lu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-14 01:20 UTC by Jason Cassell
Modified: 2013-06-15 00:11 UTC (History)
4 users (show)

See Also:
Kernel Version: 3.9.2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
lspci -vvv output (38.72 KB, text/plain)
2013-05-14 01:20 UTC, Jason Cassell
Details
syslog excerpt showing 3.8.12 booting (79.51 KB, text/plain)
2013-05-14 01:24 UTC, Jason Cassell
Details
syslog excerpt showing 3.9.2 booting (81.64 KB, text/plain)
2013-05-14 01:25 UTC, Jason Cassell
Details
config for 3.8.12 (102.85 KB, text/plain)
2013-05-14 01:30 UTC, Jason Cassell
Details
config for 3.9.2 (104.77 KB, text/plain)
2013-05-14 01:30 UTC, Jason Cassell
Details
acpidump output, while running 3.8.12 (188.32 KB, text/plain)
2013-05-14 08:18 UTC, Jason Cassell
Details
acpi_devtree_dump.out (10.92 KB, text/plain)
2013-05-15 03:14 UTC, Jason Cassell
Details
Debug patch to see where video module bails out (2.97 KB, patch)
2013-05-20 09:23 UTC, Aaron Lu
Details | Diff
3.9.2 booting with acpi_video_debug.patch (82.61 KB, text/plain)
2013-05-20 22:14 UTC, Jason Cassell
Details
Clevo_P150HMx_Gentoo~amd64 data (57.18 KB, application/octet-stream)
2013-05-24 15:38 UTC, Dmitry S. Demin
Details
Add debug message to see what's the problem (3.35 KB, patch)
2013-05-30 02:34 UTC, Aaron Lu
Details | Diff
dmesg from 3.9.2 booting with acpi_pci_root_debug.patch (90.90 KB, text/plain)
2013-05-30 05:01 UTC, Jason Cassell
Details
dmesg booting v3.9.2 jn Clevo with acpi_pci_root_debug patch (26.44 KB, text/plain)
2013-05-30 08:16 UTC, Dmitry S. Demin
Details
Do not probe acpi devices that already have driver_data (549 bytes, patch)
2013-05-31 01:41 UTC, Aaron Lu
Details | Diff
dmesg on gentoo-sources-3.9.4 with ACPI-do-not-match-devices-that-have-scan-handler.patch (29.89 KB, text/plain)
2013-06-04 20:47 UTC, Dmitry S. Demin
Details
ACPI / scan: do not match drivers against objects having scan handlers (1.87 KB, patch)
2013-06-04 21:07 UTC, Rafael J. Wysocki
Details | Diff
dmesg from 3.9.4 booting with the patch with the WARN_ON (112.26 KB, text/plain)
2013-06-05 00:04 UTC, Jason Cassell
Details
ACPI / video: Do not bind to device objects with drivers or scan handlers (524 bytes, patch)
2013-06-08 01:12 UTC, Rafael J. Wysocki
Details | Diff
ACPI / video: Do not bind to device objects with drivers or scan handlers (525 bytes, patch)
2013-06-08 01:14 UTC, Rafael J. Wysocki
Details | Diff
ACPI / video: Do not bind to device objects with drivers or scan handlers (1012 bytes, patch)
2013-06-08 01:25 UTC, Rafael J. Wysocki
Details | Diff
ACPI / video: Do not bind to device objects with a scan handler (2.21 KB, patch)
2013-06-09 21:29 UTC, Rafael J. Wysocki
Details | Diff
dmesg from 3.9.4 booting with the patch from comment #57 (57.07 KB, text/plain)
2013-06-10 06:39 UTC, Jason Cassell
Details
dmesg on gentoo-sources-3.9.5 with acpi-video-fix.patch (26.30 KB, text/plain)
2013-06-10 09:00 UTC, Dmitry S. Demin
Details

Description Jason Cassell 2013-05-14 01:20:45 UTC
Created attachment 101341 [details]
lspci -vvv output

The backlight controls worked fine with kernel version 3.8.12, and don't work with 3.9.2.  /sys/class/backlight is now empty.

I had this problem with 3.4.x too, and the "still use ACPI backlight control if _DOS doesn't exist" patch in bug 43168 fixed it for me then.

In syslog, this line is there when I boot with 3.8.12, but not 3.9.2:
nouveau  [     DRM] ACPI backlight interface available, not registering our own
Comment 1 Jason Cassell 2013-05-14 01:24:54 UTC
Created attachment 101351 [details]
syslog excerpt showing 3.8.12 booting
Comment 2 Jason Cassell 2013-05-14 01:25:46 UTC
Created attachment 101361 [details]
syslog excerpt showing 3.9.2 booting
Comment 3 Jason Cassell 2013-05-14 01:30:01 UTC
Created attachment 101371 [details]
config for 3.8.12
Comment 4 Jason Cassell 2013-05-14 01:30:43 UTC
Created attachment 101381 [details]
config for 3.9.2
Comment 5 Aaron Lu 2013-05-14 07:34:35 UTC
What's the output of
$ ls /sys/class/backlight
under v3.8?

Please attach acpidump output.

Thanks.
Comment 6 Jason Cassell 2013-05-14 08:16:53 UTC
$ ls /sys/class/backlight/
acpi_video0
Comment 7 Jason Cassell 2013-05-14 08:18:44 UTC
Created attachment 101391 [details]
acpidump output, while running 3.8.12
Comment 8 Aaron Lu 2013-05-15 02:25:40 UTC
Please run the following script and attach the output file at /tmp/acpi_devtree_dump.out.

#!/bin/bash
dumpfile=/tmp/acpi_devtree_dump.out
rm -f $dumpfile
touch $dumpfile
for i in `find /sys/devices/LNXSYSTM:00 -name path`;
	do echo "$i -> `cat $i`" >> $dumpfile;
done

BTW, is the video module loaded?
Thanks.
Comment 9 Jason Cassell 2013-05-15 03:13:22 UTC
Yes, the video module is loaded.
Comment 10 Jason Cassell 2013-05-15 03:14:27 UTC
Created attachment 101591 [details]
acpi_devtree_dump.out
Comment 11 Aaron Lu 2013-05-20 06:38:40 UTC
There are only 2 cleanup commits to the video driver in v3.9, so there must be something wrong elsewhere...

Are kernel compiling an option to you?
I can prepare a verbose debug patch, let's see where it bailed out, thanks.
Comment 12 Jason Cassell 2013-05-20 07:07:14 UTC
Yeah, let's do that.
Comment 13 Aaron Lu 2013-05-20 08:17:18 UTC
Which version of the kernel tree do you have? I can prepare a patch for that, 3.9+ is required.
Comment 14 Jason Cassell 2013-05-20 08:43:55 UTC
I have 3.9.2, but I noticed that 3.9.3 is out now and I can get that one easily if you think it'd be better to debug that one.
Comment 15 Aaron Lu 2013-05-20 08:47:55 UTC
No need, I think there is no change in this regard.
I'll prepare a debug patch on top of v3.9.2, thanks.
Comment 16 Aaron Lu 2013-05-20 09:23:56 UTC
Created attachment 102031 [details]
Debug patch to see where video module bails out

Apply on top of v3.9.2.

Please also add acpi.debug_layer=0x10000000 acpi.debug_level=0x4 to kernel command line.
Comment 17 Jason Cassell 2013-05-20 22:14:06 UTC
Created attachment 102121 [details]
3.9.2 booting with acpi_video_debug.patch

Here's the result of the debug patch.
Comment 18 Dmitry S. Demin 2013-05-24 15:38:32 UTC
Created attachment 102451 [details]
Clevo_P150HMx_Gentoo~amd64 data

I experience the same issue with missing /sys/class/backlight/acpi_video0 symlink on my Sager-branded Clevo P150HMx notebook. 
All the data collected using dmegs, lspci and patched kernel sources is attached. 
I have completed git-bisect v3.9 v3.8.13 routine and found that the symlink is missing since commit 00c43b9682507dc622c03172fde1032e2a216e9d. 
The bisect log is also in attached archive.
The distributive used is Gentoo testing ~amd64 branch, but I use proprietary Nvidia X11 driver.
If I can help in any other way to solve this problem, please feel free to contact.
Comment 19 Aaron Lu 2013-05-25 04:06:19 UTC
Thanks a lot for the bisection, Dmitry!
Comment 20 Aaron Lu 2013-05-30 01:00:20 UTC
May 20 14:37:34 piedra kernel: [    6.585307] video PNP0A08:00: acpi_video_bus_add: status=0x0

This is weird, it means the acpi device for PCI root bridge is driven by video module. I don't see how this can happen.
Comment 21 Aaron Lu 2013-05-30 02:32:46 UTC
Can not reproduce this issue on a HP notebook, which has a dedicated graphics card under a PCI bridge.
Comment 22 Aaron Lu 2013-05-30 02:34:20 UTC
Created attachment 102931 [details]
Add debug message to see what's the problem

Let's dig this further. Please apply this patch on top of v3.9.x and attach dmesg here, thanks.
Comment 23 Jason Cassell 2013-05-30 05:01:04 UTC
Created attachment 102951 [details]
dmesg from 3.9.2 booting with acpi_pci_root_debug.patch
Comment 24 Dmitry S. Demin 2013-05-30 08:16:37 UTC
Created attachment 102981 [details]
dmesg booting v3.9.2 jn Clevo with acpi_pci_root_debug patch
Comment 25 Aaron Lu 2013-05-30 08:27:30 UTC
Thanks for the information.

The problem is root caused: there is a _ROM ACPI control method defined under the PCI root bridge, which made Linux think the PCI root bridge is a graphics card, since according to ACPI spec, only graphics controller device is supposed to expose a _ROM method. And the result is, the PCI root bridge device's private data is emptied by video module, making the acpi_get_pci_dev fail for the real graphics controller device.

Previously, there is a acpi pci root bridge driver, it will take the acpi root bridge device so that when acpi video module is loaded, the acpi pci root node will not be tried to match against video module. Since the bisected commit, the acpi pci root bridge driver is removed, so the pci root bridge will now be tried to match against video module and as it has _ROM method, it will successfully matched and then the probe will fail, making subsequent acpi_get_pci_dev call fail.

It's not clear to me what the clean fix is right now, stay tuned.
Comment 26 Aaron Lu 2013-05-31 01:41:08 UTC
Created attachment 103041 [details]
Do not probe acpi devices that already have driver_data

Hi,

Please give this patch a test, apply on top of v3.9.y, thanks.
Comment 27 Jason Cassell 2013-05-31 06:05:02 UTC
(In reply to comment #26)
> Created an attachment (id=103041) [details]
> Do not probe acpi devices that already have driver_data
> 
> Hi,
> 
> Please give this patch a test, apply on top of v3.9.y, thanks.

That fixes it for me, and I haven't noticed it causing any regressions yet.  Yay!

Thank you.
Comment 28 Dmitry S. Demin 2013-05-31 06:39:27 UTC
Yes, that fixes backlight regulation for me too, both on vanilla-3.9.2 and gentoo-sources-3.9.4. 

Thank you very much.
Comment 29 Aaron Lu 2013-06-04 05:25:55 UTC
Hi,

I just sent a patch to ACPI mailing list, there is a small change to address a comment from the maintainer, but I think it shouldn't affect the result. Anyway, if possible, please test it again, thanks.

I should have added you to the cc list but I forgot when submitting, sorry for this, the submitted patch is here:
https://patchwork.kernel.org/patch/2656921/
Comment 30 Dmitry S. Demin 2013-06-04 20:47:05 UTC
Created attachment 103431 [details]
dmesg on gentoo-sources-3.9.4 with ACPI-do-not-match-devices-that-have-scan-handler.patch

Now I observe a quite long set of kernel messages at boot after applying the ACPI-do-not-match-devices-that-have-scan-handler.patch over gentoo-sources-3.9.4.
Comment 31 Dmitry S. Demin 2013-06-04 20:53:04 UTC
Backlight regulation is fully functional with ACPI-do-not-match-devices-that-have-scan-handler.patch
Sorry for not mentioning that in the comment #30.
Comment 32 Rafael J. Wysocki 2013-06-04 21:00:54 UTC
Thanks.

Do you use the binary NVidia graphics driver by chance?

The warnings in comment #30 look like they are coming from it.

I think we'll need to remove the WARN_ON() for now ...
Comment 33 Rafael J. Wysocki 2013-06-04 21:07:38 UTC
Created attachment 103441 [details]
ACPI / scan: do not match drivers against objects having scan handlers

Can you please test this one just to be sure?
Comment 34 Jason Cassell 2013-06-05 00:04:20 UTC
Created attachment 103481 [details]
dmesg from 3.9.4 booting with the patch with the WARN_ON

I use nouveau and I have pretty much the same warnings.  And my backlight control works too.

I'll try the new patch.
Comment 35 Jason Cassell 2013-06-05 01:24:13 UTC
I tried the new patch.  The warnings are gone.
Comment 36 Aaron Lu 2013-06-05 01:47:49 UTC
(In reply to comment #35)
> I tried the new patch.  The warnings are gone.

Thanks for the test!
Comment 37 Aaron Lu 2013-06-05 01:48:10 UTC
(In reply to comment #32)
> Thanks.
> 
> Do you use the binary NVidia graphics driver by chance?
> 
> The warnings in comment #30 look like they are coming from it.
> 
> I think we'll need to remove the WARN_ON() for now ...


Right...

I misunderstand the match callback, I thought only devices that do not have driver yet will be attempted for a match, but obviously it's all the devices owned by the bus will be attempted for a match. It's the probe phase that will only probe for devices that do not have ->driver field set yet.

So we'll have to remove the WARN_ON for now till we do not have any ACPI driver left, or there will be a lot of warnings generated by every ACPI device that already has a driver bound during the match phase on a new acpi driver load.

BTW, does it make sense to match for devices that already have driver bound? It doesn't seem to be necessary to me.
Comment 38 Aaron Lu 2013-06-05 01:49:23 UTC
(In reply to comment #37)
> BTW, does it make sense to match for devices that already have driver bound?
> It
> doesn't seem to be necessary to me.

I mean, in the driver core.
Comment 39 Dmitry S. Demin 2013-06-05 06:13:53 UTC
Yes, I do use binary nvidia driver.
And yes, with WARN_ON removed messages at boot are gone too. Backlight is also functioning properly.
Comment 40 Rafael J. Wysocki 2013-06-08 01:06:02 UTC
Unfortunately, the patch from comment #33 caused a boot regression to happen on an IA-64 machine and will have to be reverted.
Comment 41 Rafael J. Wysocki 2013-06-08 01:07:28 UTC
(In reply to comment #25)
> Thanks for the information.
> 
> The problem is root caused: there is a _ROM ACPI control method defined under
> the PCI root bridge, which made Linux think the PCI root bridge is a graphics
> card, since according to ACPI spec, only graphics controller device is
> supposed
> to expose a _ROM method. And the result is, the PCI root bridge device's
> private data is emptied by video module, making the acpi_get_pci_dev fail for
> the real graphics controller device.

I think we can fix this just by modifying the video driver.
Comment 42 Rafael J. Wysocki 2013-06-08 01:12:50 UTC
Created attachment 103861 [details]
ACPI / video: Do not bind to device objects with drivers or scan handlers

I wonder if this patch fixes the problem?
Comment 43 Rafael J. Wysocki 2013-06-08 01:14:22 UTC
Created attachment 103871 [details]
ACPI / video: Do not bind to device objects with drivers or scan handlers

Sorry, the previous version was broken (a typo).

Please test this one instead.
Comment 44 Aaron Lu 2013-06-08 01:15:14 UTC
(In reply to comment #41)
> (In reply to comment #25)
> > Thanks for the information.
> > 
> > The problem is root caused: there is a _ROM ACPI control method defined
> under
> > the PCI root bridge, which made Linux think the PCI root bridge is a
> graphics
> > card, since according to ACPI spec, only graphics controller device is
> supposed
> > to expose a _ROM method. And the result is, the PCI root bridge device's
> > private data is emptied by video module, making the acpi_get_pci_dev fail
> for
> > the real graphics controller device.
> 
> I think we can fix this just by modifying the video driver.

The ACPI bus' probe callback will empty the driver_data filed for any matched ACPI device if the driver's add callback failed, so solving it in video driver is too late. We will need to think about how to avoid adding the PNP id for video to the PCI root device.
Comment 45 Rafael J. Wysocki 2013-06-08 01:25:57 UTC
Created attachment 103881 [details]
ACPI / video: Do not bind to device objects with drivers or scan handlers

Well, why don't we do this instead?

I don't really think acpi_bus_driver_init() has to clear driver and driver_data, since they haven't been assigned yet.
Comment 46 Aaron Lu 2013-06-08 01:30:42 UTC
(In reply to comment #45)
> Created an attachment (id=103881) [details]
> ACPI / video: Do not bind to device objects with drivers or scan handlers
> 
> Well, why don't we do this instead?
> 
> I don't really think acpi_bus_driver_init() has to clear driver and
> driver_data, since they haven't been assigned yet.

I agree. I think the patch is good, let's do that and wait for reporter's test. Thanks.
Comment 47 Aaron Lu 2013-06-08 01:32:37 UTC
(In reply to comment #46)
> (In reply to comment #45)
> > Created an attachment (id=103881) [details] [details]
> > ACPI / video: Do not bind to device objects with drivers or scan handlers
> > 
> > Well, why don't we do this instead?
> > 
> > I don't really think acpi_bus_driver_init() has to clear driver and
> > driver_data, since they haven't been assigned yet.
> 
> I agree. I think the patch is good, let's do that and wait for reporter's
> test.
> Thanks.

Except that some ACPI driver assign driver_data early in their add callback, and when some error happened, they simply return without clearing that field. I'll need to take a look at the existing drivers, see if someone does that.
Comment 48 Aaron Lu 2013-06-08 01:35:18 UTC
Oh crap, at least drivers/acpi/ac.c doesn't clear driver_data on error...
Comment 49 Rafael J. Wysocki 2013-06-08 01:36:07 UTC
Well, I agree, but I'd say that would be a bug and it would be easy to fix.

But please do check the existing drivers (although there's a number of them).

Jason, Dmitry, can you please test the patch from comment #45 in the meantime?
Comment 50 Aaron Lu 2013-06-08 01:38:02 UTC
Yes, looks like there are a number of them don't clear driver_data on error, but as you say, it's easy to fix :-)
Comment 51 Rafael J. Wysocki 2013-06-08 01:39:37 UTC
(In reply to comment #48)
> Oh crap, at least drivers/acpi/ac.c doesn't clear driver_data on error...

Even so, I'm not sure what problems that may lead to, practically.  Yes, that's a bug, yes it needs to be fixed, but will it actually cause things to break?
Comment 52 Aaron Lu 2013-06-08 01:48:24 UTC
That may need to be checked on a per-driver basis, I'm not sure at the moment.
Comment 53 Rafael J. Wysocki 2013-06-08 01:53:30 UTC
Honestly, I don't remember any ACPI driver checking if driver_data is NULL before attempting to overwrite it.  Sadly.

Which means that the change from comment #45 should be totally safe, but if it turns out to cause problems to happen, they should be easy to address.

Let's check if it helps with the original issue first, though. :-)
Comment 54 Aaron Lu 2013-06-08 02:00:55 UTC
(In reply to comment #53)
> Honestly, I don't remember any ACPI driver checking if driver_data is NULL
> before attempting to overwrite it.  Sadly.

No they don't :-)

I can only see the notify handler would need to use driver_data, but since the notify is only installed after a successful add, so it wouldn't cause problem.

> 
> Which means that the change from comment #45 should be totally safe, but if
> it
> turns out to cause problems to happen, they should be easy to address.
> 
> Let's check if it helps with the original issue first, though. :-)

Yes I agree.
Comment 55 Jason Cassell 2013-06-08 05:50:55 UTC
I'm running 3.9.4 with the patch from comment #45.  The backlight control works, and there are no obvious problems so far.
Comment 56 Rafael J. Wysocki 2013-06-08 19:41:13 UTC
Thanks for the confirmation!
Comment 57 Rafael J. Wysocki 2013-06-09 21:29:40 UTC
Created attachment 104051 [details]
ACPI / video: Do not bind to device objects with a scan handler

Jason, can you please verify if this slightly modified patch from comment #45 still works for you?
Comment 58 Jason Cassell 2013-06-10 06:35:32 UTC
I have good news and bad news.  The good news is the patch from comment #57 appears to work just as well as the one from comment #45.  The bad news is that they both make this happen:

[    6.309685] video: probe of PNP0A08:00 failed with error -22
Comment 59 Jason Cassell 2013-06-10 06:39:55 UTC
Created attachment 104121 [details]
dmesg from 3.9.4 booting with the patch from comment #57

Here's my whole dmesg output in case I missed anything else.
Comment 60 Dmitry S. Demin 2013-06-10 09:00:39 UTC
Created attachment 104151 [details]
dmesg on gentoo-sources-3.9.5 with acpi-video-fix.patch

Tested patch from comment #45 on gentoo-3.9.4 - backlight control worked, but didn't look into dmesg.

Tested patch from comment #47 on gentoo-3.9.5 - backlight control works, have same message in dmesg as Jason:

[    6.665498] video: probe of PNP0A08:00 failed with error -22
[    6.665512] [Firmware Bug]: ACPI(MXM3) defines _DOD but not _DOS
[    6.665580] [Firmware Bug]: ACPI: No _BQC method, cannot determine initial brightness
Comment 61 Rafael J. Wysocki 2013-06-10 10:57:12 UTC
(In reply to comment #58)
> I have good news and bad news.  The good news is the patch from comment #57
> appears to work just as well as the one from comment #45.  The bad news is
> that
> they both make this happen:
> 
> [    6.309685] video: probe of PNP0A08:00 failed with error -22

That is a consequence of the way those patches work and is totally harmless (it actually is useful, because it shows that something weird is going on due to a BIOS bug).

Thanks for testing!

Note You need to log in before you can comment on or make changes to this bug.