Bug 177151

Summary: no idle pc3/pc6/pc7 starting in Linux 4.5 - Macbook Air
Product: Power Management Reporter: David Purton (dcpurton)
Component: OtherAssignee: Len Brown (lenb)
Status: CLOSED DOCUMENTED    
Severity: normal CC: cengizc, dcpurton, edouard.thuleau, hugo, lenb, lv.zheng, rui.zhang, tom
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.5 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: acpidump for MacBookAir6,1 (Early 2014)
acpidump on a Mid 2013 MacBook Air, linux 4.8.1
turbostat --debug for good kernel
turbostat --debug for bad kernel
acpi debug trace for 4.4.39 with acpi_osi=
acpi debug trace for 4.9.0+ with acpi_osi=
acpi debug trace for 4.9.0+ with acpi_osi=Linux
acpi debug trace for 4.9.0+ with acpi_osi=Darwin
acpi debug trace for 4.4.39 with acpi_osi=
acpi debug trace for 4.9.0+ with acpi_osi=
acpi debug trace for 4.9.0+ with acpi_osi=Linux
acpi debug trace for 4.9.0+ with acpi_osi=Darwin
acpi debug trace for 4.9.0+ with acpi_osi=Linux (retest)
acpi debug trace for 4.4.39 with acpi_osi unset
acpi debug trace for 4.4.39 with acpi_osi=!Darwin
_SB.PCI0._INI acpi debug trace for v4.4.39
Customized dsdt to make OSYS fixed
Customized DSDT to be used with DSDT override
turbostat --debug for v4.6

Description David Purton 2016-10-10 09:30:12 UTC
I am seeing battery life regression on my MacBook Air since kernel 4.4. Under 4.4 with "acpi_osi=" my idle power usage is around 3.5W. This translates to fantastic battery life lasting me all day.

Things broke badly with 4.5 (I gather this was to do with thunderbolt support). But even with 4.6 and 4.7, I'm still not able to achieve the same power usage. With kernel 4.7 and "acpi_osi=" my power usage is just under 5W. All other combinations I've tried for acpi_osi (including !Darwin) give even worse results.

I have installed OS X El Capitan 10.11.6 on the Mac partitian.

Model Name: MacBook Air (11-inch, Early 2014)
Model Identifier: MacBookAir6,1
Model Number: MD712X/B
Comment 1 Len Brown 2016-10-10 23:58:13 UTC
Is thunderbolt actually in use in this configuration?
Comment 2 Len Brown 2016-10-10 23:59:14 UTC
please attach the output from acpidump from this machine
Comment 3 David Purton 2016-10-11 00:56:43 UTC
Thunderbolt is definitely not in use with 4.4 and with "acpi=" it's not in use with later kernels either. (At least lspci doesn't show it.) I don't need thunderbolt, but am concerned about battery life.

acpidump attached.
Comment 4 David Purton 2016-10-11 00:58:34 UTC
Created attachment 241401 [details]
acpidump for MacBookAir6,1 (Early 2014)

Taken under kernel debian 4.4.6 (if that makes a difference)
Linux detritus 4.4.0-1-amd64 #1 SMP Debian 4.4.6-1 (2016-03-17) x86_64 GNU/Linux
Comment 5 Hugo 2016-10-11 19:26:15 UTC
Created attachment 241551 [details]
acpidump on a Mid 2013 MacBook Air, linux 4.8.1

acpidump on a Mid 2013 MacBook Air.

Linux athena.barrera.io 4.8.1-1-ARCH #1 SMP PREEMPT Fri Oct 7 19:14:45 CEST 2016 x86_64 GNU/Linux
Comment 6 Zhang Rui 2016-11-14 05:00:14 UTC
please attach the output of turbostat --debug for both good and bad kernel.
Comment 7 David Purton 2016-11-16 00:01:46 UTC
Created attachment 244711 [details]
turbostat --debug for good kernel

turbostat --debug for good kernel

(I can run it for longer if you need---wasn't really sure.)

Linux detritus 4.4.0-1-amd64 #1 SMP Debian 4.4.6-1 (2016-03-17) x86_64 GNU/Linux
Comment 8 David Purton 2016-11-16 00:07:18 UTC
Created attachment 244721 [details]
turbostat --debug for bad kernel

Linux detritus 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
Comment 9 Zhang Rui 2016-11-16 01:18:46 UTC
we can see that the system used to spend 60%+ time in pc7 state, in 4.4 kernel. And it can not enter package c-state deeper than pc2 in 4.7. This explains why the idle power increases.
Comment 10 Tom B 2016-11-27 19:12:52 UTC
This is also an issue for Macbook Pro 11,5. Idle battery life has been reduced considerably and idle temperatures are abnormally high (60C instead of 50C previously)
Comment 11 Zhang Rui 2016-12-19 07:04:19 UTC
can you please check if the problem still exists in latest upstream kernel?
can you please check if the problem still exists in the latest upstream kernel with acpi_osi="!Darwin"?
Comment 12 Lv Zheng 2016-12-19 08:43:24 UTC
Why do you think it is an _OSI regression?

Have you figured out the executed AML control methods that are related to acpi_osi="!Darwin"?
And have you tried to obtain AML trace log for these methods with acpi_osi="!Darwin" acpi.trace_state=method acpi.trace_method_name="The method name you figured out" specified for your kernels (since it's reported as regression, you need to test 2 kernels).

Thanks
Lv
Comment 13 Zhang Rui 2016-12-19 09:14:18 UTC
(In reply to Lv Zheng from comment #12)
> Why do you think it is an _OSI regression?
> 
No one said this is an _OSI regression.

> Have you figured out the executed AML control methods that are related to
> acpi_osi="!Darwin"?

what methods? please tell me.

> And have you tried to obtain AML trace log for these methods with
> acpi_osi="!Darwin" acpi.trace_state=method acpi.trace_method_name="The
> method name you figured out" specified for your kernels (since it's reported
> as regression, you need to test 2 kernels).
>
again, what methods?
Comment 14 David Purton 2016-12-20 00:20:18 UTC
Zhang Rui, the problem still exists in linux 4.9+. There is no appreciable difference between acpi_osi= and acpi_osi=!Darwin. The turbostat --debug output is similar to the 4.7 one I posted above and powertop indicates an idle power draw of greater than 5 W.

(I built commit e93b1cc8a8965da137ffea0b88e5f62fa1d2a9e6).


Lv Zheng, Sorry, I don't know enough to answer your questions. (I'm just an end user not a kernel developer.) But I can do tests if you explain how.


Thanks,

David
Comment 15 Lv Zheng 2016-12-20 06:30:06 UTC
(In reply to Zhang Rui from comment #13)
> (In reply to Lv Zheng from comment #12)
> > Why do you think it is an _OSI regression?
> > 
> No one said this is an _OSI regression.

But according to the bug description, sounds like an _OSI regression.

> 
> > Have you figured out the executed AML control methods that are related to
> > acpi_osi="!Darwin"?
> 
> what methods? please tell me.
> 
> > And have you tried to obtain AML trace log for these methods with
> > acpi_osi="!Darwin" acpi.trace_state=method acpi.trace_method_name="The
> > method name you figured out" specified for your kernels (since it's
> reported
> > as regression, you need to test 2 kernels).
> >
> again, what methods?

Let me check it and get back to you guys later. :)

Thanks
Lv
Comment 16 Lv Zheng 2016-12-20 06:57:55 UTC
1. _OSI is only invoked in DSDT, setting OSYS:
	Line 7676:             If (CondRefOf (\_OSI, Local0))
	Line 7678:                 If (_OSI ("Darwin"))
	Line 7683:                 If (\_OSI ("Linux"))
	Line 7688:                 If (\_OSI ("Windows 2009"))
	Line 7693:                 If (\_OSI ("Windows 2012"))
	Line 7727:             If (CondRefOf (_OSI, Local0))
	Line 7729:                 If (_OSI ("Darwin"))
	Line 7735:                     If (_OSI ("Linux"))
	Line 7741:                         If (_OSI ("Windows 2009"))
	Line 7747:                             If (_OSI ("Windows 2012"))
   Actually the above results are all from the 2 functions:
    Scope (\_SB.PCI0)
    {
        Method (_INI, 0, NotSerialized)  // _INI: Initialize
        {
            Store (0x07DC, OSYS)
            If (CondRefOf (\_OSI, Local0))
            {
                If (_OSI ("Darwin"))
                {
                    Store (0x2710, OSYS)
                }

                If (\_OSI ("Linux"))
                {
                    Store (0x03E8, OSYS)
                }

                If (\_OSI ("Windows 2009"))
                {
                    Store (0x07D9, OSYS)
                }

                If (\_OSI ("Windows 2012"))
                {
                    Store (0x07DC, OSYS)
                }
            }
        }
    }
    Scope (\)
    {
        Method (PINI, 0, NotSerialized)
        {
            Store (0x07DC, OSYS)
            If (CondRefOf (_OSI, Local0))
            {
                If (_OSI ("Darwin"))
                {
                    Store (0x2710, OSYS)
                }
                Else
                {
                    If (_OSI ("Linux"))
                    {
                        Store (0x03E8, OSYS)
                    }
                    Else
                    {
                        If (_OSI ("Windows 2009"))
                        {
                            Store (0x07D9, OSYS)
                        }
                        Else
                        {
                            If (_OSI ("Windows 2012"))
                            {
                                Store (0x07DC, OSYS)
                            }
                        }
                    }
                }
            }
            Else
            {
                Store (0x07DC, OSYS)
            }
        }
    }
   However I cannot see \PINI invoked. So _OSI is checked in \_SB.PCI0._INI.
   But \_SB.PCI0._INI looks buggy, doesn't contain Else, while \PINI looks more promising.

   The values of OSYS then will be:
    Default 0x07DC
    Darwin 0x2710
    Linux 0x03E8
    Windows 2009 0x07D9
    Windows 2012 0x07DC

   So if you want to specify a specific OSI (ex., Linux), on old Linux, you need to:
   "acpi_osi=! acpi_osi=Linux"
   I think we have a commit converting "acpi_osi=Darwin" into "acpi_osi=! acpi_osi=Darwin", "acpi_osi=Linux" into "acpi_osi=! acpi_osi=Linux"

   acpi_osi can be used to affect the result of OSYS:
    acpi_osi=: OSYS=0x07DC
    acpi_osi=!Darwin: OSYS=0x07DC
    acpi_osi=Linux: OSYS=0x03E8
    acpi_osi=Darwin: OSYS=0x2710

3. OSYS value is only checked in DSDT:
    \_SB.PCI0.IGPU.ABCL
	Line 2796:                     If (LLess (OSYS, 0x07DC))
    \_SB.PCI0.IGPU.GNOT
	Line 4062:                         If (LOr (LGreater (OSYS, 0x07D0), LLess (OSYS, 0x07D6)))
    HPET._STA
	Line 5090:                         If (LGreaterEqual (OSYS, 0x07D1))
    ALS0._STA
	Line 5415:                         If (LGreaterEqual (OSYS, 0x07D9))
    EC._REG
	Line 5799:                         If (LOr (LEqual (Arg0, 0x03), LGreaterEqual (OSYS, 0x07D6)))
    \OSDW
	Line 7714:             If (LEqual (OSYS, 0x2710))

Since bug is related to OSYS=0x2710(Darwin), I would rather to suggest the following tests:

1. Using 4.4 kernel, specifying "acpi_osi= acpi.trace_state=opcode acpi.trace_method_name=OSDW", (requiring CONFIG_ACPI_DEBUG=y), boot the built kernel, and upload the dmesg out here
2. Using latest kernel, specifying "acpi_osi= acpi.trace_state=opcode acpi.trace_method_name=OSDW", (requiring CONFIG_ACPI_DEBUG=y), boot the built kernel, and upload the dmesg out here
3. Using latest kernel, specifying "acpi_osi=Linux acpi.trace_state=opcode acpi.trace_method_name=OSDW", (requiring CONFIG_ACPI_DEBUG=y), boot the built kernel, and upload the dmesg out here
Please see Documentation/acpi/method-tracing.txt for detailed usages of AML tracer.
4. Using latest kernel, specifying "acpi_osi=Darwin acpi.trace_state=opcode acpi.trace_method_name=OSDW", (requiring CONFIG_ACPI_DEBUG=y), boot the built kernel, and upload the dmesg out here
Please see Documentation/acpi/method-tracing.txt for detailed usages of AML tracer.

If you have CONFIG_ACPI_DEBUGGER_USER=m enabled, you probably could just run acpidbg to obtain the result instead of using "acpi.trace_state=opcode acpi.trace_method_name=OSDW":
"acpidbg -b "ex \OSDW"
Please see Documentation/acpi/aml-debugger.txt for detailed usages of AML debugger.

Thanks and best regards
Lv
Comment 17 David Purton 2016-12-20 11:19:45 UTC
Lv Zheng, don't attach too much importance to the acpi_osi comments in the report. This is for info, rather than intended to be directly related to the problem. I have no idea where the cause is. I originally thought it was related to a different bug which was related to acpi_osi, but this bug was fixed and did not affect the problem I am seeing. I'll run your suggested tests and post results here.

David
Comment 18 David Purton 2016-12-20 12:40:04 UTC
Created attachment 248101 [details]
acpi debug trace for 4.4.39 with acpi_osi=
Comment 19 David Purton 2016-12-20 12:41:07 UTC
Created attachment 248111 [details]
acpi debug trace for 4.9.0+ with acpi_osi=
Comment 20 David Purton 2016-12-20 12:43:30 UTC
Created attachment 248121 [details]
acpi debug trace for 4.9.0+ with acpi_osi=Linux

Note that this gives a higher power usage than acpi_osi= even under 4.9.0+
Comment 21 David Purton 2016-12-20 12:45:10 UTC
Created attachment 248131 [details]
acpi debug trace for 4.9.0+ with acpi_osi=Darwin

Note that this gives a higher power usage than acpi_osi= even under 4.9.0+. It is about the same as apci_osi=Linux.
Comment 22 David Purton 2016-12-20 12:56:18 UTC
Hmmm. I notice these dmesg outputs are truncated. Sorry! I'll try again.
Comment 23 David Purton 2016-12-20 13:16:24 UTC
Created attachment 248141 [details]
acpi debug trace for 4.4.39 with acpi_osi=

not truncated this time
Comment 24 David Purton 2016-12-20 13:19:31 UTC
Created attachment 248151 [details]
acpi debug trace for 4.9.0+ with acpi_osi=

not truncated this time
Comment 25 David Purton 2016-12-20 13:20:13 UTC
Created attachment 248161 [details]
acpi debug trace for 4.9.0+ with acpi_osi=Linux

not truncated this time
Comment 26 David Purton 2016-12-20 13:21:19 UTC
Created attachment 248171 [details]
acpi debug trace for 4.9.0+ with acpi_osi=Darwin

not truncated this time
Comment 27 Lv Zheng 2016-12-21 01:52:29 UTC
(In reply to David Purton from comment #17)
> Lv Zheng, don't attach too much importance to the acpi_osi comments in the
> report. This is for info, rather than intended to be directly related to the
> problem. I have no idea where the cause is. I originally thought it was
> related to a different bug which was related to acpi_osi, but this bug was
> fixed and did not affect the problem I am seeing. I'll run your suggested
> tests and post results here.
> 
> David

No problem.
I was just helping to confirm the root cause. :)
Comment 28 Lv Zheng 2016-12-21 02:11:48 UTC
(In reply to David Purton from comment #23/comment #24/comment #25/comment #26)

Thanks for the tests.

> Created attachment 248141 [details]
> acpi debug trace for 4.4.39 with acpi_osi=

I can see Else executed for this test.
So OSYS is not Darwin in this case.

> Created attachment 248151 [details]
> acpi debug trace for 4.9.0+ with acpi_osi=

I can see Else executed for this test.
So OSYS is not Darwin in this case.

> Created attachment 248161 [details]
> acpi debug trace for 4.9.0+ with acpi_osi=Linux
> 
> not truncated this time

I cannot see Else executed in this test.
OSYS is Darwin in this case.

> Created attachment 248171 [details]
> acpi debug trace for 4.9.0+ with acpi_osi=Darwin
> 
> not truncated this time

I cannot see Else executed in this test.
OSYS is Darwin in this case.

I cannot understand comment #25.
Will check the code and get back here later.
Can you also help to check if something wrong with the comment #25 test.

Thanks and best regards
Lv
Comment 29 Lv Zheng 2016-12-21 02:12:56 UTC
Could you also try the following tests:

Using 4.9 kernel, you actually can obtain OSYS value directly from userspace.
The steps are:
1. Enable CONFIG_ACPI_DEBUGGER=y, CONFIG_ACPI_DEBUGGER_USER=m
2. Rebuild the kernel and acpi tools, for tools build, in kernel source tree
    $ cd tools
    $ make acpi
3. After booting, load the AML debugger
    # mount -t debugfs none /sys/kernel/debug
    # modprobe acpi_dbg
4. Execute the following command for different acpi_osi= tests
    # tools/power/acpi/acpidbg -b "ex \OSYS"

For acpi_osi=, the result should be 0x07DC;
For acpi_osi=Linux, the result should be 0x03E8;
For acpi_osi=Darwin, the result should be 0x2710;
For acpi_osi=!, the result should be 0x07DC
For acpi_osi=! acpi_osi="Windows 2009", the result should be 0x07D9
For acpi_osi=! acpi_osi="Windows 2012", the result should be 0x07DC
If something wasn't as expected, please let me know.

Thanks in advance and best regards
Lv
Comment 30 David Purton 2016-12-21 03:08:47 UTC
(In reply to Lv Zheng from comment #29)
> For acpi_osi=, the result should be 0x07DC;
> For acpi_osi=Linux, the result should be 0x03E8;
> For acpi_osi=Darwin, the result should be 0x2710;
> For acpi_osi=!, the result should be 0x07DC
> For acpi_osi=! acpi_osi="Windows 2009", the result should be 0x07D9
> For acpi_osi=! acpi_osi="Windows 2012", the result should be 0x07DC

These all match *except* for acpi_osi=!. The result is 0x2710 in this case.
Comment 31 David Purton 2016-12-21 03:14:38 UTC
Created attachment 248221 [details]
acpi debug trace for 4.9.0+ with acpi_osi=Linux (retest)

As per comment #28, retest dmesg output for 4.9.0+ with acpi_osi=Linux acpi.trace_state=opcode acpi.trace_method_name=OSDW
Comment 32 Lv Zheng 2016-12-21 03:48:06 UTC
(In reply to David Purton from comment #30)

Good!
Seems everything is correct.

> > For acpi_osi=, the result should be 0x07DC;

Good.

> > For acpi_osi=Linux, the result should be 0x03E8;

I just checked the code.
My expectation wasn't entirely correct here.
acpi_osi=Linux doesn't mean "acpi_osi=! acpi_osi=Linux".
However DMI quirks of Darwin helps to add acpi_osi=!
So the result is still 0x03E8 as If (Linux) check comes after If (Darwin) check.

> > For acpi_osi=Darwin, the result should be 0x2710;
> > For acpi_osi=!, the result should be 0x07DC
> > For acpi_osi=! acpi_osi="Windows 2009", the result should be 0x07D9
> > For acpi_osi=! acpi_osi="Windows 2012", the result should be 0x07DC
> 

Good.

> These all match *except* for acpi_osi=!. The result is 0x2710 in this case.

This looks correct, my expectation was wrong. :)
The boot parameter deletes all Windows strings, and Darwin was added due to the DMI quirks...

So the problem is not related to _OSI, I'll leave this bug for Yu.

Thanks and best regards
Lv
Comment 33 Zhang Rui 2016-12-22 00:55:18 UTC
I'm more curious about
1. the OSYS value in 4.4 kernel, without any acpi_osi parameters.
2. the OSYS value in 4.9 kernel, with acpi_osi="!Darwin"
this should be the same.
Comment 34 Lv Zheng 2016-12-22 02:55:59 UTC
(In reply to Zhang Rui from comment #33)
> I'm more curious about
> 1. the OSYS value in 4.4 kernel, without any acpi_osi parameters.

I'm not sure if Matthew's forcing Darwin commit is there.
And I'm not sure if "Windows 2012" OSI string is enabled for that kernel.
So:
1. If force Darwin=false, Windows 2012=false, OSYS should be 0x07D9;
2. If force Darwin=false, Windows 2012=true, OSYS should be 0x07DC;
3. If force Darwin=true, Windows 2012=false, OSYS should be 0x07D9;
4. If force Darwin=true, Windows 2012=true, OSYS should be 0x07DC.

I'm not sure if AML debugger is working in 4.4 kernels.
If it's working, we can obtain OSYS value using acpidbg.

> 2. the OSYS value in 4.9 kernel, with acpi_osi="!Darwin"
> this should be the same.

I'm not sure if we can simply think in this way.
DSDT contains different logics related to _OSI handling.
And Linux kernel has special quirks for different platforms/different kernel versions.
So it needs to be examined case by case.

For this case:
1. When acpi_osi_dmi_blacklisted() is invoked
darwin_dmi=1
darwin_enable=true
default_disabling=ACPI_DISABLE_ALL_VENDOR_STRINGS
Darwin is added to osi_setup_entries, and its enable=true
2. When acpi_osi= is parsed
darwin_dmi=0
darwin_cmdline=1
darwin_enable=false
default_disabling=0
Darwin is still in osi_setup_entries, and its enable=false
3. When acpi_osi_setup_late() is invoked
acpi_update_interfaces() won't be invoked
acpi_remove_interface("Darwin") will be invoked
The OSYS result should be 0x07DC.

Thanks and best regards
Lv
Comment 35 David Purton 2016-12-22 05:38:52 UTC
(In reply to Zhang Rui and Lv Zheng from comment #33 and comment #34)

There does not seem to be an acpi_dbg module for kernel 4.4 so I'll upload the acpi debug trace from dmesg for when acpi_osi is unset and for when acpi_osi=!Darwin (I don't know what I'm looking for in these logs). FWIW, these both give worse battery life than acpi_osi=.

For kernel 4.9.0+, the acpidbg -b "ex \OSDW" results are:
    - acpi_osi is unset: 0x0001
    - acpi_osi=!Darwin:  0x0000
Comment 36 David Purton 2016-12-22 05:40:09 UTC
Created attachment 248281 [details]
acpi debug trace for 4.4.39 with acpi_osi unset
Comment 37 David Purton 2016-12-22 05:41:14 UTC
Created attachment 248291 [details]
acpi debug trace for 4.4.39 with acpi_osi=!Darwin
Comment 38 Lv Zheng 2016-12-23 03:33:50 UTC
(In reply to David Purton from comment #35)
> (In reply to Zhang Rui and Lv Zheng from comment #33 and comment #34)
> 
> There does not seem to be an acpi_dbg module for kernel 4.4 so I'll upload
> the acpi debug trace from dmesg for when acpi_osi is unset and for when
> acpi_osi=!Darwin (I don't know what I'm looking for in these logs). FWIW,
> these both give worse battery life than acpi_osi=.

Hmm, it's difficult to figure that out if there is no acpidbg available.
Let me think about that and get back to you later.

> 
> For kernel 4.9.0+, the acpidbg -b "ex \OSDW" results are:
>     - acpi_osi is unset: 0x0001

Yes, the DMI quirk will force your platform to be "Darwin".

>     - acpi_osi=!Darwin:  0x0000
And Darwin is effectived disabled by !Darwin acpi_osi command.
Looks good.

However you should use acpidbg -b "ex \OSYS" instead.
Which gives more useful information.
My analysis might be wrong.
Comment 39 David Purton 2016-12-23 10:46:11 UTC
(In reply to Lv Zheng from comment #38)
> However you should use acpidbg -b "ex \OSYS" instead.
> Which gives more useful information.
> My analysis might be wrong.

Ah - I miss remembered the commands. This gives:

For kernel 4.9.0+, the acpidbg -b "ex \OSYS" results are:
    - acpi_osi is unset: 0x2710
    - acpi_osi=!Darwin:  0x07DC
Comment 40 Lv Zheng 2016-12-26 02:22:39 UTC
(In reply to David Purton from comment #39)
> (In reply to Lv Zheng from comment #38)
> > However you should use acpidbg -b "ex \OSYS" instead.
> > Which gives more useful information.
> > My analysis might be wrong.
> 
> Ah - I miss remembered the commands. This gives:
> 
> For kernel 4.9.0+, the acpidbg -b "ex \OSYS" results are:
>     - acpi_osi is unset: 0x2710

4.9 kernel seems to have forced acpi_osi=Darwin.
Which makes it not possible to use "acpi_osi=".
This seems to need an improvement.
I'm not sure if this is the cause.

>     - acpi_osi=!Darwin:  0x07DC

This is exactly as expectations.
For this bug, does "acpi_osi=!Darwin" with 4.9 consumes same power as "acpi_osi=" with 4.4 kernel?

Thanks
Lv
Comment 41 Lv Zheng 2016-12-26 03:03:52 UTC
(In reply to Lv Zheng from comment #40)
> (In reply to David Purton from comment #39)
> > (In reply to Lv Zheng from comment #38)
> > > However you should use acpidbg -b "ex \OSYS" instead.
> > > Which gives more useful information.
> > > My analysis might be wrong.
> > 
> > Ah - I miss remembered the commands. This gives:
> > 
> > For kernel 4.9.0+, the acpidbg -b "ex \OSYS" results are:
> >     - acpi_osi is unset: 0x2710
> 
> 4.9 kernel seems to have forced acpi_osi=Darwin.
> Which makes it not possible to use "acpi_osi=".
> This seems to need an improvement.

I couldn't see problems in the kernel source.

> I'm not sure if this is the cause.

My misunderstanding.
I was thinking you are talking about "acpi_osi=", but you should be talking about no "acpi_osi" in the boot parameter.
It seems this bug is not related to acpi_osi implementation.

And _OSI implementation in 4.9 kernel just looks fine to me, and according to the test result here.

Please find different causes.

Thanks
Lv
Comment 42 David Purton 2016-12-26 23:43:53 UTC
(In reply to Lv Zheng from comment #40)
> This is exactly as expectations.
> For this bug, does "acpi_osi=!Darwin" with 4.9 consumes same power as
> "acpi_osi=" with 4.4 kernel?

No, 4.9 still consumes much more power even with acpi_osi=!Darwin. It continues to not go below pc2.
Comment 43 Lv Zheng 2016-12-27 00:18:51 UTC
I think this can help to figure out OSYS value for your 4.4 kernel:

Using 4.4 kernel, specifying "acpi.trace_state=opcode-once acpi.trace_method_name=_SB.PCI0._INI" in addition to your acpi_osi settings (requiring CONFIG_ACPI_DEBUG=y), boot the built kernel, and upload the dmesg out here.

Thanks
Lv
Comment 44 Zhang Rui 2016-12-27 01:11:23 UTC
yes, I think we should make clear
1. what is the default OSYS value in 4.4
2. what is the default OSYS value in 4.9 and how to set it to the same value as 4.4
and then, if the problem still exists with the same OSYS value, we need to figure out some other reason may cause this problem.

BTW, IMO, the most efficient way to find out the root cause of a regression is to use git bisect to find out the offending commit.
Comment 45 Lv Zheng 2016-12-27 08:00:24 UTC
(In reply to Zhang Rui from comment #44)
> BTW, IMO, the most efficient way to find out the root cause of a regression
> is to use git bisect to find out the offending commit.

However there are Matthew's _OSI change and my _OSI change between the 2 kernels. So the tester need to use different acpi_osi boot parameter to boot different kernels, and after booting into the kernel, use acpidbg to confirm if OSYS is as expected...
This sounds difficult, and is impossible before knowing the OSYS value in 4.4.

Thanks and best regards
Lv
Comment 46 Len Brown 2016-12-27 21:07:21 UTC
David,

It seems that the cause of the power regression is shown clearly
with turbostat for 4.4 and 4.5.  The working kernel get pc3/pc6/pc7,
and the failing kernel gets only pc2.

So the question is what between 4.4 and 4.5 broke this.

As assumption was made, above, that this was related to changes
in acpi_osi=Darwin defaults -- but that assumption appears to be false.
Indeed, that default did not change between 4.4 and 4.5.

Is it possible for you to git-bisect v4.4..v4.5 to find
what commit broke the kernel?  As this change happens
without any special cmdline parameters -- you should test
without any special cmdline parameters.

My guess is i915 or PCI.
Comment 47 David Purton 2016-12-28 00:11:30 UTC
(In reply to Len Brown from comment #46)
> David,
> 
> It seems that the cause of the power regression is shown clearly
> with turbostat for 4.4 and 4.5.  The working kernel get pc3/pc6/pc7,
> and the failing kernel gets only pc2.
> 
> So the question is what between 4.4 and 4.5 broke this.
> 
> As assumption was made, above, that this was related to changes
> in acpi_osi=Darwin defaults -- but that assumption appears to be false.
> Indeed, that default did not change between 4.4 and 4.5.
> 
> Is it possible for you to git-bisect v4.4..v4.5 to find
> what commit broke the kernel?  As this change happens
> without any special cmdline parameters -- you should test
> without any special cmdline parameters.
> 
> My guess is i915 or PCI.

Yep, I'll give this a go.
Comment 48 David Purton 2016-12-28 02:45:13 UTC
Actually, the git bisect is not so simple.

When I boot v4.5 with "acpi_osi=" /sys/class/power_supply/BAT0 *does not exist*. v4.7, v4.8, and v4.9 do not behave this way. I have not tested v4.6.

What's more, I notice that the system does reach pc7 under v4.5 (with "acpi_osi="). If I change the acpi_osi kernel parameter (tried a few combinations), /sys/class/power_supply/BAT0 appears, but system only gets to pc2.

So before I waste time doing the bisect it would be good to make sure that I can boot v4.4 and v4.5 with the same OSYS value.

Lv Zheng, I'll upload the dmesg you asked for ("acpi.trace_state=opcode-once acpi.trace_method_name=_SB.PCI0._INI") for v4.4. Then suggest what I should try/look for in v4.5 to get the same OSYS value.

Thanks,

David
Comment 49 David Purton 2016-12-28 02:58:21 UTC
Created attachment 248771 [details]
_SB.PCI0._INI acpi debug trace for v4.4.39

Lv Zheng, can you explain what I'm looking for in this dmesg output please?
Comment 50 Lv Zheng 2016-12-28 05:58:16 UTC
> Actually, the git bisect is not so simple.

Yes.
There are different changes between your testing kernels:
1. 7bc5a2bad0b8: since 4.7
2. e10cfdc33a0f: since 3.18
3. 2d3349de8072: since 4.7
4. ced043663e4b: since 4.7

> When I boot v4.5 with "acpi_osi="
> /sys/class/power_supply/BAT0 *does not exist*.
> v4.7, v4.8, and v4.9 do not behave this way.
> I have not tested v4.6.
> What's more,
> I notice that the system does reach pc7 under v4.5 (with "acpi_osi=").
> If I change the acpi_osi kernel parameter
> (tried a few combinations),
> /sys/class/power_supply/BAT0 appears,
> but system only gets to pc2.

Battery's _STA has logics related to _OSI and EC._REG execution:
                        Method (_STA, 0, NotSerialized)  // _STA: Status
                        {
                            If (OSDW ())
                            {
                                Return (0x00)
                            }
                            If (\_SB.PCI0.LPCB.EC.ECOK)
                            {
                                UBSS ()
                                If (And (BSSW, 0x01))
                                {
                                    Return (0x1F)
                                }
                                Else
                                {
                                    Return (0x0F)
                                }
                            }
                            Else
                            {
                                Return (0x0F)
                            }
                        }
If battery doesn't exist, OSYS value should be 0x2710.
However I don't know why it is changed for 4.5 kernel.

> So before I waste time doing the bisect it
> would be good to make sure that I can boot v4.4 and v4.5
> with the same OSYS value.

1. 4.4 OSYS value for your test
OK, here is the test result for 4.4 kernel.

[    2.191843]   exdebug-0435 ex_trace_point        : Method End [0xffffc9000000c0b6:\_SB.PCI0.RMCR] execution.
[    2.191847]   exdebug-0435 ex_trace_point        : Opcode Begin [0xffffc9000001667f:Store] execution.
[    2.191850]   exdebug-0435 ex_trace_point        : Opcode Begin [0xffffc90000016680:WordConst] execution.
[    2.191854]   exdebug-0435 ex_trace_point        : Opcode End [0xffffc90000016680:WordConst] execution.
[    2.191864]   exdebug-0435 ex_trace_point        : Opcode End [0xffffc9000001667f:Store] execution.
[    2.191868]   exdebug-0435 ex_trace_point        : Opcode Begin [0xffffc90000016687:If] execution.
[    2.191872]   exdebug-0435 ex_trace_point        : Opcode Begin [0xffffc9000001668a:CondRefOf] execution.
[    2.191881]   exdebug-0435 ex_trace_point        : Opcode Begin [0xffffc90000016691:Local0] execution.
[    2.191884]   exdebug-0435 ex_trace_point        : Opcode End [0xffffc90000016691:Local0] execution.
[    2.191894]   exdebug-0435 ex_trace_point        : Opcode End [0xffffc9000001668a:CondRefOf] execution.
[    2.191898]   exdebug-0435 ex_trace_point        : Opcode End [0xffffc90000016687:If] execution.

From that, I can see that OSYS is 0x07DC as nothing in If is executed.
I think you have booted your 4.4 kernel with "acpi_osi=" and the 4.4 kernel doesn't contain Matthew and my _OSI changes.

2. different acpi_osi for different kernels
   4.4 acpi_osi=
   4.5 
   4.6 
   4.8-4.9 acpi_osi=!Darwin, I think acpi_osi= should also work
I do not know why 4.5 kernel returns OSYS=02710, thus I don't know a mean to set OSYS to 0x07DC for 4.5-4.6 kernels.
Maybe you can try not to use any acpi_osi parameters in the command line.

2. confirm OSYS value
I think you can always boot kernel using "acpi.trace_state=opcode-once acpi.trace_method_name=_SB.PCI0._INI" and observe the AML execution to determine the OSYS value.

But I also noticed 1 more thing, that PINI is actually invoked in \_SB._INI. My observation of "However I cannot see \PINI invoked." in comment 16 is wrong.
    Scope (\_SB)
    {
        Method (_INI, 0, NotSerialized)  // _INI: Initialize
        {
            PINI ()
            DSPI ()
        }
So the order of \_SB._INI and \_SB.PCI0._INI also maters.
As \_SB._INI is executed before executing \_SB.PCI0._INI, I think the OSYS value set by PINI will be reset in \_SB.PCI0._INI.
But we may still need to confirm the order before making such a conclusion.
So please boot 4.4/4.9 kernel with "acpi_trace_state=enable", and upload the dmesg here.

If some kernels executed _SB.INI after executing _SB.PCI0._INI, for that kernels, you should use "acpi_osi=" boot parameters suitable for PINI method and boot the kernels using "acpi.trace_state=opcode-once acpi.trace_method_name=_SB._INI" to confirm the OSYS value.

Thanks and best regards
Lv
Comment 51 Lv Zheng 2016-12-28 06:29:43 UTC
Maybe a simpler way is to customize DSDT, and use it for bisection.
Comment 52 Lv Zheng 2016-12-28 06:34:20 UTC
Created attachment 248781 [details]
Customized dsdt to make OSYS fixed

We may be able to make OSYS fixed to do the bisection.
Comment 53 Lv Zheng 2016-12-28 06:42:46 UTC
Created attachment 248791 [details]
Customized DSDT to be used with DSDT override

I customized your DSDT using patch in comment 52.
This helps to eliminate order issues mentioned for _SB._INI/_SB.PCI0._INI and eliminate the uncertainty related to "Windows 2013" _OSI string commit.

Please following https://01.org/linux-acpi/documentation/overriding-dsdt
to use this file.
Or you can:
1. Put this file into include/acpi in the kernel source tree
2. Configure CONFIG_ACPI_CUSTOM_DSDT_FILE as "acpi/dsdt.hex"
3. Rebuild the kernel.

With DSDT customized, if 4.4 kernel consumes 3.5w 4.9 kernel consumes 4.5w, it may be possible for you to continue bisection.

Thanks
Lv
Comment 54 David Purton 2016-12-28 09:31:06 UTC
(In reply to Lv Zheng from comment #50)
> If battery doesn't exist, OSYS value should be 0x2710.
> However I don't know why it is changed for 4.5 kernel.

v4.5 does have CONFIG_ACPI_DEUGGER_USER.

acpidbg -b "ex \OSYS" gives 0x07DC (with "acpi_osi=")

This is the configuration that doesn't detect the battery and so I don't know how much power it's drawing. But the system definitely does reach pc7. So I guess there's a chance that it could be similar to v4.4.
Comment 55 Len Brown 2016-12-28 11:56:10 UTC
The bisect is between 4.4 and 4.5, yes?
OSI Darwin did not change between those two releases, right?
If 4.4 can get into pc7 (and it should not need any cmdline),
and 4.5, also without any cmdline param, does not get into pc7,
then the bisect can proceed.  Am I missing something?
Comment 56 David Purton 2016-12-29 06:46:23 UTC
Wait, wait!

I just tested v4.6 and it works (with "acpi_os=")!

I'm not sure how I missed this. v4.5 doesn't work (no battery shows up). v4.7 doesn't work (only pc2). But these are different problems.

I'll do the bisect from v4.6 to v4.7 and hopefully that will expose the problem.


David
Comment 57 David Purton 2016-12-29 07:20:18 UTC
Created attachment 249021 [details]
turbostat --debug for v4.6

Actually... comment #56 might have been a bit premature. v4.6 does reach pc7 state, but it doesn't spend as much time there as v4.4. So battery life still isn't as good, but it is of course much better than v4.7 which only gets to pc2 state.

So I guess there are a couple of problems. I'll run some bisects on v4.6..v4.7 and v4.4..v4.5 and see what comes up. Might take me a while.

David
Comment 58 David Purton 2017-01-01 11:14:06 UTC
OK. I've done a bisect from v4.6..v4.7. I checked for when the system stopped reaching pc7 state. Here's the git bisect log.

git bisect start
# good: [2dcd0af568b0cf583645c8a317dd12e344b1c72a] Linux 4.6
git bisect good 2dcd0af568b0cf583645c8a317dd12e344b1c72a
# bad: [523d939ef98fd712632d93a5a2b588e477a7565e] Linux 4.7
git bisect bad 523d939ef98fd712632d93a5a2b588e477a7565e
# good: [0694f0c9e20c47063e4237e5f6649ae5ce5a369a] radix tree test suite: remove dependencies on height
git bisect good 0694f0c9e20c47063e4237e5f6649ae5ce5a369a
# bad: [e4f7bdc2ec0d0dcc27f7d70db27a620dfdc1f697] Merge branch 'for-4.7-zac' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
git bisect bad e4f7bdc2ec0d0dcc27f7d70db27a620dfdc1f697
# good: [2f37dd131c5d3a2eac21cd5baf80658b1b02a8ac] Merge tag 'staging-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good 2f37dd131c5d3a2eac21cd5baf80658b1b02a8ac
# bad: [2b669875332fbdff0a7ad559e8662e875e7a1526] drm/msm: Drop load/unload drm_driver ops
git bisect bad 2b669875332fbdff0a7ad559e8662e875e7a1526
# bad: [adc7f04bfda9cd0b2b4b84b5e8b72fd4c7b56d0a] drm/i915/bxt: Add HW state verification for DDI PHY and CDCLK
git bisect bad adc7f04bfda9cd0b2b4b84b5e8b72fd4c7b56d0a
# bad: [9d6c875db4ada9f91e1643b5a72d74bea043d67e] drm/i915: move sdvo mappings to vbt data
git bisect bad 9d6c875db4ada9f91e1643b5a72d74bea043d67e
# bad: [6b93e9c89ee553c1f062cd5c7dcd35e06846568f] drm/i915/bxt: fix dsi hw state pipe readout
git bisect bad 6b93e9c89ee553c1f062cd5c7dcd35e06846568f
# bad: [37f2248e3d422f0463800b574ea9288ede97cea2] drm/i915: Use DIV_ROUND_CLOSEST for PWM calculations
git bisect bad 37f2248e3d422f0463800b574ea9288ede97cea2
# bad: [f85db0590dd8601a1fc513a40dd002c3bdf56b84] drm/i915/error: Capture WA ctx batch in error state
git bisect bad f85db0590dd8601a1fc513a40dd002c3bdf56b84
# bad: [3465c580afc86c066cc55842bae38ba3cb1acb99] drm/i915: Don't pass plane+plane_state to intel_pin_and_fence_fb_obj()
git bisect bad 3465c580afc86c066cc55842bae38ba3cb1acb99
# bad: [c6a2ac712d7dee13c13e44c4c4184478853dcb37] drm/i915: Execlists small cleanups and micro-optimisations
git bisect bad c6a2ac712d7dee13c13e44c4c4184478853dcb37
# bad: [c3454d575da162cd310d9b83696baefb29d10a70] drm/i915: remove left over dead code
git bisect bad c3454d575da162cd310d9b83696baefb29d10a70
# bad: [ed4a6a7ca853253f9b86f3005d76345482a71283] drm/i915: Add two-stage ILK-style watermark programming (v11)
git bisect bad ed4a6a7ca853253f9b86f3005d76345482a71283
# first bad commit: [ed4a6a7ca853253f9b86f3005d76345482a71283] drm/i915: Add two-stage ILK-style watermark programming (v11)


I can confirm that the commit before this (5790ff742b1feee62f60a95f4caf78827f656f58) does reach pc7 state. I can't check battery life because this is during the time that the battery does not appear. But the battery does reappear again by the time we get to v4.7 so I don't think this is a major problem.

Hope this helps!
Comment 59 Zhang Rui 2017-01-09 09:21:52 UTC
To make sure it is the graphics that breaks package cstate, please
1. confirm the problem does not exist with kernel "git checkout ed4a6a7ca853253f9b86f3005d76345482a71283~", which I think you have done in the previous comment.
2. confirm the problem does exist with kernel "git checkout ed4a6a7ca853253f9b86f3005d76345482a71283~"
Comment 60 David Purton 2017-01-10 01:25:43 UTC
(In reply to Zhang Rui from comment #59)
> To make sure it is the graphics that breaks package cstate, please
> 1. confirm the problem does not exist with kernel "git checkout
> ed4a6a7ca853253f9b86f3005d76345482a71283~", which I think you have done in
> the previous comment.
> 2. confirm the problem does exist with kernel "git checkout
> ed4a6a7ca853253f9b86f3005d76345482a71283~"

git checkout ed4a6a7ca853253f9b86f3005d76345482a71283~ works.

git checkout ed4a6a7ca853253f9b86f3005d76345482a71283 (I assume you didn't mean to put the tilde in both times) does *not* work.


I built both versions clean in this way under 4.4.0-1-amd64 #1 SMP Debian 4.4.6-1:

$ make distclean
$ make localmodconfig (can upload .config if you need it)
$ scripts/config --disable DEBUG_INFO
$ make deb-pkg

I checked the package cstate using powertop.


Other notes:

gcc version 6.2.1 20161124 (Debian 6.2.1-5)

I had to modify the Makefile to work around build issues with gcc 6. I added "$(call cc-option,-fno-PIE)" to KBUILD_CFLAGS and KBUILD_AFLAGS. This was taken from more recent versions of the Makefile.

I use two other kernel modules not in the vanilla kernel:
    - broadcom-sta 6.30.223.271 (Wireless driver)
    - mba6x_bl 20161219.b96aafd (Backlight driver)
These don't seem to alter things whether they are loaded or not.
Comment 61 Zhang Rui 2017-01-10 02:35:56 UTC
(In reply to David Purton from comment #60)
> git checkout ed4a6a7ca853253f9b86f3005d76345482a71283~ works.
> 
> git checkout ed4a6a7ca853253f9b86f3005d76345482a71283 does *not* work.
> 
This confirms that the problem is introduced by commit ed4a6a7ca853253f9b86f3005d76345482a71283.

For this graphics issue, please file a bug in freedesktop.org instead, to get help from the graphics experts.
And please attach the freedesltop bug URL in this bug report.

This bug report is closed.
Comment 62 David Purton 2017-01-10 05:01:33 UTC
Thanks everyone.

Bug reported at https://bugs.freedesktop.org/show_bug.cgi?id=99340