Bug 92111
Description
Attila
2015-01-26 21:51:40 UTC
Just to mention, the values above are idle values. The same difference occurs under full load.

Please attach the turbostat output on both kernels (3.17.7 and 3.18.3/3.19-rc6).

Created attachment 164851 [details]
Output of Turbostat on Kernel series 3.16
Created attachment 164861 [details]
Output of Turbostat on Kernel series 3.18
Created attachment 164871 [details]
Output of dmesg on Kernel series 3.16
Created attachment 164881 [details]
Output of dmesg on Kernel series 3.18
Created attachment 164891 [details]
Output of lsmod on Kernel series 3.16
Created attachment 164901 [details]
Output of lsmod on Kernel series 3.18
Created attachment 164911 [details]
Output of ps -A on Kernel series 3.16
Created attachment 164921 [details]
Output of ps -A on Kernel series 3.18
Created attachment 164931 [details]
Output of lspci on Kernel series 3.16
Created attachment 164941 [details]
Output of lspci on Kernel series 3.18
Created attachment 164951 [details]
Output of lsusb on Kernel series 3.18
I added some other useful info as well. Just as a note, when running turbostat, power consumption (by powertop battery drain calculation) was 13.5W on Kernel 3.16 and 15.8W on Kernel 3.18. Please tell me if you need any more info, or if I should try anything!

Any idea? What could I try to get the cause identified? Thanks

Looks like it is related to a change in package C-state residency: in v3.16, the package stays in Pkg%pc6 63% of the time when idle, while in v3.18 it stays in Pkg%pc3 instead:

v3.18.3:
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
-    -   3       0.11  2219    1995    0   0.17   0.00   1.06   98.66  39      41     40.07   58.79   0.00    0.00    3.34    0.01    0.00

v3.16:
Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
-    -   2       0.19  908     1995    0   0.30   0.01   0.01   99.48  37      42     34.64   0.44    63.05   0.00    2.84    0.01    0.00

Not sure what caused this, can you do a bisect?

Hi Aaron!
Thanks for looking into it. I can compile a kernel, but I have never done a bisect. Can you suggest a tutorial or something on the best way to do it?
I guess I should try to compile different git revisions between 3.17 and 3.18RC1. That is a LOT of commits :-) Any idea where to look for the culprit?
Thanks in advance!

(In reply to Attila from comment #17)
> Hi Aaron !
>
> Thanks for looking into it. I can compile a kernel, but never did a bisect.
> Can you suggest a tutorial or something what is the best way to do ?

Take a look at this:
http://git-scm.com/docs/git-bisect

>
> I guess, I should try to compile different git revisions between 3.17 and
> 3.18RC1. That is a LOT of commits :-) Any idea where to look for the culprit
> ?

v3.16 is OK and v3.18 is bad. I think you can first test v3.17. If v3.17 is good, then bisect between v3.17..v3.18; if v3.17 is bad, then bisect between v3.16..v3.17. Anyway, it will take a long time.
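For anyone comparing their own turbostat runs against the two summary rows quoted above, here is a minimal Python sketch (not part of the original report) that pairs each column name with its value and prints the package C-state residencies side by side. The helper names are made up for illustration; the header and rows are copied from the comment above.

```python
# Pair turbostat column names with values and compare package residency.
# Helper names are hypothetical; the data is copied from the rows quoted above.

HEADER = ("Core CPU Avg_MHz %Busy Bzy_MHz TSC_MHz SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 "
          "CoreTmp PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt")
ROW_318 = "- - 3 0.11 2219 1995 0 0.17 0.00 1.06 98.66 39 41 40.07 58.79 0.00 0.00 3.34 0.01 0.00"
ROW_316 = "- - 2 0.19 908 1995 0 0.30 0.01 0.01 99.48 37 42 34.64 0.44 63.05 0.00 2.84 0.01 0.00"


def parse_turbostat_summary(header, row):
    """Return {column name: value string} for one turbostat summary row."""
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))


def package_columns(header, row):
    """Keep only the package residency and package power columns, as floats."""
    cols = parse_turbostat_summary(header, row)
    return {name: float(value) for name, value in cols.items()
            if name.startswith("Pkg%pc") or name == "PkgWatt"}


good = package_columns(HEADER, ROW_316)
bad = package_columns(HEADER, ROW_318)
for name in good:
    print(f"{name:8s} v3.16={good[name]:6.2f}  v3.18.3={bad[name]:6.2f}")
```

The comparison makes the shift visible at a glance: the residency that v3.16 spends in Pkg%pc6 shows up under Pkg%pc3 on v3.18.3.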
Let's see if Len Brown has any better idea here:

Len,
Attila's system spends less time in the Pkg%pc6 state and that causes some more power consumption. Do you have any idea what might be the cause?

(In reply to Aaron Lu from comment #18)
>
> Len,
> Attila's system stays less time in Pkg%pc6 state and that caused some more
> power consumption. Do you have any idea what might be the cause?

The only reason I can see at the moment is that in the newer kernels the governor's next event time predictions are consistently less than the state's wakeup latency. That may result from one of two things: either the governor's prediction algorithm has changed, or the state definitions have changed to that effect.

What combination of the cpuidle driver/governor is used? intel_idle/menu or something else?

I have seen the same power regression. But for me it is due to a firmware update that was installed as part of the latest OS X update. I have an enabled GPE that triggers a lot...
It is visible if you run:
grep . -r /sys/firmware/acpi/interrupts/
For me it is gpe06; others have reported gpe66.
/sys/firmware/acpi/interrupts/gpe06: 871215 enabled

(In reply to Johan Olby from comment #20)
> I have seen the same power regression. But for me it is due to a firmware
> update thet were installed as a part of the lates os x update.
> for me it is gpe06 others has reported gpe66
> /sys/firmware/acpi/interrupts/gpe06: 871215 enabled

Thanks for the info. I have the line:
/sys/firmware/acpi/interrupts/gpe06: 8 enabled
The other enabled interrupt was:
/sys/firmware/acpi/interrupts/gpe17: 14699 enabled
I have not really updated OS X lately. How can I know whether I have the same problem?

I get a couple of million wakeups due to the GPE handler within a minute after system start. So our issues are not the same if gpe17 with 14699 events is the highest number.
Running:
perf top
shows ACPI methods with high overhead until I disable the GPE.
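To check for a GPE storm like the one described above, the grep output can be ranked by count. The following is a rough Python sketch under stated assumptions: the function names are hypothetical, and the parser only assumes the "path: count status" layout seen in the lines quoted in this thread.

```python
import re

# Rank enabled GPEs by event count, given text in the form printed by
#   grep . -r /sys/firmware/acpi/interrupts/
# Function names are hypothetical; sample lines are copied from this thread.

GPE_LINE = re.compile(r"^(?:.*/)?(gpe[0-9A-Fa-f]{2}):\s*(\d+)\s+(.*)$")


def parse_gpe_counts(grep_output):
    """Return {gpe name: (count, status text)} for every gpeXX line."""
    counts = {}
    for line in grep_output.splitlines():
        match = GPE_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = (int(match.group(2)), match.group(3).strip())
    return counts


def busiest_enabled_gpe(counts):
    """Return (name, count) of the enabled GPE with the most events, or None."""
    enabled = {name: count for name, (count, status) in counts.items()
               if "enabled" in status.split()}
    if not enabled:
        return None
    return max(enabled.items(), key=lambda item: item[1])


johan = "/sys/firmware/acpi/interrupts/gpe06: 871215 enabled"
attila = ("/sys/firmware/acpi/interrupts/gpe06: 8 enabled\n"
          "/sys/firmware/acpi/interrupts/gpe17: 14699 enabled")

print(busiest_enabled_gpe(parse_gpe_counts(johan)))
print(busiest_enabled_gpe(parse_gpe_counts(attila)))
```

Counts are cumulative since boot, so a single snapshot only hints at a storm; taking two snapshots some seconds apart and comparing them gives a rate.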
(In reply to Johan Olby from comment #22)
> I get a couple of million wakeups due to the gpe handler within a minute
> after system start. So our issues is not the same if the gpe17 with 14699
> events is the highest number.

Correct. My issue is different. I tried to disable these interrupts, but nothing changed in power consumption. Something changed between 3.17 and 3.18rc1 that prevents my CPU from going into pc6.

(In reply to Rafael J. Wysocki from comment #19)
> What combination of the cpuidle driver/governor is used? intel_idle/menu or
> something else?

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
intel_pstate
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave

I have not changed the default values coming with Ubuntu 15.04.

I was talking about cpuidle, not about cpufreq.

Also it is correct that the difference may be due to an increased number of wakeups, in which case the processor will wake up from idle too often for the governor to even consider C6 (which then autopromotes to PC6 in the right conditions).

It looks like bisection would be the most straightforward way to determine what change caused that.

Before that, have you tried 3.19-rc7 to see if the issue is still there?

(In reply to Rafael J. Wysocki from comment #25)
> I was talking about cpuidle, not about cpufreq.
>
> Also it is correct that the difference may be due to the increased number of
> wakeups in which case the processor will wake up from idle to often for the
> governor to even consider C6 (which then autoporomotes to PC6 in the right
> conditions).
>
> It looks like bisection would be the most straightforward way to determine
> what change caused that.
>
> Before that, have you tried 3.19-rc7 to see if the issue is still there?

Tried 3.19-rc7 with the same results.

I checked cpuidle. I am using intel_idle/menu.
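Since cpufreq and cpuidle were mixed up above, here is a small Python sketch of where the cpuidle driver/governor pair can be read from. It assumes the usual sysfs files /sys/devices/system/cpu/cpuidle/current_driver and current_governor_ro; the function name is hypothetical, and the demo runs against a fake directory so it works on any machine.

```python
import tempfile
from pathlib import Path

# Read the cpuidle (not cpufreq) driver and governor from sysfs.
# read_cpuidle_pair is a hypothetical helper name.


def read_cpuidle_pair(base="/sys/devices/system/cpu/cpuidle"):
    """Return (driver, governor); an entry is None if its file is missing."""
    base = Path(base)

    def read(name):
        path = base / name
        return path.read_text().strip() if path.is_file() else None

    driver = read("current_driver")
    # Assumption: the governor is exposed as current_governor_ro, with
    # current_governor as a fallback name on kernels that provide it.
    governor = read("current_governor_ro") or read("current_governor")
    return driver, governor


# Demo against a fake tree that mimics the intel_idle/menu answer above.
with tempfile.TemporaryDirectory() as fake:
    Path(fake, "current_driver").write_text("intel_idle\n")
    Path(fake, "current_governor_ro").write_text("menu\n")
    demo = read_cpuidle_pair(fake)

print(demo)
```

On a real system, calling read_cpuidle_pair() with no argument reads the live values.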
What I don't understand (sorry for being a noob here) is that, according to the turbostat output, the CPU spends most of the time in the C7 state, but it NEVER gets to PC6:

CPU%c1 CPU%c3 CPU%c6 CPU%c7 Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt
0.17   0.00   1.06   98.66  40.07   58.79   0.00    0.00    3.34

Isn't that odd?

It looks like it never goes to PC7, even with 3.16, though.

Can you please run powertop with 3.17 and 3.18 and see if there are any obvious differences in the numbers of wakeups reported by it?

(In reply to Rafael J. Wysocki from comment #27)
> Can you please run powertop with 3.17 and 3.18 and see if there are any
> obvious differences in the numbers of wakeups reported by it?

Done. No difference in wakeups/s. In fact 3.18 had fewer (25-35) compared to 3.17's 50-60.

I started the bisect. At the halfway point between v3.17 and v3.18-rc1 (git rev: 35a9ad8) it is fine.

Attila, any update?

(In reply to Zhang Rui from comment #29)
> Attila, any update?

I am slowly approaching. 8 bisect steps to go. I can provide the info tonight, in about 16 hours from now, when I get home and finish the bisect.

Still a few steps to go, but what is already certain is that the problem starts with
"Merge tag 'pm+acpi-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm"
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=b528392669415dc1e53a047215e5ad6c2de879fc
I am going forward to narrow it down to one commit out of the 105 in this pull.

Strange result, but git bisect shows this commit to be the first bad one:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7bc5a2bad0b8d9d1ac9f7b8b33150e4ddf197334
I guess this ACPI change means that the CPU is not allowed to go into deeper states than pc3.
I will do a build of 3.18 with this commit reverted to make sure this was the problem.
I will attach a bisect log soon.

Created attachment 166431 [details]
Git bisect log
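As a rough guide to how long such a bisect takes: a bisection halves the candidate range at every build, so narrowing down one commit out of the 105 in the pull mentioned above needs about ceil(log2(105)) = 7 builds. The Python sketch below illustrates this on a linear list of commits; it is a simplification (real kernel history is a graph with merges, and git chooses the midpoints itself), and all names in it are hypothetical.

```python
import math

# Toy model of a bisect over a linear history; names are hypothetical.


def bisect_steps(n_commits):
    """Upper bound on builds needed to isolate one commit out of n_commits."""
    return math.ceil(math.log2(n_commits)) if n_commits > 1 else 0


def first_bad(commits, is_bad):
    """Binary search for the first bad commit; the last commit must be bad."""
    lo, hi = 0, len(commits) - 1      # the first bad index lies in [lo, hi]
    builds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        builds += 1                   # one kernel build and boot per test
        if is_bad(commits[mid]):
            hi = mid                  # culprit is mid or earlier
        else:
            lo = mid + 1              # culprit is after mid
    return commits[lo], builds


commits = list(range(105))            # stand-ins for the 105 commits in the pull
culprit = 41                          # arbitrary position chosen for the demo
found, builds = first_bad(commits, lambda c: c >= culprit)
print(found, builds, bisect_steps(len(commits)))
```

Each step costs a full kernel build and reboot, which is why the bisect in this report took several days.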
Not as strange as it seems.

Does it help if you comment out the

    if (!strcmp("Darwin", interface)) {
        /*
         * Apple firmware will behave poorly if it receives positive
         * answers to "Darwin" and any other OS. Respond positively
         * to Darwin and then disable all other vendor strings.
         */
        acpi_update_interfaces(ACPI_DISABLE_ALL_VENDOR_STRINGS);
        supported = ACPI_UINT32_MAX;
    }

block in acpi_osi_handler() (drivers/acpi/osl.c) in 3.19?

Or try to pass acpi_os="!Darwin" to the kernel in the command line. That should have the same effect as the above.

(In reply to Rafael J. Wysocki from comment #35)
> Or try to pass acpi_os="!Darwin" to the kernel in the command line. That
> should have the same effect as the above.

Argh, typo. acpi_osi="!Darwin" is the correct one, sorry.

Well. Reverting the whole commit works, but just putting acpi_os="!Darwin" in the boot parameters does NOT!
Maybe this is not the problematic part of that commit. There is also this part:

    +     * Apple always return failure on _OSC calls when _OSI("Darwin") has
    +     * been called successfully. We know the feature set supported by the
    +     * platform, so avoid calling _OSC at all
    +     */
    +
    +    if (dmi_match(DMI_SYS_VENDOR, "Apple Inc.")) {
    +        root->osc_control_set = ~OSC_PCI_EXPRESS_PME_CONTROL;
    +        decode_osc_control(root, "OS assumes control of",
    +                           root->osc_control_set);
    +        return;
    +    }
    +
    +    /*

I am doing a build to check which part of the commit is causing the problem.

(In reply to Attila from comment #37)
> Well. Reverting the whole commit works, but just putting acpi_os="!Darwin"
> in the boot parameters does NOT!

Please see comment #36. That should be

acpi_osi="!Darwin"

(the second "i" in the option name being essential).

I did further tests.
Results:

drivers/acpi/pci_root.c   drivers/acpi/osl.c   Result
unchanged                 reverted             BUG persists
reverted                  unchanged            BUG solved
unchanged                 unchanged            BUG persists
reverted                  reverted             BUG solved

So indeed the problem is when this if statement code runs:

    if (!strcmp("Darwin", interface)) {
        /*
         * Apple firmware will behave poorly if it receives positive
         * answers to "Darwin" and any other OS. Respond positively
         * to Darwin and then disable all other vendor strings.
         */
        acpi_update_interfaces(ACPI_DISABLE_ALL_VENDOR_STRINGS);
        supported = ACPI_UINT32_MAX;
    }

I was unable to pass the kernel parameter correctly. I tried all kinds of escaping without any luck. Any idea how to put this into /etc/default/grub? I tried:

GRUB_CMDLINE_LINUX='acpi_osi=!Darwin'
GRUB_CMDLINE_LINUX='acpi_osi="!Darwin"'
GRUB_CMDLINE_LINUX='acpi_osi=\"!Darwin\"'
GRUB_CMDLINE_LINUX='acpi_osi=\\"!Darwin\\"'
GRUB_CMDLINE_LINUX='acpi_osi=\\\"!Darwin\\\"'

dmesg |grep OSI shows that I "ADDED" \"!Darwin\" to the system :-)
I will try to put it in manually when I get home, but so far there is no way I can do it over ssh.

(In reply to Rafael J. Wysocki from comment #38)
> Please see comment #36. That should be
>
> acpi_osi="!Darwin"
>
> (the second "i" in the option name being essential).

Now I am officially stuck with removing Darwin with a kernel parameter. I tried EVERYTHING already.
I can remove "Windows 2006" with the grub line:
GRUB_CMDLINE_LINUX='acpi_osi="!Windows 2006"'

cat /proc/cmdline outputs:
BOOT_IMAGE=/boot/vmlinuz-3.19.0-git root=UUID=5185e8da-7372-4a38-afd4-65cdcb2ac09f ro "acpi_osi=!Windows 2006" quiet splash vt.handoff=7

dmesg |grep OSI outputs:
[ 0.321788] ACPI: Added _OSI(Module Device)
[ 0.321790] ACPI: Added _OSI(Processor Device)
[ 0.321792] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.321793] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.321794] ACPI: Deleted _OSI(Windows 2006)
[ 0.329275] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored

But when I try it with !Darwin it NEVER works.

Grub line:
GRUB_CMDLINE_LINUX='acpi_osi="!Darwin"'

cat /proc/cmdline outputs:
BOOT_IMAGE=/boot/vmlinuz-3.19.0-git root=UUID=5185e8da-7372-4a38-afd4-65cdcb2ac09f ro acpi_osi=!Darwin quiet splash vt.handoff=7

dmesg |grep OSI outputs:
[ 0.302015] ACPI: Added _OSI(Module Device)
[ 0.302017] ACPI: Added _OSI(Processor Device)
[ 0.302019] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.302020] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.309294] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored

Darwin is not deleted and the package C-states only reach pc3. The extra log I added there confirms that the problematic code gets executed. I even tried typing it in manually while booting, with all possible escaping, without any luck.

The only line that works is
GRUB_CMDLINE_LINUX='acpi_osi='

This way dmesg |grep OSI gives:
[ 0.000000] ACPI: _OSI method disabled
[ 0.302141] ACPI: Added _OSI(Module Device)
[ 0.302143] ACPI: Added _OSI(Processor Device)
[ 0.302144] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.302146] ACPI: Added _OSI(Processor Aggregator Device)

And the CPU reaches the PC6 state. But I think removing all OSI strings is not a good workaround. Any suggestions?

Following from my previous post, I discovered why the acpi_osi="!Darwin" parameter does not work.
It is because "Darwin" has never been defined in the default supported interfaces:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/acpica/utosi.c?id=refs/tags/v3.19#n87
So the function acpi_remove_interface(str) here
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/osl.c?id=refs/tags/v3.19#n1523
ALWAYS returns a failure. So currently there is no way to explicitly remove Darwin only.

I guess the right way would be to define Darwin as a supported interface, so it can be enabled or disabled later, and to check that state in the problematic if statement.

So to summarize for other users who are also affected by this bug:
- Affected hardware: Intel CPU based Apple products
- Situation before 3.18: Apple Thunderbolt was broken with default kernel parameters, but power consumption was a lot better because the CPU could enter deeper power-saving states.
- Situation after 3.18: Apple Thunderbolt is working with default kernel parameters, but power consumption is increased because the CPU only enters a shallower power-saving state.

Temporary workaround: disable the ability of the kernel to report "Darwin" to Apple firmware by putting 'acpi_osi=' into the kernel command line as a parameter. On most common distributions, you have to put this line into the default grub configuration file (on Ubuntu it is located at /etc/default/grub):
GRUB_CMDLINE_LINUX='acpi_osi='

Note that with this workaround Thunderbolt will be broken again. Hopefully the kernel devs will have a solution in the future that gives both Thunderbolt and the best possible power consumption with default kernel parameters.

Created attachment 171871 [details]
turbostat utility
please show the output from the attached turbostat utility
# turbostat --debug sleep 10 2>&1 | tee ts.out
it will show what C-states are enabled in hardware by the BIOS
eg. for another system, pc6 is enabled:
cpu0: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x00008403 (locked: pkg-cstate-limit=3: pc6)
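The example line above can be decoded by hand. The Python sketch below is an illustration rather than turbostat's actual code: it assumes the lock flag is bit 15 and that the package C-state limit sits in the low 3 bits of the register (the field width varies by CPU model), and the function names are hypothetical. Which package state a given limit value stands for is also model-specific; the "pc6" label here is simply what turbostat printed for that other system.

```python
import re

# Decode the package C-state config MSR value from a turbostat --debug line.
# Assumptions: lock flag in bit 15, limit field in the low 3 bits.
LOCK_BIT = 1 << 15
LIMIT_MASK = 0x7


def parse_turbostat_msr(line):
    """Extract the hexadecimal MSR value from a 'cpuN: NAME: 0x...' line."""
    match = re.search(r":\s*0x([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("no hexadecimal MSR value found")
    return int(match.group(1), 16)


def decode_pkg_cst_cfg(msr, limit_mask=LIMIT_MASK):
    """Return the lock flag and the raw package C-state limit field."""
    return {"locked": bool(msr & LOCK_BIT),
            "pkg_cstate_limit": msr & limit_mask}


line = "cpu0: MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x00008403 (locked: pkg-cstate-limit=3: pc6)"
decoded = decode_pkg_cst_cfg(parse_turbostat_msr(line))
print(decoded)
```

When the register is locked, the limit was fixed by the firmware before the OS started and cannot be raised from Linux afterwards.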
Created attachment 172201 [details]
turbostat output with "acpi_osi=" in kernel command line
Created attachment 172211 [details]
turbostat output with NO "acpi_osi=" in kernel command line
Added the output of turbostat for both cases. The only noticeable difference (besides the different power-saving states) is this line:
cpu0: MSR_IA32_POWER_CTL: 0x0004005f (C1E auto-promotion: ENabled)
It is DISabled when we have the increased power consumption. I think this is important.
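For reference, the "C1E auto-promotion" text comes from a single bit of that register. The Python sketch below assumes it is bit 1 of MSR_IA32_POWER_CTL, which matches the value quoted above; the function name is hypothetical, and the "disabled" value is constructed for illustration only (it is not taken from the attachments).

```python
# Check the C1E auto-promotion flag in an MSR_IA32_POWER_CTL value.
# Assumption: the flag is bit 1 of the register.
C1E_ENABLE_BIT = 1 << 1


def c1e_auto_promotion_enabled(power_ctl):
    """True if the C1E auto-promotion bit is set in the given MSR value."""
    return bool(power_ctl & C1E_ENABLE_BIT)


reported = 0x0004005f                          # value quoted in the comment above
hypothetical_off = reported & ~C1E_ENABLE_BIT  # same value with only bit 1 cleared

print(hex(reported), c1e_auto_promotion_enabled(reported))
print(hex(hypothetical_off), c1e_auto_promotion_enabled(hypothetical_off))
```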
The high power case gets only into pc3,
while the low power case gets into pc6.
This may be related to enabling thunderbolt.
powertop may help debug that.
You can show what that utility sees with "powertop --html".
I can't explain why C1E promotion would not be disabled
for both cases, as "dmesg | grep idle" shows that you
are running the intel_idle driver in both cases.
That is a mystery, but compared to PC3 vs PC6, it shouldn't
be material to power savings when profoundly idle.
A bigger factor may be this difference:
< cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000006 (balanced)
> cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000000 (performance)
I can't explain why they would be different in your different
scenarios. If the platform boots with EPB 0, the kernel sets it to 6.
I don't know why this didn't happen, or why it got un-done
in the high power case.
Please run
# x86_energy_perf_policy -v normal
and re-measure.
If your distro doesn't have this utility, then you can
get it from the Linux kernel source tree:
tools/power/x86/x86_energy_perf_policy/
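To make the two ENERGY_PERF_BIAS values in the diff above easier to read: the hint is a 4-bit number, where 0 is the performance end of the scale and 15 the power-saving end. The Python sketch below uses the three named settings of x86_energy_perf_policy (performance = 0, normal = 6, powersave = 15); turbostat labels the value 6 as "balanced". The function name is hypothetical.

```python
# Name the energy/performance bias hint found in MSR_IA32_ENERGY_PERF_BIAS.
# Names follow x86_energy_perf_policy; turbostat prints "balanced" for 6.
EPB_NAMES = {0: "performance", 6: "normal", 15: "powersave"}


def describe_epb(msr_value):
    """Return (hint, name) for the low 4 bits of the MSR value."""
    hint = msr_value & 0xF
    return hint, EPB_NAMES.get(hint, "custom")


for value in (0x00000006, 0x00000000, 0x0000000f):
    print(hex(value), describe_epb(value))
```

Any value other than the three named ones is reported as "custom" here.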
This bug is also confirmed on the Macbook Air 6,2. Adding "acpi_osi=" to the kernel cmdline helps.
On the Core i7:
the low power case gets to pc7.
the high power case only gets to pc3.
I bisected the kernel and got to the same commit: 7bc5a2bad0b8d9d1ac9f7b8b33150e4ddf197334 is the first bad commit.
Also reported here:
https://askubuntu.com/questions/617413/why-ubuntu-15-04-with-kernel-3-19-0-15-is-using-significantly-more-power

(In reply to Len Brown from comment #47)
> That is a mystery, but compared to PC3 vs PC6, it shouldn't
> be material to power savings when profoundly idle.

In Nir's case, there was actually much more load in the "profoundly idle" case. He has 4 CPUs and it was always CPU3 that was busier, and it never had a long "duration" (time between executions of the intel_pstate driver). That is something I have never seen before on any other "profoundly idle" system. That being said, I don't have a similar trace for the "good" kernel for a complete comparison.

Nir,
Thanks for confirming that 7bc5a2bad0b8d9d1ac9f7b8b33150e4ddf197334 "ACPI: Support _OSI("Darwin") correctly" causes this regression.

Attila,
Thanks for exposing that the same commit hard-coded an OSI string, but doesn't honor acpi_osi=!Darwin and does not honor acpi_osi=!*, and that disabling OSI capability completely with acpi_osi= is necessary to restore the power level on the system.

Nir,
Can you show turbostat --debug output with and without acpi_osi= to see if ENERGY_PERF_BIAS is also an issue on the macbook air?
If yes, please run the test requested in comment #47 to see if manually changing ENERGY_PERF_BIAS helps.
Beware that the BIOS may have a different default for this MSR depending on whether the system is on AC or DC power.

Also, for both systems, please attach the output from acpidump, so we can see what _OSI("Darwin") actually does.

The original cause for the regression was to power-on Thunderbolt -- it is also possible that we simply have to choose between enabling thunderbolt and power saving.
Created attachment 177061 [details]
turbostat output, 3 scenarios
Comment on attachment 177061 [details]
turbostat output, 3 scenarios
Len,
Please see the attached info, I hope this helps. On my end, as far as battery life goes, newer kernels are much worse. This bug is only one cause, and there are others which I could not yet figure out. The first situation in the attachment is the best case so far.
This is interesting:
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0000000f (custom)
Nir: It might be worth running "powertop --html" as per Len's comment 47 anyhow.

Doug:
The MSR_IA32_ENERGY_PERF_BIAS: 0x0000000f was set after I ran `x86_energy_perf_policy powersave` once. I'm now using 'normal' and see little difference.
I did run `powertop --html`, and will run it again. Is there any particular information you'd like to see, or should I just attach all of it, for the 3 scenarios?

FYI, manually changing MSR_IA32_ENERGY_PERF_BIAS using `x86_energy_perf_policy normal` does not help when "acpi_osi=" isn't present in the cmdline.

(In reply to Nir from comment #54)
> I did run `powertop --html`, and will run it again. Is there any particular
> information you'd like to see, or just attach all of it, for the 3 scenarios?

I didn't see the data from your previous run, I must have missed it somewhere. I do not know what I'd like to see until I see it. It is just that I was unable to determine (both on this bug report and another) what is going on from the trace data. So Len's suggestion seemed like a good one. Yes, all 3 scenarios.

>
> FYI, manually changing MSR_IA32_ENERGY_PERF_BIAS using
> `x86_energy_perf_policy normal` does not help when "acpi_osi=" isn't present
> in the cmdline.

The effect seems rather subtle in powersave mode and there is no effect in performance mode. I'll attach a graph in a moment.

Created attachment 177501 [details]
Compare various settings of energy bias
Just comparing the response versus some settings for energy perf bias. Ignore the "doug 0 6" line, as it is related to my proposed patch set.
Doug, how is the graph in comment 56 related to this issue?
It has lots of ink on it, but I don't understand what that ink means.

> cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x0000000f (custom)
BTW. this is a typo in turbostat, it should print "(powersave)".
I will fix that.
(In reply to Len Brown from comment #57)
> Doug, how is the graph in comment 56 related to this issue?
> It has lots of ink on it, but I don't understand that that ink means.

Len: The short answer is that I don't know that the graph is related to this issue. The longer answer is that on this, and other, bug reports I have been noticing that the state of MSR_IA32_ENERGY_PERF_BIAS seems inconsistent and was, perhaps, a reason for some of the increased power consumption being observed. I did the graph just to know what we really get for the various settings. Subsequently, I did another graph using fixed work packet mode (I'll post it in a moment), instead of fixed load mode.

Conclusion: The state of MSR_IA32_ENERGY_PERF_BIAS seems, to me at least, to be a red herring, as in the end it makes no difference.

Created attachment 177961 [details]
shows differences (or lack thereof) in response to various energy bias
This graph uses fixed work packets, which is more representative of real world scenarios. In the end, there is virtually no difference, and the conclusion is that the perf bias register state isn't a contributor.
Poking through /sys/firmware/acpi produces the following diff:

diff -Nur nodarwin/acpi/interrupts/gpe4E darwin/acpi/interrupts/gpe4E
--- nodarwin/acpi/interrupts/gpe4E 2015-05-27 16:49:00.907552831 +0200
+++ darwin/acpi/interrupts/gpe4E 2015-05-27 16:51:40.724663610 +0200
@@ -1 +1 @@
- 1237 enabled
+ 1712 enabled
diff -Nur nodarwin/acpi/interrupts/gpe_all darwin/acpi/interrupts/gpe_all
--- nodarwin/acpi/interrupts/gpe_all 2015-05-27 16:49:00.911552972 +0200
+++ darwin/acpi/interrupts/gpe_all 2015-05-27 16:51:40.724663610 +0200
@@ -1 +1 @@
- 1239
+ 1714
diff -Nur nodarwin/acpi/interrupts/sci darwin/acpi/interrupts/sci
--- nodarwin/acpi/interrupts/sci 2015-05-27 16:49:00.911552972 +0200
+++ darwin/acpi/interrupts/sci 2015-05-27 16:51:40.724663610 +0200
@@ -1 +1 @@
- 1239
+ 1714
diff -Nur nodarwin/acpi/interrupts/sci_not darwin/acpi/interrupts/sci_not
--- nodarwin/acpi/interrupts/sci_not 2015-05-27 16:49:00.911552972 +0200
+++ darwin/acpi/interrupts/sci_not 2015-05-27 16:51:40.724663610 +0200
@@ -1 +1 @@
- 2
+ 0

For starters, I'll write a patch to support acpi_osi=!darwin, as mjg's original patch did not correctly support that.

My two cents:

3.17.6-1
> Decent battery life out of the box
> Thunderbolt working, hotplug working, no HDMI sound

4.1.4
> Decent battery life using acpi_osi=
> Thunderbolt, hotplug and HDMI sound all working - even with acpi_osi=

Some more info:
* Latest Macbook Air (early 2015): Apple Inc. 1.0 MacBookAir7,1
* BIOS: MBA71.88Z.0166.B02.1503241251
* Latest Arch Kernel: Linux 4.1.4-1-ARCH

Hi,
I have not tried older kernels (before 3.18) on my MacBook Pro 11,3, but in my opinion the problem is that now, when Thunderbolt is enabled, its ASPM is in the disabled state. I also had no package C6 state residency in recent kernels (without using acpi_osi=), but I have tried enabling ASPM with the following two commands.
setpci -s 06:00.0 0xd0.B=0x43
setpci -s 07:06.0 0xd0.B=0x43

After issuing those two commands my C6 residency came back to 70-80% on battery, and now I get 7-9W usage in idle (gnome-terminal, with the NVIDIA card powered off with gmux commands).

MacBook Pro (mid-2014) 11,3
Linux 4.2.2-1-ARCH #1 SMP PREEMPT Tue Sep 29 22:21:33 CEST 2015 x86_64 GNU/Linux
additional cmdline params: elevator=noop i915.enable_fbc=1 i915.lvds_downclock=1

lspci excerpt regarding Thunderbolt devices:
06:00.0 PCI bridge: Intel Corporation Device 156d
07:00.0 PCI bridge: Intel Corporation Device 156d
07:03.0 PCI bridge: Intel Corporation Device 156d
07:04.0 PCI bridge: Intel Corporation Device 156d
07:05.0 PCI bridge: Intel Corporation Device 156d
07:06.0 PCI bridge: Intel Corporation Device 156d

Hi,
I too noticed the problem on my MacBookPro11,3. I also noticed that the laptop would resume from suspend by itself at regular intervals and very quickly, then go back to sleep. I haven't had the occasion to look into it (analyze_suspend.py crashes when it tries to close the fd, I have to give it another go), but I tried enabling ASPM using the commands given by Piotr and left my laptop in suspend for the night. I lost 3% of battery (I don't have a value in mA so I don't know if it means much).

Piotr: How are you able to check the state residency of your CPU? And how did you measure the consumption? A wattmeter?

Linux 4.2.3-mbpr #1 SMP Sat Oct 10 12:44:33 CEST 2015 x86_64 GNU/Linux
cmdline: rootflags=data=writeback acpi_osi=Darwin libata.force=noncq

According to Len's suggestion in comment #62, a patch might be needed to first fix the 'regression' that acpi_osi=!Darwin does not work.
Hi Attila, Nir, could you please help check whether the patch reduces the energy consumed if acpi_osi=!Darwin is provided? I recently got a Mac Pro, so I'll try to do some investigation and reproduce the problem on this platform.

Created attachment 202011 [details]
disable darwin if acpi_osi=!darwin is provided in command line
(In reply to Chen Yu from comment #66)
> According to Len's suggestion at #Comment 62, a patch might be needed to
> first fix the 'regression' that acpi_osi=!Darwin does not work problem.
> Hi Attila, Nir, could you please help check if the patch help drop the
> energe consumed if acpi_osi=!Darwin is provided? And I recently get a Mac
> pro, I'll try to do some investigation/reproduce the problem on this
> platform.

Sorry guys, but I really don't have any free time nowadays. Also my test environment is gone as I had to re-install the OS. Could someone else help out with testing this patch?

(In reply to Attila from comment #68)
> (In reply to Chen Yu from comment #66)
> > According to Len's suggestion at #Comment 62, a patch might be needed to
> > first fix the 'regression' that acpi_osi=!Darwin does not work problem.
> > Hi Attila, Nir, could you please help check if the patch help drop the
> > energe consumed if acpi_osi=!Darwin is provided? And I recently get a Mac
> > pro, I'll try to do some investigation/reproduce the problem on this
> > platform.
>
> Sorry guys, but I really don't have any free-time nowadays. Also my test
> environment is gone as I had to re-install the OS. Could someone else help
> out testing this patch ?

Never mind, I'll double check on my side. BTW, with regard to your original report, did it occur when the whole system was idle?

(In reply to Chen Yu from comment #69)
> Never mind, I'll make a double check on my side. BTW, with regard to your
> original report, did it occur when the whole system was in idle?

Yes, it was in the idle state, after leaving the laptop to idle down for 2-3 minutes after boot (this was always needed at every reboot), but note that the difference was also noticeable under load.

It would be helpful if an ACPI dump could be attached to this bug for each affected model.

The real solution to this bug would be to add runtime pm to the thunderbolt driver.
I have a thunderbolt branch on GitHub which adds support for more controllers and I've pretty much figured out how to add runtime pm for the 1st gen Light Ridge controller. Apple provides ACPI methods to power the controller up and down, but they're different for each controller and I would need an ACPI dump to come up with patches for other controllers:
https://github.com/l1k/linux/commits/thunderbolt

Created attachment 202721 [details]
acpidump
acpidump from Mac Pro 12.1, i5, 2.7GHz, memory 8G
Besides, here's the link to the patch for disabling Darwin from the command line:
https://patchwork.kernel.org/patch/8185441/

(In reply to Attila from comment #0)
> I realized that when upgrading the kernel from 3.17.7 to 3.18.3 my Intel HSW
> Iris Pro based Macbook pro draws 2.5W more (15.8W instead of 13.3W) energy.
> That is almost 20%. Battery time is of course much less in this case.
>
> I tried 3.18RC1 as well and it already had the problem. 3.19RC6 still has
> the problem.
>
> I used Ubuntu Vivid and downloaded kernels from the Ubuntu Kernel PPA.
>
> I even tried to boot into debug mode with most of the modules disabled. The
> Power difference was less, but still 1.3W.
>
> For power measure I used Powertop unplugged from AC.
>
> Is this a known problem, or just related to hardware ?
>
> Thank you for the help in advanced !
>
> Attila

BTW, Attila, how do you measure power consumption with Powertop? Can you provide your command? Thanks.

update:
There is a better solution for acpi_osi=!Darwin, with regard to the patch I sent previously, and Lv is planning to take over the fix for !Darwin.

update:
Per Lv's suggestion, the new solution for !Darwin would look like:
1. revert Matthew's patch
2. improve acpi_osi=Linux/Darwin, automatically do acpi_osi=! for them (meanwhile add an entry in acpi_default_supported_interfaces)

I've implemented runtime pm for Thunderbolt now, this should at least partially fix the power regression. Maybe we don't need to change the OSI behaviour at all if this gets merged?

https://github.com/l1k/linux/commits/thunderbolt

It would be great if others could test it. So far I've only tested it with the 1st gen Light Ridge controller built into my MBP; it's unclear if it works with the Cactus Ridge and Falcon Ridge built into newer machines.

The version on GitHub has one minor annoyance: it runtime resumes before system suspend and before shutdown. I have fixed this in my local repo and will push it to GitHub in a bit.
I had to make some changes to PCI core but I think I will be able to rework the patch to do without that. I've just posted an initial version of runtime pm for thunderbolt.ko to linux-pci, linux-acpi, linux-pm. It would be good if someone with a Cactus Ridge or Falcon Ridge controller could test this as I only have a machine with an older Light Ridge available for testing. The patches work fine on that machine, I'm seeing a 1.5 W drop in powertop once the controller is powered down. The patches can be fetched from GitHub or as a tarball, they apply cleanly to 4.5: https://github.com/l1k/linux/commits/thunderbolt_runpm_v1 http://wunner.de/thunderbolt_runpm_v1.tar.gz (In reply to Lukas Wunner from comment #78) > It would be good if someone with a Cactus Ridge or Falcon Ridge controller > could test this as I only have a machine with an older Light Ridge available > for testing. The patches work fine on that machine, I'm seeing a 1.5 W drop > in powertop once the controller is powered down. Tested on a 2013 MBA with Cactus Ridge. On bootup there's no change at all, but if I plug in the ethernet adapter and remove it, PM seems to kick in. We're talking about ~11.3W (bootup), ~13W (adapter plugged in without any connection), ~9.3W (after removing the adapter). The even more interesting part is that after plugging in the adapter again, consumption only goes up to ~10.4W instead of the initial value. Since on of the recent releases (unsure which one), setting `acpi_osi=` seems to have introduced a new regression; the battery device is no longer created in /sys/class/power_supply, so I can't tell how much power I'm using/remaining battery. This is reproducible in Linux 4.5.0. (In reply to Hugo Osvaldo Barrera from comment #80) > Since on of the recent releases (unsure which one), setting `acpi_osi=` > seems to have introduced a new regression; the battery device is no longer > created in /sys/class/power_supply, so I can't tell how much power I'm > using/remaining battery. 
>
> This is reproducible in Linux 4.5.0.

I checked the commit log for osl.c, but cannot find any changes directly related to your issue. Could you please help by doing a bisect for us? Thx.

(In reply to Lukas Wunner from comment #77)
> I've implemented runtime pm for Thunderbolt now, this should at least
> partially fix the power regression. Maybe we don't need to change the OSI
> behaviour at all if this gets merged?
>
> https://github.com/l1k/linux/commits/thunderbolt

IMO, Matthew's commit breaks acpi_osi= behavior, and thus needs to be reverted. It's not related to the gap. The same functionality should be achieved in a different way.

Thanks and best regards
-Lv

@Lv Zheng: I'm fine with that as long as the default behaviour on Macs is to masquerade as Darwin (as it is now). Otherwise the Thunderbolt controller isn't accessible at all. (It's powered down on boot if the OS isn't Darwin.)

(In reply to Lukas Wunner from comment #83)
> @Lv Zheng: I'm fine with that as long as the default behaviour on Macs is to
> masquerade as Darwin (as it is now). Otherwise the Thunderbolt controller
> isn't accessible at all. (It's powered down on boot if the OS isn't Darwin.)

Yes, we know. The default behavior should be achieved by other means, since the current way breaks things. IMO, Linux should be able to detect whether the machine is Darwin very early, before _OSI(Darwin) or any other _OSI(xxx) is invoked. Then Linux can disable all Windows strings and enable the Darwin string at that early stage. Otherwise the code breaks a lot of things and makes a mess for other users. Why not let the user specify acpi_osi=Darwin on the command line? Users know this better than we do. It is almost impossible to make Linux pretend to be both Windows and MacOS. The choice has to be made by the user.

Thanks and best regards
-Lv

(In reply to Lv Zheng from comment #84)
> The default behavior should be achieved with other means.
The following should be functionally equivalent to the current behaviour:

if (dmi_match(DMI_SYS_VENDOR, "Apple Inc.") ||
    dmi_match(DMI_SYS_VENDOR, "Apple Computer, Inc.")) {
        <disable all Windows strings and enable Darwin string>
}

I'd be fine with that. Would you be comfortable with it as well? I'm not sure exactly at which point you would like to have that called so it would probably be best if you could come up with a patch. I'll be happy to test it, feel free to cc: me.

We already have a couple of similar Apple-specific quirks in drivers/acpi/.

The reason we need to check for two vendor names is that Apple changed their name in 2007. I don't think they ever used anything else as vendor name in the DMI table.

(In reply to Lukas Wunner from comment #85)
> (In reply to Lv Zheng from comment #84)
> > The default behavior should be achieved with other means.
>
> The following should be functionally equivalent to the current behaviour:
>
> if (dmi_match(DMI_SYS_VENDOR, "Apple Inc.") ||
>     dmi_match(DMI_SYS_VENDOR, "Apple Computer, Inc.")) {
>         <disable all Windows strings and enable Darwin string>
> }
>
> I'd be fine with that. Would you be comfortable with it as well? I'm not
> sure exactly at which point you would like to have that called so it would
> probably be best if you could come up with a patch. I'll be happy to test
> it, feel free to cc: me.
>
> We already have a couple of similar Apple-specific quirks in drivers/acpi/.
>
> The reason we need to check for two vendor names is that Apple changed their
> name in 2007. I don't think they ever used anything else as vendor name in
> the DMI table.

OK. We'll generate a patch according to your suggestion.

Thanks
-Lv

Reassign to Lv for patch rewrite.

Created attachment 213941 [details]
[PATCH 1/3] ACPI / osi: Cleanup _OSI("Linux") related code before introducing new support
Created attachment 213951 [details]
[PATCH 2/3] ACPI / osi: Fix an issue that acpi_osi=!* cannot disable ACPICA internal strings
Created attachment 213961 [details]
[PATCH 3/3] ACPI / osi: Change default _OSI(Darwin) support
Created attachment 213971 [details]
[PATCH] ACPI / osi: Change default _OSI(Darwin) support
Uploaded the wrong version.
This is a correction.
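For readers following the patch discussion, the default policy under review can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the code from the attached patches: `struct osi_policy`, `vendor_is_apple()` and `osi_default_policy()` are hypothetical names, the vendor comparison is assumed to be exact and case-sensitive like the dmi_match() check proposed in comment #85, and the `cmdline_no_darwin` flag stands in for booting with acpi_osi=!Darwin.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; this is NOT the kernel's osl.c code. */
struct osi_policy {
	bool darwin;   /* modelled answer to _OSI("Darwin") */
	bool windows;  /* modelled answer to every _OSI("Windows ...") */
};

/*
 * Assumption: the vendor check is an exact, case-sensitive string compare
 * against the two vendor names mentioned in the thread.
 */
static bool vendor_is_apple(const char *sys_vendor)
{
	return strcmp(sys_vendor, "Apple Inc.") == 0 ||
	       strcmp(sys_vendor, "Apple Computer, Inc.") == 0;
}

/*
 * cmdline_no_darwin models booting with acpi_osi=!Darwin.
 * Non-Apple machines keep the Windows strings and never report Darwin.
 */
static struct osi_policy osi_default_policy(const char *sys_vendor,
					    bool cmdline_no_darwin)
{
	struct osi_policy p = { .darwin = false, .windows = true };

	if (vendor_is_apple(sys_vendor) && !cmdline_no_darwin) {
		p.darwin = true;
		p.windows = false;
	}
	return p;
}
```

In this model, osi_default_policy("Apple Inc.", false) reports Darwin enabled and Windows disabled, while passing true for the second argument flips both back. A vendor string that differs only in case falls through to the non-Apple default, which is why the exact spelling of the vendor name matters.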
Hi,

Please help to:
1. apply the following patches:
   attachment 213941 [details]
   attachment 213951 [details]
   attachment 213971 [details]
2. build and boot the kernel on an Apple platform to see if _OSI("Darwin") != 0, and all _OSI("WindowsXXX") = 0
3. boot the kernel with acpi_osi=!Darwin on an Apple platform to see if _OSI("Darwin") = 0 and all _OSI("WindowsXXX") != 0

Thanks in advance.

Best regards
-Lv

Patch at
https://patchwork.kernel.org/patch/8953891/
to avoid

(In reply to Chen Yu from comment #93)
> Patch at
> https://patchwork.kernel.org/patch/8953891/
> to avoid

Patch sent to the mailing list to support acpi_osi=!Darwin.
And for runtime pm thunderbolt, I think it should be a feature rather than a bug fix? @Lukas Wunner

The patch fails to apply on both 4.5.2 and mainline (4.6rc5):

$ cat src/linux-4.5/drivers/acpi/osl.c.rej
--- drivers/acpi/osl.c
+++ drivers/acpi/osl.c
@@ -135,6 +135,9 @@ static struct acpi_osi_config {
        unsigned int linux_enable:1;
        unsigned int linux_dmi:1;
        unsigned int linux_cmdline:1;
+       unsigned int darwin_enable:1;
+       unsigned int darwin_dmi:1;
+       unsigned int darwin_cmdline:1;
        u8 default_disabling;
 } osi_config = {0, 0, 0, 0};

(In reply to Chen Yu from comment #94)
> Patch sent to the mailing list to support acpi_osi=!Darwin.
> And for runtime pm thunderbolt, I think it should be a feature rather than
> a bug fix? @Lukas Wunner

Sorry for the delay, Chen Yu & Lv Zheng, I've just saved v2 of your patches from the mailing list, will test them and report back either today or tomorrow. Thanks for your patience.
I messed

(In reply to Hugo Osvaldo Barrera from comment #95)
> The patch fails to apply on both 4.5.2 and mainline (4.6rc5):
>
> $ cat src/linux-4.5/drivers/acpi/osl.c.rej
> --- drivers/acpi/osl.c
> +++ drivers/acpi/osl.c
> @@ -135,6 +135,9 @@ static struct acpi_osi_config {
>         unsigned int linux_enable:1;
>         unsigned int linux_dmi:1;
>         unsigned int linux_cmdline:1;
> +       unsigned int darwin_enable:1;
> +       unsigned int darwin_dmi:1;
> +       unsigned int darwin_cmdline:1;
>         u8 default_disabling;
>  } osi_config = {0, 0, 0, 0};

Don't worry, we rebased the patches to make them stable material, so they won't trigger so many backporting issues.
http://www.spinics.net/lists/linux-acpi/msg65564.html

Thanks
-Lv

(In reply to Lukas Wunner from comment #96)
> (In reply to Chen Yu from comment #94)
> > Patch sent to the mailing list to support acpi_osi=!Darwin.
> > And for runtime pm thunderbolt, I think it should be a feature rather than
> > a bug fix? @Lukas Wunner
>
> Sorry for the delay, Chen Yu & Lv Zheng, I've just saved v2 of your patches
> from the mailing list, will test them and report back either today or
> tomorrow. Thanks for your patience.

It's not necessary. Yu has a MacBook test platform and he has confirmed that the patches work. So the linux-pm tree will ship them and you'll see them in 4.7 mainline.

If you want to give the fixes a try:
1. You need to modify attachment 213971 [details] by changing "Apple INC." and "Apple Computer, INC." to "Apple Inc." and "Apple Computer, Inc.". I only did the necessary unit tests by faking the DSDT on non-Apple platforms, and since I do not have a real MacBook I could not notice this mistake.
2. You can also download the latest version from the following links:
   https://patchwork.kernel.org/patch/8953941/
   https://patchwork.kernel.org/patch/8953931/
   https://patchwork.kernel.org/patch/8953921/
   https://patchwork.kernel.org/patch/8953891/
   For the last patch, you also need to modify "Apple INC." and "Apple Computer, INC." to "Apple Inc."
and "Apple Computer, Inc.".

Thanks
-Lv

It is not easy to confirm whether the Linux kernel returns _OSI("WindowsXXX") correctly.
We confirmed that using the following unit testing mechanism:
Modify osl.c and add the following lines into acpi_osi_handler():
=====
        if (!strcmp("Darwin", interface))
                pr_info("_OSI(Darwin) - %d\n", supported);
        if (!strcmp("Windows 2000", interface))
                pr_info("_OSI(Windows 2000) - %d\n", supported);
=====
Hope the above information is useful for those who want to confirm the patches.

Thanks
-Lv

(In reply to Lv Zheng from comment #99)
> It is not easy to confirm whether the Linux kernel returns _OSI("WindowsXXX")
> correctly.
> We confirmed that using the following unit testing mechanism:
> Modify osl.c and add the following lines into acpi_osi_handler():
> =====
>         if (!strcmp("Darwin", interface))
>                 pr_info("_OSI(Darwin) - %d\n", supported);
>         if (!strcmp("Windows 2000", interface))
>                 pr_info("_OSI(Windows 2000) - %d\n", supported);

Actually they are 'Windows 2009' and 'Windows 2012' on Mac pro 12.

(In reply to Chen Yu from comment #100)
> (In reply to Lv Zheng from comment #99)
> > We confirmed that using the following unit testing mechanism:
> > Modify osl.c and add the following lines into acpi_osi_handler():
> > =====
> >         if (!strcmp("Darwin", interface))
> >                 pr_info("_OSI(Darwin) - %d\n", supported);
> >         if (!strcmp("Windows 2000", interface))
> >                 pr_info("_OSI(Windows 2000) - %d\n", supported);
>
> Actually they are 'Windows 2009' and 'Windows 2012' on Mac pro 12.

IMO, validators should look up the queried "WindowsXXX" strings in the DSDT used on their own test platforms.
You can find the useful test strings with the following commands:
# acpidump -b
# iasl -d dsdt.dat
# find *.dsl | xargs grep "_OSI (\"Windows"

Thanks and best regards
-Lv

The patch was merged into the linux-pm repo and will appear in 4.7 mainline.
Let's close it.

Thanks and best regards
-Lv

Hi everyone,

I have an 11" MacBook Air 6,2 (early 2013). I am seeing a battery life regression since kernel 4.4.
Under 4.4 with acpi_osi= my idle power usage is around 3.5W. This translates to fantastic battery life lasting me all day. Things broke badly with 4.5 (I gather this was to do with thunderbolt support). I'd hoped that the patch from this bug report would improve the situation, but I'm still not able to achieve the same power usage under 4.7. With kernel 4.7 and acpi_osi= my power usage is just under 5W. All other combinations I've tried for acpi_osi (including !Darwin) give even worse results.

Can anyone give me any pointers? I use Debian kernels, but have confirmed this with vanilla ones I built from git myself. I am able to use git bisect to try and track things down. But if the problems are related to thunderbolt support being included from 4.5 I'm not sure how helpful this will be.

Thanks.
David

Please file another bug, tracking this power regression.

New bug reported:
https://bugzilla.kernel.org/show_bug.cgi?id=177151