Bug 16362
Summary: | cpu frequency - BIOS limits frequency because of 65 Watts AC adapter if battery is removed - thinkpad | ||
---|---|---|---|
Product: | ACPI | Reporter: | timshel |
Component: | Power-Processor | Assignee: | Thomas Renninger (trenn) |
Status: | CLOSED DOCUMENTED | ||
Severity: | normal | CC: | akpm, djwong, hmh, lenb, pierre-bugzilla, ri_richter, rjw, trenn |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.35-rc4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 14885 | ||
Attachments: |
lspci -vvnn
dmidecode acpidump acpidump acpidump --addr ... output dmesg output with acpi_osi="Linux" Source code (ASL) of the instrumented LPMD function Binary code (AML) of the instrumented LPMD to push into /sys/kernel/debug/acpi/custom_method |
Description
timshel
2010-07-10 14:27:20 UTC
dmesg cpu output:
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[ 0.000000] ACPI: SSDT 00000000bb779000 009F1 (v01 PmRef CpuPm 00003000 INTL 20050513)
[ 0.000000] ACPI: SSDT 00000000bb778000 00259 (v01 PmRef Cpu0Tst 00003000 INTL 20050513)
[ 0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
[ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:4 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff880001e00000 s84136 r8192 d22360 u524288
[ 0.000000] pcpu-alloc: s84136 r8192 d22360 u524288 alloc=1*2097152
[ 0.000000] pcpu-alloc: [0] 0 1 2 3
[ 0.001838] CPU: Physical Processor ID: 0
[ 0.001839] CPU: Processor Core ID: 0
[ 0.001846] mce: CPU supports 9 MCE banks
[ 0.001854] CPU0: Thermal monitoring enabled (TM1)
[ 0.185787] CPU0: Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz stepping 02
[ 0.566303] Brought up 4 CPUs
[ 0.734900] ACPI: SSDT 00000000bb71aa18 004B1 (v01 PmRef Cpu0Ist 00003000 INTL 20050513)
[ 0.735707] ACPI: SSDT 00000000bb718718 006B2 (v01 PmRef Cpu0Cst 00003001 INTL 20050513)
[ 0.818797] cpuidle: using governor ladder
[ 0.818798] cpuidle: using governor menu
[ 119.423212] CPUFREQ: Per core ondemand sysfs interface is deprecated - up_threshold

/sys/devices/system/cpu/cpu0/cpufreq> ls
affected_cpus ondemand scaling_governor
bios_limit related_cpus scaling_max_freq
cpuinfo_cur_freq scaling_available_frequencies scaling_min_freq
cpuinfo_max_freq scaling_available_governors scaling_setspeed
cpuinfo_min_freq scaling_cur_freq
cpuinfo_transition_latency scaling_driver

/sys/devices/system/cpu/cpu0/cpufreq> cat *
0
1199000
cat: cpuinfo_cur_freq: Permission denied
2534000
1199000
10000
cat: ondemand: Is a directory
0 1 2 3
2534000 2533000 2399000 2266000 2133000 1999000 1866000 1733000 1599000 1466000 1333000 1199000
conservative userspace powersave ondemand performance
1199000
acpi-cpufreq
ondemand
1199000
1199000
<unsupported>

Created attachment 27064 [details]
lspci -vvnn
I think I have found the patch causing my issues, at least from the description. I haven't removed and tested it then but it seems obvious: commit e2f74f355e9e2914483db10c05d70e69e0b7ae04 Author: Thomas Renninger <trenn@suse.de> Date: Thu Nov 19 12:31:01 2009 +0100 [ACPI/CPUFREQ] Introduce bios_limit per cpu cpufreq sysfs interface This interface is mainly intended (and implemented) for ACPI _PPC BIOS frequency limitations, but other cpufreq drivers can also use it for similar use-cases. Why is this needed: Currently it's not obvious why cpufreq got limited. People see cpufreq/scaling_max_freq reduced, but this could have happened by: - any userspace prog writing to scaling_max_freq - thermal limitations - hardware (_PPC in ACPI case) limitiations Therefore export bios_limit (in kHz) to: - Point the user that it's the BIOS (broken or intended) which limits frequency - Export it as a sysfs interface for userspace progs. While this was a rarely used feature on laptops, there will appear more and more server implemenations providing "Green IT" features like allowing the service processor to limit the frequency. People want to know about HW/BIOS frequency limitations. 
All ACPI P-state driven cpufreq drivers are covered with this patch: - powernow-k8 - powernow-k7 - acpi-cpufreq Tested with a patched DSDT which limits the first two cores (_PPC returns 1) via _PPC, exposed by bios_limit: # echo 2200000 >cpu2/cpufreq/scaling_max_freq # cat cpu*/cpufreq/scaling_max_freq 2600000 2600000 2200000 2200000 # #scaling_max_freq shows general user/thermal/BIOS limitations # cat cpu*/cpufreq/bios_limit 2600000 2600000 2800000 2800000 # #bios_limit only shows the HW/BIOS limitation CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com> CC: Len Brown <lenb@kernel.org> CC: davej@codemonkey.org.uk CC: linux@dominikbrodowski.net Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com> I am using a Thinkpad so I don't think that the bios is broken and the power management options are default but I am going to adjust them. Atm I have disabled Speedstep in the bios to get the highest frequency with 2.6.34. Ok, it is a bug. According to dmidecode the highest speed is available while bios_limit shows the lowest one or is there another source used clock determination? Handle 0x0006, DMI type 4, 42 bytes Processor Information Socket Designation: None Type: Central Processor Family: Other Manufacturer: GenuineIntel ID: 52 06 02 00 FF FB EB BF Version: Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz Voltage: 1.3 V External Clock: 133 MHz Max Speed: 2530 MHz Current Speed: 2530 MHz Status: Populated, Enabled Upgrade: None L1 Cache Handle: 0x000A L2 Cache Handle: 0x000B L3 Cache Handle: 0x000C Serial Number: Not Specified Asset Tag: Not Specified Part Number: Not Specified Core Count: 2 Core Enabled: 2 Thread Count: 4 Characteristics: None Recategorised to ACPI. Commit e2f74f355e9e2914483db10c05d70e69e0b7ae04 is not the cause, it only shows the symptom. Does cat /sys/../cpu*/cpufreq/bios_limit not show max frequencies? Then _PPC returns not zero and the cpufreq limit is correctly decreased. 
We already had similar reports (Ingo Molnar's Lenovo) where _PPC was wrongly initialized. Still _PPC is read again unconditionally at processor driver load time, instead of waiting for an ACPI event to read _PPC limitation. Can you please attach acpidump and dmidecode. So I expect this patch breaks your machine (processor.ignore_ppc=1 should work around it): commit 455c0d71d46e86b0b7ff2c9dcfc19bc162302ee9 Author: Darrick J. Wong <djwong@us.ibm.com> Date: Thu Feb 18 10:28:20 2010 -0800 ACPI: Fix regression where _PPC is not read at boot even when ignore_ppc=0 Earlier, Ingo Molnar posted a patch to make it so that the kernel would avoid reading _PPC on his broken T60. Unfortunately, it seems that with Thomas Renninger's patch last July to eliminate _PPC evaluations when the processor driver loads, the kernel never actually reads _PPC at all! This is problematic if you happen to boot your non-T60 computer in a state where the BIOS _wants_ _PPC to be something other than zero. Thanks for your reply. I have a Thinkpad X201 and the latest Bios 1.16. Acpidump gives me some checksum errors which are also saved in the log. Created attachment 27081 [details]
dmidecode
Created attachment 27082 [details]
acpidump
I can confirm that adding processor.ignore_ppc=1 fixes the problem for me. Thanks. I seem to be experiencing this issue as well here on my Thinkpad R61. In my case the max frequency fluctuates though and isn't consistently locked at the lowest frequency. Can't say it is clear from the discussion if this is a BIOS or kernel bug. Any work upstream to fix this properly? Fluctuates at runtime or after reboots? Is: /sys/devices/system/cpu/cpu0/cpufreq/bios_limit (only available since some kernel versions, 2.6.33?) sometimes not set to scaling_max_freq for a whole reboot or does it change while running? If it's fixed for a whole boot, this may also come from the "_PPC returns an uninitialized variable, thus calling it unconditionally at processor module load time may result in bogus CPU frequency limits" problem. -> Theoretically a BIOS update may help. If it changes during runtime, better also monitor your temperature when this happens. It could be that the limitation is done on purpose. The lm-sensors package (watch -n1 sensors) should show you the temperatures of your temp sensors. acpi_listen may show processor events if the CPU frequency limit is (un-)applied. If you get limited frequency due to too high temperatures you should clean your fans first... It fluctuates at runtime. I have this report with red hat that has more info: https://bugzilla.redhat.com/show_bug.cgi?id=618149 It doesn't get that hot, and the ACPI limit info doesn't show any thermal triggers. I checked the temperature during one of the episodes and it was 65°C, which isn't terribly hot. I'll have a look at acpi_listen and see if I see anything. Please attach acpidump then (best on both bug reports or a pointer in one). Also check for BIOS updates and/or go through your BIOS options. Thinkpads often have a related BIOS option. While we want things to just work with any, it would be interesting to know the related BIOS setting. 
If you update the BIOS, check whether the problem persists and send acpidump after the BIOS update if it does.

You could also check whether the BIOS limitation is at least related to the temperatures:
watch -n1 "sensors;cat /sys/devices/system/cpu/cpu*/cpufreq/bios_limit"

Have you registered any native thermal sensor drivers? Possibly the native ones and the ACPI-driven ones do not like each other, resulting in occasional wrong temperature readings.

(In reply to comment #14)
> Please attach acpidump then (best on both bug reports or a pointer in one).
>
> Also check for BIOS updates and/or go through your BIOS options.
Lenovo's web site has such a craptactular reliability that I haven't been able to check for updates. Seems to finally be up and I'm running the next latest version. Nothing in the changelog for the latest seems relevant here though, but I can do a test later. Will have a look at the options.

> Thinkpads often have a related BIOS option. While we want things to just work
> with any, it would be interesting to know the related BIOS setting.
Any memory of what such an option might be called?

> You could also check whether the bios limitation is at least related to the
> temperatures:
> watch -n1 "sensors;cat /sys/devices/system/cpu/cpu*/cpufreq/bios_limit"
Running. It's been up to 66°C without throttling so far. Looked a bit later and now it is throttled at 55°C. Stopped the intensive process, and it got unthrottled after a bit again. Temp at 52°C at that point. Got throttled after a while again, with temp around 67°C. At around 50°C it got unthrottled. Not terribly consistent, but it does show some correlation. The BIOS might consider both CPU and GPU temp when deciding throttling.

> Have you registered any native thermal sensor drivers?
> Eventually the native ones and acpi driven ones do not like each other
> resulting in seldom, wrong temperature readings.
/etc/sysconfig/lm_sensors is empty at least.
"sensors" gives me acpitz-virtual-0 and thinkpad-isa-0000, which have some overlap. Created attachment 27257 [details]
acpidump
BIOS settings might have been the key. I found something called "Adaptive thermal management", and the help info says something about throttling. It was currently set to "minimum noise" or something like that and I changed it to "maximum performance". I don't seem to get any throttling now, even when I exceed the temperatures previously seen. (I should also mention that I fiddled with the speedstep settings from "maximum battery" to "automatic", just for good measure.) > It was currently set to "minimum noise"
Hm, that should not be the default setting. It seems the BIOS algorithm and the corresponding kernel code work as designed: your system gets limited when the temperature rises to about 67°C and gets unthrottled again at about 50°C. This should have nothing to do with the original report, where systems get wrongly limited when _PPC is initially called.
Even for this, I doubt we can do much but point to the processor.ignore_ppc=1 boot param and close this bug as documented. The BIOS bug is known and other machines (sticking to the specs) need the initial _PPC call if a limit has to be applied at boot up.
-> closing the bug "Resolved Documented"
ThinkPad users who see their system throttled for no obvious reason as soon as the processor module gets loaded, as shown by:
cat /sys/devices/system/cpu/cpu0/cpufreq/bios_limit
must pass the processor.ignore_ppc=1 boot parameter as a workaround.
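The check above can be scripted. The following is only a minimal sketch, assuming the cpufreq sysfs layout listed earlier in this report; the helper name bios_limit_status is hypothetical, and it merely compares bios_limit with cpuinfo_max_freq without saying why the BIOS applied a limit.

```python
from pathlib import Path


def read_khz(path):
    """Return the integer kHz value stored in a cpufreq sysfs file."""
    return int(Path(path).read_text().strip())


def bios_limit_status(cpufreq_dir="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Compare bios_limit with cpuinfo_max_freq for one CPU.

    Hypothetical helper: returns (limited, bios_limit_khz, max_khz).
    bios_limit only exists on kernels that export it.
    """
    base = Path(cpufreq_dir)
    limit = read_khz(base / "bios_limit")
    max_freq = read_khz(base / "cpuinfo_max_freq")
    return limit < max_freq, limit, max_freq


if __name__ == "__main__":
    try:
        limited, limit, max_freq = bios_limit_status()
    except OSError as err:
        print(f"cannot read cpufreq files: {err}")
    else:
        state = "limited by BIOS" if limited else "not limited"
        print(f"cpu0: bios_limit={limit} kHz, max={max_freq} kHz ({state})")
```

If the limit is already present right after boot and disappears with processor.ignore_ppc=1, that would match the situation described in this report.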
So the final solution would be for Lenovo to fix the BIOS? Does anybody know the official way to inform them, and does it make sense? Do they normally fix something like that?

Agreed that this (in my case at least) was a case of user error and the system was working entirely as expected. The only thing that could use some improvement is the user interface, with a notification that the CPU is being throttled for some reason. But that's a user space issue, not a kernel one.

timshel: Can you please execute the following commands and attach the output:
acpidump --addr 0xBB71AA18 --length 0x4B1 >/tmp/CPU0IST
acpidump --addr 0xBB718718 --length 0x6B2 >/tmp/CPU0CST
acpidump --addr 0xBB719A98 --length 0x303 >/tmp/APIST
acpidump --addr 0xBB717D98 --length 0x119 >/tmp/APCST
The _PPC function seems to be in one of the above SSDT tables, which get loaded dynamically at runtime and are therefore not included in the acpidump output.
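The addresses and lengths in the commands above match the "ACPI: SSDT" lines in the dmesg output at the top of this report. A small sketch of that mapping, assuming the dmesg line format shown there; the function name acpidump_commands is hypothetical, and the --addr/--length options are simply the ones used in the commands above. Note that it also matches SSDTs listed early at boot, which a plain acpidump may already contain.

```python
import re

# Matches e.g.:
# ACPI: SSDT 00000000bb71aa18 004B1 (v01 PmRef Cpu0Ist 00003000 INTL 20050513)
SSDT_RE = re.compile(
    r"ACPI: SSDT\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+\(v\d+\s+(\S+)\s+(\S+)"
)


def acpidump_commands(dmesg_text):
    """Yield (table_id, command) for every SSDT line found in dmesg_text."""
    for match in SSDT_RE.finditer(dmesg_text):
        addr, length, _oem_id, table_id = match.groups()
        command = "acpidump --addr 0x{:X} --length 0x{:X}".format(
            int(addr, 16), int(length, 16)
        )
        yield table_id, command


if __name__ == "__main__":
    # Two lines copied from the dmesg output in this report.
    sample = (
        "[ 0.734900] ACPI: SSDT 00000000bb71aa18 004B1 "
        "(v01 PmRef Cpu0Ist 00003000 INTL 20050513)\n"
        "[ 0.735707] ACPI: SSDT 00000000bb718718 006B2 "
        "(v01 PmRef Cpu0Cst 00003001 INTL 20050513)\n"
    )
    for table_id, command in acpidump_commands(sample):
        print(f"{command} >/tmp/{table_id.upper()}")
```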
> So the final solution would be for Lenovo to fix the bios?
Yep, I'll try to come up with an ACPI code patch; hopefully Lenovo will pick it up.
Created attachment 27266 [details]
acpidump --addr ... output
Thanks for your effort. So I don't need to contact Lenovo?
I created the dump while using processor.ignore_ppc=1. If this is a problem, I will create a new one.
I also changed the BIOS AC SpeedStep setting to max performance instead of automatic, but it had no effect on bios_limit.
It's not that easy... Can you attach dmesg output, please.

There is an additional SMI and EC write if _OSI("Linux") returns true. Linux does not return true anymore, but some Lenovos/ThinkPads got whitelisted. dmesg |grep Linux may reveal what is done on your system, and you can toggle the behavior with the following boot params:
acpi_osi="Linux" -> try if _OSI("Linux") got ignored
acpi_osi="!Linux" -> try if _OSI("Linux") is recognized on your machine

The _PPC functions do exist and return for all CPUs \_PR.CPU0._PPC. This function is in a separate SSDT table which must get loaded first. If this is not the case, you should see an ACPI error message in dmesg -> could be the reason...

CPU0._PPC calls \_SB.PCI0.LPC.EC.LPMD(), which seems not to be zero in your case. In case your power adapter is plugged, LPMD returns the value at this address:
acpidump --addr 0xBB6C576C --length 4 >/tmp/LPST

If you are working on battery, an Embedded Controller variable is checked whether it's below 90. This probably is a check whether it can still give you 90 Watts:
LLess (HWAT, 0x5A)
You can check this value (only makes sense when this happens on battery) by loading the thinkpad_acpi driver with ecdump=1 (check modinfo thinkpad_acpi) and then doing: cat /proc/acpi/ibm/ec (or similar). The one-byte value at 0xC9 is HWAT.

I can't read the code further, but if this problem occurs only on battery and HWAT is below 0x5a (or 90 decimal), you may need another battery...

Created attachment 27271 [details]
dmesg output with acpi_osi="Linux"
I haven't checked if the problem appears with battery but it wouldn't really matter there because performance isn't that important while on battery.
I have added dmesg with acpi_osi="Linux".
The command acpidump --addr 0xBB6C576C --length 4 outputs only blanks while a power adapter is connected (to the docking station). I have also switched the SpeedStep settings back to default since they had no effect. Maybe that is the reason.
It would be great if you could still monitor this for a while (best without ignore_ppc=1 and having a look at battery power and bios_limit).
watch -n1 cat /proc/acpi/battery/*/state /sys/devices/system/cpu/cpu0/cpufreq/bios_limit
(or similar) might help. The problem not showing up on AC might be an indicator that the limitation only happens at (low?) battery power. Maybe we were hunting a ghost here as well and everything is working as designed.

The problem appears with AC all the time, but the acpidump command I mentioned before just didn't output anything except blanks.

Oh wait, the check on HWAT < 90 is also in the "on AC power" code path. It's the last bit of byte 0x34, which you can check like the HWAT variable described at the end of comment 24. If byte 0x34 of the embedded controller is greater than or equal to 128 (0x80), then also check whether byte 0xC9 (HWAT) is below 90. At the end I pasted some parts of the relevant ACPI code for reproducing.

Ok, I made up an instrumented LPMD method. It is possible to override single ACPI functions by feeding a sysfs file with them. It would be great if you could give it a try. How to do that is documented here:
Documentation/acpi/method-customizing.txt
I'll attach two files, the source code and the already compiled version. All you have to do (hoping your kernel has ACPI debug compiled in) is:
- Increase the ACPI debug level: echo 0xF >/sys/module/acpi/parameters/debug_level
- Override the function: cat lpmd_binary.aml >/sys/kernel/debug/acpi/custom_method
-> Watch dmesg or /var/log/messages for output.

You can also listen to processor events. The whole code is only executed (and limits applied) if a processor event of type 0x80 is issued by the HW. The acpi_listen tool in a separate shell shows ACPI events.

Relevant code from the DSDT with some comments:
-> \_SB.PCI0.LPC.EC.LPMD does not return zero if _PPC is not returning zero. If _PPC is not zero, limits get active.
Method (LPMD, 0, NotSerialized)
{
    Store (0x00, Local0)
    Store (0x00, Local1)
    Store (0x00, Local2)
    If (\H8DR) /* Checking OS ACPI version above 1 -> true */
    {
        If (HPAC) /* Checking whether on AC power -> true */
        {
            If (HPLO) /* Checking an Embedded Controller bit */
            {
                Store (\LPST, Local0)
            }
            Else /* Doing a power source capacity check? */
            {
                If (LLess (HWAT, 0x5A))
                ...
    }
    Return(Local0)
}

Wait, to activate acpi debug you also have to increase the layer:
echo 0xF >/sys/module/acpi/parameters/debug_level
echo 0xFFFFFFFF >/sys/module/acpi/parameters/debug_layer
(I forgot the latter...).

There could be some noise from the thermal and battery ACPI driver. If you can still reproduce the problem without having them loaded, we get better/cleaner logs (rmmod battery;rmmod thermal).

Created attachment 27291 [details]
Source code (ASL) of the instrumented LPMD function
Created attachment 27292 [details]
Binary code (AML) of the instrumented LPMD to push into /sys/kernel/debug/acpi/custom_method
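The condition discussed in the comments above (top bit of EC byte 0x34 set, and HWAT at EC byte 0xC9 below 0x5A) can be written down as a small predicate. This is only a sketch of the condition as described in this thread, not of the firmware logic: what LPMD actually returns in that branch is elided in the posted ASL and is not modeled here. The function and constant names are hypothetical.

```python
FLAG_OFFSET = 0x34   # EC byte whose top bit is mentioned in the comments
HWAT_OFFSET = 0xC9   # EC byte called HWAT in the comments
MIN_WATTS = 0x5A     # 90 decimal, from LLess (HWAT, 0x5A)


def adapter_check_trips(ec_bytes):
    """Return True if byte 0x34 is >= 0x80 and HWAT is below 90.

    ec_bytes is any indexable sequence of 256 EC register values.
    Assumption: this mirrors only the condition described in the
    thread; the consequence inside LPMD is unknown (elided ASL).
    """
    flag_set = ec_bytes[FLAG_OFFSET] >= 0x80
    return flag_set and ec_bytes[HWAT_OFFSET] < MIN_WATTS


if __name__ == "__main__":
    ec = bytearray(256)
    ec[FLAG_OFFSET] = 0x80
    ec[HWAT_OFFSET] = 65          # hypothetical 65 W reading
    print(adapter_check_trips(ec))
```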
Hmm, is this the original power adapter or have you bought another one? You should check whether it fulfills the requirements for the machine.

Wow, ok, that might be the reason. I have an original AC adapter but it only has 65 Watt. All x201 are shipped with it afaik.

It might still be a copy and paste BIOS bug for this machine. Are you able to get a 90W power supply from someone to give it a test?

First of all, many thanks for your effort and time. I mean this is only a little problem and it seems to be Lenovo's fault so ... I have tried to load thinkpad_acpi with the parameter ecdump=1 to check HWAT but I get an "invalid argument" error. I was able to change the ACPI debug level, but while trying to insert new_ssdt.aml in /sys/kernel/debug/acpi/custom_method I got the error "cat: write error: Invalid argument" (I was root).

> I mean this is only a little problem
It's not that little..., the machine is nearly unusable out of the box. And now that I dug that deep, I want to know the root cause...

> I have tried to load thinkpad_acpi with the parameter ecdump=1
Sorry, I should have read the documentation (Documentation/laptops/thinkpad-acpi.txt in the kernel sources): To use this feature, you need to supply the experimental=1 parameter when loading the module.

> but while trying to insert new_ssdt.aml .. I got the error..:
Yep, this is a debug feature not heavily tested. The func has quite some external symbols, possibly I got it wrong somewhere in the test ssdt. It was worth a try, but it's not worth looking at why it did not work.

Experimental=1 works, but ecdump=1 does not, even when supplied with the experimental parameter. The notebook was in the docking station and had no battery inserted while running the command. PPC was ignored.
Here is the output:
# modprobe thinkpad-acpi experimental=1
# cat /proc/acpi/ibm/ecdump
EC +00 +01 +02 +03 +04 +05 +06 +07 +08 +09 +0a +0b +0c +0d +0e +0f
EC 0x00: *a7 *05 *a1 *e2 00 *86 00 00 00 *02 *47 00 00 00 *80 00
EC 0x10: 00 00 *ff *ff *f4 *3c *07 *89 *7b *ff *83 *f1 *ff *ff *2d 00
EC 0x20: 00 00 00 00 00 00 00 *cf 00 00 00 00 00 00 00 *80
EC 0x30: *07 *80 *02 00 *30 *04 00 00 00 00 *20 *10 00 *50 00 00
EC 0x40: 00 00 00 00 00 00 *10 *04 *61 *04 00 00 00 00 00 00
EC 0x50: 00 *80 *02 *19 *da *07 *07 *1e *0a *2b *07 *05 *05 *d0 *07 *10
EC 0x60: *0e *d8 *0e *d2 *0f *c6 *11 00 00 00 00 00 00 00 00 00
EC 0x70: 00 00 00 00 00 00 00 00 *3c 00 00 00 00 00 00 00
EC 0x80: 00 00 *05 *06 *f2 *0d *3b *0c 00 00 00 00 00 00 *2b 00
EC 0x90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
EC 0xa0: 00 00 00 00 *ff *ff 00 00 00 00 00 00 *ff *ff 00 00
EC 0xb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
EC 0xc0: 00 00 00 00 00 00 00 00 *11 *41 00 00 *01 *35 00 00
EC 0xd0: *07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
EC 0xe0: 00 00 00 00 00 00 00 00 *10 *90 *c2 *12 *e4 *2e *44 *03
EC 0xf0: *36 *51 *48 *54 *32 *38 *57 *57 *14 *89 *53 *50 00 00 *10 00

0xC9 is 0x41 == 65 Watts. Hmm, strange it's marked with a *, that would mean the value changed. Could it be that you removed the battery at runtime and the value is a different one with the battery inserted?

This would match Lenovo's explanation that they assume people will never remove the battery from their system, and that this only happens with all batteries explicitly removed and a power adapter of less than 90W (even if it's the original one sold with the system). -> one ppc limitation mystery less...

Yes, I always remove my battery to increase its lifetime. It worked for me with previous Sony notebooks, in which case the battery was still fine after four years. I am going to check it soon. I haven't removed the battery during runtime, but it is being charged atm via the docking station.
Maybe that's the reason for the *. Nevertheless, it is a bit weird to need a battery inserted all the time to support the AC adapter. This would also mean that the battery could be partly discharged during runtime even with the AC adapter connected, which could decrease its lifetime enormously. Afaik it isn't important how long it is charged; every charging activity counts, even if it is only for some seconds. So people are supposed to buy another 90 W AC adapter to prevent this behavior without the kernel parameter? This might also explain why I am the only bug reporter.

I can confirm that bios_limit shows the correct value as long as the battery is inserted and the AC adapter is connected. And I can also confirm that bios_limit shows the correct value if the battery is removed but a 90W AC adapter is used. So it is definitely not a software bug.

I'm facing the same problem, since maybe kernel 2.6.33, too. I have a Dell Inspiron 6400 laptop with an Intel Core 2 Duo T7200 @ 2 GHz which is stuck at 1 GHz all the time on both cores. The last few comments made me throw in my experience, because I don't have the battery inserted. My battery isn't functioning anymore, so I removed it. I'm using openSUSE 11.3 with kernel 2.6.34 at the moment, but with Arch Linux (kernel 2.6.34 and I think kernel 2.6.33), Ubuntu and Fedora the problem was the same.
cat /sys/devices/system/cpu/cpu0/cpufreq/bios_limit
1000000
(For me) this seems to be the figure that limits frequency scaling. My BIOS is up to date. I'm leaving other outputs out for now, because I don't know which would provide new information.

> My battery isn't functioning anymore so I removed it.
Can you add it, so that at least the BIOS/ACPI parts think it's plugged. Also search for a related BIOS option if it does not help. Is the limit always active or only sometimes (after some time, e.g. temperature related?).

Hmm, looking at the previous problems, the kernel code seems to work as expected now.
Sorry, but I won't spend much time on this anymore unless you have a strong hint that there really might be a kernel issue rather than a BIOS or "broken by design" issue. Please go through the bug and this document: ftp://ftp.suse.com/pub/people/trenn/ACPI_BIOS_on_Linux_guide/Frequency_is_limited_how_to.txt and try harder to find out why the BIOS limits the frequency.
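For anyone repeating the HWAT lookup from this thread, the ecdump text can be decoded mechanically. A minimal sketch, assuming the row format of the /proc/acpi/ibm/ecdump output posted above (an "EC 0xNN:" prefix followed by 16 hex bytes, with "*" marking changed values); the function name parse_ecdump is hypothetical.

```python
def parse_ecdump(text):
    """Parse ecdump rows into {offset: (value, changed)}.

    Assumes rows like 'EC 0xc0: 00 00 ... *11 *41 ...'; the column
    header row ('EC +00 +01 ...') is skipped.
    """
    regs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "EC" or not parts[1].startswith("0x"):
            continue
        base = int(parts[1].rstrip(":"), 16)
        for index, token in enumerate(parts[2:]):
            changed = token.startswith("*")
            regs[base + index] = (int(token.lstrip("*"), 16), changed)
    return regs


if __name__ == "__main__":
    # Header and the 0xc0 row copied from the dump in this thread.
    sample = (
        "EC +00 +01 +02 +03 +04 +05 +06 +07 +08 +09 +0a +0b +0c +0d +0e +0f\n"
        "EC 0xc0: 00 00 00 00 00 00 00 00 *11 *41 00 00 *01 *35 00 00\n"
    )
    value, changed = parse_ecdump(sample)[0xC9]
    print(f"HWAT (0xC9) = {value} W, changed since last read: {changed}")
```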