Bug 8255 - Frequency Scaling not working properly using powernow-k7
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: i386 Linux
Importance: P2 normal
Assignee: cpufreq
Reported: 2007-03-24 11:11 UTC by Dustin Surawicz
Modified: 2007-06-29 16:19 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.19 and newer (up to 2.6.21-rc)
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg.log with debug output (164 bytes, text/html)
2007-03-24 11:16 UTC, Dustin Surawicz
kernel configuration I used for 2.6.21-rc3 to reproduce the problem (48.17 KB, text/plain)
2007-03-24 11:18 UTC, Dustin Surawicz
diagnostic patch (2.31 KB, patch)
2007-04-26 18:52 UTC, Daniel Drake
dmesg output after applying Daniel's patch (30.24 KB, text/plain)
2007-04-27 04:13 UTC, Dustin Surawicz
Output of acpidump (62.88 KB, text/plain)
2007-04-27 04:14 UTC, Dustin Surawicz
dmesg output after applying Daniel's patch (29.89 KB, text/plain)
2007-04-27 04:53 UTC, Dustin Surawicz
new patch (2.33 KB, text/plain)
2007-04-27 06:11 UTC, Daniel Drake
dmesg output after applying 2nd patch (19.92 KB, text/plain)
2007-05-01 02:09 UTC, Dustin Surawicz
modified DSDT (107.83 KB, text/plain)
2007-05-01 13:11 UTC, Daniel Drake
DSDT ASL diff (1.36 KB, patch)
2007-05-01 13:12 UTC, Daniel Drake
dmesg output after using fixed DSDT (21.36 KB, text/plain)
2007-05-01 15:27 UTC, Dustin Surawicz
new patch (1.42 KB, patch)
2007-05-01 17:31 UTC, Daniel Drake
hopefully final patch (2.76 KB, patch)
2007-05-02 08:26 UTC, Daniel Drake
dmesg output (20.04 KB, text/plain)
2007-05-02 11:41 UTC, Dustin Surawicz
dmesg from ubuntu gutsy daily 9 june 2007 (21.49 KB, text/plain)
2007-06-09 06:01 UTC, Marco

Description Dustin Surawicz 2007-03-24 11:11:58 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.18
Distribution: Gentoo Linux 2006.1
Hardware Environment: i686 mobile AMD Athlon(tm) XP 1500+ AuthenticAMD GNU/Linux
Problem Description: 

I've already filed this bug in the Gentoo bugzilla
(http://bugs.gentoo.org/show_bug.cgi?id=164802) and they asked me to post it
here as well.

Here is a copy of the bug description:

----------------snip--------------------

After switching to the newly stabilized gentoo-sources-2.6.19-r5 kernel, my CPU
runs only on reduced speed:

solaris dustin # cd /sys/devices/system/cpu/cpu0/cpufreq/

solaris cpufreq # cat cpuinfo_max_freq 
1266768

solaris cpufreq # cat cpuinfo_cur_freq 
933408

solaris cpufreq # cat scaling_available_frequencies 
1266768 1000080 933408 800064 666720 

solaris cpufreq # cat scaling_available_governors   
ondemand powersave userspace performance 

solaris cpufreq # cat scaling_governor            
performance

solaris cpufreq # cat scaling_driver 
powernow-k7

solaris cpufreq # uname -a
Linux solaris 2.6.19-gentoo-r5 #1 Wed Jan 31 23:49:21 CET 2007 i686 mobile AMD
Athlon(tm) XP 1500+ AuthenticAMD GNU/Linux

Although in performance mode, the CPU is not running at max. frequency.
Switching to powersave mode reduces the frequency to the minimum:

solaris cpufreq # echo powersave > scaling_governor 
solaris cpufreq # cat cpuinfo_cur_freq 
666720

With ondemand, frequency stays at minimum until CPU load increases, but never
exceeds 933408.

With CONFIG_CPU_FREQ unset CPU runs at max frequency.

----------------snap--------------------
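The manual check above can be expressed as a short script. This is only a minimal sketch and not part of the original report: the helper names `read_cpufreq` and `governor_mismatch` are made up for illustration, and the script assumes nothing beyond the cpufreq sysfs files already shown above (reading cpuinfo_cur_freq usually requires root).

```python
from pathlib import Path

CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_cpufreq(directory=CPUFREQ_DIR):
    """Read the sysfs values quoted in the report into a dict of strings."""
    names = ("cpuinfo_max_freq", "cpuinfo_cur_freq", "scaling_governor")
    return {name: (directory / name).read_text().strip() for name in names}


def governor_mismatch(values):
    """Return True if the performance governor is active but the CPU
    runs below the advertised maximum frequency (both in kHz)."""
    if values["scaling_governor"] != "performance":
        return False
    return int(values["cpuinfo_cur_freq"]) < int(values["cpuinfo_max_freq"])


# Values copied from the report above, so the example runs on any machine.
reported = {
    "cpuinfo_max_freq": "1266768",
    "cpuinfo_cur_freq": "933408",
    "scaling_governor": "performance",
}
print(governor_mismatch(reported))
```

On an affected machine, `governor_mismatch(read_cpufreq())` should flag the same condition without hard-coded values.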

After I received some help from Gentoo developers, I was able to find the
following lines in my dmesg.log, that seem to point to the problem:

----------------snip--------------------
powernow: Minimum speed 666 MHz. Maximum speed 1266 MHz.
freq-table: setting show_table for cpu 0 to cec867e0
freq-table: table entry 0: 1266768 kHz, 3085 index
freq-table: table entry 1: 1000080 kHz, 3593 index
freq-table: table entry 2: 933408 kHz, 3592 index
freq-table: table entry 3: 800064 kHz, 4358 index
freq-table: table entry 4: 666720 kHz, 4868 index
cpufreq-core: setting new policy for CPU 0: 666720 - 1266768 kHz
freq-table: request for verification of policy (666720 - 1266768 kHz) for cpu 0
freq-table: verification lead to (666720 - 1266768 kHz) for cpu 0
freq-table: request for verification of policy (666720 - 950000 kHz) for cpu 0
freq-table: verification lead to (666720 - 950000 kHz) for cpu 0
cpufreq-core: new min and max freqs are 666720 - 950000 kHz

It seems that after detecting the correct max speed and setting the correct
policy, the max speed is reduced and again validated, which leads to an
incorrect max speed.
----------------snap--------------------
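To see why a policy maximum of 950000 kHz ends up as 933408 kHz, here is a minimal sketch. It assumes (as a simplification of what the cpufreq core does) that the highest table entry not exceeding the policy maximum is chosen; the function name is hypothetical.

```python
def highest_at_or_below(table_khz, limit_khz):
    """Pick the highest table frequency that does not exceed limit_khz."""
    candidates = [f for f in table_khz if f <= limit_khz]
    if not candidates:
        raise ValueError("no table entry at or below the limit")
    return max(candidates)


# Frequency table from the dmesg output above (kHz).
freq_table = [1266768, 1000080, 933408, 800064, 666720]

print(highest_at_or_below(freq_table, 1266768))  # initial policy maximum
print(highest_at_or_below(freq_table, 950000))   # reduced policy maximum
```

With the reduced maximum, the best remaining entry is 933408 kHz, which is exactly the speed observed in performance mode.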

Steps to reproduce

1. Configure kernel to use frequency scaling with powernow-k7
2. Compile kernel and reboot using the new kernel
3. Check CPU frequency


I will attach the full log as well as my kernel config.

BR,
Dustin
Comment 1 Dustin Surawicz 2007-03-24 11:16:06 UTC
Created attachment 10932 [details]
dmesg.log with debug output
Comment 2 Dustin Surawicz 2007-03-24 11:18:26 UTC
Created attachment 10933 [details]
kernel configuration I used for 2.6.21-rc3 to reproduce the problem
Comment 3 Daniel Drake 2007-03-24 16:48:30 UTC
This is a 2.6.19 regression (2.6.18 was OK) which has been reproduced as of
2.6.21-rc3
Comment 4 Dustin Surawicz 2007-03-29 06:04:01 UTC
Okay. I tracked down the problem with git bisect. Here is the commit that
causes the problem:

solaris linux-git # git bisect good
0916bd3ebb7cefdd0f432e8491abe24f4b5a101e is first bad commit
commit 0916bd3ebb7cefdd0f432e8491abe24f4b5a101e
Author: Dave Jones <davej@redhat.com>
Date:   Wed Nov 22 20:42:01 2006 -0500

        [PATCH] Correct bound checking from the value returned from _PPC method.
        
        processor_perflib.c::acpi_processor_ppc_notifier() check if the value
        returned by the processor's _PPC method is 0 and return failed if so.
        This is wrong since 0 indicate that the bios think the processor can go
        to the highest frequency.  This patch for example fix the HP NX 6125 to
        allow its highest frequency to be available.

        Signed-off-by: Bruno Ducrot <ducrot@poupinou.org>
        Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
        Signed-off-by: Dave Jones <davej@redhat.com>
        Signed-off-by: Linus Torvalds <torvalds@osdl.org>

        :040000 040000 d6696bf57e1a08b39e051bcf22c838723ccdf0bf
a67b7741e5163c598cb3663c7997dc663516695e M      drivers


Maybe someone can look into it.
Comment 5 Dustin Surawicz 2007-03-29 07:57:52 UTC
I googled a bit and found that this regression is already known and a patch has
been proposed:

http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg04484.html

Don't know whether this is just a workaround or a fix. I do not know anything
about kernel hacking...

Comment 6 Daniel Drake 2007-03-29 08:38:38 UTC
I already suspected that exact issue which is why I asked you to test
gentoo-sources-2.6.19-r6 on the downstream bug, as that patch was merged into
2.6.19.3. So although it appears to be the same commit at fault, it must be a
different issue.
Comment 7 Daniel Drake 2007-03-29 08:43:26 UTC
Dave,

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=8255

> Okay. I tracked down the problem with git. Here is what is the submission that
> causes the problem:
> 
> solaris linux-git # git bisect good
> 0916bd3ebb7cefdd0f432e8491abe24f4b5a101e is first bad commit
> commit 0916bd3ebb7cefdd0f432e8491abe24f4b5a101e
> Author: Dave Jones <davej@redhat.com>
> Date:   Wed Nov 22 20:42:01 2006 -0500
> 
>         [PATCH] Correct bound checking from the value returned from _PPC method.

The above commit seems to be the culprit behind this 2.6.19 regression, 
which is still present as of 2.6.21-rc3 (so it's not the earlier issue 
found by Ingo). Any ideas?

Thanks,
Daniel

Comment 8 Dustin Surawicz 2007-04-01 14:41:00 UTC
After analyzing the log, my guess is that there is a bug in the ACPI-related
code of powernow-k7. Here is what I found out:

from dmesg.log:
-------------snip--------------
powernow-k7: acpi:  P0: 950 MHz 24000 mW 125 uS control 009c418d SGTC 10000
powernow-k7:    FID: 0xd (9.5x [1266MHz])  VID: 0xc (1.400V)
powernow-k7: acpi:  P1: 750 MHz 16337 mW 125 uS control 009c41c9 SGTC 10000
powernow-k7:    FID: 0x9 (7.5x [1000MHz])  VID: 0xe (1.300V)
powernow-k7: acpi:  P2: 700 MHz 15248 mW 125 uS control 009c41c8 SGTC 10000
powernow-k7:    FID: 0x8 (7.0x [933MHz])  VID: 0xe (1.300V)
powernow-k7: acpi:  P3: 600 MHz 12084 mW 125 uS control 009c4226 SGTC 10000
powernow-k7:    FID: 0x6 (6.0x [800MHz])  VID: 0x11 (1.250V)
powernow-k7: acpi:  P4: 500 MHz 9280 mW 125 uS control 009c4264 SGTC 10000
powernow-k7:    FID: 0x4 (5.0x [666MHz])  VID: 0x13 (1.200V)
-------------snap--------------

As you can see, ACPI reports the states available on my CPU as 500, 600, 700,
750, and 950 MHz. But the correct values are 666, 800, 933, 1000, and 1266 MHz,
as shown in the lines stating the FID and VID values.

If one divides the frequencies derived from the FID by the ones reported by
ACPI, e.g. 666/500, the result is always about 1.33. Could this be somehow
related to the FSB frequency?

powernow-k7: FSB: 133MHz
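
The ratio can be checked quickly for all five states. A minimal sketch, using only the MHz values from the dmesg lines above; comparing against 133/100 is my guess at the cause, not a confirmed fact:

```python
# MHz values as reported by ACPI and as derived from the FID, P0..P4.
acpi_mhz = [950, 750, 700, 600, 500]
fid_mhz = [1266, 1000, 933, 800, 666]

ratios = [round(f / a, 2) for f, a in zip(fid_mhz, acpi_mhz)]
print(ratios)

# Ratio between a 133 MHz FSB and a 100 MHz FSB, for comparison.
print(round(133 / 100, 2))
```

Every state gives the same ratio, which equals 133/100 after rounding to two decimals.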

Furthermore, ' # cat /proc/acpi/processor/CPU0/performance' gives the following
output, regardless of whether I use 2.6.18 (before the above mentioned patch
was applied) or a later kernel:

state count:             5
active state:            P0
states:
   *P0:                  950 MHz, 24000 mW, 125 uS
    P1:                  750 MHz, 16337 mW, 125 uS
    P2:                  700 MHz, 15248 mW, 125 uS
    P3:                  600 MHz, 12084 mW, 125 uS
    P4:                  500 MHz, 9280 mW, 125 uS

If I change the active state to P4 and back to P0, the CPU never speeds up to
1267MHz any more but stays at around 1000MHz. The exact speed depends on the
kernel version:

2.6.18: 1000MHz
2.6.21: 950MHz

1000MHz is a valid speed according to the FID table; it is presumably chosen
because it is close to the invalid 950MHz.

After applying the patch, the cpufreq_verify_within_limits call in
processor_perflib.c is executed and, thus, I suspect that the frequency is
compared to the table obtained by acpi, where this is a valid frequency.


Any comments?
Comment 9 Dustin Surawicz 2007-04-11 07:52:57 UTC
This bug does not seem to be of much interest...
I would love to look myself but I am really a noob. Could someone at least give
me a hint as to which piece of code is responsible for setting these:

state count:             5
active state:            P0
states:
   *P0:                  950 MHz, 24000 mW, 125 uS
    P1:                  750 MHz, 16337 mW, 125 uS
    P2:                  700 MHz, 15248 mW, 125 uS
    P3:                  600 MHz, 12084 mW, 125 uS
    P4:                  500 MHz, 9280 mW, 125 uS

Thanks.
Comment 10 Daniel Drake 2007-04-26 18:52:57 UTC
Created attachment 11279 [details]
diagnostic patch

I'm not knowledgeable about any of this stuff, but here's an attempt to further
clarify what's going on: Please apply this patch and attach new dmesg output.

I think one important part of the issue is that powernow-k7 can't look up the
frequency tables on its own (possibly broken BIOS), so it falls back on ACPI.

Also, sometimes the ACPI developers find acpidump output useful. You can emerge
pmtools to get this utility - I suggest you attach the output here.
Comment 11 Dustin Surawicz 2007-04-27 04:12:14 UTC
Okay, here comes the new dmesg output after applying the patch to the
2.6.21-gentoo kernel source. I enabled acpi debugging and added cpufreq.debug=7
to my kernel parameters.

Output of acpidump to be attached as well.
Comment 12 Dustin Surawicz 2007-04-27 04:13:50 UTC
Created attachment 11284 [details]
dmesg output after applying Daniel's patch
Comment 13 Dustin Surawicz 2007-04-27 04:14:51 UTC
Created attachment 11285 [details]
Output of acpidump
Comment 14 Dustin Surawicz 2007-04-27 04:52:20 UTC
Comment on attachment 11284 [details]
dmesg output after applying Daniel's patch

truncated output
Comment 15 Dustin Surawicz 2007-04-27 04:53:41 UTC
Created attachment 11286 [details]
dmesg output after applying Daniel's patch
Comment 16 Daniel Drake 2007-04-27 06:11:16 UTC
Created attachment 11287 [details]
new patch

It does look like your BIOS is broken. Before we go further, please check if
updates are available.

If not, try applying this patch. It may get things back on their feet, by
ignoring the broken field in the BIOS.
Comment 17 Dustin Surawicz 2007-04-28 02:40:18 UTC
I checked the availability of a bios update. Unfortunately, I am running the
newest revision and since my laptop is not brand new any more, there won't be
any. Bios is from Jan 2003...

So I will try with your patch. Should I apply this one additionally to the
previous one or instead of it?

Thanks for your efforts so far. Hope that this is fixed soon.
Comment 18 Daniel Drake 2007-04-28 06:56:31 UTC
apply it instead of the first one
Comment 19 Dustin Surawicz 2007-05-01 02:09:21 UTC
Created attachment 11357 [details]
dmesg output after applying 2nd patch

The patch did not solve the problem; it even got worse. The CPU is now running
at just 800MHz. The available frequencies in the log look quite weird: the
slowest frequency is listed as 633MHz and the other entries state 800MHz.
Comment 20 Daniel Drake 2007-05-01 10:33:25 UTC
OK, your BIOS is broken so we shouldn't go further down that route.
I'll hopefully find some time to dig into why acpi_perflib isn't obeying the
133mhz FSB value. I suspect this may be another BIOS bug on your system...
Comment 21 Daniel Drake 2007-05-01 13:11:26 UTC
Created attachment 11364 [details]
modified DSDT
Comment 22 Daniel Drake 2007-05-01 13:12:05 UTC
Created attachment 11365 [details]
DSDT ASL diff

for reference
Comment 23 Daniel Drake 2007-05-01 13:20:43 UTC
(the above diff is accidentally reversed)

Dustin,

First reverse the above patches so that you're working from clean kernel sources.

I have generated a custom ACPI BIOS for you. The one stored in the system has
those incorrect frequencies.

Download the modified DSDT and compile it with:

 # iasl -tc DSDT.dsl

You'll then get a "DSDT.hex" file in the same directory.

Now in the kernel config, enable CONFIG_ACPI_CUSTOM_DSDT and provide the path to
the .hex file

Then recompile the kernel, boot into the new one, and see if things have improved.
Comment 24 Dustin Surawicz 2007-05-01 13:56:48 UTC
Managed to compile the DSDT.dsl file. But I can't find any
CONFIG_ACPI_CUSTOM_DSDT in menuconfig (2.6.21-gentoo). Grepping for it in the
.config gives no match either...
Can I just add it to the .config file and, if yes, where?
Comment 25 Dustin Surawicz 2007-05-01 14:21:28 UTC
Never mind. Found how to get the option :)
Comment 26 Dustin Surawicz 2007-05-01 15:27:28 UTC
Created attachment 11366 [details]
dmesg output after using fixed DSDT

Bingo! This seems to have fixed it. I attach dmesg output in case there is
still something suspicious.

Thank you so much, Daniel, for your effort!!!
Comment 27 Daniel Drake 2007-05-01 17:31:53 UTC
Created attachment 11367 [details]
new patch

Please now apply this patch to a clean kernel (no custom DSDT, no previous
patches applied) and see what happens...
Comment 28 Dustin Surawicz 2007-05-02 03:43:20 UTC
Patch works. CPU runs at highest frequency now.
Comment 29 Daniel Drake 2007-05-02 08:18:27 UTC
Thanks. Can I see dmesg output from using it?

The modified DSDT I gave you didn't quite work correctly. If you look in the logs:

freq-table: table entry 0: 1266768 kHz, 3085 index
[...]
cpufreq-core: setting new policy for CPU 0: 666720 - 1266768 kHz
freq-table: request for verification of policy (666720 - 1266768 kHz) for cpu 0
freq-table: verification lead to (666720 - 1266768 kHz) for cpu 0
freq-table: request for verification of policy (666720 - 1266000 kHz) for cpu 0
freq-table: verification lead to (666720 - 1266000 kHz) for cpu 0
cpufreq-core: new min and max freqs are 666720 - 1266000 kHz
[...]
performance: setting to 1266000 kHz because of event 1
cpufreq-core: target for CPU 0: 1266000 kHz, relation 1
freq-table: request for target 1266000 kHz (relation: 1) for cpu 0
freq-table: target is 1 (1000080 kHz, 3593)

In other words, it's not actually reaching the maximum frequency due to rounding
between mhz/khz values.
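
The effect can be reproduced in a few lines. This is a minimal sketch, assuming the limit passes through a whole-MHz value somewhere along the way (as the logged 1266000 kHz suggests) and that "relation 1" means "highest entry at or below the target"; the helper name is hypothetical.

```python
def highest_at_or_below(table_khz, target_khz):
    """Pick the highest table frequency that does not exceed target_khz."""
    return max(f for f in table_khz if f <= target_khz)


# Frequency table from the logs (kHz).
table = [1266768, 1000080, 933408, 800064, 666720]

true_max_khz = 1266768
limit_mhz = true_max_khz // 1000   # fractional MHz are dropped here
limit_khz = limit_mhz * 1000       # converted back, now below the table entry

print(limit_khz, highest_at_or_below(table, limit_khz))
```

Because the rounded limit is 768 kHz below the real top entry, the lookup falls through to the next entry, 1000080 kHz, just as in the log.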

I'm pretty sure the last patch I posted has the same bug. If so, please apply
the updated version I'll post in a few mins.


Also, there's something else going on here.

From the logs:
Detected 1333.446 MHz processor.

and:
userspace: managing cpu 0 started (666720 - 1266000 kHz, currently 1333440 kHz)

And I suspect /proc/cpuinfo will reflect that if you don't build cpufreq support.

That aside, I don't think that's a new issue (I think 2.6.18 also behaved the
same way for you), so if we can get it going at 1266mhz again I think this bug
is fixed (and you're welcome to file a new bug for the 1333mhz issue).
Comment 30 Daniel Drake 2007-05-02 08:26:40 UTC
Created attachment 11373 [details]
hopefully final patch

This should fix the rounding problem (but not the 1333mhz one). It replaces all
previous patches/DSDTs. Please post dmesg output from using this patch.
Comment 31 Dustin Surawicz 2007-05-02 11:41:09 UTC
Created attachment 11376 [details]
dmesg output

You are correct, Daniel. I have now applied your latest patch and the frequency
table should now be correct.

Concerning the discrepancy between 'detected 1333.446MHz processor' and the max
frequency of 1266768 kHz: Could this be related to my broken BIOS, so that I
need to correct the DSDT, or is this a more general problem? In the latter
case, I would file another bug.

BR,
Dustin
Comment 32 Dustin Surawicz 2007-05-02 14:04:10 UTC
I recompiled the patched kernel without frequency scaling and, of course, the
CPU is then running at 1333MHz. Should I file a separate bug?
Comment 33 Daniel Drake 2007-05-02 14:55:09 UTC
Here's how it works for you at the moment:

powernow-k7 can't find any of the powernow performance tables that match your
CPU. Instead, it falls back on the performance tables in the ACPI BIOS (DSDT) to
figure out the available frequencies.

It turns out that those tables aren't much good either. Taking the first entry:

                Package (0x06)
                {
                    0x03B6,
                    0x5DC0, 
                    0x7D, 
                    0x7D, 
                    0x009C418D, 
                    0x018D
                }, 

The first value (0x03B6) is 950 in decimal, and this represents the CPU
frequency for this performance state.

The FID and VID of the performance state are encoded in the 5th value
(0x009C418D). The FID can be looked up in a table in the powernow-k7 driver to
deduce a multiplier, 9.5 in this case. Multiply 9.5 by the FSB and you get the
frequency.
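
For illustration, here is how that control value splits into its fields. This is a minimal sketch: the bit layout (bits 0-4 FID, bits 5-9 VID, bits 10-29 SGTC) is an assumption chosen because it reproduces the FID/VID/SGTC values printed in the dmesg output in comment 8, and the multiplier table only lists the FID codes that appear in that output.

```python
def decode_control(control):
    """Split an ACPI control value into (FID, VID, SGTC).

    Assumed layout: bits 0-4 FID, bits 5-9 VID, bits 10-29 SGTC.
    """
    fid = control & 0x1F
    vid = (control >> 5) & 0x1F
    sgtc = (control >> 10) & 0xFFFFF
    return fid, vid, sgtc


# Multipliers (times 10) for the FID codes seen in comment 8 only.
FID_TO_MULT_X10 = {0x4: 50, 0x6: 60, 0x8: 70, 0x9: 75, 0xD: 95}

fid, vid, sgtc = decode_control(0x009C418D)
print(hex(fid), hex(vid), sgtc, FID_TO_MULT_X10[fid] / 10)
```

For the first entry this gives FID 0xd, VID 0xc, SGTC 10000 and a 9.5x multiplier, matching the "P0" lines in the log.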

So, it looks like your ACPI tables were written for a system with a 100mhz FSB
(9.5 * 100 = 950, which is consistent with the first value). However, your FSB
is 133mhz.

We have been working on the earlier assumption that the multiplier encoded by
the FID code is correct, whereas the frequency (in the first field) is
incorrect. e.g. we decided that "9.5 * 133 = 1266mhz" is the right calculation.
This isn't an unfair assumption to make, and this is how the powernow driver
works anyway (uses ACPI-supplied-multiplier * fsb, not ACPI-supplied-frequency)
but the reality is that it looks like both the FID(multiplier) *and* frequency
values in the ACPI performance state tables are wrong. To reach maximum
frequency there would have to be a performance state with a FID that encodes a
multiplier of 10.0 (10 * 133 = 1333mhz).
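
The arithmetic can be laid out per state. A minimal sketch: the FSB value of 133344 kHz is an assumption derived from the logged 1266768 kHz divided by 9.5, and the 10.0x state is purely hypothetical since no such entry exists in the tables.

```python
FSB_KHZ = 133344  # assumption: 1266768 kHz / 9.5, from the frequency table

# (ACPI frequency in MHz, FID multiplier times 10) for P0..P4.
states = [(950, 95), (750, 75), (700, 70), (600, 60), (500, 50)]


def implied_fsb_mhz(acpi_mhz, mult_x10):
    """FSB the ACPI table must have been written for."""
    return acpi_mhz * 10 / mult_x10


def actual_khz(mult_x10, fsb_khz=FSB_KHZ):
    """Frequency the multiplier really produces on this machine."""
    return fsb_khz * mult_x10 // 10


for acpi_mhz, mult_x10 in states:
    print(acpi_mhz, implied_fsb_mhz(acpi_mhz, mult_x10), actual_khz(mult_x10))

# Hypothetical 10.0x state that would be needed to reach full speed.
print(actual_khz(100))
```

Every ACPI entry implies a 100 MHz FSB, while the hypothetical 10.0x state lands on 1333440 kHz, the "currently" value the userspace governor logged.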

At this point we're beyond the level where this could be fixed in the kernel
(since all of the data sources are wrong!). If you know what the allowed
frequencies were *supposed* to be, I could help provide a new DSDT. Or, we could
even just add another performance state with a 10.0 multiplier. However I'm not
sure how safe it is to mess with stuff like this...
Comment 34 Dustin Surawicz 2007-05-02 15:09:31 UTC
I understood what you mean. I am using an AMD1500+ processor, not sure whether
it is a mobile one. Is it possible to get the needed information for it so that
I can use a corrected DSDT locally? I could open my laptop to see the exact
label of it if this helps.
Comment 35 Dustin Surawicz 2007-05-02 15:43:43 UTC
One comment: If I write a value smaller than f_max to
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq and afterwards want to
revert to f_max, the CPU frequency will not be set to the correct value due to
the kHz/MHz rounding issue.
Comment 36 Daniel Drake 2007-05-02 15:55:34 UTC
I'm not sure what you mean. Can you expand on that, perhaps with annotated debug
logs?
Comment 37 Dustin Surawicz 2007-05-05 07:56:30 UTC
I am quite busy right now and on vacation the next three weeks. After I am back,
I will give you the desired information.
Comment 38 Marco 2007-06-09 06:00:11 UTC
Hi,
I have the same problem here with an Asus L3D notebook that previously had an
Athlon XP-M 2000+ running at 1600 MHz, which was recognized perfectly.

Now it has an Athlon XP-M 2400+ running at 1800 MHz. When the kernel starts it
is recognized correctly, but afterwards powernow reports it with min and max =
1473 MHz. I've tried the latest Ubuntu gutsy snapshot, which has kernel
2.6.22-6. In theory that is 2.6.22-rc2, but I'm not quite sure, so I don't know
whether it includes this patch or not.

Could my problem be related? I attach dmesg.

Thanks
Comment 39 Marco 2007-06-09 06:01:32 UTC
Created attachment 11719 [details]
dmesg from ubuntu gutsy daily 9 june 2007
Comment 40 Daniel Drake 2007-06-09 08:12:18 UTC
Please reproduce this on an unpatched 2.6.22-rc4 kernel. Compile it with cpufreq
debugging support enabled and boot with cpufreq.debug=3 then attach new logs if
the bug still exists.
Comment 41 Daniel Drake 2007-06-29 16:19:47 UTC
Marking as fixed, since this patch is upstream. If there are still issues, please open a new bug with the requested info and put me on CC.
