Bug 8033 - CPU loses cpufreq link after offline/online transition (AMD, i386 specific problem)
Status: CLOSED CODE_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: i386 Linux
Importance: P2 normal
Assignee: Thomas Renninger
Reported: 2007-02-18 04:18 UTC by Bernhard Bender
Modified: 2007-05-08 01:22 UTC

Kernel Version: 2.6.20


Attachments
Don't delete cpu_devs data to identify different x86 types in late_initcall (4.70 KB, patch)
2007-04-27 08:43 UTC, Thomas Renninger

Description Bernhard Bender 2007-02-18 04:18:27 UTC
Most recent kernel where this bug did *NOT* occur: unknown
Distribution: openSuSE10.2 (probably irrelevant)
A previous discussion of this problem is here:
https://bugzilla.novell.com/show_bug.cgi?id=246525

Hardware Environment: hp compaq nx6325 (AMD Turion64 X2)
Software Environment:

Problem Description:
After taking CPU1 offline and online again, the
/sys/devices/system/cpu/cpu1/cpufreq link is missing.
This also happens after suspend2disk and resume.

While the CPU will work normally, kpowersave will report it as disabled.

Steps to reproduce:
as root user:
# echo 0 >/sys/devices/system/cpu/cpu1/online
# echo 1 >/sys/devices/system/cpu/cpu1/online
# l /sys/devices/system/cpu/*
-rw-r--r-- 1 root root 4096 2007-02-16 23:12 /sys/devices/system/cpu/sched_mc_power_savings

/sys/devices/system/cpu/cpu0:
total 0
drwxr-xr-x 5 root root    0 2007-02-16 23:12 ./
drwxr-xr-x 4 root root    0 2007-02-16 23:12 ../
drwxr-xr-x 5 root root    0 2007-02-17 00:08 cache/
drwxr-xr-x 3 root root    0 2007-02-16 23:12 cpufreq/
-r-------- 1 root root 4096 2007-02-16 23:12 crash_notes
drwxr-xr-x 2 root root    0 2007-02-17 00:08 topology/

/sys/devices/system/cpu/cpu1:
total 0
drwxr-xr-x 4 root root    0 2007-02-16 23:13 ./
drwxr-xr-x 4 root root    0 2007-02-16 23:12 ../
drwxr-xr-x 5 root root    0 2007-02-16 23:13 cache/
-r-------- 1 root root 4096 2007-02-16 23:12 crash_notes
-rw------- 1 root root    0 2007-02-16 23:13 online
drwxr-xr-x 2 root root    0 2007-02-16 23:13 topology/
#
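
The same check can be done programmatically instead of reading the listing by eye. The following is only a sketch: the enum and function names are made up for illustration, the sysfs root is passed in as a parameter, and any lstat() failure is simply treated as "missing".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* State of <root>/cpu<N>/cpufreq as seen from userspace (names are illustrative). */
enum cpufreq_entry {
    CPUFREQ_MISSING,   /* no entry at all: the symptom reported in this bug */
    CPUFREQ_DIR,       /* a real directory */
    CPUFREQ_SYMLINK,   /* a link to another core's cpufreq directory */
    CPUFREQ_OTHER      /* anything else, or a path too long to check */
};

/*
 * Classify the cpufreq entry of one CPU. `root` would normally be
 * "/sys/devices/system/cpu"; it is a parameter so the function can be
 * pointed at another directory.
 */
enum cpufreq_entry cpufreq_entry_state(const char *root, unsigned int cpu)
{
    char path[4096];
    struct stat st;
    int n;

    assert(root != NULL);
    n = snprintf(path, sizeof(path), "%s/cpu%u/cpufreq", root, cpu);
    if (n < 0 || (size_t)n >= sizeof(path))
        return CPUFREQ_OTHER;

    /* lstat() does not follow symlinks, so links and directories stay distinct.
     * Simplification: every lstat() error is reported as "missing". */
    if (lstat(path, &st) != 0)
        return CPUFREQ_MISSING;
    if (S_ISLNK(st.st_mode))
        return CPUFREQ_SYMLINK;
    if (S_ISDIR(st.st_mode))
        return CPUFREQ_DIR;
    return CPUFREQ_OTHER;
}
```

On an affected kernel one would expect cpufreq_entry_state("/sys/devices/system/cpu", 1) to return CPUFREQ_MISSING after the offline/online cycle above.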
Comment 1 Rafael J. Wysocki 2007-02-18 06:47:04 UTC
AFAICT, this only happens if the kernel is 32-bit (i386).  At least I'm unable 
to reproduce the problem on nx6325 with x86_64 kernels (2.6.20 or later).
Comment 2 Lukasz Fibinger 2007-03-14 17:17:59 UTC
That has been indeed bugging me for a long time (Athlon 64 X2 3800+, Asus 
A8N-E, 32 bit distro).
Comment 3 Bernhard Bender 2007-04-03 15:50:10 UTC
The problem persists with kernel version 2.6.20.4
Comment 4 Thomas Renninger 2007-04-26 10:15:06 UTC
This does not look like a cpufreq problem... so far I have found out that the 
vendor string gets overwritten when the CPU gets offlined:

Apr 26 19:10:47 adalid kernel: Checking CPU 0
Apr 26 19:10:47 adalid kernel: k
Apr 26 19:10:47 adalid kernel: u
Apr 26 19:10:47 adalid kernel: vendor: 2 - We have an AMD CPU ...
Apr 26 19:10:47 adalid kernel: a
Apr 26 19:10:47 adalid kernel: b
Apr 26 19:10:47 adalid kernel: c
Apr 26 19:10:47 adalid kernel: d
Apr 26 19:10:47 adalid kernel: e
Apr 26 19:10:47 adalid kernel: CPU 0 supported
Apr 26 19:10:47 adalid kernel: Checking CPU 1
Apr 26 19:10:47 adalid kernel: k
Apr 26 19:10:47 adalid kernel: u
Apr 26 19:10:47 adalid kernel: vendor: 255 - NOT AN AMD CPU ?!?
Apr 26 19:10:47 adalid kernel: Checking CPU 2
Apr 26 19:10:47 adalid kernel: k
Apr 26 19:10:47 adalid kernel: u
Apr 26 19:10:47 adalid kernel: vendor: 255 - NOT AN AMD CPU ?!?
Apr 26 19:10:47 adalid kernel: Checking CPU 3
Apr 26 19:10:47 adalid kernel: k
Apr 26 19:10:47 adalid kernel: u
Apr 26 19:10:47 adalid kernel: vendor: 255 - NOT AN AMD CPU ?!?
Apr 26 19:10:47 adalid kernel: Num online: 4

This is in powernow-k8.c:check_supported_cpu(unsigned int cpu) here:

	if (current_cpu_data.x86_vendor != X86_VENDOR_AMD) {
		printk("vendor: %d - NOT AN AMD CPU ?!?\n",
		       current_cpu_data.x86_vendor);
		goto out;
	}

	printk("vendor: %d - We have an AMD CPU ...\n",
	       current_cpu_data.x86_vendor);

Remember: This is a 32 bit kernel problem!
I will try some more tomorrow...
Comment 5 Thomas Renninger 2007-04-27 08:43:40 UTC
Created attachment 11290 [details]
Don't delete cpu_devs data to identify different x86 types in late_initcall

This one should fix it.

Dave, Andi, will one of you pick this one up, or do I have to post it
explicitly to lkml or somewhere else?

I tested this on i386 and it works; I also compile-tested it with and without
CONFIG_HOTPLUG_CPU.
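
Going by the attachment title, the idea is that the cpu_devs vendor table must still be populated when a CPU is identified again after boot. The userspace model below is not the patch and not kernel code; it only illustrates why a lookup against an emptied table yields 255. All names ending in _sketch are hypothetical, and the X86_VENDOR_* values are quoted from memory of the 2.6.20 i386 headers (AMD = 2 and unknown = 255 match the log in comment 4).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values as recalled from include/asm-i386/processor.h; treat as an assumption. */
#define X86_VENDOR_AMD      2
#define X86_VENDOR_NUM      9
#define X86_VENDOR_UNKNOWN  0xff

/* Hypothetical, cut-down stand-in for a per-vendor descriptor. */
struct cpu_dev_sketch {
    const char *ident;   /* CPUID vendor string, e.g. "AuthenticAMD" */
};

static const struct cpu_dev_sketch amd_dev = { "AuthenticAMD" };

/* Vendor table indexed by X86_VENDOR_*; a NULL slot means "vendor not known". */
static const struct cpu_dev_sketch *cpu_devs_sketch[X86_VENDOR_NUM] = {
    [X86_VENDOR_AMD] = &amd_dev,
};

/* Map a CPUID vendor string to an X86_VENDOR_* value by scanning the table. */
unsigned int lookup_vendor(const char *cpuid_vendor)
{
    assert(cpuid_vendor != NULL);
    for (unsigned int i = 0; i < X86_VENDOR_NUM; i++) {
        if (cpu_devs_sketch[i] &&
            strcmp(cpu_devs_sketch[i]->ident, cpuid_vendor) == 0)
            return i;
    }
    return X86_VENDOR_UNKNOWN;   /* nothing matched: 0xff == 255 */
}

/* Models a late init hook that drops the table entry once boot is finished. */
void late_cleanup_sketch(void)
{
    cpu_devs_sketch[X86_VENDOR_AMD] = NULL;
}
```

In this model, lookup_vendor("AuthenticAMD") returns 2 before late_cleanup_sketch() runs and 255 afterwards, which is the pattern seen for the CPUs that were re-identified after boot.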

---------------------------------
*Unrelated*:
A change compared to older or x86_64 kernels that I noticed:
When doing:

l -d cpu*/cpufreq
drwxr-xr-x cpu0/cpufreq/
lrwxrwxrwx cpu1/cpufreq -> ../../../../devices/system/cpu/cpu0/cpufreq/
drwxr-xr-x cpu2/cpufreq/
lrwxrwxrwx cpu3/cpufreq -> ../../../../devices/system/cpu/cpu2/cpufreq/

then doing:
adalid:/sys/devices/system/cpu # echo 0 >cpu2/online 
adalid:/sys/devices/system/cpu # l -d cpu*/cpufreq
drwxr-xr-x cpu0/cpufreq/
lrwxrwxrwx cpu1/cpufreq -> ../../../../devices/system/cpu/cpu0/cpufreq/

The cpufreq dir of both cores, cpu2 and cpu3, vanishes.
This was on a recent i386 kernel.
I am pretty sure I saw cpu3/cpufreq become a real directory when offlining
cpu2, and when adding cpu2 again, cpu2/cpufreq got linked to cpu3/cpufreq.
This should be the right behaviour, so the kernel is broken here, right?
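
The hand-over described here as the expected behaviour can be written down as a toy model. This is a sketch of the expectation only, not of how the cpufreq core is implemented: the struct and function names are invented, and CPUs are limited to 0..31 because a plain bitmask is used.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Toy model of one cpufreq policy shared by sibling cores. Not kernel code. */
struct shared_policy {
    unsigned int members;  /* bitmask of CPUs sharing the policy */
    unsigned int online;   /* bitmask of members currently online */
    int owner;             /* CPU holding the real directory, or NO_OWNER */
};

/* Lowest-numbered online member, or NO_OWNER if none is online. */
static int first_online(const struct shared_policy *p)
{
    for (int cpu = 0; cpu < 32; cpu++)
        if (p->online & (1u << cpu))
            return cpu;
    return NO_OWNER;
}

void policy_cpu_offline(struct shared_policy *p, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    p->online &= ~(1u << cpu);
    if (p->owner == cpu)
        p->owner = first_online(p);   /* hand the directory to a sibling */
}

void policy_cpu_online(struct shared_policy *p, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    if (!(p->members & (1u << cpu)))
        return;                       /* CPU does not belong to this policy */
    p->online |= 1u << cpu;
    if (p->owner == NO_OWNER)
        p->owner = cpu;               /* first CPU back owns the directory */
}

/* 'd' = real directory, 'l' = symlink to the owner, '-' = no cpufreq entry. */
char policy_entry(const struct shared_policy *p, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    if (!(p->online & (1u << cpu)))
        return '-';
    return p->owner == cpu ? 'd' : 'l';
}
```

Starting with cpu2 as owner and cpu3 linked to it, offlining cpu2 leaves cpu3 with the directory, and bringing cpu2 back makes cpu2 the link. The listing above instead shows both entries disappearing.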

-> You might want to reply on the list this one is forwarded to; that should be
easiest, and I don't want to write a separate mail now...
Comment 6 Lukasz Fibinger 2007-05-01 07:09:02 UTC
Alas, this patch doesn't work for me - with it applied, modprobe powernow-k8 
spits out "cannot load module: no such device".
Comment 7 Lukasz Fibinger 2007-05-01 14:07:35 UTC
Actually, it works - at first I tried it on a kernel with some other patches; 
it must've been their fault. Mea culpa.

Thanks a lot for this - now I can use suspend and not worry about processes on 
cpu1 not triggering throttling.
Comment 8 Thomas Renninger 2007-05-02 02:15:25 UTC
Thanks for verification.
Hopefully Andi or Dave can just pick the patch up.

If there is no message in the next few days, I'll send a short pointer to lkml 
for review and ask for integration...
Comment 9 Thomas Renninger 2007-05-02 02:20:16 UTC
Argh, I shouldn't have assigned this one to myself...
Set it to the i386 component; can someone please review and pick up the patch 
from comment #5?
Comment 10 Andi Kleen 2007-05-02 03:24:19 UTC
Yes, the patch looks correct.  I just merged it.

The owner should close the bug now.
Comment 11 Thomas Renninger 2007-05-08 01:21:52 UTC
Accept bug for closing.
