Bug 55411 - sysfs per-cpu cpufreq subdirs/symlinks screwed up after s2ram
Summary: sysfs per-cpu cpufreq subdirs/symlinks screwed up after s2ram
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-18 09:13 UTC by Duncan
Modified: 2013-06-04 00:56 UTC (History)
3 users (show)

See Also:
Kernel Version: 3.9-rc3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kernel config (70.32 KB, text/plain)
2013-03-18 09:13 UTC, Duncan
Details
dmesg after s2ram/resume cycle (62.45 KB, text/plain)
2013-03-18 09:21 UTC, Duncan
Details
cpufreq-info, pre-s2ram (good) (3.60 KB, text/plain)
2013-03-18 09:26 UTC, Duncan
Details
cpufreq-info, post-resume (bad) (2.65 KB, text/plain)
2013-03-18 09:38 UTC, Duncan
Details
grep . cpu?/cpufreq/* (good/pre-suspend) (5.99 KB, text/plain)
2013-03-18 09:56 UTC, Duncan
Details
grep . cpu?/cpufreq/* (bad/post-resume) (3.99 KB, text/plain)
2013-03-18 10:40 UTC, Duncan
Details

Description Duncan 2013-03-18 09:13:02 UTC
Created attachment 95681 [details]
kernel config

I have a kde/superkaramba theme that monitors system state, including current cpu frequencies.  Testing the 3.9-pre kernels (from git), I noticed that after a suspend2ram and resume, several of the current frequency readouts were missing!  Investigating, I found that the per-cpu sysfs cpufreq subdirs/symlinks were screwed up after an s2ram/resume cycle.

The CPU's an AMD bulldozer tri-cluster/6-core.

Abridged (perms/dates omitted) fresh-boot (good)
ls -dl /sys/devices/system/cpu/cpu?/cpufreq :

/sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
/sys/devices/system/cpu/cpu2/cpufreq/
/sys/devices/system/cpu/cpu3/cpufreq -> ../cpu2/cpufreq/
/sys/devices/system/cpu/cpu4/cpufreq/
/sys/devices/system/cpu/cpu5/cpufreq -> ../cpu4/cpufreq/

After a s2ram/resume cycle (now bad):

/sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
/sys/devices/system/cpu/cpu3/cpufreq/
/sys/devices/system/cpu/cpu5/cpufreq/

The subdirs for cpu2 and 4 disappeared, and cpu3 and 5 now have subdirs instead of symlinks to the (missing) cp2 2 and 4 subdirs.

No WONDER the cat of the scaling_cur_freq files ends up returning nothing for some cpus, those files, along with the dirs they were in, disappeared!

Additionally, some of the values in the remaining files are screwy.  I'll comment on them as I attach...

Attached is my config (redirected zcat /proc/config.gz, note that it's monolithic, no modules), and I'll attach dmesg, a grep . of the cpufreq files both good and bad, and cpufreq-info output both good and bad (based on the attachment requests I saw for a different cpufreq bug), as well.
Comment 1 Duncan 2013-03-18 09:21:07 UTC
Created attachment 95691 [details]
dmesg after s2ram/resume cycle

Here's dmesg after s2ram/resume.  It was a fresh boot and then suspend/resume, so full boot info is captured as well, then the s2ram/resume.
Comment 2 Duncan 2013-03-18 09:26:27 UTC
Created attachment 95701 [details]
cpufreq-info, pre-s2ram (good)

Good (pre-suspend) cpufreq-info.  Note the pairs: 0,1 2,3 4,5
Comment 3 Duncan 2013-03-18 09:38:55 UTC
Created attachment 95711 [details]
cpufreq-info, post-resume (bad)

Bad (post-resume) cpufreq-info.  As expected given the missing symlinks, cpus 2 and 4 don't show anything.

But what happened to cpu5?  Why is it saying it runs at the same hardware frequency as cpu3, NOT cpu4 as before, and why is it saying that, yet never-the-less saying it's alone, software-coordination-wise?

And if cpu5 is supposed to be hardware-frequency-linked to cpu3 now, why doesn't cpu3 list cpu5 as cpu5 lists cpu3?

Goofy values indeed!
Comment 4 Duncan 2013-03-18 09:56:04 UTC
Created attachment 95721 [details]
grep . cpu?/cpufreq/* (good/pre-suspend)

Good/pre-s2ram grep . cpu?/cpufreq/*

I deliberately redirected STDERR as well to hilite a question I've always had, as this is the best chance I suppose I'll get for a good answer:

Obviously as non-root:

grep: /sys/devices/system/cpu/cpu?/cpufreq/cpuinfo_cur_freq: Permission denied

Root is required to read cpuinfo_cur_freq.

But /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq:1400000

scaling_cur_freq is readable by world.

What's the difference between the two files and why is cpuinfo_cur_freq so special as to be the only file in the dir with 0400 perms, as opposed to 0444?
Comment 5 Duncan 2013-03-18 10:40:03 UTC
Created attachment 95731 [details]
grep . cpu?/cpufreq/* (bad/post-resume)

As with cpufreq-info (bad), the grep (bad) shows cpu3 reporting only itself in related_cpus, while cpu5 reports 3 and 5, yet pre-suspend, 2 and 3 were paired as were 4 and 5.  But affected_cpus reports only the single one for both. And of course there's no cpu2 or 4 shown as those dirs/symlinks are gone!

Additional comment on the cpuinfo_cur_freq question.  Feel free to point me to "TFM" for me to "R".  I just haven't seen it.  (Documentation/cpu-freq/user-guide.txt says cpuinfo_cur_freq is what the hardware reports, scaling_cur_freq is what the kernel thinks it should be, but that doesn't explain the conditions under which they might differ, or why cpuinfo_cur_freq is readable only by root, while scaling_cur_freq and all the other cpufreq/* files are readable by world.)

This is the last file I can think to attach ATM.  Let me know if there's anything else.

FWIW, I've standardized on reiserfs here, as at least since the introduction of data=ordered in 2.6.16 or so, it has been quite good to me, I think in part because the kernel folks don't screw with it as much as they do ext2/3/4.  And all my filesystems are well under a half TB anyway.  So if necessary, I believe I could bisect without running into the mid-window ext4 issue, but I'm hoping the specific triggers here, that it happens only after a s2ram/resume cycle, narrow it down enough so that isn't necessary, once someone familiar with the subsystem and its 3.8 -> 3.9-rc* changes takes a look.
Comment 6 Viresh Kumar 2013-03-18 11:53:47 UTC
(In reply to comment #4)
> Created an attachment (id=95721) [details]
> grep . cpu?/cpufreq/* (good/pre-suspend)
> 
> Good/pre-s2ram grep . cpu?/cpufreq/*
> 
> I deliberately redirected STDERR as well to hilite a question I've always
> had,
> as this is the best chance I suppose I'll get for a good answer:
> 
> Obviously as non-root:
> 
> grep: /sys/devices/system/cpu/cpu?/cpufreq/cpuinfo_cur_freq: Permission
> denied
> 
> Root is required to read cpuinfo_cur_freq.
> 
> But /sys/devices/system/cpu/cpu?/cpufreq/scaling_cur_freq:1400000
> 
> scaling_cur_freq is readable by world.
> 
> What's the difference between the two files and why is cpuinfo_cur_freq so
> special as to be the only file in the dir with 0400 perms, as opposed to
> 0444?

cpuinfo_cur_freq is the freq read from the hardware and so has such permissions.
It will mostly be same as scaling_cur_freq (software stored value of same hardware freq).

BTW, I must be the guy who screwed up your system in 3.9-rc .. I have come to your rescue now and will try to fix it for you.

BTW, it will be good if you can provide me suspend/resume log with all debug prints enabled.
Comment 7 Viresh Kumar 2013-03-18 12:14:49 UTC
(In reply to comment #5)
> (Documentation/cpu-freq/user-guide.txt says cpuinfo_cur_freq is what the
> hardware reports, scaling_cur_freq is what the kernel thinks it should be,
> but
> that doesn't explain the conditions under which they might differ, or why

If cpu frequency changes without kernel getting to know about it.

> cpuinfo_cur_freq is readable only by root, while scaling_cur_freq and all the
> other cpufreq/* files are readable by world.)

Because it is accessing the h/w directly and they don't want anybody else to do this.
Comment 8 Viresh Kumar 2013-03-19 06:16:53 UTC
(In reply to comment #0)
> Abridged (perms/dates omitted) fresh-boot (good)
> ls -dl /sys/devices/system/cpu/cpu?/cpufreq :
> 
> /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
> /sys/devices/system/cpu/cpu2/cpufreq/
> /sys/devices/system/cpu/cpu3/cpufreq -> ../cpu2/cpufreq/
> /sys/devices/system/cpu/cpu4/cpufreq/
> /sys/devices/system/cpu/cpu5/cpufreq -> ../cpu4/cpufreq/
> 
> After a s2ram/resume cycle (now bad):
> 
> /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
> /sys/devices/system/cpu/cpu3/cpufreq/
> /sys/devices/system/cpu/cpu5/cpufreq/

Can you try this rather than s2r:

for i in 1 2 3 4 5; do echo 0 > /sys/devices/system/cpu/cpu$i/online ; done
for i in 1 2 3 4 5; do echo 1 > /sys/devices/system/cpu/cpu$i/online ; done

and check the status if things are still corrupted for you?

I am working on a 5 cpu system grouped as 0-1 and 2-3-4.
Above doesn't corrupt anything for me Atleast.

And my system doesn't have S2R support for now.
Comment 9 Duncan 2013-03-19 07:49:53 UTC
(In reply to comment #8)
> (In reply to comment #0)
>> After a s2ram/resume cycle (now bad):
>> 
>> /sys/devices/system/cpu/cpu0/cpufreq/
>> /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
>> /sys/devices/system/cpu/cpu3/cpufreq/
>> /sys/devices/system/cpu/cpu5/cpufreq/
> 
> Can you try this rather than s2r:
> 
> for i in 1 2 3 4 5; do echo 0 > /sys/devices/system/cpu/cpu$i/online ; done
> for i in 1 2 3 4 5; do echo 1 > /sys/devices/system/cpu/cpu$i/online ; done
> 
> and check the status if things are still corrupted for you?

> Above doesn't corrupt anything for me Atleast.

That's a nice easy test; no rebuild and reboot needed. =:^)

Tho I had to change the > to >| as I have bash noclobber set and the files obviously already exist...

Uncorrupted before the test, corrupted after.  So just cycling the cpus off and then back online *DOES* corrupt cpufreq, thus a much simpler reproducer! =:^)  Exact same ls results as the above.

> And my system doesn't have S2R support for now.

My old system didn't support s2ram reliably; it would work occasionally but mostly it didn't.  But s2disk was workable for awhile, until the fact that I was running mdraid and the disks didn't always return in the same sdX slots due to hardware wakeup issues complicated things, so eventually I didn't use that much either.  The new system's great with s2ram, sans this bug of course; s2disk didn't work at all at first, but last time I tried it /almost/ worked so there has been improvement.  But I don't like to take unnecessary chances with filesystem log replay and thankfully wall power's good enough around here that I can s2ram for a day and come back and wiggle the mouse and all's fine (with a couple pre-suspend syncs thrown into my script just in case), so I tend to use it a LOT, even more than I'd use s2disk due to the speed. =:^)

But I'd love to have s2both working reliably; for all I know it's actually working now; it was pretty close.  But I prefer not to test the reiserfs log replay (even with pre-suspend syncs I worry, tho as I said reiserfs has actually been very good to me even thru faulty ram, a power supply blowing up on me, a mobo dying, etc, since 2.6.16 or whenever it was that it got ordered journaling by default) when it doesn't work, so knowing s2disk didn't work well when I tested it and with s2ram working SO well, I don't tend to test s2disk/s2both too often.


Meanwhile, thanks for the cpuinfo_cur_freq explanation.  If that actually real-time touches the hardware to get the data as you say, that does explain the root privs.  Maybe that bit of extra info could be added to the documentation?  I could propose some new wording and open a new bug on cpu-freq/user-guide.txt for it if appropriate.
Comment 10 Viresh Kumar 2013-03-19 07:57:45 UTC
(In reply to comment #9)
> That's a nice easy test; no rebuild and reboot needed. =:^)
> 
> Tho I had to change the > to >| as I have bash noclobber set and the files
> obviously already exist...
> 
> Uncorrupted before the test, corrupted after.  So just cycling the cpus off
> and
> then back online *DOES* corrupt cpufreq, thus a much simpler reproducer! =:^) 
> Exact same ls results as the above.

Fuck! I would take out following out of possible culprits list:
- cpufreq core
- s2r/s2d

My system is very much similar to yours as it has multiple cpu groups sharing policy. The only difference is the driver now which is different for us. My system is a ARN SoC and your an x86.

I believe you are using acpi-cpufreq driver, right? I haven't gone through it but will try. I believe it has to do with the updates done on affected_cpus and related_cpus by me.. That driver might have some strong dependency on that.

CAN SOMEBODY ELSE SEE ACPI-CPUFREQ (WITH PRIOR KNOWLEDGE) DRIVER FROM THIS PERSPECTIVE.

> Meanwhile, thanks for the cpuinfo_cur_freq explanation.  If that actually
> real-time touches the hardware to get the data as you say, that does explain
> the root privs.  Maybe that bit of extra info could be added to the
> documentation?  I could propose some new wording and open a new bug on
> cpu-freq/user-guide.txt for it if appropriate.

Probably the best thing could be a patch, which we can review and apply :)
Comment 11 Duncan 2013-03-19 08:23:43 UTC
(In reply to comment #6)

> BTW, it will be good if you can provide me suspend/resume log with all debug
> prints enabled.

I take it that shouldn't be necessary with the new reproducer?  But a debug log
from the offline/online cycle would be useful?

You meant dmesg, correct?  I take it the originally attached dmesg didn't have
the kernel config debug options enabled that you need?  Which specific options
enabled, anything I need to echo into an appropriate sysfs file, etc?  There's
a LOT of kernel debug options to choose from but I guess most of them wouldn't
be particularly helpful here, so I think you didn't really mean *ALL* of them.
=;^)

(In reply to comment #10)

> I believe you are using acpi-cpufreq driver, right?

Yes, the acpi p-states driver, X86_ACPI_CPUFREQ.  AFAIK it was previously the k8 powernow driver, X86_POWERNOW_K8, but a kernel or two ago I noticed complaints in the kernel log saying to switch to the acpi driver, so I disabled the powernow driver (acpi p-states was already enabled, monolithic kernel as I mentioned above) and rebooted, and cpufreq continued to work, so I stayed with the acpi driver.
Comment 12 Viresh Kumar 2013-03-19 08:27:02 UTC
(In reply to comment #11)
> (In reply to comment #6)
> 
> > BTW, it will be good if you can provide me suspend/resume log with all
> debug
> > prints enabled.
> 
> I take it that shouldn't be necessary with the new reproducer?  But a debug
> log
> from the offline/online cycle would be useful?
> 
> You meant dmesg, correct?  I take it the originally attached dmesg didn't
> have
> the kernel config debug options enabled that you need?  Which specific
> options
> enabled, anything I need to echo into an appropriate sysfs file, etc? 
> There's
> a LOT of kernel debug options to choose from but I guess most of them
> wouldn't
> be particularly helpful here, so I think you didn't really mean *ALL* of
> them.
> =;^)

I don't need any logs now. I wanted the logs with following added to the top of drivers/cpufreq/Makefile:

ccflags-y := -DDEBUG -DVERBOSE_DEBUG

But not required now.

> (In reply to comment #10)
> 
> > I believe you are using acpi-cpufreq driver, right?
> 
> Yes, the acpi p-states driver, X86_ACPI_CPUFREQ.  AFAIK it was previously the
> k8 powernow driver, X86_POWERNOW_K8, but a kernel or two ago I noticed
> complaints in the kernel log saying to switch to the acpi driver, so I
> disabled
> the powernow driver (acpi p-states was already enabled, monolithic kernel as
> I
> mentioned above) and rebooted, and cpufreq continued to work, so I stayed
> with
> the acpi driver.

I think i know the issue now, but don't have sufficient knowledge to give a fix.
I will circulate a mail to people who can do this.
Comment 13 Viresh Kumar 2013-03-19 08:50:07 UTC
Hi Guys,

We are talking here about a bug reported by Duncan here. His cpu/cpu*/cpufreq
directory are getting corrupted with 3.9-rc3 and was working well with 3.8

https://bugzilla.kernel.org/show_bug.cgi?id=55411

On his AMD bulldozer tri-cluster/6-core system he doesn't see affected
and related
cpus set correctly after off-lining 1-5 and bringing them back with:

for i in 1 2 3 4 5; do echo 0 > /sys/devices/system/cpu/cpu$i/online ; done
for i in 1 2 3 4 5; do echo 1 > /sys/devices/system/cpu/cpu$i/online ; done

Before running above two, cpufreq-info gave:
https://bugzilla.kernel.org/attachment.cgi?id=95701

And after running above it gave:
https://bugzilla.kernel.org/attachment.cgi?id=95711

Clearly it got corrupted. Somehow cpu 3 showed up in related cpus field of
cpu 5.

I suspect following patches behind this:

commit fcf8058296edbc3de43adf095824fc32b067b9f8
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Tue Jan 29 14:39:08 2013 +0000

    cpufreq: Simplify cpufreq_add_dev()

    Currently cpufreq_add_dev() firsts allocates policy, calls
    driver->init() and then checks if this CPU is already managed or not.
    And if it is already managed, its policy is freed.

    We can save all this if we somehow know that CPU is managed or not in
    advance.  policy->related_cpus contains the list of all valid sibling
    CPUs of policy->cpu. We can check this to see if the current CPU is
    already managed.

    From now on, platforms don't really need to set related_cpus from
    their init() routines, as the same work is done by core too.

    If a platform driver needs to set the related_cpus mask with some
    additional CPUs, other than CPUs present in policy->cpus, they are
    free to do it, though, as we don't override anything.

    [rjw: Changelog]
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
    Tested-by: Shawn Guo <shawn.guo@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


AND

commit 643ae6e81dd65b333a13259852405fc9f764ac76
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Sat Jan 12 05:14:38 2013 +0000

    cpufreq: Manage only online cpus

    cpufreq core doesn't manage offline cpus and if driver->init() has returned
    mask including offline cpus, it may result in unwanted behavior by
cpufreq core
    or governors.

    We need to get only online cpus in this mask. There are two places
to fix this
    mask, cpufreq core and cpufreq driver. It makes sense to do this
at common place
    and hence is done in core.

    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


And this is the latest piece of documentation available:

SMP systems normally have same clock source for a group of cpus. For these the
.init() would be called only once for the first online cpu. Here the .init()
routine must initialize policy->cpus with mask of all possible cpus (Online +
Offline) that share the clock. Then the core would copy this mask onto
policy->related_cpus and will reset policy->cpus to carry only online cpus.


I saw acpi-cpufreq drivers driver->init() code and found it is not yet
aligned to this
theory and probably that is causing these failures.

I don't have enough knowledge about this driver and how is it used for all x86
systems and so want somebody else (who has some prior experience with it)
to check how policy->cpus and policy->related_cpus must be set from
driver->init().

--
viresh

---------- Forwarded message ----------
From:  <bugzilla-daemon@bugzilla.kernel.org>
Date: 19 March 2013 13:19
Subject: [Bug 55411] sysfs per-cpu cpufreq subdirs/symlinks screwed up
after s2ram
To: viresh.kumar@linaro.org


https://bugzilla.kernel.org/show_bug.cgi?id=55411

--- Comment #9 from Duncan <1i5t5.duncan@cox.net>  2013-03-19 07:49:53 ---
(In reply to comment #8)
> (In reply to comment #0)
>> After a s2ram/resume cycle (now bad):
>>
>> /sys/devices/system/cpu/cpu0/cpufreq/
>> /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
>> /sys/devices/system/cpu/cpu3/cpufreq/
>> /sys/devices/system/cpu/cpu5/cpufreq/
>
> Can you try this rather than s2r:
>
> for i in 1 2 3 4 5; do echo 0 > /sys/devices/system/cpu/cpu$i/online ; done
> for i in 1 2 3 4 5; do echo 1 > /sys/devices/system/cpu/cpu$i/online ; done
>
> and check the status if things are still corrupted for you?

> Above doesn't corrupt anything for me Atleast.

That's a nice easy test; no rebuild and reboot needed. =:^)

Tho I had to change the > to >| as I have bash noclobber set and the files
obviously already exist...

Uncorrupted before the test, corrupted after.  So just cycling the cpus off and
then back online *DOES* corrupt cpufreq, thus a much simpler reproducer! =:^)
Exact same ls results as the above.

> And my system doesn't have S2R support for now.

My old system didn't support s2ram reliably; it would work occasionally but
mostly it didn't.  But s2disk was workable for awhile, until the fact that I
was running mdraid and the disks didn't always return in the same sdX slots due
to hardware wakeup issues complicated things, so eventually I didn't use that
much either.  The new system's great with s2ram, sans this bug of course;
s2disk didn't work at all at first, but last time I tried it /almost/ worked so
there has been improvement.  But I don't like to take unnecessary chances with
filesystem log replay and thankfully wall power's good enough around here that
I can s2ram for a day and come back and wiggle the mouse and all's fine (with a
couple pre-suspend syncs thrown into my script just in case), so I tend to use
it a LOT, even more than I'd use s2disk due to the speed. =:^)

But I'd love to have s2both working reliably; for all I know it's actually
working now; it was pretty close.  But I prefer not to test the reiserfs log
replay (even with pre-suspend syncs I worry, tho as I said reiserfs has
actually been very good to me even thru faulty ram, a power supply blowing up
on me, a mobo dying, etc, since 2.6.16 or whenever it was that it got ordered
journaling by default) when it doesn't work, so knowing s2disk didn't work well
when I tested it and with s2ram working SO well, I don't tend to test
s2disk/s2both too often.


Meanwhile, thanks for the cpuinfo_cur_freq explanation.  If that actually
real-time touches the hardware to get the data as you say, that does explain
the root privs.  Maybe that bit of extra info could be added to the
documentation?  I could propose some new wording and open a new bug on
cpu-freq/user-guide.txt for it if appropriate.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
Comment 14 Rafael J. Wysocki 2013-03-22 12:10:00 UTC
[Adding Boris and Thomas to the CC.]

On Tuesday, March 19, 2013 02:20:06 PM Viresh Kumar wrote:
> Hi Guys,
> 
> We are talking here about a bug reported by Duncan here. His cpu/cpu*/cpufreq
> directory are getting corrupted with 3.9-rc3 and was working well with 3.8
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=55411
> 
> On his AMD bulldozer tri-cluster/6-core system he doesn't see affected
> and related
> cpus set correctly after off-lining 1-5 and bringing them back with:
> 
> for i in 1 2 3 4 5; do echo 0 > /sys/devices/system/cpu/cpu$i/online ; done
> for i in 1 2 3 4 5; do echo 1 > /sys/devices/system/cpu/cpu$i/online ; done
> 
> Before running above two, cpufreq-info gave:
> https://bugzilla.kernel.org/attachment.cgi?id=95701
> 
> And after running above it gave:
> https://bugzilla.kernel.org/attachment.cgi?id=95711
> 
> Clearly it got corrupted. Somehow cpu 3 showed up in related cpus field of
> cpu 5.
> 
> I suspect following patches behind this:
> 
> commit fcf8058296edbc3de43adf095824fc32b067b9f8
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Tue Jan 29 14:39:08 2013 +0000
> 
>     cpufreq: Simplify cpufreq_add_dev()
> 
>     Currently cpufreq_add_dev() firsts allocates policy, calls
>     driver->init() and then checks if this CPU is already managed or not.
>     And if it is already managed, its policy is freed.
> 
>     We can save all this if we somehow know that CPU is managed or not in
>     advance.  policy->related_cpus contains the list of all valid sibling
>     CPUs of policy->cpu. We can check this to see if the current CPU is
>     already managed.
> 
>     From now on, platforms don't really need to set related_cpus from
>     their init() routines, as the same work is done by core too.
> 
>     If a platform driver needs to set the related_cpus mask with some
>     additional CPUs, other than CPUs present in policy->cpus, they are
>     free to do it, though, as we don't override anything.
> 
>     [rjw: Changelog]
>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>     Tested-by: Shawn Guo <shawn.guo@linaro.org>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> 
> AND
> 
> commit 643ae6e81dd65b333a13259852405fc9f764ac76
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Sat Jan 12 05:14:38 2013 +0000
> 
>     cpufreq: Manage only online cpus
> 
>     cpufreq core doesn't manage offline cpus and if driver->init() has
>     returned
>     mask including offline cpus, it may result in unwanted behavior by
> cpufreq core
>     or governors.
> 
>     We need to get only online cpus in this mask. There are two places
> to fix this
>     mask, cpufreq core and cpufreq driver. It makes sense to do this
> at common place
>     and hence is done in core.
> 
>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> 
> And this is the latest piece of documentation available:
> 
> SMP systems normally have same clock source for a group of cpus. For these
> the
> .init() would be called only once for the first online cpu. Here the .init()
> routine must initialize policy->cpus with mask of all possible cpus (Online +
> Offline) that share the clock. Then the core would copy this mask onto
> policy->related_cpus and will reset policy->cpus to carry only online cpus.
> 
> 
> I saw acpi-cpufreq drivers driver->init() code and found it is not yet
> aligned to this
> theory and probably that is causing these failures.
> 
> I don't have enough knowledge about this driver and how is it used for all
> x86
> systems and so want somebody else (who has some prior experience with it)
> to check how policy->cpus and policy->related_cpus must be set from
> driver->init().

OK, so what exactly do you need to now?

This has to be addressed before final 3.9 this way or another - and the sooner
the better.

Thanks,
Rafael
Comment 15 Viresh Kumar 2013-03-22 12:15:30 UTC
On 22 March 2013 17:47, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> OK, so what exactly do you need to now?

s/now/know ??

We just need to set policy->cpus with all cpus (online or offline) that
share clock line with cpu for which init() is called.

cpufreq core with then set all those cpus onto related cpus and will
keep only online cpus in policy->cpus..

Sorry if i haven't answered it well :(
Comment 16 Thomas Renninger 2013-03-22 12:53:38 UTC
On Friday, March 22, 2013 01:17:18 PM Rafael J. Wysocki wrote:
> [Adding Boris and Thomas to the CC.]
> 
...
> > I don't have enough knowledge about this driver and how is it used for all
> > x86 systems and so want somebody else (who has some prior experience with
> > it) to check how policy->cpus and policy->related_cpus must be set from
> > driver->init().
> 
> OK, so what exactly do you need to now?
> 
> This has to be addressed before final 3.9 this way or another - and the
> sooner the better.

Is this all to try to fix "cpufreq driver gets loaded while some cores
were set offline before"?

I wonder how you run into "cpufreq is initialized with offlined cpus" case.
I remember that there were problems, but it's nearly impossible to run
into this if the cpufreq driver is loaded early at boot.

cpufreq_add_dev() and friends are complicated.
Those init functions got split some time ago and there also slipped
in a bug even it was simple splitting of functions.

I do not have time to look at fcf8058296edbc3de43adf095824fc32b067b9f8
right now. Don't know how much other stuff depends on it and how
sever it is on ARM that cpufreq does not correctly initialize with offlined
cpus..., I would revert this patch.

There are other cpu related drivers (at least on x86) which cannot
initialize correctly if the CPUs got offlined before driver init.
Obviously this is not a clever thing to do.

   Thomas
Comment 17 Rafael J. Wysocki 2013-03-22 13:05:30 UTC
On Friday, March 22, 2013 05:45:29 PM Viresh Kumar wrote:
> On 22 March 2013 17:47, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > OK, so what exactly do you need to now?
> 
> s/now/know ??

Yes, sorry.
Comment 18 Borislav Petkov 2013-03-22 13:21:28 UTC
On Fri, Mar 22, 2013 at 02:12:50PM +0100, Rafael J. Wysocki wrote:
> On Friday, March 22, 2013 05:45:29 PM Viresh Kumar wrote:
> > On 22 March 2013 17:47, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > > OK, so what exactly do you need to now?
> > 
> > s/now/know ??
> 
> Yes, sorry.

Right, I heard about this breakage on Bulldozer and I can repro it here
too :(.

Which is not good and it needs to be fixed ASAP. From looking at the
patches in question, this should be reproducible on everything, though,
since it is generic cpufreq code. And if so, this makes the urgency of a
fix much more dire.

Thanks.
Comment 19 Viresh Kumar 2013-03-22 13:43:34 UTC
Hi guys,

I will answer to questions of both of you in this mail.

On 22 March 2013 18:23, Thomas Renninger <trenn@suse.de> wrote:
> Is this all to try to fix "cpufreq driver gets loaded while some cores
> were set offline before"?

Not really. There are problems with acpi-cpufreq without that case too.

> I wonder how you run into "cpufreq is initialized with offlined cpus" case.
> I remember that there were problems, but it's nearly impossible to run
> into this if the cpufreq driver is loaded early at boot.

I always thought there is a way not to boot all cpus by passing stuff in
command line and so this is a easy case to reproduce.

> cpufreq_add_dev() and friends are complicated.

Not anymore, they are hugely simplified now.

> Those init functions got split some time ago and there also slipped
> in a bug even it was simple splitting of functions.
>
> I do not have time to look at fcf8058296edbc3de43adf095824fc32b067b9f8
> right now. Don't know how much other stuff depends on it and how
> sever it is on ARM that cpufreq does not correctly initialize with offlined
> cpus..., I would revert this patch.

Let me clear it to everybody. There isn't/shouldn't be a bug in cpufreq core,
its just that acpi-cpufreq driver isn't adapted well with the changes related to
affected_cpus and related_cpus.

I have never gone into coding for any non-ARM platform and am really not
aware of acpi-cpufreq driver and its users. That's why i told everybody where
the issue is, and they just need to fix acpi-cpufreq driver with right values of
policy->cpus (affected_cpus) and everything else would work after that.

--
viresh
Comment 20 Borislav Petkov 2013-03-22 13:54:27 UTC
On Fri, Mar 22, 2013 at 07:13:33PM +0530, Viresh Kumar wrote:
> I have never gone into coding for any non-ARM platform and am
> really not aware of acpi-cpufreq driver and its users. That's why
> i told everybody where the issue is, and they just need to fix
> acpi-cpufreq driver with right values of policy->cpus (affected_cpus)
> and everything else would work after that.

No, you're breaking existing drivers with your changes - nobody will fix
stuff *you're* breaking.

You need to get the hardware and test your changes on *that* hardware
too. Otherwise, the offending patches should be reverted until you can
provide a patchset which works on everything.

Geez, the fact that I even need to explain this...
Comment 21 Rafael J. Wysocki 2013-03-22 13:56:25 UTC
On Friday, March 22, 2013 01:43:34 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=55411
> 
> --- Comment #19 from Viresh Kumar <viresh.kumar@linaro.org>  2013-03-22
> 13:43:34 ---
> Hi guys,
> 
> I will answer to questions of both of you in this mail.
> 
> On 22 March 2013 18:23, Thomas Renninger <trenn@suse.de> wrote:
> > Is this all to try to fix "cpufreq driver gets loaded while some cores
> > were set offline before"?
> 
> Not really. There are problems with acpi-cpufreq without that case too.
> 
> > I wonder how you run into "cpufreq is initialized with offlined cpus" case.
> > I remember that there were problems, but it's nearly impossible to run
> > into this if the cpufreq driver is loaded early at boot.
> 
> I always thought there is a way not to boot all cpus by passing stuff in
> command line and so this is a easy case to reproduce.
> 
> > cpufreq_add_dev() and friends are complicated.
> 
> Not anymore, they are hugely simplified now.
> 
> > Those init functions got split some time ago and there also slipped
> > in a bug even it was simple splitting of functions.
> >
> > I do not have time to look at fcf8058296edbc3de43adf095824fc32b067b9f8
> > right now. Don't know how much other stuff depends on it and how
> > sever it is on ARM that cpufreq does not correctly initialize with offlined
> > cpus..., I would revert this patch.
> 
> Let me clear it to everybody. There isn't/shouldn't be a bug in cpufreq core,
> its just that acpi-cpufreq driver isn't adapted well with the changes related
> to affected_cpus and related_cpus.

Which doesn't matter a whit.  It worked before your changes and it has to work
after them.

> I have never gone into coding for any non-ARM platform and am really not
> aware of acpi-cpufreq driver and its users. That's why i told everybody where
> the issue is, and they just need to fix acpi-cpufreq driver with right values
> of policy->cpus (affected_cpus) and everything else would work after that.

Which wasn't the right thing to do.

If your changes break something, *you* (and nobody else) are responsible for
fixing that.

Asking for help is obviously fine, but even if no one can (or has the time to)
help at the moment, you are still responsible for fixing the breakage.  If you
aren't familiar with the code in question, it's the time for a crash course.

Thanks,
Rafael
Comment 22 Thomas Renninger 2013-03-22 14:04:09 UTC
On Friday, March 22, 2013 07:13:33 PM Viresh Kumar wrote:
> Hi guys,
> 
> I will answer to questions of both of you in this mail.
> 
> On 22 March 2013 18:23, Thomas Renninger <trenn@suse.de> wrote:
> > Is this all to try to fix "cpufreq driver gets loaded while some cores
> > were set offline before"?
> 
> Not really. There are problems with acpi-cpufreq without that case too.
> 
> > I wonder how you run into "cpufreq is initialized with offlined cpus"
> > case.
> > I remember that there were problems, but it's nearly impossible to run
> > into this if the cpufreq driver is loaded early at boot.
> 
> I always thought there is a way not to boot all cpus by passing stuff in
> command line and so this is a easy case to reproduce.
I am pretty sure cpuidle states won't initialize and in best case you never
get them working on the offlined cpus.
Local APICs won't be set up, ...

Such a parameter will never exist for x86.

> > cpufreq_add_dev() and friends are complicated.
> 
> Not anymore, they are hugely simplified now.
They were hugely simplified and things are not working anymore and you
do not know why...

> > Those init functions got split some time ago and there also slipped
> > in a bug even it was simple splitting of functions.
> > 
> > I do not have time to look at fcf8058296edbc3de43adf095824fc32b067b9f8
> > right now. Don't know how much other stuff depends on it and how
> > sever it is on ARM that cpufreq does not correctly initialize with
> > offlined
> > cpus..., I would revert this patch.
> 
> Let me clear it to everybody. There isn't/shouldn't be a bug in cpufreq
> core, its just that acpi-cpufreq driver isn't adapted well with the changes
> related to affected_cpus and related_cpus.
And powernow-k8 driver is broken.
The others are not tested that often, I expect they broke as well, right?
 
> I have never gone into coding for any non-ARM platform and am really not
> aware of acpi-cpufreq driver and its users. That's why i told everybody
> where the issue is, and they just need to fix acpi-cpufreq driver with
> right values of policy->cpus (affected_cpus) and everything else would work
> after that.

Sorry, I cannot look into this due to lack of time, but I remember that
there were reasons why cpufreq_add_dev() was complicated.
Or that it's really easy to mess it up and it's not easy to fix it again.

     Thomas
Comment 23 Viresh Kumar 2013-03-22 14:05:21 UTC
On 22 March 2013 19:24, Borislav Petkov <bp@alien8.de> wrote:
> No, you're breaking existing drivers with your changes - nobody will fix
> stuff *you're* breaking.
>
> You need to get the hardware and test your changes on *that* hardware
> too. Otherwise, the offending patches should be reverted until you can
> provide a patchset which works on everything.
>
> Geez, the fact that I even need to explain this...

You are right in every sense and i can't defend myself on that, no excuse.

When i wrote those patches, i fixed many drivers and wasn't aware that
acpi-cpufreq got broken. And only with the mail from Duncan i realized it.

Then i tried to go thorough it & fix it and found there is some stuff in init()
which is is sort of per user case dependent. And wasn't sure how exactly
to fix it and so shouted for help.

But yes i will give it another try to go through the driver in details and see
if i can fix it.

--
viresh
Comment 24 Viresh Kumar 2013-03-22 14:06:28 UTC
On 22 March 2013 19:33, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, March 22, 2013 01:43:34 PM bugzilla-daemon@bugzilla.kernel.org
> wrote:
>> I have never gone into coding for any non-ARM platform and am really not
>> aware of acpi-cpufreq driver and its users. That's why i told everybody
>> where
>> the issue is, and they just need to fix acpi-cpufreq driver with right
>> values
>> of policy->cpus (affected_cpus) and everything else would work after that.
>
> Which wasn't the right thing to do.
>
> If your changes break something, *you* (and nobody else) are responsible for
> fixing that.

Yes, i already replied in another mail on what actually happened.

> Asking for help is obviously fine, but even if no one can (or has the time
> to)
> help at the moment, you are still responsible for fixing the breakage.  If
> you
> aren't familiar with the code in question, it's the time for a crash course.

yes, i need to do that now.
Comment 25 Viresh Kumar 2013-03-22 14:10:46 UTC
On 22 March 2013 19:34, Thomas Renninger <trenn@suse.de> wrote:
> I am pretty sure cpuidle states won't initialize and in best case you never
> get them working on the offlined cpus.
> Local APICs won't be set up, ...
>
> Such a parameter will never exist for x86.

I will see if i can find what i was referring to here.

> They were hugely simplified and things are not working anymore and you
> do not know why...

I know why, but don't know (for now) how to fix it for acpi-cpufreq.

> And powernow-k8 driver is broken.
> The others are not tested that often, I expect they broke as well, right?

acpi-cpufreq is broken and so all others who are using it. Sorry for asking
the stupid question now but what's the hierarchy of cpufreq drivers for intel
(I will try to go through it now), some drivers use acpi-cpufreq driver?

> Sorry, I cannot look into this due to lack of time, but I remember that
> there were reasons why cpufreq_add_dev() was complicated.
> Or that it's really easy to mess it up and it's not easy to fix it again.

Its not cpufreq_add_dev() that is broken but some changes that were part
of the same patch, i.e. part that tried to sort out affected and related cpus.
Comment 26 Borislav Petkov 2013-03-22 14:11:54 UTC
On Fri, Mar 22, 2013 at 07:35:19PM +0530, Viresh Kumar wrote:
> But yes i will give it another try to go through the driver in details
> and see if i can fix it.

Right, and make sure it is a minimal fix which can go in now. If it
appears to get more involved, we'd probably need to revert now and you
can retry next merge window.

Thanks.
Comment 27 Borislav Petkov 2013-03-22 14:13:44 UTC
On Fri, Mar 22, 2013 at 07:40:45PM +0530, Viresh Kumar wrote:
> acpi-cpufreq is broken and so all others who are using it. Sorry for
> asking the stupid question now but what's the hierarchy of cpufreq
> drivers for intel (I will try to go through it now), some drivers use
> acpi-cpufreq driver?

All relevant x86 out there uses acpi-cpufreq. Which means that breaking
it, breaks cpufreq on x86. Which is absolutely a no-no.
Comment 28 Viresh Kumar 2013-03-23 18:35:10 UTC
On 19 March 2013 14:20, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> We are talking here about a bug reported by Duncan here. His cpu/cpu*/cpufreq
> directory are getting corrupted with 3.9-rc3 and was working well with 3.8

Hi Duncan,

My weekend is already spoiled (due to my bugs :) ) and i don't want to spoil
yours. But this is a important patch to test, can you please give it a
try as soon
as you can? And confirm if this resolves your issues or not? (patch is attached
too)

------------x------------------x------------------

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Sat, 23 Mar 2013 23:42:44 +0530
Subject: [PATCH] cpufreq: acpi-cpufreq: Set policy->cpus correctly from .init()

With the addition of following patch:

fcf8058 cpufreq: Simplify cpufreq_add_dev()

cpufreq driver's .init() routine must initialize policy->cpus with mask of all
possible cpus (Online + Offline) that share the clock. Then the core would copy
this mask onto policy->related_cpus and will reset policy->cpus to carry only
online cpus.

acpi-cpufreq driver wasn't updated with this assumption and so sometimes when
we try to hot[un]plug cpus at run time, sysfs directories gets corrupted.

This patch fixes acpi-cpufreq driver against this corruption.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/acpi-cpufreq.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 937bc28..ceb5925 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -722,15 +722,7 @@ static int acpi_cpufreq_cpu_init(struct
cpufreq_policy *policy)
 	perf = data->acpi_data;
 	policy->shared_type = perf->shared_type;

-	/*
-	 * Will let policy->cpus know about dependency only when software
-	 * coordination is required.
-	 */
-	if (policy->shared_type == CPUFREQ_SHARED_TYPE_ALL ||
-	    policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
-		cpumask_copy(policy->cpus, perf->shared_cpu_map);
-	}
-	cpumask_copy(policy->related_cpus, perf->shared_cpu_map);
+	cpumask_copy(policy->cpus, perf->shared_cpu_map);

 #ifdef CONFIG_SMP
 	dmi_check_system(sw_any_bug_dmi_table);
@@ -740,9 +732,7 @@ static int acpi_cpufreq_cpu_init(struct
cpufreq_policy *policy)
 	}

 	if (check_amd_hwpstate_cpu(cpu) && !acpi_pstate_strict) {
-		cpumask_clear(policy->cpus);
-		cpumask_set_cpu(cpu, policy->cpus);
-		cpumask_copy(policy->related_cpus, cpu_sibling_mask(cpu));
+		cpumask_copy(policy->cpus, cpu_sibling_mask(cpu));
 		policy->shared_type = CPUFREQ_SHARED_TYPE_HW;
 		pr_info_once(PFX "overriding BIOS provided _PSD data\n");
 	}


--
viresh
Comment 29 Thomas Renninger 2013-03-24 09:05:47 UTC
On Friday, March 22, 2013 03:04:03 PM Thomas Renninger wrote:
> On Friday, March 22, 2013 07:13:33 PM Viresh Kumar wrote:
> > Hi guys,
> > 
> > I will answer to questions of both of you in this mail.
> > 
> > On 22 March 2013 18:23, Thomas Renninger <trenn@suse.de> wrote:
> > > Is this all to try to fix "cpufreq driver gets loaded while some cores
> > > were set offline before"?
> > 
> > Not really. There are problems with acpi-cpufreq without that case too.
> > 
> > > I wonder how you run into "cpufreq is initialized with offlined cpus"
> > > case.
> > > I remember that there were problems, but it's nearly impossible to run
> > > into this if the cpufreq driver is loaded early at boot.
> > 
> > I always thought there is a way not to boot all cpus by passing stuff in
> > command line and so this is a easy case to reproduce.
> 
> I am pretty sure cpuidle states won't initialize and in best case you never
> get them working on the offlined cpus.
> Local APICs won't be set up, ...
> 
> Such a parameter will never exist for x86.
I take that back.
CPU hot-add (even if CPU is not present at boot time) works.
I looked at C-states for that some time ago and it should only work if
the hot-add event came in via ACPI events for CPUs which were not
initialized at boot time. Better would be to initialize the first time it
gets switched online.
Anyway, making such stuff (cpufreq/cpuidle/...) more robust,
is certainly a good idea.

...
> And powernow-k8 driver is broken.
> The others are not tested that often, I expect they broke as well, right?
And powernow-k8 does not exist anymore..., fortunately I didn't have to
look at this stuff for some time.

... 
> Sorry, I cannot look into this due to lack of time,...


       Thomas
Comment 30 Viresh Kumar 2013-03-24 09:10:54 UTC
On 24 March 2013 14:35, Thomas Renninger <trenn@suse.de> wrote:
> I take that back.
> CPU hot-add (even if CPU is not present at boot time) works.
> I looked at C-states for that some time ago and it should only work if
> the hot-add event came in via ACPI events for CPUs which were not
> initialized at boot time. Better would be to initialize the first time it
> gets switched online.

I removed it from my TODO list, thanks :)

> Anyway, making such stuff (cpufreq/cpuidle/...) more robust,
> is certainly a good idea.

Yes, that was the basic intent of my earlier patches that went into 3.9.
It was all about simplifying it and making it robust.

> And powernow-k8 does not exist anymore..., fortunately I didn't have to
> look at this stuff for some time.

Sure? Driver is still present in mainline.

BTW, i have given an initial fix for acpi-cpufreq (which should work)
and waiting
for Duncan to reply back. All other drivers don't set affected/related
cpus directly,
so they should be alright.

--
viresh
Comment 31 Duncan 2013-03-24 09:48:41 UTC
(In reply to comment #28)
> My weekend is already spoiled (due to my bugs :) ) and i don't want to spoil
> yours.

Don't worry about it.  They're all days in the week, to me.  I don't really have a defined "weekend".  In fact, I had been (somewhat impatiently) grumbling to myself Friday that I was likely to have to wait until Monday for something concrete to test, so I'm happy to be demonstrated wrong! =:^)

> [PATCH] cpufreq: acpi-cpufreq: Set policy->cpus correctly from .init()

You appear to be on the right path as all the dirs and symlinks are there now, but it looks like you'll need a v2 as the order/pairing is now very strange, both as booted and after a s2ram/resume (which changes the order but it's still strange):

Testing against 3.9-rc4 now.

Patch applied pre-suspend ls -dl as above (original pairing is 0/1, 2/3, 4/5, with the first always a dir and the second always a symlink to the first, as seen in comment #0):

/sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
/sys/devices/system/cpu/cpu2/cpufreq/
/sys/devices/system/cpu/cpu3/cpufreq/
/sys/devices/system/cpu/cpu4/cpufreq -> ../cpu3/cpufreq/
/sys/devices/system/cpu/cpu5/cpufreq -> ../cpu2/cpufreq/

Post s2ram/resume cycle:

/sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
/sys/devices/system/cpu/cpu2/cpufreq -> ../cpu4/cpufreq/
/sys/devices/system/cpu/cpu3/cpufreq -> ../cpu5/cpufreq/
/sys/devices/system/cpu/cpu4/cpufreq/
/sys/devices/system/cpu/cpu5/cpufreq/


But I don't know whether it's CPU ordering that's weird, or the cpufreq ordering.  IOW, I don't know whether it /should/ be 0/1, 2/3, 4/5 or not, because I don't know whether those numbers actually reflect the hardware so it's the ordering above that's bad, or if the cpu numbers are just arbitrarily assigned, and the ordering above reflects the actual hardware, and just looks strange due to the arbitrary cpu numbering.

Either way, there's the correct number of dirs and links now, but the ordering is extremely confusing and looks wrong, regardless of whether it actually corresponds to the hardware or not, something I don't have the ability to judge.

Thanks! =:^)
Comment 32 Viresh Kumar 2013-03-24 10:02:42 UTC
Hi Duncan,

Please reply to this mail rather than using bugzilla as all others might not
be following bugzilla rerport.

> --- Comment #31 from Duncan <1i5t5.duncan@cox.net>  2013-03-24 09:48:41 ---
> (In reply to comment #28)
>> My weekend is already spoiled (due to my bugs :) ) and i don't want to spoil
>> yours.
>
> Don't worry about it.  They're all days in the week, to me.  I don't really
> have a defined "weekend".  In fact, I had been (somewhat impatiently)
> grumbling
> to myself Friday that I was likely to have to wait until Monday for something
> concrete to test, so I'm happy to be demonstrated wrong! =:^)

:)

>> [PATCH] cpufreq: acpi-cpufreq: Set policy->cpus correctly from .init()
>
> You appear to be on the right path as all the dirs and symlinks are there
> now,
> but it looks like you'll need a v2 as the order/pairing is now very strange,
> both as booted and after a s2ram/resume (which changes the order but it's
> still
> strange):

:(

> Testing against 3.9-rc4 now.
>
> Patch applied pre-suspend ls -dl as above (original pairing is 0/1, 2/3, 4/5,
> with the first always a dir and the second always a symlink to the first, as
> seen in comment #0):
>
> /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
> /sys/devices/system/cpu/cpu2/cpufreq/
> /sys/devices/system/cpu/cpu3/cpufreq/
> /sys/devices/system/cpu/cpu4/cpufreq -> ../cpu3/cpufreq/
> /sys/devices/system/cpu/cpu5/cpufreq -> ../cpu2/cpufreq/

The way it works (in cpufreq core), the first cpu of group registered
for cpufreq
would get a directory and second one would get a symlink.

And the above ones also doesn't look good. 2/5 and 3/4 shouldn't be the groups.

acpi-cpufreq driver is getting this information from perf->shared_cpu_map and
that seems to be wrong to me now.

> Post s2ram/resume cycle:
>
> /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu1/cpufreq -> ../cpu0/cpufreq/
> /sys/devices/system/cpu/cpu2/cpufreq -> ../cpu4/cpufreq/
> /sys/devices/system/cpu/cpu3/cpufreq -> ../cpu5/cpufreq/
> /sys/devices/system/cpu/cpu4/cpufreq/
> /sys/devices/system/cpu/cpu5/cpufreq/

2/4 and 3/5 are also wrong groups.

> But I don't know whether it's CPU ordering that's weird, or the cpufreq
> ordering.  IOW, I don't know whether it /should/ be 0/1, 2/3, 4/5 or not,
> because I don't know whether those numbers actually reflect the hardware so
> it's the ordering above that's bad, or if the cpu numbers are just
> arbitrarily
> assigned, and the ordering above reflects the actual hardware, and just looks
> strange due to the arbitrary cpu numbering.

One thing i am sure about is cpu order is fixed at boot and after boot whatever
you do can't change that order... So order should always be 0/1, 2/3, 4/5

> Either way, there's the correct number of dirs and links now, but the
> ordering
> is extremely confusing and looks wrong, regardless of whether it actually
> corresponds to the hardware or not, something I don't have the ability to
> judge.

Hmm.. Can you try one thing? Run 3.8 over your machine and give output of
cpufreq-info and ls -ld after boot and resume..

I would like to see what's the original behavior.

--
viresh
Comment 33 Borislav Petkov 2013-03-24 10:31:05 UTC
On Sun, Mar 24, 2013 at 02:40:53PM +0530, Viresh Kumar wrote:
> > And powernow-k8 does not exist anymore..., fortunately I didn't have to
> > look at this stuff for some time.
> 
> Sure? Driver is still present in mainline.

powernow-k8 is still there for, well, K8 only. Newer machines are
handled by acpi-cpufreq. Thus is acpi-cpufreq's health of major
importance especially now. :)
Comment 34 Duncan 2013-03-24 11:49:30 UTC
On Sun, 24 Mar 2013 15:32:39 +0530
Viresh Kumar <viresh.kumar@linaro.org> wrote:

> Hi Duncan,
> 
> Please reply to this mail rather than using bugzilla as all others
> might not be following bugzilla rerport.

Thanks.  I'm used to it being the other way 'round.

> >> [PATCH] cpufreq: acpi-cpufreq: Set policy->cpus correctly
> >> from .init()
> >
> > You appear to be on the right path as all the dirs and symlinks are
> > there now, but it looks like you'll need a v2 as the order/pairing
> > is now very strange, both as booted and after a s2ram/resume (which
> > changes the order but it's still strange):

> Hmm.. Can you try one thing? Run 3.8 over your machine and give
> output of cpufreq-info and ls -ld after boot and resume..
> 
> I would like to see what's the original behavior.

Good idea! =:^)  It now appears that your bug simply cascaded on a
previously unreported bug in earlier kernels.

The 3.8 ls -dl isn't too interesting as they were all dirs not symlinks
then, and if they hadn't stuck around after a suspend/resume I'd have
definitely noticed and reported the bug back then, but it's still
useful to confirm.

The 3.8 pre-suspend and post resume ls -dl are identical -- no missing
dirs (and no symlinks):

/sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu1/cpufreq/
/sys/devices/system/cpu/cpu2/cpufreq/
/sys/devices/system/cpu/cpu3/cpufreq/
/sys/devices/system/cpu/cpu4/cpufreq/
/sys/devices/system/cpu/cpu5/cpufreq/

The interesting results are the 3.8 cpufreq-info.  I'll attach the full
output, but here's the interesting bit:

3.8 pre-suspend cpufreq-info excerpts (nicely paired, as are the
pre-patch pre-suspend results for 3.9-rc):

analyzing CPU 0:
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 0
analyzing CPU 1:
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 1
analyzing CPU 2:
  CPUs which run at the same hardware frequency: 2 3
  CPUs which need to have their frequency coordinated by software: 2
analyzing CPU 3:
  CPUs which run at the same hardware frequency: 2 3
  CPUs which need to have their frequency coordinated by software: 3
analyzing CPU 4:
  CPUs which run at the same hardware frequency: 4 5
  CPUs which need to have their frequency coordinated by software: 4
analyzing CPU 5:
  CPUs which run at the same hardware frequency: 4 5
  CPUs which need to have their frequency coordinated by software: 5

3.8 post-resume (screwed up pairing, so that bit's not a 3.9 thing, I
just noticed it in 3.9 due to the now missing symlinks/dirs):

analyzing CPU 0:
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 0
analyzing CPU 1:
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 1
analyzing CPU 2:
  CPUs which run at the same hardware frequency: 2
  CPUs which need to have their frequency coordinated by software: 2
analyzing CPU 3:
  CPUs which run at the same hardware frequency: 3
  CPUs which need to have their frequency coordinated by software: 3
analyzing CPU 4:
  CPUs which run at the same hardware frequency: 2 4
  CPUs which need to have their frequency coordinated by software: 4
analyzing CPU 5:
  CPUs which run at the same hardware frequency: 3 5
  CPUs which need to have their frequency coordinated by software: 5
Comment 35 Viresh Kumar 2013-03-24 12:16:53 UTC
On 24 March 2013 17:19, Duncan <1i5t5.duncan@cox.net> wrote:
> On Sun, 24 Mar 2013 15:32:39 +0530 Viresh Kumar <viresh.kumar@linaro.org>
> wrote:

>> Hmm.. Can you try one thing? Run 3.8 over your machine and give
>> output of cpufreq-info and ls -ld after boot and resume..
>>
>> I would like to see what's the original behavior.
>
> Good idea! =:^)  It now appears that your bug simply cascaded on a
> previously unreported bug in earlier kernels.

That made me happy, i am not the only culprit :)

> The 3.8 pre-suspend and post resume ls -dl are identical -- no missing
> dirs (and no symlinks):
>
> /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu1/cpufreq/
> /sys/devices/system/cpu/cpu2/cpufreq/
> /sys/devices/system/cpu/cpu3/cpufreq/
> /sys/devices/system/cpu/cpu4/cpufreq/
> /sys/devices/system/cpu/cpu5/cpufreq/

They were all separate directories (instead of symlinks) earlier because
this only depended on policy->cpus earlier. And none of the cpus are
shared in policy->cpus, i.e. policy->cpus was always policy->cpu.

> 3.8 pre-suspend cpufreq-info excerpts (nicely paired, as are the
> pre-patch pre-suspend results for 3.9-rc):

No they are still not paired well. This is how we should read your analysis:
related-cpus: "same hardware freq"
affected-cpus or policy->cpus: "frequency coordinated by software"

> analyzing CPU 0:
>   CPUs which run at the same hardware frequency: 0 1

related cpus have correct pairs

>   CPUs which need to have their frequency coordinated by software: 0

but affected cpus doesn't

> 3.8 post-resume (screwed up pairing, so that bit's not a 3.9 thing

I told you earlier, this made me happy :)

> analyzing CPU 0:
>   CPUs which run at the same hardware frequency: 0 1
>   CPUs which need to have their frequency coordinated by software: 0
> analyzing CPU 1:
>   CPUs which run at the same hardware frequency: 0 1
>   CPUs which need to have their frequency coordinated by software: 1

These stayed as is as cpu 0 is non removable cpu and so doesn't get
unregistered from cpufreq at all.

> analyzing CPU 2:
>   CPUs which run at the same hardware frequency: 2
>   CPUs which need to have their frequency coordinated by software: 2
> analyzing CPU 3:
>   CPUs which run at the same hardware frequency: 3
>   CPUs which need to have their frequency coordinated by software: 3
> analyzing CPU 4:
>   CPUs which run at the same hardware frequency: 2 4

related cpus got corrupted here.

>   CPUs which need to have their frequency coordinated by software: 4
> analyzing CPU 5:
>   CPUs which run at the same hardware frequency: 3 5
>   CPUs which need to have their frequency coordinated by software: 5


Now back to the real issues:

@Rafael/Borislav/Thomas/Andre/Darrick:

"What do we mean by software AND hardware coordination for x86 ?"

Following are the sha-id's which had something to do with above statement.

3b2d99429e3386b6e2ac949fc72486509c8bbe36
46f18e3a28295a9e11a6ffa4478241c19bc93735
acd316248205d553594296f1895ba5196b89ffcc
e8628dd06d66f2e3965ec9742029b401d63434f1
8adcc0c674004c0f9467031a93dc639c2b01411f

On the platform i work (ARM) there are only two cases, cpus share clock line or
they don't. So, they share policy struct or they don't.


Fixing Duncan's issues shouldn't be a very big deal now as i was thinking too
much about what was broken without my patches too. And now that part is
pretty clear.

--
viresh
Comment 36 Viresh Kumar 2013-03-24 12:23:42 UTC
On 24 March 2013 17:46, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Fixing Duncan's issues shouldn't be a very big deal now as i was thinking too
> much about what was broken without my patches too. And now that part is
> pretty clear.

Hi Duncan,

Try attached patch and this should take back your system to where it was.

NOTE: With this patch your related_cpus wouldn't show any groups and
related cpus must be same as affected cpus. All cpu*/cpufreq must be
directories now and no symlinks.

--
viresh
Comment 37 Duncan 2013-03-25 11:15:44 UTC
On Sun, 24 Mar 2013 17:53:41 +0530
Viresh Kumar <viresh.kumar@linaro.org> wrote:

> On 24 March 2013 17:46, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > Fixing Duncan's issues shouldn't be a very big deal now as i was
> > thinking too much about what was broken without my patches too. And
> > now that part is pretty clear.
> 
> Hi Duncan,
> 
> Try attached patch and this should take back your system to where it
> was.
> 
> NOTE: With this patch your related_cpus wouldn't show any groups and
> related cpus must be same as affected cpus. All cpu*/cpufreq must be
> directories now and no symlinks.

FWIW, with this patch, pre-s2ram and post-resume are indeed consistent,
but it's not back to where it was.

With this patch, each core is a cpufreq law unto itself.  Maybe that's
what you meant with the note, maybe not (I know the mapping of sysfs
files to cpufreq-info output was stated up-thread, but I lost track,
and how affected vs related maps to hardware vs software coordinated
and what it all actually means other than what I'm seeing isn't ideal,
is apparently a bit more than I'm able to keep in my head ATM), but it's
what I get. The relevant bits of cpufreq-info output:

analyzing CPU 0:
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
analyzing CPU 1:
  CPUs which run at the same hardware frequency: 1
  CPUs which need to have their frequency coordinated by software: 1
analyzing CPU 2:
  CPUs which run at the same hardware frequency: 2
  CPUs which need to have their frequency coordinated by software: 2
analyzing CPU 3:
  CPUs which run at the same hardware frequency: 3
  CPUs which need to have their frequency coordinated by software: 3
analyzing CPU 4:
  CPUs which run at the same hardware frequency: 4
  CPUs which need to have their frequency coordinated by software: 4
analyzing CPU 5:
  CPUs which run at the same hardware frequency: 5
  CPUs which need to have their frequency coordinated by software: 5

But at least it's consistent:  The same results both before and after a
suspend/resume cycle.

And given that 3.8 wasn't ideal either, maybe that's good enough for
this cycle, and a real fix will have to wait until the next commit
window and stable-tree.  That'd give us more leeway to fix it right, as
well as a full cycle for testing anything else the "correct" fix might
dredge up.
Comment 38 Viresh Kumar 2013-03-25 11:23:40 UTC
On 25 March 2013 16:45, Duncan <1i5t5.duncan@cox.net> wrote:
> FWIW, with this patch, pre-s2ram and post-resume are indeed consistent,
> but it's not back to where it was.
>
> With this patch, each core is a cpufreq law unto itself.  Maybe that's
> what you meant with the note, maybe not (I know the mapping of sysfs
> files to cpufreq-info output was stated up-thread, but I lost track,
> and how affected vs related maps to hardware vs software coordinated
> and what it all actually means other than what I'm seeing isn't ideal,
> is apparently a bit more than I'm able to keep in my head ATM), but it's
> what I get. The relevant bits of cpufreq-info output:
>
> analyzing CPU 0:
>   CPUs which run at the same hardware frequency: 0
>   CPUs which need to have their frequency coordinated by software: 0
> analyzing CPU 1:
>   CPUs which run at the same hardware frequency: 1
>   CPUs which need to have their frequency coordinated by software: 1
> analyzing CPU 2:
>   CPUs which run at the same hardware frequency: 2
>   CPUs which need to have their frequency coordinated by software: 2
> analyzing CPU 3:
>   CPUs which run at the same hardware frequency: 3
>   CPUs which need to have their frequency coordinated by software: 3
> analyzing CPU 4:
>   CPUs which run at the same hardware frequency: 4
>   CPUs which need to have their frequency coordinated by software: 4
> analyzing CPU 5:
>   CPUs which run at the same hardware frequency: 5
>   CPUs which need to have their frequency coordinated by software: 5
>
> But at least it's consistent:  The same results both before and after a
> suspend/resume cycle.
>
> And given that 3.8 wasn't ideal either, maybe that's good enough for
> this cycle, and a real fix will have to wait until the next commit
> window and stable-tree.  That'd give us more leeway to fix it right, as
> well as a full cycle for testing anything else the "correct" fix might
> dredge up.

This is exactly what i expected and i wrote in Note. Because cpufreq core
does a lot of work based on related_cpus now, its better we don't set it
blindly for x86.

Following code was the only user of relatead_cpus in 3.8 code:

        /* Set governor before ->init, so that driver could check it */
#ifdef CONFIG_HOTPLUG_CPU
        for_each_online_cpu(sibling) {
                struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
                if (cp && cp->governor &&
                    (cpumask_test_cpu(cpu, cp->related_cpus))) {
                        policy->governor = cp->governor;
                        found = 1;
                        break;
                }
        }
#endif


There is no other user of relatead_cpus earlier in cpufreq core and that's why
i wonder why it was added earlier.

But a grep of relatead_cpus for 3.8 showed some interesting users in:
tools/power/cpupower/

I will try to check what they are doing with it, but for the kernel it
was almost unused.
And not it is very much used :)

But for 3.9 i believe this patch should be good enough.

Thanks for testing it.

--
viresh
Comment 39 Borislav Petkov 2013-03-25 13:55:20 UTC
On Mon, Mar 25, 2013 at 04:53:39PM +0530, Viresh Kumar wrote:
> But for 3.9 i believe this patch should be good enough.

Seems so here too.

Tested-by: Borislav Petkov <bp@suse.de>
Comment 40 Viresh Kumar 2013-03-29 14:14:36 UTC
On 25 March 2013 16:53, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> There is no other user of relatead_cpus earlier in cpufreq core and that's
> why
> i wonder why it was added earlier.
>
> But a grep of relatead_cpus for 3.8 showed some interesting users in:
> tools/power/cpupower/
>
> I will try to check what they are doing with it, but for the kernel it
> was almost unused.

I checked tools and they aren't doing anything tricky with it. Just reading
groups of cpus from sysfs.

Now i believe even tools should be patched a bit to give correct meanings
of affected or related cpus. Let me try it.
Comment 41 Zhang Rui 2013-04-13 17:45:30 UTC
alex,
please verify if this is the same problem you encountered.
Comment 42 Rafael J. Wysocki 2013-06-04 00:56:15 UTC
Fixed by commit aa77a52 "cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()".

Note You need to log in before you can comment on or make changes to this bug.