Created attachment 102441 [details]
cpufreq-info trace

Hello,

I have just installed the 3.9.3 kernel on my Debian 7 installation.

Linux 3.9.3-exa #7 SMP Fri May 24 11:12:58 CEST 2013 x86_64 GNU/Linux
CPU Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (Ivy Bridge)

Since then, the related_cpus file in /sys/devices/system/cpu/cpu*/cpufreq/related_cpus just returns the id of the current core:

/sys/devices/system/cpu/cpu0/cpufreq/related_cpus contains 0
/sys/devices/system/cpu/cpu6/cpufreq/related_cpus contains 6
etc.

I read in an archived thread (http://lkml.indiana.edu/hypermail/linux/kernel/1303.3/00144.html) that this could happen when the new intel_pstate driver is active, but I have disabled it in the kernel options. I have attached my cpufreq-info trace, which shows that I am using the default acpi-cpufreq driver. Therefore, there must be some problem with related_cpus when using the acpi-cpufreq driver.

Here is a listing of my files in /sys/devices/system/cpu/cpu0/cpufreq/:

affected_cpus:0
bios_limit:3401000
cpuinfo_cur_freq:3401000
cpuinfo_max_freq:3401000
cpuinfo_min_freq:1600000
cpuinfo_transition_latency:10000
related_cpus:0
scaling_available_frequencies:3401000 3400000 3300000 3100000 3000000 2900000 2800000 2600000 2500000 2400000 2200000 2100000 2000000 1900000 1700000 1600000
scaling_available_governors:conservative userspace powersave ondemand performance
scaling_cur_freq:3401000
scaling_driver:acpi-cpufreq
scaling_governor:ondemand
scaling_max_freq:3401000
scaling_min_freq:1600000
scaling_setspeed:<unsupported>
(In reply to comment #0)
> Since then, the related_cpus file in
> /sys/devices/system/cpu/cpu*/cpufreq/related_cpus just returns the id of the
> current core:
>
> /sys/devices/system/cpu/cpu0/cpufreq/related_cpus contains 0
> /sys/devices/system/cpu/cpu6/cpufreq/related_cpus contains 6 etc

Yes, this changed recently in the kernel. Now:
affected_cpus contains all online CPUs that share a clock line.
related_cpus contains all online and offline CPUs that share a clock line.
@Viresh Kumar: Alright, I get this. But my Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (Ivy Bridge) does have threads that share a clock frequency domain. Here is the result that any older kernel version (from 3.2.0 at least) gives:

/sys/devices/system/cpu/cpu0/cpufreq/related_cpus: 0 1 2 3 4 5 6 7

This means that all threads in this set physically run at the same frequency. With the 3.9.3 kernel I no longer get this result (see my previous post), and I doubt the Linux kernel has changed any physical property of my processor. According to what you said in your post, is a "shared clock line" different from a "shared frequency domain"?

Thank you for your time.
The frequency/P-state is separate per thread; all non-idle threads will run at the frequency that satisfies all the requests. The CPU itself chooses the actual P-state based on all the requests. So all cores/threads are logically separate but do affect other cores.
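To make the arbitration described above concrete, here is a minimal sketch of a simplified model. It assumes that "satisfying all the requests" means running at the highest frequency requested by any non-idle thread; that reading, the function name effective_pstate and the sample values are illustrative assumptions, and the model ignores TurboBoost and thermal throttling entirely.

```python
def effective_pstate(requests, idle):
    """Return the frequency (kHz) a simplified package would run at.

    requests: dict mapping cpu id -> requested frequency in kHz
    idle:     set of cpu ids that are currently idle (they do not vote)

    Assumption: the package settles on the highest request among the
    non-idle threads. Real hardware may behave differently.
    """
    votes = [freq for cpu, freq in requests.items() if cpu not in idle]
    # With no active thread there is nothing to arbitrate.
    return max(votes) if votes else None


requests = {0: 1600000, 1: 3400000, 2: 2000000, 3: 1600000}
print(effective_pstate(requests, idle=set()))  # 3400000
print(effective_pstate(requests, idle={1}))    # 2000000
```

In this model a thread's own request only tells you a lower bound on what it will get, which is why the requests look logically separate while still affecting each other.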
Absolutely. That's what I tried to mention, probably in a clumsy way.
So reflecting this relation up through sysfs is not particularly useful IMHO. Showing only the logical connection is more correct IMHO. I do not entirely understand why the change was made (it was not required :-) but it is more accurate from my point of view.
(In reply to comment #5)
> I am not entirely understand why the change was made (not required :-) but is
> more accurate from my point of view.

Earlier, I didn't understand what related_cpus was used for in the kernel at all. I sent many mails across lists and didn't get a satisfactory answer. I searched the kernel source and nobody seemed to be using it (including the cpufreq core). And when I finally cleaned up the core, it made more sense to use it in a different way, i.e. to have both online and offline CPUs in it.
@Dirk Brandewie
Having the physical connection allows one to understand how the underlying hardware works. Having a file in cpuN that contains only N is useless.

@Viresh Kumar
Yes, but on my Intel Core i7 processor, all the offline and online cores share the same frequency domain, which means they should all appear in the related_cpus files.
(In reply to comment #7)
> @Viresh Kumar Yes but on my Intel Core i7 processor, all the offline+online
> cores are sharing the same frequency domain, which means they should all
> appear
> in the related_cpu files.

cpufreq doesn't care how the actual hardware clock domains are managed; it just trusts whatever the underlying cpufreq driver has communicated. The x86 drivers want the cpufreq core to believe that every core has a separate clock, so that is what it reports.

It doesn't make any sense whatsoever to keep only one CPU in affected_cpus and all CPUs 0-7 in related_cpus, as that information isn't used by the core. related_cpus comes out the same as affected_cpus in your case because you only have one core per domain (virtual domain :) ). But if you have more cores in a cluster and a few of them are offlined, these two will have different values.
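The difference between the two attributes can be illustrated with a small model. This is only a sketch of the semantics described in this thread, not kernel code: the function name policy_masks and the 4-CPU cluster are hypothetical, and the "driver" is reduced to the set of CPUs it reports as sharing a clock.

```python
def policy_masks(driver_cpus, online):
    """Model the post-3.9 meaning of affected_cpus and related_cpus.

    driver_cpus: CPUs the cpufreq driver reports as sharing one clock
    online:      CPUs that are currently online
    Returns (affected_cpus, related_cpus) as sorted lists.
    """
    related = set(driver_cpus)        # online + offline CPUs of the policy
    affected = related & set(online)  # only the online ones
    return sorted(affected), sorted(related)


# acpi-cpufreq case from this report: the driver reports one CPU per policy.
print(policy_masks({0}, online=range(8)))         # ([0], [0])

# Hypothetical 4-CPU cluster with CPUs 2 and 3 offlined.
print(policy_masks({0, 1, 2, 3}, online={0, 1}))  # ([0, 1], [0, 1, 2, 3])
```

The first call shows why both files contain a single id on the reporter's machine; the second shows the only situation in which the two files differ.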
> It doesn't make any sense what so ever to keep only one cpu in affected_cpus
> and all cpus 0-7 in related_cpus as that information isn't used by core.
> related cpus comes same as affected cpus in your case because you only have
> one
> core per domain (virtual domain :) ).. But in case you have more cores in a
> cluster and few of them are offlined, these two will have different values.

I am not arguing for or against the difference between affected_cpus and related_cpus.

For me so far, the difference between the two was:
- affected_cpus was a "list of CPUs that require software coordination of frequency"
- related_cpus was a "List of CPUs that need some sort of frequency coordination, whether software or hardware."

This is at least what the official cpufreq kernel documentation states (see https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt)

There definitely is hardware coordination between the 8 cores of my machine. Plus, we can easily imagine future architectures where a single processor has distinct frequency domains. So the "frequency domain" notion is not a trivial question (all the cores of a single processor, or only some of them), and it is information that Linux provided in previous releases of the kernel (before 3.9). If I get it right, your point is that dropping this information is "more logical", and I am saying that dropping it is a loss of information. :)
(In reply to comment #9)
> I am not arguing for or against the difference between affected_cpus and
> related_cpus.

I understand.

> For me so far, the difference between the two was:
> - affected_cpus was a "list of CPUs that require software coordination of
> frequency"
> - related_cpus was a "List of CPUs that need some sort of frequency
> coordination, whether software or hardware."
>
> This is at least what the official cpufreq Kernel documentation states (see
> https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt)

You need to check the latest version:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/cpu-freq/user-guide.txt?id=refs/tags/v3.9

> There definitely is hardware coordination between the 8 cores of my machine.
> Plus we can easily think about future architectures where a single processor
> can have distinct frequency domains. So the "frequency domain" notion is not
> a
> trivial question (all the cores of a single processor, or some cores of the
> processor), and has been an information Linux has been giving in previous
> releases of the kernel (before 3.9). If I get it right, your point is that
> losing this information is "more logical", and I am saying that losing this
> information is a loss of information. :)

The reason I removed it was: "Nobody is making use of this information and it was turning out to be more misleading". "Hardware", for the kernel, is whatever is present below the kernel layer. So, if the layers below Linux don't want to hide that information, let it be.

But yes, if we have broken something with that change, it must be fixed. That is not the case here, though. You just saw different values with the latest kernels and were worried that something had gone wrong.
> You need to check the latest ones:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/cpu-freq/user-guide.txt?id=refs/tags/v3.9

Thanks for the link, I didn't know the other one was outdated.

> Why I removed it was: "Nobody is making use of this information and was
> turning
> out to be more misleading".. As "hardware" for kernel is whatever is present
> below kernel layer.. So, if the layers below linux don't want to hide that
> information, let it be.

I have actually been using these files for a while in an energy-related project, in order to decide which cores are physically correlated with respect to frequency transitions. So if I get it right, it is no longer possible to obtain this information.
(In reply to comment #11)
> > Why I removed it was: "Nobody is making use of this information and was
> turning
> > out to be more misleading".. As "hardware" for kernel is whatever is
> present
> > below kernel layer.. So, if the layers below linux don't want to hide that
> > information, let it be.
>
> I have actually been using these files for while for a project related with
> energy in order to decide which cores have a physical correlation with
> respect
> to frequency transition. So if I get it right, it is not possible to have
> such
> information anymore.

Yeah. I am not sure whether it is worth adding another field (only for acpi-cpufreq) to get this information back.

@Rafael: ??
Finding the correlation between cores is a LOT more complex than looking at the information that was presented by related_cpus. The correlation is workload dependent: idle cores lose their vote for what the package P-state should be. IMHO the information presented by related_cpus has not been useful since Nehalem came out.
I totally agree with Jean-Philippe: some information is lost with the 3.9 version and this information *is* useful to correctly set CPU frequencies. So either the meaning of the related_cpus field should be restored or a new field should be added.

If you doubt that such information is valuable, please consider all the runtime DVFS controllers such as beta adaptive [1] or CPU MISER [2]. To be ported to current multicore processors, those systems have to take great care of the CPU frequency actually applied (not only the one requested per core but the one actually in use). If it is impossible to determine which cores share the same frequency, it is impossible to perform this kind of DVFS on multicore CPUs in a portable way.

If you think that related_cpus is misleading (and I'd totally agree with you on that), then maybe a new field is required.

[1] C.-h. Hsu and W.-c. Feng, “A power-aware run-time system for high-performance computing,” in Proceedings of the 2005 ACM/IEEE conference on Supercomputing.
[2] R. Ge, X. Feng, W. chun Feng, and K. Cameron, “CPU MISER: A performance-directed, run-time system for power-aware clusters,” in Parallel Processing, 2007. ICPP 2007.
(In reply to comment #14)
> I totally agree with Jean-Philippe: some information is lost with the 3.9
> version and this information *is* useful to correctly set CPU frequencies. So
> either the related_cpus field meaning should be restored or a new field
> should
> be added.
>
> If you doubt that such information is valuable, please consider all the
> runtime
> DVFS controllers such as beta adaptive [1], or CPU MISER [2]. Those systems,
> to
> be ported on current multicore processors, have to take a great care of the
> actually applied CPU frequency (not only that requested per core but the one
> actually in use). If it is impossible to determine what cores share the same
> frequency, it is impossible to perform such kind of DVFS on multicore CPUs in
> a
> portable way.
>

Exactly: it is not possible to tell what frequency the core will run at unless the same P-state is requested for all cores. Even then it is not guaranteed in the presence of thermal throttling.

You do NOT have positive control over the frequency the core runs at. You request your desired performance level and the processor ensures that you get it.

The papers below reference processors that did not have this functionality. The assumptions about frequency control no longer hold (at least for current Intel processors).

> If you think that related_cpus is misleading (and I'd totally agree with you
> on
> that), then maybe a new field is required.
>
>
> [1] C.-h. Hsu and W.-c. Feng, “A power-aware run-time system for
> high-performance computing,” in Proceedings of the 2005 ACM/IEEE conference
> on
> Supercomputing.
> [2] R. Ge, X. Feng, W. chun Feng, and K. Cameron, “CPU MISER: A
> performance-directed, run-time system for power-aware clusters,” in Parallel
> Processing, 2007. ICPP 2007.
(In reply to comment #15)
> (In reply to comment #14)
> > I totally agree with Jean-Philippe: some information is lost with the 3.9
> > version and this information *is* useful to correctly set CPU frequencies.
> So
> > either the related_cpus field meaning should be restored or a new field
> should
> > be added.
> >
> > If you doubt that such information is valuable, please consider all the
> runtime
> > DVFS controllers such as beta adaptive [1], or CPU MISER [2]. Those
> systems, to
> > be ported on current multicore processors, have to take a great care of the
> > actually applied CPU frequency (not only that requested per core but the
> one
> > actually in use). If it is impossible to determine what cores share the
> same
> > frequency, it is impossible to perform such kind of DVFS on multicore CPUs
> in a
> > portable way.
> >
>
> Exactly it is not possible to tell what frequency the core will run at unless
> the same pstate is requested for all cores. Even then it is not guaranteed
> in
> the presence of thermal throttling.
>
> You do NOT have positive control over the frequency the core runs at. You
> requested your desired performance level and the processor ensures that you
> get
> it.
>
> The papers below reference processors that did not have this functionality.
> the
> assumptions about frequency control no longer hold (at least for current
> Intel
> Processors).
>

OK, I understand your point. However, even if in some cases (thermal threshold reached, TurboBoost, ...) the processor ignores or goes beyond the user's requests, most of the time, when the user requests a frequency, it is effectively set by the processor. So, yes, users still have some sort of control over CPU frequency in the general case (even if it went from precise frequency control to performance level requests), and it matters a lot when building and experimenting with new DVFS controllers.

Going back to the topic, people are designing DVFS controllers and have to be aware of which cores use the same frequency in order to achieve efficient frequency control on multicore processors. A file previously stated that relation, so it should not be such a big deal to restore the same information in another file, should it?

> > If you think that related_cpus is misleading (and I'd totally agree with
> you on
> > that), then maybe a new field is required.
> >
> >
> > [1] C.-h. Hsu and W.-c. Feng, “A power-aware run-time system for
> > high-performance computing,” in Proceedings of the 2005 ACM/IEEE conference
> on
> > Supercomputing.
> > [2] R. Ge, X. Feng, W. chun Feng, and K. Cameron, “CPU MISER: A
> > performance-directed, run-time system for power-aware clusters,” in
> Parallel
> > Processing, 2007. ICPP 2007.
Hi Rafael:
Could you have a look at this bug? The recent change to related_cpus seems to affect some users. I looked at the related commits, and the following commit may be the cause. It would make the CPUs in one dependency domain with the "HW_ALL" coordtype no longer show up in "related_cpus".
In Dirk's and Viresh's opinion, this information is not useful and the current state may be better. But in Benoit's and Jean-Philippe's opinion, this information is useful for DVFS controllers. If I got something wrong, please correct me. Thanks in advance.

commit aa77a52764a92216b61a6c8079b5c01937c046cd
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date: Sun Mar 24 15:58:12 2013 +0000

    cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()

    With the addition of following patch:

    fcf8058 cpufreq: Simplify cpufreq_add_dev()

    cpufreq driver's .init() routine must initialize policy->cpus with mask of all possible CPUs (Online + Offline) that share the clock. Then the core would copy this mask onto policy->related_cpus and will reset policy->cpus to carry only online cpus.

Hi Jean-Philippe:
Could you attach the output of acpidump on your machines?
(In reply to comment #17) > Hi Rafael: > Could you have a look this bug? The recent change of related_cpus seems > affect some users. Just look related commit and the following commit maybe > the > cause. This would make cpus in the one dependency domain with "HW_ALL" > coordtype not showed in the "related_cpus". > From Dirk and Viresh opinions, these info is not useful and current > state > maybe better. But from Benoit and Jean-Philippe opinions, these info are > useful > for DVFS controlers. If some thing wrong, please help to correct me. Thanks > in > advance. Hi, I was expecting a patch from you guys to get this fixed by adding an additional field apart from affected/related cpus.
(In reply to comment #16)
> Going back to the topic, people are designing DVFS controlers and have to be
> aware of the cores using the same frequency in order to achieve efficient
> frequency control on multicore processors. A file was previously stating that
> relation so it should not be such a big deal to restore the same information
> in
> another file. Isn't it?

I want to understand this a bit more (I woke up after Lan Tianyu's request :) )
What kind of DVFS controllers are they? Hardware or software (kernel/userspace)? I believe userspace, right?

So, what exactly do they do, and how does this information help them when everybody is requesting separate frequencies for all the cores?

I can write this piece of code but want to understand the need for it first.
Hi:
I have just started writing a patch to add a new sysfs interface in the acpi-cpufreq driver to expose the previous information ... It's great that you can take it :)
If I understand correctly, new interfaces for DVFS should go into the cpufreq core once they have been proven necessary, right?
(In reply to comment #20) > HI: > I just start to write a patch to add a new sysfs interface in the > acpi-cpufreq driver to expose the previous info ... It's great that you can > take it :) Please go ahead and finish it. I will review it then. > If I understand correctly, new interfaces for DVFS should be in the > cpufreq > core after proving that it's necessary, right? Not necessarily. We need to understand the real usecase first and then decide on it.
Created attachment 105421 [details]
debug.patch

Hi Benoit and Jean-Philippe:
Could you help me test this patch on your machines? It adds a new sysfs attribute "domain_cpus" for the acpi-cpufreq driver under the /sys/.../cpuX/cpufreq/ directory. The new attribute shows all the CPUs in the same domain as the given CPU. Please check whether it satisfies your requirement. Thanks in advance.
(In reply to comment #19)
> (In reply to comment #16)
> > Going back to the topic, people are designing DVFS controlers and have to
> be
> > aware of the cores using the same frequency in order to achieve efficient
> > frequency control on multicore processors. A file was previously stating
> that
> > relation so it should not be such a big deal to restore the same
> information in
> > another file. Isn't it?
>
> I want to understand this a bit more (I woke up after Lan Tianyu's request :)
> )
> What kind of DVFS controllers are they? hardware/software(kernel/userspace)
> ??
> I believe userspace, right?
>
> So, what exactly they do and how do this information help them, when
> everybody
> is requesting separate frequencies for all the cores?
>
> I can write this piece of code but want to understand its need first.

Hi,

So basically, the DVFS controllers I am referring to are userspace software DVFS controllers. They are specific in that they perform measurements on the cores to determine the impact of a frequency transition on energy consumption. The final goal is of course to perform energy-efficient DVFS control.

Such a methodology cannot be applied on a per-core basis because of the shared frequency domain: the cores have to synchronize during the measurements and decide together which frequency to request in order to be (nearly) certain that the frequency they request will actually be the one applied. Of course, such synchronization requires the cores to know who is sharing their frequency domain.

Until now, this knowledge was available and allowed such DVFS controllers to be implemented and, despite TurboBoost and other hardware uncertainties, it works very well. If the sysfs field disappeared for good, developers would be forced to re-implement it in approximate and non-future-proof ways (probably something like special cases based on the CPUID).

Is it clear enough?
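The synchronization step described above could look roughly like the following sketch. It is not taken from any of the controllers mentioned in this thread: the function name coordinated_requests, the choose parameter and the rule of settling on the highest wish in a domain are all assumptions made for illustration.

```python
def coordinated_requests(domains, wanted, choose=max):
    """Return a cpu -> frequency plan in which each domain requests one value.

    domains: iterable of CPU collections that share a frequency domain
    wanted:  dict mapping cpu id -> frequency (kHz) the core would pick alone
    choose:  how a domain settles on a single value (assumed default: highest)
    """
    plan = {}
    for domain in domains:
        agreed = choose(wanted[cpu] for cpu in domain)
        # Every member requests the same value, so the request cannot be
        # overridden by a sibling asking for something else.
        for cpu in domain:
            plan[cpu] = agreed
    return plan


wanted = {0: 1600000, 1: 2400000, 2: 2000000}
print(coordinated_requests([[0, 1], [2]], wanted))
# {0: 2400000, 1: 2400000, 2: 2000000}
```

The only input such a controller cannot compute by itself is the domains argument, which is exactly the information that used to come from related_cpus.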
Hi everyone. Sorry for the delay. I think Benoit explained the problem very well.

As far as acpidump is concerned, I couldn't manage to get it to work:

jphalimi@abelard:~$ acpidump
Linux version not implemented yet
Could not get ACPI tables, AE_SUPPORT

I am testing the patch right now, and I'll let you know asap.
The patch is working well. The expected values are stored in the domain_cpus file.

I just had to add the cpufreq_show_cpus function signature at the top of the acpi-cpufreq.c file because it was not declared in the scope of this file. That being said, I suggest changing the domain_cpus name.

sharedfreqdomain_cpus or freqdomain_cpus could be more relevant options.

Similarly, as the definitions of affected_cpus and related_cpus have changed, we could change their names as well, to offline_cpus and online_cpus or something similar.
(In reply to comment #25)
> I have just added the cpufreq_show_cpus function signature on top of the
> acpi-cpufreq.c file because it was not declared in the scope of this file.
> That being said, I suggest to change the domain_cpus name.

Sorry, the change to include/linux/cpufreq.h was missing when I produced debug.patch.

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index d939056..70021f1 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -438,4 +438,7 @@ void cpufreq_frequency_table_get_attr(struct cpufreq_frequency_table *table,
 void cpufreq_frequency_table_update_policy_cpu(struct cpufreq_policy *policy);
 void cpufreq_frequency_table_put_attr(unsigned int cpu);
+
+ssize_t cpufreq_show_cpus(const struct cpumask *mask, char *buf);
+
 #endif /* _LINUX_CPUFREQ_H */
-------

> sharedfreqdomain_cpus or freqdomain_cpus could be more relevant options.

Ok. I am not good at choosing names :)

Viresh, how about this name?

> Similarly, as affected_cpus and related_cpus' definition have changed, we
> could
> change their names as well, such as offline_cpus and online_cpus or something
> similar.

These are ABIs and some applications may use them, so it's not easy to change their names. There is a document that describes what they do.
(In reply to comment #26)
> > sharedfreqdomain_cpus ou freqdomain_cpus could be more relevant options.
> Ok. I am not good at giving a name :)
>
> Viresh, how about this name?

A few things here:
- I haven't seen/reviewed this patch as it wasn't posted anywhere.
- I would like this sysfs file to be created only for the acpi-cpufreq driver (as nobody else hides its actual core relations).
- Regarding the name: freqdomain_cpus looks fine to me.
Created attachment 105861 [details]
Fix patch

Hi Jean-Philippe:
I have updated the patch: the new sysfs attribute is renamed to freqdomain_cpus and a description is added to user-guide.txt. Please help me test it. Thanks in advance.

Hi Viresh:
I will send this patch to the cpufreq mailing list after testing. The new attribute is only for the acpi-cpufreq driver.
The patch has been sent to the cpufreq, ACPI and linux-pm mailing lists.
http://www.spinics.net/lists/cpufreq/msg06066.html
I am sorry for the delay, everything looks fine to me.
The fix patch has been merged into the linux-pm tree. Closing this bug.

commit f4fd3797848aa04e72e942c855fd279840a47fe4
Author: Lan Tianyu <tianyu.lan@intel.com>
Date: Thu Jun 27 15:08:54 2013 +0800

    acpi-cpufreq: Add new sysfs attribute freqdomain_cpus

    Commits fcf8058 (cpufreq: Simplify cpufreq_add_dev()) and aa77a52 (cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()) changed the contents of the "related_cpus" sysfs attribute on systems where acpi-cpufreq is used and user space can't get the list of CPUs which are in the same hardware coordination CPU domain (provided by the ACPI AML method _PSD) via "related_cpus" any more.

    To make up for that loss add a new sysfs attribute "freqdomain_cpus" for the acpi-cpufreq driver which exposes the list of CPUs in the same domain regardless of whether it is coordinated by hardware or software.

    [rjw: Changelog, documentation]
    References: https://bugzilla.kernel.org/show_bug.cgi?id=58761
    Reported-by: Jean-Philippe Halimi <jean-philippe.halimi@exascale-computing.eu>
    Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
    Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
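For userspace tools following this thread, here is a minimal sketch of how the new attribute could be consumed to recover the frequency domains. It assumes the file holds a whitespace-separated list of CPU numbers, like the related_cpus output quoted earlier in this bug; the function name frequency_domains and its parameters are hypothetical, and the root directory is a parameter only so that the code can be tried against a fake tree.

```python
import glob
import os


def frequency_domains(sysfs_cpu_root="/sys/devices/system/cpu",
                      attribute="freqdomain_cpus"):
    """Group CPUs by the content of a per-CPU cpufreq attribute.

    Returns a sorted list of domains, each a sorted list of CPU ids.
    Assumption: the attribute holds whitespace-separated CPU numbers.
    """
    domains = set()
    pattern = os.path.join(sysfs_cpu_root, "cpu[0-9]*", "cpufreq", attribute)
    for path in glob.glob(pattern):
        with open(path) as handle:
            cpus = frozenset(int(token) for token in handle.read().split())
        if cpus:
            # CPUs of one domain all report the same set, so the set of
            # frozensets removes the duplicates.
            domains.add(cpus)
    return sorted(sorted(domain) for domain in domains)
```

On kernels or drivers without the attribute the function simply returns an empty list; passing attribute="related_cpus" gives the older view for comparison.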