Created attachment 210711 [details] config of bisected kernel Switching to 4.5 kernel, cpufreq ondemand governor started to act oddly. Without any load, with freshly booted system, it pumped cpu frequency up to maximum at some point of time and stayed there. uptime can be anything, but usually less than 15 minutes when this happens. Driver is acpi-cpufreq. I have git-bisected this to 0df35026c6a527e65b53bc895ad672d648a248f3. There is a mention about certain kernel config values that are ok. My values differ and I get this kind of behaviour. CPU is AMD Phenom(tm) II X4 905e Processor.
Timo, could you please help test with following steps: 1. patch debug_governor.diff 2. after boot up, try to reproduce the problem 3. when this problem is reproduced, please: 3.1 echo 1 > /proc/sys/kernel/debug_cpufreq_idle 3.2 wait for a few seconds, dmesg -c > debug_cpufreq_before_fix.log 4. patch get_idle_fix.diff on top of debug_governor.diff 5. repeat step 2 ~ 3 6. provide debug_cpufreq_before_fix.log and debug_cpufreq_after_fix.log thanks
Created attachment 210951 [details] debug_governor.diff
Created attachment 210961 [details] get_idle_fix.diff This patch is a fix for get_cpu_idle_time_jiffy, actually the calculation of wall time in this function might be not so accurate, it uses get_jiffies_64() directly, however the initial value for jiffies is INITIAL_JIFFIES, which is -300HZ, thus 5 minutes. Maybe my previous patch has exposed this issue more frequently.
please also provide grep . /sys/devices/system/cpu/cpu*/cpufreq/* when reproduced.
Created attachment 210991 [details] debug_governor.diff
Created attachment 211031 [details] Logfile of degug_governor.diff Hi 1. Patched 4.5.0 kernel source with debug_governor.diff 2. Could reproduce the problem: please find attachment debug_cpufreq_before_fix.log
Created attachment 211041 [details] grep-file 3. Applied patch get_idle_fix.diff. So far, could not reproduce the problem.
It seems that this has a huge impact on my systems stability: every time the cpufreq gets upset, I can't for example compile a kernel without fatal errors. Have to reboot to a stable kernel first to do that.
OK, I know the problem. Here are two issues: 1. linux should not use jiffies directly to get the wall time, since jiffies is not counted from zero but from -300HZ. 2. the CONFIG_TICK_CPU_ACCOUNTING is not set in your config, so the kernel use an inaccurate jiffies_to_usecs to calculate the actual idle time, however, jiffies_to_usecs is constrained to unsigned int, which may overflow when the wall time exceed 32bit. 3. my previous patch exposed this issue more frequently, so.. I'll try to provide a fix for this. thanks.
Created attachment 211081 [details] debug_cpufreq_after_fix.log Finally it jumped up to max again. Hope this helps. Just let me know if you need any more info or testing.
Created attachment 211251 [details] v2_fix_get_cpu_idle.diff Please help check if this works for you. (you may need to wait for a while to confirm, at least: wait time = (2016-03-30 13:17:00 - 2016-03-29 20:54:05) )
Created attachment 211261 [details] v2_debug_governor.diff If previous patch v2_fix_get_cpu_idle.diff does not work, please apply both v2_fix_get_cpu_idle.diff and v2_debug_governor.diff, and when reproduced, type: echo 1 > /proc/sys/kernel/governor_debug_oneshot for MANY times, and then show dmesg.
ping Timo
Hi Yu Sorry for such a late answer. I've been running 4.5.0 patched with your v2_fix_get_cpu_idle.diff, and everything seems to be in order. System uptime is "2 days, 12:30", and there has been no problems with cpufreq at all. Although mentioned uptime includes some suspended (systemctl suspend) periods, I put my computer to sleep when I'm not using it. Does it affect to testing? Anyway, from my point of view, I haven't had any problems after applying the mentioned patch.
thanks Timo, the test looks ok to me, I've sent this patch to https://patchwork.kernel.org/patch/8742721/
Great, thank you for fixing this! I'm closing this as resolved.
patch feedback: INITIAL_JIFFIES should not be used init using current value of jiffies long term: tick-sched.c should export function giving both cpufreq and cpuidle the idle duration.
So, there is going to be another solution for this?
Created attachment 212841 [details] calculate idle time directly rather than calculating busy time first Yes, Timo, here's 3nd version with minimal changes to cpufreq, and without touching the INITIAL_JIFFIES, please help test if this patch work for you. Also please apply v2_debug_governor.diff and provide info as well. thanks.
Hi I'm testing with 4.5.1 but I guess it doesn't make any difference, or does it? Anyway , fix fails leading to all 4 cores to run at max freq. Please find attached dmesg2.txt with debug info.
Created attachment 213091 [details] dmesg2.txt
Created attachment 213191 [details] v4_fix_cpu_idle.patch please check this one. Hope time system maintainer would like this version :0
Created attachment 213221 [details] v5_fix_cpu_idle please check v5 instead.
Hi I was running v4_fix for some time, like 20 hours without any problems if that info is useful. Now running 4.5.1 with your latest v5_fix_cpu_idle.diff without v2_debug_governor.diff applied. So far so good, no problems, uptime 1 hour 17 minutes now. That is a short time, but it seems to be the case that the frequency jumps up quite fast after boot if the problem is present.
Version 5 has been sent to mail list at: https://patchwork.kernel.org/patch/8876381/
What I understand about code is zero, but v5_fix_cpu_idle seems to me different than the patch shown in here: https://patchwork.kernel.org/patch/8876381/
Yes, above patch is based on latest 4.6-rc3, and v5 is based on 4.5-rc1, there is some modifications during this period, but the logic is the same.
Ok, thanks for clearing this to me :).
Created attachment 213361 [details] v6_fix_cpu_idle Feedback from community suggest a version 6, could you help try this one
Created attachment 213371 [details] v6_fix_cpu_idle Feedback from community suggest a version 6, could you help try this one
@Timo: Can you please check if the patch from comment #30 helps?
Hi @Rafael, Comment 30 might not work, looks like it should be changed to 's32' rather than 's64', because we are facing the 32 bit overflow problem brought by idle time calculation. But it brought another issue after we changed it to 's32', for example, if cur_idle = 0xf000 0001, and prev_idle = 0xf000 0000, since (s32) cur_idle < (s32) prev_idle, we will still always get the idle 0% util next overflow, actually the system might be very idle. Lv said it might be looked like: if (cur_idle - prev_cpu > (INT_32_MAX/2 + 1)) cur_idle = 0; I'm trying to do some modifications.
(In reply to Chen Yu from comment #32) > Hi @Rafael, > > Comment 30 might not work, looks like it should be changed to 's32' rather > than 's64', because we are facing the 32 bit overflow problem brought by > idle time calculation. OK, scratch that, then. We need to be somewhat smarter here, because we're comparing wall_time to idle_time in the same function going forward, so they both need to be treated similarly.
@Chen Yu: cputime_to_usecs() called by get_cpu_idle_time_jiffy() should resolve to cputime_div(__ct, NSEC_PER_USEC) and that is defined as div_u64((__force u64)__ct, divisor) so the result should be a proper 64-bit number. Where does the 32-bit zapping happen then?
(In reply to Rafael J. Wysocki from comment #34) > @Chen Yu: cputime_to_usecs() called by get_cpu_idle_time_jiffy() should > resolve to > > cputime_div(__ct, NSEC_PER_USEC) > > and that is defined as > > div_u64((__force u64)__ct, divisor) > > so the result should be a proper 64-bit number. > > Where does the 32-bit zapping happen then? When CONFIG_TICK_CPU_ACCOUNTING is not set: include/asm-generic/cputime_jiffies.h #define cputime_to_usecs(__ct) \ jiffies_to_usecs(cputime_to_jiffies(__ct)) unsigned int jiffies_to_usecs(const unsigned long j) { /* * Hz usually doesn't go much further MSEC_PER_SEC. * jiffies_to_usecs() and usecs_to_jiffies() depend on that. */ BUILD_BUG_ON(HZ > USEC_PER_SEC); #if !(USEC_PER_SEC % HZ) return (USEC_PER_SEC / HZ) * j; #else # if BITS_PER_LONG == 32 return (HZ_TO_USEC_MUL32 * j) >> HZ_TO_USEC_SHR32; # else return (j * HZ_TO_USEC_NUM) / HZ_TO_USEC_DEN; # endif #endif } Since jiffies_to_usecs returns a unsigned int, which is 32bit, we cut the result from 64bit to 32bit.
(In reply to Rafael J. Wysocki from comment #33) > (In reply to Chen Yu from comment #32) > > Hi @Rafael, > > > > Comment 30 might not work, looks like it should be changed to 's32' rather > > than 's64', because we are facing the 32 bit overflow problem brought by > > idle time calculation. > > OK, scratch that, then. > > We need to be somewhat smarter here, because we're comparing wall_time to > idle_time in the same function going forward, so they both need to be > treated similarly. wall_time does not have 32-bit zapping issue, so the delta of wall_time should be ok I think?(we compare delta of wall_time of idle_time)
(In reply to Chen Yu from comment #36) > (In reply to Rafael J. Wysocki from comment #33) > > (In reply to Chen Yu from comment #32) > > > Hi @Rafael, > > > > > > Comment 30 might not work, looks like it should be changed to 's32' > rather > > > than 's64', because we are facing the 32 bit overflow problem brought by > > > idle time calculation. > > > > OK, scratch that, then. > > > > We need to be somewhat smarter here, because we're comparing wall_time to > > idle_time in the same function going forward, so they both need to be > > treated similarly. > > wall_time does not have 32-bit zapping issue, so the delta of wall_time > should be ok I think?(we compare delta of wall_time of idle_time) Are you sure about that? It undergoes the same manipulations in get_cpu_idle_time_jiffy().
Created attachment 213461 [details] Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC" In any case, I think that the problematic commit should just be reverted at this point and the original problem it attempted to fix (incorrectly) should be addressed in a different way. Attached is a revert patch (on top of subsequent code changes). Please double check in case I made some silly mistake (again).
The patch in commit #38 is on top of 4.6-rc4. @Timo: Please test.
(In reply to Rafael J. Wysocki from comment #39) > The patch in commit #38 is on top of 4.6-rc4. I mean comment #38, of course.
Ok, I have it running now, 4.6-rc4 patched with attachment in comment #38. Uptime only 22 mins now, no problems so far. Like I've mentioned earlier, frequency pops up very fast if it is present. So this one seems to be good.
OK, thanks! Let's give it time till tomorrow and then I'll send it to the mailing lists.
(In reply to Rafael J. Wysocki from comment #37) > (In reply to Chen Yu from comment #36) > > (In reply to Rafael J. Wysocki from comment #33) > > > (In reply to Chen Yu from comment #32) > > > > Hi @Rafael, > > > > > > > > Comment 30 might not work, looks like it should be changed to 's32' > rather > > > > than 's64', because we are facing the 32 bit overflow problem brought > by > > > > idle time calculation. > > > > > > OK, scratch that, then. > > > > > > We need to be somewhat smarter here, because we're comparing wall_time to > > > idle_time in the same function going forward, so they both need to be > > > treated similarly. > > > > wall_time does not have 32-bit zapping issue, so the delta of wall_time > > should be ok I think?(we compare delta of wall_time of idle_time) > > Are you sure about that? It undergoes the same manipulations in > get_cpu_idle_time_jiffy(). Ah yes, it is the same situation for wall_time.
(In reply to Rafael J. Wysocki from comment #38) > Created attachment 213461 [details] > Revert "cpufreq: governor: Fix negative idle_time when configured with > CONFIG_HZ_PERIODIC" > > In any case, I think that the problematic commit should just be reverted at > this point and the original problem it attempted to fix (incorrectly) should > be addressed in a different way. > > Attached is a revert patch (on top of subsequent code changes). Please > double check in case I made some silly mistake (again). Yes, sorry for my regression. I've checked the code and it is OK. BTW, I'm thinking of this method, after reverting my previous patch, we can change the idle time calculation to the complement set for current 'busy_time', thus we don't have to substract busy_time from wall _time, and the 'delta' of idle_time in governor's dbs_update should always make sense. Something like this: @@ -140,14 +140,10 @@ static inline u64 get_cpu_idle_time_jiff cur_wall_time = jiffies64_to_cputime64(get_jiffies_64()); - busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IRQ]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SOFTIRQ]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_STEAL]; - busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE]; - - idle_time = cur_wall_time - busy_time; + idle_time = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE]; + idle_time += kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT]; + idle_time += kcpustat_cpu(cpu).cpustat[CPUTIME_GUEST]; + idle_time += kcpustat_cpu(cpu).cpustat[CPUTIME_GUEST_NICE]; if (wall) *wall = cputime_to_usecs(cur_wall_time); I'll check some data.
Created attachment 213491 [details] cpufreq: Make idle_time monotonic increasing to avoid incorrect idle_time Timo, could you please help check if this patch still work for you(also with patch from #Comment 38 applied: Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC" )
related link for previous negative idle issue: https://bugzilla.kernel.org/show_bug.cgi?id=69821
(In reply to Chen Yu from comment #45) > Created attachment 213491 [details] > cpufreq: Make idle_time monotonic increasing to avoid incorrect idle_time The reason why this is not a good approach is that cpufreq_governor_start() (as of 4.6-rc4) does this: j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy); prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle; so if j_cdbs->prev_cpu_idle and j_cdbs->prev_cpu_wall are shifted with respect to each other (and they will be after your change), the value of prev_load will not be useful.
Created attachment 213581 [details] config file of kernel Got a compile warning, not too serious I guess: drivers/cpufreq/cpufreq.c: In function ‘get_cpu_idle_time_jiffy’: drivers/cpufreq/cpufreq.c:139:6: warning: unused variable ‘busy_time’ [-Wunused-variable] u64 busy_time; Anyway, attachment from comment #45 has been applied on top of #38 and everything seems to be in order. Please check my attached config that it is still good for this testing.
(In reply to Timo Valtoaho from comment #48) > Created attachment 213581 [details] > config file of kernel > > Got a compile warning, not too serious I guess: > > drivers/cpufreq/cpufreq.c: In function ‘get_cpu_idle_time_jiffy’: > drivers/cpufreq/cpufreq.c:139:6: warning: unused variable ‘busy_time’ > [-Wunused-variable] > u64 busy_time; That's OK. The variable is declared but not used any more. > Anyway, attachment from comment #45 has been applied on top of #38 and > everything seems to be in order. OK, thanks! @Chen Yu: We'll need to reopen bug #69821 after applying the patch from comment #38, so let's move the discussion there.
The patch from comment #38 has been merged as http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=94862a62dfe3ba1c7601115a2dc80721c5b256f0