Bug 12310

Summary: NOHZ appears to cause ondemand to effectively ignore 'ignore_nice_load'
Product: Power Management
Reporter: Jim Bray (jimsantelmo)
Component: cpufreq
Assignee: Venkatesh Pallipadi (venki)
Status: CLOSED CODE_FIX
Severity: normal
CC: davej, toralf.foerster
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Test patch

Description Jim Bray 2008-12-27 19:38:43 UTC
Latest working kernel version: Don't think it ever worked well with NOHZ, but got worse after 2.6.26
Earliest failing kernel version:
Distribution:
Hardware Environment: Everex notebook, AMD 64 dual-core
Software Environment: 
Problem Description:

  [this bug is a tickless+ondemand interaction: the problem may be in the cpufreq_ondemand code]

  I'm running boinc, which will cheerfully keep both cores loaded with niced processes if I let it. For the last six months or more, I managed to stop the niced processes from raising the frequency by setting up_threshold to 50 or more and letting boinc use only 80% CPU, but with 2.6.28 nothing I did mattered.

 I got to looking around the ondemand code and noticed that it tries to call get_cpu_idle_time_us() and calculates idle time differently if that call fails, which happens when NOHZ is turned off. I rebooted with 'nohz=off', and the ondemand code then behaved completely to spec according to how I set ignore_nice_load and up_threshold.
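
The two-path selection described above can be sketched in userspace C. This is only an illustration under assumptions: `fine_grained_idle_us`, `pick_idle_time_us`, and their parameters are hypothetical names, not kernel APIs. Only the "-1 means unavailable" sentinel mirrors the `idle_time == -1ULL` check in the function quoted in Comment 1.

```c
#include <stdint.h>

/* Sentinel meaning "no fine-grained idle counter available",
 * mirroring the -1ULL comparison in the quoted ondemand code. */
#define IDLE_TIME_UNAVAILABLE ((uint64_t)-1)

/* Hypothetical stand-in for the NOHZ idle counter: it only yields
 * a value when tickless mode is enabled. */
static uint64_t fine_grained_idle_us(int nohz_enabled, uint64_t nohz_idle_us)
{
    return nohz_enabled ? nohz_idle_us : IDLE_TIME_UNAVAILABLE;
}

/* Pick the idle-time source: prefer the fine-grained counter and
 * fall back to the tick-based figure when it is unavailable.
 * *used_fallback reports which path was taken. */
static uint64_t pick_idle_time_us(int nohz_enabled,
                                  uint64_t nohz_idle_us,
                                  uint64_t tick_idle_us,
                                  int *used_fallback)
{
    uint64_t idle = fine_grained_idle_us(nohz_enabled, nohz_idle_us);

    if (idle == IDLE_TIME_UNAVAILABLE) {
        *used_fallback = 1;
        return tick_idle_us;
    }
    *used_fallback = 0;
    return idle;
}
```

In this sketch, booting with 'nohz=off' corresponds to `nohz_enabled == 0`, which always takes the fallback path.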

Steps to reproduce:  Build tickless kernel, run boinc (or maybe a niced while-true loop?).
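
A niced while-true loop of the kind suggested above could look like the following minimal sketch. It assumes a POSIX system; `spin_niced` and its parameters are hypothetical names chosen for illustration, and the loop is bounded by a duration so it can be stopped without killing the process.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>
#include <time.h>

/* Set this process's nice value (raising niceness, e.g. to 19, needs
 * no privileges), then busy-loop for roughly `seconds` of wall time.
 * Returns the nice value in effect afterwards, or -1 if the priority
 * could not be changed. */
static int spin_niced(int nice_value, double seconds)
{
    struct timespec start, now;
    volatile unsigned long counter = 0;
    double elapsed;

    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        counter++;  /* burn CPU; volatile keeps the loop from being optimized away */
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (double)(now.tv_sec - start.tv_sec)
                + (double)(now.tv_nsec - start.tv_nsec) / 1e9;
    } while (elapsed < seconds);

    return getpriority(PRIO_PROCESS, 0);
}
```

Calling `spin_niced(19, 60.0)` from one process per core should keep the cores busy with niced work only, which is the load pattern ignore_nice_load is meant to disregard.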
Comment 1 Jim Bray 2008-12-28 15:05:07 UTC
  As I expected, just making cpufreq_ondemand:get_cpu_idle_time() unconditionally return get_cpu_idle_time_jiffy() works fine with NOHZ enabled. I should also have commented out the call to get_cpu_idle_time_us(), because 'wall' is a pointer and the thing it points to is getting updated twice.

static inline cputime64_t get_cpu_idle_time(unsigned int cpu, cputime64_t *wall)
{
	u64 idle_time = get_cpu_idle_time_us(cpu, wall);

	/* JB: test hack, the check is commented out so the jiffy path is always taken */
	//JB	if (idle_time == -1ULL)
	return get_cpu_idle_time_jiffy(cpu, wall);

	if (dbs_tuners_ins.ignore_nice) { //Must be something broken in here...
		cputime64_t cur_nice;
		unsigned long cur_nice_jiffies;
		struct cpu_dbs_info_s *dbs_info;

		dbs_info = &per_cpu(cpu_dbs_info, cpu);
		cur_nice = cputime64_sub(kstat_cpu(cpu).cpustat.nice,
					 dbs_info->prev_cpu_nice);
		/*
		 * Assumption: nice time between sampling periods will be
		 * less than 2^32 jiffies for 32 bit sys
		 */
		cur_nice_jiffies = (unsigned long)
					cputime64_to_jiffies64(cur_nice);
		dbs_info->prev_cpu_nice = kstat_cpu(cpu).cpustat.nice;
		return idle_time + jiffies_to_usecs(cur_nice_jiffies);
	}
	return idle_time;
}
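
The effect of the ignore_nice branch in the function above can be illustrated in userspace C: when the option is set, nice time is added to idle time, so niced work no longer counts as load. This is a minimal sketch, assuming load is derived as the non-idle share of wall time over one sampling period and then compared against up_threshold. `struct sample_delta`, `load_percent`, and the example numbers are hypothetical and not taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical per-sample deltas for one CPU, all in microseconds. */
struct sample_delta {
    uint64_t wall_us;  /* elapsed wall time in the sampling period */
    uint64_t idle_us;  /* time the CPU was idle */
    uint64_t nice_us;  /* time spent running niced tasks */
};

/* Load as the non-idle percentage of wall time. With ignore_nice set,
 * nice time is counted as idle, as in the quoted
 * "idle_time + jiffies_to_usecs(cur_nice_jiffies)" return. */
static unsigned int load_percent(const struct sample_delta *d, int ignore_nice)
{
    uint64_t idle = d->idle_us;

    if (ignore_nice)
        idle += d->nice_us;

    /* Guard against an empty sample or idle exceeding wall time. */
    if (d->wall_us == 0 || idle >= d->wall_us)
        return 0;

    return (unsigned int)(100 * (d->wall_us - idle) / d->wall_us);
}
```

With wall_us = 100000, idle_us = 5000 and nice_us = 90000, `load_percent` returns 95 without ignore_nice and 5 with it, so only the former would exceed an up_threshold of 50.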
Comment 2 Venkatesh Pallipadi 2008-12-29 11:50:07 UTC
Thanks for reporting the problem. Will reproduce it locally and get back with updates.
Comment 3 Venkatesh Pallipadi 2008-12-30 13:55:46 UTC
I think I found the bug in the code resulting in this problem. Should have a test patch by tomorrow.
Comment 4 Jim Bray 2008-12-30 14:36:10 UTC
Cool. Thanks. Post it here, or send it to me directly, and I'll let you know what happens with it.
Comment 5 Venkatesh Pallipadi 2008-12-31 14:47:19 UTC
Created attachment 19578
Test patch

Can you please test the attached patch?

Thanks,
Venki
Comment 6 Jim Bray 2008-12-31 15:31:38 UTC
Hey Venki,

  Works fine for me. Boinc at 100% (meaning both CPUs loaded with niced PRIO_IDLE processes) is ignored by ondemand if and only if ignore_nice_load is set. A non-niced while-true loop sends cpufreq to full speed.

  I'm going to attempt to mark this one resolved and closed. If it doesn't work, I'll let you take care of it.

  Thanks, and Happy New Year.

J
Comment 7 Venkatesh Pallipadi 2009-01-23 00:16:06 UTC
Dave,

Can you pick up this patch and push it along?

Thanks,
Venki
Comment 8 Dave Jones 2009-01-23 06:36:34 UTC
queued for 2.6.29, thanks.
Comment 9 Toralf Förster 2009-02-08 02:47:49 UTC
Either this patch isn't included in the current git (v2.6.29-rc3-697-gae1a25d) yet, or there's a similar issue remaining: http://lkml.org/lkml/2009/2/7/35
Comment 10 Jim Bray 2009-02-13 15:01:32 UTC
 Just looked at the code, and it appears to be in 29-rc4.
Comment 11 Jim Bray 2009-02-13 15:15:20 UTC
  I mean 29-rc5. That's where it showed up.