Bug 63081

Summary: ignore_nice_load changes from one to zero after suspend2ram
Product: Power Management
Reporter: Rainer Kaluscha (rainer.kaluscha)
Component: cpufreq
Assignee: Lan Tianyu (tianyu.lan)
Status: CLOSED INVALID
Severity: low
CC: aaron.lu, rainer.kaluscha, tianyu.lan, viresh.kumar
Priority: P1
Hardware: i386
OS: Linux
Kernel Version: 3.10.16
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  Kernel config
  grep . /sys/devices/system/cpu/cpu*/cpufreq/*
  debug.patch
  dmesg after s2ram -f
  dmesg after pm-suspend

Description Rainer Kaluscha 2013-10-15 17:53:32 UTC
Setting ignore_nice_load (echo 1 > /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load) works fine. However, after suspending to RAM and waking the machine again, the setting is lost: cat /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load gives 0.

I first noticed this behaviour with kernel 3.10.16 (it may have been introduced earlier). Booting the same system with kernel 3.7.10 does not show the bug.

Workaround: ignore_nice_load can be set again manually after suspend2ram.
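The manual workaround can be automated with a sleep hook. This is a minimal sketch. It assumes that suspend goes through pm-utils (later comments show that this is the case here), that pm-utils calls the hooks in /etc/pm/sleep.d with the sleep phase (suspend, resume, ...) as the first argument, and that it runs resume hooks in reverse numeric order. The file name 93ignore-nice-load is hypothetical. Pick a number below the hook that restores the governor so that this one runs after it on resume. IGNORE_NICE_FILE is a hypothetical override that lets the logic be tried against a scratch file:

```sh
#!/bin/sh
# Hypothetical pm-utils sleep hook, e.g. /etc/pm/sleep.d/93ignore-nice-load
# (file name and number are assumptions). Re-applies ignore_nice_load=1 after
# resume, i.e. the manual workaround done automatically.

# Overridable (hypothetical) so the logic can be tried against a scratch file.
IGNORE_NICE_FILE="${IGNORE_NICE_FILE:-/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load}"

reapply_ignore_nice_load() {
    # Do nothing if the tunable is absent, e.g. ondemand is not the active governor.
    [ -e "$IGNORE_NICE_FILE" ] || return 0
    echo 1 > "$IGNORE_NICE_FILE"
}

# pm-utils passes the sleep phase as the first argument.
case "${1:-}" in
    resume|thaw) reapply_ignore_nice_load ;;
esac
```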
Comment 1 Aaron Lu 2013-10-16 07:10:36 UTC
Is it possible to test an upstream kernel, e.g. Linus' git tree?
Comment 2 Rainer Kaluscha 2013-10-16 20:00:18 UTC
Yes - same behaviour with kernels 3.11.5 and 3.12.0-rc5  :-(

BTW: after suspend2ram, not only has the content of /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load changed; the CPU is indeed running at full speed although only low-priority tasks are running.
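To confirm the symptom (CPUs at full speed with only low-priority load) on each CPU, read the standard per-CPU cpufreq attributes scaling_governor and scaling_cur_freq (in kHz) together. The following is a small sketch. CPU_ROOT is a hypothetical override that exists only so the function can be tested outside /sys:

```sh
#!/bin/sh
# Print "cpuN GOVERNOR CUR_FREQ_KHZ" for every CPU that has a cpufreq directory.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

cpufreq_summary() {
    for d in "$CPU_ROOT"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue            # glob did not match, or no cpufreq for this CPU
        cpu=${d%/cpufreq}                  # strip the trailing /cpufreq
        cpu=${cpu##*/}                     # keep just "cpuN"
        printf '%s %s %s\n' "$cpu" \
            "$(cat "$d/scaling_governor")" "$(cat "$d/scaling_cur_freq")"
    done
}
```

Running cpufreq_summary once before and once after a suspend cycle shows whether the governor or the frequency changed on any CPU.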
Comment 3 Rainer Kaluscha 2013-10-17 02:31:21 UTC
Kernels 3.9.11 and 3.8.13 don't show the bug, so it was probably introduced in the 3.10 series.
Comment 4 Lan Tianyu 2013-10-17 07:18:38 UTC
I just checked the linux-next branch of the linux-pm tree and couldn't reproduce the issue.

Could you please try it on your machine?

http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/
Comment 5 Rainer Kaluscha 2013-10-18 03:49:51 UTC
Created attachment 111501
Kernel config
Comment 6 Rainer Kaluscha 2013-10-18 03:50:58 UTC
I did "git clone -b bleeding-edge git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm" and can still reproduce the bug. 

I attached my current kernel config.
Comment 7 Rainer Kaluscha 2013-10-18 03:55:05 UTC
My Hardware configuration (output of lspci):

00:00.0 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a2)
00:01.0 ISA bridge: NVIDIA Corporation MCP78S [GeForce 8200] LPC Bridge (rev a2)
00:01.1 SMBus: NVIDIA Corporation MCP78S [GeForce 8200] SMBus (rev a1)
00:01.2 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
00:01.3 Co-processor: NVIDIA Corporation MCP78S [GeForce 8200] Co-Processor (rev a2)
00:01.4 RAM memory: NVIDIA Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
00:02.0 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
00:02.1 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
00:04.0 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
00:04.1 USB controller: NVIDIA Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
00:06.0 IDE interface: NVIDIA Corporation MCP78S [GeForce 8200] IDE (rev a1)
00:07.0 Audio device: NVIDIA Corporation MCP72XE/MCP72P/MCP78U/MCP78S High Definition Audio (rev a1)
00:08.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:09.0 SATA controller: NVIDIA Corporation MCP78S [GeForce 8200] AHCI Controller (rev a2)
00:0a.0 Ethernet controller: NVIDIA Corporation MCP77 Ethernet (rev a2)
00:10.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
00:12.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
00:13.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:14.0 PCI bridge: NVIDIA Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:08.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 24)
01:09.0 Multimedia controller: Philips Semiconductors SAA7146 (rev 01)
02:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
02:00.1 Audio device: NVIDIA Corporation High Definition Audio Controller (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)
05:00.0 USB controller: VIA Technologies, Inc. VL80x xHCI USB 3.0 Controller (rev 03)
Comment 8 Lan Tianyu 2013-11-14 07:28:10 UTC
Hi, sorry for the late response. I can't boot a kernel built with your config on my machine. Please provide the output of "grep . /sys/devices/system/cpu/cpu*/cpufreq/*".
Comment 9 Rainer Kaluscha 2013-11-14 19:02:34 UTC
No problem. I removed unnecessary components to speed up kernel builds, so my kernel probably won't boot on different hardware. I'm attaching the output.
Comment 10 Rainer Kaluscha 2013-11-14 19:03:32 UTC
Created attachment 114741
grep . /sys/devices/system/cpu/cpu*/cpufreq/*
Comment 11 Rainer Kaluscha 2013-11-14 19:05:32 UTC
BTW: CPU is AMD Phenom II X4 945.
Comment 12 Lan Tianyu 2013-11-15 06:14:17 UTC
Hi, please try this patch.

http://www.spinics.net/lists/kernel/msg1636080.html
Comment 13 Viresh Kumar 2013-11-15 10:55:31 UTC
(In reply to Lan Tianyu from comment #12)
> Hi, please try this patch.
> 
> http://www.spinics.net/lists/kernel/msg1636080.html

Pasting from my mail for the record:

Although the patch I sent fixes a problem similar to this one, I don't think
either of our patches will solve the issue Rainer is facing.

I checked his system configuration and it looks like this:
- Four CPUs, each with a separate clock domain (at least from the kernel's perspective) and therefore a separate policy structure.
- All use the ondemand governor.
- The CPUFREQ_HAVE_GOVERNOR_PER_POLICY feature is not used.
- So there is a single set of ondemand tunables that applies to all CPUs.

The way INIT/EXIT are designed in cpufreq_governor.c should take care
of this scenario.

Memory for the tunables must not be freed unless all the CPUs are removed,
which can't happen, as we only offline non-boot CPUs. So I believe
that memory isn't getting freed, and your solution wouldn't address his
problem.

Sorry if I said something stupid :)
Comment 14 Lan Tianyu 2013-11-21 12:06:52 UTC
Created attachment 115501
debug.patch

Yes, the previous patch doesn't seem to be related to this bug, so please ignore it.
Currently I have no idea what causes this bug. Please apply this patch, which adds debug output to the code path that stores ignore_nice_load. Let's see whether user space writes to the sysfs interface during system resume.
Comment 15 Rainer Kaluscha 2013-11-21 21:51:35 UTC
Userspace seems to be innocent:


After resume, "dmesg | fgrep cpufreq_ondemand" gives only one entry (probably my call from /etc/init.d/boot.local):
[   28.554411] cpufreq_ondemand: store_ignore_nice_load: ignore nice 1

And "cat /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load" still gives "0" ...

So something is happening inside the kernel !?!
Comment 16 Viresh Kumar 2013-11-22 05:04:09 UTC
On Friday 22 November 2013 03:21 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=63081
> 
> --- Comment #15 from Rainer Kaluscha <rainer.kaluscha@web.de> ---
> Userspace seems to be innocent:
> 
> 
> After resume, "dmesg | fgrep cpufreq_ondemand" gives only one entry (probably
> my call from /etc/init.d/boot.local):
> [   28.554411] cpufreq_ondemand: store_ignore_nice_load: ignore nice 1
> 
> And "cat /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load" still
> gives 
> "0" ...
> 
> So something is happening inside the kernel !?!

Lan,

Were you able to reproduce this stuff yet? I will also give it a try soon once I get some time.
Comment 17 Lan Tianyu 2013-11-22 05:17:42 UTC
(In reply to Viresh Kumar from comment #16)
> Lan,
> 
> Were you able to reproduce this stuff yet? I will also give it a try soon
> once I
> get some time.

No, I tried to reproduce it but haven't made any progress.
Comment 18 Lan Tianyu 2013-11-22 05:43:31 UTC
Hi Rainer,

Could you run the following command, do an s2ram, and attach the dmesg output? It enables dynamic debug output in the cpufreq core, which may give us some clues.

echo "file cpufreq.c +p" > /sys/kernel/debug/dynamic_debug/control

BTW, this produces a lot of log output, so please add the "log_buf_len=10M" kernel parameter to enlarge the log buffer.
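The request above can be wrapped in a small helper. The helper fails loudly if the control file is missing, for example when debugfs is not mounted or the kernel was built without CONFIG_DYNAMIC_DEBUG. It also checks whether a log_buf_len= parameter was passed on the kernel command line. This is only a sketch. DYNDBG_CONTROL and CMDLINE are hypothetical overrides for testing, and the function names are illustrative:

```sh
#!/bin/sh
DYNDBG_CONTROL="${DYNDBG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}"
CMDLINE="${CMDLINE:-/proc/cmdline}"

enable_cpufreq_debug() {
    if [ ! -w "$DYNDBG_CONTROL" ]; then
        echo "cannot write $DYNDBG_CONTROL (debugfs mounted? CONFIG_DYNAMIC_DEBUG?)" >&2
        return 1
    fi
    # Enable printing for every dynamic debug call site in cpufreq.c.
    echo 'file cpufreq.c +p' > "$DYNDBG_CONTROL"
}

has_log_buf_len() {
    # True if some log_buf_len= parameter was given at boot.
    grep -q 'log_buf_len=' "$CMDLINE"
}
```

For example, as root, run enable_cpufreq_debug && pm-suspend, then save the dmesg output to a file after resume.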
Comment 19 Rainer Kaluscha 2013-11-22 05:49:59 UTC
Supplement to comment 16: this is consistent with the observation that kernel 3.7.10 still works. If something were happening in user space, it should also affect the older kernels, shouldn't it?
Comment 20 Rainer Kaluscha 2013-11-22 06:44:31 UTC
Hi Lan, I did as requested and noticed a strange thing: 

After boot, the machine is running 4 low-priority background jobs, with all 4 cores at 800 MHz, as expected.

If I call s2ram directly, suspend/resume works as expected (attachment "dmesg after s2ram -f"): all 4 cores are at 800 MHz after resume.

However, if I use pm-suspend (attachment "dmesg after pm-suspend"), I hit the bug: all 4 cores run at 3000 MHz although there is only low-priority work.

Nevertheless, as we saw, pm-suspend doesn't write ignore_nice_load directly. Maybe it triggers something indirectly, e.g. by unloading some kernel modules before suspend. I'll dig deeper into that.

So the bug is probably not in the cpufreq code - thnx for your help anyway.
Comment 21 Rainer Kaluscha 2013-11-22 06:46:59 UTC
Created attachment 115621
dmesg after s2ram -f
Comment 22 Rainer Kaluscha 2013-11-22 06:47:46 UTC
Created attachment 115631
dmesg after pm-suspend
Comment 23 Viresh Kumar 2013-11-22 06:51:37 UTC
On 22 November 2013 12:14,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> So the bug is probably not in the cpufreq code - thnx for your help anyway.

I am sort of relieved now. I understand the cpufreq code very well and
I wasn't able to guess which part screwed it up :)
Comment 24 Rainer Kaluscha 2013-11-22 07:00:38 UTC
A quick glance at the dmesg output already gives me a suspect:

Calling s2ram directly gives "Restoring governor ONDEMAND for cpu 1" while using pm-suspend leads to "Restoring governor PERFORMANCE for cpu 1" ...

Thnx again,
Rainer
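To compare the two logs, extract the governor-restore messages. The following minimal sketch turns each line of the form quoted above ("Restoring governor X for cpu N") into "cpuN X". The message wording is taken from this comment and may differ between kernel versions. The function name is illustrative:

```sh
#!/bin/sh
# Reads a dmesg dump from the file given as $1, or from stdin if no file is given.
restored_governors() {
    sed -n 's/.*Restoring governor \([^ ]*\) for cpu \([0-9][0-9]*\).*/cpu\2 \1/p' ${1:+"$1"}
}
```

For example, run dmesg | restored_governors once after each suspend method and compare the two outputs.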
Comment 25 Lan Tianyu 2013-11-22 08:34:58 UTC
Yes, this looks like a user-space issue, so I'm marking this bug as invalid.
Comment 26 Rainer Kaluscha 2013-11-22 10:06:37 UTC
Thnx to your help I tracked the issue down:

For reasons unknown to me, pm-suspend temporarily switches to a different governor (performance) before suspending and re-enables the old one after resume.

However, it doesn't seem to save the governor's settings as well, so all non-default settings such as ignore_nice_load=1 are lost :-(

Thnx again,
Rainer

P.S. pm-suspend uses the interface at /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, so the issue didn't arise with older kernels where that interface wasn't present or enabled.
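A sleep hook can work around this until the suspend tooling preserves governor tunables itself. The hook saves the ondemand tunables before suspend and writes them back after resume. This sketch makes the following assumptions:

- pm-utils runs hooks from /etc/pm/sleep.d in ascending numeric order on suspend and in descending order on resume.
- A hook numbered below the one that switches governors therefore saves while ondemand is still active and restores after ondemand has been re-enabled. Check the system's pm-utils hook directory (typically /usr/lib/pm-utils/sleep.d) for the actual number.

The file name 93ondemand-tunables and the GOV_DIR and STATE_DIR overrides are hypothetical:

```sh
#!/bin/sh
# Hypothetical pm-utils hook, e.g. /etc/pm/sleep.d/93ondemand-tunables
# (file name and number are assumptions). Saves every ondemand tunable
# before suspend and writes the saved values back after resume.

GOV_DIR="${GOV_DIR:-/sys/devices/system/cpu/cpufreq/ondemand}"
STATE_DIR="${STATE_DIR:-/var/run/ondemand-tunables}"

save_tunables() {
    [ -d "$GOV_DIR" ] || return 0        # ondemand not active: nothing to save
    mkdir -p "$STATE_DIR" || return 1
    rm -f "$STATE_DIR"/*                 # drop values saved by an earlier cycle
    for f in "$GOV_DIR"/*; do
        [ -f "$f" ] || continue
        cat "$f" > "$STATE_DIR/${f##*/}"
    done
}

restore_tunables() {
    [ -d "$GOV_DIR" ] && [ -d "$STATE_DIR" ] || return 0
    for saved in "$STATE_DIR"/*; do
        [ -f "$saved" ] || continue
        target="$GOV_DIR/${saved##*/}"
        [ -e "$target" ] || continue     # tunable does not exist (any more): skip it
        # Read-only attributes reject the write; ignore that and carry on.
        cat "$saved" 2>/dev/null > "$target" || true
    done
}

# pm-utils passes the sleep phase as the first argument.
case "${1:-}" in
    suspend|hibernate) save_tunables ;;
    resume|thaw)       restore_tunables ;;
esac
```

The hook restores every saved file, not just ignore_nice_load, so other non-default tunables survive the governor switch too. The -e check skips attributes that are absent after resume instead of failing.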