Bug 77201

Summary: CPU online hangs, works when powernow-k8 is UN-loaded
Product: Platform Specific/Hardware Reporter: Mauro Santos (registo.mailling)
Component: x86-64Assignee: platform_x86_64 (platform_x86_64)
Status: NEEDINFO ---    
Severity: normal CC: a.p.zijlstra, lenb, registo.mailling, rjw, srivatsa, viresh.kumar
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.14+ Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Bisect log
CPU info
dmidecode
lspci
lsmod
Revert-of-commit-c3274763bfc.patch
Boot log
Governor change (ondemand -> performance)
CPU offline/online
config file
CPU offline, powenow-k8 loaded, ondemand cpu gov.
CPU online hang log
light-weight debug prints for cpufreq core: for debugging this problem
Cleaner hang log
CPU offline/online - OK
Frequency transition hang
From Viresh's tree, commit 0f92fd8
From Viresh's tree, commit 3610fd8
Log from commit 5bdfff96c69a
Log from commit fada94ee64e6
Hang test script
stop_transition before cpufreq_notify_transition()
stop_transition after cpufreq_notify_transition()
powernow-k8-post1 bisect log

Description Mauro Santos 2014-06-02 15:46:32 UTC
Created attachment 137861 [details]
Bisect log

Like the description says, I am unable to get my laptop to hibernate or resume from suspend. I have noticed this started after kernel 3.14 hit the repos of my distribution.

The symptom is that the machine suspends successfully but completely hangs when resuming (ctrl-alt-del or sysrq don't work at all). The same happens when trying to hibernate, that is, the machine hangs before completing the hibernation process.

I have tried to find where things go wrong by doing echoing freezer, devices, platform, processors to /sys/power/pm_test and then echoing mem to /sys/power/state.

The tests work successfully until I get to processors. I have found that if powernow-k8 is unloaded before suspending/hibernating things work as expected.

I have tried the latest 3.15-rc5 from git and I can still reproduce the problem. I have also tried to bisect to find the commit that causes the regression but I have ended at commits that I had to skip (they would not compile).

The bisect log and some other info that might be useful can be found in the attachments.
Comment 1 Mauro Santos 2014-06-02 15:47:10 UTC
Created attachment 137871 [details]
CPU info
Comment 2 Mauro Santos 2014-06-02 15:47:30 UTC
Created attachment 137881 [details]
dmidecode
Comment 3 Mauro Santos 2014-06-02 15:47:48 UTC
Created attachment 137891 [details]
lspci
Comment 4 Mauro Santos 2014-06-02 15:48:04 UTC
Created attachment 137901 [details]
lsmod
Comment 5 Mauro Santos 2014-06-02 15:53:28 UTC
Make that 3.15-rc8 (3.15-rc5 was a typo).
Comment 6 Rafael J. Wysocki 2014-06-02 19:01:12 UTC
Does CPU offline/online (via sysfs) work on this machine with 3.15-rc8?
Comment 7 Mauro Santos 2014-06-02 19:49:42 UTC
I assume you wanted me to try it with powernow-k8 loaded.

When doing 'echo 0 > /sys/devices/system/cpu/cpu1/online' the machine keeps working and I can see that cpu1 is gone from /proc/interrupts (as stated in cpu-hotplug.txt).

After trying to get the cpu online again with 'echo 1 > /sys/devices/system/cpu/cpu1/online' the machine hangs.

CPU0 can't be put offline as there is no /sys/devices/system/cpu/cpu0/online even when booting with cpu0_hotplug, but I suppose that's expected.
Comment 8 Len Brown 2014-06-09 17:52:03 UTC
> After trying to get the cpu online again ... the machine hangs.

Thanks for testing.
So the answer is "no", cpu online/offline does not work.
Comment 9 Len Brown 2014-06-09 17:53:15 UTC
so the cpu offline/online works when powernow-k8 is UN loaded?
Comment 10 Mauro Santos 2014-06-09 19:05:59 UTC
If powernow-k8 is UN loaded then cpu offline/online works fine.
Comment 11 Rafael J. Wysocki 2014-06-09 20:37:21 UTC
Viresh, Srivatsa, this looks like fallout of some recent core cpufreq changes related to CPU offline/online.  Can you please have a look at this bug entry?
Comment 12 Srivatsa S. Bhat 2014-06-09 21:58:55 UTC
Hi,

I took a quick look at the CPU hotplug path in both cpufreq.c and powernow-k8.c, but nothing really appeared to stand out.

Does reverting commit c3274763bfc (cpufreq: powernow-k8: Initialize per-cpu data-structures properly) help? I just want to know if that is a suspect or not, since that commit went in around 3.14 and is the only relevant commit in powernow-k8 driver during that time. I'll attach the reverted patch with this bugzilla for your reference.

Also, please give the output of:
cat /sys/devices/system/cpu/cpuNum/cpufreq/related_cpus
cat /sys/devices/system/cpu/cpuNum/cpufreq/affected_cpus

for both CPUs 0 and 1 (just replace cpuNum with 0 and 1).

One other thing: what is the exact version of the earliest failing kernel? Is it vanilla 3.14 or a stable version of 3.14 (eg: 3.14.x)? (For pass/fail testing, please use the sysfs method with powernow-k8 loaded, like you did in comment 7. Once we get this fixed, you can try the full suspend/resume tests).

Thank you!

Regards,
Srivatsa S. Bhat
Comment 13 Srivatsa S. Bhat 2014-06-09 22:01:21 UTC
Created attachment 138661 [details]
Revert-of-commit-c3274763bfc.patch

Patch to revert commit c3274763bfc (cpufreq: powernow-k8: Initialize per-cpu data-structures properly).
Comment 14 Mauro Santos 2014-06-09 23:15:24 UTC
I have tested v3.15-7257-g963649d and applied the patch you attached and it still hangs when trying to bring the cpu online.

The earliest 3.14 I could quickly test right now was 3.14.1 (provided by the distro I'm using) and it also fails when trying to bring the cpu online. If it helps, 3.13.8 worked fine (suspend and hibernate, didn't actually test cpu offline/online).

Here is the other information you requested:
cat /sys/devices/system/cpu/cpu0/cpufreq/related_cpus 
0 1
cat /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus 
0 1

cat /sys/devices/system/cpu/cpu1/cpufreq/related_cpus 
0 1
cat /sys/devices/system/cpu/cpu1/cpufreq/affected_cpus 
0 1
Comment 15 Viresh Kumar 2014-06-10 04:41:19 UTC
Hi,

Can you please add these to top of your drivers/cpufreq/Makefile

ccflags-y := -DDEBUG
ccflags-y += -DVERBOSE_DEBUG16:34

and then give us bootlogs & offline+online logs?
Comment 16 Srivatsa S. Bhat 2014-06-10 05:26:38 UTC
Hi,

Here are a few more suggestions you can try:

1. Use either performance or powersave governor:

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

(You can use either performance or powersave; you don't have to try both).

The ondemand governor (which is usually the default) is complex and uses timers, workqueues etc and interacts with the cpufreq CPU hotplug code, whereas the above 2 governors are extremely simple and don't have any complex interactions with hotplug. So trying them would help isolate the problem and understand whether it is a cpufreq core issue or a cpufreq governor issue.

2. Enable lockdep debugging, specifically the config options mentioned below, so that we'll know if we have a locking issue:

CONFIG_LOCKDEP_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

You can try the above 2 things along with the suggestions Viresh gave, in a single run, and let us know how it goes.

Regards,
Srivatsa S. Bhat
Comment 17 Mauro Santos 2014-06-10 12:44:19 UTC
I have done as asked and added the ccflags-y to the top of drivers/cpufreq/Makefile and enabled the specified config options.

I have separated the logs in 3 files: boot log, governor change (ondemand->performance) log and cpu offline/online log.

When using the performance governor, cpu offline/online works fine, but if afterterwards I try to change the governor back to ondemand the machine hangs, it doesn't matter for which cpu I try to change the governor first.

I'm also attaching the config I used, just in case anyone wants to check which config options were used.
Comment 18 Mauro Santos 2014-06-10 12:44:46 UTC
Created attachment 138871 [details]
Boot log
Comment 19 Mauro Santos 2014-06-10 12:45:27 UTC
Created attachment 138881 [details]
Governor change (ondemand -> performance)
Comment 20 Mauro Santos 2014-06-10 12:45:47 UTC
Created attachment 138891 [details]
CPU offline/online
Comment 21 Mauro Santos 2014-06-10 12:46:15 UTC
Created attachment 138901 [details]
config file
Comment 22 Viresh Kumar 2014-06-10 14:06:01 UTC
Hi,

Have you missed the most important log? Offline/online with
Hang..

In your case both CPUs share clock line and so it doesn't matter
which cpu you play with.

Sent from my iPad

> On 10-Jun-2014, at 6:14 pm, bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=77201
> 
> --- Comment #17 from registosites@hotmail.com ---
> I have done as asked and added the ccflags-y to the top of
> drivers/cpufreq/Makefile and enabled the specified config options.
> 
> I have separated the logs in 3 files: boot log, governor change
> (ondemand->performance) log and cpu offline/online log.
> 
> When using the performance governor, cpu offline/online works fine, but if
> afterterwards I try to change the governor back to ondemand the machine
> hangs,
> it doesn't matter for which cpu I try to change the governor first.
> 
> I'm also attaching the config I used, just in case anyone wants to check
> which
> config options were used.
> 
> -- 
> You are receiving this mail because:
> You are on the CC list for the bug.
Comment 23 Mauro Santos 2014-06-10 14:42:22 UTC
> Hi,
> 
> Have you missed the most important log? Offline/online with
> Hang..
> 
> In your case both CPUs share clock line and so it doesn't matter
> which cpu you play with.

I don't know how to get the log from when the machine hangs, I've tried:

echo 1 > /sys/devices/system/cpu/cpu1/online ; dmesg > hang.log

When trying to execute that the machine hangs and I have to do a hard reset. After booting and fsck I don't have any traces of hang.log in the filesystem.

If there is any other way to get the logs I'll need guidance or pointers on how to do it. In any case  I'm attaching the log of cpu offline with the ondemand cpu governor and powernow-k8 loaded.
Comment 24 Mauro Santos 2014-06-10 14:43:04 UTC
Created attachment 138911 [details]
CPU offline, powenow-k8 loaded, ondemand cpu gov.
Comment 25 Viresh Kumar 2014-06-10 14:47:06 UTC
Probably just do

cat /proc/kmsg &

offline then online cpu

and simply copy logs from minicom ?
Comment 26 Srivatsa S. Bhat 2014-06-10 15:35:45 UTC
Another (although crude) method:
On most setups, you can go to the console by pressing Ctrl+Alt+F3 or any of the function keys. Go to the console this way, login and run the cpu offline/online command. The kernel logs will get printed directly in front of you on the console. Then take a picture and attach it with the bugzilla :-)
Of course, choose whatever method is convenient for you.

Regards,
Srivatsa S. Bhat
Comment 27 Mauro Santos 2014-06-10 16:47:55 UTC
I've tried using the crude log method, with
cat /proc/kmsg &
and then blindly typing the commands to offline/online the cpu.

The result is attached as a picture.
Comment 28 Mauro Santos 2014-06-10 16:48:26 UTC
Created attachment 138931 [details]
CPU online hang log
Comment 29 Rafael J. Wysocki 2014-06-10 19:04:01 UTC
Did 3.13 work for you correctly?
Comment 30 Mauro Santos 2014-06-10 19:53:34 UTC
I have just tested 3.13.8* and I can confirm that cpu offline/online works without problems.

* This is the latest 3.13 I have quick and easy access to (it is the last 3.13 provided by my distro).
Comment 31 Viresh Kumar 2014-06-11 10:28:54 UTC
Hi,

Thanks for your efforts !!

We couldn't get the complete picture as there are just too many prints wanting to fit on screen :)

Can you try this to get the right set of prints for us, that way we might be able to focus in the right direction:

- Remove all pr_debug() lines from powernow-k8 driver
- And apply attached patch for cpufreq core, only reducing number of print messages. (Rebased over 3.15)

And then give us another screenshot..

Also some queries about your current logs:
- I hope this was the last message you saw on screen and it just became unresponsive?

freq_table: target index is 0, freq is:2200000 kHz

and the expected ones after this are:

powernow_k8: targ: cpu 0, 2200000 kHz, min 800000, max 2200000
powernow_k8: targ: curr fid 0x8, vid 0x15
powernow_k8: cpu 0 transition to index 0
powernow_k8: table matched fid 0xe, giving vid 0x12
powernow_k8: cpu 0, changing to fid 0xe, vid 0x12

So, it looks like the CPU did came back and something happened while changing
freq to max.

Somehow this problem is related to something special being done in your driver. We don't see this problem otherwise for other platforms.

One thing i could figure out is scheduling a *work* for changing frequencies but I am not sure if the problem is related to that..

I tried to have a look at what changed between 3.13.8 and 3.14, and couldn't figure out anything special that might end up in this issue :(


If you couldn't get anything conclusive with above tests then there might be some chances that it *isn't* related to cpufreq and some other changes in kernel are responsible. The best we can try is: get only cpufreq back to the old state, i.e. 3.13.8, by reverting commits and try again..

few reverts were required for this and to simplify your work I have created a branch with all reverts required.

git://git.linaro.org/people/viresh.kumar/mylinux.git powernow-k8-debugging

NOTE: We can still see some diff this way:
git diff v3.13.8..powernow-k8-debugging -- drivers/cpufreq/ -- include/linux/cpufreq.h

But there shouldn't be any *functional* change for cpufreq.c/governors/freq-table.c/powernow-k8 drivers and all cpufreq files do compile.. (haven't tried compiling powernow-k8)..

Let me know if this solves your problem or make it worst :)
Comment 32 Viresh Kumar 2014-06-11 10:29:57 UTC
Created attachment 139091 [details]
light-weight debug prints for cpufreq core: for debugging this problem
Comment 33 Srivatsa S. Bhat 2014-06-11 11:01:37 UTC
(In reply to Viresh Kumar from comment #31)
 
> Also some queries about your current logs:
> - I hope this was the last message you saw on screen and it just became
> unresponsive?
> 
> freq_table: target index is 0, freq is:2200000 kHz
> 
> and the expected ones after this are:
> 
> powernow_k8: targ: cpu 0, 2200000 kHz, min 800000, max 2200000
> powernow_k8: targ: curr fid 0x8, vid 0x15
> powernow_k8: cpu 0 transition to index 0
> powernow_k8: table matched fid 0xe, giving vid 0x12
> powernow_k8: cpu 0, changing to fid 0xe, vid 0x12
> 
> So, it looks like the CPU did came back and something happened while changing
> freq to max.
> 
> Somehow this problem is related to something special being done in your
> driver. We don't see this problem otherwise for other platforms.
>

Right.. I spent all night yesterday trying to figure out what could be the bug, but I didn't find any leads :-(
 
> One thing i could figure out is scheduling a *work* for changing frequencies
> but I am not sure if the problem is related to that..
>

Yeah, that looked odd to me too!
 
> I tried to have a look at what changed between 3.13.8 and 3.14, and couldn't
> figure out anything special that might end up in this issue :(
> 

Not sure if any workqueue changes between those versions have any effect on the CPU hotplug path for powernow-k8 cpufreq driver.

> 
> If you couldn't get anything conclusive with above tests then there might be
> some chances that it *isn't* related to cpufreq and some other changes in
> kernel are responsible. The best we can try is: get only cpufreq back to the
> old state, i.e. 3.13.8, by reverting commits and try again..
> 

I agree. I was about to suggest something similar.

> few reverts were required for this and to simplify your work I have created
> a branch with all reverts required.
> 
> git://git.linaro.org/people/viresh.kumar/mylinux.git powernow-k8-debugging
>

Since Viresh has already set this up, you can try this. If that doesn't help, I'll give you a list of possible suspect commits in cpufreq between the 2 versions to try out. The ones at the top of my list would be: 


12478cf0c55e (cpufreq: Make sure frequency transitions are serialized)

e0b3165ba521 (cpufreq: add 'freq_table' in struct cpufreq_policy)

4e97b631f24 (cpufreq: Initialize governor for a new policy under policy->rwsem)

1c0ca90207 (cpufreq: don't call cpufreq_update_policy() on CPU addition)

6f1e4efd882 (cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled)

Regards,
Srivatsa S. Bhat
Comment 34 Viresh Kumar 2014-06-11 11:04:52 UTC
(In reply to Srivatsa S. Bhat from comment #33)

> Since Viresh has already set this up, you can try this. If that doesn't
> help, I'll give you a list of possible suspect commits in cpufreq between

What else can be there once I have almost got it back to 3.13.8 ?

> the 2 versions to try out. The ones at the top of my list would be: 

> 12478cf0c55e (cpufreq: Make sure frequency transitions are serialized)
> 
> e0b3165ba521 (cpufreq: add 'freq_table' in struct cpufreq_policy)

Remember, he is getting the problem on 3.14 itself and so no fun playing with patches that went into 3.15, right? And if I am not wrong these went recently only.
Comment 35 Srivatsa S. Bhat 2014-06-11 11:12:28 UTC
(In reply to Viresh Kumar from comment #34)
> (In reply to Srivatsa S. Bhat from comment #33)
> 
> > Since Viresh has already set this up, you can try this. If that doesn't
> > help, I'll give you a list of possible suspect commits in cpufreq between
> 
> What else can be there once I have almost got it back to 3.13.8 ?
> 

No, you misunderstood my intention. Of course your branch is a much
stricter one with all new cpufreq commits reverted.

I was just trying to point out commits which came after 3.13.8, in case even your branch (3.13.8 along with cpufreq reverts) ends up passing his tests. In that case we'll have to narrow down *within* the commits between v3.13.8 and v3.14.1, no? My list was to help with that.

> > the 2 versions to try out. The ones at the top of my list would be: 
> 
> > 12478cf0c55e (cpufreq: Make sure frequency transitions are serialized)
> > 
> > e0b3165ba521 (cpufreq: add 'freq_table' in struct cpufreq_policy)
> 
> Remember, he is getting the problem on 3.14 itself and so no fun playing
> with patches that went into 3.15, right? And if I am not wrong these went
> recently only.

Right, sorry, I was doing a diff between 3.13.8 and 3.15-rc8 (based on his first report) by mistake. Ignore these 2 commits, they are recent. But the other ones I listed came in between v3.13.8 and v3.14, so they might be worth trying.

Regards,
Srivatsa S. Bhat
Comment 36 Viresh Kumar 2014-06-11 11:18:20 UTC
(In reply to Srivatsa S. Bhat from comment #35)
> No, you misunderstood my intention. Of course your branch is a much
> stricter one with all new cpufreq commits reverted.
> 
> I was just trying to point out commits which came after 3.13.8, in case even
> your branch (3.13.8 along with cpufreq reverts) ends up passing his tests.
> In that case we'll have to narrow down *within* the commits between v3.13.8
> and v3.14.1, no? My list was to help with that.

Ahh, sorry..

But in that case, wouldn't a git bisect between v3.13.8..my-branch would be easier?
Comment 37 Srivatsa S. Bhat 2014-06-11 11:37:53 UTC
(In reply to Viresh Kumar from comment #36)
> (In reply to Srivatsa S. Bhat from comment #35)
> > No, you misunderstood my intention. Of course your branch is a much
> > stricter one with all new cpufreq commits reverted.
> > 
> > I was just trying to point out commits which came after 3.13.8, in case
> even
> > your branch (3.13.8 along with cpufreq reverts) ends up passing his tests.
> > In that case we'll have to narrow down *within* the commits between v3.13.8
> > and v3.14.1, no? My list was to help with that.
> 
> Ahh, sorry..
> 
> But in that case, wouldn't a git bisect between v3.13.8..my-branch would be
> easier?

Yes, a git bisect would have been ideal, but apparently it is failing due to build errors in intermediate commits (see his first report and bisect log) :-(
So we might have to manually pick suspect commits to test...

Regards,
Srivatsa S. Bhat
Comment 38 Viresh Kumar 2014-06-11 12:02:29 UTC
(In reply to Srivatsa S. Bhat from comment #37)
> (In reply to Viresh Kumar from comment #36)

> > But in that case, wouldn't a git bisect between v3.13.8..my-branch would be

s/3.13.8/3.14

> > easier?
> 
> Yes, a git bisect would have been ideal, but apparently it is failing due to
> build errors in intermediate commits (see his first report and bisect log)
> :-(
> So we might have to manually pick suspect commits to test...

Does the above change make it any better ?

We only have few revert patches (GOOD) on top of 3.14 (BAD) and can easily make out which one is making 3.14 good :)
Comment 39 Srivatsa S. Bhat 2014-06-11 12:06:00 UTC
(In reply to Viresh Kumar from comment #38)
> (In reply to Srivatsa S. Bhat from comment #37)
> > (In reply to Viresh Kumar from comment #36)
> 
> > > But in that case, wouldn't a git bisect between v3.13.8..my-branch would
> be
> 
> s/3.13.8/3.14
> 
> > > easier?
> > 
> > Yes, a git bisect would have been ideal, but apparently it is failing due
> to
> > build errors in intermediate commits (see his first report and bisect log)
> > :-(
> > So we might have to manually pick suspect commits to test...
> 
> Does the above change make it any better ?
> 
> We only have few revert patches (GOOD) on top of 3.14 (BAD) and can easily
> make out which one is making 3.14 good :)

I hope so :-)

Regards,
Srivatsa S. Bhat
Comment 40 Mauro Santos 2014-06-11 13:50:33 UTC
> Hi,
> 
> Thanks for your efforts !!
> 
> We couldn't get the complete picture as there are just too many prints
> wanting to fit on screen :)
> 
> Can you try this to get the right set of prints for us, that way we might be
> able to focus in the right direction:
> 
> - Remove all pr_debug() lines from powernow-k8 driver
> - And apply attached patch for cpufreq core, only reducing number of print
> messages. (Rebased over 3.15)

I have deleted all pr_debug() lines and recompiled with all the patches, modifications and config options from previous posts.

> 
> And then give us another screenshot..

The new screenshot is attached. After bringing the cpu online I got back to a prompt but the machine hangs.

> 
> Also some queries about your current logs:
> - I hope this was the last message you saw on screen and it just became
> unresponsive?
> 
> freq_table: target index is 0, freq is:2200000 kHz
> 

That was the last output before the machine became unresponsive.

> and the expected ones after this are:
> 
> powernow_k8: targ: cpu 0, 2200000 kHz, min 800000, max 2200000
> powernow_k8: targ: curr fid 0x8, vid 0x15
> powernow_k8: cpu 0 transition to index 0
> powernow_k8: table matched fid 0xe, giving vid 0x12
> powernow_k8: cpu 0, changing to fid 0xe, vid 0x12
> 
> So, it looks like the CPU did came back and something happened while changing
> freq to max.
> 
> Somehow this problem is related to something special being done in your
> driver. We don't see this problem otherwise for other platforms.
> 
> One thing i could figure out is scheduling a *work* for changing frequencies
> but I am not sure if the problem is related to that..
> 
> I tried to have a look at what changed between 3.13.8 and 3.14, and couldn't
> figure out anything special that might end up in this issue :(
> 
> 
> If you couldn't get anything conclusive with above tests then there might be
> some chances that it *isn't* related to cpufreq and some other changes in
> kernel are responsible. The best we can try is: get only cpufreq back to the
> old state, i.e. 3.13.8, by reverting commits and try again..
> 
> few reverts were required for this and to simplify your work I have created
> a branch with all reverts required.
> 
> git://git.linaro.org/people/viresh.kumar/mylinux.git powernow-k8-debugging
> 
> NOTE: We can still see some diff this way:
> git diff v3.13.8..powernow-k8-debugging -- drivers/cpufreq/ --
> include/linux/cpufreq.h
> 
> But there shouldn't be any *functional* change for
> cpufreq.c/governors/freq-table.c/powernow-k8 drivers and all cpufreq files
> do compile.. (haven't tried compiling powernow-k8)..
> 
> Let me know if this solves your problem or make it worst :)

I didn't manage to compile this one, it fails with:

  CC      drivers/cpuidle/sysfs.o
  CC      lib/dynamic_debug.o
  CC [M]  drivers/cpufreq/acpi-cpufreq.o
  CC      drivers/cpuidle/governors/ladder.o
drivers/cpufreq/acpi-cpufreq.c: In function ‘store_boost’:
drivers/cpufreq/acpi-cpufreq.c:160:26: error: ‘struct cpufreq_driver’ has no member named ‘boost_supported’
  if (!acpi_cpufreq_driver.boost_supported)
                          ^
drivers/cpufreq/acpi-cpufreq.c: In function ‘show_cpb’:
drivers/cpufreq/acpi-cpufreq.c:180:49: error: ‘struct cpufreq_driver’ has no member named ‘boost_enabled’
  return sprintf(buf, "%u\n", acpi_cpufreq_driver.boost_enabled);
                                                 ^
drivers/cpufreq/acpi-cpufreq.c: In function ‘boost_notify’:
drivers/cpufreq/acpi-cpufreq.c:540:37: error: ‘struct cpufreq_driver’ has no member named ‘boost_enabled’
   boost_set_msrs(acpi_cpufreq_driver.boost_enabled, cpumask);
                                     ^
drivers/cpufreq/acpi-cpufreq.c: At top level:
drivers/cpufreq/acpi-cpufreq.c:897:2: error: unknown field ‘set_boost’ specified in initializer
  .set_boost      = _store_boost,
  ^
drivers/cpufreq/acpi-cpufreq.c:897:2: warning: excess elements in struct initializer
drivers/cpufreq/acpi-cpufreq.c:897:2: warning: (near initialization for ‘acpi_cpufreq_driver’)
drivers/cpufreq/acpi-cpufreq.c: In function ‘acpi_cpufreq_boost_init’:
drivers/cpufreq/acpi-cpufreq.c:908:22: error: ‘struct cpufreq_driver’ has no member named ‘boost_supported’
   acpi_cpufreq_driver.boost_supported = true;
                      ^
drivers/cpufreq/acpi-cpufreq.c:909:22: error: ‘struct cpufreq_driver’ has no member named ‘boost_enabled’
   acpi_cpufreq_driver.boost_enabled = boost_state(0);
                      ^
drivers/cpufreq/acpi-cpufreq.c:913:37: error: ‘struct cpufreq_driver’ has no member named ‘boost_enabled’
   boost_set_msrs(acpi_cpufreq_driver.boost_enabled,
                                     ^
drivers/cpufreq/acpi-cpufreq.c: In function ‘show_cpb’:
drivers/cpufreq/acpi-cpufreq.c:181:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
scripts/Makefile.build:314: recipe for target 'drivers/cpufreq/acpi-cpufreq.o' failed
make[2]: *** [drivers/cpufreq/acpi-cpufreq.o] Error 1
scripts/Makefile.build:455: recipe for target 'drivers/cpufreq' failed
make[1]: *** [drivers/cpufreq] Error 2
make[1]: *** Waiting for unfinished jobs....
  CC      drivers/cpuidle/governors/menu.o
  CC      lib/nlattr.o
  CC      net/core/rtnetlink.o
  LD      drivers/cpuidle/governors/built-in.o
  LD      drivers/cpuidle/built-in.o
Makefile:841: recipe for target 'drivers' failed
make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....
  CC      net/core/utils.o
  CC      lib/average.o
  CC      lib/cpu_rmap.o
  CC      lib/dynamic_queue_limits.o
Comment 41 Mauro Santos 2014-06-11 13:51:37 UTC
Created attachment 139111 [details]
Cleaner hang log
Comment 42 Viresh Kumar 2014-06-11 13:55:45 UTC
(In reply to registosites from comment #40)
> That was the last output before the machine became unresponsive.

Hmm, so it looks like something happened while changing
frequency and core is doing things correctly otherwise.

> I didn't manage to compile this one, it fails with:
> 
>   CC      drivers/cpuidle/sysfs.o
>   CC      lib/dynamic_debug.o
>   CC [M]  drivers/cpufreq/acpi-cpufreq.o
>   CC      drivers/cpuidle/governors/ladder.o
> drivers/cpufreq/acpi-cpufreq.c: In function ‘store_boost’:
> drivers/cpufreq/acpi-cpufreq.c:160:26: error: ‘struct cpufreq_driver’ has no

I intentionally didn't revert unrelated drivers completely and so this. But can't you just disable acpi-cpufreq driver and test? I thought yes, and so did this. Otherwise I would update my branch for you.
Comment 43 Mauro Santos 2014-06-11 14:10:41 UTC
Ahh ok, I'll disable acpi-cpufreq and give it another try.
Comment 44 Mauro Santos 2014-06-11 14:21:48 UTC
I think acpi-cpufreq is needed, unless I'm not seeing this correctly (which is quite likely). Using nconfig and asking for SymInfo on AMD Opteron/Athlon64 PowerNow! (which is required to test the problem at hand) I get:

CONFIG_X86_POWERNOW_K8:
(...)
Depends on: CPU_FREQ [=y] && X86 [=y] && ACPI [=y] && ACPI_PROCESSOR [=m] && X86_ACPI_CPUFREQ [=m]

Doesn't this mean acpi-cpufreq is needed?
Comment 45 Viresh Kumar 2014-06-11 14:42:02 UTC
(In reply to registosites from comment #44)
> Depends on: CPU_FREQ [=y] && X86 [=y] && ACPI [=y] && ACPI_PROCESSOR [=m] &&
> X86_ACPI_CPUFREQ [=m]

I don't have a clue why does it depend on that :(

Anyway, I have updated my branch with two more commits (at top) and it does compile now.

In case these revert fixes it for you and you try to identify which revert fixed it, you might also need to apply top revert to get things compiled.
Comment 46 Mauro Santos 2014-06-11 15:35:43 UTC
No luck with these also, machine still hangs after bringing the cpu online :(
Comment 47 Viresh Kumar 2014-06-11 15:59:03 UTC
(In reply to registosites from comment #46)
> No luck with these also, machine still hangs after bringing the cpu online :(

I expected this as as I couldn't find anything significant in cpufreq side.

@Srivatsa: Please have a look at diff between v3.13.8..my-branch -- drivers/cpufreq to see if I have missed something that could break stuff here.

Now what? I don't think something can be wrong with workqueues and it might be some ACPI problem. Not very sure though.

Can you try bisecting between 3.13 and 3.14 again? And try fixing/skipping commits with compile failures ?? Its very difficult to narrow it down otherwise.

@Rafael: any guesses?
Comment 48 Srivatsa S. Bhat 2014-06-11 17:50:57 UTC
Hmm, this just became even more hard :-(

Viresh, sure, I'll check the diff tomorrow to see if there is any obvious breakage.

Meanwhile, can the bug reporter please test these commits that I mentioned earlier? (6f1e4efd882 is my first suspect commit after 3.13.8)

6f1e4efd882 (cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled)

1c0ca90207 (cpufreq: don't call cpufreq_update_policy() on CPU addition)

4e97b631f24 (cpufreq: Initialize governor for a new policy under policy->rwsem)

I know that the issue might not be in cpufreq (especially after your test results from Viresh's branch). But I still I have some tiny hope that we might have missed something in cpufreq... Besides, bisecting within the cpufreq commits is a lot faster than bisecting everything between 3.13.8 and 3.14, in case the bug is really in cpufreq.

By the way, testing the above commits means that you need to test 1 commit before the commits mentioned above and see if it works, and then go to the mentioned commit and see if it fails.

Regards,
Srivatsa S. Bhat
Comment 49 Srivatsa S. Bhat 2014-06-11 18:40:26 UTC
By the way, another approach I have in mind is to remove the usage of work_on_cpu() from the powernow-k8 driver (since that's the only noticeably different thing about this driver), and check if it solves the problem. That way, powernow-k8 would become mostly similar to all other cpufreq drivers and this might make it easier to isolate the issue.  (Of course work_on_cpu() must be remove with care, such that the correctness of the code is still maintained).

Regards,
Srivatsa S. Bhat
Comment 50 Mauro Santos 2014-06-11 19:35:10 UTC
(In reply to Srivatsa S. Bhat from comment #48)
> Meanwhile, can the bug reporter please test these commits that I mentioned
> earlier? (6f1e4efd882 is my first suspect commit after 3.13.8)
> 
> 6f1e4efd882 (cpufreq: Fix timer/workqueue corruption by protecting reading
> governor_enabled)
> 
> 1c0ca90207 (cpufreq: don't call cpufreq_update_policy() on CPU addition)
> 
> 4e97b631f24 (cpufreq: Initialize governor for a new policy under
> policy->rwsem)

I'll test the commits before these and these commits and post back when I have the results.

Should I also test

e0b3165ba521 (cpufreq: add 'freq_table' in struct cpufreq_policy)
12478cf0c55e (cpufreq: Make sure frequency transitions are serialized)

in case the previous ones work/don't work?
Comment 51 Rafael J. Wysocki 2014-06-11 19:42:28 UTC
(In reply to Srivatsa S. Bhat from comment #49)
> By the way, another approach I have in mind is to remove the usage of
> work_on_cpu() from the powernow-k8 driver (since that's the only noticeably
> different thing about this driver), and check if it solves the problem.

That is a good idea.  At least we could narrow down the issue this way.
Comment 52 Rafael J. Wysocki 2014-06-11 19:43:00 UTC
(In reply to registosites from comment #50)
> (In reply to Srivatsa S. Bhat from comment #48)
> 
> Should I also test
> 
> e0b3165ba521 (cpufreq: add 'freq_table' in struct cpufreq_policy)
> 12478cf0c55e (cpufreq: Make sure frequency transitions are serialized)
> 
> in case the previous ones work/don't work?

Yes, please.
Comment 53 Viresh Kumar 2014-06-12 02:36:51 UTC
(In reply to Srivatsa S. Bhat from comment #48)
> Meanwhile, can the bug reporter please test these commits that I mentioned
> earlier? (6f1e4efd882 is my first suspect commit after 3.13.8)
> 
> 6f1e4efd882 (cpufreq: Fix timer/workqueue corruption by protecting reading
> governor_enabled)

It was already reverted in my branch

> 1c0ca90207 (cpufreq: don't call cpufreq_update_policy() on CPU addition)

Went in 3.15

> 4e97b631f24 (cpufreq: Initialize governor for a new policy under
> policy->rwsem)

Also reverted in my branch

> I know that the issue might not be in cpufreq (especially after your test
> results from Viresh's branch). But I still I have some tiny hope that we
> might have missed something in cpufreq... Besides, bisecting within the
> cpufreq commits is a lot faster than bisecting everything between 3.13.8 and
> 3.14, in case the bug is really in cpufreq.
> 
> By the way, testing the above commits means that you need to test 1 commit
> before the commits mentioned above and see if it works, and then go to the
> mentioned commit and see if it fails.

Actually I couldn't understand how this may help? Sorry for missing what you had in mind :(
Comment 54 Viresh Kumar 2014-06-12 02:48:11 UTC
(In reply to Srivatsa S. Bhat from comment #49)
> By the way, another approach I have in mind is to remove the usage of
> work_on_cpu() from the powernow-k8 driver (since that's the only noticeably
> different thing about this driver), and check if it solves the problem.

I wouldn't bet my 10 cents on that :). Also I am not sure if people added a work there just for fun, there must be some constraints and we might see crashes or hangs..

But yeah, maybe we can try that as well..

I was thinking about this earlier but never shared as I thought it might not be worth.

Logs shared by $reporter of this problem clears that cpufreq and governors did all good, more or less. And it was the target_index() which never came back.. I will try having a closer look again though. $reporter also reported that offline/online works fine with performance gov but switching back to ondemand again crashes it. Which makes me believe that cpufreq core is doing everything correctly and there is some problem while switching freq after offline/online.

@Reporter (sorry we don't know your name :( ), Can you please try this?

Keep the prints as you shared in last hang pic. Set governor to userspace, 
set any cpu to min freq, offline/online cpu1, change freq to any freq (and later max freq if earlier one works) and see how things behave.. I am quite sure it wouldn't last.

If you still see a hang you might want to do this:
- Add many debug prints in your target_index() path (as gov is userspace you wouldn't see them again and again).
- And see where exactly you hang .. 
- I am quite sure work will get scheduled and so try adding as many debug prints as possible on the first go only.

HTH :)
Comment 55 Viresh Kumar 2014-06-12 02:49:41 UTC
(In reply to Rafael J. Wysocki from comment #52)
> (In reply to registosites from comment #50)
> > (In reply to Srivatsa S. Bhat from comment #48)
> > 
> > Should I also test
> > 
> > e0b3165ba521 (cpufreq: add 'freq_table' in struct cpufreq_policy)
> > 12478cf0c55e (cpufreq: Make sure frequency transitions are serialized)
> > 
> > in case the previous ones work/don't work?
> 
> Yes, please.

Actually no. These must have went into 3.15 and we are trying to debug 3.14 here. Did I miss anything ?
Comment 56 Mauro Santos 2014-06-12 09:48:10 UTC
(In reply to Viresh Kumar from comment #55)
> (In reply to Rafael J. Wysocki from comment #52)
> > (In reply to registosites from comment #50)
> > > (In reply to Srivatsa S. Bhat from comment #48)
> > > 
> > > Should I also test
> > > 
> > > e0b3165ba521 (cpufreq: add 'freq_table' in struct cpufreq_policy)
> > > 12478cf0c55e (cpufreq: Make sure frequency transitions are serialized)
> > > 
> > > in case the previous ones work/don't work?
> > 
> > Yes, please.
> 
> Actually no. These must have went into 3.15 and we are trying to debug 3.14
> here. Did I miss anything ?

I have finished testing the following:

6f1e4efd882 (cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled)
87ae97f10c0 (cpufreq: s3c24xx: Staticize local variable)
result: both ok


1c0ca90207 (cpufreq: don't call cpufreq_update_policy() on CPU addition)
d9a789c7a0 (cpufreq: Refactor cpufreq_set_policy())
result: both bad


4e97b631f24 (cpufreq: Initialize governor for a new policy under policy->rwsem)
5a7e56a5d29 (cpufreq: Initialize policy before making it available for others to use)
result: both bad

I have also tested setting the governor to userspace, setting the cpu to the minimum speed, cpu offline/online, set an intermediate cpu speed and then maximum speed.

The machine becomes unresponsive only when trying to set the cpu to the maximum speed.

If I understand correctly, the suggestion to try next is to keep the patch from 
comment 32, keep the changes I've done to powernow-k8 (remove all pr_debug()), keep the extra compile flags in drivers/cpufreq/Makefile and also add plenty of pr_debug() to drivers/cpufreq/cpufreq.c.

I assume it is here I should add more debug output since target_index() can be found only here. If I'm correct, in which places would it make more sense to add debug output and which info should I try to output?

I'm ok with doing this as long as it doesn't get too complicated, the reason is my programming skills are not that good and I have zero knowledge of the kernel's internals.
Comment 57 Viresh Kumar 2014-06-12 10:18:47 UTC
(In reply to Mauro from comment #56)

Hi Mauro, we finally know your name :)

> I have finished testing the following:
> 
> 6f1e4efd882 (cpufreq: Fix timer/workqueue corruption by protecting reading
> governor_enabled)
> 87ae97f10c0 (cpufreq: s3c24xx: Staticize local variable)
> result: both ok

So you tested your stuff on these two commits (i.e. no reverts hacks)?

> 1c0ca90207 (cpufreq: don't call cpufreq_update_policy() on CPU addition)
> d9a789c7a0 (cpufreq: Refactor cpufreq_set_policy())
> result: both bad
> 
> 
> 4e97b631f24 (cpufreq: Initialize governor for a new policy under
> policy->rwsem)
> 5a7e56a5d29 (cpufreq: Initialize policy before making it available for
> others to use)
> result: both bad

I don't know what we can make out of these tests :(..

@Srivatsa: ?

> I have also tested setting the governor to userspace, setting the cpu to the
> minimum speed, cpu offline/online, set an intermediate cpu speed and then
> maximum speed.
> 
> The machine becomes unresponsive only when trying to set the cpu to the
> maximum speed.

Great. I am more convinced now that it is not a cpufreq problem. It can still be something about your driver though.

> If I understand correctly, the suggestion to try next is to keep the patch
> from 
> comment 32, keep the changes I've done to powernow-k8 (remove all
> pr_debug()), keep the extra compile flags in drivers/cpufreq/Makefile and
> also add plenty of pr_debug() to drivers/cpufreq/cpufreq.c.

Correct!! But wait, continue reading ...

> I assume it is here I should add more debug output since target_index() can
> be found only here.

I couldn't understand that, sorry :(

> If I'm correct, in which places would it make more sense
> to add debug output and which info should I try to output?

I wouldn't be interested in any data here in prints. Just want to know what's the last thing executed before hang..

> I'm ok with doing this as long as it doesn't get too complicated, the reason
> is my programming skills are not that good and I have zero knowledge of the
> kernel's internals.

It doesn't look like, you have managed very well :)

Okay, lets simplify things a bit.. I have updated my branch again with all changes you need.. Don't add any other change and just test it.

I have added this at many locations:

+       pr_info("%s: %d\n", __func__, __LINE__);

Will print function name and line number in file.

You *will* get lots of prints now for every frequency transition.. To control these, enable only *userspace* governor from menuconfig and leave all other governors disabled. That way there wouldn't be any transition happening..

Now, repeat your earlier test with userspace governor and see which line/routine is the last one... 

You will come to know broadly about where you have reached.. Now to go to the exact statement, remove all prints I have added in powernow-k8 and add them only in your hotpath, and add more of them to the last set of functions executed.

Lets see if we can get to the last instruction which crashed :)

Probably it will be the one changing freq/voltage..
Comment 58 Mauro Santos 2014-06-12 13:22:06 UTC
> > 6f1e4efd882 (cpufreq: Fix timer/workqueue corruption by protecting reading
> > governor_enabled)
> > 87ae97f10c0 (cpufreq: s3c24xx: Staticize local variable)
> > result: both ok
> 
> So you tested your stuff on these two commits (i.e. no reverts hacks)?


Yes, I have tested with no revert hacks, and as suggested I have tested the commit preceding the suspect commit and the suspect commit.


> Okay, lets simplify things a bit.. I have updated my branch again with all
> changes you need.. Don't add any other change and just test it.
> 
> I have added this at many locations:
> 
> +       pr_info("%s: %d\n", __func__, __LINE__);
> 
> Will print function name and line number in file.
> 
> You *will* get lots of prints now for every frequency transition.. To
> control these, enable only *userspace* governor from menuconfig and leave
> all other governors disabled. That way there wouldn't be any transition
> happening..
> 
> Now, repeat your earlier test with userspace governor and see which
> line/routine is the last one... 
> 
> You will come to know broadly about where you have reached.. Now to go to
> the exact statement, remove all prints I have added in powernow-k8 and add
> them only in your hotpath, and add more of them to the last set of functions
> executed.
> 
> Lets see if we can get to the last instruction which crashed :)
> 
> Probably it will be the one changing freq/voltage..

Not much luck with this approach I'm afraid. I have compiled your updated branch with no reverts and changed the config to include only the userspace governor.

Then proceeded to test as before:
cat /proc/kmsg &
echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
echo 1800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
echo 2200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

When trying to change the frequency to 2.2GHz the machine becomes unresponsive and there is no output :(
I have noticed something that may or may not be important, it doesn't matter if I do
800M - cpu offline/online - 1.8G - 2.2G
or
800M - cpu offline/online - 2.2G - 1.8G

The machine always becomes unresponsive when trying to do the second frequency change after cpu offline/online. I have pictures of the process, I can attach them if you think they are important, but like I said when the machine hangs there is zero output :(
Comment 59 Viresh Kumar 2014-06-12 14:20:52 UTC
(In reply to Mauro from comment #58)
> Not much luck with this approach I'm afraid. I have compiled your updated
> branch with no reverts and changed the config to include only the userspace
> governor.
> 
> Then proceeded to test as before:
> cat /proc/kmsg &
> echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 1 > /sys/devices/system/cpu/cpu1/online
> echo 1800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> echo 2200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
> 
> When trying to change the frequency to 2.2GHz the machine becomes
> unresponsive and there is no output :(

Were you getting a lot of prints during earlier frequency transitions?
Its awkward as we were getting enough prints earlier, what happened now ?

> I have noticed something that may or may not be important, it doesn't matter
> if I do
> 800M - cpu offline/online - 1.8G - 2.2G
> or
> 800M - cpu offline/online - 2.2G - 1.8G
> 
> The machine always becomes unresponsive when trying to do the second
> frequency change after cpu offline/online.

This surely is a important observation.

> I have pictures of the process, I
> can attach them if you think they are important, but like I said when the
> machine hangs there is zero output :(

Still I want to see how bad it is now, just attach them :)
Comment 60 Mauro Santos 2014-06-12 14:30:05 UTC
(In reply to Viresh Kumar from comment #59)
> > When trying to change the frequency to 2.2GHz the machine becomes
> > unresponsive and there is no output :(
> 
> Were you getting a lot of prints during earlier frequency transitions?
> Its awkward as we were getting enough prints earlier, what happened now ?

Yes, lots of prints, too many to fit the screen, I was referring to the exact point/step when the machine hangs.


> Still I want to see how bad it is now, just attach them :)

Will do.
Comment 61 Mauro Santos 2014-06-12 14:31:28 UTC
Created attachment 139461 [details]
CPU offline/online - OK

Frequency transition to 800MHz and cpu offline/online worked fine.
Comment 62 Mauro Santos 2014-06-12 14:34:33 UTC
Created attachment 139471 [details]
Frequency transition hang

This one is what happens when I try to do the second frequency transition, that is, after:

boot -> 800M -> cpu offline/online -> 2.2G ->(1) 1.8G

(1) machine hangs here.
Comment 63 Viresh Kumar 2014-06-12 15:14:55 UTC
(In reply to Mauro from comment #62)
> Created attachment 139471 [details]
> Frequency transition hang
> 
> This one is what happens when I try to do the second frequency transition,
> that is, after:
> 
> boot -> 800M -> cpu offline/online -> 2.2G ->(1) 1.8G
> 
> (1) machine hangs here.

Strange that we didn't got any prints at all :(

Let me try some more hacks :)
Comment 64 Viresh Kumar 2014-06-12 15:29:42 UTC
(In reply to Viresh Kumar from comment #63)
> Let me try some more hacks :)

Okay, I can understand that its taking more time than it should have but its probably more tricky that we actually thought..

Don't loose hope, we can still crack this :)

I have added one more commit now. Idea is to stop ongoing frequency transition at any point of time we want.

By default everything will work as it was working. So, continue upto the last step, where you crashed, 1.8G ..

Before changing last freq do this:

echo 1 > /sys/kernel/debug/stop_transition


Follow this in code:

+       /* HACK THIS */
+       if (stop_transition)
+               return 0;

So, we will return from whichever part of powernowk8_target_fn() we  want.
I have placed it at some point now, you can drag it in whichever direction you want.

If it works, then drag it later into the code.
Comment 65 Mauro Santos 2014-06-12 16:26:22 UTC
(In reply to Viresh Kumar from comment #64)
> (In reply to Viresh Kumar from comment #63)
> > Let me try some more hacks :)
> 
> Okay, I can understand that its taking more time than it should have but its
> probably more tricky that we actually thought..
> 
> Don't loose hope, we can still crack this :)

No worries, the bug will be nuked from orbit sooner or later :D

> 
> I have added one more commit now. Idea is to stop ongoing frequency
> transition at any point of time we want.
> 
> By default everything will work as it was working. So, continue upto the
> last step, where you crashed, 1.8G ..
> 
> Before changing last freq do this:
> 
> echo 1 > /sys/kernel/debug/stop_transition
> 
> 
> Follow this in code:
> 
> +       /* HACK THIS */
> +       if (stop_transition)
> +               return 0;
> 
> So, we will return from whichever part of powernowk8_target_fn() we  want.
> I have placed it at some point now, you can drag it in whichever direction
> you want.
> 
> If it works, then drag it later into the code.

I'll move that code around and see if I can find where it crashes. Should I move it around only inside the function where you put it now (transition_frequency_fidvid()) or also try other functions?

On that note, since I'm going to be changing only one file and it is built as a module, is it ok if I do only:
make drivers/cpufreq/powernow-k8.ko?

Also, what is the best and fastest way, if possible, to compile only what changed?
Comment 66 Srivatsa S. Bhat 2014-06-12 16:48:35 UTC
Hi Mauro,

You can continue debugging with Viresh suggestions, since that's one of the ways to debug this issue. When time permits, can you please try this commit:

commit 5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb
Author: Lai Jiangshan <laijs@cn.fujitsu.com>
Date:   Sat Feb 15 22:02:28 2014 +0800

    workqueue: ensure @task is valid across kthread_stop()

Of course, try one commit before it and see if it works, and then try the above mentioned commit and see if it fails.

The reason I chose this commit is because there are only 2 commits in the workqueue code between 3.13.8 and 3.14.1, and the other commit is not relevant to your case because it is active only if CONFIG_DEBUG_OBJECTS_WORK is set (and this option is not set in your .config).

Thank you!

Regards,
Srivatsa S. Bhat
Comment 67 Rafael J. Wysocki 2014-06-12 19:27:47 UTC
For completeness, the commit immediately preceding the one chosen by Srivatsa is

fada94ee64e6 workqueue: add args to workqueue lockdep name

so Mauro, please check if fada94ee64e6 works and then if 5bdfff96c69a fails.
Comment 68 Mauro Santos 2014-06-12 21:28:09 UTC
(In reply to Rafael J. Wysocki from comment #67)
> For completeness, the commit immediately preceding the one chosen by
> Srivatsa is
> 
> fada94ee64e6 workqueue: add args to workqueue lockdep name
> 
> so Mauro, please check if fada94ee64e6 works and then if 5bdfff96c69a fails.

I have tested both, and both hang. I have used the latest test method of using the userspace governor and both hang on the second frequency change attempt.

What I'm trying to do now is get the full text output (I've seen something scroll by that might be interesting).

I've made a serial cable and testing if it is working. Most text is ok but there are a few glitches here and there, which is expected I guess. If there are any "optimal" settings do tell :)

I have started what Viresh requested and the first try worked, but I haven't started moving the debug code to try to find where things go wrong.

I'd say my initial bisect might be very close to the source of the problem, the reason I say this is because the machine goes into suspend just fine, but when trying to resume it hangs (that was my initial testing procedure), knowing what I know now I'd say a good guess is it hangs when trying to bring the cpu online.

As a last note, I won't have much time to look into this until the weekend, so radio silence until then doesn't mean I quit ;)
Comment 69 Rafael J. Wysocki 2014-06-12 23:41:39 UTC
(In reply to Mauro from comment #68)
> (In reply to Rafael J. Wysocki from comment #67)
> > For completeness, the commit immediately preceding the one chosen by
> > Srivatsa is
> > 
> > fada94ee64e6 workqueue: add args to workqueue lockdep name
> > 
> > so Mauro, please check if fada94ee64e6 works and then if 5bdfff96c69a
> fails.
> 
> I have tested both, and both hang. I have used the latest test method of
> using the userspace governor and both hang on the second frequency change
> attempt.

Well, the scheduler changes are left then.

> I'd say my initial bisect might be very close to the source of the problem,
> the reason I say this is because the machine goes into suspend just fine,
> but when trying to resume it hangs (that was my initial testing procedure),
> knowing what I know now I'd say a good guess is it hangs when trying to
> bring the cpu online.

Oh, we know that for sure. :-)

That's why I asked whether or not CPU online worked via sysfs.
Comment 70 Viresh Kumar 2014-06-13 04:14:17 UTC
(In reply to Rafael J. Wysocki from comment #69)
> Well, the scheduler changes are left then.

Maybe some ACPI stuff as well ?
Comment 71 Viresh Kumar 2014-06-13 04:20:31 UTC
(In reply to Mauro from comment #68)
> I have started what Viresh requested and the first try worked, but I haven't
> started moving the debug code to try to find where things go wrong.

Great, and we reach here only after work is scheduled.

So. Srivatsa lost his bet :) and its not scheduler as well..

I have updated my tree again to get to the real target, transition_fid_vid(). This is calling few routines to change freq/volt and we can check them one by one. I am quite hopeful this will work as well and next few may break .

Give us a good news this time :)

> I'd say my initial bisect might be very close to the source of the problem,

You meant the new hack ?

> the reason I say this is because the machine goes into suspend just fine,
> but when trying to resume it hangs (that was my initial testing procedure),
> knowing what I know now I'd say a good guess is it hangs when trying to
> bring the cpu online.

All thanks to Rafael for his guess. It was perfect!!
Comment 72 Viresh Kumar 2014-06-13 05:18:51 UTC
(In reply to Viresh Kumar from comment #45)
> I don't have a clue why does it depend on that :(

Now, I know why. This can very much be some ACPI issue considering so much dependency it has on it and I am not good at ACPI :(

631e8ac powernow-k8: Add a kconfig dependency on acpi-cpufreq
Comment 73 Srivatsa S. Bhat 2014-06-13 10:49:52 UTC
(In reply to Viresh Kumar from comment #71)
> (In reply to Mauro from comment #68)
> > I have started what Viresh requested and the first try worked, but I
> haven't
> > started moving the debug code to try to find where things go wrong.
> 
> Great, and we reach here only after work is scheduled.
> 
> So. Srivatsa lost his bet :) and its not scheduler as well..
> 

Haha :-) Good to know we have made some more progress :-)

Regards,
Srivatsa S. Bhat
Comment 74 Rafael J. Wysocki 2014-06-13 21:23:01 UTC
(In reply to Viresh Kumar from comment #72)
> (In reply to Viresh Kumar from comment #45)
> > I don't have a clue why does it depend on that :(
> 
> Now, I know why. This can very much be some ACPI issue considering so much
> dependency it has on it and I am not good at ACPI :(
> 
> 631e8ac powernow-k8: Add a kconfig dependency on acpi-cpufreq

It looks like the relevant routine doesn't do anything ACPI-related
at all.  It accesses MSRs directly  it seems.
Comment 75 Mauro Santos 2014-06-14 20:05:26 UTC
(In reply to Viresh Kumar from comment #71)

> > I'd say my initial bisect might be very close to the source of the problem,
> 
> You meant the new hack ?

What I meant was, when I was doing the bisect (the one I attached with the initial bug report) I was testing if the machine suspended and resumed well. At the time I had no idea of what the problem was, but now I see that it was hanging when trying to bring the cpu online (during resume).

I've tested your latest changes and now the machine hangs, I'll be attaching the logs from this and last test.
Comment 76 Mauro Santos 2014-06-14 20:08:06 UTC
Created attachment 139721 [details]
From Viresh's tree, commit 0f92fd8

This one does not hang when using stop_transition.
Comment 77 Mauro Santos 2014-06-14 20:09:41 UTC
Created attachment 139731 [details]
From Viresh's tree, commit 3610fd8

This one hangs even when using stop_transition.
Comment 78 Mauro Santos 2014-06-14 20:11:57 UTC
Created attachment 139741 [details]
Log from commit 5bdfff96c69a

This one hangs, might not have anything interesting but there seems to be some "fireworks" going on.
Comment 79 Mauro Santos 2014-06-14 20:13:07 UTC
Created attachment 139751 [details]
Log from commit fada94ee64e6

Same as the one from commit 5bdfff96c69a.
Comment 80 Mauro Santos 2014-06-14 20:20:25 UTC
Created attachment 139761 [details]
Hang test script

I've made a small script to automate the test and reduce the chance of me making some error.

I'm attaching this as a reference in case later I see some debug output on screen that does not make it across the serial cable to the text log. Obviously my echos with information do not get across the cable but help in checking where things hang.

I forgot to mention before, the test with commit 3610fd8 from Viresh's tree hangs at "Changing frequency to 1.8GHz" I don't see more output, the log matches what I see on screen.
Comment 81 Viresh Kumar 2014-06-15 05:00:49 UTC
(In reply to Mauro from comment #77)
> Created attachment 139731 [details]
> From Viresh's tree, commit 3610fd8
> 
> This one hangs even when using stop_transition.

Oh, this went against my expectation. We crashed before changing freq :(

So this is what we have verified:
- transition_frequency_fidvid() gets called and after this only we do something that crashes system.
- And that isn't a call to transition_fid_vid() as we are crashing before that only.

So, you need to move around:

	/* HACK THIS */
	if (stop_transition)
		return 0;


in transition_frequency_fidvid() and find which is the last line before which it is working. If that's a function call, go into that function and check again.

Give us the last instruction that causes it.. 

As an sample of what you need to do, check my latest branch. But you need to do some more changes manually too.

Lets crack it soon..
Comment 82 Mauro Santos 2014-06-15 10:30:53 UTC
I have managed to narrow this down to a function, specifically:
cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);

Now this is from cpufreq and not powenow-k8, I haven't tried doing anything inside cpufreq.c because I don't know if it would work just (moving the if-return) or if that would be a mistake.

I'm attaching logs of stop_transition before and after cpufreq_notify_transition() as they seem to be slightly different.

When the machine hangs I only see my message about changing freq to 1.8GHz, there is no more output to screen before the machine hangs.
Comment 83 Mauro Santos 2014-06-15 10:31:36 UTC
Created attachment 139791 [details]
stop_transition before cpufreq_notify_transition()

All is good, machine doesn't hang.
Comment 84 Mauro Santos 2014-06-15 10:32:56 UTC
Created attachment 139801 [details]
stop_transition after cpufreq_notify_transition()

Machine hangs, no debug output when trying to change freq.
Comment 85 Rafael J. Wysocki 2014-06-15 23:13:05 UTC
It looks like one of the notifiers crashes in srcu_notifier_call_chain() called from __cpufreq_notify_transition().

We need to figure out which one.
Comment 86 Viresh Kumar 2014-06-16 05:43:01 UTC
(In reply to Mauro from comment #82)
> I have managed to narrow this down to a function, specifically:
> cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);

I guessed this yesterday, but thought this might not be the problem as there aren't too many drivers notified for these transitions. And those rarely get modified.

As Rafael also mentioned, the problem is obviously with some of the recipients of these notifiers failing for the POSTCHANGE notification.

In case you don't know how this works, this is how:

Some kernel drivers can register for notifiers, so that before and after every frequency tranistions notifiers would be called and those drivers can act accordingly. One of them is broken here.

List of all files which are registering for these notifiers.

arch/arm/kernel/smp.c
arch/arm/kernel/smp_twd.c
arch/arm/mach-pxa/viper.c
arch/arm/mach-s3c24xx/mach-osiris-dvs.c
arch/blackfin/mach-common/dpmc.c
arch/cris/arch-v32/kernel/time.c
arch/powerpc/oprofile/op_model_cell.c
arch/sparc/kernel/time_64.c
arch/x86/kernel/tsc.c
arch/x86/kvm/x86.c
drivers/acpi/processor_perflib.c
drivers/acpi/processor_thermal.c
drivers/cpufreq/cpufreq.c
drivers/cpufreq/cpufreq_governor.c
drivers/cpufreq/cpufreq_stats.c
drivers/cpufreq/cris-artpec3-cpufreq.c
drivers/cpufreq/cris-etraxfs-cpufreq.c
drivers/cpufreq/loongson2_cpufreq.c
drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
drivers/gpu/drm/tilcdc/tilcdc_drv.c
drivers/i2c/busses/i2c-davinci.c
drivers/i2c/busses/i2c-s3c2410.c
drivers/macintosh/windfarm_cpufreq_clamp.c
drivers/mmc/host/davinci_mmc.c
drivers/mmc/host/s3cmci.c
drivers/mtd/nand/s3c2410.c
drivers/pcmcia/soc_common.c
drivers/staging/tidspbridge/rmgr/drv_interface.c
drivers/thermal/cpu_cooling.c
drivers/tty/serial/samsung.c
drivers/tty/serial/sh-sci.c
drivers/video/da8xx-fb.c
drivers/video/nuc900fb.c
drivers/video/pxafb.c
drivers/video/s3c2410fb.c
drivers/video/sa1100fb.c
drivers/watchdog/s3c2410_wdt.c
include/linux/cpufreq.h


Those that look relevant to me for your configuration:

arch/x86/kernel/tsc.c
arch/x86/kvm/x86.c
drivers/acpi/processor_perflib.c
drivers/acpi/processor_thermal.c

Only first two had significant changes and there are some big changes in 3.14 around these. I am not good at that code and we need to get few more people into this discussions. Though Rafael/Srivatsa might have seen these files..

But before that we must narrow down on this.. Believe me, we are very very close :)

I have created two more branches: powernow-k8-post1 and powernow-k8-post2 (Compile tested only for x86_64_defconfig) ..

All debug messages/stop_transition/etc are dropped as we don't need them anymore and so drop them from your script as well..

powernow-k8-post1 has reverted all changes between 3.13.8 and 3.14 for arch/x86/kernel/tsc.c file and powernow-k8-post2 have done it for arch/x86/kvm/x86.c..

Only one of these two will work for you. Check which one, I would put my 10 cents on *-post1 branch :)

Once, any one of the branches work for you, try doing a git bisect for that branch and 3.14 (Only few revert patches actually, should be easy).. 

And you can tell me which revert patch has *fixed* the issue for you and that's the culprit.

Then we can follow on LKML, will put you in CC, on the original commit threads..

Thanks for your dedication towards fixing this up, it was really a nightmare to get down to this :)

--
viresh
Comment 87 Mauro Santos 2014-06-16 09:36:11 UTC
(In reply to Viresh Kumar from comment #86)
> Only one of these two will work for you. Check which one, I would put my 10
> cents on *-post1 branch :)
 
You should have bet more, *-post2 hangs, *-post1 works.

> Once, any one of the branches work for you, try doing a git bisect for that
> branch and 3.14 (Only few revert patches actually, should be easy).. 
> 
> And you can tell me which revert patch has *fixed* the issue for you and
> that's the culprit.

I had to skip the last step of the bisect as it fails to build, I'm attaching the bisect log.

It fails with:
arch/x86/kernel/cpu/perf_event.c: In function ‘arch_perf_update_userpage’:
arch/x86/kernel/cpu/perf_event.c:1903:23: error: ‘CYC2NS_SCALE_FACTOR’ undeclared (first use in this function)
  userpg->time_shift = CYC2NS_SCALE_FACTOR;
                       ^
arch/x86/kernel/cpu/perf_event.c:1903:23: note: each undeclared identifier is reported only once for each function it appears in
scripts/Makefile.build:308: recipe for target 'arch/x86/kernel/cpu/perf_event.o' failed
make[3]: *** [arch/x86/kernel/cpu/perf_event.o] Error 1
make[3]: *** Waiting for unfinished jobs....
  CC      kernel/printk/printk.o
scripts/Makefile.build:455: recipe for target 'arch/x86/kernel/cpu' failed
make[2]: *** [arch/x86/kernel/cpu] Error 2
scripts/Makefile.build:455: recipe for target 'arch/x86/kernel' failed
make[1]: *** [arch/x86/kernel] Error 2
Makefile:841: recipe for target 'arch/x86' failed
make: *** [arch/x86] Error 2

> 
> Then we can follow on LKML, will put you in CC, on the original commit
> threads..
> 
> Thanks for your dedication towards fixing this up, it was really a nightmare
> to get down to this :)

I'm just scratching my itch ;) I also have to thank you, and everyone else that looked into this, it looks like this is not affecting many people and yet you took the time and effort to track this down.
Comment 88 Mauro Santos 2014-06-16 09:36:39 UTC
Created attachment 139901 [details]
powernow-k8-post1 bisect log
Comment 89 Viresh Kumar 2014-06-16 09:51:07 UTC
(In reply to Mauro from comment #87)
> (In reply to Viresh Kumar from comment #86)
> > Only one of these two will work for you. Check which one, I would put my 10
> > cents on *-post1 branch :)
>  
> You should have bet more, *-post2 hangs, *-post1 works.

Thanks for your great results :)

Last round of debugging I want you to do :)

One of these commits is the culprit:

57c67da sched/clock, x86: Move some cyc2ns() code around
20d1c86 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs

Can you test these commits and the ones just before them? And we will know which commit broke this for you..

As both are written by Peter Zijlstra, we can ping him back and lets see what he says.
Comment 90 Mauro Santos 2014-06-16 12:11:20 UTC
I've tried (from your tree, but I suppose that doesn't matter):
20d1c86 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
57c67da sched/clock, x86: Move some cyc2ns() code around
5dd12c21 sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock()
62b94a0 sched/preempt: Take away preempt_enable_no_resched() from modules

and all of them fail to compile. Looking at the bisect log I attached when submitting the bug report, I see that all of them are the same I had to skip because they would not compile.

They all fail with:

CC      arch/x86/kernel/asm-offsets.s
In file included from include/linux/spinlock.h:271:0,
                 from include/linux/mmzone.h:7,
                 from include/linux/gfp.h:4,
                 from include/linux/slab.h:14,
                 from include/linux/crypto.h:24,
                 from arch/x86/kernel/asm-offsets.c:8:
include/linux/spinlock_api_smp.h: In function ‘__raw_spin_lock_bh’:
include/linux/spinlock_api_smp.h:134:2: error: implicit declaration of function ‘__local_bh_disable_ip’ [-Werror=implicit-function-declaration]
  __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
  ^
include/linux/spinlock_api_smp.h:134:34: error: ‘SOFTIRQ_LOCK_OFFSET’ undeclared (first use in this function)
  __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                                  ^
include/linux/spinlock_api_smp.h:134:34: note: each undeclared identifier is reported only once for each function it appears in
include/linux/spinlock_api_smp.h: In function ‘__raw_spin_unlock_bh’:
include/linux/spinlock_api_smp.h:176:2: error: implicit declaration of function ‘__local_bh_enable_ip’ [-Werror=implicit-function-declaration]
  __local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
  ^
include/linux/spinlock_api_smp.h:176:33: error: ‘SOFTIRQ_LOCK_OFFSET’ undeclared (first use in this function)
  __local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                                 ^
include/linux/spinlock_api_smp.h: In function ‘__raw_spin_trylock_bh’:
include/linux/spinlock_api_smp.h:181:34: error: ‘SOFTIRQ_LOCK_OFFSET’ undeclared (first use in this function)
  __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                                  ^
In file included from include/linux/spinlock_api_smp.h:190:0,
                 from include/linux/spinlock.h:271,
                 from include/linux/mmzone.h:7,
                 from include/linux/gfp.h:4,
                 from include/linux/slab.h:14,
                 from include/linux/crypto.h:24,
                 from arch/x86/kernel/asm-offsets.c:8:
include/linux/rwlock_api_smp.h: In function ‘__raw_read_lock_bh’:
include/linux/rwlock_api_smp.h:175:34: error: ‘SOFTIRQ_LOCK_OFFSET’ undeclared (first use in this function)
  __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                                  ^
include/linux/rwlock_api_smp.h: In function ‘__raw_write_lock_bh’:
include/linux/rwlock_api_smp.h:202:34: error: ‘SOFTIRQ_LOCK_OFFSET’ undeclared (first use in this function)
  __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                                  ^
include/linux/rwlock_api_smp.h: In function ‘__raw_read_unlock_bh’:
include/linux/rwlock_api_smp.h:251:33: error: ‘SOFTIRQ_LOCK_OFFSET’ undeclared (first use in this function)
  __local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                                 ^
include/linux/rwlock_api_smp.h: In function ‘__raw_write_unlock_bh’:
include/linux/rwlock_api_smp.h:275:33: error: ‘SOFTIRQ_LOCK_OFFSET’ undeclared (first use in this function)
  __local_bh_enable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                                 ^
cc1: some warnings being treated as errors
/home/msantos/build/temp/cpufreq_revert/src/linux/./Kbuild:81: recipe for target 'arch/x86/kernel/asm-offsets.s' failed
make[1]: *** [arch/x86/kernel/asm-offsets.s] Error 1
Makefile:859: recipe for target 'prepare0' failed
make: *** [prepare0] Error 2
Comment 91 Srivatsa S. Bhat 2014-06-16 12:47:40 UTC
(In reply to Mauro from comment #87)
> (In reply to Viresh Kumar from comment #86)
> > Only one of these two will work for you. Check which one, I would put my 10
> > cents on *-post1 branch :)
>  
> You should have bet more, *-post2 hangs, *-post1 works.
> 

Very impressive debugging, both of you! :-)

Regards,
Srivatsa S. Bhat
Comment 92 Srivatsa S. Bhat 2014-06-16 12:51:21 UTC
(In reply to Mauro from comment #90)
> I've tried (from your tree, but I suppose that doesn't matter):
> 20d1c86 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
> 57c67da sched/clock, x86: Move some cyc2ns() code around
> 5dd12c21 sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock()
> 62b94a0 sched/preempt: Take away preempt_enable_no_resched() from modules
> 
> and all of them fail to compile. Looking at the bisect log I attached when
> submitting the bug report, I see that all of them are the same I had to skip
> because they would not compile.
> 

Can we use the same "1 commit before, 1 commit after" method here?
That is, for each commit that you want to test, try one commit before it and see if it passes, and one commit after it and see if it fails. That should give us a good indication of whether the commit in question is problematic or not. Of course you'll need to use the mainline tree for this, which doesn't include any of the reverts.

Regards,
Srivatsa S. Bhat
Comment 93 Mauro Santos 2014-06-16 13:15:24 UTC
(In reply to Srivatsa S. Bhat from comment #92)

> Can we use the same "1 commit before, 1 commit after" method here?
> That is, for each commit that you want to test, try one commit before it and
> see if it passes, and one commit after it and see if it fails. That should
> give us a good indication of whether the commit in question is problematic
> or not. Of course you'll need to use the mainline tree for this, which
> doesn't include any of the reverts.

Well, the bisect log I attached when I opened the bug report was done with mainline, the conclusion was:

# only skipped commits left to test
# possible first bad commit: [0bd3a173d711857fc9f583eb5825386cc08f3948] sched/preempt, locking: Rework local_bh_{dis,en}able()
# possible first bad commit: [10b033d434c25a6c9e0f4f4dc2418af1b8236c63] sched/clock, x86: Avoid a runtime condition in native_sched_clock()
# possible first bad commit: [6577e42a3e1633afe762f47da9e00061ee4b9a5e] sched/clock: Fix up clear_sched_clock_stable()
# possible first bad commit: [35af99e646c7f7ea46dc2977601e9e71a51dadd5] sched/clock, x86: Use a static_key for sched_clock_stable
# possible first bad commit: [ef08f0fff87630d4f67ceb09514d8b444df833f8] sched/clock: Remove local_irq_disable() from the clocks
# possible first bad commit: [20d1c86a57762f0a33a78988e3fc8818316badd4] sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
# possible first bad commit: [57c67da274f3fab38e08d2c9edf08b89e1d9c71d] sched/clock, x86: Move some cyc2ns() code around
# possible first bad commit: [5dd12c2152743747ca9f50ef80281e54cc416dc0] sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock()
# possible first bad commit: [62b94a08da1bae9d187d49dfcd6665af393750f8] sched/preempt: Take away preempt_enable_no_resched() from modules
# possible first bad commit: [9ea4c380066fbe23fe0da7f4abfabc444f2467f4] locking: Optimize lock_bh functions

Unless I'm seeing this wrong, the commits Viresh wanted me to try are included there, as well as several commits before and after, the result I'm seeing now is the same as the one I saw before, those commits fail to compile.
Comment 94 Srivatsa S. Bhat 2014-06-16 13:27:31 UTC
(In reply to Mauro from comment #93)
> (In reply to Srivatsa S. Bhat from comment #92)
> 
> > Can we use the same "1 commit before, 1 commit after" method here?
> > That is, for each commit that you want to test, try one commit before it
> and
> > see if it passes, and one commit after it and see if it fails. That should
> > give us a good indication of whether the commit in question is problematic
> > or not. Of course you'll need to use the mainline tree for this, which
> > doesn't include any of the reverts.
> 
> Well, the bisect log I attached when I opened the bug report was done with
> mainline, the conclusion was:
> 
> # only skipped commits left to test
> # possible first bad commit: [0bd3a173d711857fc9f583eb5825386cc08f3948]
> sched/preempt, locking: Rework local_bh_{dis,en}able()
> # possible first bad commit: [10b033d434c25a6c9e0f4f4dc2418af1b8236c63]
> sched/clock, x86: Avoid a runtime condition in native_sched_clock()
> # possible first bad commit: [6577e42a3e1633afe762f47da9e00061ee4b9a5e]
> sched/clock: Fix up clear_sched_clock_stable()
> # possible first bad commit: [35af99e646c7f7ea46dc2977601e9e71a51dadd5]
> sched/clock, x86: Use a static_key for sched_clock_stable
> # possible first bad commit: [ef08f0fff87630d4f67ceb09514d8b444df833f8]
> sched/clock: Remove local_irq_disable() from the clocks
> # possible first bad commit: [20d1c86a57762f0a33a78988e3fc8818316badd4]
> sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
> # possible first bad commit: [57c67da274f3fab38e08d2c9edf08b89e1d9c71d]
> sched/clock, x86: Move some cyc2ns() code around
> # possible first bad commit: [5dd12c2152743747ca9f50ef80281e54cc416dc0]
> sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock()
> # possible first bad commit: [62b94a08da1bae9d187d49dfcd6665af393750f8]
> sched/preempt: Take away preempt_enable_no_resched() from modules
> # possible first bad commit: [9ea4c380066fbe23fe0da7f4abfabc444f2467f4]
> locking: Optimize lock_bh functions
> 
> Unless I'm seeing this wrong, the commits Viresh wanted me to try are
> included there, as well as several commits before and after, the result I'm
> seeing now is the same as the one I saw before, those commits fail to
> compile.

Hmm, right. Ok, then I guess we'll just have to try and fix the build errors in these commits and try your tests on them. Viresh might already be working on doing that, but if not, I'll try to fix them for you.

Regards,
Srivatsa S. Bhat
Comment 95 Viresh Kumar 2014-06-16 15:31:17 UTC
(In reply to Srivatsa S. Bhat from comment #94)
> Hmm, right. Ok, then I guess we'll just have to try and fix the build errors
> in these commits and try your tests on them. Viresh might already be working
> on doing that, but if not, I'll try to fix them for you.

I haven't done anything yet and don't know how to get these fixed.. I have gone through the series which played with this stuff and looks there were way too many issues reported by multiple people.

https://lkml.org/lkml/2014/1/21/512

Anyway Mauro has already verified that reverting these two commits fixes our issues over 3.14..

> 20d1c86 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
> 57c67da sched/clock, x86: Move some cyc2ns() code around

So, can we get peter in the discussion here now ? We can tell him that things aren't bisectable and so don't know which of these has broken things..

--
viresh
Comment 96 Srivatsa S. Bhat 2014-06-16 15:41:06 UTC
(In reply to Viresh Kumar from comment #95)
> (In reply to Srivatsa S. Bhat from comment #94)
> > Hmm, right. Ok, then I guess we'll just have to try and fix the build
> errors
> > in these commits and try your tests on them. Viresh might already be
> working
> > on doing that, but if not, I'll try to fix them for you.
> 
> I haven't done anything yet and don't know how to get these fixed.. I have
> gone through the series which played with this stuff and looks there were
> way too many issues reported by multiple people.
> 
> https://lkml.org/lkml/2014/1/21/512
> 
> Anyway Mauro has already verified that reverting these two commits fixes our
> issues over 3.14..
> 
> > 20d1c86 sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable
> IRQs
> > 57c67da sched/clock, x86: Move some cyc2ns() code around
> 
> So, can we get peter in the discussion here now ? We can tell him that
> things aren't bisectable and so don't know which of these has broken things..
> 

Sure, that sounds better. No point trying to debug this further when he can probably tell us precisely what is going wrong, and fix it immediately.

Regards,
Srivatsa S. Bhat
Comment 97 Viresh Kumar 2014-06-17 06:23:11 UTC
Hi Mauro,

I think we can do one more/final test to make sure we have hit the right file, before Peter shouts at us:)

Though I am pretty sure that our analysis points at what I have mentioned in my mail:

https://lkml.org/lkml/2014/6/16/477

Can you please modify: arch/x86/kernel/tsc.c to simply return 0 from beginning of this function: time_cpufreq_notifier() ? This should fix your issues, if our analysis was right.

This function is the one getting called from our POSTCHANGE notifications ..

These changes must be done over 3.14 without any other modifications..
Comment 98 Mauro Santos 2014-06-17 07:31:05 UTC
(In reply to Viresh Kumar from comment #97)
> Can you please modify: arch/x86/kernel/tsc.c to simply return 0 from
> beginning of this function: time_cpufreq_notifier() ? This should fix your
> issues, if our analysis was right.
> 
> This function is the one getting called from our POSTCHANGE notifications ..
> 
> These changes must be done over 3.14 without any other modifications..

We are in the clear, we're not getting shouted at :)

With no patches or other changes (besides adding return 0;) and using the same .config and test script I've been using so far it seems to work fine, the machine does not hang.
Comment 99 Viresh Kumar 2014-06-17 08:11:47 UTC
(In reply to Mauro from comment #98)
> We are in the clear, we're not getting shouted at :)
> 
> With no patches or other changes (besides adding return 0;) and using the
> same .config and test script I've been using so far it seems to work fine,
> the machine does not hang.

Ahh, nice. Thanks for your tests.
Comment 100 Rafael J. Wysocki 2014-06-17 11:40:12 UTC
I suspect that what fails is the

set_cyc2ns_scale(tsc_khz, freq->cpu);

at the end of time_cpufreq_notifier(), so I wonder if commenting this single line out (instead of making the whole function return 0 unconditionally) make the
problem go away too?
Comment 101 Mauro Santos 2014-06-17 12:24:43 UTC
(In reply to Rafael J. Wysocki from comment #100)
> I suspect that what fails is the
> 
> set_cyc2ns_scale(tsc_khz, freq->cpu);
> 
> at the end of time_cpufreq_notifier(), so I wonder if commenting this single
> line out (instead of making the whole function return 0 unconditionally)
> make the
> problem go away too?

That also seems to work, the machine stays responsive.
Comment 102 Rafael J. Wysocki 2014-06-17 19:41:55 UTC
So my bet is that the while () loop in cyc2ns_write_begin() becomes infinite in the CPU online case on this machine.
Comment 103 Viresh Kumar 2014-06-18 07:21:56 UTC
(In reply to Rafael J. Wysocki from comment #102)
> So my bet is that the while () loop in cyc2ns_write_begin() becomes infinite
> in the CPU online case on this machine.

I don't understand this code well and so wouldn't bet this time :)
But in that case, why isn't the other CPU responding? Only one should be busy in the loop, right ?
Comment 104 Rafael J. Wysocki 2014-06-18 11:04:18 UTC
Because that is a CPU online and the other CPU is waiting for this one to complete.

That's because the powernow-k8 driver does the work_on_cpu() in powernowk8_target() and the CPU that called it blocks on the flush_work() in there while the other one is running the loop in cyc2ns_write_begin() in the hope to be kicked out of it.  At least that's my theory. :-)

So this looks like a quite interesting deadlock to me.
Comment 105 Viresh Kumar 2014-06-18 11:24:02 UTC
(In reply to Rafael J. Wysocki from comment #104)
> Because that is a CPU online and the other CPU is waiting for this one to
> complete.
> 
> That's because the powernow-k8 driver does the work_on_cpu() in
> powernowk8_target() and the CPU that called it blocks on the flush_work() in
> there while the other one is running the loop in cyc2ns_write_begin() in the
> hope to be kicked out of it.  At least that's my theory. :-)

Hmm, but why are you so sure that work_on_cpu() isn't executed on CPU0 ? And we are guaranteed that the work will be called on CPU0. It should atleast sometimes get called from CPU0. Maybe Mauro can confirm this after another test but I don't want to waste his time for proving our theories :)

> So this looks like a quite interesting deadlock to me.

Indeed. And it was a nightmare to solve really. It would have been much easier if one of us could have reproduced it locally. As debugging it remotely is really tough :) .. You don't want to play around code too much as somebody else is working on your behalf :)
Comment 106 Mauro Santos 2014-06-18 12:25:30 UTC
If you want me to do a few more tests to help kill this bug I'm ok with that.

The laptop didn't die of heatstroke so far, so a few more tests should be ok :)
Comment 107 Viresh Kumar 2014-06-18 12:27:26 UTC
(In reply to Mauro from comment #106)
> If you want me to do a few more tests to help kill this bug I'm ok with that.
> 
> The laptop didn't die of heatstroke so far, so a few more tests should be ok
> :)

Thanks, but I dont' have any tests in my mind which might help debugging here. Lets wait for PeterZ.
Comment 108 Peter Zijlstra 2014-06-23 11:47:02 UTC
On Wed, Jun 18, 2014 at 11:04:18AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=77201
> 
> --- Comment #104 from Rafael J. Wysocki <rjw@rjwysocki.net> ---
> Because that is a CPU online and the other CPU is waiting for this one to
> complete.
> 
> That's because the powernow-k8 driver does the work_on_cpu() in
> powernowk8_target() and the CPU that called it blocks on the flush_work() in
> there while the other one is running the loop in cyc2ns_write_begin() in the
> hope to be kicked out of it.  At least that's my theory. :-)
> 
> So this looks like a quite interesting deadlock to me.

So two things; it looks like we only need to call set_cyc2ns_scale()
(in time_cpufreq_notifier) inside the condition that actually changes
tsc_khz, not unconditionally.

Since if the tsc_khz value doesn't change, there's nothing to update.

This avoids it getting called twice for every change (PRECHANGE and
POSTCHANGE) and exhaust the struct cyc2ns data slots (2).


--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -918,9 +918,9 @@ static int time_cpufreq_notifier(struct
 		tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
 		if (!(freq->flags & CPUFREQ_CONST_LOOPS))
 			mark_tsc_unstable("cpufreq changes");
-	}
 
-	set_cyc2ns_scale(tsc_khz, freq->cpu);
+		set_cyc2ns_scale(tsc_khz, freq->cpu);
+	}
 
 	return 0;
 }


Secondly, it looks like (at least for the k8 case) we always call
time_cpufreq_notifier(.cpu == smp_processor_id()), so if the above
doesn't cure things, could you try this:


--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -127,8 +127,6 @@ static struct cyc2ns_data *cyc2ns_write_
 	if (data == c2n->head)
 		data++;
 
-	/* XXX send an IPI to @cpu in order to guarantee a read? */
-
 	/*
 	 * When we observe the tail write from cyc2ns_read_end(),
 	 * the cpu must be done with that entry and its safe
@@ -151,6 +149,15 @@ static void cyc2ns_write_end(int cpu, st
 	smp_wmb();
 
 	ACCESS_ONCE(c2n->head) = data;
+
+	if (cpu == smp_processor_id()) {
+		/*
+		 * If we're on the current CPU, force consume the data
+		 * so that we free up the data tail.
+		 */
+		struct cyc2ns_data *data = cyc2ns_read_begin();
+		cyc2ns_read_end(data);
+	}
 }
 
 /*
Comment 109 Mauro Santos 2014-06-23 17:16:47 UTC
(In reply to Peter Zijlstra from comment #108)
> On Wed, Jun 18, 2014 at 11:04:18AM +0000,
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=77201
> > 
> > --- Comment #104 from Rafael J. Wysocki <rjw@rjwysocki.net> ---
> > Because that is a CPU online and the other CPU is waiting for this one to
> > complete.
> > 
> > That's because the powernow-k8 driver does the work_on_cpu() in
> > powernowk8_target() and the CPU that called it blocks on the flush_work()
> in
> > there while the other one is running the loop in cyc2ns_write_begin() in
> the
> > hope to be kicked out of it.  At least that's my theory. :-)
> > 
> > So this looks like a quite interesting deadlock to me.
> 
> So two things; it looks like we only need to call set_cyc2ns_scale()
> (in time_cpufreq_notifier) inside the condition that actually changes
> tsc_khz, not unconditionally.
> 
> Since if the tsc_khz value doesn't change, there's nothing to update.
> 
> This avoids it getting called twice for every change (PRECHANGE and
> POSTCHANGE) and exhaust the struct cyc2ns data slots (2).
> 
> 
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -918,9 +918,9 @@ static int time_cpufreq_notifier(struct
>               tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
>               if (!(freq->flags & CPUFREQ_CONST_LOOPS))
>                       mark_tsc_unstable("cpufreq changes");
> -     }
>  
> -     set_cyc2ns_scale(tsc_khz, freq->cpu);
> +             set_cyc2ns_scale(tsc_khz, freq->cpu);
> +     }
>  
>       return 0;
>  }
> 

I have tested this patch with the userspace governor (same procedure of previous tests) and with the ondemand governor with kernels 3.14 and 3.16-rc2 and it seems to solve the problem. I couldn't get the machine to hang.

I have tried cpu offline/online, frequency changes, unloading/loading powernow-k8, suspend/hibernate.

> 
> Secondly, it looks like (at least for the k8 case) we always call
> time_cpufreq_notifier(.cpu == smp_processor_id()), so if the above
> doesn't cure things, could you try this:
> 
> 
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -127,8 +127,6 @@ static struct cyc2ns_data *cyc2ns_write_
>       if (data == c2n->head)
>               data++;
>  
> -     /* XXX send an IPI to @cpu in order to guarantee a read? */
> -
>       /*
>        * When we observe the tail write from cyc2ns_read_end(),
>        * the cpu must be done with that entry and its safe
> @@ -151,6 +149,15 @@ static void cyc2ns_write_end(int cpu, st
>       smp_wmb();
>  
>       ACCESS_ONCE(c2n->head) = data;
> +
> +     if (cpu == smp_processor_id()) {
> +             /*
> +              * If we're on the current CPU, force consume the data
> +              * so that we free up the data tail.
> +              */
> +             struct cyc2ns_data *data = cyc2ns_read_begin();
> +             cyc2ns_read_end(data);
> +     }
>  }
>  
>  /*

I have also tried this patch in addition to the previous one (both patches applied) and it doesn't seem to break anything.

Let me know if you want me to perform any other tests or if you need me to attach any logs.
Comment 110 Peter Zijlstra 2014-06-23 18:54:11 UTC
On Mon, Jun 23, 2014 at 05:16:47PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=77201
> 
> --- Comment #109 from Mauro <registosites@hotmail.com> ---
> (In reply to Peter Zijlstra from comment #108)
> > On Wed, Jun 18, 2014 at 11:04:18AM +0000,
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=77201
> > > 
> > > --- Comment #104 from Rafael J. Wysocki <rjw@rjwysocki.net> ---
> > > Because that is a CPU online and the other CPU is waiting for this one to
> > > complete.
> > > 
> > > That's because the powernow-k8 driver does the work_on_cpu() in
> > > powernowk8_target() and the CPU that called it blocks on the flush_work()
> in
> > > there while the other one is running the loop in cyc2ns_write_begin() in
> the
> > > hope to be kicked out of it.  At least that's my theory. :-)
> > > 
> > > So this looks like a quite interesting deadlock to me.
> > 
> > So two things; it looks like we only need to call set_cyc2ns_scale()
> > (in time_cpufreq_notifier) inside the condition that actually changes
> > tsc_khz, not unconditionally.
> > 
> > Since if the tsc_khz value doesn't change, there's nothing to update.
> > 
> > This avoids it getting called twice for every change (PRECHANGE and
> > POSTCHANGE) and exhaust the struct cyc2ns data slots (2).
> > 
> > 
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -918,9 +918,9 @@ static int time_cpufreq_notifier(struct
> >             tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> >             if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> >                     mark_tsc_unstable("cpufreq changes");
> > -   }
> >  
> > -   set_cyc2ns_scale(tsc_khz, freq->cpu);
> > +           set_cyc2ns_scale(tsc_khz, freq->cpu);
> > +   }
> >  
> >     return 0;
> >  }
> > 
> 
> I have tested this patch with the userspace governor (same procedure of
> previous tests) and with the ondemand governor with kernels 3.14 and 3.16-rc2
> and it seems to solve the problem. I couldn't get the machine to hang.
> 
> I have tried cpu offline/online, frequency changes, unloading/loading
> powernow-k8, suspend/hibernate.

OK, if this one is sufficient, I'll leave it at this. I'll go write a
pretty changelog and post proper (probably tomorrow somewhere).

Thanks for testing!
Comment 111 Mauro Santos 2014-06-23 19:02:48 UTC
No problem. This was affecting me so to get this fixed I had to do my part of the work :)

Thanks everyone for your help in debugging this and for fixing it :)