Bug 14391 - use after free of struct powernow_k8_data
Summary: use after free of struct powernow_k8_data
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Mark Langsdorf
URL:
Keywords:
Depends on:
Blocks: 13615
  Show dependency tree
 
Reported: 2009-10-11 21:50 UTC by Rafael J. Wysocki
Modified: 2011-07-30 06:42 UTC (History)
7 users (show)

See Also:
Kernel Version: 2.6.31
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Possibly fixes the issue -> testing required (1.55 KB, patch)
2009-11-21 22:28 UTC, Thomas Renninger
Details | Diff
Thomas's patch + fixed unlock (1.04 KB, patch)
2009-11-22 17:05 UTC, Michal Schmidt
Details | Diff
full dmesg (70.42 KB, text/plain)
2009-11-22 17:07 UTC, Michal Schmidt
Details
Fix use after free of struct powernow_k8_data (1.08 KB, patch)
2009-12-10 17:46 UTC, Thomas Renninger
Details | Diff
Fix use after free of struct powernow_k8_data [v2] (1.11 KB, patch)
2009-12-10 20:07 UTC, Michal Schmidt
Details | Diff

Description Rafael J. Wysocki 2009-10-11 21:50:41 UTC
Subject    : Re: use after free of struct powernow_k8_data
Submitter  : Michal Schmidt <mschmidt@redhat.com>
Date       : 2009-09-24 14:51
References : http://marc.info/?l=linux-kernel&m=125380383515615&w=4

This entry is being used for tracking a regression from 2.6.30.  Please don't
close it until the problem is fixed in the mainline.

It helps if the following commits are reverted:

commit 1ff6e97f1d993dff2f9b6f4a9173687370660232
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Fri Jun 12 20:55:37 2009 +0930

    [CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c

commit e15bc4559b397a611441a135b1f5992f07d0f436
Author: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Date:   Thu Jun 11 15:26:54 2009 +0000

    [CPUFREQ] powernow-k8: get drv data for correct CPU
Comment 1 Rafael J. Wysocki 2009-10-11 21:51:18 UTC
Notify-Also : Andrew Morton <akpm@linux-foundation.org>
Comment 2 Rafael J. Wysocki 2009-10-27 20:18:22 UTC
On Tuesday 27 October 2009, Michal Schmidt wrote:
> On 10/26/2009 08:31 PM, Rafael J. Wysocki wrote:
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14391
> > Subject             : use after free of struct powernow_k8_data
> > Submitter   : Michal Schmidt<mschmidt@redhat.com>
> > Date                : 2009-09-24 14:51 (33 days old)
> > References  : http://marc.info/?l=linux-kernel&m=125380383515615&w=4
> 
> It should remain listed. I have verified the bug is still reproducible 
> with current mainline from git.
Comment 3 Rusty Russell 2009-11-17 06:02:58 UTC
I can't see how my patch would break this, but cpufreq keeps surprising me :(

There's no debug prints in the ->exit function, so I have no idea what cpu that's being called for, or when.

But this is suspicious, on the resume side:

   cpufreq-core: CPU already managed, adding link

This path calls ->exit in powernow_k8.c for cpu 1.  (See cpufreq_add_dev_policy)

This doesn't happy on initial boot, and I'm not sure it should be happening now.

Dave?

Thanks,
Rusty.
Comment 4 Thomas Renninger 2009-11-21 22:16:01 UTC
What I expect is going on:
Naga's and Randy's correct patches together reveal a race condition.
With Naga's patch possibly offlined CPUs are checked for current speed also and
Rusty's patch might change the timing when the code is executed.
Attached patch is supposed to fix this (hopefully).
Comment 5 Thomas Renninger 2009-11-21 22:28:28 UTC
Created attachment 23860 [details]
Possibly fixes the issue -> testing required
Comment 6 Michal Schmidt 2009-11-22 17:05:55 UTC
Created attachment 23870 [details]
Thomas's patch + fixed unlock

Thomas,
I noticed your patch had unbalanced locking in the error path, so I replaced it with the attached one. Unfortunately, it does not fix the problem. SLUB debugging still detects poison overwritten after resume.
Comment 7 Michal Schmidt 2009-11-22 17:07:37 UTC
Created attachment 23871 [details]
full dmesg

Full dmesg with "slub_debug=FZP cpufreq.debug=1"
Comment 8 Thomas Renninger 2009-12-10 17:32:28 UTC
I think I got it. It's not that hard...

Cleaned up patch from comment #6 still fixes a race condition and should go in, but through linux-next and does not need backporting as it doesn't fix any known problem.
Feel free to add it with From: Michal Schmidt <...>
and add my Signed-off-by: it wouldn't be worth much without Michal's enhancements.

What is strange is that Rusty's patch should be involved.
Can you give next patch a try, please.
Comment 9 Thomas Renninger 2009-12-10 17:46:44 UTC
Created attachment 24139 [details]
Fix use after free of struct powernow_k8_data

Possibly this still needs extra care in drivers/cpufreq/cpufreq.c:
int cpufreq_update_policy(unsigned int cpu)
and the managing cpu should only get called with ->get()
Comment 10 Michal Schmidt 2009-12-10 20:07:55 UTC
Created attachment 24142 [details]
Fix use after free of struct powernow_k8_data [v2]

Thomas,

yes, that was the bug!

'data' is a local variable, resetting it won't help. So here's a modified patch. It fixes the bug on my laptop.

You should be credited as the author of both patches, not me. My modifications of them were trivial.

Thanks.
Michal
Comment 11 Thomas Renninger 2009-12-11 10:08:20 UTC
I should have added:
CC: stable@kernel.org
This patch is really trivial and safe and should go there.

Be aware that it's not related to suspend, but to all multi core machines that make use of the "managed core" cpufreq implementation (this is when /sys/devices/system/cpu/cpu1/cpufreq is a link to ../cpu0/cpufreq) and when a a _PPC cpufreq BIOS limitation event is thrown to a managed cpu core (in your case the latter always seem to happen on suspend).

Dave, can you pick these up, please.
This would be:

The patch from comment #6:
http://bugzilla.kernel.org/attachment.cgi?id=23870
for your next branch and

The patch from comment #10:
http://bugzilla.kernel.org/show_bug.cgi?id=14391#c10
as a riskless regression fix which should go to mainline/stable kernels asap.

Ehh, I cannot change the status, should be code available.
Comment 12 Anonymous Emailer 2009-12-11 18:06:41 UTC
Reply-To: davej@redhat.com

On Fri, Dec 11, 2009 at 10:08:22AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
 
 > Dave, can you pick these up, please.

Can you mail them to me with the relevant Signed-off-by:'s please ?

thanks.

	Dave
Comment 13 Mark Langsdorf 2010-02-03 19:37:01 UTC
Is this issue resolved?
Comment 14 Thomas Renninger 2010-02-05 10:05:36 UTC
Yes.
Eh, but the patch does not show up in linus tree and linus cpufreq branch?

In cpufreq next branch:
commit 557a701c16553b0b691dbb64ef30361115a80f64
Author: Thomas Renninger <trenn@suse.de>
Date:   Mon Dec 14 11:44:15 2009 +0100

    [CPUFREQ] Fix use after free of struct powernow_k8_data

    Easy fix for a regression introduced in 2.6.31.


In cpufreq linus branch it's not there.
Comment 15 Mark Langsdorf 2010-02-08 14:54:20 UTC
Have you resubmitted?  I'd like to be able to close this issue.
Comment 16 Thomas Renninger 2010-02-08 20:04:57 UTC
Dave took it already.
IMO it should have ended in 2.6.33 asap (and stable), instead it seem to have ended in -next branch (accidentally?), queued for 2.6.34.
Comment 17 Len Brown 2011-07-30 06:42:36 UTC
closed by:


commit 557a701c16553b0b691dbb64ef30361115a80f64
Author: Thomas Renninger <trenn@suse.de>
Date:   Mon Dec 14 11:44:15 2009 +0100

    [CPUFREQ] Fix use after free of struct powernow_k8_data
    
    Easy fix for a regression introduced in 2.6.31.
    
    On managed CPUs the cpufreq.c core will call driver->exit(cpu) on the
    managed cpus and powernow_k8 will free the core's data.
    
    Later driver->get(cpu) function might get called trying to read out the
    current freq of a managed cpu and the NULL pointer check does not work on
    the freed object -> better set it to NULL.
    
    ->get() is unsigned and must return 0 as invalid frequency.
    
    Reference:
    http://bugzilla.kernel.org/show_bug.cgi?id=14391
    
    Signed-off-by: Thomas Renninger <trenn@suse.de>
    Tested-by: Michal Schmidt <mschmidt@redhat.com>
    CC: stable@kernel.org
    Signed-off-by: Dave Jones <davej@redhat.com>

Note You need to log in before you can comment on or make changes to this bug.