Subject : Re: use after free of struct powernow_k8_data Submitter : Michal Schmidt <mschmidt@redhat.com> Date : 2009-09-24 14:51 References : http://marc.info/?l=linux-kernel&m=125380383515615&w=4 This entry is being used for tracking a regression from 2.6.30. Please don't close it until the problem is fixed in the mainline. It helps if the following commits are reverted: commit 1ff6e97f1d993dff2f9b6f4a9173687370660232 Author: Rusty Russell <rusty@rustcorp.com.au> Date: Fri Jun 12 20:55:37 2009 +0930 [CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c commit e15bc4559b397a611441a135b1f5992f07d0f436 Author: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Date: Thu Jun 11 15:26:54 2009 +0000 [CPUFREQ] powernow-k8: get drv data for correct CPU
Notify-Also : Andrew Morton <akpm@linux-foundation.org>
On Tuesday 27 October 2009, Michal Schmidt wrote: > On 10/26/2009 08:31 PM, Rafael J. Wysocki wrote: > > The following bug entry is on the current list of known regressions > > introduced between 2.6.30 and 2.6.31. Please verify if it still should > > be listed and let me know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14391 > > Subject : use after free of struct powernow_k8_data > > Submitter : Michal Schmidt<mschmidt@redhat.com> > > Date : 2009-09-24 14:51 (33 days old) > > References : http://marc.info/?l=linux-kernel&m=125380383515615&w=4 > > It should remain listed. I have verified the bug is still reproducible > with current mainline from git.
I can't see how my patch would break this, but cpufreq keeps surprising me :( There's no debug prints in the ->exit function, so I have no idea what cpu that's being called for, or when. But this is suspicious, on the resume side: cpufreq-core: CPU already managed, adding link This path calls ->exit in powernow_k8.c for cpu 1. (See cpufreq_add_dev_policy) This doesn't happy on initial boot, and I'm not sure it should be happening now. Dave? Thanks, Rusty.
What I expect is going on: Naga's and Randy's correct patches together reveal a race condition. With Naga's patch possibly offlined CPUs are checked for current speed also and Rusty's patch might change the timing when the code is executed. Attached patch is supposed to fix this (hopefully).
Created attachment 23860 [details] Possibly fixes the issue -> testing required
Created attachment 23870 [details] Thomas's patch + fixed unlock Thomas, I noticed your patch had unbalanced locking in the error path, so I replaced it with the attached one. Unfortunately, it does not fix the problem. SLUB debugging still detects poison overwritten after resume.
Created attachment 23871 [details] full dmesg Full dmesg with "slub_debug=FZP cpufreq.debug=1"
I think I got it. It's not that hard... Cleaned up patch from comment #6 still fixes a race condition and should go in, but through linux-next and does not need backporting as it doesn't fix any known problem. Feel free to add it with From: Michal Schmidt <...> and add my Signed-off-by: it wouldn't be worth much without Michal's enhancements. What is strange is that Rusty's patch should be involved. Can you give next patch a try, please.
Created attachment 24139 [details] Fix use after free of struct powernow_k8_data Possibly this still needs extra care in drivers/cpufreq/cpufreq.c: int cpufreq_update_policy(unsigned int cpu) and the managing cpu should only get called with ->get()
Created attachment 24142 [details] Fix use after free of struct powernow_k8_data [v2] Thomas, yes, that was the bug! 'data' is a local variable, resetting it won't help. So here's a modified patch. It fixes the bug on my laptop. You should be credited as the author of both patches, not me. My modifications of them were trivial. Thanks. Michal
I should have added: CC: stable@kernel.org This patch is really trivial and safe and should go there. Be aware that it's not related to suspend, but to all multi core machines that make use of the "managed core" cpufreq implementation (this is when /sys/devices/system/cpu/cpu1/cpufreq is a link to ../cpu0/cpufreq) and when a a _PPC cpufreq BIOS limitation event is thrown to a managed cpu core (in your case the latter always seem to happen on suspend). Dave, can you pick these up, please. This would be: The patch from comment #6: http://bugzilla.kernel.org/attachment.cgi?id=23870 for your next branch and The patch from comment #10: http://bugzilla.kernel.org/show_bug.cgi?id=14391#c10 as a riskless regression fix which should go to mainline/stable kernels asap. Ehh, I cannot change the status, should be code available.
Reply-To: davej@redhat.com On Fri, Dec 11, 2009 at 10:08:22AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > Dave, can you pick these up, please. Can you mail them to me with the relevant Signed-off-by:'s please ? thanks. Dave
Is this issue resolved?
Yes. Eh, but the patch does not show up in linus tree and linus cpufreq branch? In cpufreq next branch: commit 557a701c16553b0b691dbb64ef30361115a80f64 Author: Thomas Renninger <trenn@suse.de> Date: Mon Dec 14 11:44:15 2009 +0100 [CPUFREQ] Fix use after free of struct powernow_k8_data Easy fix for a regression introduced in 2.6.31. In cpufreq linus branch it's not there.
Have you resubmitted? I'd like to be able to close this issue.
Dave took it already. IMO it should have ended in 2.6.33 asap (and stable), instead it seem to have ended in -next branch (accidentally?), queued for 2.6.34.
closed by: commit 557a701c16553b0b691dbb64ef30361115a80f64 Author: Thomas Renninger <trenn@suse.de> Date: Mon Dec 14 11:44:15 2009 +0100 [CPUFREQ] Fix use after free of struct powernow_k8_data Easy fix for a regression introduced in 2.6.31. On managed CPUs the cpufreq.c core will call driver->exit(cpu) on the managed cpus and powernow_k8 will free the core's data. Later driver->get(cpu) function might get called trying to read out the current freq of a managed cpu and the NULL pointer check does not work on the freed object -> better set it to NULL. ->get() is unsigned and must return 0 as invalid frequency. Reference: http://bugzilla.kernel.org/show_bug.cgi?id=14391 Signed-off-by: Thomas Renninger <trenn@suse.de> Tested-by: Michal Schmidt <mschmidt@redhat.com> CC: stable@kernel.org Signed-off-by: Dave Jones <davej@redhat.com>