Bug 13780 - NULL pointer dereference loading powernowk8
Summary: NULL pointer dereference loading powernowk8
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Mark Langsdorf
URL:
Keywords:
Depends on:
Blocks: 12398
  Show dependency tree
 
Reported: 2009-07-15 18:00 UTC by Kurt Roeckx
Modified: 2010-01-19 22:03 UTC (History)
5 users (show)

See Also:
Kernel Version: 2.6.29
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kernel log (14 bytes, text/plain)
2009-07-15 18:00 UTC, Kurt Roeckx
Details
kernel log (4.98 KB, text/plain)
2009-07-15 18:03 UTC, Kurt Roeckx
Details
powernow-k8 module (25.18 KB, application/octet-stream)
2009-07-21 08:36 UTC, Kurt Roeckx
Details
Kernel log with 2.6.31-rc9 (22.59 KB, text/x-log)
2009-09-07 20:10 UTC, Kurt Roeckx
Details
powernow-k8 module of 2.6.31-rc9 (17.18 KB, application/octet-stream)
2009-09-07 20:12 UTC, Kurt Roeckx
Details
config file (37.17 KB, application/octet-stream)
2009-09-07 20:13 UTC, Kurt Roeckx
Details
data->powernow_table != powernow_table (2.38 KB, patch)
2009-09-09 13:01 UTC, Kurt Roeckx
Details | Diff

Description Kurt Roeckx 2009-07-15 18:00:02 UTC
Created attachment 22358 [details]
kernel log

Hi,

When the "loadcpufreq" init script loads the kernel module (powernow_k8?), I get the following error:
kernel: [   39.822807] BUG: unable to handle kernel NULL pointer dereference at
0000000000000024
kernel: [   39.824035] IP: [<ffffffffa034b094>] powernowk8_cpu_init+0x30d/0xa07
[powernow_k8]

I'm using debian's kernel linux-image-2.6.30-1-amd64 2.6.30-2

If I don't start that script, I do not get the error.
Comment 1 Kurt Roeckx 2009-07-15 18:03:59 UTC
Created attachment 22359 [details]
kernel log
Comment 2 Mark Langsdorf 2009-07-15 21:55:34 UTC
Can you make sure your BIOS is up to date?  I've seen another issue similar to this that was fixed with a BIOS update.

If your BIOS updates and the problem persists, would you please pass cpufreq.debug=7 on the kernel command line and send me the resulting kernel log?
Comment 3 Kurt Roeckx 2009-07-21 00:34:15 UTC
I'm currently not at home and so don't have full access to the machine, and don't want to remotely reboot it.  I will try to give you more details once I'm back.

The motherboard is an asus K8V.  The processor an K8 3000+ socket 754 at 2.0 GHz.  It's one of the oldest K8's.  cpuinfo says:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 4
model name      : AMD Athlon(tm) 64 Processor 3000+
stepping        : 8


The bios is currently a version from 2004.  There seems to be 1 newer version from 2005 available, not sure if that is going to help.
Comment 4 Dave Jones 2009-07-21 00:58:07 UTC
regardless of any BIOS issue, we shouldn't be oopsing.  I don't see anything obviously wrong in powernowk8_cpu_init, but I could be overlooking something subtle.

Do you know if Debian apply any patches to powernow-k8 in their kernel ?

Could you attach your powernow-k8.ko file ? Disassembling that will show exactly where we're oopsing, and matching that up to a source line could be enlightening.
Comment 5 Kurt Roeckx 2009-07-21 08:36:00 UTC
Created attachment 22425 [details]
powernow-k8 module
Comment 6 Kurt Roeckx 2009-07-21 08:37:46 UTC
As far as I can see they did not apply a patch related to powernow or cpufreq.
Comment 7 Mark Langsdorf 2009-07-21 22:58:47 UTC
At a guess, the failure is occurring here (lines 1291).  The oops reports the error at powernow_cpu_init+0x30d, and the math to set the transition latency that precedes this is pretty clearly at powernow_cpu_init+0x2df:

        /* only run on specific CPU from here on */
        oldmask = current->cpus_allowed;
        set_cpus_allowed_ptr(current, &cpumask_of_cpu(pol->cpu));


There's a patch series already in 2.6.31 that completely redoes this logic because of potential NULL pointer issues.  Could you see if you still experience the problem on 2.6.31-rc3?  If you don't, that would confirm the problem and the solution in 2.6.31.  I'll also try to think of a way to fix it for 2.6.30.
Comment 8 Kurt Roeckx 2009-08-05 13:15:57 UTC
I'm still seeing the same problem with 2.6.31-rc5
Comment 9 David Rientjes 2009-08-17 20:18:57 UTC
Kurt, could you post the stack trace and your .config from the 2.6.31-rc5 (or rc6) try?
Comment 10 Kurt Roeckx 2009-09-07 20:10:49 UTC
Created attachment 23032 [details]
Kernel log with 2.6.31-rc9
Comment 11 Kurt Roeckx 2009-09-07 20:12:00 UTC
Created attachment 23033 [details]
powernow-k8 module of 2.6.31-rc9
Comment 12 Kurt Roeckx 2009-09-07 20:13:14 UTC
Created attachment 23034 [details]
config file
Comment 13 Rafael J. Wysocki 2009-09-07 23:12:20 UTC
Dropping from the list of recent regressions, as it is unknown whether or not the recent kernels are affected.
Comment 14 Kurt Roeckx 2009-09-08 19:50:03 UTC
I did a bisect, and this is the result:

$ git bisect good
0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f is first bad commit
commit 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
Author: Dave Jones <davej@redhat.com>
Date:   Wed Feb 4 14:37:50 2009 -0500

    [CPUFREQ] checkpatch cleanups for powernow-k8

    This driver has so many long function names, and deep nested if's
    The remaining warnings will need some code restructuring to clean up.

    Signed-off-by: Dave Jones <davej@redhat.com>

:040000 040000 45d8e325663e62e0833cf4e1cb8b27fa4a9b3a56 affb6fab44cb22aca8a33ca0d767797831fc2bae M      arch

$ git bisect log
# bad: [3ff323f89075624b6891e7c428edb8e8a35be13c] Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
# good: [7dcca30a32aadb0520417521b0c44f42d09fe05c] Linux 2.6.22
git bisect start 'HEAD' 'v2.6.22' '--' 'arch/x86/kernel/cpu/cpufreq/powernow-k8.c'
# good: [a266d9f1253a38ec2d5655ebcd6846298b0554f4] [CPUFREQ] powernow-k8: ignore out-of-range PstateStatus value
git bisect good a266d9f1253a38ec2d5655ebcd6846298b0554f4
# bad: [df1829770db415dc5a5ed5ada3bd70176c6f0a01] [CPUFREQ] powernow-k8 cleanup msg if BIOS does not export ACPI _PSS cpufreq data
git bisect bad df1829770db415dc5a5ed5ada3bd70176c6f0a01
# bad: [0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f] [CPUFREQ] checkpatch cleanups for powernow-k8
git bisect bad 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
# good: [835481d9bcd65720b473db6b38746a74a3964218] cpumask: convert struct cpufreq_policy to cpumask_var_t
git bisect good 835481d9bcd65720b473db6b38746a74a3964218
# good: [732553e567c2700ba5b9bccc6ec885c75779a94b] [CPUFREQ] powernow-k8: Get transition latency from ACPI _PSS table
git bisect good 732553e567c2700ba5b9bccc6ec885c75779a94b
# good: [a0abd520fd69295f4a3735e29a9448a32e101d47] cpumask: fix powernow-k8: partial revert of 2fdf66b491ac706657946442789ec644cc317e1a
git bisect good a0abd520fd69295f4a3735e29a9448a32e101d47
Comment 15 Kurt Roeckx 2009-09-08 20:09:46 UTC
Looking at the patch, I see:
@ -390,14 +413,14 @@ static int core_frequency_transition(struct powernow_k8_data *data, u32 reqfid)
 
                if (reqfid > data->currfid) {
                        if (data->currfid > LO_FID_TABLE_TOP) {
-                               if (write_new_fid(data, data->currfid + fid_interval)) {
+                               if (write_new_fid(data,
+                                               data->currfid + fid_interval))
                                        return 1;
-                               }
                        } else {
                                if (write_new_fid
-                                   (data, 2 + convert_fid_to_vco_fid(data->currfid))) {
+                                   (data,
+                                    2 + convert_fid_to_vco_fid(data->currfid)))
                                        return 1;
-                               }
                        }
                } else {
                        if (write_new_fid(data, data->currfid - fid_interval))

Notice that the else belongs to the a different if now.  I'll try a kernel with the else restored.
Comment 16 Kurt Roeckx 2009-09-08 20:45:21 UTC
The else is actually after the proper }, so that didn't make any sense.
Comment 17 Kurt Roeckx 2009-09-09 13:01:30 UTC
Created attachment 23048 [details]
data->powernow_table != powernow_table

The functions all get a struct powernow_k8_data *data and a struct cpufreq_frequency_table *powernow_table.  data also has a powernow_table member.

The old code always used the powernow_table variable to set the invalid entry, the new used data, and basicly assumed that data->powernow_table and powernow_table are the same.

The patch passes the powernow_table instead of data to invalidate_entry().
Comment 18 Kurt Roeckx 2009-09-09 13:10:29 UTC
Going over the patch I identified by the bisect, I also noticed that powernow_k8_cpu_init_acpi() now has a space_id pointing to the control register, and it ends up checking that one twice for the same value now. The old check was once for the status register and once for the control register.

This is the diff:
-       if ((data->acpi_data.control_register.space_id != ACPI_ADR_SPACE_FIXED_HARDWARE) ||
-               (data->acpi_data.status_register.space_id != ACPI_ADR_SPACE_FIXED_HARDWARE)) {
+       space_id = data->acpi_data.control_register.space_id;
+       if ((space_id != ACPI_ADR_SPACE_FIXED_HARDWARE) ||
+               (space_id != ACPI_ADR_SPACE_FIXED_HARDWARE)) {

Maybe you want a status_space_id and control_space_id.


Kurt
Comment 19 Rafael J. Wysocki 2009-09-10 20:55:50 UTC
Pasting manually due to the Bugzilla's e-mail interface not working at the moment.

On Tue, Sep 08, 2009 at 01:12:14AM +0200, Rafael J. Wysocki wrote:
> On Monday 07 September 2009, Kurt Roeckx wrote:
> > On Sun, Sep 06, 2009 at 08:11:51PM +0200, Rafael J. Wysocki wrote:
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.29 and 2.6.30.
> > > 
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.29 and 2.6.30.  Please verify if it still should
> > > be listed and let me know (either way).
> > 
> > I'm not really sure what you want to know from me.  I still see
> > the issue, and I think that was clear from the bug report.  Maybe
> > you should either include more info in this mail, or point people
> > to some website.
> > 
> > I don't know if this is a regression between 2.6.29 and 2.6.30, I
> > never tried a 2.6.29 kernel because it has other issues (for which
> > I do have a patch now).  The last kernel I have without issues is
> > a 2.6.22 kernel.  I tried different kernels between 2.6.22 and
> > 2.6.30, but I can't tell you if they have the issue or not.  I
> > didn't notice, but that doesn't mean they don't have it.  If you
> > think it's important to find out which commit introduced this
> > issue for me, I can try and run a bisect.
> 
> The information you've just provided is very helpful.
> 
> I'm now going to drop this from the list of recent regressions until it's
> confirmed that kernels later than 2.6.22 are not affected.

So I did a biset, and this is the result:
$ git bisect good
0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f is first bad commit
commit 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
Author: Dave Jones <davej@redhat.com>
Date:   Wed Feb 4 14:37:50 2009 -0500

    [CPUFREQ] checkpatch cleanups for powernow-k8

    This driver has so many long function names, and deep nested if's
    The remaining warnings will need some code restructuring to clean up.

    Signed-off-by: Dave Jones <davej@redhat.com>

:040000 040000 45d8e325663e62e0833cf4e1cb8b27fa4a9b3a56 affb6fab44cb22aca8a33ca0d767797831fc2bae M      arch

2.6.29-rc5 is good, -rc7 is bad.

First-Bad-Commit : 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
Comment 20 Rafael J. Wysocki 2009-09-10 23:19:50 UTC
On Sunday 06 September 2009, David Rientjes wrote:
> On Sun, 6 Sep 2009, Rafael J. Wysocki wrote:
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=13780
> > Subject             : NULL pointer dereference loading powernowk8
> > Submitter   : Kurt Roeckx <kurt@roeckx.be>
> > Date                : 2009-07-15 18:00 (54 days old)
> > 
> 
> This bug has been abandoned as far as I know since there has been no 
> response from the reporter in three weeks to post the latest failure 
> (which was then 2.6.31-rc5).
Comment 21 David Rientjes 2009-09-15 23:46:35 UTC
Should be fixed by 2c701b1 in 2.6.31-rc8 and later kernels.  Please advise.
Comment 22 Kurt Roeckx 2009-09-16 06:48:25 UTC
It is not, see the patch and comment #17
Comment 23 Kurt Roeckx 2009-12-28 01:47:18 UTC
So this if fixed now in 2.6.32.

Note You need to log in before you can comment on or make changes to this bug.