Created attachment 22358 [details] kernel log Hi, When the "loadcpufreq" init script loads the kernel module (powernow_k8?), I get the following error: kernel: [ 39.822807] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024 kernel: [ 39.824035] IP: [<ffffffffa034b094>] powernowk8_cpu_init+0x30d/0xa07 [powernow_k8] I'm using debian's kernel linux-image-2.6.30-1-amd64 2.6.30-2 If I don't start that script, I do not get the error.
Created attachment 22359 [details] kernel log
Can you make sure your BIOS is up to date? I've seen another issue similar to this that was fixed with a BIOS update. If your BIOS updates and the problem persists, would you please pass cpufreq.debug=7 on the kernel command line and send me the resulting kernel log?
I'm currently not at home and so don't have full access to the machine, and don't want to remotely reboot it. I will try to give you more details once I'm back. The motherboard is an asus K8V. The processor an K8 3000+ socket 754 at 2.0 GHz. It's one of the oldest K8's. cpuinfo says: processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 4 model name : AMD Athlon(tm) 64 Processor 3000+ stepping : 8 The bios is currently a version from 2004. There seems to be 1 newer version from 2005 available, not sure if that is going to help.
regardless of any BIOS issue, we shouldn't be oopsing. I don't see anything obviously wrong in powernowk8_cpu_init, but I could be overlooking something subtle. Do you know if Debian apply any patches to powernow-k8 in their kernel ? Could you attach your powernow-k8.ko file ? Disassembling that will show exactly where we're oopsing, and matching that up to a source line could be enlightening.
Created attachment 22425 [details] powernow-k8 module
As far as I can see they did not apply a patch related to powernow or cpufreq.
At a guess, the failure is occurring here (lines 1291). The oops reports the error at powernow_cpu_init+0x30d, and the math to set the transition latency that precedes this is pretty clearly at powernow_cpu_init+0x2df: /* only run on specific CPU from here on */ oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, &cpumask_of_cpu(pol->cpu)); There's a patch series already in 2.6.31 that completely redoes this logic because of potential NULL pointer issues. Could you see if you still experience the problem on 2.6.31-rc3? If you don't, that would confirm the problem and the solution in 2.6.31. I'll also try to think of a way to fix it for 2.6.30.
I'm still seeing the same problem with 2.6.31-rc5
Kurt, could you post the stack trace and your .config from the 2.6.31-rc5 (or rc6) try?
Created attachment 23032 [details] Kernel log with 2.6.31-rc9
Created attachment 23033 [details] powernow-k8 module of 2.6.31-rc9
Created attachment 23034 [details] config file
Dropping from the list of recent regressions, as it is unknown whether or not the recent kernels are affected.
I did a bisect, and this is the result: $ git bisect good 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f is first bad commit commit 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f Author: Dave Jones <davej@redhat.com> Date: Wed Feb 4 14:37:50 2009 -0500 [CPUFREQ] checkpatch cleanups for powernow-k8 This driver has so many long function names, and deep nested if's The remaining warnings will need some code restructuring to clean up. Signed-off-by: Dave Jones <davej@redhat.com> :040000 040000 45d8e325663e62e0833cf4e1cb8b27fa4a9b3a56 affb6fab44cb22aca8a33ca0d767797831fc2bae M arch $ git bisect log # bad: [3ff323f89075624b6891e7c428edb8e8a35be13c] Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 # good: [7dcca30a32aadb0520417521b0c44f42d09fe05c] Linux 2.6.22 git bisect start 'HEAD' 'v2.6.22' '--' 'arch/x86/kernel/cpu/cpufreq/powernow-k8.c' # good: [a266d9f1253a38ec2d5655ebcd6846298b0554f4] [CPUFREQ] powernow-k8: ignore out-of-range PstateStatus value git bisect good a266d9f1253a38ec2d5655ebcd6846298b0554f4 # bad: [df1829770db415dc5a5ed5ada3bd70176c6f0a01] [CPUFREQ] powernow-k8 cleanup msg if BIOS does not export ACPI _PSS cpufreq data git bisect bad df1829770db415dc5a5ed5ada3bd70176c6f0a01 # bad: [0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f] [CPUFREQ] checkpatch cleanups for powernow-k8 git bisect bad 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f # good: [835481d9bcd65720b473db6b38746a74a3964218] cpumask: convert struct cpufreq_policy to cpumask_var_t git bisect good 835481d9bcd65720b473db6b38746a74a3964218 # good: [732553e567c2700ba5b9bccc6ec885c75779a94b] [CPUFREQ] powernow-k8: Get transition latency from ACPI _PSS table git bisect good 732553e567c2700ba5b9bccc6ec885c75779a94b # good: [a0abd520fd69295f4a3735e29a9448a32e101d47] cpumask: fix powernow-k8: partial revert of 2fdf66b491ac706657946442789ec644cc317e1a git bisect good a0abd520fd69295f4a3735e29a9448a32e101d47
Looking at the patch, I see: @ -390,14 +413,14 @@ static int core_frequency_transition(struct powernow_k8_data *data, u32 reqfid) if (reqfid > data->currfid) { if (data->currfid > LO_FID_TABLE_TOP) { - if (write_new_fid(data, data->currfid + fid_interval)) { + if (write_new_fid(data, + data->currfid + fid_interval)) return 1; - } } else { if (write_new_fid - (data, 2 + convert_fid_to_vco_fid(data->currfid))) { + (data, + 2 + convert_fid_to_vco_fid(data->currfid))) return 1; - } } } else { if (write_new_fid(data, data->currfid - fid_interval)) Notice that the else belongs to the a different if now. I'll try a kernel with the else restored.
The else is actually after the proper }, so that didn't make any sense.
Created attachment 23048 [details] data->powernow_table != powernow_table The functions all get a struct powernow_k8_data *data and a struct cpufreq_frequency_table *powernow_table. data also has a powernow_table member. The old code always used the powernow_table variable to set the invalid entry, the new used data, and basicly assumed that data->powernow_table and powernow_table are the same. The patch passes the powernow_table instead of data to invalidate_entry().
Going over the patch I identified by the bisect, I also noticed that powernow_k8_cpu_init_acpi() now has a space_id pointing to the control register, and it ends up checking that one twice for the same value now. The old check was once for the status register and once for the control register. This is the diff: - if ((data->acpi_data.control_register.space_id != ACPI_ADR_SPACE_FIXED_HARDWARE) || - (data->acpi_data.status_register.space_id != ACPI_ADR_SPACE_FIXED_HARDWARE)) { + space_id = data->acpi_data.control_register.space_id; + if ((space_id != ACPI_ADR_SPACE_FIXED_HARDWARE) || + (space_id != ACPI_ADR_SPACE_FIXED_HARDWARE)) { Maybe you want a status_space_id and control_space_id. Kurt
Pasting manually due to the Bugzilla's e-mail interface not working at the moment. On Tue, Sep 08, 2009 at 01:12:14AM +0200, Rafael J. Wysocki wrote: > On Monday 07 September 2009, Kurt Roeckx wrote: > > On Sun, Sep 06, 2009 at 08:11:51PM +0200, Rafael J. Wysocki wrote: > > > This message has been generated automatically as a part of a report > > > of regressions introduced between 2.6.29 and 2.6.30. > > > > > > The following bug entry is on the current list of known regressions > > > introduced between 2.6.29 and 2.6.30. Please verify if it still should > > > be listed and let me know (either way). > > > > I'm not really sure what you want to know from me. I still see > > the issue, and I think that was clear from the bug report. Maybe > > you should either include more info in this mail, or point people > > to some website. > > > > I don't know if this is a regression between 2.6.29 and 2.6.30, I > > never tried a 2.6.29 kernel because it has other issues (for which > > I do have a patch now). The last kernel I have without issues is > > a 2.6.22 kernel. I tried different kernels between 2.6.22 and > > 2.6.30, but I can't tell you if they have the issue or not. I > > didn't notice, but that doesn't mean they don't have it. If you > > think it's important to find out which commit introduced this > > issue for me, I can try and run a bisect. > > The information you've just provided is very helpful. > > I'm now going to drop this from the list of recent regressions until it's > confirmed that kernels later than 2.6.22 are not affected. So I did a biset, and this is the result: $ git bisect good 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f is first bad commit commit 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f Author: Dave Jones <davej@redhat.com> Date: Wed Feb 4 14:37:50 2009 -0500 [CPUFREQ] checkpatch cleanups for powernow-k8 This driver has so many long function names, and deep nested if's The remaining warnings will need some code restructuring to clean up. Signed-off-by: Dave Jones <davej@redhat.com> :040000 040000 45d8e325663e62e0833cf4e1cb8b27fa4a9b3a56 affb6fab44cb22aca8a33ca0d767797831fc2bae M arch 2.6.29-rc5 is good, -rc7 is bad. First-Bad-Commit : 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
On Sunday 06 September 2009, David Rientjes wrote: > On Sun, 6 Sep 2009, Rafael J. Wysocki wrote: > > > This message has been generated automatically as a part of a report > > of regressions introduced between 2.6.29 and 2.6.30. > > > > The following bug entry is on the current list of known regressions > > introduced between 2.6.29 and 2.6.30. Please verify if it still should > > be listed and let me know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13780 > > Subject : NULL pointer dereference loading powernowk8 > > Submitter : Kurt Roeckx <kurt@roeckx.be> > > Date : 2009-07-15 18:00 (54 days old) > > > > This bug has been abandoned as far as I know since there has been no > response from the reporter in three weeks to post the latest failure > (which was then 2.6.31-rc5).
Should be fixed by 2c701b1 in 2.6.31-rc8 and later kernels. Please advise.
It is not, see the patch and comment #17
So this if fixed now in 2.6.32.