Bug 214329 - Kernel NULL pointer dereference after 5cbba60596b1f32f637190ca9ed5b1acdadb852c
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate
Hardware: Intel Linux
Importance: P1 high
Assignee: Srinivas Pandruvada
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-06 11:28 UTC by Pablo Mendez Hernandez
Modified: 2022-06-21 09:15 UTC

See Also:
Kernel Version: 5.15.0-0.rc0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
First part of screenshot (627.21 KB, image/jpeg)
2021-09-06 11:34 UTC, Pablo Mendez Hernandez
Second part of screenshot (568.60 KB, image/jpeg)
2021-09-06 11:35 UTC, Pablo Mendez Hernandez
Third part of screenshot (477.88 KB, image/jpeg)
2021-09-06 11:35 UTC, Pablo Mendez Hernandez
Fourth part of screenshot (609.54 KB, image/jpeg)
2021-09-06 11:36 UTC, Pablo Mendez Hernandez
Patch for 5.16 kernel (8.26 KB, application/mbox)
2021-09-23 20:00 UTC, Srinivas Pandruvada

Description Pablo Mendez Hernandez 2021-09-06 11:28:12 UTC
(Pasted from OCR scan)

? _rdmsr_on_cpu+0x40/0x40
wrmsrl_on_cpu+0x3f/0x50
intel_pstate_init_cpu+0x13c/0x5e0
? _raw_spin_unlock_irqrestore+0x37/0x40
? __debug_object_init+0x12f/0x380
? lockdep_init_map_type+0x51/0x230
intel_pstate_cpu_init+0x11/0x60
cpufreq_online+0x3ef/0xa40
cpufreq_add_dev+0x82/0xa0
subsys_interface_register+0x131/0x150
? mark_held_locks+0x50/0x80
? lockdep_hardirqs_on_prepare+0xff/0x180
? _raw_write_unlock_irqrestore+0x37/0x40
cpufreq_register_driver+0x16e/0x2e0
intel_pstate_register_driver+0x42/0xb0
? intel_pstate_setup+0x10f/0x10f
intel_pstate_init+0x401/0x600
do_one_initcall+0x64/0x320
? rcu_read_lock_sched_held+0x3f/0x80
kernel_init_freeable+0x284/0x2d0
? rest_init+0x280/0x280
kernel_init+0x16/0x120
ret_from_fork+0x1f/0x30
Comment 1 Pablo Mendez Hernandez 2021-09-06 11:29:52 UTC
Hardware is Lenovo P1 Gen 3 running Fedora 34 with Rawhide kernels:

- kernel-5.15.0-0.rc0.20210831gitb91db6a0b52e.1.fc36.x86_64: OK
- kernel-5.15.0-0.rc0.20210901git9e9fb7655ed5.2.fc36.x86_64: FAILS
- kernel-5.15.0-0.rc0.20210902git4ac6d90867a4.4.fc36.x86_64: FAILS
Comment 2 Pablo Mendez Hernandez 2021-09-06 11:34:51 UTC
Created attachment 298679
First part of screenshot
Comment 3 Pablo Mendez Hernandez 2021-09-06 11:35:18 UTC
Created attachment 298681
Second part of screenshot
Comment 4 Pablo Mendez Hernandez 2021-09-06 11:35:36 UTC
Created attachment 298683
Third part of screenshot
Comment 5 Pablo Mendez Hernandez 2021-09-06 11:36:04 UTC
Created attachment 298685
Fourth part of screenshot
Comment 7 Pablo Mendez Hernandez 2021-09-06 16:35:21 UTC
It fixed the issue for me. Thanks!

If you want to amend the commit message, you can mention that the issue also existed in P1 Gen3.

You have my "Tested-by" if you consider it appropriate :)
Comment 8 Srinivas Pandruvada 2021-09-08 01:53:40 UTC
Thanks for the test. We decided to revert the commit for this release.


Please keep the bug open, I will attach the fix soon for the next release.
Comment 9 Pablo Mendez Hernandez 2021-09-08 14:43:55 UTC
Sure, no problem.
Comment 10 Srinivas Pandruvada 2021-09-23 20:00:58 UTC
Created attachment 298935
Patch for 5.16 kernel

Hi Pablo,

Please test the attached patch. I would like to add your tested-by after your tests.
Comment 11 Pablo Mendez Hernandez 2021-10-11 11:42:56 UTC
Hi,

As discussed over email, I tested the patch, and it has already been submitted for inclusion in Rafael's tree:

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=57577c996d731ce1e5a4a488e64e6e201b360847
Comment 12 Zhang Rui 2022-06-21 09:15:58 UTC
commit 57577c996d731ce1e5a4a488e64e6e201b360847
Author:     Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
AuthorDate: Tue Sep 28 09:42:17 2021 -0700
Commit:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CommitDate: Tue Oct 5 15:30:44 2021 +0200

    cpufreq: intel_pstate: Process HWP Guaranteed change notification
    
    It is possible that HWP guaranteed ratio is changed in response to
    change in power and thermal limits. For example when Intel Speed Select
    performance profile is changed or there is change in TDP, hardware can
    send notifications. It is possible that the guaranteed ratio is
    increased. This creates an issue when turbo is disabled, as the old
    limits set in MSR_HWP_REQUEST are still lower and hardware will clip
    to older limits.
    
    This change enables the HWP interrupt and processes HWP interrupts. When
    the guaranteed ratio changes, it calls cpufreq_update_policy() so that
    driver callbacks are called to update to the new HWP limits. This callback
    is called from a delayed workqueue of 10ms to avoid frequent updates.
    
    Although the scope of IA32_HWP_INTERRUPT is per logical cpu, on some
    platforms the interrupt is generated on all CPUs. This is particularly a
    problem during initialization, when the driver has not yet allocated
    data for other CPUs. So this change uses a cpumask of enabled CPUs and
    processes interrupts on those CPUs only.
    
    When the cpufreq offline() or suspend() callback is called, the HWP
    interrupt is disabled on those CPUs and any pending work item is cancelled.
    
    A spin lock is used to protect data and processing shared with the
    interrupt handler. Here the READ_ONCE() and WRITE_ONCE() macros are used
    to designate shared data, even though the spin lock acts as an
    optimization barrier here.
    
    Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    Tested-by: pablomh@gmail.com
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
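
For readers following the fix, here is a rough user-space sketch of the "cpumask of enabled CPUs" guard that the commit message describes. It is not the driver code: the names `cpu_state`, `all_cpu_state`, `hwp_enabled_mask`, `hwp_enable_cpu()`, `hwp_disable_cpu()` and `hwp_notify()` are all hypothetical, a plain 64-bit mask stands in for the kernel cpumask, and the spin lock and delayed work item mentioned in the commit are left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 64u

/* Hypothetical per-CPU state; the real driver keeps much more than this. */
struct cpu_state {
    unsigned int pending_updates;
};

/* Entry stays NULL until the corresponding CPU has been initialized. */
static struct cpu_state *all_cpu_state[MAX_CPUS];

/* Bit n set means CPU n has its data allocated and may take notifications. */
static uint64_t hwp_enabled_mask;

/* Publish the per-CPU data first, then mark the CPU as enabled. */
static void hwp_enable_cpu(unsigned int cpu, struct cpu_state *state)
{
    assert(cpu < MAX_CPUS && state != NULL);
    all_cpu_state[cpu] = state;
    hwp_enabled_mask |= UINT64_C(1) << cpu;
}

/* Clear the mask bit first, then drop the per-CPU data (offline/suspend). */
static void hwp_disable_cpu(unsigned int cpu)
{
    assert(cpu < MAX_CPUS);
    hwp_enabled_mask &= ~(UINT64_C(1) << cpu);
    all_cpu_state[cpu] = NULL;
}

/*
 * Simulated notification handler. Returns true if the notification was
 * processed and false if it was ignored because the CPU is not enabled.
 * Without the mask check, a notification arriving for an uninitialized
 * CPU would dereference a NULL pointer.
 */
static bool hwp_notify(unsigned int cpu)
{
    if (cpu >= MAX_CPUS || !(hwp_enabled_mask & (UINT64_C(1) << cpu)))
        return false;

    all_cpu_state[cpu]->pending_updates++;
    return true;
}
```

In this sketch, calling `hwp_notify(3)` before `hwp_enable_cpu(3, &state)` is ignored, the same call afterwards increments `state.pending_updates`, and notifications are ignored again once `hwp_disable_cpu(3)` has run.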
