Bug 214329 - Kernel NULL pointer dereference after 5cbba60596b1f32f637190ca9ed5b1acdadb852c
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate
Hardware: Intel Linux
Importance: P1 high
Assignee: Srinivas Pandruvada
Depends on:
Reported: 2021-09-06 11:28 UTC by Pablo Mendez Hernandez
Modified: 2022-06-21 09:15 UTC

See Also:
Kernel Version: 5.15.0-0.rc0
Regression: No
Bisected commit-id:

First part of screenshot (627.21 KB, image/jpeg)
2021-09-06 11:34 UTC, Pablo Mendez Hernandez
Second part of screenshot (568.60 KB, image/jpeg)
2021-09-06 11:35 UTC, Pablo Mendez Hernandez
Third part of screenshot (477.88 KB, image/jpeg)
2021-09-06 11:35 UTC, Pablo Mendez Hernandez
Fourth part of screenshot (609.54 KB, image/jpeg)
2021-09-06 11:36 UTC, Pablo Mendez Hernandez
Patch for 5.16 kernel (8.26 KB, application/mbox)
2021-09-23 20:00 UTC, Srinivas Pandruvada

Description Pablo Mendez Hernandez 2021-09-06 11:28:12 UTC
(Pasted from OCR scan)

? __rdmsr_on_cpu+0x40/0x40
? _raw_spin_unlock_irqrestore+0x37/0x40
? __debug_object_init+0x12f/0x380
? lockdep_init_map_type+0x51/0x230
? mark_held_locks+0x50/0x80
? lockdep_hardirqs_on_prepare+0xff/0x180
? _raw_write_unlock_irqrestore+0x37/0x40
cpufreq_register_driver+0x16e/0x2e0
? intel_pstate_setup+0x10f/0x10f
? rcu_read_lock_sched_held+0x3f/0x80
? rest_init+0x280/0x280
Comment 1 Pablo Mendez Hernandez 2021-09-06 11:29:52 UTC
Hardware is Lenovo P1 Gen 3 running Fedora 34 with Rawhide kernels:

- kernel-5.15.0-0.rc0.20210831gitb91db6a0b52e.1.fc36.x86_64: OK
- kernel-5.15.0-0.rc0.20210901git9e9fb7655ed5.2.fc36.x86_64: FAILS
- kernel-5.15.0-0.rc0.20210902git4ac6d90867a4.4.fc36.x86_64: FAILS
Comment 2 Pablo Mendez Hernandez 2021-09-06 11:34:51 UTC
Created attachment 298679
First part of screenshot
Comment 3 Pablo Mendez Hernandez 2021-09-06 11:35:18 UTC
Created attachment 298681
Second part of screenshot
Comment 4 Pablo Mendez Hernandez 2021-09-06 11:35:36 UTC
Created attachment 298683
Third part of screenshot
Comment 5 Pablo Mendez Hernandez 2021-09-06 11:36:04 UTC
Created attachment 298685
Fourth part of screenshot
Comment 7 Pablo Mendez Hernandez 2021-09-06 16:35:21 UTC
It fixed the issue for me. Thanks!

If you want to amend the commit message, you can mention that the issue also existed in P1 Gen3.

You have my "Tested-by" if you consider it appropriate :)
Comment 8 Srinivas Pandruvada 2021-09-08 01:53:40 UTC
Thanks for the test. We decided to revert the commit for this release.

Please keep the bug open, I will attach the fix soon for the next release.
Comment 9 Pablo Mendez Hernandez 2021-09-08 14:43:55 UTC
Sure, no problem.
Comment 10 Srinivas Pandruvada 2021-09-23 20:00:58 UTC
Created attachment 298935
Patch for 5.16 kernel

Hi Pablo,

Please test the attached patch. I would like to add your Tested-by tag once you have tested it.
Comment 11 Pablo Mendez Hernandez 2021-10-11 11:42:56 UTC
As discussed over email, I tested the patch, and it has already been submitted for inclusion in Rafael's tree.
Comment 12 Zhang Rui 2022-06-21 09:15:58 UTC
commit 57577c996d731ce1e5a4a488e64e6e201b360847
Author:     Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
AuthorDate: Tue Sep 28 09:42:17 2021 -0700
Commit:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CommitDate: Tue Oct 5 15:30:44 2021 +0200

    cpufreq: intel_pstate: Process HWP Guaranteed change notification

    It is possible that HWP guaranteed ratio is changed in response to
    change in power and thermal limits. For example when Intel Speed Select
    performance profile is changed or there is change in TDP, hardware can
    send notifications. It is possible that the guaranteed ratio is
    increased. This creates an issue when turbo is disabled, as the old
    limits set in MSR_HWP_REQUEST are still lower and hardware will clip
    to older limits.

    This change enables HWP interrupt and processes HWP interrupts. When
    guaranteed is changed, calls cpufreq_update_policy() so that driver
    callbacks are called to update to new HWP limits. This callback
    is called from a delayed workqueue of 10ms to avoid frequent updates.

    Although the scope of IA32_HWP_INTERRUPT is per logical cpu, on some
    platforms interrupt is generated on all CPUs. This is particularly a
    problem during initialization, when the driver didn't allocate
    data for other CPUs. So this change uses a cpumask of enabled CPUs and
    processes interrupts on those CPUs only.

    When the cpufreq offline() or suspend() callback is called, HWP interrupt
    is disabled on those CPUs and also cancels any pending work item.

    Spin lock is used to protect data and processing shared with interrupt
    handler. Here READ_ONCE(), WRITE_ONCE() macros are used to designate
    shared data, even though spin lock acts as an optimization barrier here.

    Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    Tested-by: pablomh@gmail.com
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
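
The commit message above describes guarding the interrupt path with a mask of enabled CPUs, so that a notification arriving on a CPU whose driver data is not yet allocated is ignored rather than dereferenced. The following is only a minimal userspace sketch of that idea, not the driver's code: every name in it (`SKETCH_NR_CPUS`, `sketch_cpudata`, `sketch_enable_cpu`, `sketch_disable_cpu`, `sketch_notify_hwp_interrupt`) is hypothetical, a plain 32-bit bitmask stands in for a kernel cpumask, and the spin lock, MSR access and 10ms delayed work mentioned in the commit are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU count for this sketch; must not exceed 32 (mask width). */
#define SKETCH_NR_CPUS 8

/* Hypothetical per-CPU driver state. */
struct sketch_cpudata {
    int hwp_guaranteed; /* last guaranteed ratio seen */
    int updates;        /* number of times the ratio changed */
};

/* Per-CPU data pointers; NULL until the CPU has been initialized. */
static struct sketch_cpudata *sketch_all_cpu_data[SKETCH_NR_CPUS];

/* Bit n set means CPU n has data allocated and may process interrupts. */
static uint32_t sketch_intr_enable_mask;

/* Publish the data pointer first, then mark the CPU as enabled. */
static void sketch_enable_cpu(int cpu, struct sketch_cpudata *data)
{
    sketch_all_cpu_data[cpu] = data;
    sketch_intr_enable_mask |= (UINT32_C(1) << cpu);
}

/* Clear the mask bit first, then drop the data pointer (offline/suspend). */
static void sketch_disable_cpu(int cpu)
{
    sketch_intr_enable_mask &= ~(UINT32_C(1) << cpu);
    sketch_all_cpu_data[cpu] = NULL;
}

/*
 * Simulated interrupt entry point. Returns true if the notification was
 * processed, false if it was ignored because the CPU is not enabled.
 * Without the mask check, a notification on an uninitialized CPU would
 * dereference a NULL per-CPU pointer.
 */
static bool sketch_notify_hwp_interrupt(int cpu, int new_guaranteed)
{
    struct sketch_cpudata *data;

    if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
        return false;
    if (!(sketch_intr_enable_mask & (UINT32_C(1) << cpu)))
        return false;

    data = sketch_all_cpu_data[cpu];
    if (data->hwp_guaranteed != new_guaranteed) {
        data->hwp_guaranteed = new_guaranteed;
        data->updates++;
    }
    return true;
}
```

In this sketch, calling `sketch_notify_hwp_interrupt()` for a CPU that was never passed to `sketch_enable_cpu()` returns false without touching the per-CPU array entry, which mirrors the "process interrupts on those CPUs only" behaviour the commit message describes.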
