Bug 218865 - NULL pointer reference if X86_AMD_PSTATE_DEFAULT_MODE=1 on a AMD CPU that supports CPPC
Summary: NULL pointer reference if X86_AMD_PSTATE_DEFAULT_MODE=1 on a AMD CPU that sup...
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq (show other bugs)
Hardware: AMD Linux
: P3 normal
Assignee: linux-pm@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-21 00:15 UTC by Matthew Stapleton
Modified: 2024-05-26 20:36 UTC (History)
1 user (show)

See Also:
Kernel Version: 6.6.22
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Matthew Stapleton 2024-05-21 00:15:02 UTC
I recently deployed a custom kernel build with X86_AMD_PSTATE_DEFAULT_MODE set to 1 (Disabled) to a newer AMD system: AMD Ryzen 3 4100 that supports cppc and got a NULL pointer reference crash so I had to add amd_pstate=active on that system to get it to boot.  The same kernel boots fine on older AMD systems that don't support cppc.

The kernel is 6.6.22

This is what I think is happening when no amd_pstate parameter has been specified although I haven't tested a patch yet:
in function: amd_pstate_init the cppc_state starts as AMD_PSTATE_UNDEFINED.

On older systems that don't support CPPC, !boot_cpu_has(X86_FEATURE_CPPC) tests true so "driver load is disabled, boot with specific mode to enable this" is printed and -ENODEV is returned.

On the newer systems that support CPPC , it goes on to "ret = amd_pstate_set_driver(CONFIG_X86_AMD_PSTATE_DEFAULT_MODE);" .  Then amd_pstate_set_driver sees AMD_PSTATE_DISABLE , prints "driver is explicitly disabled" , doesn't assign a driver to current_pstate_driver, but still returns 0.  So then the switch statement in amd_pstate_init breaks and then I suspect it crashes at "current_pstate_driver->adjust_perf = amd_pstate_adjust_perf;" just below the switch statement.

If this is the case, maybe adjusting amd_pstate_set_driver to return -ENODEV if cppc_state == AMD_PSTATE_DISABLE might fix the problem.

Here is some of the crash output (manually typed from a photo):
amd_pstate: driver is explicitly disabled
BUG: kernel NULL pointer deference, address: 0000000000000050
...
? __die+0x1a/0x60
? page_fault_oops+0x17c/0x4a0
? _prb_read_valid+0x263/0x2e0
? exc_page_fault+0x33f/0x610
? prb_read_valid+0x12/0x20
? asm_exc_page_fault+0x22/0x30
? amd_pstate_init+0x90/0x260
? amd_pstate_param+0xb0/0xb0
do_one_initcall+0x82/0x2c0
kernel_init_freeable+0x1af/0x260
? rest_init+0xb0/0xb0
kernel_init+0x11/0x1b0
ret_from_fork+0x2b/0x40
? rest_init+0xb0/0xb0
ret_from_fork_asm+0x11/0x20
Comment 1 Matthew Stapleton 2024-05-21 00:41:44 UTC
Actually that fix might interfer with the use of amd_pstate_set_driver in function: amd_pstate_param so perhaps it would be better to retest cppc_state for disabled after the call to amd_pstate_set_driver(CONFIG_X86_AMD_PSTATE_DEFAULT_MODE) or maybe a test if current_pstate_driver=NULL after the switch statement might be better.
Comment 2 Artem S. Tashkinov 2024-05-21 07:18:20 UTC
Perry please take a look.
Comment 3 Perry Yuan(AMD) 2024-05-21 07:19:48 UTC
(In reply to Artem S. Tashkinov from comment #2)
> Perry please take a look.

Sure, let me check the issue on my local system.
Comment 4 Perry Yuan(AMD) 2024-05-21 07:28:05 UTC
(In reply to Matthew Stapleton from comment #0)
> I recently deployed a custom kernel build with X86_AMD_PSTATE_DEFAULT_MODE
> set to 1 (Disabled) to a newer AMD system: AMD Ryzen 3 4100 that supports
> cppc and got a NULL pointer reference crash so I had to add
> amd_pstate=active on that system to get it to boot.  The same kernel boots
> fine on older AMD systems that don't support cppc.
> 
> The kernel is 6.6.22
> 
> This is what I think is happening when no amd_pstate parameter has been
> specified although I haven't tested a patch yet:
> in function: amd_pstate_init the cppc_state starts as AMD_PSTATE_UNDEFINED.
> 
> On older systems that don't support CPPC, !boot_cpu_has(X86_FEATURE_CPPC)
> tests true so "driver load is disabled, boot with specific mode to enable
> this" is printed and -ENODEV is returned.
> 
> On the newer systems that support CPPC , it goes on to "ret =
> amd_pstate_set_driver(CONFIG_X86_AMD_PSTATE_DEFAULT_MODE);" .  Then
> amd_pstate_set_driver sees AMD_PSTATE_DISABLE , prints "driver is explicitly
> disabled" , doesn't assign a driver to current_pstate_driver, but still
> returns 0.  So then the switch statement in amd_pstate_init breaks and then
> I suspect it crashes at "current_pstate_driver->adjust_perf =
> amd_pstate_adjust_perf;" just below the switch statement.
> 
> If this is the case, maybe adjusting amd_pstate_set_driver to return -ENODEV
> if cppc_state == AMD_PSTATE_DISABLE might fix the problem.
> 
> Here is some of the crash output (manually typed from a photo):
> amd_pstate: driver is explicitly disabled
> BUG: kernel NULL pointer deference, address: 0000000000000050
> ...
> ? __die+0x1a/0x60
> ? page_fault_oops+0x17c/0x4a0
> ? _prb_read_valid+0x263/0x2e0
> ? exc_page_fault+0x33f/0x610
> ? prb_read_valid+0x12/0x20
> ? asm_exc_page_fault+0x22/0x30
> ? amd_pstate_init+0x90/0x260
> ? amd_pstate_param+0xb0/0xb0
> do_one_initcall+0x82/0x2c0
> kernel_init_freeable+0x1af/0x260
> ? rest_init+0xb0/0xb0
> kernel_init+0x11/0x1b0
> ret_from_fork+0x2b/0x40
> ? rest_init+0xb0/0xb0
> ret_from_fork_asm+0x11/0x20

Could you try the patch lately posted while I am trying to find a system to reproduce?
https://lore.kernel.org/lkml/9b31fbcdfd4e4f00c3302f45e655aa43589b224c.1715356532.git.perry.yuan@amd.com/

I have changed the mode selection code a bit, it will be great if you can help to try this on your system,  if the issue still can be reproduced, I will need to chang the patch accordingly.
thanks.

Perry.
Comment 5 Matthew Stapleton 2024-05-23 01:44:25 UTC
Thanks for the patch.

After manually applying just "[PATCH v2 10/10] cpufreq: amd-pstate: automatically load pstate driver by default" to 6.6.31, the AMD Ryzen 3 4100 system is no longer crashing at bootup with X86_AMD_PSTATE_DEFAULT_MODE set to 1 (Disabled) and when no amd_pstate option is specified.
Comment 6 Perry Yuan(AMD) 2024-05-23 16:13:53 UTC
(In reply to Matthew Stapleton from comment #5)
> Thanks for the patch.
> 
> After manually applying just "[PATCH v2 10/10] cpufreq: amd-pstate:
> automatically load pstate driver by default" to 6.6.31, the AMD Ryzen 3 4100
> system is no longer crashing at bootup with X86_AMD_PSTATE_DEFAULT_MODE set
> to 1 (Disabled) and when no amd_pstate option is specified.

Thanks for the testing!
glad to know that patch work for you.
Then we can close the ticket looks like. 


Perry

Note You need to log in before you can comment on or make changes to this bug.