Bug 215841
Summary: | MSR_TSX_FORCE_ABORT not present during resume | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | kyle.d.pelton |
Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | ashok.raj, bp, pawan.kumar.gupta, pmenzel+bugzilla.kernel.org |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | v5.18-rc2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dell-7380 cml system call trace after s3 resume
whl_u dmesg dell-p3520 dmesg untested test patch |
Reverting the commit fixes the issue. 0x10f is MSR_TSX_FORCE_ABORT. And that's weird because if that MSR is not there, it should not be marked as valid and we should not write it on resume. Unless the microcode is playing tricks on us again. Are you updating microcode before suspending? Are you willing to run a debug patch and send me full dmesg? Thx. Created attachment 300765 [details]
whl_u dmesg
Created attachment 300766 [details]
dell-p3520 dmesg
No we are not updating the microcode. I'll run debug patches on the system if you have one. This CPU does support the MSR_TSX_FORCE_ABORT as enumerated by CPUID.7.0.EDX[13]=1. # cpuid -r -1 -l 7 CPU: 0x00000007 0x00: eax=0x00000000 ebx=0x029c67af ecx=0x00000000 edx=0xbc002e00 # rdmsr -a 0x10f 0 0 0 0 0 0 0 0 As the failing MSR is supported by the system but likely the microcode has not made it available yet at MSR restore location during resume. It is important to restore these MSRs. A possible workaround is to restore them at a later point in resume process. Working on it. Looking at his dmesg, it has, in that order: 1. the warn from the failed MSR write down the x86_acpi_suspend_lowlevel() path where msr_restore_context() happens, in __restore_processor_state(). 2. a bit later: ... [ 28.704447] PM: Calling mce_syscore_resume+0x0/0x30 [ 28.704481] PM: Calling mc_bp_resume+0x0/0x50 and this is where the microcode on the BSP is reloaded. So, if I can trust the timestamps, I guess the microcode reloading should happen a lot earlier, somewhere up in __restore_processor_state()... On Fri, Apr 15, 2022 at 11:31:37PM +0000, bugzilla-daemon@kernel.org wrote: >https://bugzilla.kernel.org/show_bug.cgi?id=215841 > >--- Comment #7 from Borislav Petkov (bp@alien8.de) --- >Looking at his dmesg, it has, in that order: > >1. the warn from the failed MSR write down the x86_acpi_suspend_lowlevel() >path >where msr_restore_context() happens, in __restore_processor_state(). > >2. a bit later: > >... >[ 28.704447] PM: Calling mce_syscore_resume+0x0/0x30 >[ 28.704481] PM: Calling mc_bp_resume+0x0/0x50 > >and this is where the microcode on the BSP is reloaded. > >So, if I can trust the timestamps, I guess the microcode reloading should >happen a lot earlier, somewhere up in __restore_processor_state()... Yes, microcode is getting loaded after __restore_processor_state(). Confirmed with a debug build that a call to mc_bp_resume() just before msr_restore_context() fixes the issue. Not sure if this is a right solution because now microcode loading will be attempted twice in resume path. Not registering mc_bp_resume() with syscore is also not an option as its needed for other use cases: 1 1188 kernel/kexec_core.c <<kernel_kexec>> syscore_resume(); 2 337 kernel/power/hibernate.c <<create_image>> syscore_resume(); 3 506 kernel/power/hibernate.c <<resume_target_kernel>> syscore_resume(); 4 608 kernel/power/hibernate.c <<hibernation_platform_enter>> syscore_resume(); 5 444 kernel/power/suspend.c <<suspend_enter>> syscore_resume(); (In reply to Pawan Gupta from comment #8) > Not sure if this is a right solution because now microcode loading will > be attempted twice in resume path. Twice is not the problem - it checks the patchlevel before applying and the second time won't do anything. > Not registering mc_bp_resume() with > syscore is also not an option as its needed for other use cases: Which means that MSR restore must depend on microcode resume because some of the MSRs are "emulated" by microcode. Which then means, the MSR restoring must happen *every time* mc_bp_resume() runs and *after* it has ran. Unfortunately, there's no way such a dependency can be expressed with the syscore functionality. At least I don't see it. Alternatively, we could do a callback similar to microcode_check() where we run msr_restore_context()... On Sat, Apr 16, 2022 at 01:21:17PM +0000, bugzilla-daemon@kernel.org wrote: >https://bugzilla.kernel.org/show_bug.cgi?id=215841 > >--- Comment #9 from Borislav Petkov (bp@alien8.de) --- >(In reply to Pawan Gupta from comment #8) >> Not sure if this is a right solution because now microcode loading will >> be attempted twice in resume path. > >Twice is not the problem - it checks the patchlevel before applying and the >second time won't do anything. > >> Not registering mc_bp_resume() with >> syscore is also not an option as its needed for other use cases: > >Which means that MSR restore must depend on microcode resume because some of >the MSRs are "emulated" by microcode. > >Which then means, the MSR restoring must happen *every time* mc_bp_resume() >runs and *after* it has ran. > >Unfortunately, there's no way such a dependency can be expressed with the >syscore functionality. At least I don't see it. Right, IIUC syscore would resume in the order entities were registered, which can't be relied on when we have to maintain strict ordering. >Alternatively, we could do a callback similar to microcode_check() where we >run >msr_restore_context()... Hmm, when CONFIG_MICROCODE=n there needs to be a fallback path for msr_restore_context(). For cases when microcode support is disabled, we would still need to restore the MSRs. I believe microcode loading should be done pretty early, and the right place to load it is during restore_processor_context(). If I am not mistaken, syscore_resume() is always by preceded by restore_processor_context(). I am more inclined towards your previous suggestion of loading the microcode in restore_processor_context(). Adding Ashok Raj and Rafael Wysocki if they have any opinion on this. I guess we could do something like the attached and untested patch. Untested because I need to bisect a triplefault on current tip/master first. ;-\ Created attachment 300768 [details]
untested test patch
update microcode after processor state has been restored
On Mon, Apr 18, 2022 at 10:25:04AM +0000, bugzilla-daemon@kernel.org wrote: >https://bugzilla.kernel.org/show_bug.cgi?id=215841 > >--- Comment #11 from Borislav Petkov (bp@alien8.de) --- >I guess we could do something like the attached and untested patch. Similar to your test patch I earlier tested below change to be working: @@ -262,6 +264,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) x86_platform.restore_sched_clock_state(); mtrr_bp_restore(); perf_restore_debug_store(); + mc_bp_resume(); msr_restore_context(ctxt); c = &cpu_data(smp_processor_id()); I will share the test build with your patch for Kyle to also verify. |
Created attachment 300764 [details] dell-7380 cml system call trace after s3 resume The Call trace is seen after resume from S3 after commit e2a1256b17b16f9b9adf1b6fea56819e7b68e463 on v5.18-rc2. It reproduced on 3 of our systems. Ive attached the log for the dell 7380 cmlu log. [ 28.703345] unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) at rIP: 0xffffffff9b2819e4 (native_write_msr+0x4/0x20) [ 28.703357] Call Trace: [ 28.703359] <TASK> [ 28.703361] ? restore_processor_state+0x255/0x2d0 [ 28.703369] x86_acpi_suspend_lowlevel+0x11f/0x170 [ 28.703374] acpi_suspend_enter+0x4f/0x1f0 [ 28.703379] suspend_devices_and_enter+0x6e0/0x7d0 [ 28.703384] pm_suspend.cold+0x35c/0x3a7 [ 28.703388] state_store+0x81/0xe0 [ 28.703392] kobj_attr_store+0x12/0x20 [ 28.703396] sysfs_kf_write+0x3f/0x50 [ 28.703399] kernfs_fop_write_iter+0x13b/0x1d0 [ 28.703403] new_sync_write+0x101/0x180 [ 28.703408] vfs_write+0x217/0x2a0 [ 28.703413] ksys_write+0x67/0xe0 [ 28.703417] __x64_sys_write+0x1a/0x20 [ 28.703421] do_syscall_64+0x3b/0x90 [ 28.703426] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 28.703429] RIP: 0033:0x7fda13c260a7 [ 28.703434] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 28.703437] RSP: 002b:00007fffa4060268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 28.703441] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fda13c260a7 [ 28.703443] RDX: 0000000000000004 RSI: 000055a41f65a570 RDI: 0000000000000004 [ 28.703444] RBP: 000055a41f65a570 R08: 0000000000000000 R09: 0000000000000004 [ 28.703446] R10: 000055a41f0cc2a6 R11: 0000000000000246 R12: 0000000000000004 [ 28.703447] R13: 000055a41f657510 R14: 00007fda13d014a0 R15: 00007fda13d008a0 [ 28.703451] </TASK> [ 28.703454] Call Trace: [ 28.703454] <TASK> [ 28.703455] ? restore_processor_state+0x255/0x2d0 [ 28.703459] x86_acpi_suspend_lowlevel+0x11f/0x170 [ 28.703463] acpi_suspend_enter+0x4f/0x1f0 [ 28.703468] suspend_devices_and_enter+0x6e0/0x7d0 [ 28.703471] pm_suspend.cold+0x35c/0x3a7 [ 28.703475] state_store+0x81/0xe0 [ 28.703477] kobj_attr_store+0x12/0x20 [ 28.703480] sysfs_kf_write+0x3f/0x50 [ 28.703483] kernfs_fop_write_iter+0x13b/0x1d0 [ 28.703486] new_sync_write+0x101/0x180 [ 28.703491] vfs_write+0x217/0x2a0 [ 28.703495] ksys_write+0x67/0xe0 [ 28.703499] __x64_sys_write+0x1a/0x20 [ 28.703503] do_syscall_64+0x3b/0x90 [ 28.703506] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 28.703509] RIP: 0033:0x7fda13c260a7 [ 28.703511] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 28.703514] RSP: 002b:00007fffa4060268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 28.703516] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fda13c260a7 [ 28.703518] RDX: 0000000000000004 RSI: 000055a41f65a570 RDI: 0000000000000004 [ 28.703519] RBP: 000055a41f65a570 R08: 0000000000000000 R09: 0000000000000004 [ 28.703520] R10: 000055a41f0cc2a6 R11: 0000000000000246 R12: 0000000000000004 [ 28.703522] R13: 000055a41f657510 R14: 00007fda13d014a0 R15: 00007fda13d008a0 [ 28.703525] </TASK>