Bug 216211 - LWP feature flag enabled on just half of Bulldozer/Piledriver CPU cores
Summary: LWP feature flag enabled on just half of Bulldozer/Piledriver CPU cores
Status: REOPENED
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: AMD Linux
Importance: P1 low
Assignee: Borislav Petkov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-06 23:24 UTC by Ștefan Talpalaru
Modified: 2022-10-05 16:27 UTC

See Also:
Kernel Version: 5.17.9
Subsystem:
Regression: No
Bisected commit-id:


Attachments
5.17.9 dmesg -T (130.77 KB, text/plain)
2022-07-11 11:22 UTC, Ștefan Talpalaru
5.17.9 .config (145.47 KB, text/plain)
2022-07-11 11:23 UTC, Ștefan Talpalaru
cpuid -r (33.80 KB, text/plain)
2022-07-11 11:23 UTC, Ștefan Talpalaru
test fix (3.07 KB, patch)
2022-08-13 20:34 UTC, Borislav Petkov
Test fix v2 (1.58 KB, patch)
2022-10-05 10:56 UTC, Borislav Petkov

Description Ștefan Talpalaru 2022-07-06 23:24:53 UTC
LWP = Lightweight Profiling: http://developer.amd.com/wordpress/media/2012/10/43724.pdf

On an AMD FX-8320E CPU, the first CPU core in each module has LWP disabled, while the second one has it enabled. This creates problems when using GCC with "-march=native" (which enables "-mlwp" or not, depending on which core it happens to run on) and then mixing the resulting LWP and non-LWP objects: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86007

Diagnostic commands:

$ grep -E '(core id|lwp)' /proc/cpuinfo

core id		: 0
core id		: 1
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
core id		: 2
core id		: 3
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
core id		: 4
core id		: 5
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
core id		: 6
core id		: 7
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

$ cpuid | grep 'lightweight profiling support'
      lightweight profiling support          = false
      lightweight profiling support          = true
      lightweight profiling support          = false
      lightweight profiling support          = true
      lightweight profiling support          = false
      lightweight profiling support          = true
      lightweight profiling support          = false
      lightweight profiling support          = true
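
The same per-CPU check can be scripted. This is a minimal sketch, assuming the usual x86 /proc/cpuinfo layout (one blank-line-separated block per logical CPU, with "processor" and "flags" fields). The function name and the shortened sample text are made up for illustration and are not the reporter's real output.

```python
def cpus_with_flag(cpuinfo_text, flag):
    """Map each logical CPU number to True/False for the given flag."""
    result = {}
    # /proc/cpuinfo separates logical CPUs with blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            flags = fields.get("flags", "").split()
            result[int(fields["processor"])] = flag in flags
    return result


# Hypothetical, heavily shortened sample (illustration only).
SAMPLE = (
    "processor\t: 0\n"
    "core id\t\t: 0\n"
    "flags\t\t: fpu xop skinit wdt fma4\n"
    "\n"
    "processor\t: 1\n"
    "core id\t\t: 1\n"
    "flags\t\t: fpu xop skinit wdt lwp fma4\n"
)

for cpu, has_lwp in sorted(cpus_with_flag(SAMPLE, "lwp").items()):
    print(f"cpu {cpu}: lwp={'yes' if has_lwp else 'no'}")
```

On a real system, the contents of /proc/cpuinfo would be passed in place of SAMPLE.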
Comment 1 Borislav Petkov 2022-07-11 11:00:05 UTC
Upload .config and full dmesg pls.

Also output of "cpuid -r" pls.

I'm assuming this is a bare-metal box, not a guest, etc.

Also, do you have the latest BIOS from your mobo vendor installed?

Thx.
Comment 2 Ștefan Talpalaru 2022-07-11 11:22:43 UTC
Created attachment 301388 [details]
5.17.9 dmesg -T
Comment 3 Ștefan Talpalaru 2022-07-11 11:23:09 UTC
Created attachment 301389 [details]
5.17.9 .config
Comment 4 Ștefan Talpalaru 2022-07-11 11:23:41 UTC
Created attachment 301390 [details]
cpuid -r
Comment 5 Ștefan Talpalaru 2022-07-11 11:25:19 UTC
Yes, this is bare metal. The motherboard has the latest BIOS and the CPU has the latest microcode.
Comment 6 Borislav Petkov 2022-07-12 15:05:29 UTC
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01eb3fff edx=0x2fd3fbff
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01ebbfff edx=0x2fd3fbff
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01eb3fff edx=0x2fd3fbff
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01ebbfff edx=0x2fd3fbff
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01eb3fff edx=0x2fd3fbff
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01ebbfff edx=0x2fd3fbff
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01eb3fff edx=0x2fd3fbff
   0x80000001 0x00: eax=0x00600f20 ebx=0x10000000 ecx=0x01ebbfff edx=0x2fd3fbff

well, the kernel is basically reporting what the hardware (CPUID) says...
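
To see which bit differs between the two ECX values in the dump above, XOR them. The sketch below assumes only that the lines alternate in the order shown; the variable names are illustrative. In CPUID leaf 0x80000001, ECX bit 15 is the LWP feature bit.

```python
ECX_WITHOUT_LWP = 0x01EB3FFF  # odd-numbered lines of the dump above
ECX_WITH_LWP = 0x01EBBFFF     # even-numbered lines of the dump above

# XOR leaves a 1 only where the two values disagree.
diff = ECX_WITHOUT_LWP ^ ECX_WITH_LWP
differing_bits = [bit for bit in range(32) if diff >> bit & 1]

print(hex(diff), differing_bits)
```

Only bit 15 differs, which matches the LWP-only discrepancy seen in /proc/cpuinfo.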

Looking at your BIOS:

[Fri Jul  8 01:55:57 2022] DMI: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2901 05/04/2016

I don't think you'll get a new one.

So unless Mario has a better idea, I don't see what we can do here.

Unless maybe you boot with clearcpuid=lwp and you don't use LWP at all. It'll fix your compilation at least.

HTH.
Comment 7 Mario Limonciello (AMD) 2022-07-12 15:17:40 UTC
I guess there is the possibility that we can flag this CPU and set up a quirk to disable LWP automatically if not all the cores agree (like what that kernel command line option would do).
Comment 8 Borislav Petkov 2022-07-12 15:25:06 UTC
I'd say "this BIOS" because I don't think the CPU is wrong here - BIOS is likely not setting up things properly or so.
Comment 9 Ștefan Talpalaru 2022-07-12 16:52:40 UTC
It's the microcode.

If I disable early microcode loading by disabling CONFIG_MICROCODE_AMD, so that the microcode is no longer updated from 0x06000822 to 0x06000852, LWP is present on all cores.

There's an intriguing piece of info in this 2019 Xen commit message - https://patchwork.kernel.org/project/xen-devel/patch/1558347216-19179-1-git-send-email-andrew.cooper3@citrix.com/ :

"LWP was dropped from Fam15/16 CPUs when IBPB for Spectre v2 was introduced in microcode, owing to LWP not being used in practice."

Could we be looking at a botched feature disabling attempt?
Comment 10 Ștefan Talpalaru 2022-07-12 17:03:31 UTC
> Unless maybe you boot with clearcpuid=lwp and you don't use LWP at all. It'll
> fix your compilation at least.

That would work, since I only need predictable "-march=native" output, not LWP, but adding "clearcpuid=207" to the 5.17.9 command line only disabled the feature flag in /proc/cpuinfo. Both cpuid and gcc still see it on half the cores.
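
For reference, the number 207 comes from the kernel's feature numbering, where each feature is encoded as word * 32 + bit. Word 6 holds the CPUID 0x80000001 ECX flags and LWP is bit 15 there. A small sketch of the arithmetic (the helper name is made up):

```python
def split_feature_number(number):
    """Split a kernel feature number into (capability word, bit)."""
    return divmod(number, 32)


word, bit = split_feature_number(207)
print(f"word {word}, bit {bit}")
```

This gives word 6, bit 15, the same ECX bit that differs between the cores.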

Anyway, my long-term workaround is to pass "-mno-lwp" to gcc, but not knowing what was going on under the hood bothered me.
Comment 11 Borislav Petkov 2022-07-13 09:40:42 UTC
(In reply to Ștefan Talpalaru from comment #10)
> That would work, since I only need predictable "-march=native" output, not
> LWP, but adding "clearcpuid=207" to the 5.17.9 command line only disabled
> the feature flag in /proc/cpuinfo.

Yeah, the feature-name thing is in 5.19.

> Both cpuid and gcc still see it on half the cores.

Yeah, this is correct - it won't help you if your compiler does the detection itself as the kernel cannot clear the CPUID flag. Maybe...

> Anyway, my long-term workaround is to pass "-mno-lwp" to gcc, but not knowing
> what was going on under the hood bothered me.

Yeah, lemme dig more about the ucode thing.

Thx.
Comment 12 Andrew Cooper 2022-07-13 10:21:59 UTC
Can you read the ext feature mask on all cores?

modprobe msr
rdmsr -a 0xc0011005
Comment 13 Borislav Petkov 2022-07-14 08:09:36 UTC
Yap, we might be able to fix the intention of the BIOS fix through those. :-)
Comment 14 Ștefan Talpalaru 2022-07-14 08:34:43 UTC
> rdmsr -a 0xc0011005

1eb3fff2fd3fbff
1ebbfff2fd3fbff
1eb3fff2fd3fbff
1ebbfff2fd3fbff
1eb3fff2fd3fbff
1ebbfff2fd3fbff
1eb3fff2fd3fbff
1ebbfff2fd3fbff
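
These readings line up with the CPUID dump from comment 6: the upper 32 bits of each value equal the ECX value and the lower 32 bits equal the EDX value of leaf 0x80000001. A sketch of the split, using the two distinct readings above (the helper name is illustrative):

```python
def split_msr(value):
    """Split a 64-bit MSR value into (upper 32 bits, lower 32 bits)."""
    return value >> 32, value & 0xFFFFFFFF


READINGS = [0x1EB3FFF2FD3FBFF, 0x1EBBFFF2FD3FBFF]

for value in READINGS:
    ecx_mask, edx_mask = split_msr(value)
    lwp = "set" if ecx_mask >> 15 & 1 else "clear"
    print(f"ecx={ecx_mask:#010x} edx={edx_mask:#010x} lwp={lwp}")
```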
Comment 15 Andrew Cooper 2022-07-14 08:55:53 UTC
Thanks, and now, does `wrmsr -a 0xc0011005 0x1eb3fff2fd3fbff` resolve your issue?
Comment 16 Ștefan Talpalaru 2022-07-14 09:49:33 UTC
> does `wrmsr -a 0xc0011005 0x1eb3fff2fd3fbff` resolve your issue?

Yes, it disables LWP on all cores.
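
The value written here is simply the "LWP enabled" reading with one bit cleared. Assuming the layout suggested by the readings in comment 14 (ECX flags in the upper half of the MSR), the LWP bit sits at position 32 + 15 = 47. A sketch, with made-up names:

```python
LWP_MSR_BIT = 32 + 15  # ECX bit 15, shifted into the upper half of the MSR


def clear_lwp(mask):
    """Return the feature mask with the LWP bit cleared."""
    return mask & ~(1 << LWP_MSR_BIT)


print(hex(clear_lwp(0x1EBBFFF2FD3FBFF)))
```

Applying it to a mask that already has the bit cleared leaves the value unchanged, which is why writing the same value to all cores with `wrmsr -a` is harmless for the cores that never advertised LWP.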
Comment 17 Borislav Petkov 2022-07-14 11:20:50 UTC
Right, I guess we can do that. It probably should go into init_amd_bd() and should be in a separate function, similar to clear_rdrand_cpuid_bit().

Just in case someone else reading here wants to do it... :-)
Comment 18 Andrew Cooper 2022-07-14 22:48:22 UTC
I found a similarly impacted system down the back of the sofa and had a play.  (HP ProLiant DL385 G7 with firmware from 03/19/2012 if we're keeping score :) )

When we load microcode on CPU0, we lose the LWP feature bit (both CPUID and in the mask MSR) and gain the IBPB bit.  This is as expected.

I had a bit of a play, and I can't figure out what else actually got disabled.  All other CPUID details pertaining to LWP remain identical, you can still set %xcr0.lwp and seemingly use the LWP instructions and MSRs, although I stopped short of actually turning it on properly and checking what got written into the buffer.

Then when CPU1 comes up, we read the microcode version, see that it's up to date, and skip the ucode load.

Interestingly, while the LWP bit is still set, the IBPB bit is set too.  So clearly the CPUID data from 0x80000008 (where IBPB lives) is coming from a shared location in the compute unit, whereas CPUID data for 0x80000001 (where LWP lives) is coming from a cluster-local location.

This isn't surprising.  The ability to control some of the CPUID data is a supported feature for hypervisors, and also the reason why I never encountered this bug with Xen.  We context switch the CPUID MSRs per VM, and the heterogeneous levelling logic on boot has caused LWP to fall out of the default profiles.

Anyway, if I force load microcode on CPU1 (i.e. ignore the version check), then the LWP CPUID bit does hide itself.

I think we probably want to fix this by disregarding the version check for non-primary cluster members, because who knows what other per-CPU state the microcode (didn't) fix up.
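
The sequence described above can be mimicked with a toy model. This is only an illustration of the reasoning in this comment, not real hardware or kernel behaviour: the class, its fields, and the loader function are all hypothetical, and the revision numbers are the ones quoted in comment 9.

```python
class Module:
    """Toy model of one two-core compute unit."""

    def __init__(self, revision):
        self.revision = revision   # microcode revision, shared by both cores
        self.ibpb = False          # CPUID state shared across the module
        self.lwp = [True, True]    # CPUID state kept per core

    def apply_patch(self, core, new_revision):
        self.revision = new_revision
        self.ibpb = True           # shared state: visible on both cores
        self.lwp[core] = False     # per-core state: only this core changes


def load_microcode(module, core, new_revision, force=False):
    """Apply the patch unless the revision check says it is up to date."""
    if not force and module.revision >= new_revision:
        return False               # skipped: revision already current
    module.apply_patch(core, new_revision)
    return True


module = Module(0x06000822)
load_microcode(module, 0, 0x06000852)              # core 0: patch applied
load_microcode(module, 1, 0x06000852)              # core 1: skipped
print(module.lwp, module.ibpb)                     # LWP differs per core
load_microcode(module, 1, 0x06000852, force=True)  # ignore the version check
print(module.lwp, module.ibpb)
```

In this model the second core keeps its LWP bit until the load is forced, which mirrors the observation that skipping the version check hides LWP everywhere.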
Comment 19 Borislav Petkov 2022-08-13 20:33:50 UTC
Does the attached, only build-tested patch fix your issue?

Thx.
Comment 20 Borislav Petkov 2022-08-13 20:34:35 UTC
Created attachment 301560 [details]
test fix

Remove ucode revision checking.
Comment 21 Ștefan Talpalaru 2022-08-13 23:42:13 UTC
Yes, the patch disables LWP on all cores. Tested with a Git HEAD kernel.
Comment 22 Borislav Petkov 2022-08-14 10:24:22 UTC
Thanks for testing, lemme do a proper patch.
Comment 23 Svyatko 2022-08-16 12:03:15 UTC
I think this problem exists only with the AMD Bulldozer (Family 15h) microarchitecture https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
with "Clustered Multithreading" (CMT), which is distinct from hyper-threading https://en.wikipedia.org/wiki/Hyper-threading

AMD calls this design a "Module".
1 module = 2 integer cores + 1 shared FPU.
Comment 24 Ștefan Talpalaru 2022-08-16 12:09:15 UTC
> only with AMD Bulldozer Family

Bulldozer, Piledriver, Steamroller and Excavator are all affected.
Comment 25 Svyatko 2022-08-16 12:25:49 UTC
(In reply to Ștefan Talpalaru from comment #24)
> > only with AMD Bulldozer Family
> 
> Bulldozer, Piledriver, Steamroller and Excavator are all affected.

Yes, whole AMD "Heavy equipment" family (Bulldozer-based).

Maybe there would be some benefit in using the previous code path for multi-core, hyper-threaded AMD Zen-based CPUs?
Comment 26 Borislav Petkov 2022-08-16 12:33:36 UTC
Fix on its way upstream:

df76acb227a0 ("x86/microcode/AMD: Attempt applying on every logical thread")

We're done here.
Comment 27 Borislav Petkov 2022-08-18 14:00:16 UTC
Turns out this is not as easy - I've retracted this fix as it is a bit lacking in some situations. A better solution is forthcoming.

Thx.
Comment 28 Borislav Petkov 2022-10-05 10:55:29 UTC
Ok, I have had some time finally to do a second version of the fix. Pls give it a try.

Thx.
Comment 29 Borislav Petkov 2022-10-05 10:56:16 UTC
Created attachment 302939 [details]
Test fix v2

Second version of the fix, just early patching.
Comment 30 Ștefan Talpalaru 2022-10-05 16:06:44 UTC
Second patch tested and working as advertised. LWP is now disabled on all cores.
Comment 31 Borislav Petkov 2022-10-05 16:27:28 UTC
Thanks, I'll add your Tested-by and queue it along with a couple more fixes.
