Bug 112051

Summary: AMD Family 10h/12h Erratum #721
Product: Platform Specific/Hardware
Reporter: Kyle Repinski (repinski23)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED INSUFFICIENT_DATA
Severity: normal
CC: bp, mirh
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: All
Subsystem:
Regression: No
Bisected commit-id:

Description Kyle Repinski 2016-02-07 01:56:25 UTC
I noticed that an erratum (#721) for AMD Family 10h and 12h CPUs can be fixed simply by modifying an MSR, and it does not appear to be patched in the Linux kernel. Other OSes have included this fix, but I have not found any discussion of it for Linux, so I thought it best to at least raise awareness of it here.

The reference for this erratum can be found in the AMD Family 10h and 12h Revision Guides, located (respectively) here:
http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf
http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf

It appears to have been discovered by Matthew Dillon in this mailing list post: https://www.dragonflybsd.org/mailarchive/kernel/2011-12/msg00025.html

Here is the summary of the issue, and its fix, from the above Revision Guides:


Description:
Under a highly specific and detailed set of internal timing conditions, the processor may incorrectly update the stack pointer after a long series of push and/or near-call instructions, or a long series of pop and/or near-return instructions. The processor must be in 64-bit mode for this erratum to occur.

Potential Effect on System:
The stack pointer value jumps by a value of approximately 1024, either in the positive or negative direction. This incorrect stack pointer causes unpredictable program or system behavior, usually observed as a program exception or crash (for example, a #GP or #UD).

Suggested Workaround:
System software may set MSRC001_1029[0] = 1b.

Fix Planned:
No
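
The suggested workaround amounts to a read-modify-write of a single MSR bit. As a rough sketch only (not a kernel patch), the following Python inspects that bit from user space through the Linux `msr` driver (`/dev/cpu/N/msr`, which needs the `msr` module loaded and root privileges). The function names and the constant name are made up for illustration; only the MSR address C001_1029h and bit 0 come from the Revision Guides.

```python
import os
import struct

# MSR address and bit taken from the erratum text: MSRC001_1029[0].
MSR_ERRATUM_721 = 0xC0011029   # hypothetical constant name
WORKAROUND_BIT = 1 << 0


def workaround_applied(value):
    """Return True if bit 0 is already set in the given MSR value."""
    return bool(value & WORKAROUND_BIT)


def with_workaround(value):
    """Return the MSR value with bit 0 set and all other bits unchanged."""
    return value | WORKAROUND_BIT


def read_msr(cpu, msr):
    """Read a 64-bit MSR via the msr driver (file offset = MSR address)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def write_msr(cpu, msr, value):
    """Write a 64-bit MSR via the msr driver. MSRs are per-core state."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)
```

On a Family 10h/12h machine, `workaround_applied(read_msr(0, MSR_ERRATUM_721))` would show whether something has already set the bit on CPU 0. Writing `with_workaround(...)` back would have to be repeated on every core and would not survive a reboot, which is why a fix in the kernel's CPU setup code is the more natural place for it.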
Comment 1 mirh 2017-11-21 20:10:29 UTC
CCing the maintainer.
(I don't know whether this will annoy anyone.)

Hopefully, already having a functional, "perfected" test case will make all of this quick.
Comment 2 Borislav Petkov 2017-11-21 20:21:42 UTC
There are a lot of errata fixable by an MSR chicken bit. The question is: is your machine affected by it and, if so, how do you know it is caused by this exact erratum?
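
A first, partial answer to the "is your machine affected" question is whether the CPU is in the erratum's scope at all. The sketch below parses `/proc/cpuinfo` text and checks for an AMD part of Family 10h or 12h (reported in decimal as 16 or 18). The helper names are hypothetical, and a match only shows that the erratum could apply; it says nothing about whether a given crash was caused by it.

```python
AFFECTED_FAMILIES = {0x10, 0x12}  # Family 10h and 12h, per the Revision Guides


def parse_cpuinfo(text):
    """Extract (vendor_id, cpu family) from the first processor entry."""
    vendor, family = "", -1
    for line in text.splitlines():
        key, sep, val = line.partition(":")
        if not sep:
            continue
        key, val = key.strip(), val.strip()
        if key == "vendor_id" and not vendor:
            vendor = val
        elif key == "cpu family" and family < 0:
            family = int(val)  # /proc/cpuinfo prints the family in decimal
    return vendor, family


def in_erratum_scope(text):
    """True if the CPU is an AMD Family 10h or 12h part."""
    vendor, family = parse_cpuinfo(text)
    return vendor == "AuthenticAMD" and family in AFFECTED_FAMILIES


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print("in scope of erratum 721:", in_erratum_scope(f.read()))
```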
Comment 3 Borislav Petkov 2019-04-21 10:51:15 UTC
Looks forgotten.