112051 – AMD Family 10h/12h Erratum #721

Summary: AMD Family 10h/12h Erratum #721

Status:	RESOLVED INSUFFICIENT_DATA

Alias:	None

Product:	Platform Specific/Hardware
Classification:	Unclassified
Component:	x86-64
Hardware:	x86-64 Linux

Importance:	P1 normal
Assignee:	platform_x86_64@kernel-bugs.osdl.org

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-02-07 01:56 UTC by Kyle Repinski
Modified:	2019-04-21 10:51 UTC
CC List:	2 users

See Also:
Kernel Version:	All
Subsystem:
Regression:	No
Bisected commit-id:

Description Kyle Repinski 2016-02-07 01:56:25 UTC

I noticed that erratum #721 for AMD Family 10h and 12h CPUs can be worked around simply by setting a bit in an MSR, and the workaround does not appear to be applied by the Linux kernel. Other operating systems have included this fix, but I have not found any discussion of it for Linux, so I thought it best to at least make sure it is brought to attention here.

References for this erratum can be found in the AMD Family 10h and 12h Revision Guides, located (respectively) here:
http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf
http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf
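Since the erratum only applies to Family 10h and 12h parts, a first step is checking the CPU family. The following is a minimal userspace sketch, assuming the standard CPUID leaf 1 family encoding (extended family added to base family 0xF) and GCC/Clang's `<cpuid.h>` `__get_cpuid()` helper; the function names are hypothetical. A matching family is only a prerequisite: it does not show that a given crash is caused by this erratum.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: decode the displayed family from CPUID leaf 1 EAX.
   When the base family (bits 11:8) is 0xF, the extended family
   (bits 27:20) is added to it. */
static unsigned cpuid_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xFu;
    unsigned ext = (eax >> 20) & 0xFFu;
    return base == 0xFu ? base + ext : base;
}

/* Hypothetical helper: true for Family 10h or 12h, the families the
   revision guides list for erratum #721. */
static bool family_has_erratum_721(uint32_t eax)
{
    unsigned fam = cpuid_family(eax);
    return fam == 0x10u || fam == 0x12u;
}

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical helper: checks the vendor string and family of the CPU
   this code runs on. */
static bool this_cpu_may_have_erratum_721(void)
{
    unsigned eax, ebx, ecx, edx;
    char vendor[13];

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return false;
    /* The vendor string is returned in EBX, EDX, ECX order. */
    memcpy(vendor, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    if (strcmp(vendor, "AuthenticAMD") != 0)
        return false;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return family_has_erratum_721(eax);
}
#endif
```

For example, a signature of `0x00100F42` decodes to family 10h and `0x00300F10` to family 12h, while `0x00800F11` (family 17h) is not affected.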

It appears to have been discovered by Matthew Dillon in this mailing list post: https://www.dragonflybsd.org/mailarchive/kernel/2011-12/msg00025.html

Here is the summary of the issue, and its workaround, from the above Revision Guides:


Description:
Under a highly specific and detailed set of internal timing conditions, the processor may incorrectly update the stack pointer after a long series of push and/or near-call instructions, or a long series of pop and/or near-return instructions. The processor must be in 64-bit mode for this erratum to occur.

Potential Effect on System:
The stack pointer value jumps by a value of approximately 1024, either in the positive or negative direction. This incorrect stack pointer causes unpredictable program or system behavior, usually observed as a program exception or crash (for example, a #GP or #UD).

Suggested Workaround:
System software may set MSRC001_1029[0] = 1b.

Fix Planned:
No
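The suggested workaround is a read-modify-write that sets bit 0 of MSR 0xC0011029. Below is a minimal sketch, assuming x86-64 Linux with the `msr` driver loaded (`/dev/cpu/N/msr`, where the file offset is the MSR address) and root privileges; the helper names are hypothetical, and an in-kernel fix would instead live in the AMD CPU init code and must be applied on every CPU.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* MSRC001_1029, as named in the revision guides. */
#define MSR_AMD_ERRATUM_721 0xC0011029u

/* Hypothetical helper: value with bit 0 set (MSRC001_1029[0] = 1b). */
static uint64_t erratum_721_set_bit(uint64_t val)
{
    return val | 1ull;
}

/* Hypothetical helper: true if the workaround bit is already set. */
static bool erratum_721_workaround_active(uint64_t val)
{
    return (val & 1ull) != 0;
}

/* Hypothetical helper: read-modify-write on one CPU through the Linux
   msr driver. Returns 0 on success, -1 on any failure. */
static int erratum_721_apply_on_cpu(int cpu)
{
    char path[64];
    uint64_t val;
    int fd;

    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (pread(fd, &val, sizeof val, MSR_AMD_ERRATUM_721) != (ssize_t)sizeof val)
        goto fail;
    if (!erratum_721_workaround_active(val)) {
        val = erratum_721_set_bit(val);
        if (pwrite(fd, &val, sizeof val, MSR_AMD_ERRATUM_721) != (ssize_t)sizeof val)
            goto fail;
    }
    close(fd);
    return 0;
fail:
    close(fd);
    return -1;
}
```

Writing the MSR only when bit 0 is clear keeps the operation idempotent, so running it again (or after firmware has already set the bit) is harmless.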

Comment 1 mirh 2017-11-21 20:10:29 UTC

CCing the maintainer.
(Not sure whether that will annoy anyone.)

Hopefully, having a functional, "perfected" test case already available will make all of this quick.

Comment 2 Borislav Petkov 2017-11-21 20:21:42 UTC

There are a lot of errata fixable by an MSR chicken bit. The question is: is your machine affected by it, and if so, how do you know the problem is caused by this exact erratum?

Comment 3 Borislav Petkov 2019-04-21 10:51:15 UTC

Looks forgotten.