I noticed that erratum #721 for AMD Family 10h and 12h CPUs can be worked around by setting a single MSR bit, and the workaround does not appear to be applied in the Linux kernel. Other operating systems have included this fix, but I have not found any discussion of it for Linux, so I thought it best to at least bring it to attention here.

The erratum is documented in the AMD Family 10h and 12h Revision Guides, respectively:
http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf
http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf

It appears to have been discovered by Matthew Dillon, as described in this mailing list post:
https://www.dragonflybsd.org/mailarchive/kernel/2011-12/msg00025.html

Here is the summary of the issue and its fix from the Revision Guides above:

Description: Under a highly specific and detailed set of internal timing conditions, the processor may incorrectly update the stack pointer after a long series of push and/or near-call instructions, or a long series of pop and/or near-return instructions. The processor must be in 64-bit mode for this erratum to occur.

Potential Effect on System: The stack pointer value jumps by a value of approximately 1024, either in the positive or negative direction. This incorrect stack pointer causes unpredictable program or system behavior, usually observed as a program exception or crash (for example, a #GP or #UD).

Suggested Workaround: System software may set MSRC001_1029[0] = 1b.

Fix Planned: No
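For anyone who wants to try the suggested workaround from userspace before a kernel fix exists, here is a minimal sketch, assuming the Linux `msr` driver is loaded (`modprobe msr`) and the script runs as root. The driver exposes each CPU as `/dev/cpu/N/msr`, where the file offset is the MSR address and each access transfers 8 bytes. The helper names (`with_workaround`, `read_msr`, `write_msr`, `apply_to_all_cpus`) are hypothetical, chosen for illustration only. Writes may also be refused if kernel lockdown or the `msr.allow_writes` setting forbids them.

```python
import glob
import os
import struct

# MSR address and bit from the erratum #721 workaround text: MSRC001_1029[0] = 1b.
MSR_ERRATUM_721 = 0xC0011029
ERRATUM_721_BIT = 1 << 0


def with_workaround(value: int) -> int:
    """Return the MSR value with bit 0 set, leaving all other bits untouched."""
    return value | ERRATUM_721_BIT


def read_msr(path: str, msr: int) -> int:
    fd = os.open(path, os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR address; 8 bytes, little-endian on x86.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def write_msr(path: str, msr: int, value: int) -> None:
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)


def apply_to_all_cpus() -> None:
    # The bit is per core, so every logical CPU must be updated.
    for path in sorted(glob.glob("/dev/cpu/*/msr")):
        old = read_msr(path, MSR_ERRATUM_721)
        new = with_workaround(old)
        if new != old:
            write_msr(path, MSR_ERRATUM_721, new)
        print(f"{path}: {old:#018x} -> {new:#018x}")


if __name__ == "__main__":
    apply_to_all_cpus()
```

A read-modify-write is used instead of writing a constant so that the other bits in that MSR keep whatever value firmware set. Note that a userspace tweak like this would have to be reapplied after every boot, CPU hotplug, and resume, which is why a kernel-side fix would be preferable.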
CCing the maintainer (not sure whether that will annoy them). Hopefully, already having a working, polished test case will make all of this quick.
There are a lot of errata that can be fixed with an MSR chicken bit. The question is: is your machine affected by this one, and if so, how do you know the problem is caused by this exact erratum?
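A first step toward answering the "is your machine affected" part is checking whether the CPU belongs to one of the families the revision guides cover. This is a minimal sketch that parses `/proc/cpuinfo` text, assuming Linux reports the family in decimal (16 for 10h, 18 for 12h). The function names are hypothetical. It only checks vendor and family: the revision guides list specific affected revisions, and a match says nothing about whether a given crash was actually caused by erratum #721.

```python
# Families named in the erratum #721 revision guides (10h and 12h).
AFFECTED_FAMILIES = {0x10, 0x12}


def parse_first_cpu(cpuinfo: str) -> dict:
    """Collect key/value fields of the first processor block in /proc/cpuinfo text."""
    fields = {}
    for line in cpuinfo.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), value.strip())
    return fields


def family_is_listed(cpuinfo: str) -> bool:
    """True if the first CPU is an AMD part from family 10h or 12h."""
    fields = parse_first_cpu(cpuinfo)
    if fields.get("vendor_id") != "AuthenticAMD":
        return False
    try:
        family = int(fields.get("cpu family", ""))
    except ValueError:
        return False
    return family in AFFECTED_FAMILIES


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(family_is_listed(f.read()))
```

Tying an actual crash to this erratum would still need more evidence, for example a stack pointer that is off by roughly 1024 at the faulting instruction, as the "Potential Effect on System" text describes.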
Looks forgotten.