Bug 207979 - kernel_fpu_begin() does not set mxcsr value
Summary: kernel_fpu_begin() does not set mxcsr value
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64 (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Borislav Petkov
: 206987 (view as bug list)
Depends on:
Reported: 2020-05-29 17:39 UTC by Petteri Aimonen
Modified: 2022-01-06 18:39 UTC (History)
8 users (show)

See Also:
Kernel Version: 5.7.0-rc7 (75caf310d16)
Regression: No
Bisected commit-id:

Userland program for setting mxcsr (513 bytes, text/x-csrc)
2020-05-29 17:39 UTC, Petteri Aimonen
Dmesg from crash, with MXCSR debug info (94.74 KB, text/plain)
2020-05-29 17:40 UTC, Petteri Aimonen
Patch for printing extra debug info in simd exception (972 bytes, patch)
2020-05-29 17:40 UTC, Petteri Aimonen
Details | Diff
Possible fix, by setting mxcsr to default value (586 bytes, patch)
2020-05-29 17:48 UTC, Petteri Aimonen
Details | Diff
save/restore mxcsr (1.72 KB, patch)
2020-06-01 11:49 UTC, Borislav Petkov
Details | Diff
Proposed patch with selftest (9.38 KB, patch)
2020-06-02 08:55 UTC, Petteri Aimonen
Details | Diff
Proposed patch with selftest, v2 (9.49 KB, patch)
2020-06-12 06:31 UTC, Petteri Aimonen
Details | Diff

Description Petteri Aimonen 2020-05-29 17:39:38 UTC
Created attachment 289417 [details]
Userland program for setting mxcsr

While debugging crash with amdgpu driver ( https://gitlab.freedesktop.org/drm/amd/-/issues/1154 ), I appear to have found a more general bug in kernel_fpu_begin().

What seems to happen is the kernel_fpu_begin() does not reset the mxcsr value, instead leaving it at whatever value the user mode thread has been using. This has the potential to disturb any floating point operations in the kernel and causing FPU/SIMD exceptions if the mask bits have been changed.

I have been able to reproduce the problem as follows:

1. Have a kernel module that uses floating point operations. In my case, amdgpu.

2. Compile attached mxcsr_test.c: gcc -Wall mxcsr_test.c

3. Run one copy per CPU thread, using a normal unprivileged user account: ./a.out & ./a.out & ... & ./a.out

4. Do something that causes floating point operations in kernel, in my case graphic activity caused by browsing with Firefox works best.

5. Within an hour, the kernel will crash with "simd exception".

In 0001-kernel-traps-print-mxcsr.txt I have added extra debug to print the MXCSR value.
In dmesg_5.7.0_mxcsr_crash.txt it can be seen that the MXCSR value in kernel is the same as is set by mxcsr_test.c.
Comment 1 Petteri Aimonen 2020-05-29 17:40:15 UTC
Created attachment 289419 [details]
Dmesg from crash, with MXCSR debug info
Comment 2 Petteri Aimonen 2020-05-29 17:40:54 UTC
Created attachment 289421 [details]
Patch for printing extra debug info in simd exception
Comment 3 Petteri Aimonen 2020-05-29 17:48:56 UTC
Created attachment 289423 [details]
Possible fix, by setting mxcsr to default value
Comment 4 Borislav Petkov 2020-06-01 11:48:47 UTC
(In reply to Petteri Aimonen from comment #3)
> Possible fix, by setting mxcsr to default value

Pretty much. But with some saving and restoring of the MXCSR. Pls try the following one.

Comment 5 Borislav Petkov 2020-06-01 11:49:55 UTC
Created attachment 289447 [details]
save/restore mxcsr
Comment 6 Petteri Aimonen 2020-06-01 11:55:10 UTC
I can try that patch. But is the separate save & restore of MXCSR necessary, considering copy_fpregs_to_fpstate(&current->thread.fpu); uses fxsaveq that already saves mxcsr?

It seems to me the difference would be only in the MXCSR value that is active between kernel_fpu_end() and the context switch back to userland. There shouldn't be any float operations between those, and if there accidentally are, using the kernel MXCSR value would be safer.
Comment 7 Borislav Petkov 2020-06-01 15:08:27 UTC
Well, depends. It can use XSAVE too:

        if (likely(use_xsave())) {

not that it matters tho - all XSAVE* save MXCSR too. Lemme add two more gents to check.

Dave, Sebastian, do you guys think that it would be enough to unconditionally overwrite MXCSR in kernel_fpu_begin() if userspace has unmasked the SIMD exception bits and thus kernel code using SIMD insns can end up in an SIMD exception?

Comment 8 Dave Hansen 2020-06-01 16:09:40 UTC
I agree that we can just "clobber" the bad user MXCSR value with the good kernel value and leave it.  All the rest of the XSAVE/FPU state is clobbered there anyway, and gets restored before going back to userspace, so I think MXCSR can just do the same thing.

It would also be nice to go double-check the SDM for anything else that can cause SIMD/FP faults that we might have missed.  MXCSR is certainly the worst offender, but I also wonder if there are any others we might have missed.

This also seems like something we need a selftest for.
Comment 9 H. Peter Anvin 2020-06-01 17:53:33 UTC
This would seem to be the only sensible option to me. If it was *only* the trap bits that would be one thing (the #XF handler could detect that case and adjust MXCSR at trap time, saving a few cycles), but user space could have mucked with the various silent bits as well, changing the semantics of kernel code.
Comment 10 Borislav Petkov 2020-06-01 18:43:03 UTC

Petteri you wanna give producing a proper patch a try, along with a selftest and explaining the issue properly in the commit message?

Comment 11 Petteri Aimonen 2020-06-01 18:46:45 UTC
@Borislav I can certainly try :)
Comment 12 Borislav Petkov 2020-06-01 18:50:17 UTC
Ok, if you have any questions, feel free to switch to mail so that we don't pollute bugzilla with modalities and I'll answer them if I can.

Comment 13 Petteri Aimonen 2020-06-02 08:55:57 UTC
Created attachment 289467 [details]
Proposed patch with selftest

Here is a properly formatted patch, including a selftest.

Compiling floating point code for kernel modules requires extra CFLAGS. I have chosen to add FPU_CFLAGS define to arch/x86/Makefile, using the same logic as amdgpu currently uses in multiple Makefiles. Eventually amdgpu could be changed to use the global FPU_CFLAGS also.

I did a cursory check if the Intel SDM for other potentially problematic registers:

- MPX bound registers: MPX support has been purposely removed from kernel.
- SSE2 EFLAGS register: is not retained between function calls, so code will not expect any particular value
- x87 legacy control word register: for pre-SSE CPUs, this is initialized to sane default by FNSAVE instruction. Potentially problematic if kernel code would use legacy x87 instructions on a cpu with X86_FEATURE_XSAVE.
Comment 14 Petteri Aimonen 2020-06-12 06:31:56 UTC
Created attachment 289635 [details]
Proposed patch with selftest, v2

Here is an updated version of the patch, incorporating the feedback from mail chain:

- Actual fix is unchanged.
- Moved the test module trigger file under debugfs.
- Moved self-test userland part out of x86 folder as it is portable.
- Minor formatting fixes.
- Updated patch to apply cleanly against newest upstream.
Comment 15 Borislav Petkov 2020-06-12 08:18:50 UTC
Please send it to lkml and Cc all the people on the thread so that it can get discussed properly. bugzilla is not for dealing with patches.

Comment 16 Petteri Aimonen 2020-06-12 13:17:18 UTC
Sorry, I'm unfamiliar with the kernel patch process and this is getting more complex than I have time available currently. As such, I will just leave the patch here and run a custom kernel for now. I hope the fix will eventually make in, but this is as far as I can take this now.
Comment 17 Borislav Petkov 2020-06-12 13:21:05 UTC
Ok, no problem. I'll take care of it and Cc you.

Thanks for taking the time and responding to feedback.
Comment 18 Alex Deucher 2020-07-15 16:16:45 UTC
Boris, are you still planning to send this out?
Comment 19 Borislav Petkov 2020-07-16 09:56:05 UTC
7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()") is already in v5.8-rc4.
Comment 20 Cyrax 2020-07-23 01:47:16 UTC
*** Bug 206987 has been marked as a duplicate of this bug. ***
Comment 21 krakopo 2020-08-24 14:25:12 UTC
I am still hitting this problem even with the patch (7ad816762f9b) applied. For details please see my comments in bug 206987 here: https://bugzilla.kernel.org/show_bug.cgi?id=206987#c33
Comment 22 Borislav Petkov 2020-09-06 23:12:37 UTC
(In reply to krakopo from comment #21)
> I am still hitting this problem even with the patch (7ad816762f9b) applied.
> For details please see my comments in bug 206987 here:
> https://bugzilla.kernel.org/show_bug.cgi?id=206987#c33

Does this always happen only when you run a KVM guest and not otherwise?
Comment 23 krakopo 2020-09-08 16:59:36 UTC
@Borislav Yes that was the case. I have recently moved up to kernel 5.8.5 and so far I have not been able to reproduce this even when using KVM.
Comment 24 Borislav Petkov 2020-09-08 18:14:37 UTC

Ok, if you can reproduce it, do let me know how so that I can try it here.

Comment 25 Borislav Petkov 2022-01-06 18:39:09 UTC
Looks forgotten. Feel free to reopen if still of interest.

Note You need to log in before you can comment on or make changes to this bug.