Bug 213301

Summary: kernel 5.13-rc4: arch/x86/kernel/alternative: Not a NOP at 0xdb920966
Product: Other
Reporter: Richard Narron (richard)
Component: Other
Assignee: other_other
Status: RESOLVED CODE_FIX
Severity: normal
CC: a.p.zijlstra, bp, richard
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg for Not a NOP bug running x86 kernel on x86_64 hardware
dmesg for x86_64 kernel shows no errors on same hardware
config-huge-5.13.0-rc4-smp
proposed patch

Description Richard Narron 2021-05-31 19:16:04 UTC
Created attachment 297079 [details]
dmesg for Not a NOP bug running x86 kernel on x86_64 hardware

The following warning appears when running an x86 kernel 5.13-rc4 smp on an x86_64 machine:

[    0.226073] ------------[ cut here ]------------
[    0.226074] Not a NOP at 0xdb920966
[    0.226077] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops+0xe6/0x140
[    0.226085] Modules linked in:
[    0.226088] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc4-smp #1
[    0.226091] Hardware name: Hewlett-Packard HP Z400 Workstation/0B4Ch, BIOS 786G3 v03.61 03/05/2018
[    0.226094] EIP: optimize_nops+0xe6/0x140
[    0.226097] Code: 74 26 00 90 8b 45 f0 64 2b 05 d8 c6 16 dd 75 68 8d 65 f4 5b 5e 5f 5d c3 50 68 92 4b c3 dc c6 05 8c 67 01 dd 01 e8 2e 92 ec 00 <0f> 0b 58 5a eb d4 8b b5 7c ff ff ff 8b 8d 78 ff ff ff 29 da 89 4d
[    0.226102] EAX: 00000017 EBX: 0000002b ECX: 00000000 EDX: 00000000
[    0.226104] ESI: 246df6c7 EDI: db92093a EBP: dcdefe08 ESP: dcdefd78
[    0.226107] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210292
[    0.226110] CR0: 80050033 CR2: ff9ff000 CR3: 1d18c000 CR4: 000006f0
[    0.226112] Call Trace:
[    0.226115]  ? hv_query_ext_cap+0xa6/0x100
[    0.226118]  ? hv_query_ext_cap+0xa5/0x100
[    0.226121]  ? hv_query_ext_cap+0xab/0x100
[    0.226123]  ? hv_query_ext_cap+0xa6/0x100
[    0.226126]  ? hv_query_ext_cap+0xa5/0x100
[    0.226128]  ? hv_query_ext_cap+0xb4/0x100
[    0.226130]  ? hv_query_ext_cap+0xa6/0x100
[    0.226133]  ? apply_alternatives+0x166/0x310
[    0.226135]  ? hv_query_ext_cap+0x7a/0x100
[    0.226138]  ? xfs_reflink_find_shared+0x10/0xb0
[    0.226143]  ? prb_read_valid+0x1e/0x30
[    0.226147]  ? console_unlock+0x2d6/0x4a0
[    0.226150]  ? notify_die+0x7a/0xa0
[    0.226154]  ? exc_general_protection+0x280/0x280
[    0.226158]  ? exc_int3+0x3e/0xe0
[    0.226161]  ? irqentry_nmi_exit+0x5/0x30
[    0.226164]  ? handle_exception+0x128/0x12b
[    0.226168]  ? __cond_resched+0x18/0x50
[    0.226171]  ? synchronize_rcu+0x1b/0x70
[    0.226175]  ? arch_kdebugfs_init+0x1d/0x1d
[    0.226180]  ? alternative_instructions+0x6a/0xff
[    0.226183]  ? check_bugs+0x995/0x9b2
[    0.226185]  ? flush_tlb_kernel_range+0x30/0x90
[    0.226190]  ? kunmap_local_indexed+0x80/0xf0
[    0.226196]  ? start_kernel+0x754/0x782
[    0.226199]  ? early_idt_handler_common+0x44/0x44
[    0.226202]  ? startup_32_smp+0x161/0x164
[    0.226206] ---[ end trace 454510fab3f064c1 ]---
Comment 1 Richard Narron 2021-05-31 19:19:18 UTC
Created attachment 297081 [details]
dmesg for x86_64 kernel shows no errors on same hardware
Comment 2 Richard Narron 2021-05-31 20:06:20 UTC
Created attachment 297087 [details]
config-huge-5.13.0-rc4-smp

Comment 3 peterz 2021-05-31 22:29:49 UTC
On Mon, May 31, 2021 at 07:21:14PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:

Going by the $subject, please use the below and boot with
"debug-alternative" on the kernel command line.

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 75c752b0628c..9b2cd6b7078b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -208,8 +208,10 @@ static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *ins
 	}
 
 	for (nop = i; i < a->instrlen; i++) {
-		if (WARN_ONCE(instr[i] != 0x90, "Not a NOP at 0x%px\n", &instr[i]))
+		if (WARN_ONCE(instr[i] != 0x90, "Not a NOP at 0x%px\n", &instr[i])) {
+			DUMP_BYTES(instr, a->instrlen, "%px:", instr);
 			return;
+		}
 	}
 
 	local_irq_save(flags);
Comment 4 Borislav Petkov 2021-05-31 23:48:22 UTC
Some preliminary observations, more fun tomorrow. So I can repro in a
vm, it looks like this below (might wanna stretch your terminal).

In any case, this looks like hv_do_hypercall() in hyperv_init() which
does CALL_NOSPEC and if you look at it closely, you'll see that at
final_insn, at offset 16 decimal (the 17th byte) there's a NOP (0x90).

But that trips our NOP detection because this is a 32-bit kernel and it
uses the 32-bit version of CALL_NOSPEC which is doing alignment stuff
and *that* the compiler pads with, yap, NOPs.

Which is exactly the thing I was saying would never happen, because we
can make sure there are no NOPs in between instructions.
Except for that damn compiler-generated retpoline.

Oh well, let's talk tomorrow.

 SMP alternatives: feat: 7*32+12, old: (hyperv_init+0x304/0x3d1 (c27fb587) len: 54), repl: (c28cdd8f, len: 54)
 SMP alternatives: c27fb587: old_insn: ff 54 24 0c 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
 SMP alternatives: c28cdd8f: rpl_insn: eb 2f 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 8d 64 24 04 ff 74 24 0c c3 8d b4 26 00 00 00 00 e8 db ff ff ff
 SMP alternatives: c27fb587: final_insn: eb 2f 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 8d 64 24 04 ff 74 24 0c c3 8d b4 26 00 00 00 00 e8 db ff ff ff
 ------------[ cut here ]------------
 Not a NOP at 0xc27fb598
 WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13+0x1a1/0x1c0
 Modules linked in:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc4-smp+ #3
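
To make the failure concrete, here is a minimal userspace sketch of the check that fires, assuming the scan has already stopped at the first single-byte NOP. It is not the kernel code: first_non_nop() and OLD_INSN_BASE are hypothetical names made up for illustration, and the bytes are copied from the final_insn line above. The first 0x90 sits at zero-based offset 16, the byte right after it (offset 17) is 0xe8, and base address plus 17 matches the address in the warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of the patched site, taken from the "old:" line of the debug output. */
#define OLD_INSN_BASE 0xc27fb587u

/* The 54 bytes of final_insn, copied from the debug output. */
static const uint8_t final_insn[] = {
	0xeb, 0x2f, 0x8d, 0xb4, 0x26, 0x00, 0x00, 0x00, 0x00,
	0x8d, 0xb4, 0x26, 0x00, 0x00, 0x00, 0x00, 0x90, 0xe8,
	0x0b, 0x00, 0x00, 0x00, 0xf3, 0x90, 0x0f, 0xae, 0xe8,
	0xeb, 0xf9, 0x8d, 0x74, 0x26, 0x00, 0x8d, 0x64, 0x24,
	0x04, 0xff, 0x74, 0x24, 0x0c, 0xc3, 0x8d, 0xb4, 0x26,
	0x00, 0x00, 0x00, 0x00, 0xe8, 0xdb, 0xff, 0xff, 0xff,
};

/*
 * Hypothetical helper: mimic the "everything from the first NOP to the
 * end must be 0x90" assumption. Returns the offset of the first byte at
 * or after nop_start that is not 0x90, or -1 if the tail is all NOPs.
 */
static long first_non_nop(const uint8_t *instr, size_t len, size_t nop_start)
{
	assert(nop_start <= len);

	for (size_t i = nop_start; i < len; i++) {
		if (instr[i] != 0x90)
			return (long)i;
	}
	return -1;
}
```

Calling first_non_nop(final_insn, sizeof(final_insn), 16) returns 17, and OLD_INSN_BASE + 17 is 0xc27fb598, the address printed in the "Not a NOP" warning.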
Comment 5 Borislav Petkov 2021-06-01 09:11:35 UTC
Btw, Richard, why are you using a 32-bit kernel on 64-bit hardware?
Comment 6 Borislav Petkov 2021-06-01 15:58:20 UTC
Created attachment 297101 [details]
proposed patch

Richard, pls try this - it should fix it.

Thx.
Comment 7 Richard Narron 2021-06-01 20:35:40 UTC
Thanks Borislav,
 
    The patch works well on the x86_64 machine running in 32-bit mode.

    The x86_64 machine has 4 processors and ECC ram, much faster and safer than the average 32-bit machine.

    I found an x86 Intel Pentium 4 machine with the bug and the patch fixes it too.

    I have two more x86 AMD Athlon machines and I will test those to see if they have the bug...
Comment 8 Borislav Petkov 2021-06-01 20:39:31 UTC
(In reply to Richard Narron from comment #7)
>     The patch works well on the x86_64 machine running in 32-bit mode.

Thanks.

>     The x86_64 machine has 4 processors and ECC ram, much faster and safer
> than the average 32-bit machine.

And let me reiterate: you should run 64-bit kernels on 64-bit hardware. 64-bit kernels are orders of magnitude more tested than 32-bit kernels.

>     I found an x86 Intel Pentium 4 machine with the bug and the patch fixes
> it too.
>
>     I have two more x86 AMD Athlon machines and I will test those to see if
> they have the bug...

Ok, thanks for testing.
Comment 9 Richard Narron 2021-06-01 23:09:40 UTC
The two x86 AMD Athlon machines do not seem to have the bug and the patched version of the kernel runs fine.
Comment 10 Richard Narron 2021-06-07 04:29:03 UTC
Linux kernel 5.13-rc5 contains a fix from Borislav and it works.

The x86_64 machine running in x86 mode runs fine now and the Pentium 4 with the bug also runs fine now with 5.13-rc5.
Comment 11 Borislav Petkov 2021-06-07 08:25:08 UTC
Yap, upstream commit ID for future reference is:

2b31e8ed96b2 ("x86/alternative: Optimize single-byte NOPs at an arbitrary position")
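
Going by the commit title, single-byte NOPs are now handled wherever they sit in the instruction stream rather than only at the tail. The following is a rough userspace sketch of that idea, not the upstream implementation: find_nop_runs() and struct nop_run are hypothetical, and the instruction lengths are passed in as a table (an assumption made to keep the sketch self-contained) where the kernel would use its instruction decoder. Walking instruction by instruction matters because a 0x90 byte can also be part of a longer instruction, such as the f3 90 (PAUSE) in the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* One run of consecutive single-byte NOPs (0x90). */
struct nop_run {
	size_t start;	/* byte offset of the first NOP in the run */
	size_t len;	/* number of NOP bytes in the run */
};

/*
 * Hypothetical helper: walk instr[] one instruction at a time, using
 * insn_len[k] as the length of the k-th instruction, and record every
 * run of single-byte NOPs regardless of where it occurs.
 * Stores at most max_runs entries and returns how many were stored.
 */
static size_t find_nop_runs(const uint8_t *instr, const uint8_t *insn_len,
			    size_t n_insns, struct nop_run *runs,
			    size_t max_runs)
{
	size_t off = 0, found = 0;
	int in_run = 0;

	for (size_t k = 0; k < n_insns; k++) {
		/* Only a one-byte instruction whose byte is 0x90 is a NOP. */
		int is_nop = (insn_len[k] == 1 && instr[off] == 0x90);

		if (is_nop && in_run) {
			runs[found - 1].len++;		/* extend current run */
		} else if (is_nop && found < max_runs) {
			runs[found].start = off;	/* open a new run */
			runs[found].len = 1;
			found++;
			in_run = 1;
		} else {
			in_run = 0;			/* run (if any) ended */
		}
		off += insn_len[k];
	}
	return found;
}
```

By my reading of the 54-byte final_insn dump in comment 4, the instruction lengths are 2, 7, 7, 1, 5, 2, 3, 2, 4, 4, 4, 1, 7, 5; with that table the sketch reports a single run of length 1 at offset 16, which a real implementation could then leave alone or rewrite instead of warning.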