Created attachment 297079 [details]
dmesg for Not a NOP bug running x86 kernel on x86_64 hardware

The following warning appears when running an x86 kernel 5.13-rc4 smp on an x86_64 machine:

[ 0.226073] ------------[ cut here ]------------
[ 0.226074] Not a NOP at 0xdb920966
[ 0.226077] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops+0xe6/0x140
[ 0.226085] Modules linked in:
[ 0.226088] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc4-smp #1
[ 0.226091] Hardware name: Hewlett-Packard HP Z400 Workstation/0B4Ch, BIOS 786G3 v03.61 03/05/2018
[ 0.226094] EIP: optimize_nops+0xe6/0x140
[ 0.226097] Code: 74 26 00 90 8b 45 f0 64 2b 05 d8 c6 16 dd 75 68 8d 65 f4 5b 5e 5f 5d c3 50 68 92 4b c3 dc c6 05 8c 6 7 01 dd 01 e8 2e 92 ec 00 <0f> 0b 58 5a eb d4 8b b5 7c ff ff ff 8b 8d 78 ff ff ff 29 da 89 4d
[ 0.226102] EAX: 00000017 EBX: 0000002b ECX: 00000000 EDX: 00000000
[ 0.226104] ESI: 246df6c7 EDI: db92093a EBP: dcdefe08 ESP: dcdefd78
[ 0.226107] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210292
[ 0.226110] CR0: 80050033 CR2: ff9ff000 CR3: 1d18c000 CR4: 000006f0
[ 0.226112] Call Trace:
[ 0.226115] ? hv_query_ext_cap+0xa6/0x100
[ 0.226118] ? hv_query_ext_cap+0xa5/0x100
[ 0.226121] ? hv_query_ext_cap+0xab/0x100
[ 0.226123] ? hv_query_ext_cap+0xa6/0x100
[ 0.226126] ? hv_query_ext_cap+0xa5/0x100
[ 0.226128] ? hv_query_ext_cap+0xb4/0x100
[ 0.226130] ? hv_query_ext_cap+0xa6/0x100
[ 0.226133] ? apply_alternatives+0x166/0x310
[ 0.226135] ? hv_query_ext_cap+0x7a/0x100
[ 0.226138] ? xfs_reflink_find_shared+0x10/0xb0
[ 0.226143] ? prb_read_valid+0x1e/0x30
[ 0.226147] ? console_unlock+0x2d6/0x4a0
[ 0.226150] ? notify_die+0x7a/0xa0
[ 0.226154] ? exc_general_protection+0x280/0x280
[ 0.226158] ? exc_int3+0x3e/0xe0
[ 0.226161] ? irqentry_nmi_exit+0x5/0x30
[ 0.226164] ? handle_exception+0x128/0x12b
[ 0.226168] ? __cond_resched+0x18/0x50
[ 0.226171] ? synchronize_rcu+0x1b/0x70
[ 0.226175] ? arch_kdebugfs_init+0x1d/0x1d
[ 0.226180] ? alternative_instructions+0x6a/0xff
[ 0.226183] ? check_bugs+0x995/0x9b2
[ 0.226185] ? flush_tlb_kernel_range+0x30/0x90
[ 0.226190] ? kunmap_local_indexed+0x80/0xf0
[ 0.226196] ? start_kernel+0x754/0x782
[ 0.226199] ? early_idt_handler_common+0x44/0x44
[ 0.226202] ? startup_32_smp+0x161/0x164
[ 0.226206] ---[ end trace 454510fab3f064c1 ]---
Created attachment 297081 [details] dmesg for x86_64 kernel shows no errors on same hardware
Created attachment 297087 [details] config-huge-5.13.0-rc4-smp config-huge-5.13.0-rc4-smp
On Mon, May 31, 2021 at 07:21:14PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:

Going by the $subject, please use the below and boot with "debug-alternative" on the kernel command line.

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 75c752b0628c..9b2cd6b7078b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -208,8 +208,10 @@ static void __init_or_module noinline optimize_nops(struct alt_instr *a, u8 *ins
 	}
 
 	for (nop = i; i < a->instrlen; i++) {
-		if (WARN_ONCE(instr[i] != 0x90, "Not a NOP at 0x%px\n", &instr[i]))
+		if (WARN_ONCE(instr[i] != 0x90, "Not a NOP at 0x%px\n", &instr[i])) {
+			DUMP_BYTES(instr, a->instrlen, "%px:", instr);
 			return;
+		}
 	}
 
 	local_irq_save(flags);
Some preliminary observations, more fun tomorrow.

So I can repro in a vm, it looks like this below (might wanna stretch your terminal).

In any case, this looks like hv_do_hypercall() in hyperv_init() which does CALL_NOSPEC and if you look at it closely, you'll see that in final_insn the 17th byte (offset 16 decimal) is a NOP (0x90), and the byte right after it, at offset 17, is not. That trips our NOP detection because this is a 32-bit kernel and it uses the 32-bit version of CALL_NOSPEC which is doing alignment stuff and *that* the compiler pads with, yap, NOPs.

Which is exactly the thing I was saying would never happen because we can make sure there are no NOPs in between instructions. Except that damn compiler-generated ratpoline.

Oh well, let's talk tomorrow.

SMP alternatives: feat: 7*32+12, old: (hyperv_init+0x304/0x3d1 (c27fb587) len: 54), repl: (c28cdd8f, len: 54)
SMP alternatives: c27fb587: old_insn: ff 54 24 0c 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
SMP alternatives: c28cdd8f: rpl_insn: eb 2f 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 8d 64 24 04 ff 74 24 0c c3 8d b4 26 00 00 00 00 e8 db ff ff ff
SMP alternatives: c27fb587: final_insn: eb 2f 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 8d 64 24 04 ff 74 24 0c c3 8d b4 26 00 00 00 00 e8 db ff ff ff
------------[ cut here ]------------
Not a NOP at 0xc27fb598
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/alternative.c:211 optimize_nops.isra.13+0x1a1/0x1c0
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-rc4-smp+ #3
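For readers following along, here is a rough userspace model of why the check fires on the final_insn bytes dumped above. It is only a sketch: the kernel walks the bytes with its instruction decoder, whereas this model is handed the lengths of the first four instructions, decoded by hand (jmp short, two 7-byte lea, one nop). The names first_bad_offset, final_insn and insn_len are made up for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* final_insn bytes copied from the debug-alternative output in this comment. */
static const uint8_t final_insn[] = {
	0xeb, 0x2f, 0x8d, 0xb4, 0x26, 0x00, 0x00, 0x00, 0x00,
	0x8d, 0xb4, 0x26, 0x00, 0x00, 0x00, 0x00, 0x90,
	0xe8, 0x0b, 0x00, 0x00, 0x00, 0xf3, 0x90, 0x0f, 0xae, 0xe8,
	0xeb, 0xf9, 0x8d, 0x74, 0x26, 0x00, 0x8d, 0x64, 0x24, 0x04,
	0xff, 0x74, 0x24, 0x0c, 0xc3, 0x8d, 0xb4, 0x26, 0x00, 0x00,
	0x00, 0x00, 0xe8, 0xdb, 0xff, 0xff, 0xff,
};

/*
 * Assumption of this sketch: hand-decoded lengths of the leading
 * instructions (jmp short = 2, lea = 7, lea = 7, nop = 1).
 */
static const uint8_t insn_len[] = { 2, 7, 7, 1 };

/*
 * Model of the old check: skip whole instructions until a single-byte
 * NOP is found, then require every remaining byte to be 0x90.
 * Returns the offset of the first offending byte, or -1 if there is none.
 */
static int first_bad_offset(const uint8_t *buf, size_t len,
			    const uint8_t *lens, size_t nlens)
{
	size_t i = 0, k;

	assert(buf && lens);

	for (k = 0; k < nlens; k++) {
		if (lens[k] == 1 && buf[i] == 0x90)
			break;		/* first single-byte NOP found */
		i += lens[k];
		if (i >= len)
			return -1;	/* no NOP padding at all */
	}
	if (k == nlens)
		return -1;		/* ran out of decoded lengths */

	for (; i < len; i++) {
		if (buf[i] != 0x90)
			return (int)i;	/* "Not a NOP" would fire here */
	}
	return -1;
}
```

Calling first_bad_offset(final_insn, sizeof(final_insn), insn_len, 4) walks to the NOP at offset 16 and then stops on the 0xe8 at offset 17. Adding 17 to the patch site address c27fb587 gives c27fb598, the address printed in the warning.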
Btw, Richard, why are you using a 32-bit kernel on 64-bit hardware?
Created attachment 297101 [details] proposed patch Richard, pls try this - it should fix it. Thx.
Thanks Borislav,

The patch works well on the x86_64 machine running in 32-bit mode.

The x86_64 machine has 4 processors and ECC ram, much faster and safer than the average 32-bit machine.

I found an x86 Intel Pentium 4 machine with the bug and the patch fixes it too.

I have two more x86 AMD Athlon machines and I will test those to see if they have the bug...
(In reply to Richard Narron from comment #7)
> The patch works well on the x86_64 machine running in 32-bit mode.

Thanks.

> The x86_64 machine has 4 processors and ECC ram, much faster and safer
> than the average 32-bit machine.

And let me reiterate: you should run 64-bit kernels on 64-bit hardware. 64-bit kernels are orders of magnitude more tested than 32-bit kernels.

> I found an x86 Intel Pentium 4 machine with the bug and the patch fixes
> it too.
>
> I have two more x86 AMD Athlon machines and I will test those to see if
> they have the bug...

Ok, thanks for testing.
The two x86 AMD Athlon machines do not seem to have the bug and the patched version of the kernel runs fine.
Linux kernel 5.13-rc5 contains a fix from Borislav and it works. The x86_64 machine running in x86 mode runs fine now and the Pentium 4 with the bug also runs fine now with 5.13-rc5.
Yap, upstream commit ID for future reference is: 2b31e8ed96b2 ("x86/alternative: Optimize single-byte NOPs at an arbitrary position")
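As a loose illustration of what the commit title describes, and not a copy of the upstream patch, the core idea can be sketched as measuring a run of single-byte NOPs that starts at any offset and may end before the end of the patched area. The helper name nop_run_length and both sample arrays are hypothetical; the bytes are excerpts of the dumps earlier in this thread. The starting offset must come from an instruction decoder, since a 0x90 byte can also be part of a longer instruction (for example the f3 90 pair in the dump).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Excerpt of final_insn: a 7-byte lea, the lone NOP, then the call. */
static const uint8_t mid_nop[] = {
	0x8d, 0xb4, 0x26, 0x00, 0x00, 0x00, 0x00,
	0x90,
	0xe8, 0x0b, 0x00, 0x00, 0x00,
};

/* Shortened version of old_insn: the indirect call plus trailing padding. */
static const uint8_t tail_nops[] = {
	0xff, 0x54, 0x24, 0x0c, 0x90, 0x90, 0x90, 0x90,
};

/*
 * Length of the run of 0x90 bytes starting at buf[off]. The run ends at
 * the first non-NOP byte or at the end of the buffer, whichever comes
 * first, so a NOP in the middle of the area is no longer an error.
 */
static size_t nop_run_length(const uint8_t *buf, size_t len, size_t off)
{
	size_t i = off;

	assert(buf && off <= len);

	while (i < len && buf[i] == 0x90)
		i++;

	return i - off;
}
```

A run of length one, like the alignment NOP in mid_nop, leaves nothing to optimize, while a longer run, like the padding in tail_nops, is a candidate for replacement with longer NOP instructions. The caller would then continue decoding after the run instead of bailing out with a warning.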