Bug 213143 - System fails to boot when CONFIG_SMP=y
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: Alpha
Hardware: All Linux
Importance: P1 normal
Assignee: Richard Henderson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-19 10:19 UTC by Rich E
Modified: 2025-02-15 14:03 UTC

See Also:
Kernel Version: 5.13.0-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
error output from boot (2.02 KB, text/plain)
2021-05-19 10:19 UTC, Rich E
.config of non-SMP 5.13-rc2 kernel (40.16 KB, application/x-gzip)
2021-05-19 10:20 UTC, Rich E

Description Rich E 2021-05-19 10:19:23 UTC
Created attachment 296859
error output from boot

I first ran into problems with 5.10.28 and 5.12.4, which fail earlier in the boot process. 5.13-rc2 fails differently, and later in boot, so I'm reporting that one instead.

With CONFIG_SMP=y, booting in qemu (5.2 on Debian buster) with "qemu-system-alpha -m 4096 -net nic,model=virtio-net-pci -net user,hostfwd=tcp::20000-:22 -drive file=alphadisk,format=raw -smp 1 -kernel vmlinux-5.13.0-rc2-smp -initrd initrd.img-5.13.0-rc2-smp -append 'console=ttyS0 root=UUID=f5487547-65eb-4330-8644-39e494b5d972' -nographic", the system dies during boot with the attached log output. When booted on 5.13-rc2 with the identical config, but with CONFIG_SMP=n, it runs fine. (I'm reporting it here instead of to, say, qemu because it happens when a kernel config parameter is varied, on the same userland, even if you only give qemu one core.)

I'll attach the config of the non-SMP version below; the SMP version is just this with CONFIG_SMP=y and make olddefconfig run.
Comment 1 Rich E 2021-05-19 10:20:17 UTC
Created attachment 296861
.config of non-SMP 5.13-rc2 kernel
Comment 2 Aurelien Jarno 2023-08-09 11:37:22 UTC
A bisect pointed me to this commit, which is a bit surprising, but fully reproducible:

commit f2f84b05e02b7710a201f0017b3272ad7ef703d1
Author: Kees Cook <keescook@chromium.org>
Date:   Wed Sep 25 16:47:58 2019 -0700

    bug: consolidate warn_slowpath_fmt() usage

    Instead of having a separate helper for no printk output, just consolidate
    the logic into warn_slowpath_fmt().

    Link: http://lkml.kernel.org/r/20190819234111.9019-4-keescook@chromium.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Drew Davenport <ddavenport@chromium.org>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
    Cc: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comment 3 Matt Turner 2024-05-21 18:06:21 UTC
I independently discovered this as well and reported it in https://lore.kernel.org/linux-kernel//202006112201.3B20AB28DC@keescook/T/#m63f054b306ee63a86496feec5a39779806511202
Comment 4 John Paul Adrian Glaubitz 2024-05-21 18:39:28 UTC
Replacing the calls to raw_smp_processor_id() in __warn() with just "0" fixes the problem for me:

diff --git a/kernel/panic.c b/kernel/panic.c
index 8bff183d6180..12f6cea6b8b0 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -671,11 +671,11 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 
        if (file)
                pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS\n",
-                       raw_smp_processor_id(), current->pid, file, line,
+                       0, current->pid, file, line,
                        caller);
        else
                pr_warn("WARNING: CPU: %d PID: %d at %pS\n",
-                       raw_smp_processor_id(), current->pid, caller);
+                       0, current->pid, caller);
 
 #pragma GCC diagnostic push
 #ifndef __clang__

So I assume the problem is that SMP support is not yet fully initialized at this point, so that raw_smp_processor_id() triggers the null pointer dereference.
Comment 5 John Paul Adrian Glaubitz 2025-01-24 06:55:40 UTC
Here is a potential patch by Ivan Kokshaysky which fixes the problem for me:

diff --git a/arch/alpha/include/uapi/asm/ptrace.h b/arch/alpha/include/uapi/asm/ptrace.h
index 5ca45934fcbb..d2e8e69a18f1 100644
--- a/arch/alpha/include/uapi/asm/ptrace.h
+++ b/arch/alpha/include/uapi/asm/ptrace.h
@@ -49,7 +49,7 @@ struct pt_regs {
 	unsigned long r16;
 	unsigned long r17;
 	unsigned long r18;
-};
+} __attribute__((aligned(16)));	/* GCC expects 16-byte stack alignment */
 
 /*
  * This is the extended stack used by signal handlers and the context
