Created attachment 296859: error output from boot

I first ran into problems with 5.10.28 and 5.12.4, which fail earlier in the boot process. 5.13-rc2 fails differently, and later in boot, so that is what I'm reporting here.

With CONFIG_SMP=y, booting in qemu (5.2 on Debian buster) with

    qemu-system-alpha -m 4096 -net nic,model=virtio-net-pci -net user,hostfwd=tcp::20000-:22 -drive file=alphadisk,format=raw -smp 1 -kernel vmlinux-5.13.0-rc2-smp -initrd initrd.img-5.13.0-rc2-smp -append 'console=ttyS0 root=UUID=f5487547-65eb-4330-8644-39e494b5d972' -nographic

the system dies during boot with the attached log output. When booted on 5.13-rc2 with the identical config, but with CONFIG_SMP=n, it runs fine.

(I'm reporting it here instead of to, say, qemu because the failure depends on a kernel config parameter: it appears and disappears on the same userland, even when qemu is given only one core.)

I'll attach the config of the non-SMP version below; the SMP version is the same config with CONFIG_SMP=y set and "make olddefconfig" run.
Created attachment 296861: .config of non-SMP 5.13-rc2 kernel
A bisect pointed me to this commit, which is a bit surprising, but fully reproducible:

commit f2f84b05e02b7710a201f0017b3272ad7ef703d1
Author: Kees Cook <keescook@chromium.org>
Date:   Wed Sep 25 16:47:58 2019 -0700

    bug: consolidate warn_slowpath_fmt() usage

    Instead of having a separate helper for no printk output, just
    consolidate the logic into warn_slowpath_fmt().

    Link: http://lkml.kernel.org/r/20190819234111.9019-4-keescook@chromium.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Drew Davenport <ddavenport@chromium.org>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
    Cc: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I independently discovered this as well and reported it in https://lore.kernel.org/linux-kernel//202006112201.3B20AB28DC@keescook/T/#m63f054b306ee63a86496feec5a39779806511202
Replacing the calls to raw_smp_processor_id() in __warn() with just "0" fixes the problem for me:

diff --git a/kernel/panic.c b/kernel/panic.c
index 8bff183d6180..12f6cea6b8b0 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -671,11 +671,11 @@ void __warn(const char *file, int line, void *caller, unsigned taint,
 	if (file)
 		pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS\n",
-			raw_smp_processor_id(), current->pid, file, line,
+			0, current->pid, file, line,
 			caller);
 	else
 		pr_warn("WARNING: CPU: %d PID: %d at %pS\n",
-			raw_smp_processor_id(), current->pid, caller);
+			0, current->pid, caller);
 #pragma GCC diagnostic push
 #ifndef __clang__

So I assume the problem is that SMP support is not yet fully initialized at this point, and that raw_smp_processor_id() is what triggers the NULL pointer dereference.
Here is a potential patch by Ivan Kokshaysky which fixes the problem for me:

diff --git a/arch/alpha/include/uapi/asm/ptrace.h b/arch/alpha/include/uapi/asm/ptrace.h
index 5ca45934fcbb..d2e8e69a18f1 100644
--- a/arch/alpha/include/uapi/asm/ptrace.h
+++ b/arch/alpha/include/uapi/asm/ptrace.h
@@ -49,7 +49,7 @@ struct pt_regs {
 	unsigned long r16;
 	unsigned long r17;
 	unsigned long r18;
-};
+} __attribute__((aligned(16))); /* GCC expects 16-byte stack alignment */
 /*
  * This is the extended stack used by signal handlers and the context
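To illustrate what the aligned(16) attribute in the patch changes, here is a minimal user-space sketch. It is not the kernel code: struct frame_plain, struct frame_aligned, keeps_sp_aligned() and the slot count of 29 are all hypothetical stand-ins, chosen only so that the unpadded size is an odd multiple of 8 bytes. It assumes GCC or clang (for __attribute__) and a frame carved directly off a 16-byte aligned stack pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a saved-register frame: an odd number of
 * 8-byte slots, so the natural size is not a multiple of 16.
 * The count 29 is an assumption made for this illustration.
 */
struct frame_plain {
	uint64_t regs[29];
};

/* Same members, but with the attribute used in the patch above. */
struct frame_aligned {
	uint64_t regs[29];
} __attribute__((aligned(16)));

/* The attribute pads sizeof up to a multiple of the requested alignment. */
static_assert(sizeof(struct frame_aligned) % 16 == 0,
	      "aligned(16) should round the size up to a multiple of 16");

/*
 * Returns nonzero if subtracting frame_size from the stack pointer sp
 * leaves the result 16-byte aligned.
 */
static inline int keeps_sp_aligned(uintptr_t sp, size_t frame_size)
{
	return ((sp - frame_size) % 16) == 0;
}
```

With these stand-ins, the plain struct is 232 bytes, so pushing it from a 16-byte aligned stack pointer leaves the pointer only 8-byte aligned, while the attributed struct is padded to 240 bytes and preserves the alignment. Whether this is the exact mechanism behind the crash reported here is not something the sketch establishes.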
This has now been fixed in the following three commits:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/alpha?id=3b35a171060f846b08b48646b38c30b5d57d17ff
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/alpha?id=0a0f7362b0367634a2d5cb7c96226afc116f19c9
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/alpha?id=77b823fa619f97d16409ca37ad4f7936e28c5f83

The fixes will be backported to stable kernels soon, so I assume it's safe to consider this fixed.