Bug 219554
| Summary: | Kernel 6.13 crashes when doing grub2-mkconfig during Fedora install in VM with Nehalem CPU config | | |
|---|---|---|---|
| Product: | Linux | Reporter: | Adam Williamson (adamw) |
| Component: | Kernel | Assignee: | Virtual assignee for kernel bugs (linux-kernel) |
| Status: | NEW --- | | |
| Severity: | high | CC: | dan-kernel, kashyap.cv |
| Priority: | P3 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION |
| Regression: | Yes | Bisected commit-id: | 5185e7f9f3bd754ab60680814afd714e2673ef88 |
Attachments:
- log extract from a UEFI failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42
- log extract from a BIOS failure on kernel-6.13.0-0.rc0.20241126git7eef7e306d3c.10.fc42
- log extract from a BIOS failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42
Description
Adam Williamson
2024-12-03 18:50:41 UTC
Created attachment 307313 [details]
log extract from a UEFI failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42
Created attachment 307314 [details]
log extract from a BIOS failure on kernel-6.13.0-0.rc0.20241126git7eef7e306d3c.10.fc42
I don't have logs from a BIOS failure on the latest kernel yet; I'll try to get some soon.
Created attachment 307315 [details]
log extract from a BIOS failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42
This bisects to:

[adamw@xps13a linux ((5185e7f9f3bd...)|BISECTING)]$ git bisect bad
5185e7f9f3bd754ab60680814afd714e2673ef88 is the first bad commit
commit 5185e7f9f3bd754ab60680814afd714e2673ef88 (HEAD)
Author: Mike Rapoport (Microsoft) <rppt@kernel.org>
Date:   Wed Oct 23 19:27:11 2024 +0300

    x86/module: enable ROX caches for module text on 64 bit

    Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module
    text allocations on 64 bit.

    Link: https://lkml.kernel.org/r/20241023162711.2579610-9-rppt@kernel.org
    Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Tested-by: kdevops <kdevops@lists.linux.dev>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Brian Cain <bcain@quicinc.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dinh Nguyen <dinguyen@kernel.org>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Helge Deller <deller@gmx.de>
    Cc: Huacai Chen <chenhuacai@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Johannes Berg <johannes@sipsolutions.net>
    Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
    Cc: Kent Overstreet <kent.overstreet@linux.dev>
    Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Song Liu <song@kernel.org>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: Vineet Gupta <vgupta@kernel.org>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 arch/x86/Kconfig   |  1 +
 arch/x86/mm/init.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

Update: I also did some testing with various CPU model configurations. I think this actually isn't to do with Nehalem per se, but with "virtual machines where the CPU configuration does not exactly match the host", or something like that. I tried a bunch of qemu CPU model settings - Nehalem, SandyBridge, Haswell, Skylake-Client and Cascadelake-Server - and got failures with all of them, but when I set the model to "host", all tests passed. The tests get farmed out to a cluster of systems which have different CPUs - one is Broadwell, one is Skylake, one is Cascade Lake - so I think when I set the model to anything specific, it will match the host CPU on some or none of those systems, but never *all* of them, so the bug will always show up.

Luis Chamberlain has suggested that https://lore.kernel.org/lkml/20250103065631.26459-1-jgross@suse.com/T/#u might fix this. It looks promising, and I will test it when I get a chance.
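[Editorial note: a minimal shell sketch of the model sweep described above. The real tests are full Fedora installs driven by openQA, so everything here other than the -cpu values is a placeholder, not the actual invocation.]

# Sweep the named CPU models that were tested, plus "host".
# Memory, ISO and disk arguments are placeholders.
for model in Nehalem SandyBridge Haswell Skylake-Client Cascadelake-Server host; do
    echo "=== testing -cpu $model ==="
    qemu-kvm -m 4096 -smp 2 -cpu "$model" \
             -cdrom Fedora-Rawhide-netinst.iso disk.img
done

[Per the results described above, all of the named models failed while "host" passed.]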
Hi Adam,

FWIW, I have worked on CPU models on x86 in user-land, so I can expand on your "... or something like that" here.

Summary
-------

The virtual CPU model "Nehalem" indeed does not look like the problem here. I think the reason "Nehalem" is used as the *default* is that your underlying host OS is CentOS 9 or RHEL 9. As you might know, RHEL and CentOS 9 use the "x86-64-v2" micro-architecture[1]. Thus "Nehalem" is the only common CPU model that works across both Intel and AMD hosts on "x86-64-v2" (AMD hardware also has the necessary CPU features to run "Nehalem"). So it is very handy in a mixed-hardware (CI) setup running CentOS 9 or RHEL 9. The upstream OpenStack Infrastructure team also uses "Nehalem" as the default across its CI cluster for this reason[2].

In your openQA cluster with mixed CPUs, you may have to find a "lowest common denominator" CPU model that you can give to your guests. More on that below; it's a bit involved.

[1] https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
[2] "Use Nehalem CPU model by default" — https://review.opendev.org/c/openstack/devstack/+/815020

Long version
------------

There are three main ways to configure CPU models:

(1) Named CPU models — QEMU ships a number of predefined CPU models that typically refer to specific generations of hardware released by their respective vendors; libvirt periodically checks which new QEMU models have been added and in turn adds support for them. (Run `virsh cpu-models x86_64` to see them, or `qemu-kvm -cpu help`.)

* Pros:
  - You can select a tailored CPU model and a set of features that works in your environment.
  - Back-and-forth live migration works.
* Cons: not a "con" per se, but this requires a good understanding of your hardware pool, and some planning. By "good understanding", I mean:
  - You should know the different types of hardware you have, and make sure you're not mixing Intel and AMD.
  - You have to calculate your "baseline" CPU model and do a back-and-forth live migration test to make sure it works in your environment. An example: https://kashyapc.fedorapeople.org/Calculate-CPU-hypervisor-baseline.html

(2) "host-passthrough" — as the name implies, it passes the host CPU model, features, and stepping exactly to the guest. It translates to `-cpu host` on the QEMU command line.

* Pros:
  - Maximum set of features.
  - Best possible performance for your guest.
  - The ideal option when live migration is not a strict requirement.
* Cons:
  - QEMU/libvirt cannot guarantee that a stable CPU is exposed to the guest across hosts — this is "implicit" when choosing `host-passthrough`.
  - Live migration will not work across mixed host CPUs (e.g. from Sandy Bridge to Ice Lake, to pick a random example).
  - For live migration to work, both source and target hosts *must* have identical physical CPUs, kernel, and CPU microcode versions (look for the package "microcode_ctl").

(3) "host-model" — a libvirt abstraction; it provides the maximum possible CPU features from the host, and it auto-adds critical guest CPU flags — assuming you have updated CPU microcode, kernel, QEMU, and libvirt. It provides live migration compatibility, with a caveat: bi-directional live migration is only "partially possible", meaning you can't migrate back to any random host; the target will need to have the same or more features than the original. (A rough command-line sketch of these three approaches follows below.)
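[Editorial note: a sketch of what the three approaches above look like in practice, plus computing a pool-wide baseline. The worker hostnames and file names are placeholders, not details from this bug.]

# (1) Named model; libvirt XML equivalent:
#     <cpu mode='custom'><model fallback='forbid'>Nehalem</model></cpu>
qemu-kvm -m 2048 -cpu Nehalem disk.img

# (2) host-passthrough; libvirt XML equivalent: <cpu mode='host-passthrough'/>
qemu-kvm -m 2048 -cpu host disk.img

# (3) host-model is libvirt-only: put <cpu mode='host-model'/> in the
#     domain XML; libvirt expands it to a named model plus extra features
#     when the domain starts, so there is no direct qemu-kvm flag for it.

# "Lowest common denominator" for a mixed pool: collect each host's CPU
# description and ask libvirt for the common baseline.
for h in worker1 worker2 worker3; do
    ssh "$h" virsh capabilities |
        xmllint --xpath '/capabilities/host/cpu' - >> pool-cpus.xml
done
virsh cpu-baseline pool-cpus.xml   # prints a <cpu> definition all hosts support

[As noted in the cons above, a baseline like this only makes sense within one vendor's hardware; don't mix Intel and AMD hosts in the pool.]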
> Luis Chamberlain has suggested that
> https://lore.kernel.org/lkml/20250103065631.26459-1-jgross@suse.com/T/#u
> might fix this. It looks promising, and I will test it when I get a chance.

Worth a try, but I'm sceptical that it will make a difference. That patch adds a check for the "PSE" CPUID feature flag, but you say you already reproduced this on QEMU with the "Nehalem" CPU model, and that model already includes the PSE flag (https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.c#L2869).

Kashyap: as I replied on RHBZ, yes, I'm aware of those issues. All the systems in the "cluster" use Intel CPUs, and they don't have any problems on kernels without the commit in question. The commit in question clearly introduces a problem of some kind.

Another note I didn't add here yet: since I changed the openQA config to use `-cpu host` to try and work around this, the bug is still happening on those hosts. In my local testing (on an i7-1250U) I could not reproduce the bug with `-cpu host`, only with `-cpu Nehalem`, but on the openQA runners it reproduces even with `-cpu host`. Those systems use a mix of CPUs: Xeon Gold 5218, Xeon Gold 6130, and Xeon E5-2683 v4, according to my notes. Those CPUs are all older than the one I was testing with: the i7-1250U is an Alder Lake part from 2022, the Xeon Gold 5218 is Cascade Lake from 2019, the Xeon Gold 6130 is Skylake from 2017, and the Xeon E5-2683 v4 is Broadwell from 2016 (can you tell I get the build cluster's hand-me-downs? :>). So the CPU association here might be as simple as "this commit is broken on older CPUs", the break point being somewhere between Cascade Lake and Alder Lake.

Daniel, I did indeed test Luis' patch and it didn't help :|

Mike Rapoport has asked me to test this patch:

diff --git a/mm/execmem.c b/mm/execmem.c
index be6b234c032e..0090a6f422aa 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -266,6 +266,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
 	struct execmem_area *area;
 	unsigned long start, end;
+	unsigned int page_shift;
 	struct vm_struct *vm;
 	size_t alloc_size;
 	int err = -ENOMEM;
@@ -296,8 +297,9 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	if (err)
 		goto err_free_mem;
 
+	page_shift = get_vm_area_page_order(vm) + PAGE_SHIFT;
 	err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
-				       PMD_SHIFT);
+				       page_shift);
 	if (err)
 		goto err_free_mem;
--
2.45.2

I didn't manage to get to it over the weekend as it didn't apply cleanly to the Fedora Rawhide kernel source at the time and I didn't have time to rebase it, but I'll do it today.

> So the CPU association here might be as simple as "this commit is broken on
> older CPUs", the break point being somewhere between Cascade Lake and Alder Lake.

Hmm, well, thinking about it, I can only say for sure "somewhere between Broadwell and Alder Lake", since I didn't check yet that the bug still happens on *all* the openQA worker hosts with `-cpu host`, only that it's definitely still happening on some of them.

Update: in initial testing, the patch proposed by Mike (the one from comment #9) seems to be working. I've run 50 installs with it and none have hung. As a check I also ran 10 installs with the unpatched kernel, and four of them hung.

It looks like the whole execmem ROX thing is being disabled for final - assuming https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=a9bbe341333109465605e8733bab0b573cddcc8c is pulled - so this should stop being an issue whenever that lands, until it's re-enabled, I guess?
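[Editorial note: for anyone checking whether a given kernel build carries the ROX cache at all, a hedged sketch. It assumes the relevant Kconfig symbol from the bisected commit's series is ARCH_HAS_EXECMEM_ROX, and a Fedora-style config path; adjust both for your setup.]

# Check whether the running kernel was built with the execmem ROX cache.
# The symbol name is assumed from the bisected series; the config file
# location varies by distro.
grep EXECMEM_ROX "/boot/config-$(uname -r)"
# Or, on kernels built with CONFIG_IKCONFIG_PROC:
zgrep EXECMEM_ROX /proc/config.gz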