Bug 219554 - Kernel 6.13 crashes when doing grub2-mkconfig during Fedora install in VM with Nehalem CPU config
Summary: Kernel 6.13 crashes when doing grub2-mkconfig during Fedora install in VM with Nehalem CPU config
Status: NEW
Alias: None
Product: Linux
Classification: Unclassified
Component: Kernel
Hardware: Intel Linux
Importance: P3 high
Assignee: Virtual assignee for kernel bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-12-03 18:50 UTC by Adam Williamson
Modified: 2025-01-14 18:32 UTC (History)
2 users

See Also:
Kernel Version:
Subsystem: MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
Regression: Yes
Bisected commit-id: 5185e7f9f3bd754ab60680814afd714e2673ef88


Attachments
log extract from a UEFI failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42 (61.00 KB, text/plain)
2024-12-03 18:52 UTC, Adam Williamson
Details
log extract from a BIOS failure on kernel-6.13.0-0.rc0.20241126git7eef7e306d3c.10.fc42 (35.02 KB, text/plain)
2024-12-03 18:54 UTC, Adam Williamson
Details
log extract from a BIOS failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42 (66.92 KB, text/plain)
2024-12-03 20:52 UTC, Adam Williamson
Details

Description Adam Williamson 2024-12-03 18:50:41 UTC
This is an upstream report for https://bugzilla.redhat.com/show_bug.cgi?id=2329581 . This is with the Fedora distro kernel, but I'm pretty sure it's an upstream issue; I don't think any Fedora customizations are relevant here.

In automated testing of Fedora we've noticed a lot of failures of install tests since kernel-6.13.0-0.rc0.20241125git9f16d5e6f220.8.fc42 landed in Rawhide - that is, a snapshot of upstream git 9f16d5e6f220 . The previous build, kernel-6.13.0-0.rc0.20241119git158f238aa69d.2.fc42 - a snapshot of upstream 158f238aa69d - did not show this problem. The problems persist with the latest kernel build, kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42 (a snapshot of e70140ba0d2b ).

Both BIOS and UEFI x86_64 installs are frequently hitting kernel crashes when the Fedora installer runs grub2-mkconfig as part of the install process. In the BIOS case, this causes the system to hang permanently. In the UEFI case, the system hangs for a while then reboots, and fails to boot as the installation did not complete.

I've reproduced both BIOS and UEFI failures locally with a qemu VM configured like the one we use in the affected tests: 2 vCPUs, 4G RAM, and the Nehalem CPU model - that is, the `-cpu Nehalem` argument to qemu. If I use the host CPU config instead, the bug doesn't happen. We intentionally use the Nehalem model in this testing to ensure Fedora doesn't inadvertently stop supporting the CPU baseline it intends to support.
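For reference, the reproduction setup above can be sketched as a qemu invocation (a sketch only, not the exact command our test harness generates; the disk image and ISO paths are placeholders):

```shell
#!/bin/sh
# Sketch of a qemu invocation matching the affected test config:
# 2 vCPUs, 4G RAM, Nehalem CPU model. Paths are placeholders.
DISK_IMG=fedora-test.qcow2          # hypothetical image path
ISO=Fedora-Everything-netinst.iso   # hypothetical install media

QEMU_ARGS="-machine q35,accel=kvm -smp 2 -m 4096 -cpu Nehalem \
 -drive file=${DISK_IMG},if=virtio -cdrom ${ISO}"

# Print the assembled command rather than launching it here;
# swapping '-cpu Nehalem' for '-cpu host' makes the bug disappear.
echo "qemu-system-x86_64 ${QEMU_ARGS}"
```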

This happens on more than 50% of install attempts, but not all of them (sometimes they work; I've set our test system to retry failures five times for now to mitigate the effects of this bug).

The details of the traces we get in the kernel logs differ between occurrences, and also between BIOS and UEFI, which someone suggested indicates this may be some kind of memory corruption issue. But the broad shape is consistent: the installer reaches grub2-mkconfig and we get a kernel crash.

I did also try reproducing this by running `grub2-mkconfig -o /boot/grub/grub2.cfg` multiple times on an *installed* VM with the same kernel and VM config, but could not trigger a crash in this case. There must be something specific about how this happens in the installer environment (for one thing, the installer runs the command chroot'ed into the installed system environment).
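To make the chroot detail concrete, the installer's invocation looks roughly like the following (a sketch under assumptions: the sysroot mount point and output path are illustrative, and DRY_RUN just prints the command instead of running it):

```shell
#!/bin/sh
# Rough sketch of how the installer regenerates the grub config:
# grub2-mkconfig runs chroot'ed into the freshly installed system.
# /mnt/sysroot and the grub.cfg path are placeholders for whatever
# the installer actually uses.
SYSROOT=/mnt/sysroot
CMD="chroot $SYSROOT grub2-mkconfig -o /boot/grub2/grub.cfg"

if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$CMD"    # dry run: just show what would be executed
else
    $CMD
fi
```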

I'll attach sample logs from a UEFI failure and a BIOS failure.
Comment 1 Adam Williamson 2024-12-03 18:52:19 UTC
Created attachment 307313 [details]
log extract from a UEFI failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42
Comment 2 Adam Williamson 2024-12-03 18:54:50 UTC
Created attachment 307314 [details]
log extract from a BIOS failure on kernel-6.13.0-0.rc0.20241126git7eef7e306d3c.10.fc42

I don't have logs from a BIOS failure on the latest kernel yet, I'll try and get some soon.
Comment 3 Adam Williamson 2024-12-03 20:52:42 UTC
Created attachment 307315 [details]
log extract from a BIOS failure on kernel-6.13.0-0.rc1.20241202gite70140ba0d2b.14.fc42
Comment 4 Adam Williamson 2024-12-23 11:32:07 UTC
This bisects to:

[adamw@xps13a linux ((5185e7f9f3bd...)|BISECTING)]$ git bisect bad
5185e7f9f3bd754ab60680814afd714e2673ef88 is the first bad commit
commit 5185e7f9f3bd754ab60680814afd714e2673ef88 (HEAD)
Author: Mike Rapoport (Microsoft) <rppt@kernel.org>
Date:   Wed Oct 23 19:27:11 2024 +0300

    x86/module: enable ROX caches for module text on 64 bit
    
    Enable execmem's cache of PMD_SIZE'ed pages mapped as ROX for module text
    allocations on 64 bit.
    
    Link: https://lkml.kernel.org/r/20241023162711.2579610-9-rppt@kernel.org
    Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Tested-by: kdevops <kdevops@lists.linux.dev>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: Brian Cain <bcain@quicinc.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Dinh Nguyen <dinguyen@kernel.org>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Helge Deller <deller@gmx.de>
    Cc: Huacai Chen <chenhuacai@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Johannes Berg <johannes@sipsolutions.net>
    Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
    Cc: Kent Overstreet <kent.overstreet@linux.dev>
    Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Song Liu <song@kernel.org>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: Vineet Gupta <vgupta@kernel.org>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 arch/x86/Kconfig   |  1 +
 arch/x86/mm/init.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)
Comment 5 Adam Williamson 2025-01-02 20:17:12 UTC
Update: I also did some testing with various CPU model configurations. I think this actually isn't to do with Nehalem per se, but "virtual machines where the CPU configuration does not exactly match the host", or something like that.

I tried a bunch of qemu CPU model settings - Nehalem, SandyBridge, Haswell, Skylake-Client and Cascadelake-Server - and got failures with all of them, but when I set the model to "host", all tests passed.

The tests get farmed out to a cluster of systems which have different CPUs - one is Broadwell, one is Skylake, one is Cascade Lake - so I think when I set the model to anything specific, it will match the host CPU on some or none of those systems, but never *all* of them, so the bug will always show up.
Comment 6 Adam Williamson 2025-01-05 07:26:13 UTC
Luis Chamberlain has suggested that https://lore.kernel.org/lkml/20250103065631.26459-1-jgross@suse.com/T/#u might fix this. It looks promising, and I will test it when I get a chance.
Comment 7 Kashyap Chamarthy 2025-01-13 11:34:19 UTC
Hi Adam,

FWIW, I have worked on CPU models on x86 in the user-land.  I can expand
on your "... or something like that" here.

Summary
-------

The virtual CPU model "Nehalem" indeed does not look like the problem
here.  I think one reason "Nehalem" is used as the *default* is that
your underlying host OS is CentOS 9 or RHEL 9.

As you might know, RHEL 9 and CentOS 9 target the "x86-64-v2"
micro-architecture level[1].  Thus, "Nehalem" is the only common
virtual CPU model that works across both Intel and AMD hosts at
"x86-64-v2" — AMD CPUs also have the necessary features to run
"Nehalem".  So it is very handy in a mixed-hardware (CI) setup running
CentOS 9 or RHEL 9.  The upstream OpenStack Infrastructure team also
uses "Nehalem" as the default across its CI cluster for this reason[2].


In your openQA cluster with mixed CPUs, you may have to find a "lowest 
common denominator" CPU model that you can give to your guests.  More on 
it below; it's a bit involved.

[1] https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level

[2] "Use Nehalem CPU model by default" — 
    https://review.opendev.org/c/openstack/devstack/+/815020


Long version
------------

There are three main ways to configure CPU models:

(1) Named CPU models — QEMU comes with a number of predefined CPU models
    that typically refer to specific generations of hardware released by
    their respective vendors; libvirt periodically checks which new QEMU
    models have been added and in turn adds support for them.  (Run
    `virsh cpu-models x86_64` to see them, or `qemu-kvm -cpu help`.)

    * Pros:

      - You can select a tailored CPU model and a set of features that
        works in your environment.
      - Back and forth live migration works.

    * Cons: 

      This is not a “con” per se, but it requires a good understanding
      of your hardware pool, and some planning.  By “good
      understanding”, I mean the below:

      - You should know the different types of hardware you have, and
        make sure you're not mixing Intel and AMD.

      - You have to calculate your "baseline" CPU model and do a
        back-n-forth live migration test to make sure it works in your
        environment.  An example:

        https://kashyapc.fedorapeople.org/Calculate-CPU-hypervisor-baseline.html

(2) "host-passthrough" — as the name implies, it passes the host CPU
    model features, model, stepping, exactly to the guest.  It
    translates to `-cpu host` on the QEMU command-line.

    * Pros:

      - Maximum set of features
      - Provides best possible performance for your guest
      - Ideal option when live migration is not a strict requirement.

   * Cons:

     - QEMU / libvirt cannot guarantee that a stable CPU is exposed to
       the guest across hosts; this is "implicit" when choosing
       `host-passthrough`

     - Live migration will not work across mixed host CPUs (e.g. from
       SandyBridge to IceLake, to pick a random example)

     - Requirements for live migration to work: on both source and
       target hosts, you *must* have identical physical CPUs, kernel,
       and CPU microcode versions (look for the package
       "microcode_ctl")

(3) "host-model" — a libvirt abstraction;   It provides maximum possible
    CPU features from the host; it auto-adds critical guest CPU flags —
    assuming, you have updated CPU microcode, kernel, QEMU, and libvirt.

    It provides live migration compatibility, with a caveat:
    bi-directional live migration is only "partially possible" —
    meaning, you can't migrate back to any random host, it'll need to
    have the same-or-more features that the original.
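The three approaches above map to libvirt domain XML roughly as follows (a sketch; the named model is just the one discussed in this bug, and the comments summarize the trade-offs listed above):

```xml
<!-- (1) Named model: explicit baseline, live-migration friendly -->
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Nehalem</model>
</cpu>

<!-- (2) host-passthrough: maximum features, equivalent to `-cpu host`;
     no migration across differing host CPUs -->
<cpu mode='host-passthrough'/>

<!-- (3) host-model: libvirt picks the closest named model plus extra
     host features at guest start -->
<cpu mode='host-model'/>
```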
Comment 8 Daniel Berrange 2025-01-13 11:55:55 UTC
> Luis Chamberlain has suggested that
> https://lore.kernel.org/lkml/20250103065631.26459-1-jgross@suse.com/T/#u
> might fix this. It looks promising, and I will test it when I get a chance.

Worth a try, but I'm sceptical that will make a difference. That patch is adding a test for the "PSE" CPUID feature flag, but you say you already reproduced it on QEMU with the "Nehalem" CPU, and that model includes the PSE flag already (https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.c#L2869)
Comment 9 Adam Williamson 2025-01-13 17:37:20 UTC
Kashyap: as I replied on RHBZ, yes, I'm aware of those issues. All the systems in the "cluster" use Intel CPUs, and they don't have any problems on kernels without the commit in question. The commit in question clearly introduces a problem of some kind.

Another note I didn't add here yet: since I changed the openQA config to use `-cpu host` to try and work around this, the bug is still happening on those hosts. In my local testing (on an i7-1250U) I could not reproduce the bug with `-cpu host` - only `-cpu Nehalem` - but on the openQA runners, it reproduces even with `-cpu host`. Those systems use a mix of CPUs: xeon gold 5218, xeon gold 6130, and xeon e5-2683v4, according to my notes. Those CPUs are older than the one I was testing with. The i7-1250U that I tested with is an Alder Lake part from 2022. The Xeon Gold 5218 is Cascade Lake, from 2019. Xeon Gold 6130 is Skylake, from 2017. Xeon e5-2683v4 is Broadwell, from 2016 (can you tell I get the build cluster's hand-me-downs? :>) So the CPU association here might be as simple as "this commit is broken on older CPUs", the break point being somewhere between Cascade Lake and Alder Lake.

Daniel, I did indeed test Luis' patch and it didn't help :|

Mike Rapoport has asked me to test this patch:

diff --git a/mm/execmem.c b/mm/execmem.c
index be6b234c032e..0090a6f422aa 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -266,6 +266,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
 	struct execmem_area *area;
 	unsigned long start, end;
+	unsigned int page_shift;
 	struct vm_struct *vm;
 	size_t alloc_size;
 	int err = -ENOMEM;
@@ -296,8 +297,9 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	if (err)
 		goto err_free_mem;
 
+	page_shift = get_vm_area_page_order(vm) + PAGE_SHIFT;
 	err = vmap_pages_range_noflush(start, end, range->pgprot, vm->pages,
-				       PMD_SHIFT);
+				       page_shift);
 	if (err)
 		goto err_free_mem;
 
--
2.45.2

I didn't manage to get to it over the weekend as it didn't apply cleanly to the Fedora Rawhide kernel source at the time and I didn't have time to rebase it, but I'll do it today.
Comment 10 Adam Williamson 2025-01-13 17:44:27 UTC
"So the CPU association here might be as simple as "this commit is broken on older CPUs", the break point being somewhere between Cascade Lake and Alder Lake."

Hmm, well, thinking about it, I can only say for sure "somewhere between Broadwell and Alder Lake", since I didn't check yet that the bug still happens on *all the openQA worker hosts* with `-cpu host`, only that it's definitely still happening on some of them.
Comment 11 Adam Williamson 2025-01-14 17:26:33 UTC
Update: in initial testing, the patch proposed by Mike (the one from comment #9) seems to be working. I've run 50 installs with it and none have hung. As a check I also ran 10 installs with the unpatched kernel, and four of them hung.
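A quick back-of-envelope check (not part of the original report) of why 50 clean runs is convincing: if the unpatched failure rate really were around the observed 4-in-10, the chance of 50 independent installs all passing by luck is vanishingly small. This assumes independent attempts with a fixed failure probability.

```python
# Probability that n_runs independent installs all succeed, given a
# per-install failure rate. Used to sanity-check the 50/50 pass result
# against the ~40% hang rate seen on the unpatched kernel.

def prob_all_pass(n_runs: int, fail_rate: float) -> float:
    """Probability that n_runs independent installs all succeed."""
    return (1.0 - fail_rate) ** n_runs

p = prob_all_pass(50, 0.4)   # observed unpatched rate: 4 of 10
print(f"P(50/50 pass by chance) = {p:.2e}")
```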
Comment 12 Adam Williamson 2025-01-14 18:32:00 UTC
It looks like the whole execmem ROX thing is being disabled for the 6.13 final release (assuming https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=a9bbe341333109465605e8733bab0b573cddcc8c is pulled), so this should stop being an issue once that lands, until it's re-enabled, I guess?
