Bug 219592 - [linux-next branch][tag: next-20241210][regression] kdump/kexec inside KVM/QEMU based VM functional broken likely due to commit 5a82223e0743 x86/kexec: Mark relocate_kernel page as ROX instead of RWX
Status: RESOLVED CODE_FIX
Alias: None
Product: Linux
Classification: Unclassified
Component: Kernel
Hardware: Intel Linux
Importance: P3 high
Assignee: Virtual assignee for kernel bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-12-12 02:39 UTC by hongyuni
Modified: 2024-12-13 07:39 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel kconfig reference (251.93 KB, text/plain)
2024-12-13 06:36 UTC, hongyuni

Description hongyuni 2024-12-12 02:39:17 UTC
Hi,

With the linux-next tree at release tag next-20241210 (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git), we hit a kdump/kexec functional regression when running that kernel as the guest kernel inside a KVM/QEMU-based VM.

Steps to reproduce the issue:
- boot a guest VM image with the target kernel (kdump/kexec enabled) under KVM/QEMU on any Intel 64 (x86-64) physical machine, and log in to the VM once it has booted
- inside the VM, trigger a crash for kdump with the following steps:
-- # kdumpctl restart
-- # rm -rf /var/crash/*
-- # sync
-- # sleep 1
-- # echo c > /proc/sysrq-trigger
- after the crash has been triggered, the hypervisor reboots the VM to recover it
- log in to the rebooted VM again
-- check whether a new crash log was captured under /var/crash/ for the kdump test above
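
The steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names (trigger_kdump_crash, count_dumps, kdump_captured) are hypothetical and not part of any kdump tooling, and the check relies on /var/crash having been emptied before the crash, as in the report, so that any entry found after the reboot is new.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical helpers.

# Phase 1 (inside the guest, as root): the commands from the report.
# WARNING: the last line panics the kernel; call this only in a throwaway VM.
trigger_kdump_crash() {
    kdumpctl restart || return 1   # restart the kdump service
    rm -rf /var/crash/*            # start from an empty dump directory
    sync
    sleep 1
    echo c > /proc/sysrq-trigger   # force a crash via SysRq
}

# Phase 2 (after the reboot): print the number of entries directly
# under the dump directory (defaults to /var/crash).
count_dumps() {
    dir=${1:-/var/crash}
    ls -A "$dir" 2>/dev/null | wc -l | tr -d ' '
}

# Succeeds when at least one dump entry exists.
kdump_captured() {
    [ "$(count_dumps "$1")" -gt 0 ]
}
```

After the reboot, something like `kdump_captured && echo "dump captured" || echo "no dump captured"` would distinguish the two outcomes described below.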

On the affected kernel: no crash log is captured, most likely because kdump/kexec is broken.
On an unaffected kernel: the crash log is captured as expected.

The issue was not seen with the previous week's release tag, next-20241205.

So this is suspected to be a regression in the x86-64 kdump/kexec code.
Comment 1 hongyuni 2024-12-12 02:40:24 UTC
A further git bisect points to the following commit as the suspected first bad commit:

5a82223e0743fb36bcb99657772513739d1a9936 is the first bad commit
commit 5a82223e0743fb36bcb99657772513739d1a9936
Author: David Woodhouse <dwmw@amazon.co.uk>
Date: Thu Dec 5 15:05:19 2024 +0000

x86/kexec: Mark relocate_kernel page as ROX instead of RWX

All writes to the page now happen before it gets marked as executable
(or after it's already switched to the identmap page tables where it's
OK to be RWX).

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241205153343.3275139-14-dwmw2@infradead.org

arch/x86/kernel/machine_kexec_64.c | 3 ++-
1 file changed, 2 insertions, 1 deletion
[root@spr-tdx-001 linux-next]# git bisect log
git bisect start

status: waiting for both good and bad commits
bad: [1b2ab8149928c1cea2d7eca30cd35bb7fe014053] Add linux-next specific files for 20241210
git bisect bad 1b2ab8149928c1cea2d7eca30cd35bb7fe014053
status: waiting for good commit(s), bad commit known
good: [cd9ce8217345bd13035a0d3edaaecec4244d0ddd] x86/tdx: Disable unnecessary virtualization exceptions
git bisect good cd9ce8217345bd13035a0d3edaaecec4244d0ddd
good: [2839933a40c480f8050d036ea9ed568ef58fee6e] Merge branch 'fs-next' of linux-next
git bisect good 2839933a40c480f8050d036ea9ed568ef58fee6e
good: [cc115b05717b1839ee0763eaf36e6be038589432] Merge branch 'for-linux-next' of https://gitlab.freedesktop.org/drm/i915/kernel
git bisect good cc115b05717b1839ee0763eaf36e6be038589432
bad: [1279aa1f74afe7c4d127124865359b079d9d09c1] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect bad 1279aa1f74afe7c4d127124865359b079d9d09c1
good: [d5554a4060c084959b8a0ffd7e73bf4a1405f509] Merge branch 'for-next' of git://git.kernel.dk/linux-block.git
git bisect good d5554a4060c084959b8a0ffd7e73bf4a1405f509
bad: [d7233bd5a74937b8e69a1c71400e1e98a13dc81d] Merge branch into tip/master: 'x86/boot'
git bisect bad d7233bd5a74937b8e69a1c71400e1e98a13dc81d
good: [f96b1d15e9b80c0ff5ffb4a9296b5c10aae292d0] Merge branch into tip/master: 'perf/core'
git bisect good f96b1d15e9b80c0ff5ffb4a9296b5c10aae292d0
good: [2a77e4be12cb58bbf774e7c717c8bb80e128b7a4] sched/fair: Untangle NEXT_BUDDY and pick_next_task()
git bisect good 2a77e4be12cb58bbf774e7c717c8bb80e128b7a4
good: [9e5683e2d0b5584c51993908c5d0afa78e613492] x86/kexec: Only swap pages for ::preserve_context mode
git bisect good 9e5683e2d0b5584c51993908c5d0afa78e613492
good: [b3adabae8a96fee62184f4236bf60313b35244e9] x86/kexec: Drop page_list argument from relocate_kernel()
git bisect good b3adabae8a96fee62184f4236bf60313b35244e9
bad: [5a82223e0743fb36bcb99657772513739d1a9936] x86/kexec: Mark relocate_kernel page as ROX instead of RWX
git bisect bad 5a82223e0743fb36bcb99657772513739d1a9936
good: [93e489ad7a4694bb2fe8110f5012f85bd3eee65a] x86/kexec: Clean up register usage in relocate_kernel()
git bisect good 93e489ad7a4694bb2fe8110f5012f85bd3eee65a
first bad commit: [5a82223e0743fb36bcb99657772513739d1a9936] x86/kexec: Mark relocate_kernel page as ROX instead of RWX
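
For anyone who wants to repeat the bisect from the log above, one possible approach is to keep only the command lines and feed them to `git bisect replay`. The sketch below makes some assumptions: bisect_commands is a hypothetical helper name, and pasted.log and replay.log are hypothetical file names for the saved log and the filtered result.

```shell
#!/bin/sh
# Sketch only: bisect_commands is a hypothetical helper.
# It keeps the "git bisect ..." command lines from a pasted bisect log
# and drops the status/annotation lines. Reads the named files, or
# standard input when no file is given.
bisect_commands() {
    grep '^git bisect ' "$@"
}

# Possible usage inside a linux-next checkout (file names are assumptions):
#   bisect_commands pasted.log > replay.log
#   git bisect replay replay.log
```

The replay leaves the tree in the same bisect state the reporter reached, which makes it easy to rebuild and retest the commits around 5a82223e0743.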
Comment 2 hongyuni 2024-12-12 03:03:20 UTC
Update: reverting the first bad commit on top of https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git works around this functional regression.
Comment 3 David Woodhouse 2024-12-12 20:45:05 UTC
Thanks for the report. Please could you test the patch at
https://lore.kernel.org/kexec/9c68688625f409104b16164da30aa6d3eb494e5d.camel@infradead.org/
Comment 4 hongyuni 2024-12-13 06:36:29 UTC
Created attachment 307355 [details]
kernel kconfig reference
Comment 5 hongyuni 2024-12-13 06:38:17 UTC
(In reply to David Woodhouse from comment #3)
> Thanks for the report. Please could you test the patch at
> https://lore.kernel.org/kexec/9c68688625f409104b16164da30aa6d3eb494e5d.camel@infradead.org/

It works; the original issue is fixed with this patch.

In addition, my local kconfig is attached for reference: https://bugzilla.kernel.org/attachment.cgi?id=307355
Comment 7 hongyuni 2024-12-13 07:39:56 UTC
(In reply to David Woodhouse from comment #6)
> Better fix at
> https://lore.kernel.org/kexec/ed7dd45f89e8f286478791137447a21d53735dbd.camel@infradead.org/

Yes, my bad. Something went wrong in my local email client and I missed this later version of the fix.

The issue is also fixed by this new patch.
