Bug 216387

Summary: Boot Loop using 5.19 Kernel and Syslinux on x86-64 UEFI Platform
Product: Platform Specific/Hardware
Reporter: watnuss
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED CODE_FIX
Severity: blocking
CC: artlav, bp, brijesh.singh, michael.roth, ruinairas1992, tglx, thomas.lendacky
Priority: P1
Hardware: AMD
OS: Linux
Kernel Version: 5.19.X
Subsystem:
Regression: No
Bisected commit-id:

Description watnuss 2022-08-20 22:54:13 UTC
Hi,

Starting with kernel version 5.19, I get an infinite boot loop.
My CPU: AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx
I'm running Arch Linux with the syslinux bootloader on a UEFI system. The relevant part of the syslinux config:
$ grep -A4 "LABEL arch$" /boot/EFI/syslinux/syslinux.cfg
LABEL arch
    MENU LABEL Arch Linux
    LINUX ../../vmlinuz-linux
    APPEND cryptdevice=/dev/nvme0n1p5:encStorage root=/dev/encStorage/rootvol rw
    INITRD ../../amd-ucode.img,../../initramfs-linux.img

With help from the Arch Linux forum, I bisected the commit that introduced the issue:
https://bbs.archlinux.org/viewtopic.php?pid=2052876#p2052876

$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [4b0986a3613c92f4ec1bdc7f60ec66fea135991f] Linux 5.18
git bisect good 4b0986a3613c92f4ec1bdc7f60ec66fea135991f
# status: waiting for bad commit, 1 good commit known
# bad: [3d7cb6b04c3f3115719235cc6866b10326de34cd] Linux 5.19
git bisect bad 3d7cb6b04c3f3115719235cc6866b10326de34cd
# bad: [c011dd537ffe47462051930413fed07dbdc80313] Merge tag 'arm-soc-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad c011dd537ffe47462051930413fed07dbdc80313
# bad: [7e062cda7d90543ac8c7700fc7c5527d0c0f22ad] Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 7e062cda7d90543ac8c7700fc7c5527d0c0f22ad
# bad: [3842007b1a33589d57f67eac479b132b77767514] Merge tag 'zonefs-5.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
git bisect bad 3842007b1a33589d57f67eac479b132b77767514
# bad: [22922deae13fc8d3769790c2eb388e9afce9771d] Merge tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 22922deae13fc8d3769790c2eb388e9afce9771d
# good: [03e1ccd45fa70904e43ddceda140854d22b7e871] Merge tag 'x86-irq-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 03e1ccd45fa70904e43ddceda140854d22b7e871
# bad: [d61306047533eb6f63a7bd51dfa7f868503bf0ba] Merge tag 'for-linus-5.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect bad d61306047533eb6f63a7bd51dfa7f868503bf0ba
# bad: [1de564b8c1a6f9f8bf3a106daa0be9f2cba7d045] Merge tag 'x86_build_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 1de564b8c1a6f9f8bf3a106daa0be9f2cba7d045
# bad: [eb39e37d5cebdf0f63ee2a315fc23b035d81b4b0] Merge tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad eb39e37d5cebdf0f63ee2a315fc23b035d81b4b0
# bad: [ba37a1438aeb540cc48722d629f4b2e7e4398466] x86/sev: Add a sev= cmdline option
git bisect bad ba37a1438aeb540cc48722d629f4b2e7e4398466
# good: [9704c07bf9f7682a83aec4e66f2d9154dbd8577f] x86/kernel: Validate ROM memory before accessing when SEV-SNP is active
git bisect good 9704c07bf9f7682a83aec4e66f2d9154dbd8577f
# good: [b66370db9a90b3fa4c4a1a732af3e7e38d6d4c7c] KVM: x86: Move lookup of indexed CPUID leafs to helper
git bisect good b66370db9a90b3fa4c4a1a732af3e7e38d6d4c7c
# good: [5f211f4fc49622473667e6983bb57beab755f6f6] x86/compressed: Use firmware-validated CPUID leaves for SEV-SNP guests
git bisect good 5f211f4fc49622473667e6983bb57beab755f6f6
# good: [76f61e1e89b32f3e5d639f1b57413a919066da06] x86/compressed/64: Add identity mapping for Confidential Computing blob
git bisect good 76f61e1e89b32f3e5d639f1b57413a919066da06
# bad: [30612045e69d088f1effd748048ebb0e282984ec] x86/sev: Use firmware-validated CPUID for SEV-SNP guests
git bisect bad 30612045e69d088f1effd748048ebb0e282984ec
# bad: [b190a043c49af4587f5e157053f909192820522a] x86/sev: Add SEV-SNP feature detection/setup
git bisect bad b190a043c49af4587f5e157053f909192820522a
# first bad commit: [b190a043c49af4587f5e157053f909192820522a] x86/sev: Add SEV-SNP feature detection/setup
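For readers unfamiliar with bisection: each good/bad verdict in the log above roughly halves the range of candidate commits, so N candidates need only about log2(N) test builds. The following is a minimal sketch of that search in C, under simplifying assumptions: history is treated as a linear list (real git bisect walks a merge DAG, which is why merge commits appear in the log), the first entry is known good, the last is known bad, and once a commit is bad every later one is too. The names first_bad_commit, is_bad_fn and bad_from_threshold are hypothetical and not part of git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: returns true if the commit at 'idx' is bad
 * (for this bug: the built kernel boot loops). */
typedef bool (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Binary search for the first bad commit in a linear history of n commits.
 * Assumes commit 0 is good, commit n-1 is bad, and badness is monotonic.
 */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0;     /* invariant: index of a known-good commit */
    size_t bad = n - 1;  /* invariant: index of a known-bad commit */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;  /* one test build per step */
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;  /* good and bad are adjacent, so bad is the first bad one */
}

/* Hypothetical example predicate: every commit at or after *ctx is bad. */
static bool bad_from_threshold(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

For example, with 20 commits and a threshold of 7, first_bad_commit(20, bad_from_threshold, &t) probes indices 9, 4, 6 and 7 and returns 7.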
Comment 1 watnuss 2022-08-20 22:57:25 UTC
Running the custom kernel build from post https://bbs.archlinux.org/viewtopic.php?pid=2052866#p2052866 allows me to boot into kernel version 5.19.2.

If I can give more information or help in testing, please let me know.
Comment 2 Borislav Petkov 2022-08-21 09:39:34 UTC
Already being debugged upstream:

https://lore.kernel.org/all/Yvuo2rtootBSlpfQ@jpiotrowski-Surface-Book-3/
Comment 3 Michael Roth 2022-08-22 16:28:40 UTC
This issue seems to be very system-dependent and I'm unable to reproduce on my hardware, but please give this potential fix a try and let me know if it resolves the issue:

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 52f989f6acc2..dd6cd0d7c740 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -392,6 +392,13 @@ bool snp_init(struct boot_params *bp)
        if (!bp)
                return false;

+       /*
+        * bp->cc_blob_address should only be set by boot/compressed kernel.
+        * Initialize it to 0 to ensure that uninitialized values from
+        * buggy bootloaders aren't propagated.
+        */
+       bp->cc_blob_address = 0;
+
        cc_info = find_cc_blob(bp);
        if (!cc_info)
                return false;
Comment 4 artlav 2022-08-22 16:57:36 UTC
The issue can be consistently reproduced in QEMU with OVMF EFI firmware.
Put the vmlinuz file into a disk image with a bootloader of your choice and boot it with the kernel parameters "debug ignore_loglevel earlyprintk=efi,keep console=ttyS0".
Bad kernels boot loop without printing anything; good ones print a kernel panic.

The Arch forum thread linked above also contains a zip file with a ready-to-run QEMU image and scripts.

qemu-system-x86_64 \
-nodefaults -m 512 -machine q35 -enable-kvm \
-drive if=pflash,format=raw,readonly=on,file=ovmf_code.fd \
-drive if=pflash,format=raw,file=ovmf_vars-1024x768.fd \
-vga virtio \
-device nvme,drive=nvme0,serial=deadbeaf1,max_ioqpairs=8 -drive file=sdb.qcow2,if=none,id=nvme0 \
-serial stdio
Comment 5 artlav 2022-08-22 17:10:58 UTC
(In reply to Michael Roth from comment #3)

I tried it out, and it appears to boot fine.
Comment 6 watnuss 2022-08-24 11:57:08 UTC
I can also confirm that the patch from comment #3 works on my hardware.
Comment 7 Matthew 2022-08-27 18:10:00 UTC
I can confirm that this issue also affects the ChimeraOS project (built on Arch). ChimeraOS uses Syslinux to boot, and starting with 5.19 our UEFI devices constantly boot loop. I suspect legacy BIOS boot may work, but the devices I have are UEFI-only. The 5.18.16 kernel boots fine.
Comment 8 watnuss 2022-09-02 07:50:04 UTC
The issue is resolved in 5.19.6.