Bug 206343 - Hanging on boot: error parsing RSDP address - Intel(R) Core(TM) i9-9820X
Status: RESOLVED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: Intel Linux
Importance: P1 normal
Assignee: acpi_bios
Reported: 2020-01-29 08:03 UTC by Steven Clarkson
Modified: 2020-04-21 03:31 UTC

See Also:
Kernel Version: 5.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output from workaround (111.44 KB, text/plain)
2020-01-29 08:03 UTC, Steven Clarkson

Description Steven Clarkson 2020-01-29 08:03:41 UTC
Created attachment 287015
dmesg output from workaround

After upgrading my kernel to 5.3, my machine hangs at boot. GRUB prints that it is booting the kernel, then the machine hangs indefinitely.

The motherboard is an ASUS WS X299 SAGE, with firmware version 1201. Anecdotally, I believe this affects most recent ASUS motherboards that are not running the latest firmware version.

Last known working kernel was 5.2. I was able to bisect the issue to commit

8e44c7840 Revert "x86/boot: Disable RSDP parsing temporarily"

I was able to boot my system by applying the patch below to the most recent kernel, 5.5. The output of dmesg is attached.

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 9652d5c2afda..5df966201abd 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -373,7 +373,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
         * so that early debugging output from the RSDP parsing code can be
         * collected.
         */
-       boot_params->acpi_rsdp_addr = get_rsdp_addr();
+       // boot_params->acpi_rsdp_addr = get_rsdp_addr();
 
        debug_putstr("early console in extract_kernel\n");
Comment 1 Steven Clarkson 2020-01-30 06:00:26 UTC
Turns out this causes the kernel to hang in the while loop parsing the SRAT table in count_immovable_mem_regions. After dumping the SRAT table, it looks like there's 320 bytes of zeros in the middle of it.

Sure enough, dmesg complains

[    0.007413] ACPI: [SRAT:0x00] Invalid zero length
[    0.007415] ACPI: [SRAT:0x01] Invalid zero length

Proposed patch below.


diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 25019d42ae93..7369de333eda 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -394,6 +394,12 @@ int count_immovable_mem_regions(void)
 
        while (table + sizeof(struct acpi_subtable_header) < table_end) {
                sub_table = (struct acpi_subtable_header *)table;
+
+               if (!sub_table->length) {
+                       debug_putstr("Invalid zero length SRAT subtable.\n");
+                       break;
+               }
+
                if (sub_table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {
                        struct acpi_srat_mem_affinity *ma;
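
To illustrate why a zero-length subtable hangs the loop: the walk advances by sub_table->length on each iteration, so a length of zero never moves the pointer forward. Below is a minimal user-space sketch of the same walk with the proposed check. It is not the kernel code; the two-byte header struct and the names subtable_header and count_mem_affinity are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct acpi_subtable_header. */
struct subtable_header {
	uint8_t type;
	uint8_t length;	/* size of the whole subtable in bytes, header included */
};

/* Same value as ACPI_SRAT_TYPE_MEMORY_AFFINITY in the ACPI headers. */
#define SRAT_TYPE_MEMORY_AFFINITY 1

/*
 * Hypothetical helper: count memory affinity subtables in buf[0..size).
 * Stops at the first zero-length subtable and returns the count so far,
 * mirroring the "break" in the proposed patch.
 */
int count_mem_affinity(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int count = 0;

	while (off + sizeof(struct subtable_header) < size) {
		struct subtable_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));

		/* Without this check, off never advances and the loop spins forever. */
		if (hdr.length == 0)
			break;

		if (hdr.type == SRAT_TYPE_MEMORY_AFFINITY)
			count++;

		off += hdr.length;
	}

	return count;
}
```

Given a buffer that holds one 40-byte memory affinity entry followed by 320 zero bytes, this function stops at the zeros and returns 1 instead of looping indefinitely.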
Comment 2 Borislav Petkov 2020-01-30 08:06:43 UTC
(In reply to Steven Clarkson from comment #1)
> Turns out this causes the kernel to hang in the while loop parsing the SRAT
> table in count_immovable_mem_regions. After dumping the SRAT table, it looks
> like there's 320 bytes of zeros in the middle of it.

Of course. Qwalitee BIOS. ;-\

> Sure enough, dmesg complains
> 
> [    0.007413] ACPI: [SRAT:0x00] Invalid zero length
> [    0.007415] ACPI: [SRAT:0x01] Invalid zero length
> 
> Proposed patch below.
> 
> 
> diff --git a/arch/x86/boot/compressed/acpi.c
> b/arch/x86/boot/compressed/acpi.c
> index 25019d42ae93..7369de333eda 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -394,6 +394,12 @@ int count_immovable_mem_regions(void)
>  
>         while (table + sizeof(struct acpi_subtable_header) < table_end) {
>                 sub_table = (struct acpi_subtable_header *)table;
> +
> +               if (!sub_table->length) {
> +                       debug_putstr("Invalid zero length SRAT subtable.\n");
> +                       break;
> +               }
> +
>                 if (sub_table->type == ACPI_SRAT_TYPE_MEMORY_AFFINITY) {
>                         struct acpi_srat_mem_affinity *ma;

Yah, makes a lot of sense to me. Especially if this has been already encountered with other BIOSes. Sounds like the qwalitee work has been spread around.

Please submit a proper patch to LKML documenting which BIOS version it is and CC me. If you need help with creating the patch, just ask.

Thx.
Comment 3 Steven Clarkson 2020-04-21 03:31:21 UTC
Fixed in 2b73ea379624 ("x86/boot: Handle malformed SRAT tables during early ACPI parsing")
