Bug 195633

Summary: Linux 4.11 won't boot
Product: EFI Reporter: Paweł (pawel.pc44)
Component: Boot Assignee: EFI Virtual User (efi)
Status: RESOLVED CODE_FIX    
Severity: blocking CC: hidave.darkstar, lessonz.legion, matt, pawel.pc44, spinnau, wolkenschieber
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.11 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: hardware report
kmsg_4.11.3_oops
kmsg_4.11.3_prinfo_memcpy
kmsg_4.10.9
affected system 1 (EFI boot)
affected system 2 (EFI boot)
affected system 3 (EFI boot)
affected system 4 (EFI boot)
affected system 5 (EFI boot)
non-affected system 1 (EFI boot)
non-affected system 2 (EFI boot)

Description Paweł 2017-05-02 09:10:59 UTC
Created attachment 256163 [details]
hardware report

Hi,

From Linux 4.11-rc1 through the final release, my PC no longer boots. I ran git bisect, and the commit below appears to be causing the problem:

7b0a911478c74ca02581d496f732c10e811e894f

efi/x86: Move the EFI BGRT init code to early init code

    Before invoking the arch specific handler, efi_mem_reserve() reserves
    the given memory region through memblock.

    efi_bgrt_init() will call efi_mem_reserve() after mm_init(), at which
    time memblock is dead and should not be used anymore.

    The EFI BGRT code depends on ACPI initialization to get the BGRT ACPI
    table, so move parsing of the BGRT table to ACPI early boot code to
    ensure that efi_mem_reserve() in EFI BGRT code still use memblock safely.

Any help would be appreciated. Thanks!
Comment 1 MNX 2017-05-24 19:14:52 UTC
Same freeze here.

Here's an earlyprintk=efi output:
http://abload.de/img/earlyprintk1txjlw.png
http://abload.de/img/earlyprintk233jbp.png
Comment 2 Dave Young 2017-05-27 05:22:20 UTC
Can you print the BGRT table as shown below and save the kmsg? It would be better to have a full kernel log for both the working kernel and the failing kernel.

        pr_info("%s acpi_table_bgrt.version %hu\n", __func__, bgrt->version);
        pr_info("%s acpi_table_bgrt.status %hhu\n", __func__, bgrt->status);
        pr_info("%s acpi_table_bgrt.image_type %hhu\n", __func__, bgrt->image_type);
        pr_info("%s acpi_table_bgrt.image_address %llx\n", __func__, bgrt->image_address);
        print_hex_dump(KERN_INFO, "efi_bgrt_init acpi_table_bgrt", DUMP_PREFIX_OFFSET, 16, 1, bgrt, sizeof(*bgrt), false);
Comment 3 Keith Baker 2017-05-31 19:00:53 UTC
I'm also facing a boot issue since 4.11 with BIOS booting. The computer freezes at `Loading initial ramdisk`. With earlyprintk, an excerpt of the panic can be seen.

By setting the kernel option acpi=off, the kernel boots fine.

In https://bbs.archlinux.org/viewtopic.php?pid=1714915, we identified commit `7b0a911478c74ca02581d496f732c10e811e894f` as a possible cause.

Furthermore by applying the patch from https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=7425826f4f7ac60f2538b06a7f0a5d1006405159, booting 4.11.3 is possible again.
Comment 4 Matt Fleming 2017-06-02 08:41:42 UTC
(In reply to Keith Baker from comment #3)
> 
> Furthermore by applying the patch from
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/
> ?id=7425826f4f7ac60f2538b06a7f0a5d1006405159, booting 4.11.3 is possible
> again.

Thanks for this investigative work. A pull request containing this fix was sent to Linus today, so hopefully, it'll be merged soon and then applied to the stable trees.
Comment 5 spinnau 2017-06-02 19:08:21 UTC
Created attachment 256839 [details]
kmsg_4.11.3_oops
Comment 6 spinnau 2017-06-02 19:11:30 UTC
Created attachment 256841 [details]
kmsg_4.11.3_prinfo_memcpy
Comment 7 spinnau 2017-06-02 19:11:56 UTC
Created attachment 256843 [details]
kmsg_4.10.9
Comment 8 spinnau 2017-06-02 19:55:51 UTC
There might be different problems for EFI and BIOS boot.

With Linux 4.11.3 and EFI the boot hangs with the error shown in Screenshot https://bugzilla.kernel.org/attachment.cgi?id=256839. With acpi=off it will boot without any problems.

I have added the pr_info statements to print the bgrt table as advised by Dave Young. The output of this is:

efi_bgrt: efi_bgrt_init acpi_table_bgrt.version 1
efi_bgrt: efi_bgrt_init acpi_table_bgrt.status 0
efi_bgrt: efi_bgrt_init acpi_table_bgrt.image_type 0
efi_bgrt: efi_bgrt_init acpi_table_bgrt.image_address a62b01800000001
efi_bgrt_init acpi_table_bgrt00000000: 42 47 52 54 3c 00 00 00 00 5f 41 4c 41 53 4b 41
efi_bgrt_init acpi_table_bgrt00000010: 41 20 4d 20 49 00 00 00 09 20 07 01 41 4d 49 20
efi_bgrt_init acpi_table_bgrt00000020: 13 00 01 00 01 00 00 00 01 00 00 00 18 b0 62 0a
efi_bgrt_init acpi_table_bgrt00000030: 00 00 00 00 00 00 00 00
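
The hex dump above can be decoded by hand to cross-check the pr_info values. The following is a minimal userspace sketch, assuming the standard ACPI layout (a 36-byte table header followed by a 2-byte version, 1-byte status, 1-byte image type and 8-byte image address, all little-endian). The names bgrt_dump, read_le, bgrt_fields and decode_bgrt are made up for this illustration and are not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* The 56 bytes printed by print_hex_dump() in this comment. */
const uint8_t bgrt_dump[56] = {
    0x42, 0x47, 0x52, 0x54, 0x3c, 0x00, 0x00, 0x00,
    0x00, 0x5f, 0x41, 0x4c, 0x41, 0x53, 0x4b, 0x41,
    0x41, 0x20, 0x4d, 0x20, 0x49, 0x00, 0x00, 0x00,
    0x09, 0x20, 0x07, 0x01, 0x41, 0x4d, 0x49, 0x20,
    0x13, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x18, 0xb0, 0x62, 0x0a,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

/* Read a little-endian unsigned integer of len bytes (len <= 8). */
uint64_t read_le(const uint8_t *p, size_t len)
{
    uint64_t v = 0;

    for (size_t i = 0; i < len; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Hypothetical holder for the fields of interest. */
struct bgrt_fields {
    uint32_t table_length;   /* header offset 4 */
    uint16_t version;        /* offset 36 */
    uint8_t status;          /* offset 38 */
    uint8_t image_type;      /* offset 39 */
    uint64_t image_address;  /* offset 40 */
};

/* Decode the fields at the offsets assumed above. */
struct bgrt_fields decode_bgrt(const uint8_t *t)
{
    struct bgrt_fields f;

    f.table_length = (uint32_t)read_le(t + 4, 4);
    f.version = (uint16_t)read_le(t + 36, 2);
    f.status = t[38];
    f.image_type = t[39];
    f.image_address = read_le(t + 40, 8);
    return f;
}
```

Calling decode_bgrt(bgrt_dump) gives version 1, status 0, image_type 0 and image_address 0xa62b01800000001, which matches the pr_info lines above.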


The problem might be caused by the memcpy call at https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/arch/x86/platform/efi/efi-bgrt.c?h=v4.11.2#n71. If I put the pr_info calls after that line, the BGRT table is not printed to the kmsg before the oops occurs and the system halts.


If this memcpy line is commented out, my system boots fine. Because of the missing or wrong "bmp_header.id", the code then takes the goto out branch in the next if condition, and "Ignoring BGRT: Incorrect BMP magic number..." is added to the kmsg. Please find the full kmsg for this case here https://bugzilla.kernel.org/attachment.cgi?id=256841


For comparison I have also added the kmsg for the last working kernel 4.10.9 here https://bugzilla.kernel.org/attachment.cgi?id=256843.
Comment 9 spinnau 2017-06-05 10:18:09 UTC
(In reply to spinnau from comment #8)
> For comparison I have also added the kmsg for the last working kernel 4.10.9
> here https://bugzilla.kernel.org/attachment.cgi?id=256843.


With the 4.10 kernel on my system, BGRT was ignored because memremap of the BMP image header failed. The system still booted fine, with these messages:

[    0.112643] ioremap: invalid physical address a62b01800000001
[    ........]
[    0.112738] efi_bgrt: Ignoring BGRT: failed to map image header memory
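
The "invalid physical address" message is consistent with an address that has bits set above the CPU's physical address width. A minimal sketch of such a check follows. The helper name phys_addr_fits and the widths used in the usage note are assumptions for illustration, not the kernel's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when addr uses no bits at or above
 * phys_bits. phys_bits must be between 1 and 63.
 */
bool phys_addr_fits(uint64_t addr, unsigned int phys_bits)
{
    return (addr >> phys_bits) == 0;
}
```

With an assumed width of 52 bits, phys_addr_fits(0x0a62b01800000001ULL, 52) is false, while a low address such as 0xCA0F4C80 passes.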


If I change the "early_memremap" of the image header in kernel 4.11 back to the "memremap" call from kernel 4.10, then the old behavior can be restored (memremap fails, BGRT will be ignored) and booting works again. But this clearly isn't the right solution, as it leaves some errors.


The patch for non-EFI boot mentioned in comments #3 and #4 also doesn't help in my case.
Comment 10 spinnau 2017-06-06 13:06:34 UTC
On the Arch Linux forum (https://bbs.archlinux.org/viewtopic.php?id=226520) we have collected ACPI Data Table [BGRT] dumps of affected systems that don't boot with linux-4.11, and also of non-affected systems. For convenience I will attach the ACPI dumps to this bug report.

A comparison shows that the affected and non-affected systems differ in the length of the raw table data.

affected: Raw Table Data: Length 60 (0x3C)
non-affected: Raw Table Data: Length 56 (0x38)
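
The 56-byte figure can be worked out from the standard field sizes: a 36-byte ACPI header plus version (2), status (1), image type (1), image address (8), and the X and Y image offsets (4 each). A small sketch of that arithmetic follows. The constant and function names are made up for this example.

```c
#include <stdint.h>

enum {
    ACPI_HEADER_SIZE = 36,                     /* common ACPI table header */
    BGRT_BODY_SIZE = 2 + 1 + 1 + 8 + 4 + 4,    /* version .. image offset Y */
    BGRT_EXPECTED_SIZE = ACPI_HEADER_SIZE + BGRT_BODY_SIZE,
};

/* Number of bytes a reported table length exceeds the expected layout by. */
int bgrt_extra_bytes(uint32_t reported_length)
{
    return (int)reported_length - BGRT_EXPECTED_SIZE;
}
```

Under this layout, bgrt_extra_bytes(60) is 4 for the affected systems and bgrt_extra_bytes(56) is 0 for the non-affected ones. What the four extra bytes contain is not established in this report.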
Comment 11 spinnau 2017-06-06 13:08:44 UTC
Created attachment 256875 [details]
affected system 1 (EFI boot)
Comment 12 spinnau 2017-06-06 13:09:46 UTC
Created attachment 256877 [details]
affected system 2 (EFI boot)
Comment 13 spinnau 2017-06-06 13:12:03 UTC
Created attachment 256881 [details]
affected system 3 (EFI boot)
Comment 14 spinnau 2017-06-06 13:13:54 UTC
Created attachment 256883 [details]
affected system 4 (EFI boot)
Comment 15 spinnau 2017-06-06 13:14:31 UTC
Created attachment 256887 [details]
affected system 5 (EFI boot)
Comment 16 spinnau 2017-06-06 13:17:16 UTC
Created attachment 256889 [details]
non-affected system 1 (EFI boot)
Comment 17 spinnau 2017-06-06 13:18:17 UTC
Created attachment 256891 [details]
non-affected system 2 (EFI boot)
Comment 19 Dave Young 2017-06-08 05:42:58 UTC
Here is a patch to fix this issue:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1416209.html
Comment 21 spinnau 2017-06-09 19:35:07 UTC
Many thanks to Dave Young for the patch. I can confirm that the patch https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/commit/?id=0a97e704d93fe4facf2bffe3c78095b9d441df42 modified and applied to linux-4.11.3 solves the problem on my system with EFI boot.

Thanks to the check, the invalid image address is detected and the system boots fine, with the expected kmsg:

# dmesg |grep -i bgrt
[    0.000000] ACPI: BGRT 0x00000000CA0F4C80 00003C (v00 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] efi_bgrt: Ignoring BGRT: invalid image address
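
The patch itself is not quoted in this report. The sketch below only illustrates the general idea of validating the image address against a memory map before touching it. The struct, the region types and the function name are all hypothetical and simplified, and they are not the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types for this sketch. */
enum { REGION_BOOT_DATA = 1, REGION_OTHER = 2 };

/* Hypothetical, simplified memory-map entry. */
struct mem_region {
    uint64_t start;
    uint64_t size;
    int type;
};

/*
 * Accept addr only if it falls inside a region of the expected type.
 * An address that is in no region at all is rejected.
 */
bool image_address_ok(uint64_t addr, const struct mem_region *map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* addr - start < size avoids overflow in start + size */
        if (addr >= map[i].start && addr - map[i].start < map[i].size)
            return map[i].type == REGION_BOOT_DATA;
    }
    return false;
}
```

With a made-up map, an address like 0xa62b01800000001 matches no region and is rejected, which corresponds to the "invalid image address" path shown in the dmesg output above.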
Comment 22 Matt Fleming 2017-06-15 12:48:27 UTC
The patch from Dave has been merged by Linus in v4.12-rc5 and it has also been applied to the v4.11 stable tree.

Closing, thanks everyone!