Bug 203463 - Boot crash regression from Validate trampoline placement against E820
Status: RESOLVED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-30 23:47 UTC by mail+kernel-bugzilla
Modified: 2019-10-06 20:12 UTC
CC List: 2 users

See Also:
Kernel Version: 4.18
Tree: Mainline
Regression: Yes


Attachments
Working dmesg on 5.1.0-rc7 with problematic commit reverted (42.89 KB, text/plain)
2019-05-06 14:29 UTC, mail+kernel-bugzilla
Patch proposal (1.19 KB, patch)
2019-08-12 15:14 UTC, Kirill A. Shutemov

Description mail+kernel-bugzilla 2019-04-30 23:47:00 UTC
I have a Samsung 500C "Alex" Chromebook.

A regression introduced between v4.17 and v4.18 crashes the kernel immediately at boot, leading to a reboot loop before any output is shown.

I have bisected it.

The first bad commit is:

      x86/boot/compressed/64: Validate trampoline placement against E820

      There were two report of boot failure cased by trampoline placed into
      a reserved memory region. It can happen on machines that don't report
      EBDA correctly.

      Fix the problem by re-validating the found address against the E820 table.
      If the address is in a reserved area, find the next usable region below the
      initial address.

      Fixes: 3548e131ec6a ("x86/boot/compressed/64: Find a place for 32-bit trampoline")
      Reported-by: Dmitry Malkin <d.malkin@real-time-systems.com>
      Reported-by: youling 257 <youling257@gmail.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180801133225.38121-1-kirill.shutemov@linux.intel.com
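The re-validation step described in the commit message can be illustrated with a small standalone sketch. This is not the kernel's code: the struct, the constants, and the function name `sketch_place_trampoline` are hypothetical names chosen for illustration, and the only fact taken from E820 itself is that type 1 marks usable RAM. Given a candidate address, it returns the highest page-aligned end address at or below the candidate such that the trampoline fits entirely inside a usable region.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* hypothetical constant for this sketch */
#define SKETCH_TYPE_RAM  1U      /* E820 type 1 = usable RAM */

/* Hypothetical, simplified stand-in for one E820 table entry. */
struct sketch_e820_entry {
    uint64_t addr; /* start of the region */
    uint64_t size; /* length of the region in bytes */
    uint32_t type; /* 1 = usable RAM, anything else is treated as unusable */
};

/*
 * Return the highest page-aligned end address <= start such that
 * [end - tramp_size, end) lies wholly inside one usable RAM entry.
 * Returns 0 if no usable region at or below start can hold the trampoline.
 */
static uint64_t sketch_place_trampoline(const struct sketch_e820_entry *table,
                                        size_t n, uint64_t start,
                                        uint64_t tramp_size)
{
    uint64_t best = 0;

    for (size_t i = 0; i < n; i++) {
        const struct sketch_e820_entry *e = &table[i];
        uint64_t end;

        if (e->type != SKETCH_TYPE_RAM)
            continue; /* reserved or otherwise unusable */
        if (e->addr >= start)
            continue; /* region lies entirely above the candidate */

        /* Clamp the region end to the candidate, then page-align down. */
        end = e->addr + e->size;
        if (end > start)
            end = start;
        end &= ~(SKETCH_PAGE_SIZE - 1);

        /* The trampoline must fit between the region start and end. */
        if (end < tramp_size || end - tramp_size < e->addr)
            continue;

        if (end > best)
            best = end;
    }
    return best;
}
```

For example, with usable RAM at 0x0-0x9f000 and a reserved page at 0x9f000, a candidate of 0x9f800 (inside the reserved page) is moved down to 0x9f000, so the trampoline ends where the usable region ends.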

The issue is also still present on v5.1-rc7.

This machine, like other Chromebooks, uses EFI and some kernel signing stuff.
kexec'ing from an older kernel works, but direct boot is totally broken.

I am available to try out patches that fix it.
Comment 1 mail+kernel-bugzilla 2019-04-30 23:49:42 UTC
Possibly related: bug 202351
Comment 2 mail+kernel-bugzilla 2019-05-01 00:00:55 UTC
Note that reverting the commit in question

    1b3a62643660020cdc68e6139a010c06e8fc96c7
    x86/boot/compressed/64: Validate trampoline placement against E820

and fixing the merge conflicts created by deletion makes v5.1-rc7 boot successfully.
Comment 3 Kirill A. Shutemov 2019-05-06 07:32:35 UTC
Sorry for this.

Could you share dmesg for a successful boot of the machine?
Comment 4 mail+kernel-bugzilla 2019-05-06 14:29:47 UTC
Created attachment 282643
Working dmesg on 5.1.0-rc7 with problematic commit reverted

I have attached the working dmesg.

Thanks a lot for looking at this!
Comment 5 mail+kernel-bugzilla 2019-06-17 14:42:06 UTC
Just a short ping on this, as I still have the device around currently and would like to help.
Comment 6 Kirill A. Shutemov 2019-08-06 12:12:23 UTC
Could you check again with an up-to-date kernel?
Comment 7 mail+kernel-bugzilla 2019-08-10 01:37:03 UTC
Hey Kirill,

I've tried with v5.3-rc3, but the problem persists as before.

Reverting the commit mentioned above continues to fix it.
Comment 8 Kirill A. Shutemov 2019-08-12 15:14:23 UTC
Created attachment 284339
Patch proposal

Could you give this a try?
Comment 9 mail+kernel-bugzilla 2019-08-13 02:03:33 UTC
This makes it work.

(I tried on v5.3-rc3.)

Awesome!
Comment 10 Kirill A. Shutemov 2019-08-13 15:14:43 UTC
Posted upstream:

https://lkml.kernel.org/r/20190813131654.24378-1-kirill.shutemov@linux.intel.com
Comment 11 mail+kernel-bugzilla 2019-09-15 23:53:27 UTC
This seems to be released in 5.3.

Links (using GitHub because it conveniently shows every tag that contains the commit):

* https://github.com/torvalds/linux/commit/0a46fff2f9108c2c44218380a43a736cf4612541
* https://github.com/torvalds/linux/commit/c96e8483cb2da6695c8b8d0896fe7ae272a07b54
* the merge that pulled it in: https://github.com/torvalds/linux/commit/146c3d3220e039b5d61bf810e0b42218eb020f39

There also seems to be a backport to Linux 5.2.14:

https://lwn.net/Articles/798885/

I'll try out the released kernels once I'm back at that machine (in two weeks at the earliest).
Comment 12 mail+kernel-bugzilla 2019-10-06 20:12:16 UTC
There's also a backport to v4.19.72.

Closing; thanks again Kirill for fixing this!
