Many machines are failing to boot with UEFI using the 3.12.7 kernel. This was not an issue with kernel 3.12.6. This issue is being discussed in the following bug report: https://bugs.archlinux.org/task/33745?project=1&order=lastedit&sort=desc. The consensus there is that the bug was introduced in 3.12.7.
This is unlikely to be a regression in the UEFI code, because the UEFI code did not change between 3.12.6 and 3.12.7.
Please try to bisect the issue.
I bisected the 3.12.6 to 3.12.7 code and this is the log:
git bisect start
# bad: [4301b7a8fe14a787fbf0bb9cad16b623f45956f6] Linux 3.12.7
git bisect bad 4301b7a8fe14a787fbf0bb9cad16b623f45956f6
# good: [d0266db287d492abe63e19859ad99dd232bc0e89] Linux 3.12.6
git bisect good d0266db287d492abe63e19859ad99dd232bc0e89
# good: [f3c1f0d0aaf20f9dee35ae99ec8b8705af4dc60e] drm/radeon: fix render backend setup for SI and CIK
git bisect good f3c1f0d0aaf20f9dee35ae99ec8b8705af4dc60e
# good: [f3b578d9d009a9f670e893cec8579aa069aaaccb] mm: numa: avoid unnecessary work on the failure path
git bisect good f3b578d9d009a9f670e893cec8579aa069aaaccb
# bad: [e93b100931a45490cd07960a1ec51d9d8e5100cb] GFS2: Fix slab memory leak in gfs2_bufdata
git bisect bad e93b100931a45490cd07960a1ec51d9d8e5100cb
# bad: [eede0e9020693adaeed01fb464261a00ce9d05ad] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
git bisect bad eede0e9020693adaeed01fb464261a00ce9d05ad
# good: [ef36ec29945653ced2c30158213841d248299a8a] mm: fix TLB flush race between migration, and change_protection_range
git bisect good ef36ec29945653ced2c30158213841d248299a8a
# bad: [9c612a77032a98b264d12fd6e3df2ca530d968d2] mm: numa: defer TLB flush for THP migration as long as possible
git bisect bad 9c612a77032a98b264d12fd6e3df2ca530d968d2
# good: [186fa6eb6131954d17457f37283e654cb079c25b] mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
git bisect good 186fa6eb6131954d17457f37283e654cb079c25b
# first bad commit: [9c612a77032a98b264d12fd6e3df2ca530d968d2] mm: numa: defer TLB flush for THP migration as long as possible
Also noteworthy: I get an error message as follows:
Failed to alloc lowmem for boot params
However, it's quite likely that the bisect does not help. As one of the other Arch users pointed out to me:
"Your bisect produced nonsense, since the problem seemingly appears and disappears at random if you rebuild the same code.
Also, it happens in the very early boot code (arch/x86/boot/compressed). The mm subsystem (and the rest of the kernel) isn't even loaded at this time.
It might have something to do with the alignment or the content of the compressed kernel. Using another compression type (such as XZ or LZO) might hide the problem again."
I now tend to agree with him.
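The quoted suggestion of switching the compression type is a quick way to test whether a failure follows the layout of the compressed image rather than any particular commit. A minimal bash sketch, assuming a kernel `.config` that uses the `CONFIG_KERNEL_<ALGO>` choice symbols from init/Kconfig; `set_kernel_compression` is a hypothetical helper, and `make olddefconfig` should still be run afterwards so Kconfig can validate the result:

```shell
# Hypothetical helper: select exactly one kernel compression mode in a .config.
# Assumes the CONFIG_KERNEL_<ALGO> choice symbols (GZIP, BZIP2, LZMA, XZ, LZO, LZ4).
set_kernel_compression() {
  local config=$1 algo=$2 opt
  for opt in GZIP BZIP2 LZMA XZ LZO LZ4; do
    # Drop any existing line for this symbol, whether enabled or disabled.
    sed -i -e "/^CONFIG_KERNEL_${opt}=/d" \
           -e "/^# CONFIG_KERNEL_${opt} is not set$/d" "$config"
    # Re-add it in the desired state.
    if [ "$opt" = "$algo" ]; then
      echo "CONFIG_KERNEL_${opt}=y" >> "$config"
    else
      echo "# CONFIG_KERNEL_${opt} is not set" >> "$config"
    fi
  done
}
```

Usage from the top of a kernel tree: `set_kernel_compression .config XZ && make olddefconfig`.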
It may be worth changing your configuration around a bit and then redoing the bisect to see if you end up in the same place.
A bad commit in the TLB handling area is not that implausible a place for weird and strange crashes.
The "failed to alloc lowmem for boot params" error is revealing at least, since you shouldn't be hitting that at all.
Bastian, could you post a dmesg of a working kernel?
What hardware are you seeing this on?
What boot loader are you using to boot the machine, or are you booting it directly either via the UEFI shell or the UEFI Boot Manager?
My laptop is a Lenovo Thinkpad W520. I've compiled an x86_64 kernel and I'm booting with gummiboot, which as far as I understand is a simple UEFI Boot Manager.
Other examples of affected hardware are mentioned in the Arch bug report:
How did you compile the kernel? And which kernel did you use? Was it a native upstream kernel or some Arch kernel? Please be specific.
This could be a toolchain/build issue, or simply an EFI memory map config issue. The attached patch may alleviate the "Failed to alloc lowmem" issue you're seeing. Btw, does your machine always fail to boot in the same way? That is, do you always see the "Failed to alloc lowmem" error on a failed boot?
Please provide a dmesg from a working kernel.
In order to debug this we need to be able to pinpoint the exact differences between working and non-working kernels, and the above archlinux thread mostly contains "Version x.xxx.x doesn't work for me, x.xx.x does", which isn't all that helpful.
Created attachment 122431
Thanks for the patch. I'll provide all the required info on Sunday as I'm away for the weekend and don't have my laptop with me.
I just tried out the patch. It does not solve the boot problem on my machine.
A description of how the kernel is being compiled can be found in this comment: https://bugs.archlinux.org/task/33745#comment118190. I'll let Bastian provide the other information.
I know Matt said it won't be very helpful, but oddly the 3.12.8 kernel boots successfully on my machine. I compiled it exactly the same way as the 3.12.7 kernel, as described in the comment I mentioned previously.
Just an FYI, although the 3.12.8 kernel has not been released to the Arch testing repository, the SVN PKGBUILD files have been updated to the 3.12.8 kernel. Here's the svn link to the PKGBUILD files: svn://svn.archlinux.org/packages/linux/trunk.
Yet again... does this patch work: http://pastebin.com/24kvw8kt ?
Ulf, don't propose unrelated patches. The setup_efi patch would not account for seeing the lowmem error message Bastian is seeing.
The archlinux thread posted in this report is a complete mess of people reporting different errors, on different hardware, with different boot loaders. Clearly people are hitting *different* issues.
If removing setup_efi_pci() works for you, please open a separate bug report, attach the patch, describe the failure you see without the patch, and describe the hardware and boot loader you're using.
Brian, did you also see the "Failed to alloc lowmem for boot params" error? If not, do you see any error message whatsoever? When your machine fails to boot, how does it fail?
Conflating these error reports into one monolithic report is not the way to get these problems resolved.
Matt, I do not ever see the "Failed to alloc lowmem..." message when my computer boots nor when it fails to boot (with and without the patch you posted above). For the times it fails to boot, I'll power on the machine and see the Lenovo UEFI splash screen as usual. Then the splash screen goes away as it always does during boot, but that's it. The backlight on my screen is in fact turned on, but no messages or anything else show up after the splash screen goes away.
Not sure if this will be of any help, but when this boot failure occurs on my machine, powering off the machine does not require me to hold the power button for 5+ seconds, A.K.A. perform a hard shutdown. Just holding the power button for about 1 or 2 seconds shuts the computer off. In other words, it's shutting down cleanly. Probably because the failed boot process doesn't make it to the point of mounting the filesystem.
Brian, are you using a boot loader and if so, which one? Are you able to run the UEFI shell from your BIOS menu? If you can, try and execute the vmlinuz directly (you may need to rename it with a .efi extension, i.e. vmlinuz.efi), something along the lines of,
fs0:\> vmlinuz.efi ignore_loglevel
and report whether you see any output. The 'ignore_loglevel' parameter is important: debugging this blind is going to be incredibly time-consuming, and it appears the Arch Linux package makes the boot very quiet (see change-default-console-loglevel.patch). That looks nicer when things are working, but makes debugging difficult. Ideally we need some output to figure out where to start looking.
Since your machine powers off instantly, I suspect the kernel hasn't set up the interrupt tables by the time it crashes. It could be crashing in the EFI boot stub, or very early in the kernel boot code. The key to debugging this is to narrow down where we should start searching for bugs.
I've used refind and gummiboot. Both fail to boot, but booting UEFI with GRUB2 is successful.
Yes, I can run UEFI shell.
Running "vmlinuz.efi ignore_loglevel" from the UEFI shell with a bootable kernel installed, a bunch of jargon is output to the screen. I see things like "Unable to mount filesystem...", "Kernel panic...", and at the end is a "Call Trace" list. It just hangs after that. I suppose this is normal since it is not the normal way to boot linux. I have to do an actual hard shutdown to power off (holding power button for 5+ seconds).
Running "vmlinuz.efi ignore_loglevel" from the UEFI shell with an unbootable kernel installed, there is no output after I hit the enter key. The system just hangs there in the UEFI shell. I can power off the machine like I described earlier, i.e., not a hard shutdown (holding power button for 1-2 seconds).
Thanks Brian, those are all good data points. What model of Lenovo are you using?
So, first things first. Let's see whether we can make it into the EFI boot stub. Please apply the attached banner.patch. I'm hoping that because it applies directly to the EFI boot stub, and not the kernel proper, it won't alter your failure in any way. The patch will apply to v3.12.7.
Please report any output you see.
Created attachment 122631
Bastian, I've opened the following bug report to track your issue, which is different from the one Brian is reporting,
Let's move the "Failed to alloc lowmem for boot params" conversation over there.
I am using a first generation Lenovo X1 Carbon.
Here is the output I get after applying your banner.patch:
Building boot params
Allocated boot params
Built command line
Finished building boot params
EFI boot stub v3.12.7
Setting up graphics
Setting up pci
Exiting boot services
Nothing else happens after this. The computer just hangs at this point.
Could you revert the banner patch and apply the following two patches to v3.12.7 and rebuild,
You'll need to enable CONFIG_EARLY_PRINTK_EFI in your config and pass earlyprintk=efi as an argument when booting your kernel, e.g.
fs0:\> vmlinuz.efi ignore_loglevel earlyprintk=efi
Hopefully you'll see some output (don't worry if the text scrolls slowly). If you still don't see any output, then we'll have to try something much more laborious.
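Before rebuilding, it can save a round trip to confirm that the required options actually ended up in the final `.config`. A small bash sketch; `require_config` is a hypothetical helper, and on v3.12.7 CONFIG_EARLY_PRINTK_EFI is presumably only available once the attached patches are applied:

```shell
# Hypothetical helper: print each CONFIG_<OPT>=y line missing from a .config
# and return non-zero if any option is not enabled.
require_config() {
  local config=$1 missing=0 opt
  shift
  for opt in "$@"; do
    if ! grep -q "^CONFIG_${opt}=y$" "$config"; then
      echo "missing: CONFIG_${opt}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_config .config EFI_STUB EARLY_PRINTK_EFI` before building, then boot with `ignore_loglevel earlyprintk=efi` as above.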
Created attachment 122711
Created attachment 122721
Hey Matt and Brian,
I was able to boot the Arch core repo build of 3.12.7 from the UEFI shell.
This kernel shows all the symptoms Brian described when I try booting it from gummiboot. In particular, there is no "Failed to alloc lowmem for boot params" message and the boot just hangs. Pressing the power button does shut the system down normally, though, so it's not completely locked up.
Here's how I got it to boot:
- I copied my kernel image from /boot/vmlinuz-linux to /boot/3.12.7.efi
- I downloaded a binary build of the UEFI shell v1 and executed the shellx64.efi file from gummiboot, which gave me a UEFI shell (I can't obtain a UEFI shell through my laptop's BIOS)
- I executed the following UEFI shell commands:
3.12.7.efi initrd=\initrams-linux.img root=/dev/sda2 rw
---> Kernel boots just fine! This is the first time I was able to boot this kernel build.
I didn't apply any of the patches Matt posted. I'll attach a dmesg of this successful boot.
Created attachment 122751
dmesg of successful UEFI shell 3.12.7 boot
There's a typo in my previous post.
It should be
3.12.7.efi initrd=\initramfs-linux.img root=/dev/sda2 rw
with "initramfs-linux.img" instead of "initrams-linux.img"
Matt, I'm sorry to say this, but there was no output using the earlyprintk patches.
Created attachment 122821
unpatched build of arch 3.12.7-2 kernel
Created attachment 122831
build of arch 3.12.7-2 kernel with lowmem, banner and earlyprintk patches
Created attachment 122841
build of arch 3.12.7-2 kernel with lowmem, banner and earlyprintk patches + no_setup_efi_pci patch
Created attachment 122851
debug messages from unsuccessful boot
Created attachment 122861
debug messages from successful boot (very bad quality)
I built a set of kernel images based on the stock Arch 3.12.7-2 PKGBUILD, with and without the attached lowmem, banner and earlyprintk patches. (I read too late that the latter were not intended to be used together, but I don't think it hurts.) Neither kernel boots on my Dell XPS 13 FHD when launched from the EFI shell: the unpatched one hangs with no output, and the patched one hangs as Brian posted (https://bugzilla.kernel.org/show_bug.cgi?id=68761#c18). The third kernel has setup_efi_pci patched out and boots fine until it can't open root because of the missing initrd (see the attached JPGs). The patched kernels are incremental builds on top of the stock Arch kernel, all built in a chroot environment.
sha256sums of the attached kernels:
I probably have the same problem as Brian. Without patches there is no output from the EFI shell or gummiboot after launching the kernel. With the banner and earlyprintk patches (Arch 3.12.7-2) I get the same output as comment #18 and a hang after that. UEFI GRUB boot from an Ubuntu live USB boots successfully; the Arch live USB does not. The most interesting thing is that 3.2.16-1 used to boot but doesn't anymore (the same thing happens).
Could everyone please try the following reloc.patch on top of v3.12.7 and report the output (feel free to apply the earlyprintk patches too).
Created attachment 122911
Attached is an image of all the jargon that was output after applying both the reloc and earlyprintk patches.
Created attachment 122931
Reloc patch output
Brian, is that v3.12.7 with only the earlyprintk and reloc.patch? If so, it seems like your issue is resolved?
Could you verify it works with gummiboot too?
Yes, it is v3.12.7.
No, computer still doesn't boot. I do get output when I boot using gummiboot, but it hangs after the "Call trace", as seen in the image I posted.
Brian, sorry, I should have been clearer. Your kernel is panicking because it can't find the root file system, but you'll notice that you now have to hold the button down to power off your machine. You may want to try using the same command line that Bastian used, namely,
fs0:\> 3.12.7.efi initrd=\initramfs-linux.img root=/dev/sda2 rw
Does gummiboot produce the same call trace as in https://bugzilla.kernel.org/show_bug.cgi?id=68761#c36 ?
If so, I think we can conclude that this issue is resolved, since not finding the root file system is a config problem.
Ulf, does reloc.patch work for you without deleting setup_efi_pci()?
I get a kernel panic regardless of whether I boot with gummiboot or manually through the UEFI shell.
I've attached a text file showing the output I get booting with gummiboot. Output is similar, if not the same, when booting through the UEFI shell.
Created attachment 122941
For completeness, yes, I have to do a forced shutdown.
May I ask for a high level explanation of what reloc.patch did?
Great, thanks for testing.
reloc.patch fixes a bug in the EFI boot stub. The boot stub allocates a buffer and copies the kernel image into it, but it gets the alignment wrong. The kernel decompressor code checks the alignment, notices it's wrong, and rounds up the output buffer address past the buffer we originally allocated in the EFI boot stub. What happens when the decompressor runs is anybody's guess because no one has been able to get any output from the kernel, but rest assured it will be Bad Things (TM), i.e. overwriting random bits of firmware code/data.
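The failure mode described above can be modelled with plain integer arithmetic. A minimal bash sketch: the addresses, sizes and the 2 MiB alignment are made-up illustration values, not figures from the affected machines, and `align_up`/`check_fit` are hypothetical names:

```shell
# Round ADDR up to the next multiple of ALIGN (ALIGN must be a power of two).
align_up() {
  echo $(( ($1 + $2 - 1) & ~($2 - 1) ))
}

# Hypothetical model: BUF_SIZE bytes were allocated at BUF_START, but the
# consumer rounds the start up to ALIGN before writing IMAGE_SIZE bytes.
# Prints whether that write stays inside the original allocation.
check_fit() {
  local buf_start=$(( $1 )) buf_size=$(( $2 )) image_size=$(( $3 )) align=$(( $4 ))
  local start end buf_end
  start=$(align_up "$buf_start" "$align")
  end=$(( start + image_size ))
  buf_end=$(( buf_start + buf_size ))
  if [ "$end" -gt "$buf_end" ]; then
    echo "overflow by $(( end - buf_end )) bytes past the allocation"
  else
    echo "fits"
  fi
}

# An allocation that is not 2 MiB aligned and has no slack for rounding up.
check_fit 0x1000100 0x800000 0x800000 0x200000
# prints: overflow by 2096896 bytes past the allocation
```

Reserving `IMAGE_SIZE + ALIGN - 1` bytes is one generic way to make the round-up harmless (`check_fit 0x1000100 0x9fffff 0x800000 0x200000` prints `fits`); this is a general technique, not necessarily what reloc.patch does.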
Hitting this bug is going to depend on exactly where the kernel image is loaded in the address space, which means your choice of boot loader, kernel version and even kernel build affect your chance of a successful boot.
Thanks for your persistence in tracking this issue down, we got there eventually. I'll get this pushed upstream and it'll make its way to the stable kernels.
I am still interested to know whether reloc.patch fixes the setup_efi_pci() issue people are seeing. If anyone could provide any info, that would be much appreciated.
Hey Matt, that could be the culprit indeed :)
However, given that people were able to boot some builds of 3.12.7 in the past already, we can't conclude from Brian's successful boot that the reloc.patch fixes the issue. I guess we need more testers?
I'll test it now :)
I was similarly affected by this bug. 3.12.7 didn't boot (at least not the precompiled version). Compiling 3.12.7 with -j 1 does work for me, however; anything other than 1 results in a kernel that doesn't boot.
As 3.12.8 exhibits the same bug I applied your reloc.patch and booted it as I'd do with every other kernel. At least I can now see a kernel panic happening (see attached screenshot).
So I'm wondering: is this patch supposed to fix the bug? If so, it does not, at least not for me, because I didn't change the kernel command line or any other kernel configuration between the working kernels and the one with your reloc patch. As I understand it, the kernel _does_ find its root device but can't mount it?
I'm using gummiboot (no EFI Shell).
Created attachment 123051
3.12.8 with applied reloc.patch kernel panic
Sorry, I should have been more specific. After applying reloc.patch, my computer did NOT boot successfully. I did get a kernel panic like Max described.
I concur with Max and Brian. I cannot boot the kernel after applying reloc.patch.
Dell XPS 13:
- arch 3.12.7-2 + reloc.patch: boots up to the kernel panic described by Bastian, Brian and Max (gummiboot and EFI shell; the kernel boots fine with syslinux in CSM mode).
- arch 3.12.7-2 + reloc.patch + earlyprintk-efi.patch: boot hangs without output (EFI shell).
- arch 3.12.7-2 + reloc.patch: hangs at boot, but after a couple of seconds the upper tenth of the screen turns grey. It never did that before and does not with stock 3.12.7-2. (No, I am not kidding! :))
So my guess is that still not enough memory is allocated for the kernel and parts of the firmware are overwritten (the block device layout on my XPS 13, as well as on Bastian's, Brian's and Max's systems, and the graphics stack on my W520).
Everyone, thanks for clarifying and apologies for the misunderstanding. I didn't realise that the patch broke your existing configurations. Turns out there was a typo in one line of the patch which meant that it was writing a 64-bit pointer to a 32-bit data item, resulting in the ramdisk pointer becoming corrupt.
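The thread does not say which fields the typo touched, so the following bash sketch only models the general hazard: writing 8 bytes where 4 were reserved silently overwrites whatever follows. The layout (two adjacent little-endian 32-bit fields, A then B), the field names and the values are all hypothetical:

```shell
# Hypothetical model of two adjacent little-endian 32-bit fields sharing one
# 64-bit word: field A occupies the low half, field B the high half.
pack()    { echo $(( ($2 << 32) | $1 )); }          # usage: pack A B
field_a() { echo $(( $1 & 0xFFFFFFFF )); }
field_b() { echo $(( ($1 >> 32) & 0xFFFFFFFF )); }

# Intended 32-bit store into field A: field B is left untouched.
store32_a() { echo $(( ($1 & ~0xFFFFFFFF) | ($2 & 0xFFFFFFFF) )); }

# Buggy 64-bit store at field A's offset: all 8 bytes are replaced, so the
# upper half of the value (zero for a pointer below 4 GiB) lands in field B.
store64_a() { echo $(( $2 )); }

w=$(pack 0x11111111 0x00c00000)   # A = 0x11111111, B = 0x00c00000
w=$(store64_a "$w" 0x37a00000)
printf 'A=0x%x B=0x%x\n' "$(field_a "$w")" "$(field_b "$w")"
# prints: A=0x37a00000 B=0x0
```

Field A receives the right value in both cases; it is the neighbouring field that gets clobbered, which is why such a bug can surface as corruption of an unrelated-looking value.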
Could everyone try reloc2.patch and see if things improve (I'll delete the old one so that there's no confusion).
Created attachment 123281
I still cannot boot 3.12.7 + reloc2.patch from gummiboot. I can however:
a) boot 3.12.7 from UEFI shell
b) load gummiboot.x64 from UEFI shell and _then_ load 3.12.7 + reloc2.patch from the gummiboot menu.
However, I think this behaviour is the same as without reloc2.patch (as I described a while ago: I can boot from UEFI shell but not from gummiboot directly). I didn't try to load gummiboot from UEFI shell and then load the kernel at that time though.
I saw that quite a few patches for EFISTUB booting went into 3.13.
Should I try out 3.13 (+ reloc2.patch?)
(In reply to Matt Fleming from comment #51)
> Created attachment 123281
reloc2.patch works fine for me on dell xps 13 and lenovo w520. I will upload my patched arch kernel package and ask the guys in the arch bug report to test it and report back. Thanks a lot for your effort!
Okay, sorry but reloc2.patch didn't help. Same behaviour as before: Just a blank screen, backlight works. Compiled with -j16.
Will try 3.13 tomorrow and see if that works. [Anyone have PKGBUILD for that one? :-)]
Thanks for providing the packages on the Arch bugtracker. I have been following this bug with interest.
I have a Lenovo X230 and have faced this issue with 3.12.7 and 3.12.8.
Thanks for providing reloc2.patch - Using 3.12.7-2 with reloc2.patch seems to resolve my issues. No black screen on boot.
Kernel 3.13 (AUR: linux-mainline which is basically vanilla) boots just fine without your reloc2.patch
Ulf, just tried your kernel with Matt's reloc2.patch. My computer boots successfully.
Please remember that we have had successful builds of 3.12.7 without any patches in the past.
Ulf might have just gotten lucky. Instead of using his build I'd recommend to compile your own version, so that we have some statistical evidence for the statement that the patch fixes this problem at least.
I'm coming from the arch bug report discussion to report a successful boot with the pre-built patched kernel Ulf provided on thinkpad x230, using refind 0.7.7-1.
Though from what I gathered reading this thread, this success could be a matter of luck and not necessarily a proof of a working fix.
Let me know how I can help with finding out if it truly fixed the issue.
(In reply to Bastian Beischer from comment #59)
> Ulf might have just gotten lucky. Instead of using his build I'd recommend
> to compile your own version, so that we have some statistical evidence for
> the statement that the patch fixes this problem at least.
That is definitely a good idea. You should also try different values of the "pkgbase", as this made a difference before. (https://bugs.archlinux.org/task/33745#comment111238)
I compiled a kernel from Ulf's provided sources and patches with a modified pkgbase and it failed to boot exactly as before: it displays the kernel parameters and hangs there.
A version without the modified pkgbase is compiling.
Results are in for locally compiled kernels:
With the provided pkgbase name: boots!
With a modified pkgbase name: hangs!
I'm totally unfamiliar with the inner workings of the kernel, so I don't understand what's happening here, but if you need me to provide additional debugging info, I'm willing to help.
I would like to add that this bug affects me too. My affected stack includes:
- Dell XPS 13 (Sputnik/Developer Edition)
- Arch Linux 64 Bit
I'll try to get some time together and test some of these patches... I greatly appreciate all of your efforts.
Following on from Bastian Beischer's request that others compile a kernel and test, I've done the following:
- Taken a copy of 3.12.9-1 and compiled it using the Arch Build System (ABS). All that was modified was 'pkgbase' so it'd get copied to a unique name. After adding an entry for gummiboot and restarting this package resulted in a black screen as per the bug report.
- Using the same copy as in my first attempt, with the same 'pkgbase', I applied the patch inside the Arch Linux PKGBUILD prepare() function. Compiling, installing and booting using gummiboot still resulted in the same black screen.
Further to my previous comment, I've done some more testing with 3.12.9-1.
- Compiled with the reloc2.patch applied and pkgbase set to "linux-dhope" in PKGBUILD. This results in a black screen on boot. I've cleaned out the tree and rebuilt three times and had the same result each time.
- Compiled with the reloc2.patch applied and pkgbase set to "linux-dhope-efibootfix" in PKGBUILD resulted in a successful boot.
I would be interested if other Lenovo X230 users experiencing this issue could try the same process.
Okay, so I have been following the Arch thread and trying to work with them on this. I have an X230 with the latest BIOS applied, and when I try to boot I get the black screen and then have to power down the system with a long press of the power button. I can state that booting to the UEFI Shell and trying to load the kernel from the command line does not work for me for any kernel that does not boot via gummiboot.
I tried compiling the latest Arch core kernel with modifications to the PKGBUILD
Inside the prepare() function:
patch -p1 -i [path to reloc patch]
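For readers following along, applying a patch in an Arch PKGBUILD looks roughly like this. The `_srcname` variable and the `reloc2.patch` filename are assumptions; adjust them to the actual PKGBUILD and the name under which the patch was saved:

```shell
# Hypothetical excerpt of a PKGBUILD prepare() function (makepkg sets $srcdir).
# _srcname and the patch filename are assumptions, not copied from the real PKGBUILD.
prepare() {
  cd "${srcdir}/${_srcname}" || return 1
  # -p1 strips the leading a/ and b/ components from the paths in the patch.
  patch -p1 -i "${srcdir}/reloc2.patch"
}
```

makepkg only copies the patch into `$srcdir` if it is also listed in the PKGBUILD's `source` array with a matching checksum.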
The first time I did this and compiled the kernel I installed it using pacman -U and rebooted to the kernel. I only received a blank screen and it would not go past that.
I also have the aur mainline kernel 13.3 installed on my machine and I can also report that it boots without a problem.
I'm going to build the kernel again and then try it one more time. Can someone tell me what I should do when booting to try and help get some more information? Should I use ignore_loglevel?
Rich, ignore_loglevel is a good idea for debugging in general since it should print all kernel statements to the console.
I'm gonna have to go and have another read of the EFI boot stub code and hunt for more bugs. I certainly don't think that reloc2.patch is going to hurt anything, but there are clearly other things going on.
By the way, has anyone had any problems with 3.13, yet?
I have tried 5 or 6 different builds and haven't had a single failure. 3.13 contains some EFI-related changes (Matt will know best); maybe the issue is already fixed there?
3.13 seems to work for me on more devices (but turning on all the kernel debugging spews lots of warnings and crashes my box so it's not IMHO right), 3.14-rc1 is completely broken.
Problem persists for me under 3.12.9-2 (no reloc2 patch applied). Even adding debug options yields no output:
debug ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug
Just wanted to add that I'm using the 3.12.9-3-ck build of the kernel from AUR and it works fine; I haven't had a problem yet. Also, with ignore_loglevel set I did not get any output from the core 3.12.9-3 kernel, so I can't really help much there.
Can confirm 3.13.2 won't boot on a Lenovo Thinkpad T530.
The kernel is a stock Arch kernel: 3.13.2-1-ARCH x86_64 GNU/Linux
After I recompiled the kernel (using the same kernel config with the Arch build system) boot works.
Prior to 3.13.2 the 3.13.X kernel versions all booted fine (all of them were stock Arch kernels)
I am available for further testing, so if you want me to try to boot with the UEFI shell let me know.
I can confirm as well that EFI boot doesn't work on my Kabini (AMD A4-5000, ZBOX nano AQ01) system.
Last working kernel version was 3.10.20.
3.12.X, 3.13.X and 3.14-RC2 don't work and cause a hang before I even see a single output.
I'd like to mention that I have had no issue booting any linux-ck kernels version yet, even when the same revision of the arch kernel fails to boot.
It's been a while since my last manually built kernel which is probably why my locally built 3.13.x kernels are triggering systemd errors and end up in a rescue shell, but they don't get stuck as with this bug.
Today a new kernel package was pushed to the repositories. The kernel version was the same, but the package contains an additional patch.
I gave it a try and it works fine. The added patch doesn't seem to be in any way relevant to the UEFI issue, and since my own rebuild of the same kernel worked fine the issue seems to be with that specific build.
I noticed an interesting difference in the output of the file command with the kernel image as an argument:
boot/vmlinuz-linux: Linux kernel x86 boot executable bzImage, version 3.13.2-1-ARCH (nobody@var-lib-archbuild-extra-x86_64-thomas) #1, RO-rootFS, swap_dev 0x3, Normal VGA
/boot/vmlinuz-linux: Linux kernel x86 boot executable bzImage, version 3.13.2-2-ARCH (tobias@T-POWA-LX) #1 SMP PREEMPT Wed Feb 12 08:2, RO-rootFS, swap_dev 0x3, Normal VGA
arch/x86/boot/bzImage: Linux kernel x86 boot executable bzImage, version 3.13.2-1-ARCH (poljar@monolith) #3 SMP PREEMPT Sat Feb 8 16:33:, RO-rootFS, swap_dev 0x3, Normal VGA
The first kernel above doesn't boot while the other two boot fine. Note the missing build date.
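The missing build date can be checked mechanically. A bash sketch that extracts the `version` field from `file` output for a bzImage and looks for a weekday/month timestamp; the regex is an assumption based only on the three outputs quoted above, and both function names are hypothetical:

```shell
# Print the "version ..." field (up to the next comma) from `file` output.
version_field() {
  sed -n 's/.*version \([^,]*\),.*/\1/p'
}

# Succeed if that field contains a timestamp such as "Wed Feb 12".
has_build_date() {
  version_field | grep -Eq '(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +[0-9]+'
}
```

Usage: `file /boot/vmlinuz-linux | has_build_date || echo "no build date in version string"`.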
Just to add some more data points:
On my ThinkPad X220i with UEFI boot via gummiboot, I tried the following:
1) all stock Arch kernels since 3.12.6 crashed with a blank screen
2) a self-compiled kernel 3.12.9 with the stock kernel name (arch) crashed with a blank screen
3) a self-compiled kernel 3.12.9 with the kernel name "arch-testing" booted
4) a self-compiled kernel 3.13.2 with the kernel name "arch-testing" got stuck in a reboot cycle without any messages printed to the screen
Adding my own data point, I am able to successfully boot the stock Arch package 3.13.6-1 with gummiboot on my ThinkPad x230. Previously, the last stock Arch kernel I could boot was 3.12.6-1.
(In reply to Matt Fleming from comment #11)
> If removing setup_efi_pci() works for you, please open a separate bug
> report, attach the patch, describe the failure you see without the patch,
> and describe the hardware and boot loader you're using.
I just built a recent git kernel and it would not boot; I commented out the setup_efi_pci() call and it works. (As I said before, this hack has never failed me on my Dell XPS 13 Ivy Bridge.) It has worked for others too: http://permalink.gmane.org/gmane.linux.kernel.efi/1560
I'd bet that the two of us are not the only ones in this thread for whom this hack fixes or masks the problem. Even if it does not work for everybody, it might be worth trying. I'll attach this very simple patch; maybe someone else could try it out.
Created attachment 130561
comment call to setup_efi_pci()
Since this bug is going nowhere right now (and I just got reports from two users that Arch's 3.14.0-4 kernel fails again), I'd like to get some new information, but don't know how. What confuses me most is that when I build a kernel twice, with identical code and configuration, with the same toolchain, it happens that one build works and one fails - at least if I believe the reports I get. Is there a way to compare two such builds and maybe find out which differences cause this problem?
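As a first step toward comparing two such builds, it helps to confirm whether the images differ at all and by how much. A minimal bash sketch using `cmp` and GNU `stat`; `compare_images` is a hypothetical helper. Since a bzImage is mostly compressed data, even a small difference in the uncompressed kernel (an embedded build string, for example) can change many bytes, so the count is only a coarse signal:

```shell
# Hypothetical helper: compare two kernel images byte by byte.
compare_images() {
  local a=$1 b=$2 size_a size_b ndiff
  if cmp -s -- "$a" "$b"; then
    echo "identical"
    return 0
  fi
  size_a=$(stat -c %s -- "$a")
  size_b=$(stat -c %s -- "$b")
  # cmp -l prints one line per differing byte over the shorter length;
  # a length mismatch is reported on stderr, which is discarded here.
  ndiff=$(cmp -l -- "$a" "$b" 2>/dev/null | wc -l)
  echo "sizes ${size_a}/${size_b}, ${ndiff} differing bytes"
  return 1
}
```

Usage: `compare_images vmlinuz-works vmlinuz-fails`.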
Thomas, can you send me the two kernels? Exactly how does the bad kernel fail? Hang, reset?
This really does sound like a toolchain bug, or perhaps an EFI boot stub bug triggered by values written by your toolchain.
For me, 3.14 fails to boot in exactly the same way as 3.12 did before when booting with gummiboot: a blank screen, nothing more. My current setup chainloads syslinux from gummiboot and then boots the kernel, which works.
Just a suggestion: the last time I tried debugging this bug, I compiled the kernel with -j1 and with -jX for X > 1. -j1 never failed for me, whereas anything larger than 1 *might* fail.
Apart from that: If you can give me any way of displaying _some_ information (just getting a blank screen atm, backlight works...) I'd be happy to help out!
Matt, I am unable to reproduce any of these problems myself, so I am relying on what I get from https://bugs.archlinux.org/task/33745 and from personal conversation with affected users.
At one point, a user compiled two identical versions of 3.12.7 where one kernel worked and the other failed. He uploaded them at the time, however, I just found out that he deleted the files and I can thus no longer download them.
I'll see if I can find someone willing to produce two such kernels.
In any case, the failing kernels always simply produced a blank screen with no output, hanging indefinitely.
Max, I think at this point I just need two copies of the same kernel version, without user modification - one working and one not working, where the only difference is that they were recompiled.
If anyone can provide me with those we might get a bit closer to debugging this.
Thomas, does 3.14.0-4 include the patch I suggested here, http://marc.info/?l=linux-kernel&m=139703196223283&w=2 ?
That link says "No such message", but I think I know which message you refer to - and yes, that patch is included.
Hi Thomas, some more info about the 3.14 Arch packages:
I'm unable to build a kernel that fails myself, I can however test builds provided by you if needed.
(Sorry for the useless comment above).
Damir, short version, the issue is as random as it was with 3.13. The difference between -2 and -3 is a minor change in the btusb module and a fix in the kernel entirely unrelated to early boot.
I agree with Thomas. I'd like to mention once more that I can not boot 3.14-4 from gummiboot, but I can go from gummiboot to UEFI shell v1 (shellx64.efi) and then:
cp vmlinuz-linux 3.14-4.efi
3.14-4.efi initrd=\initramfs-linux.img root=/dev/sda2 rw
-> Successful boot.
I was able to boot most of the 3.13 kernels, but all the 3.14 kernels failed for me (including 3.14-1 and 3.14-2, which is different from what Damir experienced).
Could it be that this depends on the vendor of the notebook and their EFI firmware?
Bastian, I think at this point it's pretty clear it's a(nother) memory corruption problem, or a garbage value problem, e.g. the boot stub is reading random/garbage data from somewhere.
While it may be more likely to hit this issue with certain vendors' notebooks, I don't think their firmware is to blame in this case. It's just that their memory map and boot environment seem to trigger the bug more easily, which is why it's not entirely surprising that changing your boot method slightly (going via the EFI Shell) makes things work better: the memory map will look different.
I have so far not been able to produce a working kernel. I tried 3 times but none of the kernels booted.
Matt, I'll apply the patch you mentioned above (no, it's not included in the stock Arch kernel) and see if that works.
I have switched back from Gummiboot/EFI Stub to GRUB and have not had a kernel fail on me yet. I was always able to reproduce this problem with Gummiboot or the EFI Stub, or even when trying to boot from the EFI Shell. Could this be not a kernel problem but rather a problem with the way EFI is implemented on certain machines (Lenovo seems to be a big issue) combined with the EFI Stub?
Just my two cents.
Created attachment 131951
Kernel which fails to boot
Tested on a T430 laptop. Built from Arch package version 3.14-4, with no modifications made to the package.
Created attachment 131961
Kernel which boots
This version is from the official Arch package, which was built on a different machine. It boots successfully on a T430 laptop.
This isn't quite as good as two builds recompiled on the same machine, but both kernels have been produced from the same source.
Created attachment 133461
3.14.1 hangs, output with ignore_loglevel and earlyprintk=efi
I have an issue which I think is similar to this bug. I have been unable to boot the kernel with gummiboot since version 3.14. In short: 3.13.8 boots; 3.14 and 3.14.1 do not. With 3.14.1, I continued testing by booting from the EFI shell with ignore_loglevel and earlyprintk=efi, which gives the attached output (133461). Let me know if I can do anything more. I'm using Arch Linux, where the 3.14.1 kernel already includes the patch "x86/efi: Correct EFI boot stub use of code32_start". My computer is a Dell Latitude E4310.
Created attachment 134131
better error logging in eboot.c
@Matt: This issue still occurs on my Dell XPS 13, even after the bugfix you found with Thomas. I experimented with the current git kernel but still could not find a reproducible way to build a bad kernel. What would help is better error logging, so I have attached a patch proposal. The error messages are pretty generic, but now I at least get some output if the kernel fails. Interestingly, I managed to build a kernel where setup_efi_pci fails, but since its return value is not checked at all, the kernel still boots.
I have an Intel NUC N2820 and did a fresh install of arch linux today.
The first problem was gummiboot: the screen stayed black. I downgraded to an older version and was then able to start gummiboot.
After that, the screen stayed black when selecting an entry in gummiboot, so I also downgraded to kernel 3.13.8 to get the system booting.
@Matt: what needs to be done to get this patch included in mainline? It wouldn't hurt, would it?
No, your patch wouldn't hurt, and I agree that it would be a good addition.
Could you please rebase your patch against the 'next' branch at,
and mail to email@example.com and Cc me, and I'll make sure that it gets applied. Thanks!
Created attachment 142371
add better error logging to efi-main
I've attached the rebased patch here for reference and will send it to the list tomorrow. Thanks for your work, Matt!
> On July 7, 2014 at 3:25 PM firstname.lastname@example.org wrote:
> --- Comment #103 from Matt Fleming <email@example.com> ---
> No your patch wouldn't hurt, and I agree that it would be a good addition.
> Could you please rebase your patch against the 'next' branch at,
> and mail to firstname.lastname@example.org and Cc me, and I'll make sure that it
> gets applied. Thanks!
I just read this  and I think this just might be "it". The random symptoms we
are seeing are pretty much exactly what I would expect of a bug like this.
Michael Brown, who discovered this, posted two patches to the list. Going to try
I think I'm experiencing the same bug on Debian: 3.14-1-amd64 (based on 3.14.12) does not boot, though the 3.16-rc5 kernel we have boots fine.
Is the patch at https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit/?h=urgent&id=c7fb93ec51d462ec3540a729ba446663c26a0505 going to be included in the 3.16 kernel? It would be useful if it were, so that more people can check whether it fixes the EFI stub loader boot bug.
(In reply to Mike Cloaked from comment #107)
> Is the patch at
> ?h=urgent&id=c7fb93ec51d462ec3540a729ba446663c26a0505 going to be included
> in the 3.16 kernel? It would be useful if it is so that more people can see
> if this fixes the efi stub loader boot bug.
Yes Mike, that patch will be in v3.16.
There have been no further reports of this bug on Arch since the patch was included in the Arch kernels, and I have added a comment to the Arch bug at https://bugs.archlinux.org/task/33745#comment125801 asking whether that bug could be closed if no further reports occur by the time kernel 3.16 is released. That may also be a suitable time to consider this bug closed here.
I think this bug was mainly opened as a way for arch users/developers to track upstream progress of debugging this problem. If things are looking fixed then I'm definitely in favour of closing this bug.
If people run into similar bugs, or they think their bug has never been fixed, I'd suggest that they open a *new* report instead of re-opening this one. The reason being that, over time, this bug became a hodge-podge of reports and opening individual reports with as much detail as possible may at least allow us to gauge how many bugs we have in this area.
Anyone have a good reason to keep this report open?
No, I think you are right. I am pretty sure that Michael's patch fixed the problem, at least on some or most systems. The future will tell! :)