Bug 110131
Summary: | Kernel panic after the random pool initialisation on Dell Latitude E6420 | ||
---|---|---|---|
Product: | EFI | Reporter: | Viorel-Cătălin Răpițeanu (rapiteanu.catalin) |
Component: | Boot | Assignee: | EFI Virtual User (efi) |
Status: | RESOLVED CODE_FIX | ||
Severity: | blocking | CC: | matt, rapiteanu.catalin, stf_xl |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.2.4 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg log with CONFIG_EFI_PGT_DUMP; debug patch; dmesg log with the debug patch applied; backport patch; avoid loss of precision in numpages | ||
Description
Viorel-Cătălin Răpițeanu
2015-12-29 03:06:58 UTC
Could you install the kernel from git sources (https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/) and bisect between 4.2.3 and 4.2.4? There are about 260 patches in that range. None of them seems to be related to the random generator, but there are sched and x86 patches that could possibly cause this bug.

I've done a bisect between 4.2.3 and 4.2.4 and found that the regression was introduced by this commit:
> commit 496c2053cd784dd653d295e499503f14907022b3
> x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime,
> instead of top-down
I can also confirm that both people that have seen this problem so far (me included) are using UEFI to boot the kernel.
Thanks. I'm assigning the bug to the proper component.

Can you try booting with the broken commit applied and specify "efi=old_map" on the kernel command line? That should get your kernel booting, but it's a good idea to verify that.

Oh, also, build with CONFIG_EFI_PGT_DUMP enabled and make the kernel command line, "efi=old_map,debug". It would be good to stare at the memory mapping for your machine which will appear in dmesg.

> Can you try booting with the broken commit applied and specify "efi=old_map"
> on the kernel command line? That should get your kernel booting, but it's a
> good idea to verify that.

That efi modifier gets my kernel booting.

> Oh, also, build with CONFIG_EFI_PGT_DUMP enabled and make the kernel command
> line, "efi=old_map,debug". It would be good to stare at the memory mapping
> for your machine which will appear in dmesg.

I will attach the dmesg for that scenario as soon as possible. If there is any more relevant debug information that I can provide, just leave a message.

Created attachment 201021
dmesg log with CONFIG_EFI_PGT_DUMP
dmesg log of the bad kernel with "CONFIG_EFI_PGT_DUMP" and "efi=old_map,debug".
Created attachment 202091
debug patch
Could you try out the attached debug patch on top of v4.2.4, which includes the problematic commit? You don't need to specify efi=old_map; the patch should hopefully ensure that the kernel boots, although EFI runtime services will not be available (that shouldn't be a problem). After verifying that your kernel boots, could you attach the new dmesg?

The images you posted make it look like we're spinning forever trying to map the EFI regions, though I have no idea why yet.

Created attachment 202111
dmesg log with the debug patch applied
The kernel boots with the applied patch without having to specify efi=old_map. The new dmesg log was captured with the debug patch applied. If there is anything else that could help the debugging process, please let me know.

OK, that narrows things down a little. Could you try the attached backport on top of a clean v4.2.4?

Created attachment 202121
backport patch
The kernel doesn't boot with the proposed backport patch without old_map.

Thanks for testing that out. Let me go and stare at the code some more. It appears that this is a whole new bug in the mapping code.

Created attachment 202191
avoid loss of precision in numpages
Fingers crossed, can you try out this attachment? I was able to reproduce your issue locally in Qemu by force-feeding the EFI virtual region allocator the problematic address range.
The last patch successfully started the kernel. That was nice! Hope that this will be merged into the master branch soon. Thank you for spending your time fixing this issue.

This has now been merged into Linus' tree and should be backported to the stable releases soonish, https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/mm/pageattr.c?id=742563777e8da62197d6cb4b99f4027f59454735

Thanks for all your help tracking this down.
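
The title of the final attachment, "avoid loss of precision in numpages", points at a truncation when a page count is converted to a byte length. The following is a minimal, stand-alone C sketch of that class of bug, not the kernel code itself: the helper names, the use of `uint32_t` for the page count, and the example page counts are all assumptions made for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages, as on x86-64 */

/*
 * Hypothetical helper: the shift is carried out in 32 bits, so any bits
 * above bit 31 are lost before the result is widened to 64 bits.
 */
uint64_t region_bytes_truncated(uint32_t numpages)
{
    return (uint32_t)(numpages << PAGE_SHIFT);
}

/*
 * Hypothetical helper: the page count is widened to 64 bits first, so the
 * shift keeps the full result.
 */
uint64_t region_bytes_widened(uint32_t numpages)
{
    return (uint64_t)numpages << PAGE_SHIFT;
}
```

With these assumed values, a count of 1236992 pages (a region a little under 5 GiB) comes out as 771751936 bytes, or 188416 pages, in the truncated version, while the widened version gives the full 5066719232 bytes. A count of 1048576 pages comes out as 0 bytes when truncated. A mapping loop that computes its step this way would make no progress, which would be consistent with the "spinning forever" observation earlier in the thread, although the thread itself does not spell out the mechanism.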