Bug 65841
Summary: | Dell Venue 8 Pro (Bay Trail) kernel crashes part way through boot with kernels 3.12 and 3.13 | ||
---|---|---|---|
Product: | EFI | Reporter: | Adam Williamson (adamw) |
Component: | Services | Assignee: | EFI Virtual User (efi) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | alan, deckerrexy, kamiciy920, kernel, matt, mjg59-kernel, tianyu.lan |
Priority: | P1 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 3.13.0-0.rc1.git2.1.fc21 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
bottom of the first traceback seen
another trace that pops up some seconds after the first
clean screenshot of the full trace
acpidump (requested by mjg59) |
Description
Adam Williamson
2013-11-26 02:39:12 UTC
Created attachment 116091 [details]
bottom of the first traceback seen
Boot proceeds for a while and then explodes, with a whole bunch of output landing at once. This is the bottom of that output. It then sits like this for a while, before posting some more tracebacks, I think because it detects the CPUs are sitting idle. I'll post a screenshot of one of those later tracebacks next.
Created attachment 116101 [details]
another trace that pops up some seconds after the first
https://www.happyassassin.net/extras/v8p_fail.mp4 is a shakycam video of a boot with kernel 3.13.0-0.rc1.git2.1.fc21. I can't manage to transcode it to another format or rotate it at present, sorry :(

Ah, large USB sticks to the rescue! I now have the system installed to a second USB stick and can successfully boot from that. So I should be able to stick a 3.13 kernel into the 'installed' system, boot, then boot back to 3.11 and extract the logs from the failed boot. Stay tuned.

Unfortunately, it looks like the journal doesn't capture the kernel crashes :( https://bugs.freedesktop.org/attachment.cgi?id=89807 is what gets captured in journald during a boot of a 3.13 kernel, but it doesn't include the traces for this bug, I don't think. If anyone has any bright ideas how I can get 'em out, do yell.

Funnily enough, after installing and trying to boot a 3.13 kernel, I can no longer boot successfully even with the 3.11 kernel! It starts throwing traces during boot too. Odd.

So I'm craning my neck around awkwardly and stepping through the video frame-by-frame. It appears that the crash happens right around microcode loading. I see something like "platform microcode: Direct firmware load failed with error -2" repeated around when the traceback shows up. The first line of the traceback looks to be in 'virt_efi_get_time'. I can also see (there may be typos or errors in the individual characters here, reading off the fuzzy video):

[47.176986] microcode: CPU0 sig=0x30673, pf=0x2, revision=0x312
[47.412697] BUG: unable to handle kernel paging request at fed08004
[47.420696] IP: [<f805fa36>] 0xf805fa36
[47.428607] mpde = 3207f067 *pte = 00000000
[47.436536] Ooops: 0000 [#1] SMP
...
[47.502595] task: f5e3ed80 ti: f5f22000 task.ti: f5f22000
[47.509960] platform microcode: Direct firmware load failed with error -2
[47.509985] platform microcode: Falling back to user helper
[47.510154] microcode: CPU1 sig=0x30673, pf=0x2, revision=0x312
[47.510963] platform microcode: Direct firmware load failed with error -2
[47.510967] platform microcode: Falling back to user helper
[47.511056] microcode: CPU2 sig=0x30673, pf=0x2, revision=0x312
[47.511756] platform microcode: Direct firmware load failed with error -2
[47.577060] EIP: 0060:[<f805fa36>] EFLAGS: 00010046 CPU: 3
[47.584893] EIP is at 0xf805fa36
[47.592644] EAX: fed03000 EBX: f5f23e0b ECX: c044(illegible)
[47.600446] ESI: 00000282 EDI: f5f23e34 ERP: f5f23dc(illegible)
...
[47.654779] Call Trace:
[47.662473] [<c04481ec>] ? virt_efi_get_time+0x1c20x50
[47.670252] [<c0448200>] ? virt_efi_get_time+0x3020x50
[47.677954] [<c0447fcd>] ? efi_set_rtc_????+0x1b20xc0

.....god, my neck is getting tired...from there the trace goes to __getestimeofday, update_persistent_clock, sync_cmos_clock, process_one_work, process_one_work again, worker_thread, process_one_work, kthread, trace_hardirqs_on, ret_from_kernel_thread, insert_kthread_work, then a Code: block, an EIP line, a CR2 line and the trace ends.

After that I see a line:

BUG: sleeping function called from invalid context(? - line continues, but is illegible)

then:

in_atomic(): 1, irqs_disabled(): 1, pid: (illegible)
INFO: lockdep is turned off
irq event stamp: 136104
hardirqs last enabled at (136103): (looks like the getesttimeofday function from the trace)
hardirqs last disabled at (136104): [<c0a7eba3>] _raw_spin_lock_irqs
softirqs last enabled at (136042): (illegible)
softirqs last disabled at (136037): (illegible)
CPU: 3 PID: 74 Comm: kworker/(illegible) Tainted: (illegible)

looks like the start of another trace...

That's as much as my neck can stand tonight, hope it's of some kind of use. Can try to decipher more tomorrow if necessary.
CCing Lan - hi Lan, I saw you on a couple of other Bay Trail bugs, hope you don't mind being copied on this one.

Created attachment 116111 [details]
clean screenshot of the full trace
Aha! So removing microcode_ctl doesn't solve the crash, but it makes the traceback much cleaner. With the microcode_ctl package removed, the entire crash looks like it fits on one screen. Here is a picture of the whole thing.
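The "Direct firmware load failed with error -2" lines in the earlier transcription use the kernel convention of returning a negated errno value. As a minimal sketch, assuming it is run on a Linux system where Python's `errno` table matches the kernel's numbering, the code can be decoded like this (the helper name `describe_kernel_error` is made up for illustration):

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Translate a negative kernel-style return code (e.g. -2) into
    its symbolic errno name plus the C library's message for it."""
    number = abs(code)
    # errno.errorcode maps numeric values to names such as "ENOENT".
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


if __name__ == "__main__":
    # The value seen in the microcode messages.
    print(describe_kernel_error(-2))
```

On Linux, 2 is ENOENT, so the message only says that no microcode file was found for direct loading. That fits the observation above that removing microcode_ctl tidies up the output without affecting the crash itself.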
It looks like it is making a call to 64bit EFI services even though they are not present. If you boot a 32bit Fedora does that behave any better?

Alan

(adding the font of EFI firmware wisdom to the cc list)

Looks like it *is* a 32-bit kernel. We don't default to using the EFI time services on 64-bit, and possibly we should do the same thing on 32-bit, but it's strange that it worked in 3.11 and breaks in 3.12. We've certainly made fewer efforts to implement any UEFI bug workarounds on 32-bit systems.

Duh yes.. good point. So trying a 64bit Fedora might actually help. I've got a T100 heading my way so hopefully I can have more of a poke at this and at the fact we are doing VESA BIOS calls in EFI mode as well.

It's a 32-bit UEFI implementation and we don't have a 32-bit bootloader that knows how to boot a 64-bit kernel (and if we did, we'd then refuse to perform any UEFI calls). Fixing that is possible, but not high on my (or anyone else's) list.

That's not necessarily a lose if we could stop it then deciding that it wants to do BIOS calls instead.

It is possible to boot Ubuntu on the T100 in 64bit via 32bit EFI, although obviously you then can't access EFI services (what a shame ;))
http://forum.xda-developers.com/showthread.php?t=2500078&page=5 has much of the background to all of this.

Well, other than there being no way to set up the bootloader.

Yes, as discussed above, this is a pure 32-bit image and I cannot easily do 64-bit-on-32-bit unless Matthew or someone else smart fixes that up for me (or, I guess, I could steal whatever bootloader people are using to do the 64-on-32 trick with Ubuntu, but if anything I'd rather this gets *less* messy not *more* :>)

Matthew, should I move this downstream? Would anyone be interested in fixing it? I think it'd be nice to make these systems work...
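Much of the exchange above turns on whether the firmware and the kernel have the same word size. As a rough sketch, and only for kernels newer than the ones in this report (as far as I know the `/sys/firmware/efi/fw_platform_size` file was added to mainline later), the firmware word size can be read from sysfs and compared with the running kernel. The function names here are illustrative, not taken from any tool mentioned in the thread:

```python
import struct
from pathlib import Path
from typing import Optional

# sysfs file exposing the UEFI firmware word size ("32" or "64").
# Assumption: present only on EFI boots with a sufficiently new kernel.
FW_PLATFORM_SIZE = Path("/sys/firmware/efi/fw_platform_size")


def parse_platform_size(text: str) -> Optional[int]:
    """Parse the file contents; return None if they are not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def firmware_bits(path: Path = FW_PLATFORM_SIZE) -> Optional[int]:
    """Return the firmware word size, or None if it cannot be read
    (not booted via EFI, older kernel, or no permission)."""
    try:
        return parse_platform_size(path.read_text())
    except OSError:
        return None


def userspace_bits() -> int:
    """Word size of this Python process, a stand-in for the kernel's."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    fw = firmware_bits()
    if fw is None:
        print("firmware word size unavailable")
    else:
        print(f"firmware: {fw}-bit, userspace: {userspace_bits()}-bit")
```

On a machine like the one in this report, a 32-bit result from the firmware alongside a 32-bit kernel would match the "pure 32-bit image" described above.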
This change suggested by mjg59 appears to fix the problem (I built a 3.13 kernel with this patch, built a live image with that kernel, and it boots to a desktop):

--- linux-3.13.0-0.rc1.git3.1.fc21.x86_64/arch/x86/platform/efi/efi.c 2013-11-28 12:59:36.613028002 -0800
+++ linux-3.13.0-0.rc1.git3.1.fc21.x86_64/arch/x86/platform/efi/efi.c.new 2013-11-28 13:01:08.043768967 -0800
@@ -690,7 +690,7 @@
 	set_bit(EFI_MEMMAP, &x86_efi_facility);
-#ifdef CONFIG_X86_32
+#if 0
 	if (efi_is_native()) {
 		x86_platform.get_wallclock = efi_get_time;
 		x86_platform.set_wallclock = efi_set_rtc_mmss;

Created attachment 116931 [details]
acpidump (requested by mjg59)
An equivalent patch from Matthew made it into v3.13-rc4 in Linus' tree and is marked for stable: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/platform/efi/efi.c?id=04bf9ba720fcc4fa313fa122b799ae0989b6cd50
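For readers unfamiliar with the code being patched, the effect of the `#if 0` change quoted earlier can be modelled roughly as follows. This is a toy Python model, not kernel code: the class and function names are invented, and the strings stand in for the real clock routines. It only illustrates that, before the change, a 32-bit kernel on native EFI swapped its wallclock hooks over to the EFI runtime services, and afterwards it keeps the defaults:

```python
from typing import Callable


def cmos_get_time() -> str:
    """Stand-in for the default RTC/CMOS wallclock read."""
    return "cmos"


def efi_get_time() -> str:
    """Stand-in for the EFI runtime GetTime path, which is where
    the traces in this report start (virt_efi_get_time)."""
    return "efi"


class PlatformOps:
    """Toy equivalent of a table of platform function pointers."""

    def __init__(self) -> None:
        self.get_wallclock: Callable[[], str] = cmos_get_time


def init_platform(is_32bit: bool, efi_native: bool, patched: bool) -> PlatformOps:
    ops = PlatformOps()
    # Before the patch: '#ifdef CONFIG_X86_32' plus 'if (efi_is_native())'.
    # After the patch: the whole block is compiled out ('#if 0').
    if not patched and is_32bit and efi_native:
        ops.get_wallclock = efi_get_time
    return ops


if __name__ == "__main__":
    before = init_platform(is_32bit=True, efi_native=True, patched=False)
    after = init_platform(is_32bit=True, efi_native=True, patched=True)
    print("unpatched 32-bit EFI boot uses:", before.get_wallclock())
    print("patched 32-bit EFI boot uses:", after.get_wallclock())
```

In this model the 64-bit case never takes the EFI path, which corresponds to the earlier comment that the EFI time services are not used by default on 64-bit.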