Bug 73761
Summary: | efibootmgr OOPS | ||
---|---|---|---|
Product: | EFI | Reporter: | Andy Lutomirski (luto) |
Component: | Other | Assignee: | EFI Virtual User (efi) |
Status: | CLOSED INVALID | ||
Severity: | normal | CC: | matt |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.14 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Complete kernel logs, instrumented
Complete kernel logs, instrumented
Logs w/ EFI page table |
Description
Andy Lutomirski
2014-04-09 17:33:52 UTC
I think the stack trace is screwy past efi_call5. gdb shows:

    0xffffffff811c8f7f <+47>: mov    %r10,-0x30(%rbp)
    0xffffffff811c8f83 <+51>: mov    %rdi,-0x38(%rbp)
    0xffffffff811c8f87 <+55>: callq  0xffffffff817034c0 <_cond_resched>
    0xffffffff811c8f8c <+60>: mov    -0x38(%rbp),%r9
    0xffffffff811c8f90 <+64>: mov    -0x30(%rbp),%r10
    0xffffffff811c8f94 <+68>: nopl   0x0(%rax,%rax,1)
    0xffffffff811c8f99 <+73>: mov    %r9,%r15

What's a call to cond_resched doing in here? For security_context_to_sid_core, I see:

    0xffffffff81302609 <+393>: mov    %r15d,%eax
    0xffffffff8130260c <+396>: mov    -0x30(%rbp),%rdx
    0xffffffff81302610 <+400>: xor    %gs:0x28,%rdx
    0xffffffff81302619 <+409>: jne    0xffffffff813026d3 <security_context_to_sid_core+595>
    0xffffffff8130261f <+415>: add    $0x68,%rsp
    0xffffffff81302623 <+419>: pop    %rbx

so the address is between instructions. Presumably some random crap is on the stack.

My motherboard is an MSI X79A-GD65 (8D) (MS-7760) with BIOS revision 4.6, and that BIOS is fairly crappy.

efi=old_map makes no difference.

Andy, could you upload the dmesg from your machine? I'd like to figure out what specific region the RIP is in at the time of the fault.

Is there a known working kernel version where this oops doesn't occur?

It looks like our "For the love of God don't write too much data to the NVRAM" workaround from efi_query_variable_store() is causing this.
Apparently your firmware tried to jump through a NULL pointer,

    $ ./scripts/decodecode < /tmp/oops
    [ 68.260757] Code: 00 eb 2b b8 ab aa aa aa f7 25 ff 99 00 00 d1 ea 8d 04 12 49 03 c0 48 3b c8 75 09 c6 05 dd 99 00 00 00 eb 0a 48 8b 05 3a 95 00 00 <ff> 50 48 48 83 c4 28 c3 cc cc 48 89 5c 24 08 48 89 6c 24 10 48
    All code
    ========
       0:  00 eb                  add    %ch,%bl
       2:  2b b8 ab aa aa aa      sub    -0x55555555(%rax),%edi
       8:  f7 25 ff 99 00 00      mull   0x99ff(%rip)        # 0x9a0d
       e:  d1 ea                  shr    %edx
      10:  8d 04 12               lea    (%rdx,%rdx,1),%eax
      13:  49 03 c0               add    %r8,%rax
      16:  48 3b c8               cmp    %rax,%rcx
      19:  75 09                  jne    0x24
      1b:  c6 05 dd 99 00 00 00   movb   $0x0,0x99dd(%rip)   # 0x99ff
      22:  eb 0a                  jmp    0x2e
      24:  48 8b 05 3a 95 00 00   mov    0x953a(%rip),%rax   # 0x9565
      2b:* ff 50 48               callq  *0x48(%rax)         <-- trapping instruction
      2e:  48 83 c4 28            add    $0x28,%rsp
      32:  c3                     retq
      33:  cc                     int3
      34:  cc                     int3
      35:  48 89 5c 24 08         mov    %rbx,0x8(%rsp)
      3a:  48 89 6c 24 10         mov    %rbp,0x10(%rsp)
      3f:  48                     rex.W

Can you create new variables/new boot entries with efibootmgr?

It would also be useful if you could figure out which efi.set_variable() call in efi_query_variable_store() causes this oops.

Created attachment 131881 [details]
Complete kernel logs, instrumented
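Editorial aside on the decodecode output quoted above: the position of the trapping instruction and the address that feeds %rax can be recomputed by hand from the oops "Code:" line. The following is a minimal Python sketch of that arithmetic, not a replacement for scripts/decodecode. The helper names parse_code_line and rip_relative_target are made up for illustration; the instruction offset (0x24) and length (7 bytes) are read off the disassembly in this thread rather than decoded by the script.

```python
import re

# "Code:" line copied from the oops quoted in this thread.
CODE_LINE = (
    "Code: 00 eb 2b b8 ab aa aa aa f7 25 ff 99 00 00 d1 ea 8d 04 12 49 03 c0 "
    "48 3b c8 75 09 c6 05 dd 99 00 00 00 eb 0a 48 8b 05 3a 95 00 00 <ff> 50 "
    "48 48 83 c4 28 c3 cc cc 48 89 5c 24 08 48 89 6c 24 10 48"
)


def parse_code_line(line):
    """Return (code_bytes, trap_offset) for an x86 oops 'Code:' line.

    The byte wrapped in <..> is the first byte of the faulting instruction.
    (Hypothetical helper, named for this example only.)
    """
    tokens = line.split("Code:", 1)[1].split()
    code = bytearray()
    trap_offset = None
    for tok in tokens:
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            trap_offset = len(code)
            tok = marked.group(1)
        code.append(int(tok, 16))
    return bytes(code), trap_offset


def rip_relative_target(insn_offset, insn_length, disp32):
    """RIP-relative operands are relative to the end of the instruction."""
    return insn_offset + insn_length + disp32


code, trap = parse_code_line(CODE_LINE)

# "mov 0x953a(%rip),%rax" starts at offset 0x24 and is 7 bytes long
# (both values taken from the disassembly above); its 32-bit
# displacement occupies the last 4 bytes of the instruction.
disp = int.from_bytes(code[0x27:0x2B], "little", signed=True)
load_target = rip_relative_target(0x24, 7, disp)

print(hex(trap))                   # offset of the trapping instruction
print(code[trap:trap + 3].hex())   # its bytes: ff 50 48, callq *0x48(%rax)
print(hex(load_target))            # where the value in %rax was loaded from
```

All offsets are relative to the start of the dumped bytes, not absolute addresses. The trapping call fetches its target from 0x48(%rax), and %rax was loaded from the RIP-relative slot just before it, so an unset pointer in that slot would be consistent with the "jump through a NULL pointer" reading given above.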
It's the first set_variable call. The dmesg I attached is from a kernel with this patch:

    diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
    index b97acec..7a7face 100644
    --- a/arch/x86/platform/efi/efi.c
    +++ b/arch/x86/platform/efi/efi.c
    @@ -1159,6 +1159,7 @@ efi_status_t efi_query_variable_store(u32 attributes, unsi
     	if (!dummy)
     		return EFI_OUT_OF_RESOURCES;
     
    +	printk(KERN_INFO "set_variable %ld %p\n", (long)dummy_size, (voi
     	status = efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
     				  EFI_VARIABLE_NON_VOLATILE |
     				  EFI_VARIABLE_BOOTSERVICE_ACCESS |
    @@ -1170,6 +1171,7 @@ efi_status_t efi_query_variable_store(u32 attributes, unsi
     	 * This should have failed, so if it didn't make sure
     	 * that we delete it...
     	 */
    +	printk(KERN_INFO "set_variable zero %p\n", (void*)dummy)
     	efi.set_variable(efi_dummy_name, &EFI_DUMMY_GUID,
     			 EFI_VARIABLE_NON_VOLATILE |
     			 EFI_VARIABLE_BOOTSERVICE_ACCESS |

Andy, did you definitely upload the correct log? All I can see is a file with loads of btrfs messages.

Created attachment 132371 [details]
Complete kernel logs, instrumented
I have too many files called dmesg.txt. This should be the right one.
Sorry Andy, could you boot a v3.15-rc kernel and enable CONFIG_EFI_PGT_DUMP? I need the output to figure out which region the firmware was in when it crashed (I forgot that with the new EFI memmap stuff regions aren't mapped at a linear offset).

Sure. Note that, on 3.14, the bug is masked by the change to skip reading the BGRT payload if the valid bit isn't set.

Ignore the comment about BGRTs. I somehow mixed up the bugs in my head.

Created attachment 135341 [details]
Logs w/ EFI page table
For better or for worse, efibootmgr just successfully deleted an entry on this kernel.
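Editorial aside on the page table dump requested above: because the regions are not mapped at one linear offset, attributing a faulting address to a region means searching a per-region table instead of subtracting a constant. The sketch below illustrates that lookup in Python. Every address, size, and label in EXAMPLE_REGIONS is invented for illustration and is not taken from the attached logs; the Region type and locate helper are likewise hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Region(NamedTuple):
    """One mapped region (hypothetical layout, for illustration only)."""
    virt_start: int
    phys_start: int
    size: int
    label: str


def locate(regions: Sequence[Region], addr: int) -> Optional[Tuple[Region, int]]:
    """Return (region, physical address) for addr, or None if it is unmapped."""
    for region in regions:
        if region.virt_start <= addr < region.virt_start + region.size:
            # The virtual-to-physical delta is specific to this region.
            return region, region.phys_start + (addr - region.virt_start)
    return None


# Invented example data: two virtually adjacent regions whose physical
# bases differ, so no single offset translates both of them.
EXAMPLE_REGIONS = [
    Region(0x70000000, 0x10000000, 0x2000, "example region A"),
    Region(0x70002000, 0x30000000, 0x1000, "example region B"),
]

hit = locate(EXAMPLE_REGIONS, 0x70002010)
if hit is not None:
    region, phys = hit
    print(region.label, hex(phys))
```

With real data, the table would be filled from the EFI memory map or the CONFIG_EFI_PGT_DUMP output, and addr would be the RIP reported in the oops.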
Andy, are you still seeing issues here?

I just triggered this issue again on a stock Fedora 23 live USB stick on the same machine. I can't give a proper log because the machine hung hard with only a useless tiny bit of log message at the bottom. RIP = 0xfffffffee14dcc93

I just managed to crash the builtin firmware setup screen a few times while manipulating boot options.

I'm closing this as INVALID since it's most likely a severe UEFI firmware bug and there's probably nothing that Linux can do about it.