Bug 75021 - SGI UV fails to boot with recent EFI changes
Status: RESOLVED CODE_FIX
Alias: None
Product: EFI
Classification: Unclassified
Component: Boot
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: EFI Virtual User
 
Reported: 2014-04-28 21:54 UTC by Alex Thorlton
Modified: 2016-03-14 20:34 UTC

Kernel Version: 3.15-rc3
Regression: No


Attachments
.config file (117.91 KB, application/octet-stream)
2014-04-28 21:54 UTC, Alex Thorlton

Description Alex Thorlton 2014-04-28 21:54:24 UTC
Created attachment 134091
.config file

This bug was introduced by commit d2f7cbe7b26a74dbbbf8f325b2a6fd01bc34032c. After we identified it, Boris (bp@alien8.de) added a workaround that quirks out UV and allows us to boot (commit 95648c0e9fdd1cb1199ef387025d684704a8e62e). The workaround gets things booting, but it does not address the underlying issue.

For reference, below is the output from a failed boot on the latest kernel (3.15-rc3), built a few minutes ago. The config file is attached.

<snip>
Enabled IRQ remapping in x2apic mode
Enabling x2apic
Enabled x2apic
Switched APIC routing to cluster x2apic.
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
smpboot: CPU0: Genuine Intel(R) CPU  @ 2.60GHz (fam: 06, model: 2d, stepping: 06)
UV: Found UV2 hub
------------[ cut here ]------------
kernel BUG at arch/x86/mm/init_64.c:351!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc3-athorlton-dirty #846
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
task: ffff880ff977a010 ti: ffff880ff977c000 task.ti: ffff880ff977c000
RIP: 0010:[<ffffffff818ca862>]  [<ffffffff818ca862>] __init_extra_mapping+0x111/0x143
RSP: 0000:ffff880ff977dd18  EFLAGS: 00010206
RAX: 0000000000000f00 RBX: ffff880001c6b018 RCX: 0000000000000002
RDX: ffff880fff8d7f00 RSI: 0000000002000000 RDI: 00000000fc000000
RBP: ffff880ff977dd48 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88ef7e7f5000 R11: 0000000000000000 R12: 00000000fc000000
R13: 0000000002000000 R14: ffff8800fc000000 R15: 0000000080000000
FS:  0000000000000000(0000) GS:ffff880fffc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88ef7efff000 CR3: 00000000017f4000 CR4: 00000000000406f0
Stack:
 80000000000001fb 0000000000000000 0000000000000080 000000000000b018
 000000000000b010 000000000000b008 ffff880ff977dd58 ffffffff818ca8a7
 ffff880ff977de28 ffffffff818c600b ffff880fffc0cc80 0000000000000080
Call Trace:
 [<ffffffff818ca8a7>] init_extra_mapping_uc+0x13/0x15
 [<ffffffff818c600b>] uv_system_init+0x102/0x111d
 [<ffffffff8108c3f2>] ? clockevents_config_and_register+0x21/0x25
 [<ffffffff81029283>] ? setup_APIC_timer+0xbb/0xc7
 [<ffffffff8154ee04>] ? printk+0x72/0x74
 [<ffffffff818c3da6>] ? setup_boot_APIC_clock+0x4a8/0x4b7
 [<ffffffff8154ee04>] ? printk+0x72/0x74
 [<ffffffff818c1a6e>] native_smp_prepare_cpus+0x389/0x3d6
 [<ffffffff818b57c6>] kernel_init_freeable+0xb7/0x1fb
 [<ffffffff81546900>] ? rest_init+0x74/0x74
 [<ffffffff81546909>] kernel_init+0x9/0xd5
 [<ffffffff81552f7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81546900>] ? rest_init+0x74/0x74
Code: ff ff ff 3f 00 00 48 23 13 48 b8 00 00 00 00 00 88 ff ff 48 01 c2 4c 89 e0 48 c1 e8 12 25 f8 0f 00 00 48 01 c2 48 83 3a 00 74 04 <0f> 0b eb fe 48 8b 45 d0 49 81 ed 00 00 20 00 4c 09 e0 49 81 c4
RIP  [<ffffffff818ca862>] __init_extra_mapping+0x111/0x143
 RSP <ffff880ff977dd18>
---[ end trace d3716733eb04969d ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
</snip>
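
As a rough aid to reading the oops above, the sketch below recomputes the page-table indices from the register dump. It assumes standard x86-64 4-level paging with 2 MiB PMD entries, and it assumes the System V calling convention, so that RDI (0xfc000000) and RSI (0x2000000) are the first two arguments to __init_extra_mapping, presumably a physical base and a size. That reading of the arguments is an assumption on my part. The helpers are user-space re-implementations written for illustration (PTRS_PER_TBL and pmd_entries are made-up names), not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: x86-64 4-level paging, 512 entries per table, 2 MiB per PMD entry. */
#define PMD_SHIFT    21
#define PUD_SHIFT    30
#define PGDIR_SHIFT  39
#define PTRS_PER_TBL 512                      /* hypothetical name */
#define PMD_SIZE     (UINT64_C(1) << PMD_SHIFT)

/* Index of the entry that covers addr at each page-table level. */
static unsigned pgd_index(uint64_t addr)
{
    return (unsigned)((addr >> PGDIR_SHIFT) & (PTRS_PER_TBL - 1));
}

static unsigned pud_index(uint64_t addr)
{
    return (unsigned)((addr >> PUD_SHIFT) & (PTRS_PER_TBL - 1));
}

static unsigned pmd_index(uint64_t addr)
{
    return (unsigned)((addr >> PMD_SHIFT) & (PTRS_PER_TBL - 1));
}

/* Number of 2 MiB PMD entries needed to cover size bytes (hypothetical helper). */
static uint64_t pmd_entries(uint64_t size)
{
    assert((size & (PMD_SIZE - 1)) == 0);     /* size must be 2 MiB aligned */
    return size >> PMD_SHIFT;
}
```

With the values from the dump, pmd_index(0xfc000000) is 480, and 480 * 8 = 0xf00, which matches RAX. The 0x2000000 in RSI would cover 16 such entries. The Code: bytes just before the trapping <0f> 0b decode to "cmpq $0x0,(%rdx)" followed by "je", which suggests the BUG fired because the PMD slot at RDX was already populated. R12 and R13 still hold the initial RDI and RSI values, so this looks like the first entry of the range rather than a later one.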

I made the following changes to remove the workaround and re-expose the bug. I did it this way because I wasn't sure of the implications of reverting the entire commit that contains the workaround:

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 3781dd3..aa8d237 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1336,6 +1336,6 @@ void __init efi_apply_memmap_quirks(void)
        /*
         * UV doesn't support the new EFI pagetable mapping yet.
         */
-       if (is_uv_system())
-               set_bit(EFI_OLD_MEMMAP, &efi.flags);
+//     if (is_uv_system())
+//             set_bit(EFI_OLD_MEMMAP, &efi.flags);
 }

Original lkml discussion of the bug can be found here:

http://www.gossamer-threads.com/lists/linux/kernel/1855555

We (SGI) are beginning to investigate and work on a fix for the bug, and we want to track our progress here in the community in case others run into similar issues.

- Alex
Comment 1 Matt Fleming 2014-05-19 13:33:27 UTC
Alan, I don't think it's correct to label this as a regression.

There *was* a regression, before commit 95648c0e9fdd1cb1199ef387025d684704a8e62e, but every config that used to work for SGI UV should still work after that commit without user tweaks.

It's just that SGI UV doesn't take advantage of the new code.
Comment 2 Borislav Petkov 2014-05-19 13:42:06 UTC
(In reply to Matt Fleming from comment #1)
> It's just that SGI UV doesn't take advantage of the new code.

And we're working on fixing that too - it is just not trivial and the
quirk in a5d90c923bcf ("x86/efi: Quirk out SGI UV") is for the interim.
Comment 3 Matt Fleming 2016-03-14 20:34:03 UTC
This is supported in recent kernels (with recent firmware). Closing.
