Bug 216375 - EFI stub: ERROR: Exit boot services failed
Summary: EFI stub: ERROR: Exit boot services failed
Status: NEW
Alias: None
Product: EFI
Classification: Unclassified
Component: Boot (show other bugs)
Hardware: ARM Linux
: P1 high
Assignee: EFI Virtual User
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-18 17:37 UTC by TJ
Modified: 2023-04-21 15:19 UTC (History)
2 users (show)

See Also:
Kernel Version: v5.19
Subsystem:
Regression: No
Bisected commit-id:


Attachments
libstub debug patches (10.06 KB, patch)
2022-08-18 17:42 UTC, TJ
Details | Diff
Kernel build .config (188.19 KB, text/plain)
2022-08-18 17:52 UTC, TJ
Details
debugging with msr use (5.06 KB, patch)
2022-08-21 09:37 UTC, TJ
Details | Diff
Attempting create_event_ex() (6.48 KB, patch)
2022-08-21 21:59 UTC, TJ
Details | Diff
Attempting create_event_ex() #2 (6.83 KB, patch)
2022-08-21 22:54 UTC, TJ
Details | Diff
Attempting create_event_ex() #3 using EFI_EVT_MEMORY_MAP_CHANGE (8.57 KB, patch)
2022-08-21 23:56 UTC, TJ
Details | Diff

Description TJ 2022-08-18 17:37:07 UTC
ExitBootServices() fails on Samsung W737 (Galaxy Book2), a Qualcomm SDM850 ARM64 device.

I've been diagnosing a failure to boot on a Samsung Galaxy Book2 (W737) 
but have not been able to get the kernel to boot, and would welcome some 
assistance:

EFI stub: ERROR: Exit boot services failed

The problem occurs in 
drivers/firmware/efi/libstub/efi-stub-helper.c::efi_exit_boot_services(). 
EFI_INVALID_PARAMETER is returned by efi_bs_call(exit_boot_services...).

Since the addition of commit d4b341269efb3 "arm64: dts: qcom: Add 
support for Samsung Galaxy Book2" I had inferred it is now possible to 
boot this device into Linux, but so far it seems not.

Based on reading the EDK source I suspect either the key is changing 
or memory alignment is wrong; from a reading of EDK's 
CoreExitBootServices() and CoreTerminateMemoryMap() and making a large 
assumption that Lenovo/Aptio (American Megatrends) firmware is based 
around EDK (proven to be true via extracting firmware via EFI shell).

The only possible lead I have currently is via EDK's MdeModulePkg/Core/Dxe/Mem/Page.c::CoreTerminateMemoryMap() and the reasons it returns EFI_INVALID_PARAMETER; namely:

if ((Entry->Start & (RUNTIME_PAGE_ALLOCATION_GRANULARITY - 1)) != 0) {
  DEBUG ((DEBUG_ERROR | DEBUG_PAGE, "ExitBootServices: A RUNTIME memory entry is not on a proper alignment.\n"));
  Status =  EFI_INVALID_PARAMETER;
  goto Done;
}

if (((Entry->End + 1) & (RUNTIME_PAGE_ALLOCATION_GRANULARITY - 1)) != 0) {
  DEBUG ((DEBUG_ERROR | DEBUG_PAGE, "ExitBootServices: A RUNTIME memory entry is not on a proper alignment.\n"));
  Status =  EFI_INVALID_PARAMETER;
  goto Done;
}

But I'm not entirely clear on what, precisely, those "Entry" are in relation to the kernel; nor whether the kernel can influence that in any way. I checked the alignment value in-kernel and it reported 0x10000 (64KiB) which is correct.

I added extensive "efi_debug()" logging to try to identify the problem and 
have reported the patches I'm using and results extensively (too 
extensively for including in email or here!) at "Issue #11: Samsung Galaxy 
Book2 (SM-W737Y) snapdragon 850" of the aarch64-laptops project:

https://github.com/aarch64-laptops/debian-cdimage/issues/11

I've tried all manner of efi= options. It never gets far enough for "novamap". Kernel has "# CONFIG_EFI_DISABLE_PCI_DMA is not set" and I've tried "disable_early_pci_dma" and "nochunk".

Sample log below was from a boot via GRUB. When I switched to using the EFI Shell to directly start the kernel it was unable to find the DTB - not sure if that is a clue or not. It would report "EFI stub: Generating empty DTB".

To solve that I'd manually pass the static "arch/arm64/boot/dts/qcom/sdm850-samsung-w737.dtb" in with "dtb=\EFI\BOOT\sdm850-samsung-w737.dtb" when it'd report "EFI stub: Using DTB from command line".

I tried increasing EFI_MMAP_NR_SLACK_SLOTS=32 based on reading bug #213991 and even added a loop of 10 attempts in efi_exit_boot_services() around the retry block for efi_bs_call(exit_boot_services...) in case the comment in that block of code about number of repeats might be another clue!

A typical boot attempt (via GRUB):

EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: DEBUG: Entering allocate_new_fdt_and_exit_boot()
EFI stub: Exiting boot services...
EFI stub: DEBUG: Calling efi_exit_boot_services() with map:
EFI stub: DEBUG: efi_boot_memmap = {map=0x000000009ffce870 
map_size=0xa20 desc_size=0x30 desc_ver=0x1
  key_ptr=0x000000009ffce868 *key_ptr=0x2b2c buff_size=0xc90 }
EFI stub: DEBUG: Entered efi_exit_boot_services()
EFI stub: DEBUG: Memory map before calling priv_func():
EFI stub: DEBUG: efi_boot_memmap = {map=0x000000009ffce870 
map_size=0xa20 desc_size=0x30 desc_ver=0x1
  key_ptr=0x000000009ffce868 *key_ptr=0x2b2c buff_size=0xc90 }
EFI stub: DEBUG: Memory map after calling priv_func():
EFI stub: DEBUG: efi_boot_memmap = {map=0x000000009ffce870 
map_size=0xa20 desc_size=0x30 desc_ver=0x1
  key_ptr=0x000000009ffce868 *key_ptr=0x2b2c buff_size=0xc90 }
EFI stub: DEBUG: Calling efi_bs_call(exit_boot_services, 
0x00000000fcb87c98, 0x2b2c)
EFI stub: DEBUG: efi_bs_call() returned -9223372036854775806 
(EFI_INVALID_PARAMETER)
EFI stub: DEBUG: Calling efi_bs_call() with get_memory_map
EFI stub: DEBUG: Memory map after calling get_memory_map() again:
EFI stub: DEBUG: efi_boot_memmap = {map=0x000000009ffce870 
map_size=0xa20 desc_size=0x30 desc_ver=0x1
  key_ptr=0x000000009ffce868 *key_ptr=0x2b6e buff_size=0xc90 }
EFI stub: DEBUG: Calling priv_func() again
EFI stub: DEBUG: Memory map after calling priv_func() again:
EFI stub: DEBUG: efi_boot_memmap = {map=0x000000009ffce870 
map_size=0xa20 desc_size=0x30 desc_ver=0x1
  key_ptr=0x000000009ffce868 *key_ptr=0x2b6e buff_size=0xc90 }
EFI stub: DEBUG: Calling (again!) efi_bs_call(exit_boot_services, 
0x00000000fcb87c98, 0x2b6e)
EFI stub: DEBUG: fail: efi_exit_boot_services() returns -9223372036854775806
EFI stub: DEBUG: Returned from efi_Exit_boot_services()
EFI stub: ERROR: Exit boot services failed.
EFI stub: DEBUG: fail_free_new_fdt: status= -9223372036854775806
EFI stub: DEBUG: fail: status= -9223372036854775806
EFI stub: DEBUG: Busy Wait so you can read the debug messages!

Also emailed linux-efi@ with the same information.

https://lore.kernel.org/linux-efi/2e9741aa-562c-cd0d-7ee4-8281512bc2e0@iam.tj/T/#u
Comment 1 TJ 2022-08-18 17:42:36 UTC
Created attachment 301598 [details]
libstub debug patches
Comment 2 TJ 2022-08-18 17:52:17 UTC
Created attachment 301599 [details]
Kernel build .config
Comment 3 TJ 2022-08-18 20:49:50 UTC
Created a call-stack overview to get an idea of where to look if there's an issue in-kernel. With it being a possible alignment issue I tested with "nokaslr" with no improvement.

# Entry point
→ efi_pe_entry()
 ↦ check_platform_features()
 ↦ efi_system_table->boottime->handle_protocol()
 ↦ efi_convert_cmdline()
 ↦ efi_parse_options()
 = efi_info("Booting Linux Kernel...\n");
 ↦ setup_graphics()
 ↦ handle_kernel_image()
 ↦ efi_retrieve_tpm2_eventlog()
 ↦ efi_enable_reset_attack_mitigation()
 ↦ efi_get_secureboot()
 = efi_err("Ignoring DTB from command line.\n");
 ↦ efi_load_dtb()
 = efi_err("Failed to load device tree!\n");
 = efi_info("Using DTB from command line\n");
 | efi_info("Using DTB from configuration table\n");
 | efi_info("Generating empty DTB\n");
 ↦ efi_load_initrd()
 ↦ efi_random_get_seed()
 ↦ get_efi_config_table()
 ? if (!IS_ENABLED(CONFIG_HIBERNATION) && !efi_nokaslr && !flat_va_mapping)
  = efi_get_random_bytes()
  = virtmap_base = EFI_RT_VIRTUAL_BASE + (((headroom >> 21) * rnd) >> (32 - 21));
  
 ↦ install_memreserve_table()

 ↳ allocate_new_fdt_and_exit_boot()
  ↦ efi_get_memory_map()
  ↦ efi_allocate_pages()
  ↦ efi_get_memory_map()
  ↦ update_fdt()

  ↳ efi_exit_boot_services()
   ↦ priv_func()
   ↦ efi_pci_disable_bridge_busmaster()
   ↦ efi_bs_call(exit_boot_services ...)
   ↩

  ↦ efi_system_table->runtime->set_virtual_address_map()
  ↦ efi_free(MAX_FDT_SIZE, *new_fdt_addr);
  ↩

 = efi_err("Failed to update FDT and exit boot services\n");
↩
Comment 4 TJ 2022-08-19 09:54:03 UTC
Weird result but got EFI_SUCCESS!

Not being able to debug into the firmware to discover the precise reason for returning EFI_INVALID_PARAMETER from CoreTerminateMemoryMap() I decided to time how long the call into exit_boot_services was taking via EFI's GetNextMonotonicCount() - intending to then measure the time taken with deliberately wrong handle for example, to try to determine which code path was being taken since there's a loop to check alignment which should take longer than a simple wrong handle..

I first added the prototype in drivers/firmware/efi/libstub/efistub.h::union efi_boot_services:

efi_status_t (__efiapi *get_next_monotonic_count)(u64 *);

Then in efi_exit_boot_services() I added two calls to that function; before and after the call to exit_boot_services:

u64 start, end;
...
status = efi_bs_call(get_next_monotonic_count, &start);
status = efi_bs_call(exit_boot_services, handle, *map->key_ptr);
status = efi_bs_call(get_next_monotonic_count, &end);
efi_debug("Time taken: %llu\n", end - start);

Then as usual booted the PC to the Shell and manually launched Linux:

Shell> mode 240 67
Shell> FS2:
FS2:> cd \EFI\BOOT
FS2:\EFI\BOOT> LINUXAA64.EFI debug efi=debug dtb=\EFI\BOOT\sdm850-samsung-w737.dtb

Imagine my surprise when exit_boot_services returned EFI_SUCCESS!

EFI stub: Booting Linux Kernel...
...
EFI stub: DEBUG: Entered allocate_new_fdt_and_exit_boot()
...
EFI stub: DEBUG: Entered efi_exit_boot_services()
...
EFI stub: DEBUG: Calling efi_bs_call(exit_boot_services, 0x00000000fc9aa318, 0x3e29)
EFI stub: DEBUG: Time taken: 1
EFI stub: DEBUG: Returning from efi_exit_boot_services() with EFI_SUCCESS
EFI stub: DEBUG: Returned from efi_exit_boot_services()
EFI stub: DEBUG: Preparing to do virtual address map
EFI stub: DEBUG: Calling svam()
EFI stub: DEBUG: Returned from svam()
EFI stub: DEBUG: svam() failed with -9223372036854775805
EFI stub: DEBUG: allocate_new_fdt_and_exit_boot() returning EFI_SUCCESS


The svam() failure was expected since I'd disabled the actual physical to virtual mapping being done in fdt.c::exit_boot_func() in case it was causing issues. I shall now back that change and test once again.
Comment 5 TJ 2022-08-19 10:36:18 UTC
Well - those calls to get_next_monotonic_count() are surely the tonic!

After re-enabling the physical to virtual mapping in fdt.c::exit_boot_func() Linux booted and lots of regular kernel messages scrolled by before, unfortunately, the system reset instantly :(

At least that is major progress. I'm going to try replacing the calls to get_next_monotonic_count() with a single call to stall() - although I have those in other places so not holding out hope that'll solve it as yet...

... no, calls to stall(10) before or after exit_boot_services fails.

Single call to get_next_monotonic_count() before exit_boot_services also fails.

Single call to get_next_monotonic_count() **AFTER** exit_boot_services results in EFI_SUCCESS and Linux continues to boot until it hits

Kernel panic - not syncing@ VFS: Unable to mount root fs on unknown-block(0,0)

This must be the first time I've been relieved and pleased to see that message!

So now we need to figure out why adding support for and calling get_next_monotonic_count() AFTER exit_boot_services() solves this issue, and what the permanent solution/patch should look like.
Comment 6 TJ 2022-08-19 13:17:41 UTC
On 19/08/2022 08:22, Ard Biesheuvel wrote:
> (cc Steve - not sure who else still works on this stuff these days)
>>
>> EFI stub: ERROR: Exit boot services failed
>>
> Looking at your adventures, it seems you are looking for something
> that the EFI stub in Linux is doing wrong, causing the
> ExitBootServices() call to fail.
> 
...
> One thing you could try is setting the TPL level to TPL_HIGH before
> calling exit_boot_services (or just disable interrupts). That way,
> your first call is more likely to succeed, as nothing that could
> change the memory map could run in the meantime.

Can you give me guidance on how to do that; a grep for TPL_* and 
{raise,restore}_tpl seems to indicate these are not currently implemented?

If so, presumably I need to do the same with those as I did with 
GetNextMonotonicCount() / get_next_monotonic_count ?
Comment 7 TJ 2022-08-19 17:43:36 UTC
I tried with adding calls to RaiseTPL()/raise_tpl to efi_exit_boot_services():

// efistub.h
unsigned long (__efiapi *raise_tpl)(unsigned long);
void          (__efiapi *restore_tpl)(unsigned long);


// efi-stub-helper.c
unsigned long tpl;
...
efi_debug("Calling efi_bs_call(raise_tpl, EFI_TPL_HIGH_LEVEL)\n");
tpl = efi_bs_call(raise_tpl, EFI_TPL_HIGH_LEVEL);
efi_debug("Calling efi_bs_call(exit_boot_services, 0x%p, 0x%lx)\n", handle, *map->key_ptr);
status = efi_bs_call(exit_boot_services, handle, *map->key_ptr);
efi_debug("Calling efi_bs_call(restore_tpl, tpl)\n");
efi_bs_call(restore_tpl, tpl);

But that seems to hang in raise_tpl()

EFI stub: DEBUG: Calling efi_bs_call(raise_tpl, EFI_TPL_HIGH_LEVEL)
...

nothing more

Then I tried with EFI_TPL_NOTIFY ... this continues but exit_boot_services again returns EFI_INVALID_PARAMETER !

Return to calling get_next_monotonic_count() and it again returns EFI_SUCCESS.
Comment 8 TJ 2022-08-20 00:08:50 UTC
I've now managed to boot into the initrd.img shell.

I was struggling to come up with a rational explanation for how adding this call **after** the call to exit_boot_services is making it work... then I looked closely and realised I'm replacing the EFI_INVALID_PARAMETER with EFI_SUCCESS in the 2nd call - duh!

status = efi_bs_call(exit_boot_services, handle, *map->key_ptr);
status = efi_bs_call(get_next_monotonic_count, &end);

So basically get_next_monotonic has nothing to do with this; it is simply replacing the bad result from exit_boot_services and moving on!

I cannot believe I missed something so obvious - that's what comes of too many hours staring at it!

So the question now is, if the boot seems to carry on, how to deal with this?  Some kind of quirk to recognise the specific flaky firmware?
Comment 9 TJ 2022-08-21 08:45:39 UTC
The issue here may be that the map key is increasing in the background.

I added several efi_info() reports of map->key and, assuming it should increment by 1 each time a new allocation is added, then the reported values seem to indicate more reallocations than those called from libstub.

EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: allocate_new_fdt...() 1: map->key=0x174b
EFI stub: Exiting boot services
EFI stub: allocate_new_fdt...() 2: map->key=0x1756
EFI stub: efi_Exit_boot_services() 1 map->key=0x1756 mymap->key=0x175b
EFI stub: efi_Exit_boot_services() 2 map->key=0x1756 mymap->key=0x1762
EFI stub: efi_Exit_boot_services() 3 map->key=0x1756 mymap->key=0x1769
EFI stub:  exit_boot_services call 1 status = 0x8000000000000002

Hung

In case there was some issue with the map passed from allocate_new_fdt_and_exit_boot() to efi_exit_boot_services() I created mymap in the latter function and use it. The messages above show how the key appears to have increased by 19 (0x1756 to 0x1769) just whilst that function call is progressing.

I'll try the EFI_TPL_HIGH_LEVEL workaround once more.
Comment 10 Ard Biesheuvel 2022-08-21 08:55:15 UTC
The problem with the map key is that it races with asynchronous events that are signalled over the back of the timer interrupt. This is why exit_boot_services() is documented as potentially failing the first time, in which case it needs to be called again. The first call will disable the timer interrupt, (and signal ExitBootServices events synchronously) so that the second get_memory_map()/exit_boot_services() call pair will no longer be interrupted - in theory, at least.

As you are on arm64, you could try asm("msr daifset, #3") instead of raising the TPL, and if this makes no difference, we could experiment with signalling the ExitBootServices event group explicitly before calling get_memory_map(). However, these are not changes I am not willing to consider for upstreaming - adding horrid hacks like these to support systems that the vendor clearly has no interest in supporting on Linux is a bit too last-decade for me.
Comment 11 TJ 2022-08-21 09:28:56 UTC
EFI_TPL_HIGH_LEVEL hangs; EFI_TPL_NOTIFY is safe but the increment continues (approximately 7 steps between calls to efi_get_memory_map() for reporting the key).

@Ard: Thanks for the explanation and clues. I appreciate what you say about horrid hacks.

I carry a handful of useful patches and build the latest release/stable with those on top so if we can find a solution to this issue it isn't a problem to deploy.

This experience does make me wonder what MS Windows is doing differently since from what I can see from examining the EFI images (from EDK shell) and the Windows boot manager the latter is 'stock' Windows 10 S. 

The reason I got this Book2 device is it has an AMOLED display; my aim was (is?) to use it for a custom in-vehicle display project where the AMOLED doesn't have a glaring backlight at night-time.

No apparent difference using:

asm("msr daifset, #3");


EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: allocate_new_fdt...() 1: map->key=0x1598
EFI stub: Exiting boot services
EFI stub: allocate_new_fdt...() 2: map->key=0x15ab
EFI stub: efi_Exit_boot_services() 1 map->key=0x15ab mymap->key=0x15b4
EFI stub: efi_Exit_boot_services() 2 map->key=0x15ab mymap->key=0x15bf
EFI stub: efi_Exit_boot_services() 3 map->key=0x15ab mymap->key=0x15ca
EFI stub:  doing asm(\"msr daifset, #3\"); and then calling exit_boot_services
EFI stub:  exit_boot_services call 1 status = 0x8000000000000002
EFI stub: RETRY exit_boot_services() 4 map->key=0x15ab mymap->key=0x15e8
EFI stub: RETRY exit_boot_services() (after priv_func()) 5 map->key=0x15ab mymap->key=0x15e8
EFI stub: ERROR: Exit boot services failed.
EFI stub: ERROR: Failed to update FDT and exit boot services
Comment 12 TJ 2022-08-21 09:35:35 UTC
I wondered if having GRUB in the workflow might be causing this so I did a direct boot of Linux - same results with increments of key.
Comment 13 TJ 2022-08-21 09:37:39 UTC
Created attachment 301615 [details]
debugging with msr use

Here's the current patch I'm using.
Comment 14 TJ 2022-08-21 14:05:50 UTC
I experimented a little to see if it is possible to determine what is causing the key to increment so frequently. I wondered initially if it was due to booting from a USB device (maybe the USB driver(s) were doing something).

I wondered if the map was being replaced because it was growing too fast so I tried with:

#define EFI_MMAP_NR_SLACK_SLOTS        1024

but the address of the map, the map_size, desc_size, buff_size remained constant across the multiple calls to get_memory_map() whilst key increased so this doesn't seem like an issue of the memory map increasing in size - possibly its the number of descriptors being used?

I added a 5 second stall between two calls to get_memory_map() and saw the key increments quite a lot in that time; which seems to confirm that there is some regular (possibly hardware-induced) event in the background operating at a higher TPL (presumably EFI_TPL_NOTIFY or higher firmware-private).

I copied Linux onto the internal storage as /EFI/BOOT/BOOTAA64.EFI and tested without the USB device connected. Key increment occurs.

To gather more clues I am now wondering if there's any point in creating notify events to monitor EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES and/or EFI_EVENT_GROUP_EXIT_BOOT_SERVICES to record the key into a variable and report its value later after exit_boot_services() returns.
Comment 15 TJ 2022-08-21 21:59:59 UTC
Created attachment 301624 [details]
Attempting create_event_ex()

I've added a call to create_event_ex() but so far it returns EFI_INVALID_PARAMETER and I've not been able to figure out why - mainly because I cannot find any examples to guide me!

Patch attached.
Comment 16 TJ 2022-08-21 22:54:36 UTC
Created attachment 301625 [details]
Attempting create_event_ex() #2

Last attempt for today!

I used EFI_EVT_NOTIFY_SIGNAL and a call-back function and create_event_ex() was successful - as was the callback function. Unfortunately no apparent improvement; key seems to carry on incrementing:(

I also noticed it is no longer getting the DTB - not sure if that is due to direct booting Linux, or something has changed in the firmware config itself. In earlier tests (using GRUB) it'd show "Using DTB from configuration table" - cannot recall now if that was also reported at the point I switched to doing direct Linux boots.

EFI stub: Booting Linux Kernel...
EFI stub: Generating empty DTB
EFI stub: allocate_new_fdt...() 1: map->key=0x1727
EFI stub: Exiting boot services
EFI stub: allocate_new_fdt...() 2: map->key=0x1733
EFI stub: EFI_MMAP_NR_SLACK_SLOTS=1024
EFI stub: efi_exit_boot_services() 1 key=0x1738 map_size=0xbd0 desc_size=0x30 buff_size=0xccc0
EFI stub: efi_exit_boot_services() 2 key=0x1741 map_size=0xbd0 desc_size=0x30 buff_size=0xccc0
EFI stub: efi_exit_boot_services() 3 key=0x1746 map_size=0xbd0 desc_size=0x30 buff_size=0xccc0
EFI stub: create_event_ex (EFI_EVT_GROUP_EXIT_BOOT_SERVICES) = 0x0
EFI stub: >> event_notified() ! <<
EFI stub: signal_event(EFI_EVT_GROUP_SIGNAL_EXIT_BOOT_SERVICES) = 0X0
EFI stub: Calling exit_boot_services now
EFI stub:  exit_boot_services call 1 status = 0x8000000000000002
EFI stub: efi_exit_boot_services() 4 key=0x1746 map_size=0xbd0 desc_size=0x30 buff_size=0xccc0
EFI stub: Stalling for 5 seconds
EFI stub: RETRY exit_boot_services() 5 key=0x176e map_size=0xbd0 desc_size=0x30 buff_size=0xccc0
EFI stub:  exit_boot_services call 2 status = 0x8000000000000002
Comment 17 TJ 2022-08-21 23:56:57 UTC
Created attachment 301626 [details]
Attempting create_event_ex() #3 using EFI_EVT_MEMORY_MAP_CHANGE

I had the bright idea to track and report changes to the memory map via getting notified of EFI_EVT_MEMORY_MAP_CHANGE.

Attached patch shows the code; added early in efi_pe_entry(). The call-back function reports every 4 notifies to avoid spamming the screen.

static int map_changed_count = 0;
void __efiapi event_memory_map_changed(efi_event_t event, void *context)
{
       if (++map_changed_count % 4 == 0)
               efi_info(">> Memory Map Changed 0x%x <<\n", map_changed_count);
}

Result:
EFI stub: create_event_ex (EFI_EVT_MEMORY_MAP_CHANGED) = 0X0
EFI stub: >> Memory Map Changed 0x4 <<
EFI stub: Booting Linux Kernel...
EFI stub: >> Memory Map Changed 0x8 <<
EFI stub: >> Memory Map Changed 0xc <<
EFI stub: Generating empty DTB
EFI stub: >> Memory Map Changed 0x10 <<
EFI stub: >> Memory Map Changed 0x14 <<
EFI stub: >> Memory Map Changed 0x18 <<
EFI stub: allocate_new_fdt...() 1: map->key=0x1783
EFI stub: >> Memory Map Changed 0x1c <<
EFI stub: Exiting boot services
EFI stub: >> Memory Map Changed 0x20 <<
EFI stub: >> Memory Map Changed 0x24 <<
EFI stub: allocate_new_fdt...() 2: map->key=0x1797
EFI stub: >> Memory Map Changed 0x28 <<
EFI stub: EFI_MMAP_NR_SLACK_SLOTS=1024
EFI stub: efi_exit_boot_services() 1 key=0x17a0 map_size=0xc30 desc_size=0x30 buff_size=0xcd20
EFI stub: >> Memory Map Changed 0x2c <<
EFI stub: efi_exit_boot_services() 2 key=0x17b1 map_size=0xc30 desc_size=0x30 buff_size=0xcd20
EFI stub: >> Memory Map Changed 0x30 <<
EFI stub: efi_exit_boot_services() 3 key=0x17ba map_size=0xc30 desc_size=0x30 buff_size=0xcd20
EFI stub: >> Memory Map Changed 0x34 <<
EFI stub: create_event_ex (EFI_EVT_GROUP_EXIT_BOOT_SERVICES) = 0x0
EFI stub: >> event_notified() ! <<
EFI stub: >> Memory Map Changed 0x38 <<
EFI stub: >> Memory Map Changed 0x3c <<
EFI stub: >> Memory Map Changed 0x40 <<
EFI stub: >> Memory Map Changed 0x44 <<
EFI stub: signal_event(EFI_EVT_GROUP_SIGNAL_EXIT_BOOT_SERVICES) = 0X0
EFI stub: Calling exit_boot_services now
EFI stub: >> Memory Map Changed 0x48 <<
EFI stub:  exit_boot_services call 1 status = 0x8000000000000002
EFI stub: >> Memory Map Changed 0x4c <<
EFI stub: efi_exit_boot_services() 4 key=0x17ba map_size=0xc30 desc_size=0x30 buff_size=0xcd20
EFI stub: Stalling for 5 seconds
EFI stub: >> Memory Map Changed 0x50 <<
EFI stub: RETRY exit_boot_services() 5 key=0x1803 map_size=0xc90 desc_size=0x30 buff_size=0xcd20
EFI stub: >> Memory Map Changed 0x54 <<
EFI stub:  exit_boot_services call 2 status = 0x8000000000000002

Notice how the buff_size, and then map_size, change in report #4 and #5 after the initial call to exit_boot_services?

And the map continues to be changed rapidly throughout.
Comment 18 TJ 2022-08-22 01:58:06 UTC
I tried one more workaround - I put the call to exit_boot_services() into a loop that increments the key value rapidly to try to 'catch up' with the firmware. It doesn't manage to - just keeps on going.

ebs_count = 0;                                                                               ebs_key = *map->key_ptr;
do {                                                                                                  status = efi_bs_call(exit_boot_services, handle, ebs_key);                            
  ++ebs_count;
  ++ebs_key;
  if (ebs_count % 8 == 0)
    efi_info(" exit_boot_services call %d with key 0x%lx (started from 0x%lx)) status = 0x%lx\n", ebs_count, ebs_key, *map->key_ptr, status);                           |         * device tree, so this can be allocated anywhere.                                    
                                                                                              |         */                                                                                   
        } while (status != EFI_SUCCESS);                                                      |        cmdline_ptr = efi_convert_cmdline(image, &cmdline_size);                              
        efi_info(" exit_boot_services call %d with key 0x%lx status = 0x%lx\n", ebs_count, ebs|        if (!cmdline_ptr) {                                                                   
_key, status);
Comment 19 TJ 2022-08-22 02:00:11 UTC
I tried one more workaround - I put the call to exit_boot_services() into a loop that increments the key value rapidly to try to 'catch up' with the firmware. It doesn't manage to - just keeps on going.

ebs_count = 0;
ebs_key = *map->key_ptr;
do {
  status = efi_bs_call(exit_boot_services, handle, ebs_key);                            
  ++ebs_count;
  ++ebs_key;
  if (ebs_count % 8 == 0)
    efi_info(" exit_boot_services call %d with key 0x%lx (started from 0x%lx)) status = 0x%lx\n", ebs_count, ebs_key, *map->key_ptr, status);

} while (status != EFI_SUCCESS);
efi_info(" exit_boot_services call %d with key 0x%lx status = 0x%lx\n", ebs_count, ebs_key, status);
Comment 20 TJ 2022-09-13 11:38:37 UTC
I've separated out my debug code into individual commits to make it easier to play about with it. All the different approaches are guarded with "if (0) { ...}" so to test them change to "if (1)"

https://github.com/iam-TJ/linux/tree/efi-libstub-debug
Comment 21 Augustin Thiercelin 2023-04-21 15:19:05 UTC
We ran into a similar problem, maybe explaining it can help here.

Our kernel wasn't booting, and was returning the error EFI_BUFFER_TOO_SMALL.

We investigated by adding `efi_info` debug traces to the kernel sources and
found out that the cause of the error was the call to
'efi_pci_disable_bridge_busmaster()' just before 'exit_boot_services' resulting
in an enlargement of the EFI memory map, which seems not handled by the kernel
loader code.

Removing the `CONFIG_EFI_DISABLE_PCI_DMA` parameter to prevent this call seemed
like to fix this particular problem. However, it was now the
'exit_boot_services' call that was always failing, returning
`EFI_INVALID_PARAMETER`. The MapKey was always wrong, just like in your case.

The important information is that we tried to recompile the kernel without debug
traces, and it finally booted! We suspect without having yet investigated in
depth that debug traces are in fact modifying the memory map, and therefore, the
MapKey.

Note You need to log in before you can comment on or make changes to this bug.