Created attachment 287431 [details] ACPI warnings from FreeBSD kernel Stacktrace: // osl.c:acpi_os_wait_semaphore: 1262 down_timeout(sem, jiffies) // exsystem.c:acpi_ex_system_wait_semaphore: 47 status = acpi_os_wait_semaphore(semaphore, 1, timeout); // evglock.c:acpi_ev_acquire_global_lock: 240 status = acpi_ex_system_wait_semaphore( acpi_gbl_global_lock_semaphore, ACPI_WAIT_FOREVER); // exmutex.c:acpi_ex_acquire_mutex_object: 152 status = acpi_ev_acquire_global_lock(timeout); // exutils.c:acpi_ex_acquire_global_lock: 181 status = acpi_ex_acquire_mutex_object(ACPI_WAIT_FOREVER, acpi_gbl_global_lock_mutex, acpi_os_get_thread_id()); // exfield.c:acpi_ex_write_data_to_field: 394 acpi_ex_acquire_global_lock(...); // exstore.c:acpi_ex_store_object_to_node: 523 status = acpi_ex_write_data_to_field(...) exstore.c:acpi_ex_store: // ACPI_GET_DESCRIPTOR_TYPE(dest_desc) = ACPI_DESC_TYPE_NAMED 75 status = acpi_ex_store_object_to_node(...) exoparg1.c:acpi_ex_opcode_1A_1T_1R: // case AML_STORE_OP: 422 status = acpi_ex_store(...) dswexec.c:acpi_ds_exec_end_op: // case AML_CLASS_EXECUTE: 405 status = acpi_gbl_op_type_dispatch[op_type](walk_state); acpi_ps_parse_loop acpi_ps_parse_aml acpi_ps_execute_method acpi_ns_evaluate acpi_ns_init_one_device(obj_handle="\_SB_.PCI0", nesting_level=2) acpi_ns_walk_namespace acpi_ns_initialize_devices acpi_initialize_objects acpi_init
Created attachment 287433 [details] DSDT dump $ iasl -d medion.dsdt.bin Intel ACPI Component Architecture ASL+ Optimizing Compiler/Disassembler version 20180105 Copyright (c) 2000 - 2018 Intel Corporation Input file medion.dsdt.bin, Length 0x5DA6E (383598) bytes ACPI: DSDT 0x0000000000000000 05DA6E (v02 MEDION MEDIONAG 01072009 INTL 20160527) Pass 1 parse of [DSDT] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180105/psobject-263)
Linux dmesg: <...lots more AE_ALREADY_EXISTS preceding..., then> ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.XHC.RHUB.SS06._UPC], AE_ALREADY_EXISTS (20200110/dswload2-323) ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20200110/psobject-220) ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.XHC.RHUB.SS06._PLD], AE_ALREADY_EXISTS (20200110/dswload2-323) ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20200110/psobject-220) ACPI: Skipping parse of AML opcode: Method (0x0014) ACPI: 11 ACPI AML tables successfully acquired and loaded ACPI: EC: EC started ACPI: EC: interrupt blocked ACPI: \: Used as first EC ACPI: \: GPE=0x10, IRQ=-1, EC_CMD/EC_SC=0x66, EC_DATA=0x62 ACPI: EC: Boot ECDT EC used to handle transactions <hang>
Does not seem to be a regression; 4.4.76 (but with patches from openSUSE) and 4.12.14 (also patched by openSUSE) exhibit the same hang.
It looks like we have a mutex deadlock on the global lock in there.
Do you eventually get a timeout error (like below:) ACPI Error: Aborting method \_SB.PCI0.LPCB.EC0.RRAM due to previous error (AE_AML_LOOP_TIMEOUT)
I do not get such an error, at least not in a reasonable time. The longest I observed the console is about 50 minutes - the hung task checker kicks in after about 16min, though. [138.837] ACPI: EC: Boot ECDT EC used to handle transactions [1115.250] INFO: task swapper/0:1 blocked for more than 491 seconds. <call trace as printed by stack_trace_print> Given the execution is stuck with ACPI_WAIT_FOREVER (jiffies==MAX_SCHEDULE_TIMEOUT), I also see no possibility for down_timeout to return anytime soon to give parent functions a chance to return AE_ABORT_METHOD anywhere. Due to exfield.c:394 ignoring the return code from acpi_ex_acquire_global_lock, AE_TIME also never propagates further upwards. Furthermore, grepping for AE_ABORT_METHOD yields only one place which can return it, dbxface.c:222. This depends on acpi_gbl_abort_method being set, but this variable is never set to "true" anywhere from the looks of it.
>acpi_gbl_abort_method This flag is primarily used by the AML debugger and the AcpiExec application. I'm not sure, but I would guess that all of the AE_ALREADY_EXISTS exceptions are causing some confusion in the interpreter -- such that the Global Lock isn't getting released somewhere. These are serious errors, indicating some very bad ASL code, however. You might try getting a firmware upgrade to the latest for your machine. Bob
I find that acpi_gbl_global_lock_semaphore is initialized with 0 units (nsaccess.c): /* Create additional counting semaphore for global lock */ status = acpi_os_create_semaphore(1, 0, &acpi_gbl_global_lock_semaphore); and that the acpi_ev_global_lock_handler function (evglock.c) that would send a unit never gets called, which leads me to believe the hardware is not generating any ACPI_EVENT_GLOBAL events. But should it, and when?
Created attachment 287517 [details] Cure/Workaround Using this ad-hoc patch, I can successfully boot userspace.
Created attachment 287559 [details] Full ACPI table dump
Another possibility (without comment 9 patch) is to simply enable CONFIG_ACPI_REDUCED_HARDWARE_ONLY=y in .config, as this also will skip evglock.c code.
Thanks for the report. At this time, we need more debug information. If you could build a custom kernel, please build with CONFIG_ACPI_DEBUG=y without any patches of a kernel 5.0 or newer.. Please provide the full following dmesg: 1) plain dmesg 2) dmesg booting with the parameters "acpi.debug_level=0x80000 acpi.debug_layer=0xffffffff" 3) dmesg booting with parameters "acpi.debug_level=0x01000000 acpi.debug_layer=0x4" There's definitely something holding the lock. We just don't know what's holding it and why it's not releasing it.
>There's definitely something holding the lock. Did you observe comment 8? It's not the lock, it's a OS-level semaphore that is initialized to 0, and the hardware never emits events needed to "up" it to 1, hence the first "down" call is the one that hangs. >without any patches of a kernel 5.0 ... 1) plain dmesg Without any patches, that is going to be impossible. ACPI is so early that an external observer is required, like a serial console (for which laptops have not had a port since a long time), netconsole is going to be impossible because stupid manufacturers have also removed that port, and I can't reach userspace to dump dmesg to a disk, because obviously there's the hang. I can do a video capture, but then someone will have to sift through something like 2 hours of non-text material.
Created attachment 287629 [details] Alternate patch to comment 9. Diagnostic patch. This patch kills off the unbounded wait on acpi_gbl_global_lock_semaphore. Needless to say unbounded waits are kinda bad. It is here replaced with a 44ms wait (just something that will stand out in logs). Observations from running this patch on the hardware: 1. ACPI_ACQUIRE_GLOBAL_LOCK never succeeds in acquiring acpi_gbl_FACS. Looking at what that macro does, this may turn out to be a hardware bug that can only be worked around. Upgrading the firmware is not an option -- there is no upgrade to be found on the manufacturer's website. 2. (as before) acpi_gbl_global_lock_semaphore never gets posted either, so acpi_ex_system_wait_semaphore will never complete in a stock kernel. Unlike comment 9, this patch does not skip nearly as much locking code, so the dmesg should be better.
Created attachment 287631 [details] dmesg as per comment 12 item 1, with patch from comment 14 The kernel used was v5.5.5, the only extra being the patch from comment 14 (which is line number preserving).
Created attachment 287633 [details] dmesg as per comment 12 item 2
Created attachment 287635 [details] dmesg as per comment 12 item 3, with patch from comment 14
(In reply to Jan Engelhardt from comment #13) > >There's definitely something holding the lock. > > Did you observe comment 8? It's not the lock, it's a OS-level semaphore that > is initialized to 0, and the hardware never emits events needed to "up" it > to 1, hence the first "down" call is the one that hangs. Sorry, I wasn't reading that closely when I initially read the report. This definitely sounds like a hardware bug. Since we can't trust the global lock, we should not use it if we deem this as broken.
Created attachment 287637 [details] patch to detect bad global lock please try this patch
Your patch is faulty; 1. AE_TIME is emitted from the function, which is incorrect. We want to execute a similar pattern such as the "Make sure that a global lock actually exists." branch a few lines above. 1b. The caller at exfield.c:394 (see comment 2) ignores the AE_TIME return value, which causes some more problems down the line because things pretend the lock was taken when it's not. Eventually the system hangs again at some point. 2. Due to "break;", the acpi_os_acquire_lock call is skipped but not the acpi_os_release_lock after the loop. This leads to a release of a mutex that is not locked. New patch follows.
Created attachment 287651 [details] Updated patch
Created attachment 287653 [details] Patch from a completely different angle The real bug is in __acpi_acquire_global_lock. The firmware is fine (well, mostly). The FACS table contains some garbage bits, which __acpi_acquire_global_lock fails to deal with. This patch also makes the system boot successfully. This opens up another consideration though: if the FACS table were to contain 0xffffffff in the GL register, then acquiring the lock would always fail logically, even if the firmware is actually performing properly. IOW, I think we need to add a facs->global_lock&=~0x3 statement right after loading the FACS as well, just to reset that register to a sane value.
My patch from comment 22 ignores the bits 2-31 that are marked reserved by the spec, https://uefi.org/sites/default/files/resources/ACPI_6_2.pdf . Looking at FreeBSD's acpi_acquire_global_lock shows they are even ignoring bits 1-31 (which coincides with my first observation in this bugreport that FreeBSD is booting without issues): int acpi_acquire_global_lock(volatile uint32_t *lock) { uint32_t new, old; do { old = *lock; new = (old & ~ACPI_GLOCK_PENDING) | ACPI_GLOCK_OWNED; if ((old & ACPI_GLOCK_OWNED) != 0) new |= ACPI_GLOCK_PENDING; } while (atomic_cmpset_32(lock, old, new) == 0); return ((new & ACPI_GLOCK_PENDING) == 0); }
FreeBSD "accidentally" fixed the issue at some point. commit fccbffe7d83251aeb28395fd853f38f45bf3782d Author: jkim <jkim@FreeBSD.org> Date: Fri Feb 10 23:30:29 2012 +0000 De-obfuscate acpi_acquire_global_lock(). It seems the function was directly translated from i386 assembly version. - return ((new < GL_BIT_MASK) ? GL_ACQUIRED : GL_BUSY); + return ((new & GL_BIT_PENDING) == 0);
Created attachment 287659 [details] Fragment for idea from comment 22
@Rafael, what do you think about taking the patch from comment 22 for Linux?
Sounds good to me. Jan, please send this patch to linux-acpi, thanks!