Bug 97141
Summary: | ACPI initialisation broken since 3.14-rc1 for some mainboards. | ||
---|---|---|---|
Product: | ACPI | Reporter: | Marius Tolzmann (marius) |
Component: | BIOS | Assignee: | acpi_bios |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | aaron.lu, croccifixio29, jlee, lenb, lv.zheng, matt, rjw, wwwutz |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.0 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
full dmesg of broken ACPI after booting into 73f7d1c with applied c4e1acbb
ACPI / init: Split acpi_early_init() |
Description
Marius Tolzmann
2015-04-23 09:38:37 UTC
kernel versions 3.18.x, 3.19.x, 4.0 do not boot at all: they die with a kernel panic in 10Gig thernetdriver (ixgbe) and are unable to find root device. After reverting c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6 and 73f7d1ca32638028e3271f54616773727e2f9f26 v4.0 kernel starts up like a charm again. IMO, acpi_load_tables() invoked in acpi_early_init() might be wrong. It bypasses all module level execution during boot time. So how useful the namespace nodes created in this stage will be? I doubt that there should be real drivers depending on such broken or half-parsed namespace nodes... Thanks and best regards -Lv Add relevant people. The original position of acpi_early_init() is after efi_enter_virtual_mode(), 73f7d1ca3 patch moved acpi_early_init() to the position before acpi_early_init() for two reasons: - Find the BGRT table earlier. <=== move to before efi_enter_virtual_mode() - Prepare setting system clock with ACPI TAD. <=== move to before timekeeping_init() The c4e1acbb3 patch reverted the part of ACPI TAD, it moved acpi_early_init() to before efi_enter_vritual_mode() but not the original position: after efi_enter_vritual_mode() The original discussion about EFI benefit is here: https://lkml.org/lkml/2013/12/20/471 Checked v4.0 kernel, the efi_bgrt_init() still called by efi_late_init() in the original place. Add Matt Fleming to Cc. list to clarify the benefit of finding BGRT. (In reply to Lee, Chun-Yi from comment #4) > The original position of acpi_early_init() is after efi_enter_virtual_mode(), > 73f7d1ca3 patch moved acpi_early_init() to the position before > acpi_early_init() for two reasons: ^^^^^^^^^^^^^^^^^ before timekeeping_init(), sorry for my typo Thanks for the information. Though my concern is not related to this bug. It looks the TAD use case is the root cause of such dirty implementation in the ACPICA interpreter around module level code... Just note this here for people who is looking at this issue. Thanks and best regards -Lv I'm OK with BGRT which is a data table, but for TAD which comes from a definition block table, I do have concerns... At this early boot time, PCI configuration space may not be accessible. Thus some module level code in the ACPI definition block table is also not executable. Invoking acpi_load_tables() so early causes a very strange software scheme in ACPICA. ACPICA interpreter has to execute only special AML code and ignore all module level code in this early stage and at a late stage, re-execute the module level code. This makes ACPICA interpreter implementation very dirty. Thanks -Lv (In reply to Lv Zheng from comment #8) > I'm OK with BGRT which is a data table, but for TAD which comes from a > definition block table, I do have concerns... > > At this early boot time, PCI configuration space may not be accessible. > Thus some module level code in the ACPI definition block table is also not > executable. > Invoking acpi_load_tables() so early causes a very strange software scheme > in ACPICA. > ACPICA interpreter has to execute only special AML code and ignore all > module level code in this early stage and at a late stage, re-execute the > module level code. This makes ACPICA interpreter implementation very dirty. > > Thanks > -Lv I fully agree your point. But if TAD could not access earlier before timekeeping_init(), then kernel can not use TAD to set system clock. That causes kernel accessing another RTC interface, like: EFI time service. Unfortunately the EFI time service is unreliable at run time so we avoid using it. Using EFI time service to get the offset in EFI stub maybe works, that need time to implement and check. Is it safe to "move" acpi_early_init() back to the position it used to be before commit 73f7d1ca326 (e.g. by reverting the both commits mentioned in comment 1)? we could use this as a quick fix to startup current linux kernel on our tyan mainboards - or will we hit some ugly side effects because other init-functions depend on it? since older kernel versions work great, we can also continue to use older kernels till this issue has been fixed. best regards and thanks for the help.. From the purely ACPI perspective, it should be safe to move acpi_early_init(). I'm not sure what EFI changes have been made since the commit in question, though, which may depend on it. Matt's input would be necessary to answer that. However, if your machines don't use EFI anyway, please try moving acpi_early_init(). (In reply to Lv Zheng from comment #8) > I'm OK with BGRT which is a data table, but for TAD which comes from a > definition block table, I do have concerns... > > At this early boot time, PCI configuration space may not be accessible. > Thus some module level code in the ACPI definition block table is also not > executable. > Invoking acpi_load_tables() so early causes a very strange software scheme > in ACPICA. > ACPICA interpreter has to execute only special AML code and ignore all > module level code in this early stage and at a late stage, re-execute the > module level code. This makes ACPICA interpreter implementation very dirty. I'm not sure I understand your point correctly. Do you have any specific examples of what may go wrong as a result of this? (In reply to Rafael J. Wysocki from comment #12) > (In reply to Lv Zheng from comment #8) > > I'm OK with BGRT which is a data table, but for TAD which comes from a > > definition block table, I do have concerns... > > > > At this early boot time, PCI configuration space may not be accessible. > > Thus some module level code in the ACPI definition block table is also not > > executable. > > Invoking acpi_load_tables() so early causes a very strange software scheme > > in ACPICA. > > ACPICA interpreter has to execute only special AML code and ignore all > > module level code in this early stage and at a late stage, re-execute the > > module level code. This makes ACPICA interpreter implementation very dirty. > > I'm not sure I understand your point correctly. > > Do you have any specific examples of what may go wrong as a result of this? For both the code inside of a Method and the code at the module level for the table (do not belong to any method), AML grammer for them defined in the spec is TermList. So they should be interpreted in the same style - both table loading and method execution should work in the same way. I've tried and it worked here: http://sourceforge.net/p/sdfirm/code/ci/master/tree/kernel/acpi/acpi_interp.c Both acpi_interpret_table() and acpi_evaluate_object() can invoke acpi_interpret_aml() with same arguments. But ACPICA is currently not implemented in this way. In acpi_load_tables(), only limitted opcodes (named object creation opcodes) are executed in a 2-pass parsing (load_pass_1 and load_pass_2) and many opcodes in the "If", "Else", .. packages are just linked together (acpi_ps_link_module_code()). This happens during the early stage (acpi_early_init()). At a later stage (acpi_bus_init()), such module level code will be executed (acpi_ns_exec_module_code_list()). The code for module level execution is nasty, it hacks "\" from a device to a method in order for the executer code to work and hacks it back after completing the module level execution. While for method invocation, it is currently executed only in 1-pass parsing (execute_pass). For the above 2-load-pass parsing implementation, only limitted opcodes can be supported during table loading. And the 2-load-pass parsing caused some known compliance issues. For example, if the load-pass-1 failed at a point, then the namespace node created before this point will be successfully created, but since the load-pass-2 is aborted, such namespace nodes will be left without operand objects attached, they will be non-calulatable objects... So far no BIOS issues reported on Windows around such broken tables. This links to the dicussion of this nasty code issue: http://bugs.acpica.org/show_bug.cgi?id=963 I was thinking the reason for current implementation is due to the all-in-one code base for both the iasl disassembler and the in-kernel interpreter. But it seems Linux also has something to do with this, it is hard to move acpi_load_tables() to a late stage and merges it with the module level execution into the big execute-pass parsing. Thanks and best regards -Lv Actually, according to the ACPI spec, executable code at module level is illegal and not supported. It was legal in ACPI 1.0 but made illegal in ACPI 2.0
However, ACPICA supports it (as much as it has to).
Bob
> -----Original Message-----
> From: bugzilla-daemon@bugzilla.kernel.org [mailto:bugzilla-
> daemon@bugzilla.kernel.org]
> Sent: Tuesday, May 05, 2015 6:30 PM
> To: acpi-bugzilla@lists.sourceforge.net
> Subject: [Bug 97141] ACPI initialisation broken since 3.14-rc1 for some
> mainboards.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=97141
>
> --- Comment #13 from Lv Zheng <lv.zheng@intel.com> --- (In reply to Rafael
> J. Wysocki from comment #12)
> > (In reply to Lv Zheng from comment #8)
> > > I'm OK with BGRT which is a data table, but for TAD which comes from
> > > a definition block table, I do have concerns...
> > >
> > > At this early boot time, PCI configuration space may not be
> accessible.
> > > Thus some module level code in the ACPI definition block table is
> > > also not executable.
> > > Invoking acpi_load_tables() so early causes a very strange software
> > > scheme in ACPICA.
> > > ACPICA interpreter has to execute only special AML code and ignore
> > > all module level code in this early stage and at a late stage,
> > > re-execute the module level code. This makes ACPICA interpreter
> implementation very dirty.
> >
> > I'm not sure I understand your point correctly.
> >
> > Do you have any specific examples of what may go wrong as a result of
> this?
>
> For both the code inside of a Method and the code at the module level for
> the table (do not belong to any method), AML grammer for them defined in
> the spec is TermList.
> So they should be interpreted in the same style - both table loading and
> method execution should work in the same way.
> I've tried and it worked here:
> http://sourceforge.net/p/sdfirm/code/ci/master/tree/kernel/acpi/acpi_inter
> p.c
> Both acpi_interpret_table() and acpi_evaluate_object() can invoke
> acpi_interpret_aml() with same arguments.
>
> But ACPICA is currently not implemented in this way.
> In acpi_load_tables(), only limitted opcodes (named object creation
> opcodes) are executed in a 2-pass parsing (load_pass_1 and load_pass_2)
> and many opcodes in the "If", "Else", .. packages are just linked together
> (acpi_ps_link_module_code()). This happens during the early stage
> (acpi_early_init()). At a later stage (acpi_bus_init()), such module level
> code will be executed (acpi_ns_exec_module_code_list()).
> The code for module level execution is nasty, it hacks "\" from a device
> to a method in order for the executer code to work and hacks it back after
> completing the module level execution.
>
> While for method invocation, it is currently executed only in 1-pass
> parsing (execute_pass).
>
> For the above 2-load-pass parsing implementation, only limitted opcodes
> can be supported during table loading. And the 2-load-pass parsing caused
> some known compliance issues.
> For example, if the load-pass-1 failed at a point, then the namespace node
> created before this point will be successfully created, but since the
> load-pass-2 is aborted, such namespace nodes will be left without operand
> objects attached, they will be non-calulatable objects... So far no BIOS
> issues reported on Windows around such broken tables.
>
> This links to the dicussion of this nasty code issue:
> http://bugs.acpica.org/show_bug.cgi?id=963
>
> I was thinking the reason for current implementation is due to the all-in-
> one code base for both the iasl disassembler and the in-kernel
> interpreter.
> But it seems Linux also has something to do with this, it is hard to move
> acpi_load_tables() to a late stage and merges it with the module level
> execution into the big execute-pass parsing.
>
> Thanks and best regards
> -Lv
>
> --
> You are receiving this mail because:
> You are watching the assignee of the bug.
>
> --------------------------------------------------------------------------
> ----
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications Performance
> metrics, stats and reports that give you Actionable Insights Deep dive
> visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> acpi-bugzilla mailing list
> acpi-bugzilla@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/acpi-bugzilla
Where can we find the "module level code is illegal" like statement in the current spec? I can only find this: AMLCode := DefBlockHeader TermList DefMethod := MethodOp PkgLength NameString MethodFlags TermList Both the AML code in module level and in method are TermList. If it is illegal, the spec may define AMLCode using a different term. Hi, It could be better to use "dmesg -c" after each brightness key press. And upload several files here, each contains a "dmesg -c" output for one press. Thanks and best regards -Lv (In reply to Lv Zheng from comment #16) > Hi, > > It could be better to use "dmesg -c" after each brightness key press. > And upload several files here, each contains a "dmesg -c" output for one > press. > > Thanks and best regards > -Lv This is a wrong post against another bug. Sorry for the noise. Please ignore it. Thanks -Lv (changing HW to ALL, since this the change that broke this Tyan/AMD could break any system) (In reply to Rafael J. Wysocki from comment #11) > From the purely ACPI perspective, it should be safe to move > acpi_early_init(). > > I'm not sure what EFI changes have been made since the commit in question, > though, which may depend on it. Matt's input would be necessary to answer > that. > > However, if your machines don't use EFI anyway, please try moving > acpi_early_init(). Moving acpi_early_init() after efi_enter_virtual_mode() will probably work. I can't think of anything that will break necessarily, I would just try it and see. The benefit is doing it before for BGRT is that we can free the EfiBootServices memory a little bit earlier on boot. But it's not at all clear to me what the root problem is here and whether a straight up revert is just papering over a problem. (In reply to Marius Tolzmann from comment #0) [...snip] > > [ 0.000552] ACPI: Core revision 20131115 > [ 0.007292] ACPI: All ACPI Tables successfully acquired > [ 0.049357] ACPI Error: Hardware did not enter ACPI mode > (20131115/evxfevnt-113) > [ 0.049710] ACPI Warning: AcpiEnable failed (20131115/utxfinit-161) > [ 0.049977] ACPI: Unable to enable ACPI The above error message indicate the SCI_EN bit didn't set by the hardware platform and kernel already wait 3 seconds for hardware setting SCI_EN. That means no SCI interrupt will occur. The position of acpi_early_init() before and after changed: start_kernel(void) { [...snip] anon_vma_init(); acpi_early_init(); <=== current position #ifdef CONFIG_X86 if (efi_enabled(EFI_RUNTIME_SERVICES)) /* legacy BIOS in this case, no EFI */ efi_enter_virtual_mode(); #endif #ifdef CONFIG_X86_ESPFIX64 /* Should be run before the first non-init thread is created */ init_espfix_bsp(); #endif thread_info_cache_init(); cred_init(); fork_init(); proc_caches_init(); buffer_init(); key_init(); security_init(); dbg_late_init(); vfs_caches_init(totalram_pages); signals_init(); /* rootfs populating might need page-writeback */ page_writeback_init(); proc_root_init(); nsfs_init(); cpuset_init(); cgroup_init(); taskstats_init_early(); delayacct_init(); check_bugs(); /* acpi_early_init(); */ /* before LAPIC and SMP init */ <=== original position sfi_init_late(); After a quick glance, honestly I didn't see dubious function call between old and new position. Created attachment 178101 [details]
ACPI / init: Split acpi_early_init()
Marius, can you please check if this patch makes any difference on the affected system?
(In reply to Rafael J. Wysocki from comment #21) > Created attachment 178101 [details] > ACPI / init: Split acpi_early_init() > > Marius, can you please check if this patch makes any difference on the > affected system? yes it works - the affected system seems to work fine again with the patch applied. thanks a lot for troubleshooting and fixing our problem ;) OK, thanks for testing! I'll add a changelog and post it for review, then. (In reply to Rafael J. Wysocki from comment #23) > OK, thanks for testing! > > I'll add a changelog and post it for review, then. Submitted as https://patchwork.kernel.org/patch/6512961/ Linux-4.2-rc1 has: commit b064a8fa77dfead647564c46ac8fc5b13bd1ab73 Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Date: Wed Jun 10 01:33:36 2015 +0200 ACPI / init: Switch over platform to the ACPI mode later I have similar problem on Fedora22. How can I apply the patch that you have developed? Build the latest Linus' git tree should have this patch included. commit id is b064a8fa77dfead647564c46ac8fc5b13bd1ab73 afaik it is also included in older kernels: in 4.1.x since v4.1.3 (as 9e60048) in 4.0.x since v4.0.9 (as 8ce8431) in 3.14.x since v3.14.49 (as cdbbeb6) |