Created attachment 90191: output of "dmesg | grep -i acpi"
I have an Acer Aspire One AO725 netbook on which everything ACPI-related seems to work perfectly:
battery/AC adapter management: works
power button: works
lid switch: works
Fn key to switch the wireless on/off: works
Fn keys to control the backlight brightness: work
However, I see some ACPI errors in dmesg that I do not understand, like these:
ACPI Warning: 0x0000000000000b00-0x0000000000000b07 SystemIO conflicts with Region \_SB_.PCI0.SMBS.SMB0 1 (20120913/utaddress-251)
[ 5.375825] ACPI Error: [SSZE] Namespace lookup failure, AE_ALREADY_EXISTS (20120913/dsfield-211)
[ 5.376075] ACPI Error: Method parse/execution failed [\_SB_.ACAD._PSR] (Node ffff88010a287c18), AE_ALREADY_EXISTS (20120913/psparse-536)
[ 5.376315] ACPI Exception: AE_ALREADY_EXISTS, Error reading AC Adapter state (20120913/ac-122)
I have attached the "dmesg | grep -i acpi" output of my netbook. I hope you can help me understand what these error messages mean.
Thanks,
Francesco Muzio
Please attach the acpidump output of your netbook.
Created attachment 90921: acpidump output
acpidump output attached.
Method (_PSR, 0, NotSerialized)
{
    ...
    CreateWordField (XX00, Zero, SSZE)
    ...
}
This seems to be an ACPICA issue to me. Bob, for the above ASL code, will SSZE be created as a global variable, or as a static one, i.e. is it released soon after _PSR evaluation finishes? Also, what will happen if _PSR is re-entered?
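A toy model in C of the re-entrance question, assuming (for illustration only) that a named object created by a method, such as SSZE from CreateWordField, stays in the namespace until the creating instance exits. All types and functions here (toy_namespace, toy_ns_create, toy_psr_enter, toy_psr_exit) are hypothetical and are not ACPICA APIs; the point is that a second instance starting before the first has exited collides on the name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status codes named after the ACPICA ones seen in this thread. */
typedef enum { AE_OK = 0, AE_ALREADY_EXISTS } toy_status;

#define TOY_NS_MAX 8

/* A toy, flat namespace holding 4-character ACPI names. */
typedef struct {
    char names[TOY_NS_MAX][5];
    int count;
} toy_namespace;

/* Install a name; fail if a node with that name is already present. */
static toy_status toy_ns_create(toy_namespace *ns, const char *name)
{
    assert(ns != NULL && strlen(name) == 4);
    for (int i = 0; i < ns->count; i++)
        if (strcmp(ns->names[i], name) == 0)
            return AE_ALREADY_EXISTS;
    assert(ns->count < TOY_NS_MAX);
    memcpy(ns->names[ns->count++], name, 5);   /* 4 chars + NUL */
    return AE_OK;
}

/* Remove a name, moving the last entry into the freed slot. */
static void toy_ns_delete(toy_namespace *ns, const char *name)
{
    for (int i = 0; i < ns->count; i++)
        if (strcmp(ns->names[i], name) == 0) {
            ns->count--;
            memmove(ns->names[i], ns->names[ns->count], 5);
            return;
        }
}

/* Entry of a _PSR-like body: CreateWordField (XX00, Zero, SSZE). */
static toy_status toy_psr_enter(toy_namespace *ns)
{
    return toy_ns_create(ns, "SSZE");
}

/* Method exit: the temporary named object is deleted again. */
static void toy_psr_exit(toy_namespace *ns)
{
    toy_ns_delete(ns, "SSZE");
}
```

In this model the second toy_psr_enter() call stands for a re-entered _PSR: it fails with AE_ALREADY_EXISTS, while a call made after the first instance has exited succeeds.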
Created attachment 111701: dmesg | grep -i acpi + acpidump
My output is a little different, but it is the same problem.
(In reply to sov.info@mail.ru from comment #4)
> Created attachment 111701
> dmesg grep -i acpi + acpidump
>
> my output is little different, but the same problem
...if this is a problem :)
If _PSR is reentered, it will fail on the CreateWordField with an AE_ALREADY_EXISTS. However, in this type of case, ACPICA will dynamically mark the method as "serialized" to prevent further similar issues.
This global can be set to mark all methods serialized:
/*
 * Automatically serialize ALL control methods? Default is FALSE, meaning
 * to use the Serialized/NotSerialized method flags on a per method basis.
 * Only change this if the ASL code is poorly written and cannot handle
 * reentrancy even though methods are marked "NotSerialized".
 */
UINT8 ACPI_INIT_GLOBAL (AcpiGbl_AllMethodsSerialized, FALSE);
(acpi_gbl_all_methods_serialized on Linux)
Currently ACPICA treats such issues as BIOS bugs, because the asynchronous execution of ACPICA methods is managed by multi-threading mutexes rather than by a stallable asynchronous execution scheduler. Please try setting this global variable through the following kernel parameter: acpi_serialize.
Created attachment 119341: dmesg of vmlinuz-3.11-2-amd64 root=UUID=3d398f1d-2af1-4c45-81a1-537ec0cdcb6d ro acpi_serialize
Thanks for your support. I have booted the latest kernel available in Debian testing with the acpi_serialize option. Unfortunately, the ACPI messages persist. I have attached the dmesg output.
Hi,
Thanks for the report. This is a good case for learning how the ACPICA interpreter behaves.
I think the implementation of acpi_gbl_all_methods_serialized is wrong. It relies on the interpreter lock, but there really are unlocking cases during control method execution. So if a control method is executed and blocks, another execution instance of the same control method can enter and trigger the failure message you've seen.
Bob, is this right?
Rui, please re-assign this bug to me; let me try to fix it.
If acpi_gbl_all_methods_serialized is set, the interpreter is not relinquished in the blocking cases:
void
AcpiExRelinquishInterpreter (
    void)
{
    ACPI_FUNCTION_TRACE (ExRelinquishInterpreter);

    /*
     * If the global serialized flag is set, do not release the interpreter.
     * This forces the interpreter to be single threaded.
     */
    if (!AcpiGbl_AllMethodsSerialized)
    {
        AcpiExExitInterpreter ();
    }

    return_VOID;
}
There are still cases where AcpiExExitInterpreter is invoked directly: for example, region callbacks (before invoking setup/handler), global lock acquisition, and module-level code execution after "Load/LoadTable" opcodes. These can happen during the execution of a control method. In this specific case, there are region accesses in _PSR, so when _PSR blocks and the interpreter lock is released, the same _PSR method can be re-entered, triggering this error message.
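To make the distinction concrete, here is a hypothetical boolean model of the interpreter lock (not ACPICA code): the relinquish path honours a stand-in for AcpiGbl_AllMethodsSerialized, but a region access that calls the exit routine directly drops the lock regardless, which is the window in which _PSR can be re-entered:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: not ACPICA code. */
static bool interp_locked;
static bool all_methods_serialized;   /* stand-in for AcpiGbl_AllMethodsSerialized */

static void toy_enter_interpreter(void) { interp_locked = true; }
static void toy_exit_interpreter(void)  { interp_locked = false; }

/* Like AcpiExRelinquishInterpreter(): honours the global flag. */
static void toy_relinquish_interpreter(void)
{
    if (!all_methods_serialized)
        toy_exit_interpreter();
}

/* Reacquire only if the relinquish step actually released the lock. */
static void toy_reacquire_interpreter(void)
{
    if (!all_methods_serialized)
        toy_enter_interpreter();
}

/* An explicit blocking opcode (e.g. Sleep) goes through relinquish.
 * Returns true if another method could have entered meanwhile. */
static bool toy_sleep_opcode(void)
{
    assert(interp_locked);
    toy_relinquish_interpreter();
    bool window_open = !interp_locked;
    toy_reacquire_interpreter();
    return window_open;
}

/* A region access calls the exit routine directly, ignoring the flag. */
static bool toy_region_access(void)
{
    assert(interp_locked);
    toy_exit_interpreter();
    bool window_open = !interp_locked;   /* the handler runs unlocked */
    toy_enter_interpreter();
    return window_open;
}
```

With the flag set, the Sleep path keeps the lock held, but the region-access path still opens a window.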
Thank you for not abandoning this bug report. The machine still has the problem described above and is running a recent 3.12 kernel. I'm available to test workarounds and patches, and to provide logs/dumps if requested.
Created attachment 122371: ACPICA: Interpreter: Extends AcpiGbl_AllMethodsSerialized to global lock and non initialization callbacks
This is a workaround. I tested it on my platform and it doesn't hang. Please give it a try.
Let me also say something more about this issue. I think AcpiGbl_AllMethodsSerialized was originally meant only to avoid explicit interpreter exits (Sleep/Acquire opcodes). What we see here is a requirement that the same control method should never be re-entered, whether it is Serialized or NotSerialized. This does not relate to "serialization", so it should be handled in a different way. Thus we currently don't need a workaround that extends AcpiGbl_AllMethodsSerialized to implicit interpreter exits (region accesses, etc.); it might be dangerous, as it could introduce regressions such as deadlocks (but I have offered such a workaround patch in the previous post; you can test it to see if it works on your platform). We may also need your help later to test another patch that protects the same control method against re-entrance.
Created attachment 122401: ACPICA: Executer: Add lockless interpreter locking primitive support.
Well, I'm starting to think that the issue is simply caused by the following: we need a lockless environment for callback invocations, but the implementation achieves this by simply exiting the interpreter lock, which breaks control method serialization. This patch offers a lockless environment for callbacks while the interpreter lock is still held. I tested it in the ACPICA simulation environment and on my Linux box. Could you give it a try?
I need clarification: does the second patch replace the first?
Please try 122371 first and report the result. Then, please try 122401 and report the result. Thanks.
Created attachment 122601: boot with kernel patched with attachment 122371
I have bad news.
The kernel patched with attachment 122371 boots the machine very slowly: it stops for about 30 seconds on "hda-codec: out of range cmd 0:20:400:ffffffff" messages, and the ACPI errors are still present. See the attached dmesg.
The kernel patched with attachment 122401 won't boot the machine: it hangs after some "ACPI : Added _OSI.." messages. See the attached photo.
Created attachment 122611: boot with kernel patched with attachment 122401
Booting the machine with the kernel patched with attachment 122371 also broke the analog audio device.
Thanks for the report.
Patch 1 is just a workaround that makes all control methods serialized; the result shows we do need the interpreter lock to be released for region accesses. Patch 2 is a fix that removes the interpreter lock exit/enter sequence around region accesses even without the workaround enabled; this leads to deadlocks.
I considered this issue again; there are 3 possible aspects through which to address the root cause:
1. control method serialization
2. a lockless environment for region accesses
3. object creation
All the solutions we've discussed so far address possible issues caused by 1 and 2, and none of them work. I think it is time to consider solutions for 3. What if we never return AE_ALREADY_EXISTS for object creation, and instead return the existing object with its reference count increased if it already exists? Let us compose a patch to achieve this for you.
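A sketch of the idea in option 3, assuming a hypothetical singly linked list of reference-counted named objects (toy_object, toy_create_or_reference and toy_release are illustrative, not ACPICA functions): creating an existing name returns the existing node with an extra reference instead of failing, and the node is freed only when the last reference is dropped:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical named object with a reference count; not an ACPICA type. */
typedef struct toy_object {
    char name[5];
    int refcount;
    struct toy_object *next;
} toy_object;

/* Instead of failing with AE_ALREADY_EXISTS, return the existing node with
 * its reference count increased; create it only if it is missing. */
static toy_object *toy_create_or_reference(toy_object **head, const char *name)
{
    assert(head != NULL && strlen(name) == 4);
    for (toy_object *o = *head; o != NULL; o = o->next)
        if (strcmp(o->name, name) == 0) {
            o->refcount++;
            return o;
        }
    toy_object *o = calloc(1, sizeof *o);
    if (o == NULL)
        return NULL;
    memcpy(o->name, name, 5);   /* 4 chars + NUL */
    o->refcount = 1;
    o->next = *head;
    *head = o;
    return o;
}

/* Drop one reference; unlink and free the node when the last one goes. */
static void toy_release(toy_object **head, toy_object *obj)
{
    if (--obj->refcount > 0)
        return;
    for (toy_object **p = head; *p != NULL; p = &(*p)->next)
        if (*p == obj) {
            *p = obj->next;
            free(obj);
            return;
        }
}
```

This is only safe if the existing object is equivalent to the one being created; otherwise two method instances would silently share an object that each expects to own.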
Created attachment 122661: ACPICA: Namespace: Fix issue of returning AE_ALREADY_EXIST for field objects creation.
This patch tries not to return AE_ALREADY_EXISTS, but instead waits until creation is possible. So this patch addresses the 3rd possible cause listed in Comment 22. I've booted my kernel with this patch applied. Please also give it a try.
(In reply to Francesco Muzio from comment #19)
> Created attachment 122601
> boot with kernel patched with attachment 122371
I checked the dmesg; it seems you didn't specify the acpi_serialize boot parameter. This patch only takes effect when acpi_serialize is specified; otherwise it is a no-op. Could you please boot again with this patch applied and acpi_serialize specified?
> I have bad news
> kernel patched with attachment 122371
> boot the machine very slowly
> it stops for about 30 sec on "hda-codec: out of range cmd 0:20:400:ffffffff"
> messages
Since this patch should be a no-op, this seems to be caused by another issue.
> and the ACPI errors are still present.
This is expected.
(In reply to Francesco Muzio from comment #20)
> Created attachment 122611
> boot with kernel patched with attachment 122401
I tested this patch: there was no crash, but a hang. A hang is explainable, but a crash would not be.
This patch changes the interpreter lock into a lock that does not hold any OSPM locks. So if some OSPM locks are taken before a control method is invoked, and the same locks are taken again in the region/exception callbacks, this patch can ensure control method serialization without introducing a deadlock.
But this patch cannot handle the following situation: OSPM invokes a control method inside a callback. I checked the code and found that there are _HID/_CID/_ADR/_SEG/_BBN control method invocations in acpi_ev_pci_config_region_setup. A deadlock can then happen, since such invocations will try to acquire the interpreter lock again.
I tried a modified patch with acpi_ex_exit_interpreter()/acpi_ex_enter_interpreter() restored around the region->setup callbacks, and Linux booted successfully. To use this solution, we would first need to modify acpi_ev_pci_config_region_setup to avoid invoking control methods in it, and then apply attachment 122401. I tend not to do this if other solutions work.
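The deadlock can be modelled with a hypothetical non-recursive lock in a single thread of control (toy_lock, toy_region_setup and the other names are illustrative, not kernel code): if the interpreter lock stays held across the PCI config region setup, the method evaluation inside the setup tries to take it again, whereas restoring the exit/enter pair around setup avoids this:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock, single thread of control. */
typedef struct { bool held; } toy_lock;

typedef enum { TOY_OK, TOY_WOULD_DEADLOCK } toy_result;

/* A real non-recursive mutex would block forever here; we report it. */
static toy_result toy_acquire(toy_lock *l)
{
    if (l->held)
        return TOY_WOULD_DEADLOCK;
    l->held = true;
    return TOY_OK;
}

static void toy_release(toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

static toy_lock interpreter_lock;

/* Evaluating a control method (_ADR, _BBN, ...) takes the lock. */
static toy_result toy_evaluate_method(void)
{
    toy_result r = toy_acquire(&interpreter_lock);
    if (r == TOY_OK)
        toy_release(&interpreter_lock);
    return r;
}

/* PCI config region setup, which evaluates methods internally.
 * exit_around_setup models restoring the exit/enter pair. */
static toy_result toy_region_setup(bool exit_around_setup)
{
    if (exit_around_setup)
        toy_release(&interpreter_lock);
    toy_result r = toy_evaluate_method();
    if (exit_around_setup)
        (void)toy_acquire(&interpreter_lock);
    return r;
}
```

In both cases the outer method still holds the interpreter lock when the setup returns; only the variant that drops it around setup lets the nested evaluation proceed.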
Here are my requests:
1. Please apply attachment 122661 only, then perform a build/boot test and post the dmesg here. I expect this patch to solve the issue and to be the final solution we want to upstream.
2. Please apply attachment 122371 only, then build and boot the kernel with acpi_serialize specified, and post the dmesg here. I expect the result to be the same as booting a kernel with attachment 122401 applied and acpi_serialize not specified.
Thanks in advance.
Created attachment 122701: ACPICA: Namespace: Fix issue of returning AE_ALREADY_EXISTS for field objects creation.
The previous version has issues; I even forgot to delete the error message. So if you booted the test kernel with it applied, the AE_ALREADY_EXISTS errors will surely still appear in the new dmesg output. This updated patch fixes the issues, and I have marked the old one as obsolete. Please use this new revision instead.
Created attachment 122761: ACPICA: Namespace: Fix issue of returning AE_ALREADY_EXISTS for field objects creation.
Sorry for the noise. The buggy one has been marked obsolete.
I'm a bit confused. Can you repeat how many tests with a patched kernel I have to do? Please give me an ordered list of the tests, and for each one tell me which patch I should use.
Thanks in advance
1. Please apply "attachment 122371", do not apply others, then perform a kernel build and boot the kernel "__with__ acpi_serialize specified", and post the dmesg here. We need to confirm that this workaround does not work.
2. Please apply "attachment 122761", do not apply others, then perform a kernel build and boot the kernel "__without__ acpi_serialize specified", and post the dmesg here. I need to confirm whether this workaround works or not.
Created attachment 122981: ACPICA: Dispatcher: Add support to automatically mark named object creation method as Serialized.
This is a new solution, suggested by Rafael, approved by Bob and tested by me. I'll list the test requests later, as there are 2 more related patches.
Created attachment 122991: ACPICA: Executer: Cleanup acpi_gbl_all_method_serialized mechanism.
We are going to deprecate the acpi_gbl_all_methods_serialized option, so the test result of attachment 122371 no longer needs to be confirmed. I'll list the latest test requests after uploading all of the patches.
Created attachment 123001: ACPICA: Dispatcher: Cleanup ACPI_METHOD_SERIALIZED_PENDING mechanism.
The old marking mechanism is no longer needed.
Here are the test requests. We now only need to confirm the following:
SOLUTION 1. Apply the following patches, build and boot the kernel, and post the dmesg here:
attachment 122991
attachment 123001
attachment 122981
SOLUTION 2. Apply the following patches, build and boot the kernel, and post the dmesg here:
attachment 122991
attachment 123001
attachment 122761
Sorry for confusing you so much. Please give them a try. Thanks.
Created attachment 123321: dmesg for solution 1
I have patched the source code of the latest kernel available (3.13). I had to patch include/acpi/acpixf.h manually, because patch 122991 could not find "extern u8 acpi_gbl_all_methods_serialized" at line 74. Which kernel version did you base this patch on?
For both solutions I booted the machine without the acpi_serialize parameter; I hope I didn't do anything wrong.
Neither solution fixes the problem (the ACPI errors are still present), and a "hda-codec: out of range cmd 0:20:400:ffffffff" message is printed many times.
Also, with the 2nd solution the machine boots completely only one time in three; the other two times a black screen is shown after init enters runlevel 2.
Created attachment 123331: dmesg for solution 2
Created attachment 123361: The minimized DSDT that contains all SSZE creations that can be found in the DSDT/SSDTs
I checked the results.
1. In the solution 1 result, AE_ALREADY_EXISTS appears 2 times for \_SB.ACAD._PSR evaluation.
2. In the solution 2 result, AE_ALREADY_EXISTS appears 1 time for \_SB.ACAD._PSR evaluation.
I searched all of the DSDT/SSDTs and collected every SSZE creation in the attached minimized DSDT for us to discuss.
Hi, let me post some investigation results here.
For solution 1, I think I know why it doesn't work. This solution only marks control methods that contain "Name()" opcodes as "Serialized", but not control methods that contain "CreateXField()" opcodes (root cause 1), so you can still see the AE_ALREADY_EXISTS error message twice.
My further investigation shows that if we mark control methods that contain "CreateXField()" opcodes as "Serialized", the platforms fail to boot! Many drivers hung, just like what you've seen for hda-codec on your platform. This means there really are many control methods that need to be executed in parallel, and we cannot simply change them into "Serialized" (root cause 2). This is also why solution 2 cannot work, and why we still see the AE_ALREADY_EXISTS error message once (because of timeouts).
Let's figure out which control methods on your platform are affected by "root cause 2". I'll collect a list of the marked control methods and ask an hda expert for help.
We also want to know: if you build and boot the same kernel without any of the above patches applied and without specifying "acpi_serialize", do you still suffer from the hda-codec issue?
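A sketch of the load-time marking discussed above, assuming the method body has already been decoded into a hypothetical opcode list (real AML has to be parsed first, so this is not how ACPICA scans it). The include_create_field switch shows why a pass that only looks for Name() misses a _PSR whose only named object comes from CreateWordField:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pre-decoded opcode classes; not real AML encodings. */
typedef enum {
    OP_STORE, OP_IF, OP_RETURN,
    OP_NAME,                 /* Name()           */
    OP_CREATE_WORD_FIELD,    /* CreateWordField  */
    OP_CREATE_DWORD_FIELD    /* CreateDWordField */
} toy_opcode;

typedef struct {
    const toy_opcode *body;
    size_t len;
    bool serialized;         /* Serialized / NotSerialized flag */
} toy_method;

/* Does this opcode create a named object in the namespace? */
static bool creates_named_object(toy_opcode op, bool include_create_field)
{
    switch (op) {
    case OP_NAME:
        return true;
    case OP_CREATE_WORD_FIELD:
    case OP_CREATE_DWORD_FIELD:
        return include_create_field;
    default:
        return false;
    }
}

/* Mark a method Serialized if its body creates named objects. With
 * include_create_field == false, only Name() triggers the marking. */
static void auto_serialize(toy_method *m, bool include_create_field)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->len && !m->serialized; i++)
        if (creates_named_object(m->body[i], include_create_field))
            m->serialized = true;
}
```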
> I had to patch manually include/acpi/acpixf.h because the patch 122991
> could not find in the line 74 "extern u8 acpi_gbl_all_methods_serialized"
> What version of the kernel have you based this patch?
Did you revert the solution 1 patches before trying solution 2?
Created attachment 123391: dmesg output after booting the AO725 with a clean 3.13 kernel
> We also want to know if you built and booted the same kernel without all of
> the above posted patches applied and did not specify "acpi_serilized", would
> you still suffer from the hda-codec issue?
It's true: with a non-patched kernel (v3.13) the hda-codec issue also happens. See the attached dmesg.
> Did you revert solution 1 patches before trying solution 2?
Yes, but the first two patches are the same, so these were the steps:
1. I downloaded and extracted the source of kernel v3.13.
2. I put the .config file shipped with the standard Debian kernel in the resulting directory.
3. I applied attachment 122991 as a patch.
4. I applied attachment 123001 as a patch.
5. I applied attachment 122981 as a patch.
6. I built the kernel for the first solution.
7. I reverted the patch of attachment 122981.
8. I applied attachment 122761 as a patch.
9. I built the kernel for the second solution.
> yes, but the fist two patches are the same, so these are the steps:
> I have downloaded and extracted the source of kernel v3.13
I was working on the following branches:
1. torvalds/linux/master
2. rafael/linux-pm/linux-next
3. an internal branch containing ACPICA patches that haven't been upstreamed.
You are right, attachment 122991 needs to be rebased for v3.13. Since you have already corrected it, I won't upload a separate one for v3.13.
> it's true,
> With an non-patched kernel (v3.13) the hda-codec issue also happens,
> see the dmesg attached
OK, so we can forget "root cause 2". For "root cause 1", I'll upload the solution 1 patches for you to try again.
Created attachment 123441: ACPICA: Add auto-serialization support for ill-behaved control methods.
This patch is extracted from ACPICA upstream. This is the updated solution 1 patch.
Created attachment 123451: ACPICA: Dispatcher: Add auto-serialization for CreateXField and XField opcodes.
This is a fix for root cause 1.
Here is the test request:
1. Apply the following patches:
   attachment 122991 (you need to rebase it for v3.13)
   attachment 123001
   attachment 123441
   attachment 123451
2. Build the kernel with "CONFIG_ACPI_DEBUG" enabled. This enables logging of ACPI_DEBUG_PRINT messages.
3. Boot the built kernel with "acpi.debug_layer=0x00000040". This enables the ACPI_DISPATCHER component; we want to see the "Method serialized ..." messages in the kernel log.
4. Post the dmesg output here.
Thanks.
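For reference, the value in step 3 is a bit mask of ACPICA debug components. The sketch below assumes the component values from include/acpi/acoutput.h (only a subset is copied here) and builds the boot parameter from component names; the helper functions debug_layer_mask and format_debug_layer are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Subset of the component bits in ACPICA's include/acpi/acoutput.h. */
static const struct { const char *name; unsigned int bit; } acpi_components[] = {
    { "ACPI_NAMESPACE",  0x00000010 },
    { "ACPI_PARSER",     0x00000020 },
    { "ACPI_DISPATCHER", 0x00000040 },
    { "ACPI_EXECUTER",   0x00000080 },
};

/* OR together the bits for the named components; unknown names are ignored. */
static unsigned int debug_layer_mask(const char *const names[], size_t n)
{
    unsigned int mask = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < sizeof acpi_components / sizeof acpi_components[0]; j++)
            if (strcmp(names[i], acpi_components[j].name) == 0)
                mask |= acpi_components[j].bit;
    return mask;
}

/* Render the boot parameter, e.g. "acpi.debug_layer=0x00000040". */
static void format_debug_layer(char *buf, size_t size, unsigned int mask)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "acpi.debug_layer=0x%08x", mask);
}
```

Selecting only ACPI_DISPATCHER yields exactly the parameter used in step 3; adding more components just ORs in further bits.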
Created attachment 124141: dmesg output after applying the patches listed in comment #44
Very good: after the patches I see only two "ACPI Exception" messages, probably not related to this bug. See the dmesg.
Hi,
Sorry for the delayed response; we had New Year holidays here. Thanks for the testing.
I can see one risk in your previous test result:
---
Also the 2nd solution boot completely the machine one time on three, the others two times a black screen is showed after init is entering on runlevel 2.
---
So I need your confirmation: do you face the same problem when booting the same kernel without any of the patches in this thread? And if you don't, do you face the same problem when booting the same kernel with the 1st solution patches listed in comment 44?
Both the unpatched kernel and the kernel patched with the attachments listed in comment 44 boot my machine normally, without the issue described in comment 35.
OK. The required solution 1 patches have been upstreamed in the acpica/master branch. Let's close this bug. Thanks for reporting and testing.
Well, thank you. Just one last question, out of curiosity: does this solution only silence an error, or does it make any improvements? Should I now notice any enhancement related to the ACPI layer?
ACPICA originally contains code that marks methods as "Serialized" when AE_ALREADY_EXISTS is encountered, so for your platform the answer is "yes, it only removes one error message". But note that the problem does not only produce an error message; it also makes the very first execution of such a control method fail. So if this control method execution is part of a driver's initialization, it can actually lead to OSPM malfunctioning.
Note that ACPICA has made an assumption that "BIOSes will not write ASL to allow same control method to be re-entered from multiple threads". If this assumption is correct, this solution improves the ACPICA interpreter; if it is not, this solution may trigger deadlocks on NotSerialized control methods that are meant to be re-entered from multiple threads, where the execution instances block waiting for each other. Since we don't know how the other de-facto standard interpreter behaves, this will have to be tested in the real world.
ACPICA has made an assumption that "BIOSes will not write ASL to allow same control method to be re-entered from multiple threads". This is not true, ACPICA makes no such assumption. We only mark methods "serialized" if they create named objects.
Created attachment 125671: An example ASL that might be affected by these fixes.
Hi, Bob,
You are right. Thanks for pointing out my mistake. It should be reworded as:
The solutions in this thread assume there is no control method in the real world allowing multi-threading reentrance while still creating named objects.
ASL that would be penalized may look like the attached example.
"The solutions in this thread assume there is no control method in the real world allowing multi-threading reentrance while still creating named objects." This is becoming an off-topic discussion, but however: This is still not correct. If anything, by marking methods that create named objects "serialized", ACPICA is making an assumption that there are no AML control methods that _require_ multiple threads to enter it in order to function properly.
Hi, Bob,
I'm just worried about how the de-facto standard interpreter implements object creation; perhaps the worry is unnecessary.
Consider the following creation conflict cases:
1. A named object conflicts with an existing named object created by the same control method;
2. A field object conflicts with an existing field object created to reference the same global region/buffer with the same parameters (offset/length).
In these cases the object to be created and the existing object can be deemed the same object, and such object creation opcodes could also be moved out of the control method. If the interpreter implemented an object creation facility that looks up existing objects to find the _same_ object using the conditions above, and increases the reference count of the existing one, then the method would not need to be changed from "NotSerialized" to "Serialized".
Well, this concern is just for academic rigour in approaching this problem.
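A sketch of the equivalence test for case 2, assuming a hypothetical field descriptor that records the referenced region/buffer, bit offset and bit length (toy_field and resolve_field_conflict are illustrative, not ACPICA code). When the name is already taken, an identical field is shared by bumping its reference count, while a field with different parameters is still reported as a conflict:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a buffer/region field object. */
typedef struct {
    const void *source;      /* the global region or buffer being referenced */
    unsigned int bit_offset;
    unsigned int bit_length;
    int refcount;
} toy_field;

typedef enum { TOY_REUSED, TOY_CONFLICT } toy_outcome;

/* Same source, same offset, same length => treat as the same object. */
static bool same_field(const toy_field *a, const toy_field *b)
{
    return a->source == b->source &&
           a->bit_offset == b->bit_offset &&
           a->bit_length == b->bit_length;
}

/* Called when the name is already taken by `existing`. */
static toy_outcome resolve_field_conflict(toy_field *existing, const toy_field *wanted)
{
    assert(existing != NULL && wanted != NULL);
    if (!same_field(existing, wanted))
        return TOY_CONFLICT;   /* genuinely different object: still an error */
    existing->refcount++;      /* share the existing object instead of failing */
    return TOY_REUSED;
}
```

A CreateWordField at byte offset Zero corresponds here to bit_offset 0 and bit_length 16; a re-entered instance asking for the same word of the same buffer would be reused rather than rejected.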
Closing this bug, since the code has been shipped in the linux-pm/linux-next branch.