Bug 6687

Summary: AE_AML_MUTEX_NOT_ACQUIRED - Acer TM3012, HP NX6125
Product: ACPI Reporter: Norbert Preining (preining)
Component: ACPICA-Core Assignee: Robert Moore (Robert.Moore)
Status: REJECTED UNREPRODUCIBLE    
Severity: normal CC: acpi-bugzilla, protasnb, trenn
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.17-rc6-mm2 Subsystem:
Regression: --- Bisected commit-id:
Attachments: acpidmp output
acpidmp output
Do not abort method execution if asked to release not acquired mutex
acoidump
warnings
2.6.18-rc1 patch to fix acpi_os_get_thread_id()
namespace fix from upcoming release
test
/var/log/warn
Patch to move deletion of methods' namespace under its mutex

Description Norbert Preining 2006-06-13 16:37:06 UTC
Most recent kernel where this bug did not occur: no idea
Distribution: Debian/sid
Hardware Environment: Acer TravelMate 3012WTMi
Software Environment: ???
Problem Description:
The kernel spits out some warning messages (several of them, but not too many)
ACPI Error (exmutex-0248): Cannot release Mutex [MUT1], not acquired [20060310]
ACPI Error (psparse-0522): Method parse/execution failed [\_SB_.BAT1._BST] (Node
f7c12a80), AE_AML_MUTEX_NOT_ACQUIRED
ACPI Exception (acpi_battery-0201): AE_AML_MUTEX_NOT_ACQUIRED, Evaluating _BST
[20060310]

Steps to reproduce:
reboot

The dmesg and acpidmp output can be found here:
dmesg: http://www.logic.at/people/preining/acer/acpi/dmesg.txt
acpidmp: http://www.logic.at/people/preining/acer/acpi/acpidmp.txt

Best wishes

Norbert
Comment 1 Vladimir Lebedev 2006-06-14 17:25:01 UTC
We are interested in the "Most recent kernel where this bug did not occur".
Please try the latest stable kernel (2.6.16.20 is available now).
Please also try 2.6.17-rc6-mm2 with the option ec_intr=0.
Comment 2 Norbert Preining 2006-06-15 03:03:27 UTC
With 2.6.16.20 no error messages
With 2.6.17-rc6-mm2 and ec_intr=0 same error messages as without ec_intr=0
More tests?
Comment 3 Luming Yu 2006-06-15 07:19:21 UTC
Please attach acpidump output
Comment 4 Norbert Preining 2006-06-15 08:09:47 UTC
Created attachment 8316 [details]
acpidmp output

Here it is (the link was given above).
Comment 5 Norbert Preining 2006-06-15 08:12:23 UTC
Created attachment 8317 [details]
acpidmp output
Comment 6 Vladimir Lebedev 2006-06-19 12:04:35 UTC
Please try the latest stable kernel 2.6.17 - it will give us more info.
Comment 7 Norbert Preining 2006-06-19 13:45:24 UTC
As with 2.6.16.20 no error messages with 2.6.17.
Comment 8 Luming Yu 2006-06-20 08:29:48 UTC
So, do you think the bug should be closed?
If NOT, please reopen.
If yes, and you want to know why 2.6.17 works, please look through the
changelog to find the individual patch.
Comment 9 Norbert Preining 2006-06-20 08:44:00 UTC
For me it is OK to close the bug. I only thought that what is in the -mm kernels
for ACPI would go into the main kernel in the next rc round, so this bug might
pop up again. We will see.
Comment 10 Alexey Starikovskiy 2006-06-21 06:06:17 UTC
Created attachment 8360 [details]
Do not abort method execution if asked to release not acquired mutex

There is no reason to abort method execution if the mutex is not acquired.
Printing an error message should be sufficient.
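
The difference between aborting and continuing can be illustrated with a small model. The following C sketch is not ACPICA code and not the attached patch; `run_method`, `do_release`, and the `abort_on_error` flag are hypothetical names chosen for illustration. It only shows that a failed release either ends the method early or is logged while the remaining operations still run.

```c
#include <assert.h>
#include <stdio.h>

enum status { OK = 0, MUTEX_NOT_ACQUIRED = 1 };
enum op { OP_ACQUIRE, OP_RELEASE, OP_STORE };

struct mutex { int held; };   /* hypothetical single-owner toy mutex */

/* Release the mutex; report an error if it was never acquired. */
static enum status do_release(struct mutex *m)
{
    assert(m != NULL);
    if (!m->held) {
        fprintf(stderr, "Cannot release mutex, not acquired\n");
        return MUTEX_NOT_ACQUIRED;
    }
    m->held = 0;
    return OK;
}

/*
 * Run a toy "method" made of n operations and return how many OP_STORE
 * operations completed. With abort_on_error set, the first failed
 * release ends the method; otherwise the error is only logged.
 */
static int run_method(const enum op *ops, int n, int abort_on_error,
                      struct mutex *m)
{
    int stores = 0;

    for (int i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_ACQUIRE:
            m->held = 1;
            break;
        case OP_RELEASE:
            if (do_release(m) != OK && abort_on_error)
                return stores;   /* old behavior: give up on the method */
            break;
        case OP_STORE:
            stores++;
            break;
        }
    }
    return stores;
}
```

For a method that releases before storing, the aborting variant completes no stores, while the lenient variant logs the error and still performs the store.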
Comment 11 Robert Moore 2006-06-21 15:28:12 UTC
What is really going on here? In the DSDT, all the releases have a 
corresponding acquire with an infinite wait, so this error should not happen.
Comment 12 Alexey Starikovskiy 2006-06-26 04:40:07 UTC
It seems that there is an error in our code that, for example, releases an ASL
mutex twice. In either case the correct behavior seems to be not to abort
execution. The error seems to vanish in 2.6.17, so we should just apply this patch.
Comment 13 Len Brown 2006-07-07 10:47:21 UTC
Alexey confirms that this regression is seen on an HP NX6125,
and that it was not present in 2.6.17 (ACPICA 20060127).

ACPI Error (exmutex-0248): Cannot release Mutex [C112], not acquired [20060623]
ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.TZ4_._TMP] 
(Node ffff810017d47810), AE_AML_MUTEX_NOT_ACQUIRED

Looking at the latest DSDT for the HP NX6125 in bug #5534, it contains a
construct where a mutex is acquired in one method and released in another.
While it isn't the mutex in the error message, it may be related:

                    Method (C194, 0, NotSerialized)
                    {
                        Acquire (C191, 0xFFFF)
                        Store (0x55, C17C)
                    }

                    Method (C195, 0, NotSerialized)
                    {
                        Store (0xAA, C17C)
                        Release (C191)
                    }
Comment 14 Vladimir Lebedev 2006-07-07 12:04:12 UTC
Created attachment 8502 [details]
acoidump

acpidump
Comment 15 Vladimir Lebedev 2006-07-07 12:08:56 UTC
Created attachment 8503 [details]
warnings

warnings
Comment 16 Vladimir Lebedev 2006-07-07 12:20:05 UTC
This is additional information about the nx6125 system.
The errors occur on kernel 2.6.18-rc1 and other git* kernels.

The simple program tcatacpi and /var/log/warn are attached.
Run "tcatacpi" in parallel and see the results, at least on nx6125 and F3200.
Comment 17 Len Brown 2006-07-07 17:00:47 UTC
Created attachment 8505 [details]
2.6.18-rc1 patch to fix acpi_os_get_thread_id()

please try this patch

ACPI: acpi_os_get_thread_id() returns current

While the Linux mutexes and the debug code that
reference acpi_os_get_thread_id() are happy with 0,
the AML mutexes in exmutex.c expect a unique non-zero
number for each thread - as they track this thread_id
to permit the mutex re-entrancy defined by the ACPI spec.
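
A simplified model shows one plausible way a thread ID of 0 can lead to the "not acquired" error. This is a sketch under assumptions, not the code in exmutex.c: the struct `aml_mutex`, the functions `mutex_acquire` and `mutex_release`, and the convention that owner 0 means "unowned" are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned long thread_id_t;

enum status { OK = 0, MUTEX_NOT_ACQUIRED = 1, NOT_OWNER = 2, BUSY = 3 };

struct aml_mutex {
    thread_id_t owner;   /* assumed convention: 0 means "nobody holds it" */
    unsigned depth;      /* re-entrancy depth for the owning thread */
};

/* Acquire the mutex, allowing the current owner to re-enter. */
static enum status mutex_acquire(struct aml_mutex *m, thread_id_t tid)
{
    assert(m != NULL);
    if (m->owner != 0 && m->owner != tid)
        return BUSY;     /* a real implementation would block here */
    m->owner = tid;
    m->depth++;
    return OK;
}

/* Release one level of ownership; only the owner may release. */
static enum status mutex_release(struct aml_mutex *m, thread_id_t tid)
{
    assert(m != NULL);
    if (m->owner == 0)
        return MUTEX_NOT_ACQUIRED;   /* also hit when the owner's ID is 0 */
    if (m->owner != tid)
        return NOT_OWNER;
    if (--m->depth == 0)
        m->owner = 0;
    return OK;
}
```

In this model, a thread whose ID is 0 acquires the mutex successfully, but the mutex still looks unowned, so the matching release fails. A unique non-zero ID per thread avoids that and lets the owner re-enter.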
Comment 18 Vladimir Lebedev 2006-07-08 00:57:14 UTC
Verified on '2.6.18-rc1 + patch from comment #17'. AML mutexes work fine, but
the other two error messages still exist: AE_ALREADY_EXISTS and AE_NOT_FOUND.
See warnings from comment #15.

 
Comment 19 Alexey Starikovskiy 2006-07-08 09:35:33 UTC
Created attachment 8507 [details]
namespace fix from upcoming release

Please check if this fix from 20060707 fixes the namespace problem.
Comment 20 Vladimir Lebedev 2006-07-09 00:38:55 UTC
Created attachment 8510 [details]
test
Comment 21 Vladimir Lebedev 2006-07-09 00:43:28 UTC
Created attachment 8511 [details]
/var/log/warn

Verified on '2.6.18-rc1 + patch #17 + patch #18'.
AE_ALREADY_EXISTS and AE_NOT_FOUND still exist, but there is progress: the
frequency of error messages is very low.
I ran 'cat /proc/acpi files' in a loop concurrently; this test script and
/var/log/warn are attached.
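
A concurrent read loop of this kind could look roughly like the C sketch below. It is not the attached test script or tcatacpi; the function names `read_repeatedly` and `stress_readers` are hypothetical, and the caller would pass a file such as one under /proc/acpi (the exact path depends on the system).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read `path` to the end `iterations` times; return the number of failed opens. */
static int read_repeatedly(const char *path, int iterations)
{
    char buf[4096];
    int failures = 0;

    assert(path != NULL);
    for (int i = 0; i < iterations; i++) {
        FILE *f = fopen(path, "r");
        if (!f) {
            failures++;
            continue;
        }
        while (fread(buf, 1, sizeof buf, f) > 0)
            ;   /* discard the data; only the act of reading matters */
        fclose(f);
    }
    return failures;
}

/*
 * Fork `nproc` children that all read `path` at the same time.
 * Returns the number of children that saw a failure, or -1 if
 * not all children could be started.
 */
static int stress_readers(const char *path, int nproc, int iterations)
{
    int started = 0;
    int bad = 0;

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(read_repeatedly(path, iterations) ? 1 : 0);
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            bad++;
    }
    return (started == nproc) ? bad : -1;
}
```

The return value only reports read failures in the children; the ACPI errors themselves would still have to be checked in dmesg or /var/log/warn after the run.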
Comment 22 Robert Moore 2006-07-10 14:24:15 UTC
It is possible that the AE_ALREADY_EXISTS problem is fixed in ACPICA version 
20060707, since this is a serialized method that creates namespace objects.

From the release note:
Fixed a problem with Serialized control methods where the semaphore associated 
with the method could be over-signaled after multiple method invocations.
Comment 23 Alexey Starikovskiy 2006-07-12 07:01:21 UTC
Created attachment 8538 [details]
Patch to move deletion of methods' namespace under its mutex

Here is a patch against 2.6.18-rc1 to move deletion of a method's namespace
under that method's mutex. No more errors in dmesg on nx6125.
Comment 24 Alexey Starikovskiy 2006-07-12 11:20:00 UTC
Problem still exists with the last patch.
Comment 25 Robert Moore 2006-07-12 15:08:05 UTC
RE: Patch in comment #23

The interpreter is locked during the execution of TerminateMethod, therefore 
another thread cannot begin execution of a method. I believe that this means 
that the signal/deletion order should not matter.
Comment 26 Robert Moore 2006-09-28 15:10:06 UTC
Jul  7 21:57:14 linux kernel: ACPI Error (dsfield-0441): [C0A4] Namespace 
lookup failure, AE_ALREADY_EXISTS
Jul  7 21:57:14 linux kernel: ACPI Error (dsfield-0441): [C0A5] Namespace 
lookup failure, AE_ALREADY_EXISTS
Jul  7 21:57:14 linux kernel: ACPI Error (dsfield-0441): [C0A6] Namespace 
lookup failure, AE_ALREADY_EXISTS
Jul  7 21:57:14 linux kernel: ACPI Error (dsfield-0441): [C0A7] Namespace 
lookup failure, AE_ALREADY_EXISTS
Jul  7 21:57:14 linux kernel: ACPI Error (dsfield-0441): [C0A8] Namespace 
lookup failure, AE_ALREADY_EXISTS
Jul  7 21:57:14 linux kernel: ACPI Error (dsfield-0441): [C0A9] Namespace 
lookup failure, AE_ALREADY_EXISTS
Jul  7 21:57:14 linux kernel: ACPI Error (dsfield-0441): [C0AA] Namespace 
lookup failure, AE_ALREADY_EXISTS
Jul  7 21:57:14 linux kernel: ACPI Error (psargs-0355): [C0A7] Namespace 
lookup failure, AE_NOT_FOUND

If these do not go away, try the acpi_serialize option.
Comment 27 Natalie Protasevich 2007-10-22 22:49:26 UTC
Any update on this problem, please?
Thanks.
Comment 28 Norbert Preining 2007-10-22 22:57:35 UTC
As I said already, I am for closing this bug, since it has not occurred for quite some time.
Comment 29 Natalie Protasevich 2007-10-22 23:04:06 UTC
Len, Alexey, would you agree we should close this bug?