Subject : 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
Submitter : Jiri Slaby <firstname.lastname@example.org>
Date : 2009-03-15 10:47
References : http://marc.info/?l=linux-kernel&m=123711408225013&w=4
Handled-By : Lin Ming <email@example.com>
This entry is being used for tracking a regression from 2.6.28. Please don't
close it until the problem is fixed in the mainline.
The debug patch:
Jiri, any update?
$ dmesg|grep -i acpi.debug
I don't know how to reproduce it. My feeling is that it usually happens after the machine has been turned off (in S4) for a few days.
ping Jiri,
the bug is still not reproduced, right?
(In reply to comment #3)
> ping Jiri,
> the bug is still not reproduced, right?
$ dmesg|grep -i 'ACPI Debug'
$ uname -a # just to eliminate "I forgot to boot the right kernel" issue...
Linux anemoi 2.6.29 #76 SMP PREEMPT Tue Mar 24 21:22:15 CET 2009 x86_64 x86_64 x86_64 GNU/Linux
$ strings /usr/src/bu/vmlinux|egrep 'ACPI Debug|Linux version'
Linux version 2.6.29 (xslaby@anemoi) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #76 SMP PREEMPT Tue Mar 24 21:22:15 CET 2009
<7>ACPI Debug: NULL field object
<7>ACPI Debug: %s
<7>ACPI Debug: field node path: %s
<7>ACPI Debug: The original object:
<7>ACPI Debug: The bad object:
I hope this is related:
No 'ACPI Debug' messages though...
Does not look related.
closing as un-reproducible. Please re-open if you see
anything like this in the future.
I'm not quite convinced it is unrelated. It has the very same symptoms: a boot lockup around ACPI that takes a few boot cycles to get rid of. From the picture, I cannot tell whether it is related or not. What led you to this conclusion?
The exceptions shown there do not usually appear in the kernel messages. Couldn't they mean that some memory was overwritten by garbage, as in the earlier pictures?
The photo shows an AE_ALREADY_EXISTS error. The interpreter is marking the method as 'serialized' in order to attempt to recover from a problem in the BIOS AML code. I don't see anything related to a memory/object overwrite issue in that particular snapshot.
Here is the recovery code:

    /* Check for possible multi-thread reentrancy problem */

    if ((Status == AE_ALREADY_EXISTS) &&
        (!WalkState->MethodDesc->Method.Mutex))
    {
        ACPI_INFO ((AE_INFO,
            "Marking method %4.4s as Serialized because of AE_ALREADY_EXISTS error",
            WalkState->MethodNode->Name.Ascii));

        /*
         * Method tried to create an object twice. The probable cause is
         * that the method cannot handle reentrancy.
         *
         * The method is marked NotSerialized, but it tried to create
         * a named object, causing the second thread entrance to fail.
         * Workaround this problem by marking the method permanently
         * as Serialized.
         */
        WalkState->MethodDesc->Method.MethodFlags |= AML_METHOD_SERIALIZED;
        WalkState->MethodDesc->Method.SyncLevel = 0;
    }
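
To make the reentrancy scenario in that comment concrete, here is a minimal single-threaded sketch of it. It is not ACPICA code: `sim_method`, `sim_enter`, `sim_exit`, the `SIM_*` names and the flag value are all hypothetical stand-ins, and overlapping "threads" are modeled simply as nested calls. The only behavior taken from the quoted code is that an AE_ALREADY_EXISTS-style failure in a not-yet-serialized method causes the method to be marked serialized from then on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real ACPICA definitions. */
enum sim_status { SIM_OK, SIM_ALREADY_EXISTS, SIM_WOULD_BLOCK };

#define SIM_METHOD_SERIALIZED 0x08u   /* illustrative flag bit */

struct sim_method {
    unsigned flags;          /* SIM_METHOD_SERIALIZED or 0 */
    int      active_threads; /* invocations currently executing */
    bool     object_exists;  /* named object created by the method body */
};

/* Enter the method and let its body create the named object. */
static enum sim_status sim_enter(struct sim_method *m)
{
    /* A serialized method admits only one invocation at a time. */
    if ((m->flags & SIM_METHOD_SERIALIZED) && m->active_threads > 0)
        return SIM_WOULD_BLOCK;

    m->active_threads++;
    if (m->object_exists) {
        /* Second overlapping invocation: the name is already present. */
        m->active_threads--;
        /* Recovery modeled on the quoted code: serialize from now on. */
        m->flags |= SIM_METHOD_SERIALIZED;
        return SIM_ALREADY_EXISTS;
    }
    m->object_exists = true;
    return SIM_OK;
}

/* Leave the method; the object it created is removed again. */
static void sim_exit(struct sim_method *m)
{
    assert(m->active_threads > 0);
    m->active_threads--;
    m->object_exists = false;
}
```

In this model, two overlapping calls to `sim_enter` on a fresh method yield `SIM_OK` and then `SIM_ALREADY_EXISTS`; once the flag is set, a later overlapping call is refused with `SIM_WOULD_BLOCK` instead of failing on the duplicate name.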
(In reply to comment #9)
> I don't see anything related to a memory/object overwrite issue
> in that particular snapshot.
Couldn't this really be caused by some ACPI internal structures being overwritten?
I'll add a long delay to the second `if' from the patch to see whether it fires (and prints something that I can't see) and whether the scenario from the last photo occurs afterwards.
Also, I don't have sufficient permissions to reopen this entry. Could somebody do that?
(In reply to comment #10)
> Couldn't this really be caused by some ACPI internal structures being
> overwritten?
All I can say is that from the snapshot, it looks like the interpreter is executing normally.
(In reply to comment #12)
> it looks like the interpreter is executing normally.
Yes, I agree it behaves as expected.
Should I see those messages every time I boot/resume? I see them only when the lockup happens. When I turn the machine off and on again (and it resumes successfully), there are no such messages in dmesg. They are not in the boot log either when I boot a fresh system instead of resuming.
Created attachment 20891
I have not been able to reproduce the issue anymore. I am now running 2.6.30-rc5 with one method serialized in the DSDT.