Bug 12908

Summary: acpi_ex_extract_from_field -- div by zero
Product: ACPI
Reporter: Rafael J. Wysocki (rjw)
Component: ACPICA-Core
Assignee: Lin Ming (ming.m.lin)
Status: CLOSED UNREPRODUCIBLE
Severity: high
CC: acpi-bugzilla, jirislaby, lenb, Robert.Moore, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.29-rc8
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 12398
Attachments: acpidump

Description Rafael J. Wysocki 2009-03-21 10:57:36 UTC
Subject    : 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
Submitter  : Jiri Slaby <jirislaby@gmail.com>
Date       : 2009-03-15 10:47
References : http://marc.info/?l=linux-kernel&m=123711408225013&w=4
Handled-By : Lin Ming <ming.m.lin@intel.com>

This entry is being used for tracking a regression from 2.6.28.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Lin Ming 2009-03-23 18:59:50 UTC
The debug patch:
http://lkml.org/lkml/2009/3/20/9

Jiri, any update?
Comment 2 Jiri Slaby 2009-03-24 02:30:02 UTC
No :(:
$ dmesg|grep -i acpi.debug
$
I don't know how to reproduce it; my feeling is that it usually happens after a few days of being turned off (in S4).
Comment 3 Zhang Rui 2009-03-30 05:52:43 UTC
ping Jiri,
the bug is still not reproduced, right?
Comment 4 Jiri Slaby 2009-03-30 11:40:35 UTC
(In reply to comment #3)
> ping Jiri,
> the bug is still not reproduced, right?

Correct:
$ dmesg|grep -i 'ACPI Debug'
$ uname -a     # just to eliminate "I forgot to boot the right kernel" issue...
Linux anemoi 2.6.29 #76 SMP PREEMPT Tue Mar 24 21:22:15 CET 2009 x86_64 x86_64 x86_64 GNU/Linux
$ strings /usr/src/bu/vmlinux|egrep 'ACPI Debug|Linux version'
Linux version 2.6.29 (xslaby@anemoi) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #76 SMP PREEMPT Tue Mar 24 21:22:15 CET 2009
<7>ACPI Debug: NULL field object
<7>ACPI Debug: %s
<7>ACPI Debug: field node path: %s
<7>ACPI Debug: The original object:
<7>ACPI Debug: The bad object:
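
For readers wondering how a field extraction can divide by zero at all, here is a minimal sketch under stated assumptions. It is not the debug patch and not a diagnosis of this bug. It assumes the number of access-width units is computed by rounding the field's bit length up to a multiple of the access width, so a field object whose access width has become zero (for example through an overwrite) would trap on the division. The type model_field and both functions are hypothetical and do not match the ACPICA object layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor; NOT the ACPICA object layout. */
typedef struct {
    uint32_t bit_length;        /* field size in bits */
    uint32_t access_bit_width;  /* 8, 16, 32 or 64 for a sane field */
} model_field;

/* Round value up to a whole number of boundary-sized units.
 * The caller must guarantee boundary != 0. */
static uint32_t round_up_to(uint32_t value, uint32_t boundary)
{
    assert(boundary != 0);
    return (value + boundary - 1) / boundary;
}

/* Compute the number of access-width units needed to cover the field.
 * Returns false instead of dividing when the descriptor looks damaged. */
static bool model_datum_count(const model_field *field, uint32_t *count)
{
    if (field == NULL || count == NULL || field->access_bit_width == 0) {
        return false;   /* this is the case that would otherwise trap */
    }
    *count = round_up_to(field->bit_length, field->access_bit_width);
    return true;
}
```

With a 20-bit field and 8-bit access the count is 3; with an access width of 0 the function reports failure instead of faulting, which is the point where a debug message about a bad object could be emitted.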
Comment 5 Jiri Slaby 2009-04-06 19:53:35 UTC
I hope this is related:
http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops2.png
?

No 'ACPI Debug' messages though...
Comment 6 Robert Moore 2009-04-06 20:44:13 UTC
Does not look related.
Comment 7 Len Brown 2009-04-07 02:02:40 UTC
Closing as unreproducible.  Please re-open if you see
anything like this in the future.
Comment 8 Jiri Slaby 2009-04-07 10:57:26 UTC
I'm not quite convinced it is unrelated; it has the very same symptoms -- boot lockup around ACPI, and a few boot cycles needed to get rid of it. From the picture, I cannot tell whether it is related or not. What led you to this conclusion?

The exceptions present there do not usually appear in the kernel messages. Couldn't they mean some memory was overwritten by garbage, as in the earlier pictures?
Comment 9 Robert Moore 2009-04-07 16:05:55 UTC
The photo shows an AE_ALREADY_EXISTS error. The interpreter is marking the method as 'serialized' in order to attempt to recover from a problem in the BIOS AML code. I don't see anything related to a memory/object overwrite issue in that particular snapshot.

Here is the recovery code:

/* Check for possible multi-thread reentrancy problem */

if ((Status == AE_ALREADY_EXISTS) &&
    (!WalkState->MethodDesc->Method.Mutex))
{
    ACPI_INFO ((AE_INFO,
        "Marking method %4.4s as Serialized because of AE_ALREADY_EXISTS error",
        WalkState->MethodNode->Name.Ascii));

    /*
     * Method tried to create an object twice. The probable cause is
     * that the method cannot handle reentrancy.
     *
     * The method is marked NotSerialized, but it tried to create
     * a named object, causing the second thread entrance to fail.
     * Workaround this problem by marking the method permanently
     * as Serialized.
     */
    WalkState->MethodDesc->Method.MethodFlags |= AML_METHOD_SERIALIZED;
    WalkState->MethodDesc->Method.SyncLevel = 0;
}
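
The effect of this recovery can be illustrated with a small single-threaded model. This is only a sketch under assumptions: model_method, model_status and the three functions are hypothetical stand-ins, not ACPICA types, and a plain flag replaces the method mutex that the real check looks at. A second entry into a busy, non-serialized method fails because the named object already exists; the method is then marked serialized, so later overlapping entries wait instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes; NOT the ACPICA ones. */
typedef enum { MODEL_OK, MODEL_ALREADY_EXISTS, MODEL_MUST_WAIT } model_status;

/* Hypothetical control method that creates one named object on entry. */
typedef struct {
    char name[5];        /* 4-character ACPI-style name plus NUL */
    bool serialized;     /* models the Serialized property */
    bool busy;           /* an invocation is currently inside the method */
    bool object_exists;  /* the named object created by the method */
} model_method;

/* Try to enter the method and create its named object. */
static model_status model_enter(model_method *m)
{
    assert(m != NULL);
    if (m->serialized && m->busy) {
        return MODEL_MUST_WAIT;       /* a real interpreter would block here */
    }
    if (m->object_exists) {
        return MODEL_ALREADY_EXISTS;  /* reentrant creation of the same name */
    }
    m->object_exists = true;
    m->busy = true;
    return MODEL_OK;
}

/* Leave the method; objects it created are deleted again. */
static void model_exit(model_method *m)
{
    assert(m != NULL && m->busy);
    m->busy = false;
    m->object_exists = false;
}

/* Enter the method and apply the recovery step on a reentrancy failure. */
static model_status model_invoke(model_method *m)
{
    model_status status = model_enter(m);
    if (status == MODEL_ALREADY_EXISTS && !m->serialized) {
        printf("Marking method %4.4s as Serialized\n", m->name);
        m->serialized = true;         /* permanent for the life of the method */
    }
    return status;
}
```

In this model the first overlapping call still fails; only subsequent calls benefit from the serialization, which matches the quoted comment's description of the change as a workaround.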
Comment 10 Jiri Slaby 2009-04-08 19:45:29 UTC
(In reply to comment #9)
> I don't see anything related to a memory/object overwrite issue
> in that particular snapshot.

Couldn't this really be caused by some ACPI internal structures being overwritten?

I'll add a long delay into the second `if' from the patch to see whether it fires (and writes something that I can't see) and whether the scenario from the last photo occurs afterwards.
Comment 11 Jiri Slaby 2009-04-08 19:46:54 UTC
Also, I don't have sufficient permissions to reopen the entry. Could anybody do that?
Comment 12 Robert Moore 2009-04-08 20:04:52 UTC
(In reply to comment #10)
> Couldn't this really be caused by some ACPI internal structures being
> overwritten?

All I can say is that from the snapshot, it looks like the interpreter is executing normally.
Comment 13 Jiri Slaby 2009-04-08 20:09:59 UTC
(In reply to comment #12)
> it looks like the interpreter is executing normally.

Yes, I agree it behaves as expected.

Should I see those messages every time I boot/resume? I see them only when the lockup happens. When I turn the machine off and on again (and resume successfully), no such messages are in dmesg. Nor are they in the boot log when booting a fresh system instead of resuming.
Comment 14 Jiri Slaby 2009-04-08 20:32:43 UTC
Created attachment 20891
acpidump
Comment 15 Jiri Slaby 2009-05-11 22:13:42 UTC
I have not been able to reproduce the issue since. I am now running 2.6.30-rc5 with one method serialized in the DSDT.