Bug 31872

Summary: boot panic unless acpi=off, Thread overran stack, or stack corrupted - Toshiba Satellite/mobile P4
Product: ACPI
Reporter: Pascal Dormeau (pdormeau)
Component: ACPICA-Core
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED CODE_FIX
Severity: normal
CC: florian, lenb, maciej.rutecki, rjw
Priority: P1
Hardware: i386
OS: Linux
Kernel Version: 2.6.38
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 27352
Attachments:
lspci -vvv
/proc/cpuinfo
picture of boot message
picture of boot message
full boot sequence
config of failing kernel
2.6.38 dmesg with bf325f9538d8c89312be305b9779edbcb436af00 reverted
output of acpidump
Debug early registration of power resources
ACPI: Avoid infinite recurrence in registering power resources
ACPI: Avoid infinite recurrence in registering power resources (v2)

Description Pascal Dormeau 2011-03-25 20:17:12 UTC
Created attachment 51972 [details]
lspci -vvv

My laptop can no longer boot with the latest 2.6.38 kernel (also confirmed
with 2.6.38-rc6, -rc7 and -rc8) when ACPI support is enabled. Booting with acpi=off lets the boot sequence complete, but many features are lost. A kernel panic occurs early during boot.

Older kernels, up to and including 2.6.37.2, did not trigger this bug on this laptop.

The failing kernel is the official Debian kernel 2.6.38-1-686.
Comment 1 Pascal Dormeau 2011-03-25 20:26:13 UTC
Created attachment 51982 [details]
/proc/cpuinfo
Comment 2 Pascal Dormeau 2011-03-25 20:29:20 UTC
Created attachment 51992 [details]
picture of boot message
Comment 3 Pascal Dormeau 2011-03-25 20:30:45 UTC
Created attachment 52002 [details]
picture of boot message
Comment 4 Pascal Dormeau 2011-03-25 20:36:15 UTC
Created attachment 52012 [details]
full boot sequence
Comment 5 Pascal Dormeau 2011-03-25 20:54:56 UTC
I captured the boot messages up to the crash with a camera, using the boot_delay option. The relevant messages may be those in the two attached pictures (I am not sure). I also linked to a tarball with pictures of the whole boot sequence (just wget http://dl.free.fr/nbej8o6bE should do it). I stopped capturing boot messages once they seemed to repeat endlessly, but I can provide more if needed.

Please ask me if you need more information.

Regards

Pascal Dormeau
Comment 6 Len Brown 2011-03-29 01:40:15 UTC
Please confirm that this fails with unmodified kernel.org 2.6.38,
and that it does not fail with the kernel.org 2.6.37.stable (now 2.6.37.6)

Can you bisect which change between 2.6.37 and 2.6.38-rc6 causes
the failure, or at least try the rc's, such as -rc1?

Please attach the .config for the failing kernel,
in the hope that it can be reproduced on an additional machine.
Comment 7 Pascal Dormeau 2011-03-29 18:30:04 UTC
Created attachment 52512 [details]
config of failing kernel
Comment 8 Pascal Dormeau 2011-03-29 18:31:32 UTC
Thanks,

I will do both (confirm which unmodified kernel.org version fails, and bisect) and report back when done.

In the meantime, please find attached the config of the failing kernel (sorry for forgetting this one).

Regards
Comment 9 Rafael J. Wysocki 2011-03-30 22:47:10 UTC
FWIW, I seriously doubt this is an ACPICA problem.  It rather looks like this
is related to interrupts (I/O APIC or LAPIC issue perhaps).
Comment 10 Pascal Dormeau 2011-04-01 20:51:48 UTC
Created attachment 53002 [details]
2.6.38 dmesg with bf325f9538d8c89312be305b9779edbcb436af00 reverted

Hello,

I tested kernels from kernel.org:

v2.6.37.6   -> OK
v2.6.38.1   -> failed
v2.6.38-rc1 -> failed

With git bisect I isolated the commit that causes the crash:

commit bf325f9538d8c89312be305b9779edbcb436af00
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Thu Nov 25 00:10:44 2010 +0100

    ACPI / PM: Register power resource devices as soon as they are needed
    
    Depending on the organization of the ACPI namespace, power resource
    device objects may generally be scanned after the "regular" device
    objects that they are referred from through _PRn.  This, in turn, may
    cause acpi_bus_get_power_flags() to attempt to access them through
    acpi_bus_init_power() before they are registered (and initialized by
    acpi_power_driver).  [This is not a theoretical issue, it actually
    happens for one PnP device on my testbed HP nx6325.]
    
    To fix this problem, make acpi_bus_get_power_flags() attempt to
    register power resource devices as soon as they have been found in
    the _PRn output for any other devices.
    
    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Signed-off-by: Len Brown <len.brown@intel.com>

I built and ran a 2.6.38 kernel with the above commit reverted. It boots with no problem (with ACPI support enabled).
The corresponding dmesg is attached.

Best regards

Pascal Dormeau
Comment 11 Rafael J. Wysocki 2011-04-17 20:27:10 UTC
Please attach the output of acpidump from your machine.
Comment 12 Pascal Dormeau 2011-04-18 18:30:38 UTC
Created attachment 54612 [details]
output of acpidump

Please find attached the output of the acpidump command.

Best regards,

Pascal Dormeau
Comment 13 Rafael J. Wysocki 2011-04-23 20:18:35 UTC
Created attachment 55262 [details]
Debug early registration of power resources

Please apply think patch and see if the crash happens.  If it doesn't, please
attach the dmesg output.
Comment 14 Rafael J. Wysocki 2011-04-23 20:19:28 UTC
Sorry, the "think" above should be "this".
Comment 15 Rafael J. Wysocki 2011-04-23 21:08:06 UTC
Created attachment 55272 [details]
ACPI: Avoid infinite recurrence in registering power resources

Well, I think I know what the problem is.

In your DSDT the _PR0 object of power resource PUT2 points back to this
power resource.  In consequence, while registering PUT2
acpi_bus_get_power_flags() sees that it depends on PUT2 and tries to
register it again, which leads to an infinitely deep recurrence.

The attached patch should work around this issue.  If it does, please
disregard the two previous comments.
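
For readers following along, the failure mode described above can be sketched in plain C as a minimal, user-space illustration. Everything here is hypothetical: struct node, register_node() and the registering flag are made-up names, not kernel types or functions, and this is not the attached patch, which may take a different approach. The sketch only shows one generic way to stop a dependency walk from re-entering an object that is already being registered.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical stand-in for an ACPI namespace object; not a kernel type. */
struct node {
    const char *name;
    struct node *pr0[MAX_DEPS]; /* power resources listed in _PR0 */
    size_t pr0_count;
    int registering;            /* nonzero while registration is in progress */
    int registered;             /* nonzero once registration has finished */
    int register_calls;         /* how many times registration really ran */
};

/*
 * Register a node after first registering every power resource it
 * depends on. A dependency that is already being registered (for
 * example a _PR0 entry that points back to its own power resource)
 * is skipped instead of being entered again.
 * Returns the number of back references that were skipped.
 */
int register_node(struct node *n)
{
    int skipped = 0;
    size_t i;

    assert(n != NULL);
    if (n->registered)
        return 0;

    n->registering = 1;
    n->register_calls++;

    for (i = 0; i < n->pr0_count; i++) {
        struct node *dep = n->pr0[i];

        if (dep->registering) {
            /* Without this check the walk would never terminate. */
            skipped++;
            continue;
        }
        skipped += register_node(dep);
    }

    n->registering = 0;
    n->registered = 1;
    return skipped;
}
```

With a node named PUT2 whose pr0 list contains a pointer to itself, register_node() runs once for PUT2, counts one skipped back reference, and returns instead of recursing until the stack overflows.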
Comment 16 Pascal Dormeau 2011-04-25 06:17:54 UTC
Hello,

The problem is fixed with the patch applied. Thanks a lot.

Should I understand that the DSDT is buggy here?
(I really have no understanding of the ACPI spec.)
In that case, should I remove PUT2 from the
Name (_PR0, Package (0x01)
{
    PUT2
}
stanza?

Best regards

Pascal Dormeau
Comment 17 Rafael J. Wysocki 2011-04-25 08:59:27 UTC
Created attachment 55342 [details]
ACPI: Avoid infinite recurrence in registering power resources (v2)

Well, it shouldn't be there, but I bet your BIOS is not the only one with
a problem of this kind, so we should add a safeguard against that.

Please check if the attached patch helps too.
Comment 18 Pascal Dormeau 2011-04-25 17:42:34 UTC
Hello,

The acpi-power-resources-fix.patch v2 also helps. Thanks.

Note that I tested v2 alone (not v1+v2).

Best regards,

Pascal Dormeau
Comment 19 Rafael J. Wysocki 2011-04-25 19:25:04 UTC
(In reply to comment #18)
> Hello,
> 
> The acpi-power-resources-fix.patch v2 also helps. Thanks.
> 
> Note that I tested v2 alone (not v1+v2).

That was as intended. :-)

Thanks for testing, I'll submit the patch for merging shortly.