Bug 58201

Summary: Toshiba P870-303: regression: panic on boot
Product: ACPI
Reporter: jerome cantenot (jerome.cantenot)
Component: Config-Other
Assignee: Lan Tianyu (tianyu.lan)
Status: CLOSED CODE_FIX
Severity: normal
CC: mnowak, rjw, tianyu.lan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.9
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: journalctl log when i boot with acpi=off
config file to build the kernel (taken from arch linux)
output of /proc/cpuinfo
output of dmidecode
output of lspci -vvv as root
output of proc/module
output of acpidump
output of git bisect
journalctl log with patch described in comment 18
log kernel panic with patch of the comment #21
DSDT.hex
output of acpidump with bios v6.30
debug.patch
journalctl log with patch described in comment 30
debug.patch
systemd log with updated bios and patch from comment 32
ACPI / PM: Do not execute _PS0 for devices without _PSC during initialization
ACPI / PM: Try to run _PS0 for devices without _PSC after all
log with patch of the comment #44

Description jerome cantenot 2013-05-15 11:20:19 UTC
Created attachment 101621 [details]
journalctl log when i boot with acpi=off

Kernel panic on boot with the kernel 3.9.0, 3.9.1 or 3.9.2.

There is no problem with the kernel 3.8.13.
I can boot if I add the parameter acpi=off, but in that case I get the following error -> modprobe: ERROR: could not insert 'i915': No such device

I have a Toshiba P870-303 laptop with hybrid graphics.

I do not know which component should be selected, so I chose "config-other".


[0.302946] [<ffffffff812dc8c8>] ? acpi_add_single_object+0x36e/0x36e
[0.303010] [<ffffffff812f595e>]  acpi_walk_namespace+0x95/0xc5
[0.303074] [<ffffffff812dc9e4>] acpi_bus_scan+0x4d/0x9d
[0.303140] [<ffffffff819068f3>] acpi_scan_init+0x61/0x15f
[0.303203] [<ffffffff81270d38>] ? ida_get_new_above+0x218/0x290
[0.303267] [<ffffffff81906707>] acpi_init+0x25d/0x2a6
[0.303330] [<ffffffff819064aa>] ? acpi_sleep_proc_init+0x2a/0x2a
[0.303396] [<ffffffff8100210a>] do_one_initcall+0x100/0x160
[0.303461] [<ffffffff818d5037>] kernel_init_freeable+0x15b/0x1dc
[0.303526] [<ffffffff818d4881>] ? do_early_param+0x88/0x88
[0.303591] [<ffffffff814b3bb0>] ? rest_init+0x90/0x90
[0.303653] [<ffffffff814b3bbe>] kernel_init+0xe/0x190
[0.303716] [<ffffffff814d96ac>] ret_from_fork+0x7c/0x60
[0.303779] [<ffffffff814b3bb0>] ? rest_init+0x90/0x90
[0.303840] code: 8d a7 28 fe ff ff 53 48 89 fb 48 c7 c7 10 af 87 81 e8 4d e2 1e 00 48 8b 93 f8 02 00 48 8b 83 00 03 00 00 48 c7 c7 10 af 87 81 <48> 89 42 08 48 89 10 48 b8 00 01 1 00 00 00 ad de 48 89 83 f8
[0.306531] RIP [<ffffffff8801c60b5c88>] acpi_release_power_resource+0x37/0x7a
[0.306636] RSP <ffff8801c60b5c88>
[0.306694] CR2: 0000000000000008
[0.306760] ---[ end trace 18770136fafb3923 ]---
[0.306828] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
Comment 1 jerome cantenot 2013-05-15 11:20:54 UTC
Created attachment 101631 [details]
config file to build the kernel (taken from arch linux)
Comment 2 jerome cantenot 2013-05-15 11:21:23 UTC
Created attachment 101641 [details]
output of /proc/cpuinfo
Comment 3 jerome cantenot 2013-05-15 11:21:53 UTC
Created attachment 101651 [details]
output of dmidecode
Comment 4 jerome cantenot 2013-05-15 11:22:25 UTC
Created attachment 101661 [details]
output of lspci -vvv as root
Comment 5 jerome cantenot 2013-05-15 11:23:15 UTC
Created attachment 101671 [details]
output of proc/module
Comment 6 Lan Tianyu 2013-05-15 12:19:24 UTC
Could you do a bisect between 3.8.13 and 3.9.0 to find which commit causes this regression?
Comment 7 Lan Tianyu 2013-05-15 12:44:40 UTC
Please provide the output of acpidump.
Comment 8 jerome cantenot 2013-05-15 13:38:33 UTC
Created attachment 101701 [details]
output of acpidump
Comment 9 jerome cantenot 2013-05-15 13:40:34 UTC
I will try to do the bisect. Can you confirm that a "bisect" is the process described in "http://wiki.gentoo.org/wiki/Kernel_git-bisect" ?
Comment 10 Lan Tianyu 2013-05-15 13:50:01 UTC
Yes. That should work.
BTW, could you provide more of the panic log? There may be more clues in it to find the cause.
Comment 11 jerome cantenot 2013-05-15 14:02:27 UTC
When the panic occurs, I cannot scroll, and I found nothing in the log. I wrote down the text by hand, but if you know a way to see the beginning of the output, I can send more of the log.
Comment 12 jerome cantenot 2013-05-16 15:17:47 UTC
Created attachment 101751 [details]
output of git bisect
Comment 13 jerome cantenot 2013-05-16 15:19:29 UTC
I finished the bisect.

the error is with the commit
ACPI / PM: Fix acpi_bus_get_device() check in drivers/acpi/device_pm.c
b3785492268f9f3cdaa9722facb84b266dcf8bf6 is the first bad commit
commit b3785492268f9f3cdaa9722facb84b266dcf8bf6
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Fri Feb 1 23:43:02 2013 +0100

    ACPI / PM: Do not power manage devices in unknown initial states
    
    In general, for ACPI device power management to work, the initial
    power states of devices must be known (otherwise, we wouldn't be able
    to keep track of power resources, for example).  Hence, if it is
    impossible to determine the initial ACPI power states of some
    devices, they can't be regarded as power-manageable using ACPI.
    
    For this reason, modify acpi_bus_get_power_flags() to clear the
    power_manageable flag if acpi_bus_init_power() fails and add some
    extra fallback code to acpi_bus_init_power() to cover broken
    BIOSes that provide _PS0/_PS3 without _PSC for some devices.
    
    Verified to work on my HP nx6325 that has this problem.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Peter Wu <lekensteyn@gmail.com>
Comment 14 Lan Tianyu 2013-05-17 05:26:15 UTC
Thanks for the bisect. Please try the following patch.

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 5deb9bd..8adc983 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1186,7 +1186,7 @@ static void acpi_bus_get_power_flags(struct acpi_device *device)
                device->power.states[ACPI_STATE_D3_COLD].flags.os_accessible = 1;
 
        if (acpi_bus_init_power(device)) {
-               acpi_free_power_resources_lists(device);
+               //acpi_free_power_resources_lists(device);
                device->flags.power_manageable = 0;
        }
 }
Comment 15 jerome cantenot 2013-05-17 07:29:41 UTC
There is no change with this patch. I have the same kernel panic on boot.
Comment 16 Lan Tianyu 2013-05-17 08:40:58 UTC
OK. How about this patch? I currently have no idea about the cause, since we cannot see the whole log, so this just comments out some of the code added by the bad commit.

diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index dd314ef..1776663 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -332,10 +332,10 @@ int acpi_bus_init_power(struct acpi_device *device)
                        return result;
        } else if (state == ACPI_STATE_UNKNOWN) {
                /* No power resources and missing _PSC? Try to force D0. */
-               state = ACPI_STATE_D0;
-               result = acpi_dev_pm_explicit_set(device, state);
-               if (result)
-                       return result;
+//             state = ACPI_STATE_D0;
+//             result = acpi_dev_pm_explicit_set(device, state);
+//             if (result)
+//                     return result;
        }
        device->power.state = state;
        return 0;
Comment 17 jerome cantenot 2013-05-17 10:10:59 UTC
I can boot without problem with this patch.
Comment 18 Lan Tianyu 2013-05-18 14:19:00 UTC
Thanks for testing. Please try changing the boot resolution to capture more of the panic log.
Follow the "Update Grub2 configuration" section of this link.
http://wiki.sabayon.org/index.php?title=HOWTO:_Using_Custom_Framebuffer_Resolution_with_GRUB2

Let's narrow it down further. Please try the following patch.
diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index dd314ef..7a140c7 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -332,7 +332,7 @@ int acpi_bus_init_power(struct acpi_device *device)
                        return result;
        } else if (state == ACPI_STATE_UNKNOWN) {
                /* No power resources and missing _PSC? Try to force D0. */
-               state = ACPI_STATE_D0;
+//             state = ACPI_STATE_D0;
                result = acpi_dev_pm_explicit_set(device, state);
                if (result)
                        return result;

-----


This patch prints the device path at the place where the panic occurs
and then sleeps for 10 s so that you can capture the log before the panic.
diff --git a/drivers/acpi/power.c b/drivers/acpi/power.c
index 34f5ef1..ed1298c 100644
--- a/drivers/acpi/power.c
+++ b/drivers/acpi/power.c
@@ -44,6 +44,7 @@
 #include <linux/sysfs.h>
 #include <acpi/acpi_bus.h>
 #include <acpi/acpi_drivers.h>
+#include <linux/delay.h>
 #include "sleep.h"
 #include "internal.h"

@@ -818,6 +819,9 @@ static void acpi_release_power_resource(struct device *dev)
        struct acpi_device *device = to_acpi_device(dev);
        struct acpi_power_resource *resource;

+       acpi_handle_info(device->handle, "%s \n", __func__);
+       msleep(10000);
+
        resource = container_of(device, struct acpi_power_resource, device);

        mutex_lock(&power_resource_list_lock);
Comment 19 jerome cantenot 2013-05-19 09:21:33 UTC
Created attachment 101951 [details]
journalctl log with patch described in comment 18
Comment 20 jerome cantenot 2013-05-19 09:25:51 UTC
With the patch in comment 18, I can boot without any problem.

When I use a different resolution for GRUB, the screen size changes but then the screen stays blank. I tried all the values offered by GRUB with the option vga=ask. Therefore I cannot provide the whole log when the error occurs.
Comment 21 Lan Tianyu 2013-05-20 08:47:44 UTC
Hi, please try the following patch.

diff --git a/drivers/acpi/device_pm.c b/drivers/acpi/device_pm.c
index dd314ef..1ef070a 100644
--- a/drivers/acpi/device_pm.c
+++ b/drivers/acpi/device_pm.c
@@ -332,8 +332,8 @@ int acpi_bus_init_power(struct acpi_device *device)
                        return result;
        } else if (state == ACPI_STATE_UNKNOWN) {
                /* No power resources and missing _PSC? Try to force D0. */
-               state = ACPI_STATE_D0;
-               result = acpi_dev_pm_explicit_set(device, state);
+//             state = ACPI_STATE_D0;
+               result = acpi_dev_pm_explicit_set(device, ACPI_STATE_D0);
                if (result)
                        return result;
        }
Comment 22 jerome cantenot 2013-05-20 15:15:12 UTC
Created attachment 102091 [details]
log kernel panic with patch of the comment #21

I can not boot with the patch of the comment 21.

I have taken a photo of the log. After a sleep, I had the same panic as previously.
Comment 23 Lan Tianyu 2013-05-26 11:53:08 UTC
Created attachment 102591 [details]
DSDT.hex

Please override the DSDT table with the attachment and try again.

cd (kernel source)
cp DSDT.hex include/
make menuconfig

Make the following change.
CONFIG_ACPI_CUSTOM_DSDT_FILE="DSDT.hex"
CONFIG_ACPI_CUSTOM_DSDT=y

Compile and install kernel.
Comment 24 Lan Tianyu 2013-05-26 12:12:50 UTC
BTW, is there a newer BIOS available for this machine?
Comment 25 jerome cantenot 2013-05-26 18:02:38 UTC
I can boot without problem with the custom DSDT file.

There is a new BIOS, but Toshiba provides only an executable without any readme. I can try to update the BIOS if you think it is useful.
Comment 26 Lan Tianyu 2013-05-27 03:12:48 UTC
Ok. Please try.

I found a BIOS problem. PCI0's _PS0 and _PS3 call SPS0/SPS3(), but SPS0/SPS3() are not actually defined in any ACPI table. My DSDT table removes the code that calls SPS0/SPS3(). This issue did not happen before commit b3785492 because PCI0's _PS0 was never called.
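
The explanation above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct model_device`, `evaluate_ps0()`, `model_init_power()` and `model_get_power_flags()` are hypothetical names, and the model assumes that forcing D0 for a device without _PSC amounts to evaluating its _PS0, which fails when _PS0 calls a method that no table defines. It only tracks whether _PS0 gets evaluated and what the caller does with the result; it does not attempt to reproduce the panic itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified power states; the values are illustrative only. */
enum power_state { STATE_D0 = 0, STATE_UNKNOWN = 0xFF };

/* Hypothetical stand-in for struct acpi_device (not the kernel type). */
struct model_device {
    bool has_psc;           /* firmware provides _PSC for this device */
    bool ps0_is_broken;     /* _PS0 calls a method that is not defined */
    int ps0_calls;          /* how many times _PS0 was evaluated */
    bool power_manageable;  /* set by model_get_power_flags() */
    enum power_state state; /* state recorded by model_init_power() */
};

/* Models evaluating _PS0: it fails if it references an undefined method. */
static int evaluate_ps0(struct model_device *dev)
{
    dev->ps0_calls++;
    return dev->ps0_is_broken ? -1 : 0;
}

/*
 * Models the tail of acpi_bus_init_power() as quoted in the diffs:
 * with the fallback enabled, an unknown state is forced to D0 by
 * running _PS0. Assumption: a device with _PSC simply reports D0.
 */
static int model_init_power(struct model_device *dev, bool force_d0_fallback)
{
    enum power_state state;

    assert(dev != NULL);
    state = dev->has_psc ? STATE_D0 : STATE_UNKNOWN;

    if (state == STATE_UNKNOWN && force_d0_fallback) {
        int result;

        state = STATE_D0;
        result = evaluate_ps0(dev);
        if (result)
            return result;
    }
    dev->state = state;
    return 0;
}

/* Models the caller: a failed init clears the power_manageable flag. */
static void model_get_power_flags(struct model_device *dev,
                                  bool force_d0_fallback)
{
    dev->power_manageable = true;
    if (model_init_power(dev, force_d0_fallback))
        dev->power_manageable = false;
}
```

Calling `model_get_power_flags()` on a device with `has_psc = false` and `ps0_is_broken = true` leaves `ps0_calls` at 0 when the fallback is disabled (the behaviour before commit b3785492 in this model) and raises it to 1, with `power_manageable` cleared, when the fallback is enabled.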
Comment 27 jerome cantenot 2013-05-28 11:34:59 UTC
I had the same panic with the updated bios v6.30.

I contacted Toshiba, but they do not seem to care.

While I wait for a proper fix, which workaround is better:
  - compile the kernel with the custom DSDT table, or
  - compile the kernel with the patch from comment 18?

Thanks for your help.
Comment 28 Lan Tianyu 2013-05-28 11:46:40 UTC
Please provide the new acpidump.
Comment 29 jerome cantenot 2013-05-28 13:16:42 UTC
Created attachment 102721 [details]
output of acpidump with bios v6.30

the output of acpidump with the updated bios
Comment 30 Lan Tianyu 2013-05-30 05:55:31 UTC
Created attachment 102961 [details]
debug.patch

Hi, please try this debug patch. This is not the final solution either. I can now see the SPS0/SPS1() methods in SSDT2/SSDT7. (Sorry, my iasl version was a little old before and could not disassemble the SSDT tables on this machine.) I found that SPS0/SPS1() access the SMI I/O port, which is suspicious. So this debug.patch prevents access to the SMI I/O port.
Comment 31 jerome cantenot 2013-05-30 17:11:56 UTC
Created attachment 103021 [details]
journalctl log with patch described in comment 30

I can boot with the patch from comment 30. However, I got many errors like these (taken from the log):

ACPI: Unable to enable ACPI
usb 3-1: device not accepting address 3, error -110
Comment 32 Lan Tianyu 2013-05-31 02:46:25 UTC
Created attachment 103051 [details]
debug.patch

OK. Please try this patch, which only blocks access to the SMI I/O port when the issue takes place.
Comment 33 jerome cantenot 2013-05-31 18:53:15 UTC
Created attachment 103131 [details]
systemd log with updated bios and patch from comment 32

I can boot without any problem with the patch from comment 32.

I am sending the log because there are some ACPI warnings and errors:
ACPI Error: Null physical address for ACPI table [(null)] (20130117/tbutils-468)
ACPI Warning: BIOS XSDT has NULL entry, using RSDT (20130117/tbutils-682)
Comment 34 Rafael J. Wysocki 2013-06-04 23:03:15 UTC
Created attachment 103471 [details]
ACPI / PM: Do not execute _PS0 for devices without _PSC during initialization

I suppose you'll be able to boot with this patch too, then.  Can you please try it?
Comment 35 Rafael J. Wysocki 2013-06-04 23:04:34 UTC
The warning message is from the ACPI core, which doesn't like the XSDT table in your system and uses the RSDT one instead.  It indicates a BIOS bug.
Comment 36 jerome cantenot 2013-06-05 16:58:13 UTC
You were right. I can boot without any problem with the patch from the comment #34.
Comment 37 Rafael J. Wysocki 2013-06-05 19:34:36 UTC
Thanks for the confirmation!
Comment 38 Michal Nowak 2013-06-07 08:11:02 UTC
Thanks for the fix, much appreciated.

Is the next 3.9.x kernel targeted?
Comment 39 Rafael J. Wysocki 2013-06-07 10:38:02 UTC
Yes, it should be picked up by the -stable team after it's been merged into the mainline.
Comment 40 Rafael J. Wysocki 2013-06-08 01:03:47 UTC
Fixed by commit a086bdf (ACPI / PM: Do not execute _PS0 for devices without _PSC during initialization).
Comment 41 Rafael J. Wysocki 2013-06-17 19:20:30 UTC
I need to reopen this bug, because it turns out that commit a086bdf leads to a regression in 3.9.

Moreover, the Tianyu's analysis of the problem turns out to be incorrect and I should have double checked it.
Comment 42 Rafael J. Wysocki 2013-06-17 19:42:20 UTC
(In reply to comment #41)
> Moreover, the Tianyu's analysis of the problem turns out to be incorrect and
> I should have double checked it.

Well, it is correct.  However, the BIOS bug may trigger a bug in the kernel that causes the original crash to happen.
Comment 43 Rafael J. Wysocki 2013-06-17 19:52:12 UTC
Created attachment 105121 [details]
ACPI / PM: Try to run _PS0 for devices without _PSC after all

Jerome, can you please test this patch on 3.10-rc6 and report back?
Comment 44 Rafael J. Wysocki 2013-06-18 11:29:16 UTC
(In reply to comment #43)
> Created an attachment (id=105121) [details]
> ACPI / PM: Try to run _PS0 for devices without _PSC after all
> 
> Jerome, can you please test this patch on 3.10-rc6 and report back?

Well, I'm taking this back, please don't try this patch. :-)

We'll address the regression introduced by commit a086bdf in a way that's unrelated to this report.

So again: fixed by commit a086bdf (ACPI / PM: Do not execute _PS0 for devices without _PSC during initialization).
Comment 45 jerome cantenot 2013-06-18 11:51:56 UTC
Created attachment 105201 [details]
log with patch of the comment #44

It's too late; I had just finished testing the patch. I am sending the log in case it is useful. I can boot with the new patch, but there are many errors in the log. For example:

juin 18 13:41:14 toshiba kernel: ACPI Error: [IRG*] Namespace lookup failure, AE_NOT_FOUND (20130328/psargs-359)
juin 18 13:41:14 toshiba kernel: ACPI Error: Method parse/execution failed [\_SB_.PCI0.GLAN._PRW] (Node ffff8801c8035b90), AE_NOT_FOUND (20130328/psparse-537)
Comment 46 Rafael J. Wysocki 2013-06-18 12:49:56 UTC
Well, sorry about that, but thanks for the information. :-)

It pretty much confirms what I thought would happen: things break left and right.