Bug 54181 - Power management issues on a Toshiba Z930 laptop
Summary: Power management issues on a Toshiba Z930 laptop
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All Linux
Importance: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
Duplicates: 53011
Depends on:
Blocks:
 
Reported: 2013-02-21 04:32 UTC by Daniel Rowe
Modified: 2013-06-25 00:47 UTC
CC List: 5 users

See Also:
Kernel Version: kernel >= 3.3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg (64.91 KB, text/plain)
2013-02-21 04:32 UTC, Daniel Rowe
dmesg from pm testing. (204.77 KB, text/plain)
2013-02-24 12:35 UTC, Daniel Rowe
acpidump from Toshiba 935 kernel v3.8.4 (236.84 KB, application/octet-stream)
2013-03-27 15:29 UTC, Brint E. Kriebel
acpidump from the z930 (222.16 KB, application/octet-stream)
2013-03-29 01:54 UTC, Daniel Rowe
[RFC Patch 2/3] ACPICA: Hardware: Modify sleep (6.95 KB, application/x-gzip-compressed)
2013-04-25 06:29 UTC, Lv Zheng

Description Daniel Rowe 2013-02-21 04:32:45 UTC
Created attachment 93771
dmesg

The Toshiba Z930 fails to enter suspend and seems to have general power management issues.

I have tested on a number of distros with different kernels. This affects all distros that use kernel 3.3 or newer.

Any kernel older than 3.3 works as expected. Kernels 3.3 and newer fail, including test 3.8 kernels (using Fedora 18).

It seems able to enter suspend once and resume from it, but subsequently it cannot enter suspend again. Trying to suspend a second time just locks the machine up: the screen goes off, the fan goes on full, and it gets hot. A hard reset is required to get it back.

This laptop seems to run Linux well apart from the power management issues. Using a 3.2 kernel with the Sabayon distro, the laptop seems to work well.

The BIOS is in legacy mode, as I could not get the machine to boot at all (or install) with UEFI switched on.

I have fiddled around with BIOS settings to see if anything would help, to no avail.

Steps to Reproduce:
1. Suspend the laptop.
2. Wake the laptop.
3. Suspend the laptop again.
4. The laptop cannot enter suspend.
5. The machine locks up.
Comment 1 Aaron Lu 2013-02-21 05:31:25 UTC
Hi Daniel,

Thanks for your report!

Could you follow the kernel document Documentation/basic-pm-debugging.txt to see what the problem might be after a suspend/resume cycle?

Simply put, to test whether devices fail to suspend, you can do:
# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state

If there is any error, please attach the dmesg here, thanks.
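The pm_test procedure above can be scripted for repeated runs. This is a minimal Python sketch, not a tool from the kernel tree: `plan_pm_test` and `run_plan` are hypothetical helper names, and the level list is taken from Documentation/basic-pm-debugging.txt. By default it only prints the equivalent shell commands; actually performing the writes needs root, and with any level other than `none` the kernel stops at that level and resumes instead of really suspending.

```python
from __future__ import annotations

from pathlib import Path

# Levels listed in Documentation/basic-pm-debugging.txt; "none" disables test mode.
PM_TEST_LEVELS = ("none", "core", "processors", "platform", "devices", "freezer")
SYS_POWER = Path("/sys/power")


def plan_pm_test(level: str, cycles: int = 2) -> list[tuple[str, str]]:
    """Return the (sysfs file, value) writes for `cycles` suspends at `level`."""
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level!r}")
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    writes = [(str(SYS_POWER / "pm_test"), level)]
    writes += [(str(SYS_POWER / "state"), "mem")] * cycles
    return writes


def run_plan(writes: list[tuple[str, str]], dry_run: bool = True) -> None:
    """Print each write as a shell command; perform it only if dry_run is False."""
    for path, value in writes:
        print(f"echo {value} > {path}")
        if not dry_run:
            # Needs root; a write to /sys/power/state returns only after resume.
            Path(path).write_text(value)


if __name__ == "__main__":
    run_plan(plan_pm_test("devices", cycles=2))
```

Writing `none` to pm_test returns to real suspend, which is the mode in which the second-suspend hang in this report reproduces.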
Comment 2 Daniel Rowe 2013-02-24 12:34:24 UTC
Running all of the tests in the documentation seems to work.

This was run on 3.7.9-201.fc18.x86_64 on Fedora 18.

Anyway, attached is the dmesg after a couple of "echo mem > /sys/power/state".
Comment 3 Daniel Rowe 2013-02-24 12:35:59 UTC
Created attachment 94001
dmesg from pm testing.

Ran "echo core > /sys/power/pm_test" followed by a couple of "echo mem > /sys/power/state".
Comment 4 Aaron Lu 2013-02-25 01:43:49 UTC
Thanks for the test.

Could you boot into console mode with the following kernel parameters: nomodeset no_console_suspend
and then reproduce this problem to see what's blocking the second suspend?

It would also be better to test on the upstream kernel, not the distro one.
Comment 5 Aaron Lu 2013-03-07 03:13:14 UTC
Hi Daniel,

Any update on this?
Comment 6 Daniel Rowe 2013-03-07 13:02:44 UTC
Hi

I have booted to init=/bin/sh with the other params, and "echo mem > /sys/power/state" seems to work fine. I can't see any errors, and it seems able to run this test any number of times.
 
This was on the distro kernel, which is at 3.8.1 at the moment.

It still won't suspend the second time when booted normally.

I have downloaded and built a vanilla kernel and will report back with what I get.
Comment 7 Aaron Lu 2013-03-07 13:17:13 UTC
(In reply to comment #6)
> Hi
> I have booted to init=/bin/sh with the other params, and "echo mem > /sys/power/state" seems to work fine. I can't see any errors, and it seems able to run this test any number of times.

Thanks for the test.
Please try to boot with init=/bin/sh and no_console_suspend (don't add nomodeset this time). If S3 still works OK, then it may be a graphics driver issue.

> This was on the distro kernel, which is at 3.8.1 at the moment.
> It still won't suspend the second time when booted normally.
> I have downloaded and built a vanilla kernel and will report back with what I get.

Thanks.
Comment 8 Aaron Lu 2013-03-12 05:40:24 UTC
Hi Daniel,

Any update?
Comment 9 Daniel Rowe 2013-03-18 12:54:51 UTC
Hi

Sorry for the delay again.

I did a fair bit of playing around tonight.

If I don't do an 'echo <something> > /sys/power/pm_test' first, the issue occurs even from the console.

If I boot to init=/bin/sh and do an 'echo mem > /sys/power/state', the second suspend fails as it does in normal mode.

I have also tried with i915 RC6=0 to see if it makes a difference, and it does not.

It seems that as soon as I echo something to /sys/power/pm_test, the bug does not happen.

For example, if I 'echo devices > /sys/power/pm_test', then suspend works.

I have tested all the levels in basic-pm-debugging.txt, and all work once I echo the level to pm_test.

Also tested with nomodeset, and it makes no difference.
Comment 10 Daniel Rowe 2013-03-19 05:34:37 UTC
I noticed Toshiba had a new firmware out for this device.

I have just put the latest BIOS/firmware on this device (I had to install Windows to do it, grrrrr) and the problem is still the same.
Comment 11 Aaron Lu 2013-03-19 05:49:38 UTC
(In reply to comment #9)
> Hi
> 
> Sorry for the delay again.
> 
> I did a fair bit of playing around tonight.
> 
> If I don't do an 'echo <something> > /sys/power/pm_test' first, the issue occurs even from the console.
> 
> If I boot to init=/bin/sh and do an 'echo mem > /sys/power/state', the second suspend fails as it does in normal mode.

What's the error message?

> 
> I have also tried with i915 RC6=0 to see if it makes a difference, and it does not.
> 
> It seems that as soon as I echo something to /sys/power/pm_test, the bug does not happen.
> 
> For example, if I 'echo devices > /sys/power/pm_test', then suspend works.
> 
> I have tested all the levels in basic-pm-debugging.txt, and all work once I echo the level to pm_test.

Work once? So do you mean after a fresh boot:
# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state
works ok
# echo mem > /sys/power/state
failed this time
?
Comment 12 Daniel Rowe 2013-03-19 06:01:37 UTC
I mean:

# echo mem > /sys/power/state
Fails the second time after a fresh boot.

# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state
This does not fail at all. Seems to work correctly.

As for error messages, I cannot see any. The system locks up on the second suspend as described above, so errors are not visible. On the first suspend, which does work, I cannot see anything unusual in dmesg.

It seems that something gets corrupted after resuming from the first suspend, so the second one does not work, and this does not happen in test mode (i.e. after echoing something to the /sys/power/pm_test file).

Thanks for the help, and please let me know how I can help further.
Comment 13 Aaron Lu 2013-03-19 06:15:28 UTC
Can you do a git bisect to find the offending commit?
Comment 14 Daniel Rowe 2013-03-19 06:20:00 UTC
There is some information here, with details of which commit causes this:

b74f05d61b73af584d0c39121980171389ecfaaa
Comment 15 Daniel Rowe 2013-03-19 06:20:25 UTC
And the link:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1094800
Comment 16 Aaron Lu 2013-03-19 06:24:17 UTC
So this is also the commit that breaks your system?
Comment 17 Brint E. Kriebel 2013-03-24 00:11:58 UTC
I can confirm that this issue also exists on the Toshiba Portege Z935-ST4N04 with Arch Linux. As in the Ubuntu bug that Daniel referenced, stock kernel 3.3.8 works properly; v3.4 and above do not. Starting with v3.4-rc1, the system reboots when trying to resume from the first suspend. Starting with v3.4-rc3, the reboot issue no longer occurs, but the "second suspend" issue appears. I am testing more now, but I assume the commit from the Ubuntu bug is the one that switches the behavior from the reboot issue to the second-suspend issue.

The common ancestor of 3.3.8 and 3.4 (v3.3) also appears to have the second-suspend issue, so it may have been fixed between 3.3 and 3.3.8 before regressing again in 3.4. Once my bisection of v3.4-rc1 and v3.4-rc2 is complete, I will start bisecting 3.3 to 3.3.8.

Please let me know if there is any other information I can provide.
Comment 18 Brint E. Kriebel 2013-03-26 22:13:15 UTC
So, the true culprit appears to be:
commit 2feec47d4c5f80b05f1650f5a24865718978eea4
Author: Bob Moore <robert.moore@intel.com>
Date:   Tue Feb 14 15:00:53 2012 +0800

    ACPICA: ACPI 5: Support for new FADT SleepStatus, SleepControl registers
    
    Adds sleep and wake support for systems with these registers.
    One new file, hwxfsleep.c

When the new extended functions are used, the "second sleep freezes" issue occurs. If I force the system to use the legacy_sleep and legacy_wake functions, the machine properly resumes, even on 3.8.4.

I am investigating now if there is a way to fix the ACPI 5 sleep functions, or if there is just a way to force it to use the legacy functions without changing the source.
Comment 19 Aaron Lu 2013-03-27 05:02:38 UTC
Thanks Brint, it is very helpful.

Add the commit author Bob.

Hi Bob,

Can you please take a look? Commit 2feec47d4c5f80b05f1650f5a24865718978eea4 makes some Toshiba laptops fail to suspend a second time. Thanks.
Comment 20 Aaron Lu 2013-03-27 08:27:49 UTC
Hi Brint,

Please attach acpidump output:
# acpidump > acpidump.out

Thanks.
Comment 21 Brint E. Kriebel 2013-03-27 15:29:55 UTC
Created attachment 96401
acpidump from Toshiba 935 kernel v3.8.4

Comment 22 Robert Moore 2013-03-27 16:13:51 UTC
In the FADT, the "hardware reduced" bit is not set, but the extended sleep control and sleep status registers are populated:

                       Hardware Reduced (V5) : 0

[0F4h 0244  12]       Sleep Control Register : [Generic Address Structure]
[0F4h 0244   1]                     Space ID : 01 [SystemIO]
[0F5h 0245   1]                    Bit Width : 08
[0F6h 0246   1]                   Bit Offset : 00
[0F7h 0247   1]         Encoded Access Width : 03 [DWord Access:32]
[0F8h 0248   8]                      Address : 0000000000000405

[100h 0256  12]        Sleep Status Register : [Generic Address Structure]
[100h 0256   1]                     Space ID : 01 [SystemIO]
[101h 0257   1]                    Bit Width : 08
[102h 0258   1]                   Bit Offset : 00
[103h 0259   1]         Encoded Access Width : 03 [DWord Access:32]
[104h 0260   8]                      Address : 0000000000000401

The extended sleep control and the sleep status registers overlap the legacy PM1A registers:

[038h 0056   4]     PM1A Event Block Address : 00000400
[040h 0064   4]   PM1A Control Block Address : 00000404

I don't see anything really obviously wrong here, unless the use of the extended registers somehow confuses the machine.
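The overlap is at least consistent at the bit level. The Python check below assumes little-endian x86 port I/O and that the PM1a status register occupies the first two bytes of the PM1A event block. It uses the spec bit positions: SLP_TYPx in bits 10-12 and SLP_EN in bit 13 of PM1_CNT, WAK_STS in bit 15 of PM1_STS, versus SLP_TYPx in bits 2-4 and SLP_EN in bit 5 of the ACPI 5.0 Sleep Control register and WAK_STS in bit 7 of the Sleep Status register. The helper names are illustrative only.

```python
# Legacy PM1 register bits: SLP_TYPx = bits 10-12, SLP_EN = bit 13 of PM1_CNT;
# WAK_STS = bit 15 of PM1_STS.
PM1_SLP_TYP_SHIFT = 10
PM1_SLP_TYP_MASK = 0x7 << 10
PM1_SLP_EN = 1 << 13
PM1_WAK_STS = 1 << 15

# ACPI 5.0 Sleep Control / Sleep Status bits: SLP_TYPx = bits 2-4,
# SLP_EN = bit 5, WAK_STS = bit 7.
X_SLP_TYP_SHIFT = 2
X_SLP_TYP_MASK = 0x7 << 2
X_SLP_EN = 1 << 5
X_WAK_STS = 1 << 7

# Addresses from the Z935 FADT dump above.
PM1A_EVT_BLK, PM1A_CNT_BLK = 0x400, 0x404
SLEEP_STATUS, SLEEP_CONTROL = 0x401, 0x405


def high_byte(word16: int) -> int:
    """Byte at port+1 for a little-endian 16-bit register at port."""
    return (word16 >> 8) & 0xFF


def legacy_cnt_bits(slp_typ: int) -> int:
    """SLP_TYPx | SLP_EN as placed in the 16-bit PM1_CNT register."""
    return ((slp_typ << PM1_SLP_TYP_SHIFT) & PM1_SLP_TYP_MASK) | PM1_SLP_EN


def extended_ctl_bits(slp_typ: int) -> int:
    """SLP_TYPx | SLP_EN as placed in the 8-bit Sleep Control register."""
    return ((slp_typ << X_SLP_TYP_SHIFT) & X_SLP_TYP_MASK) | X_SLP_EN


def registers_alias() -> bool:
    """True if the extended registers are exactly the high bytes of PM1a CNT/STS."""
    same_ports = (SLEEP_CONTROL == PM1A_CNT_BLK + 1
                  and SLEEP_STATUS == PM1A_EVT_BLK + 1)
    same_ctl_bits = all(high_byte(legacy_cnt_bits(t)) == extended_ctl_bits(t)
                        for t in range(8))
    same_wak_bit = high_byte(PM1_WAK_STS) == X_WAK_STS
    return same_ports and same_ctl_bits and same_wak_bit


print(registers_alias())
```

It prints `True`: under these assumptions the extended registers are byte-level aliases of the legacy SLP_TYPx/SLP_EN and WAK_STS bits, so the bit layout itself does not explain the failure.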

One possibility is that the extended registers specify an access width of 32 bits, meaning that a 32-bit read or write will always be performed (even though the actual bit width of each register is only 8 bits).
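To make the access-width concern concrete, this Python sketch lists which I/O ports one access of the encoded width would touch. It assumes PM1 event and control block lengths of 4 and 2 bytes (the usual values; the FADT length fields are not in the excerpt above) and only models the hypothesis: whether the ACPICA code of the time actually honored the access width is not established here.

```python
from __future__ import annotations

# Bytes per access for the GAS "Encoded Access Width" field (1=byte ... 4=qword).
ACCESS_WIDTH_BYTES = {1: 1, 2: 2, 3: 4, 4: 8}

# Assumed legacy layout: 4-byte PM1A event block (status, then enable) and
# 2-byte PM1A control block.
LEGACY_REGS = {
    "PM1a_STS": range(0x400, 0x402),
    "PM1a_EN": range(0x402, 0x404),
    "PM1a_CNT": range(0x404, 0x406),
}


def access_span(address: int, encoded_width: int) -> range:
    """I/O ports touched by one access of the given encoded width."""
    return range(address, address + ACCESS_WIDTH_BYTES[encoded_width])


def overlapped_regs(span: range) -> list[str]:
    """Names of the assumed legacy registers sharing at least one port with span."""
    return [name for name, ports in LEGACY_REGS.items() if set(span) & set(ports)]


for name, address in (("Sleep Control", 0x405), ("Sleep Status", 0x401)):
    span = access_span(address, 3)  # 3 = DWord, as in the FADT dump
    print(f"{name}: ports {span.start:#x}-{span.stop - 1:#x} -> {overlapped_regs(span)}")
```

Under these assumptions, a 32-bit access to the Sleep Status register at 0x401 would also cover PM1a_EN and the low byte of PM1a_CNT, while an 8-bit access (the declared bit width) would cover only PM1a_STS.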

You may have to dig down and figure out what is the difference in the I/O behavior between the legacy case and the extended case.
Comment 23 Brint E. Kriebel 2013-03-27 16:52:39 UTC
Thanks, Bob. Do you have any suggestions on how to start tracking this down? I'm comfortable tweaking the code, but don't know much about ACPI internals, so I'm not sure where to begin.

Is it reasonable to introduce some kind of boot-time option to force the use of the legacy functions, or is it worth the effort to dissect the extended functions to see if we can fix them?
Comment 24 Robert Moore 2013-03-27 16:57:43 UTC
I would hope that we don't need yet another boot-time option.

I'm not really sure that the code that uses the extended registers has actually ever seen a real ACPI 5.0 FADT. So, it is possible that there is a problem.

The code in question:
acpi_hw_extended_sleep (and wake): hwesleep.c
acpi_hw_legacy_sleep (and wake): hwsleep.c

A bit of instrumentation here to see what exactly is being written to which registers (in both extended and legacy cases) may reveal the problem.
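One way to compare the two paths is to record every port write as (port, width, value) and diff the per-byte effects. The Python sketch below assumes such a trace format (hypothetical; no existing ACPICA trace facility is implied), and the two example traces are made up to mirror the access-width question from comment 22, not captured from a Toshiba machine.

```python
from __future__ import annotations

from typing import NamedTuple


class PortWrite(NamedTuple):
    port: int        # I/O port address
    width_bits: int  # 8, 16 or 32
    value: int       # value passed to the port write


def written_bytes(trace: list[PortWrite]) -> dict[int, int]:
    """Last byte value written to each port, splitting writes little-endian."""
    out = {}
    for w in trace:
        for i in range(w.width_bits // 8):
            out[w.port + i] = (w.value >> (8 * i)) & 0xFF
    return out


def diff_traces(legacy: list[PortWrite], extended: list[PortWrite]) -> dict:
    """Ports whose written byte differs: {port: (legacy byte, extended byte)}."""
    a, b = written_bytes(legacy), written_bytes(extended)
    return {p: (a.get(p), b.get(p))
            for p in sorted(a.keys() | b.keys()) if a.get(p) != b.get(p)}


# Hypothetical traces: clearing WAK_STS with a 16-bit PM1a_STS write versus an
# assumed 32-bit write to the Sleep Status register at 0x401.
legacy = [PortWrite(0x400, 16, 0x8000)]
extended = [PortWrite(0x401, 32, 0x80)]
for port, (lg, ex) in diff_traces(legacy, extended).items():
    print(f"port {port:#x}: legacy={lg} extended={ex}")
```

In this made-up case the only differences are the extra zero bytes the wider write would put on ports 0x402-0x404, which is the kind of discrepancy such instrumentation is meant to surface.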
Comment 25 Daniel Rowe 2013-03-29 01:54:30 UTC
Created attachment 96521
acpidump from the z930
Comment 26 Daniel Rowe 2013-03-29 01:56:01 UTC
Comment on attachment 96521
acpidump from the z930

This is on 3.8.4
Comment 27 Aaron Lu 2013-04-02 06:26:53 UTC
Hi Bob,

I'll move this bug to ACPICA and assign it to you. Is that OK?
Comment 28 Robert Moore 2013-04-02 16:00:23 UTC
It is not clear what the problem is here. Please do not move it yet.
Comment 29 Aaron Lu 2013-04-07 12:00:58 UTC
Hi Bob,

Section 4.8.3.7 of ACPI spec has words like this:
The optional ACPI sleep registers (SLEEP_CONTROL_REG and SLEEP_STATUS_REG) specify a standard mechanism for system sleep state entry on HW-Reduced ACPI systems.

So it seems to me, if this is not a HW reduced system, we shouldn't use these regs, no matter what the values are in the FADT table. Does this make sense to you?
Comment 30 Robert Moore 2013-04-10 00:50:42 UTC
I find the spec to be a bit ambiguous:

"When implemented, the Sleep registers are a replacement for the SLP_TYP, SLP_EN and WAK_STS registers in the PM1_BLK"

For most (or all) of the other extended GAS registers in the FADT, "when implemented" means "non-zero".

In any case, the system in question has apparently valid values in these register fields. So we still don't really know why the system is broken.
Comment 31 Lv Zheng 2013-04-25 06:29:02 UTC
Created attachment 99981
[RFC Patch 2/3] ACPICA: Hardware: Modify sleep

Thanks for reporting this bug.

Your report can guide us in implementing sleep functionality that is compatible with ACPI 5.0.
Could you please test the attached patches to confirm whether the fix is valid?
The patch set is generated against the latest kernel release.
If you have problems applying this patch set, please let me know and post your kernel version number here on Bugzilla.

Thanks in advance.
Comment 32 kk1 2013-04-25 08:21:06 UTC
I confirm that the patch solved the second-suspend problem for me on a Toshiba Tecra R950 notebook. I tested kernel 3.9-rc8 with the patch applied. The second-suspend problem still occurs on kernel 3.9-rc8 without the patch.

Thanks in advance.
Comment 33 Brint E. Kriebel 2013-04-25 21:35:43 UTC
Confirming the same findings as kk1 on the Z935 with 3.9-rc8. Without the patches, the second suspend issue occurs. With the patches, the system suspends and resumes multiple times without issue. Thanks, Lv!
Comment 34 Lv Zheng 2013-04-26 03:19:55 UTC
Hi,

I'll propose this fix to ACPICA.
Thanks for reporting and testing. I'll Cc you guys when it is accepted by ACPICA and converted into a Linuxized patch during the ACPICA release process.

Thanks
Comment 35 Zhang Rui 2013-05-13 02:26:54 UTC
*** Bug 53011 has been marked as a duplicate of this bug. ***
Comment 36 Daniel Rowe 2013-06-09 04:03:03 UTC
Note: there is a loadable kernel module available here while we wait for the fix to reach mainline.


https://bugzilla.redhat.com/show_bug.cgi?id=904303
Comment 37 Lv Zheng 2013-06-25 00:47:30 UTC
The patches that were posted to fix the sleep flow and the bit-masked register access have not been merged.

The solution from comment #29 has been upstreamed; it also fixes the issues found on the Toshiba machines.

author Lv Zheng <lv.zheng@intel.com> 2013-06-08 00:59:18 (GMT)
committer Rafael J. Wysocki <rafael.j.wysocki@intel.com> 2013-06-15 22:56:22 (GMT)
commit 7cec7048fe22e3e92389da2cd67098f6c4284e7f
tree 69ace1151d4870173b6bf35dce23a9d3bd608f11
parent 42f47869c6a73a6893c998725365b587b0311f9a

ACPICA: Do not use extended sleep registers unless HW-reduced bit is set

Previous implementation incorrectly used the ACPI 5.0 extended
sleep registers if they were simply populated. This caused
problems on some non-HW-reduced machines. As per the ACPI spec,
they should only be used if the HW-reduced bit is set.  Lv Zheng,
ACPICA BZ 1020.
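The change in the selection rule can be modeled in a few lines. This Python sketch is a simplified model based on the commit message and comment 29, not the ACPICA source; the `Fadt` class and its fields are hypothetical stand-ins for the FADT's HW-reduced flag and Sleep Control register address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fadt:
    hw_reduced: bool            # FADT "Hardware Reduced (V5)" flag
    sleep_control_address: int  # Sleep Control Register address, 0 if absent


def uses_extended_before_fix(fadt: Fadt) -> bool:
    # Old behaviour per the commit message: extended registers were used
    # whenever they were populated (and, presumably, on HW-reduced systems).
    return fadt.hw_reduced or fadt.sleep_control_address != 0


def uses_extended_after_fix(fadt: Fadt) -> bool:
    # New rule: only the HW-reduced bit selects the extended registers.
    return fadt.hw_reduced


z935 = Fadt(hw_reduced=False, sleep_control_address=0x405)
print(uses_extended_before_fix(z935), uses_extended_after_fix(z935))
```

For the Z935 FADT values (Hardware Reduced 0, Sleep Control at 0x405) this prints `True False`: the old rule picked the extended path, the new rule falls back to the legacy PM1 registers.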

This bug is going to be closed.
