Bug 74371 - suspend: Some devices failed to suspend
Summary: suspend: Some devices failed to suspend
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Tables
Hardware: i386 Linux
Importance: P1 high
Assignee: Rafael J. Wysocki
URL:
Keywords:
Duplicates: 74831
Depends on:
Blocks:
 
Reported: 2014-04-17 23:09 UTC by wxg4net
Modified: 2015-07-21 18:57 UTC

See Also:
Kernel Version: 3.14.1 - 3.15-rc
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
suspend dmesg (52.45 KB, text/plain)
2014-04-17 23:09 UTC, wxg4net
acpidump (154.72 KB, text/plain)
2014-04-19 13:11 UTC, wxg4net
dmesg-for first bad commit (59.17 KB, text/plain)
2014-04-19 13:14 UTC, wxg4net
serial.patch (1.61 KB, patch)
2014-04-19 13:42 UTC, Lan Tianyu
dmesg-kernel-3.13.10-ck (55.80 KB, text/plain)
2014-04-21 03:44 UTC, wxg4net
acpidump-kernel-3.13.10-ck (154.72 KB, text/plain)
2014-04-21 03:45 UTC, wxg4net
PNP / ACPI: Check if _DIS is present in pnpacpi_disable_resources() (1.08 KB, patch)
2014-04-29 20:35 UTC, Rafael J. Wysocki
PNP / ACPI: Check if _DIS and _SRS are present when disabling/enabling devices (2.26 KB, patch)
2014-04-30 17:20 UTC, Rafael J. Wysocki

Description wxg4net 2014-04-17 23:09:26 UTC
Created attachment 132841 [details]
suspend dmesg

[pc:~] dmesg | grep fail
[    0.294492] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[    0.294495] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[    0.294497] ACPI: \_PR_.P005: failed to get CPU APIC ID.
[    0.294500] ACPI: \_PR_.P006: failed to get CPU APIC ID.
[    0.294749] acpi PNP0A03:00: _OSC failed (AE_SUPPORT); disabling ASPM
[  358.435996] serial 00:0a: disable failed
[  358.436005] PM: Device 00:0a failed to suspend: error -5
[  358.713839] PM: Some devices failed to suspend, or early wake event detected
[  360.279019] serial 00:0a: disable failed
[  360.279034] PM: Device 00:0a failed to suspend: error -5
[  360.546168] PM: Some devices failed to suspend, or early wake event detected
[  367.806589] serial 00:0a: disable failed
[  367.806597] PM: Device 00:0a failed to suspend: error -5
[  368.085033] PM: Some devices failed to suspend, or early wake event detected
Comment 1 Lan Tianyu 2014-04-18 02:13:26 UTC
Hi, from the log, the serial device is blocking system suspend.
Comment 2 wxg4net 2014-04-18 03:26:59 UTC
(In reply to Lan Tianyu from comment #1)
> Hi, from the log, the serial device is blocking system suspend.

Yes, but it should suspend.

Kernel 3.13 (<3.14) works well,

and it does not show this error text:
"
[    0.294492] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[    0.294495] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[    0.294497] ACPI: \_PR_.P005: failed to get CPU APIC ID.
[    0.294500] ACPI: \_PR_.P006: failed to get CPU APIC ID.
"
Comment 3 wxg4net 2014-04-18 03:34:59 UTC
[pc:~] lspci 
00:00.0 RAM memory: NVIDIA Corporation MCP61 Host Bridge (rev a1)
00:01.0 ISA bridge: NVIDIA Corporation MCP61 LPC Bridge (rev a2)
00:01.1 SMBus: NVIDIA Corporation MCP61 SMBus (rev a2)
00:01.2 RAM memory: NVIDIA Corporation MCP61 Memory Controller (rev a2)
00:02.0 USB controller: NVIDIA Corporation MCP61 USB 1.1 Controller (rev a3)
00:02.1 USB controller: NVIDIA Corporation MCP61 USB 2.0 Controller (rev a3)
00:04.0 PCI bridge: NVIDIA Corporation MCP61 PCI bridge (rev a1)
00:05.0 Audio device: NVIDIA Corporation MCP61 High Definition Audio (rev a2)
00:06.0 IDE interface: NVIDIA Corporation MCP61 IDE (rev a2)
00:07.0 Bridge: NVIDIA Corporation MCP61 Ethernet (rev a2)
00:08.0 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:08.1 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:09.0 PCI bridge: NVIDIA Corporation MCP61 PCI Express bridge (rev a2)
00:0b.0 PCI bridge: NVIDIA Corporation MCP61 PCI Express bridge (rev a2)
00:0c.0 PCI bridge: NVIDIA Corporation MCP61 PCI Express bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
02:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
02:00.1 Audio device: NVIDIA Corporation High Definition Audio Controller (rev a1)
[pc:~] lsusb
Bus 001 Device 003: ID 1871:0101 Aveo Technology Corp. UVC camera (Bresser microscope)
Bus 001 Device 004: ID 05e3:0723 Genesys Logic, Inc. GL827L SD/MMC/MS Flash Card Reader
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 046a:0011 Cherry GmbH G83 (RS 6000) Keyboard
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Comment 4 Lan Tianyu 2014-04-18 05:14:39 UTC
(In reply to wxg4net from comment #2)
> (In reply to Lan Tianyu from comment #1)
> > Hi, from the log, the serial device is blocking system suspend.
> 
> Yes, but it should suspend.
> 
> Kernel 3.13 (<3.14) works well,

So this is a regression. Could you run git bisect?

> 
> and it does not show this error text:
> "
> [    0.294492] ACPI: \_PR_.P003: failed to get CPU APIC ID.
> [    0.294495] ACPI: \_PR_.P004: failed to get CPU APIC ID.
> [    0.294497] ACPI: \_PR_.P005: failed to get CPU APIC ID.
> [    0.294500] ACPI: \_PR_.P006: failed to get CPU APIC ID.
> "

This is a separate issue and is not related to system suspend.
Comment 5 wxg4net 2014-04-18 07:16:16 UTC
OK, I will try.
Comment 6 wxg4net 2014-04-19 11:00:36 UTC
@Lan Tianyu, I am pasting the bisect result below. (My English is not very good, but I am working on it.)

[pc:~/Source/linux] git bisect bad
202317a573b20d77a9abb7c16a3fd5b40cef3d9d is the first bad commit
commit 202317a573b20d77a9abb7c16a3fd5b40cef3d9d
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Fri Nov 22 21:54:37 2013 +0100

    ACPI / scan: Add acpi_device objects for all device nodes in the namespace
    
    Modify the ACPI namespace scanning code to register a struct
    acpi_device object for every namespace node representing a device,
    processor and so on, even if the device represented by that namespace
    node is reported to be not present and not functional by _STA.
    
    There are multiple reasons to do that.  First of all, it avoids
    quite a lot of overhead when struct acpi_device objects are
    deleted every time acpi_bus_trim() is run and then added again
    by a subsequent acpi_bus_scan() for the same scope, although the
    namespace objects they correspond to stay in memory all the time
    (which always is the case on a vast majority of systems).
    
    Second, it will allow user space to see that there are namespace
    nodes representing devices that are not present at the moment and may
    be added to the system.  It will also allow user space to evaluate
    _SUN for those nodes to check what physical slots the "missing"
    devices may be put into and it will make sense to add a sysfs
    attribute for _STA evaluation after this change (that will be
    useful for thermal management on some systems).
    
    Next, it will help to consolidate the ACPI hotplug handling among
    subsystems by making it possible to store hotplug-related information
    in struct acpi_device objects in a standard common way.
    
    Finally, it will help to avoid a race condition related to the
    deletion of ACPI namespace nodes.  Namely, namespace nodes may be
    deleted as a result of a table unload triggered by _EJ0 or _DCK.
    If a hotplug notification for one of those nodes is triggered
    right before the deletion and it executes a hotplug callback
    via acpi_hotplug_execute(), the ACPI handle passed to that
    callback may be stale when the callback actually runs.  One way
    to work around that is to always pass struct acpi_device pointers
    to hotplug callbacks after doing a get_device() on the objects in
    question which eliminates the use-after-free possibility (the ACPI
    handles in those objects are invalidated by acpi_scan_drop_device(),
    so they will trigger ACPICA errors on attempts to use them).
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>

:040000 040000 35dbf7e48aefa948423794c493735c1f53888ca7 468f72c1d2701c915ea11dad167a037ae17ee442 M	Documentation
:040000 040000 dab9b3c433acff383f2e4388b69cdb5c70165419 8c9784d505369210a42bc69aeffdacf623e5f7ac M	drivers
:040000 040000 d22abfd7cfccd5212bde95b41ca354f51aa8630d b99738ef79bd9e9d7392f6e400029b5a1622aefc M	include
Comment 7 Lan Tianyu 2014-04-19 13:04:19 UTC
Please provide the output of acpidump.
Comment 8 wxg4net 2014-04-19 13:11:15 UTC
Created attachment 133011 [details]
acpidump
Comment 9 wxg4net 2014-04-19 13:14:30 UTC
Created attachment 133021 [details]
dmesg-for first bad commit
Comment 10 Lan Tianyu 2014-04-19 13:42:09 UTC
Created attachment 133041 [details]
serial.patch

Please try this patch.
Comment 11 wxg4net 2014-04-19 14:30:30 UTC
(In reply to Lan Tianyu from comment #10)
> Created attachment 133041 [details]
> serial.patch
> 
> Please try this patch.

It does not work.
Picture: http://t.cn/8sjcDfH
Comment 12 wxg4net 2014-04-21 03:11:53 UTC
Now, what shall we do next?
Comment 13 Lan Tianyu 2014-04-21 03:37:17 UTC
Could you provide dmesg on good kernel?
Comment 14 wxg4net 2014-04-21 03:44:20 UTC
Created attachment 133121 [details]
dmesg-kernel-3.13.10-ck
Comment 15 wxg4net 2014-04-21 03:45:07 UTC
Created attachment 133131 [details]
acpidump-kernel-3.13.10-ck
Comment 16 nonproffessional 2014-04-29 19:38:37 UTC
Hi, I have just experienced the same error and can recreate it by reapplying kernel 3.14.  What extra info is needed to help?

$ lspci
00:00.0 Host bridge: NVIDIA Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: NVIDIA Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: NVIDIA Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.4 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: NVIDIA Corporation MCP79 Co-processor (rev b1)
00:04.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:06.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:06.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: NVIDIA Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: NVIDIA Corporation MCP79 PCI Bridge (rev b1)
00:0a.0 Ethernet controller: NVIDIA Corporation MCP79 Ethernet (rev b1)
00:0b.0 IDE interface: NVIDIA Corporation MCP79 SATA Controller (rev b1)
00:0c.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:16.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
01:06.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 02)
02:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GT] (rev a2)
04:00.0 IDE interface: JMicron Technology Corp. JMB368 IDE controller
$ lsusb
Bus 004 Device 004: ID 04d9:1603 Holtek Semiconductor, Inc. Keyboard
Bus 004 Device 003: ID 046d:c408 Logitech, Inc. Marble Mouse (4-button)
Bus 004 Device 002: ID 05e3:0608 Genesys Logic, Inc. USB-2.0 4-Port HUB
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 23
Model name:            Intel(R) Core(TM)2 Duo CPU     E7300  @ 2.66GHz
Stepping:              6
CPU MHz:               1603.000
CPU max MHz:           2670.0000
CPU min MHz:           1603.0000
BogoMIPS:              5335.85
L1d cache:             32K
L1i cache:             32K
L2 cache:              3072K
NUMA node0 CPU(s):     0,1
Comment 17 Rafael J. Wysocki 2014-04-29 20:05:44 UTC
There are two errors mentioned in the report. The first is

ACPI: \_PR_.P003: failed to get CPU APIC ID.

which should be worked around by commit f778d1218f10
(ACPI / scan: reduce log level of "ACPI: \_PR_.CPU4: failed
to get CPU APIC ID") in 3.15-rc1 and later.

The second one is the serial suspend issue.

Which of the problems do you mean?
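
For anyone triaging a similar log, the two issues can be told apart mechanically. The Python sketch below is only an illustration: the patterns are derived from the message formats quoted in this report, and `classify`, `APIC_RE` and `PM_RE` are hypothetical names, not part of any kernel tool.

```python
import re
from collections import Counter

# Patterns taken from the message formats quoted in this report.
APIC_RE = re.compile(r"ACPI: \S+: failed to get CPU APIC ID")
PM_RE = re.compile(r"PM: Device (\S+) failed to (suspend|resume): error (-?\d+)")


def classify(lines):
    """Count APIC ID messages and group PM failures by (device, phase, error)."""
    apic_count = 0
    pm_failures = Counter()
    for line in lines:
        if APIC_RE.search(line):
            apic_count += 1
            continue
        match = PM_RE.search(line)
        if match:
            key = (match.group(1), match.group(2), int(match.group(3)))
            pm_failures[key] += 1
    return apic_count, pm_failures


# A few lines copied from the description of this bug.
SAMPLE = r"""
[    0.294492] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[    0.294495] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[  358.435996] serial 00:0a: disable failed
[  358.436005] PM: Device 00:0a failed to suspend: error -5
[  360.279034] PM: Device 00:0a failed to suspend: error -5
""".splitlines()

apic_count, pm_failures = classify(SAMPLE)
print("APIC ID messages:", apic_count)
for (device, phase, error), count in pm_failures.items():
    print(f"{device}: failed to {phase} {count} time(s), error {error}")
```

Only the `PM: Device ... failed to suspend` group is relevant to the suspend regression; the APIC ID messages are the separate, cosmetic issue.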
Comment 18 Rafael J. Wysocki 2014-04-29 20:22:24 UTC
(In reply to wxg4net from comment #12)
> Now, what shall we do next?

I suppose that pnpacpi_suspend() fails for you, but we don't know which part of it fails.
Comment 19 Rafael J. Wysocki 2014-04-29 20:30:07 UTC
Actually, I was wrong, it's pnp_stop_dev() that fails.

Moreover, the error message tells us that the protocol's ->disable() callback returns an error code and that's pnpacpi_disable_resources() for the ACPI protocol.

Now, the only reason why that can fail is because

acpi_evaluate_object(handle, "_DIS", NULL, NULL)

returns an error code, which is quite obvious in your case, because there's no _DIS for that device in the ACPI tables of your system.  That is not a BIOS bug, but a bug in pnpacpi_disable_resources() which should check for that particular case.
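
The failure chain described above can be modelled outside the kernel. The Python sketch below only illustrates the control flow; it is not the kernel code and not the attached patch. `FakeAcpiNode`, `disable_unchecked`, `disable_checked` and `stop_dev` are hypothetical names, the `_CRS` entry is just a placeholder for "some methods, but no _DIS", and the mapping of a failed disable to -EIO is an assumption inferred from the "error -5" in the log.

```python
import errno


class FakeAcpiNode:
    """Hypothetical stand-in for a device node in the ACPI namespace."""

    def __init__(self, methods):
        self.methods = set(methods)

    def evaluate(self, name):
        # True if the method exists and "runs", False if it cannot be evaluated.
        return name in self.methods


def disable_unchecked(node):
    # Behaviour described above: a failed _DIS evaluation is reported as an
    # error even when the method simply does not exist.
    return 0 if node.evaluate("_DIS") else -1


def disable_checked(node):
    # Suggested behaviour: a device without _DIS has nothing to disable,
    # so that case counts as success.
    if "_DIS" not in node.methods:
        return 0
    return 0 if node.evaluate("_DIS") else -1


def stop_dev(node, disable):
    # Assumption: the caller turns any negative result into -EIO (-5).
    if disable(node) < 0:
        return -errno.EIO
    return 0


serial = FakeAcpiNode(methods=["_CRS"])  # illustrative: no _DIS on this node
print("unchecked:", stop_dev(serial, disable_unchecked))
print("checked:  ", stop_dev(serial, disable_checked))
```

With no `_DIS` on the node, the unchecked variant yields -EIO, which matches the "failed to suspend: error -5" lines, while the checked variant returns 0 and lets suspend continue.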
Comment 20 Rafael J. Wysocki 2014-04-29 20:35:10 UTC
Created attachment 134281 [details]
PNP / ACPI: Check if _DIS is present in pnpacpi_disable_resources()

Can you please check if this patch helps?
Comment 21 nonproffessional 2014-04-29 20:44:52 UTC
Both issues occur with 3.14 only.  I see almost identical results to wxg4net's.

$ dmesg | grep fail
[    0.129027] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[    0.129031] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[    0.129363] acpi PNP0A03:00: _OSC failed (AE_SUPPORT); disabling ASPM
[   80.215565] serial 00:0a: disable failed
[   80.215577] PM: Device 00:0a failed to suspend: error -5
[   80.270098] PM: Some devices failed to suspend, or early wake event detected
[   87.625801] serial 00:0a: disable failed
[   87.625812] PM: Device 00:0a failed to suspend: error -5
[   87.680101] PM: Some devices failed to suspend, or early wake event detected
[   95.043763] serial 00:0a: disable failed
[   95.043774] PM: Device 00:0a failed to suspend: error -5
[   95.100102] PM: Some devices failed to suspend, or early wake event detected

And here is one of the suspend attempts in detail:

[   74.227178] PM: Syncing filesystems ... done.
[   74.232772] PM: Preparing system for mem sleep
[   74.737994] Freezing user space processes ... (elapsed 0.002 seconds) done.
[   74.740107] Freezing remaining freezable tasks ... (elapsed 5.471 seconds) done.
[   80.211817] PM: Entering mem sleep
[   80.211847] Suspending console(s) (use no_console_suspend to debug)
[   80.215257] sd 3:0:0:0: [sdb] Stopping disk
[   80.215301] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[   80.215565] serial 00:0a: disable failed
[   80.215574] dpm_run_callback(): pnp_bus_suspend+0x0/0x20 returns -5
[   80.215577] PM: Device 00:0a failed to suspend: error -5
[   80.216104] sd 2:0:0:0: [sda] Stopping disk
[   80.270098] PM: Some devices failed to suspend, or early wake event detected
[   80.271486] sd 2:0:0:0: [sda] Starting disk
[   80.271567] sd 3:0:0:0: [sdb] Starting disk
[   80.590030] ata7: SATA link down (SStatus 0 SControl 300)
[   80.590041] ata8: SATA link down (SStatus 0 SControl 300)
[   80.590051] ata5: SATA link down (SStatus 0 SControl 300)
[   80.590073] ata6: SATA link down (SStatus 0 SControl 300)
[   80.593212] PM: resume of devices complete after 323.109 msecs
[   80.614945] PM: Finishing wakeup.
[   80.614947] Restarting tasks ... done.
Comment 22 Rafael J. Wysocki 2014-04-29 21:02:16 UTC
Thanks.

So what about the patch in comment #20?
Comment 23 nonproffessional 2014-04-30 11:45:47 UTC
Suspend now works with that patch!  If I understand your description, this means the error is detected but not hidden, which fits what can now be seen in the log.

$ dmesg | grep fail
[    0.126395] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[    0.126399] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[    0.126734] acpi PNP0A03:00: _OSC failed (AE_SUPPORT); disabling ASPM
[   34.265683] serial 00:0a: activation failed
[   34.265690] PM: Device 00:0a failed to resume: error -5

Previously the suspend would fail 3 times but now it succeeds on the first attempt.
Comment 24 Rafael J. Wysocki 2014-04-30 17:18:29 UTC
Thanks a lot for testing!

What you're seeing now is an analogous problem during resume: we are trying to use the _SRS method which is missing.
Comment 25 Rafael J. Wysocki 2014-04-30 17:20:51 UTC
Created attachment 134481 [details]
PNP / ACPI: Check if _DIS and _SRS are present when disabling/enabling devices

Please check if this patch fixes the resume error too.
Comment 26 nonproffessional 2014-04-30 19:45:28 UTC
That definitely fixed it.  The only remaining error message is the earlier one which you say is gone as of 3.15-rc1.

Congratulations on your bug hunt.
Comment 27 Rafael J. Wysocki 2014-04-30 20:58:31 UTC
Thanks!

Patch: https://patchwork.kernel.org/patch/4096201/
Comment 28 wxg4net 2014-05-01 00:54:53 UTC
Good! Thanks very much to Lan Tianyu and Rafael J. Wysocki.
Comment 29 Lan Tianyu 2014-05-12 01:34:53 UTC
*** Bug 74831 has been marked as a duplicate of this bug. ***
Comment 30 Len Brown 2015-07-21 18:57:32 UTC
In Linux 3.15-rc4:

commit a8d22396302b7e4e5f0a594c1c1594388c29edaf
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Wed Apr 30 22:36:33 2014 +0200

    PNP / ACPI: Do not return errors if _DIS or _SRS are not present
