Bug 74371
Summary: | suspend: Some devices failed to suspend | ||
---|---|---|---|
Product: | ACPI | Reporter: | wxg4net (wxg4net) |
Component: | Config-Tables | Assignee: | Rafael J. Wysocki (rjw) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | faerbit, lenb, nonproffessional, rjw, rui.zhang, tianyu.lan |
Priority: | P1 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 3.14.1 - 3.15-rc | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | suspend dmesg; acpidump; dmesg-for first bad commit; serial.patch; dmesg-kernel-3.13.10-ck; acpidump-kernel-3.13.10-ck; PNP / ACPI: Check if _DIS is present in pnpacpi_disable_resources(); PNP / ACPI: Check if _DIS and _SRS are present when disabling/enabling devices |
Hi, from the log, serial device blocks the system suspend.

(In reply to Lan Tianyu from comment #1)
> Hi, from the log, serial device blocks the system suspend.

Yes, but it should suspend.

Kernel 3.13 (< 3.14) works fine.

It also does not print these errors:
"
[ 0.294492] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[ 0.294495] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[ 0.294497] ACPI: \_PR_.P005: failed to get CPU APIC ID.
[ 0.294500] ACPI: \_PR_.P006: failed to get CPU APIC ID.
"

[pc:~] lspci
00:00.0 RAM memory: NVIDIA Corporation MCP61 Host Bridge (rev a1)
00:01.0 ISA bridge: NVIDIA Corporation MCP61 LPC Bridge (rev a2)
00:01.1 SMBus: NVIDIA Corporation MCP61 SMBus (rev a2)
00:01.2 RAM memory: NVIDIA Corporation MCP61 Memory Controller (rev a2)
00:02.0 USB controller: NVIDIA Corporation MCP61 USB 1.1 Controller (rev a3)
00:02.1 USB controller: NVIDIA Corporation MCP61 USB 2.0 Controller (rev a3)
00:04.0 PCI bridge: NVIDIA Corporation MCP61 PCI bridge (rev a1)
00:05.0 Audio device: NVIDIA Corporation MCP61 High Definition Audio (rev a2)
00:06.0 IDE interface: NVIDIA Corporation MCP61 IDE (rev a2)
00:07.0 Bridge: NVIDIA Corporation MCP61 Ethernet (rev a2)
00:08.0 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:08.1 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:09.0 PCI bridge: NVIDIA Corporation MCP61 PCI Express bridge (rev a2)
00:0b.0 PCI bridge: NVIDIA Corporation MCP61 PCI Express bridge (rev a2)
00:0c.0 PCI bridge: NVIDIA Corporation MCP61 PCI Express bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
02:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
02:00.1 Audio device: NVIDIA Corporation High Definition Audio Controller (rev a1)

[pc:~] lsusb
Bus 001 Device 003: ID 1871:0101 Aveo Technology Corp. UVC camera (Bresser microscope)
Bus 001 Device 004: ID 05e3:0723 Genesys Logic, Inc. GL827L SD/MMC/MS Flash Card Reader
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 046a:0011 Cherry GmbH G83 (RS 6000) Keyboard
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

(In reply to wxg4net from comment #2)
> (In reply to Lan Tianyu from comment #1)
> > Hi, from the log, serial device blocks the system suspend.
>
> Yes, but it should suspend.
>
> Kernel 3.13 (< 3.14) works fine.

So this is a regression. Could you do git bisect?

> It also does not print these errors:
> "
> [ 0.294492] ACPI: \_PR_.P003: failed to get CPU APIC ID.
> [ 0.294495] ACPI: \_PR_.P004: failed to get CPU APIC ID.
> [ 0.294497] ACPI: \_PR_.P005: failed to get CPU APIC ID.
> [ 0.294500] ACPI: \_PR_.P006: failed to get CPU APIC ID.
> "

This is another issue and not related to system suspend.

OK, I will try.

@Lan Tianyu: I am pasting the result below. (My English is not very good, but I am working on it.)

[pc:~/Source/linux] git bisect bad
202317a573b20d77a9abb7c16a3fd5b40cef3d9d is the first bad commit
commit 202317a573b20d77a9abb7c16a3fd5b40cef3d9d
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Fri Nov 22 21:54:37 2013 +0100

ACPI / scan: Add acpi_device objects for all device nodes in the namespace

Modify the ACPI namespace scanning code to register a struct acpi_device object for every namespace node representing a device, processor and so on, even if the device represented by that namespace node is reported to be not present and not functional by _STA. There are multiple reasons to do that.
First of all, it avoids quite a lot of overhead when struct acpi_device objects are deleted every time acpi_bus_trim() is run and then added again by a subsequent acpi_bus_scan() for the same scope, although the namespace objects they correspond to stay in memory all the time (which always is the case on a vast majority of systems).

Second, it will allow user space to see that there are namespace nodes representing devices that are not present at the moment and may be added to the system. It will also allow user space to evaluate _SUN for those nodes to check what physical slots the "missing" devices may be put into and it will make sense to add a sysfs attribute for _STA evaluation after this change (that will be useful for thermal management on some systems).

Next, it will help to consolidate the ACPI hotplug handling among subsystems by making it possible to store hotplug-related information in struct acpi_device objects in a standard common way.

Finally, it will help to avoid a race condition related to the deletion of ACPI namespace nodes. Namely, namespace nodes may be deleted as a result of a table unload triggered by _EJ0 or _DCK. If a hotplug notification for one of those nodes is triggered right before the deletion and it executes a hotplug callback via acpi_hotplug_execute(), the ACPI handle passed to that callback may be stale when the callback actually runs. One way to work around that is to always pass struct acpi_device pointers to hotplug callbacks after doing a get_device() on the objects in question which eliminates the use-after-free possibility (the ACPI handles in those objects are invalidated by acpi_scan_drop_device(), so they will trigger ACPICA errors on attempts to use them).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>

:040000 040000 35dbf7e48aefa948423794c493735c1f53888ca7 468f72c1d2701c915ea11dad167a037ae17ee442 M Documentation
:040000 040000 dab9b3c433acff383f2e4388b69cdb5c70165419 8c9784d505369210a42bc69aeffdacf623e5f7ac M drivers
:040000 040000 d22abfd7cfccd5212bde95b41ca354f51aa8630d b99738ef79bd9e9d7392f6e400029b5a1622aefc M include

Please provide the output of acpidump.

Created attachment 133011 [details]
acpidump
Created attachment 133021 [details]
dmesg-for first bad commit
Created attachment 133041 [details]
serial.patch
Please try this patch.
(In reply to Lan Tianyu from comment #10)
> Created attachment 133041 [details]
> serial.patch
>
> Please try this patch.

It does not work. Picture: http://t.cn/8sjcDfH

Now, what shall we do next?

Could you provide dmesg from a good kernel?

Created attachment 133121 [details]
dmesg-kernel-3.13.10-ck
Created attachment 133131 [details]
acpidump-kernel-3.13.10-ck
Hi, I have just experienced the same error and can recreate it by reapplying kernel 3.14. What extra info is needed to help?

$ lspci
00:00.0 Host bridge: NVIDIA Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: NVIDIA Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: NVIDIA Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.4 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: NVIDIA Corporation MCP79 Co-processor (rev b1)
00:04.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:06.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:06.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: NVIDIA Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: NVIDIA Corporation MCP79 PCI Bridge (rev b1)
00:0a.0 Ethernet controller: NVIDIA Corporation MCP79 Ethernet (rev b1)
00:0b.0 IDE interface: NVIDIA Corporation MCP79 SATA Controller (rev b1)
00:0c.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:16.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
01:06.0 Ethernet controller: Silicon Integrated Systems [SiS] SiS900 PCI Fast Ethernet (rev 02)
02:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GT] (rev a2)
04:00.0 IDE interface: JMicron Technology Corp. JMB368 IDE controller

$ lsusb
Bus 004 Device 004: ID 04d9:1603 Holtek Semiconductor, Inc. Keyboard
Bus 004 Device 003: ID 046d:c408 Logitech, Inc. Marble Mouse (4-button)
Bus 004 Device 002: ID 05e3:0608 Genesys Logic, Inc. USB-2.0 4-Port HUB
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Model name: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz
Stepping: 6
CPU MHz: 1603.000
CPU max MHz: 2670.0000
CPU min MHz: 1603.0000
BogoMIPS: 5335.85
L1d cache: 32K
L1i cache: 32K
L2 cache: 3072K
NUMA node0 CPU(s): 0,1

There are two errors mentioned in the report. The first is the "ACPI: \_PR_.P003: failed to get CPU APIC ID." message, which should be worked around by commit f778d1218f10 (ACPI / scan: reduce log level of "ACPI: \_PR_.CPU4: failed to get CPU APIC ID") in 3.15-rc1 and later. The second one is the serial suspend issue. Which of the problems do you mean?

(In reply to wxg4net from comment #12)
> Now, what shall we do next?

I suppose that pnpacpi_suspend() fails for you, but we don't know which part of it fails.

Actually, I was wrong, it's pnp_stop_dev() that fails. Moreover, the error message tells us that the protocol's ->disable() callback returns an error code, and for the ACPI protocol that callback is pnpacpi_disable_resources().

Now, the only reason why that can fail is that acpi_evaluate_object(handle, "_DIS", NULL, NULL) returns an error code, which is bound to happen in your case, because there's no _DIS for that device in the ACPI tables of your system. That is not a BIOS bug, but a bug in pnpacpi_disable_resources(), which should check for that particular case.

Created attachment 134281 [details]
PNP / ACPI: Check if _DIS is present in pnpacpi_disable_resources()
Can you please check if this patch helps?
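For anyone following the analysis above, here is a minimal user-space sketch of that failure chain. It is not the attached patch and not kernel code: every `sketch_*` name, the status enum, and the method-list model are hypothetical stand-ins. The `-EIO` returned by the caller is an assumption chosen to match the "error -5" in the logs. The sketch only illustrates why treating a missing `_DIS` as "nothing to disable" lets the suspend callback succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes standing in for ACPICA results. */
enum sketch_status { SK_OK, SK_NOT_FOUND, SK_ERROR };

/* Hypothetical firmware node: just the names of the methods it defines. */
struct sketch_node {
    const char *const *methods;
    size_t n_methods;
};

/* Models evaluating a control method: SK_NOT_FOUND if the node lacks it. */
static enum sketch_status sketch_evaluate(const struct sketch_node *node,
                                          const char *name)
{
    assert(node != NULL && name != NULL);
    for (size_t i = 0; i < node->n_methods; i++)
        if (strcmp(node->methods[i], name) == 0)
            return SK_OK;
    return SK_NOT_FOUND;
}

/* Strict variant: any failure to evaluate _DIS is reported as an error. */
static int sketch_disable_strict(const struct sketch_node *node)
{
    return sketch_evaluate(node, "_DIS") == SK_OK ? 0 : -ENODEV;
}

/* Tolerant variant: a missing _DIS means there is nothing to disable. */
static int sketch_disable_tolerant(const struct sketch_node *node)
{
    enum sketch_status st = sketch_evaluate(node, "_DIS");

    return (st == SK_OK || st == SK_NOT_FOUND) ? 0 : -ENODEV;
}

/* Models the caller: a failing disable callback is turned into -EIO. */
static int sketch_stop_dev(const struct sketch_node *node,
                           int (*disable)(const struct sketch_node *))
{
    return disable(node) < 0 ? -EIO : 0;
}
```

With a node that defines only `_CRS` and `_STA`, `sketch_stop_dev()` fails with the strict callback and succeeds with the tolerant one, which mirrors the before/after behaviour discussed in this thread.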
Both issues occur with 3.14 only. I see almost identical results as wxg4net.

$ dmesg | grep fail
[ 0.129027] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[ 0.129031] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[ 0.129363] acpi PNP0A03:00: _OSC failed (AE_SUPPORT); disabling ASPM
[ 80.215565] serial 00:0a: disable failed
[ 80.215577] PM: Device 00:0a failed to suspend: error -5
[ 80.270098] PM: Some devices failed to suspend, or early wake event detected
[ 87.625801] serial 00:0a: disable failed
[ 87.625812] PM: Device 00:0a failed to suspend: error -5
[ 87.680101] PM: Some devices failed to suspend, or early wake event detected
[ 95.043763] serial 00:0a: disable failed
[ 95.043774] PM: Device 00:0a failed to suspend: error -5
[ 95.100102] PM: Some devices failed to suspend, or early wake event detected

And here is one of the suspend attempts in detail:

[ 74.227178] PM: Syncing filesystems ... done.
[ 74.232772] PM: Preparing system for mem sleep
[ 74.737994] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 74.740107] Freezing remaining freezable tasks ... (elapsed 5.471 seconds) done.
[ 80.211817] PM: Entering mem sleep
[ 80.211847] Suspending console(s) (use no_console_suspend to debug)
[ 80.215257] sd 3:0:0:0: [sdb] Stopping disk
[ 80.215301] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 80.215565] serial 00:0a: disable failed
[ 80.215574] dpm_run_callback(): pnp_bus_suspend+0x0/0x20 returns -5
[ 80.215577] PM: Device 00:0a failed to suspend: error -5
[ 80.216104] sd 2:0:0:0: [sda] Stopping disk
[ 80.270098] PM: Some devices failed to suspend, or early wake event detected
[ 80.271486] sd 2:0:0:0: [sda] Starting disk
[ 80.271567] sd 3:0:0:0: [sdb] Starting disk
[ 80.590030] ata7: SATA link down (SStatus 0 SControl 300)
[ 80.590041] ata8: SATA link down (SStatus 0 SControl 300)
[ 80.590051] ata5: SATA link down (SStatus 0 SControl 300)
[ 80.590073] ata6: SATA link down (SStatus 0 SControl 300)
[ 80.593212] PM: resume of devices complete after 323.109 msecs
[ 80.614945] PM: Finishing wakeup.
[ 80.614947] Restarting tasks ... done.

Thanks. So what about the patch in comment #20?

Suspend now works with that patch! If I understand your description this means the error is detected but not hidden, which fits what can now be seen in the log.

$ dmesg | grep fail
[ 0.126395] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[ 0.126399] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[ 0.126734] acpi PNP0A03:00: _OSC failed (AE_SUPPORT); disabling ASPM
[ 34.265683] serial 00:0a: activation failed
[ 34.265690] PM: Device 00:0a failed to resume: error -5

Previously the suspend would fail 3 times but now it succeeds on the first attempt.

Thanks a lot for testing!

What you're seeing now is an analogous problem during resume: we are trying to use the _SRS method which is missing.

Created attachment 134481 [details]
PNP / ACPI: Check if _DIS and _SRS are present when disabling/enabling devices
Please check if this patch fixes the resume error too.
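The resume side can be pictured the same way. The following is a rough user-space sketch, again with hypothetical `sketch_*` names and a made-up node model rather than the code from the attachment: before trying to program resources, check whether the node defines `_SRS` at all, and treat its absence as "nothing to program" instead of an activation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical firmware node: just the names of the methods it defines. */
struct sketch_fw_node {
    const char *const *methods;
    size_t n_methods;
};

/* Returns true if the node defines a method with the given name. */
static bool sketch_has_method(const struct sketch_fw_node *node,
                              const char *name)
{
    assert(node != NULL && name != NULL);
    for (size_t i = 0; i < node->n_methods; i++)
        if (strcmp(node->methods[i], name) == 0)
            return true;
    return false;
}

/*
 * Models the enable step on resume. srs_calls counts how often the
 * pretend _SRS evaluation runs, so a caller can see it was skipped.
 */
static int sketch_enable(const struct sketch_fw_node *node,
                         unsigned int *srs_calls)
{
    assert(srs_calls != NULL);
    if (!sketch_has_method(node, "_SRS"))
        return 0;       /* no _SRS: nothing to program, not an error */
    (*srs_calls)++;     /* stand-in for evaluating _SRS */
    return 0;
}
```

In this model a node without `_SRS` returns 0 and leaves the counter untouched, so the resume callback would no longer report "activation failed" for such a device.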
That definitely fixed it. The only remaining error message is the earlier one which you say is gone as of 3.15-rc1. Congratulations on your bug hunt.

Thanks!

Patch: https://patchwork.kernel.org/patch/4096201/

Good! Thanks very much to Lan Tianyu and Rafael J. Wysocki.

*** Bug 74831 has been marked as a duplicate of this bug. ***

In Linux 3.15-rc4:

commit a8d22396302b7e4e5f0a594c1c1594388c29edaf
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Wed Apr 30 22:36:33 2014 +0200

PNP / ACPI: Do not return errors if _DIS or _SRS are not present
Created attachment 132841 [details]
suspend dmesg

[pc:~] dmesg | grep fail
[ 0.294492] ACPI: \_PR_.P003: failed to get CPU APIC ID.
[ 0.294495] ACPI: \_PR_.P004: failed to get CPU APIC ID.
[ 0.294497] ACPI: \_PR_.P005: failed to get CPU APIC ID.
[ 0.294500] ACPI: \_PR_.P006: failed to get CPU APIC ID.
[ 0.294749] acpi PNP0A03:00: _OSC failed (AE_SUPPORT); disabling ASPM
[ 358.435996] serial 00:0a: disable failed
[ 358.436005] PM: Device 00:0a failed to suspend: error -5
[ 358.713839] PM: Some devices failed to suspend, or early wake event detected
[ 360.279019] serial 00:0a: disable failed
[ 360.279034] PM: Device 00:0a failed to suspend: error -5
[ 360.546168] PM: Some devices failed to suspend, or early wake event detected
[ 367.806589] serial 00:0a: disable failed
[ 367.806597] PM: Device 00:0a failed to suspend: error -5
[ 368.085033] PM: Some devices failed to suspend, or early wake event detected