Bug 14100 - re-suspend on wakeup due to ata exceptions - Lenovo X61 Tablet
Summary: re-suspend on wakeup due to ata exceptions - Lenovo X61 Tablet
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake
Hardware: All Linux
Importance: P1 enhancement
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-31 16:40 UTC by Alexander Myodov
Modified: 2010-01-14 20:10 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.30-1-amd64, 2.6.31-rc6-amd64
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Dmesg output after suspend-resume-resuspend-resume (124.70 KB, text/plain)
2009-10-06 00:22 UTC, Alexander Myodov
Better dmesg output (212.35 KB, application/octet-stream)
2009-10-16 09:49 UTC, Alexander Myodov

Description Alexander Myodov 2009-08-31 16:40:52 UTC
This is the upstream propagation of Debian bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=544169
Sorry, I am not much into kernel/C development, so I cannot provide a detailed analysis myself.


After I migrated my Lenovo X61 Tablet laptop running Debian Squeeze/testing from the latest 2.6.26 kernel (linux-image-2.6.26-2-amd64=2.6.26-17lenny2) to the 2.6.30 kernel (linux-image-2.6.30-1-amd64=2.6.30-6), the wake-up after Suspend-To-RAM (triggered by closing the lid) ends in a re-suspend with a probability of about 1/2 (or more, though not always).

That is: I open the lid, the laptop wakes up, the screen changes to text mode, several error lines are printed, and within a second it goes into STR again. Sometimes I managed to log in to KDE before it resuspended. If, after the resuspend and with the lid open, I press the power button to wake it up again, it wakes up properly and does not resuspend.

I managed to photograph the error lines that are printed before the laptop suspends. Each line contains a timestamp; I give it for the first line and replace it with xxxxx.xxxxxx for the following lines. If for some reason you need the actual photo (instead of my retyped copy, which may contain typos), please request it:

------------------------Error starts here------------------------
[18153.323052] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[xxxxx.xxxxxx] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xf t4
[xxxxx.xxxxxx] ata1: irq_stat 0x00400040, connection status changed
[xxxxx.xxxxxx] CPU1: Temperature/speed normal
[xxxxx.xxxxxx] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[xxxxx.xxxxxx] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
[xxxxx.xxxxxx] ata1: irq_stat 0x00400040, connection status changed
[xxxxx.xxxxxx] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[xxxxx.xxxxxx] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
[xxxxx.xxxxxx] ata1: irq_stat 0x00400040, connection status changed
[xxxxx.xxxxxx] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[xxxxx.xxxxxx] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
[xxxxx.xxxxxx] ata1: irq_stat 0x00400040, connection status changed
[xxxxx.xxxxxx] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[xxxxx.xxxxxx] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xf t4
[xxxxx.xxxxxx] ata1: irq_stat 0x00400040, connection status changed
[xxxxx.xxxxxx] CPU1: Temperature/speed normal
[xxxxx.xxxxxx] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[xxxxx.xxxxxx] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
[xxxxx.xxxxxx] ata1: irq_stat 0x00400040, connection status changed
[xxxxx.xxxxxx] e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
[xxxxx.xxxxxx] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xf t4
[xxxxx.xxxxxx] ata1: irq_stat 0x00400040, connection status changed
------------------------ Error ends here ------------------------
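For anyone comparing logs like the one above across kernels, here is a minimal sketch (not part of the original report) that pulls the hex fields out of the "ataN: exception" lines and counts them per port and action value. The function names parse_ata_exception and count_actions are hypothetical, and the regular expression only assumes the line layout visible in the retyped log; it does not interpret what the Emask or action bits mean.

```python
import re

# Matches lines shaped like the ones quoted in the report, e.g.
# [18153.323052] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xf t4
# The timestamp may be numeric or masked as xxxxx.xxxxxx.
ATA_EXCEPTION = re.compile(
    r"^\[\s*(?P<ts>[0-9x.]+)\]\s+(?P<port>ata\d+): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) SAct (?P<sact>0x[0-9a-fA-F]+) "
    r"SErr (?P<serr>0x[0-9a-fA-F]+) action (?P<action>0x[0-9a-fA-F]+)"
)


def parse_ata_exception(line):
    """Return the fields of one 'ataN: exception' line, or None if no match."""
    match = ATA_EXCEPTION.match(line.strip())
    if match is None:
        return None
    fields = {"timestamp": match.group("ts"), "port": match.group("port")}
    for name in ("emask", "sact", "serr", "action"):
        fields[name] = int(match.group(name), 16)
    return fields


def count_actions(lines):
    """Count exception lines per (port, action) pair."""
    counts = {}
    for line in lines:
        fields = parse_ata_exception(line)
        if fields is not None:
            key = (fields["port"], fields["action"])
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Feeding it the lines of a saved dmesg file (for example count_actions(open(path)) with a path of your own) gives a quick way to check whether the same exceptions appear after an "echo mem" suspend as after a lid suspend.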

Similar behaviour also occurred with all the previous 2.6.30-rc and 2.6.29 kernels; IIRC it did not occur with the 2.6.28 line, though I don't have any 2.6.28 packages to confirm. It NEVER occurred in many months of using the 2.6.26 line.

Following an additional request on bugs.debian.org, I tried the 2.6.31-rc6 kernel, with the same result (it still resuspends).

Please ask me for whatever additional information or tests I can provide. Thanks in advance.
Comment 1 Zhang Rui 2009-09-01 01:01:04 UTC
If you enter S3 by running "echo mem > /sys/power/state", and then press the power button to resume, does the problem still exist?
Comment 2 Alexander Myodov 2009-09-01 09:12:20 UTC
Hmm, I've made several attempts and didn't hit the problem even once. Any hints as to what that may mean, and why it started only after upgrading to 2.6.29/30 (and goes away if I downgrade to 2.6.26)?
Comment 3 Alexander Myodov 2009-09-01 12:20:54 UTC
I mean, I've made several attempts with "echo mem > /sys/power/state", as you requested.
Comment 4 Zhang Rui 2009-09-02 01:29:58 UTC
the ata exceptions are still there in the above tests, right?

I think the problem is that your laptop receives another lid event on resume.

Run "lsof /proc/acpi/event" to see which process is reading this file,
kill that process, and close the lid. Does the laptop suspend?
Comment 5 Len Brown 2009-09-09 02:25:24 UTC
just to make sure I understand...

When you suspend by typing "echo mem > /sys/power/state"
and resume by using the power button, it always works
in all versions tested, and has no ata errors no matter
how many times it is invoked, yes?

There are two issues here, maybe independent.
Firstly, the LID events seem to be causing the double suspend.
Secondly, the ata error messages...
Comment 6 Alexander Myodov 2009-09-11 19:07:43 UTC
Zhang, it seems that since 2.6.30 (maybe even earlier), the Debian kernel team has stopped including support for /proc/acpi/event in the Debian kernels. Therefore, "lsof /proc/acpi/event" shows just the single acpid process under 2.6.26, and fails with "No such file or directory" under 2.6.30. But note that this doesn't stop "echo mem > /sys/power/state" from working under both kernels!

What is the current way of listening to ACPI events in 2.6.30 kernels, other than /proc/acpi/event? How can I find out what is causing the double suspend (if that is the root cause of the issue)? Is it worth notifying the Debian maintainers of the acpid and acpi-support packages, if the issue lies on the boundary between kernel and user-level ACPI support?
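One possible way to at least observe the lid state without /proc/acpi/event is sketched below. It assumes the ACPI button driver exposes a state file under /proc/acpi/button/lid/ containing a line such as "state:      open"; whether that file is present depends on how the kernel was built, so treat the path as an assumption. The helper names are hypothetical, and this only reads the current state; it does not listen for events.

```python
import glob

# Assumption: the ACPI button driver exposes the lid state at this location.
# The file may be absent, depending on the kernel configuration.
LID_STATE_GLOB = "/proc/acpi/button/lid/*/state"


def parse_lid_state(text):
    """Extract the value ('open' or 'closed') from text like 'state:      open'."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "state":
            return value.strip()
    return None


def read_lid_states(pattern=LID_STATE_GLOB):
    """Return {path: state} for every lid state file that matches the pattern."""
    states = {}
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                states[path] = parse_lid_state(handle.read())
        except OSError:
            # Unreadable file: record it, but do not guess a state.
            states[path] = None
    return states
```

Calling read_lid_states() right after a resume and again a moment later would show whether the reported state flips while the lid is physically open, which is one way to test the "second lid event" theory.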

Len, I tried it again in 2.6.30, and it seems to me that similar ATA/e1000e errors are STILL generated with "echo mem > /sys/power/state", but they do not prevent the laptop from waking up. Yes, these may be two independent issues, and the ata errors may be irrelevant to the double suspend.
Comment 7 Zhang Rui 2009-09-15 05:40:49 UTC
please run
"echo 0x00080044 > /sys/module/acpi/parameters/debug_layer"
"echo 0x88000107 > /sys/module/acpi/parameters/debug_level"
and attach the dmesg output after suspend and resume.
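The two echo commands above can also be scripted, which is handy when the test has to be repeated after every reboot. This is a minimal sketch: the directory and the two values are taken from the comment, while the function name set_acpi_debug and its behaviour of reporting missing files are my own choices. Writing to the real sysfs files requires root.

```python
import os

# Directory, file names and values are the ones requested in the comment above.
ACPI_PARAM_DIR = "/sys/module/acpi/parameters"
DEBUG_SETTINGS = {
    "debug_layer": "0x00080044",
    "debug_level": "0x88000107",
}


def set_acpi_debug(param_dir=ACPI_PARAM_DIR, settings=DEBUG_SETTINGS):
    """Write each value to its parameter file; return the names that were missing."""
    missing = []
    for name, value in settings.items():
        path = os.path.join(param_dir, name)
        if not os.path.exists(path):
            # Report the absent parameter instead of creating a stray file.
            missing.append(name)
            continue
        with open(path, "w") as handle:
            handle.write(value + "\n")
    return missing
```

A non-empty return value means the running kernel does not offer that parameter, in which case the dmesg output will not contain the extra ACPI debug messages.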
Comment 8 Zhang Rui 2009-09-28 03:46:25 UTC
ping...
Comment 9 Alexander Myodov 2009-09-28 09:52:29 UTC
Sorry for slow response. Seems like I don't have "CONFIG_ACPI_DEBUG=y" in stock Debian kernel config, hence I need to figure out how to properly recompile the kernel with these options, and to run the tests afterwards. I'll definitely reply as soon as I succeed.
Comment 10 Alexander Myodov 2009-10-06 00:20:30 UTC
Well, if it helps, I am attaching the dmesg output. But it seems to have been truncated, and increasing the buffer size for dmesg does not help much.
Comment 11 Alexander Myodov 2009-10-06 00:22:30 UTC
Created attachment 23277
Dmesg output after suspend-resume-resuspend-resume
Comment 12 Zhang Rui 2009-10-09 02:52:37 UTC
Please boot with log_buf_len=4M and reattach the dmesg output.
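Before collecting the log again, it may be worth confirming that the boot parameter actually reached the kernel. The sketch below reads /proc/cmdline and looks for log_buf_len; the helper names are hypothetical, and it only checks what was passed at boot, not the buffer size the kernel ended up allocating.

```python
def get_boot_param(cmdline, name):
    """Return the value of 'name=value' from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == name:
            return value
    return None


def read_log_buf_len(path="/proc/cmdline"):
    """Return the log_buf_len value the running kernel was booted with, if any."""
    with open(path) as handle:
        return get_boot_param(handle.read(), "log_buf_len")
```

If read_log_buf_len() returns None, the parameter was not on the command line and the dmesg output is likely to be truncated again.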
Comment 13 Alexander Myodov 2009-10-16 09:49:59 UTC
Created attachment 23427
Better dmesg output
Comment 14 Alexander Myodov 2009-10-16 09:51:15 UTC
Attached the new files.
I managed to capture "dmesg" in the second between the first resume and the moment it suspended again; that is the "1" file. The "2" file is the dmesg output after the second, successful resume.
Comment 15 Zhang Rui 2009-10-22 03:40:31 UTC
Please remove the ACPI button driver before S3 and see if the problem still exists.
Note that you may need to rebuild the kernel if the button driver is built in.
Comment 16 Zhang Rui 2009-12-04 03:48:02 UTC
ping ...
Comment 17 Alexander Myodov 2009-12-04 07:35:05 UTC
Please feel free to close this bug with something like INSUFFICIENT_DATA; I'm not sure whether this bug is fixed, but I'm afraid I am not able to do my part of the investigation right now. If I get a chance to collect more information, I'll reopen it, but in the meantime it should not spoil the statistics.
Comment 18 Zhang Rui 2009-12-07 01:43:51 UTC
Okay, closing this bug for now.
Comment 19 Peter Niederlag 2010-01-12 10:55:23 UTC
I was struck by the same issue, which is reported on several trackers.

For me the solution "purge uswsusp" (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=554046) solved the issue on Debian.
Comment 20 Alexander Myodov 2010-01-14 20:10:06 UTC
Purging "uswsusp" didn't help in my case; neither did purging "apmd" or the outdated "kpowersave" and "klaptopdaemon". Purging "acpi-support" (as also suggested over there) stopped my laptop from suspending automatically at all after closing the lid, so I reinstalled it. I wonder what else could be purged while I am on the purging spree.
