Bug 49331

Summary: Resume from suspend fails for sda
Product: IO/Storage Reporter: Dimitris Damigos (damigos)
Component: Serial ATAAssignee: power-management_other
Severity: normal CC: aaron.lu, alan, bahramwhh, damigos, florian, jgarzik
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.6rc1-3.6.6 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 56331    
Attachments: lspci output
dmesg output
Possible problematic patch
Possible problematic patch 2
Possible problematic patch 3
Debug patch for SDD on top of 3.5.7
Debug patch for SDD on top of git commits
dmesg of original 3.5.7
dmesg of problematic kernel
DSDT table
Fix resume problem by restoring the ACPI disable functionality

Description Dimitris Damigos 2012-10-22 19:49:17 UTC
I have Arch install on a Sony VAIO VPCCW1S1E. After upgrade to 3.6.2 kernel resume from suspend doesn't work. It seems that the hard disk fails to resume and I get input/output errors. Up to 3.5.6 resume worked fine. I have attached dmesg and lspci output.
Comment 1 Dimitris Damigos 2012-10-22 19:50:01 UTC
Created attachment 84361 [details]
lspci output
Comment 2 Dimitris Damigos 2012-10-22 19:50:36 UTC
Created attachment 84371 [details]
dmesg output
Comment 3 Bahram BahramBeigy 2012-10-25 19:59:18 UTC
same problem here using kernel 3.6.3-1-ARCH and VAIO laptop and similar hardware setup
Comment 4 Dimitris Damigos 2012-11-14 17:04:54 UTC
I have just tried kernel 3.6rc1 and the bug is present.
Comment 5 Jeff Garzik 2012-11-16 04:29:00 UTC
Added Aaron Lu to CC, in case he has ACPI ideas.  Definitely looks like ACPI is pooping itself.
Comment 6 Aaron Lu 2012-11-19 01:56:43 UTC
Hello Dimitris,

Is it possible for you to test the following 3 git commits? See if any of them breaks your system resume, thanks.

febe53ba6b781862c12686c7ea1972bdedee457a libata-acpi: set acpi state for SATA port

30dcf76acc695cbd2fa919e294670fe9552e16e7 libata: migrate ACPI code over to new bindings

6b66d95895c149cbc04d4fac5a2f5477c543a8ae libata: bind the Linux device tree to the ACPI device tree
Comment 7 Dimitris Damigos 2012-11-19 10:52:10 UTC
Hi Aaron. First I would like to say that I don't know how to use git, so I did it by hand :-)

I started with 3.5.7 sources and changed the relevant files of  6b66d95895c149cbc04d4fac5a2f5477c543a8ae, built it successfully and the bug didn't occur.

Then I passed 30dcf76acc695cbd2fa919e294670fe9552e16e7 and it couldn't build. I had to use 092369efbd6ef6b4a215741ce9f65446bf45beff mfd: lpc_ich: Fix a 3.5 kernel regression for iTCO_wdt driver and then it built successfully and the bug appeared.
Comment 8 Aaron Lu 2012-11-20 03:17:09 UTC
Hello Dimitris,

I don't think these patches have anything to do with the mfd: lpc_ich patch...
So I extracted the patches I think may cause problem, you can apply them on top of 3.5.7.

Comment 9 Aaron Lu 2012-11-20 03:18:32 UTC
Created attachment 86651 [details]
Possible problematic patch
Comment 10 Aaron Lu 2012-11-20 03:19:07 UTC
Created attachment 86661 [details]
Possible problematic patch 2
Comment 11 Aaron Lu 2012-11-20 03:19:29 UTC
Created attachment 86671 [details]
Possible problematic patch 3
Comment 12 Dimitris Damigos 2012-11-20 11:54:49 UTC
Hello Aaron,

I tried the patches you extracted and the problematic one is:
cb76c3a197af1e5264f185ad46b79599d5d0e709 libata: migrate ACPI code over to new bindings
As for the build problem with mfd: lpc_ich it occured because of a specific patch that Arch applies to its kernel and I hadn't noticed, so don't take account of it.
Thank you for your time and effort.

Thank you for the quick lesson in git ;-)
Comment 13 Dimitris Damigos 2012-11-20 11:58:16 UTC
I meant 30dcf76acc695cbd2fa919e294670fe9552e16e7 libata: migrate ACPI code over to new
Comment 14 Aaron Lu 2012-11-21 14:05:40 UTC
Hi Dimitris,

Please attach your DSDT table.

You can get the DSDT table like this:
$ sudo cp /sys/firmware/acpi/tables/DSDT .
Then attach it here.

And also, I want to make sure in kernel 3.5.7, _SDD actually gets executed. Please apply the attached sdd_debug_good.patch on top of 3.5.7 and then attach the dmesg after a suspend/resume cycle.

And I've attached sdd_debug_bad.patch, you can apply it on top of the 3 patches I attached yesterday(the problematic patch1/2/3). And please attach the dmesg after a suspend/resume cycle. I want to know if on probe, _SDD executes correctly, but only failed on resume.

A quick and dirty hack to try to recover from the failed resume would be something like this:

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index c6f0101..84b729e 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -904,8 +904,8 @@ int ata_acpi_on_devcfg(struct ata_device *dev)
 	/* do _SDD if SATA */
 	if (acpi_sata) {
 		rc = ata_acpi_push_id(dev);
-		if (rc && rc != -ENOENT)
-			goto acpi_err;
+		//if (rc && rc != -ENOENT)
+		//	goto acpi_err;
 	/* do _GTF */

You can try this to see if you can get an usable system after a suspend/resume cycle.

Comment 15 Aaron Lu 2012-11-21 14:06:47 UTC
Created attachment 86851 [details]
Debug patch for SDD on top of 3.5.7
Comment 16 Aaron Lu 2012-11-21 14:07:28 UTC
Created attachment 86861 [details]
Debug patch for SDD on top of git commits
Comment 17 Dimitris Damigos 2012-11-21 16:14:09 UTC
Hi Aaron,

First I would like to mention that with your last patch I managed to suspend/resume correctly.

I have also attached all the files you requested.
Comment 18 Dimitris Damigos 2012-11-21 16:17:45 UTC
Created attachment 86871 [details]
dmesg of original 3.5.7
Comment 19 Dimitris Damigos 2012-11-21 16:18:50 UTC
Created attachment 86881 [details]
dmesg of problematic kernel
Comment 20 Dimitris Damigos 2012-11-21 16:21:30 UTC
Created attachment 86891 [details]
DSDT table
Comment 21 Dimitris Damigos 2012-11-21 16:24:36 UTC
Something I would like to add that I have forgotten. In order for suspend to work I had always been using "acpi_sleep=nonvs" if that makes any difference.
Comment 22 Aaron Lu 2012-11-22 09:23:59 UTC
Hi Dimitris,

Thanks for the info.

I think the cause has been found: Before commit 30dcf76acc695cbd2fa919e294670fe9552e16e7 libata: migrate ACPI code over
to new bindings, if the ata_acpi_on_devcfg failed twice(either due to ata_acpi_push_id or ata_acpi_exec_tfs), the acpi related function will be disabled by NULL the device->acpi_handle field, so next time ata_acpi_on_devcfg is called, it will simply return. With that bad commit, this is no longer the case, so the ata_acpi_on_devcfg keeps failing(to be precise, it is ata_acpi_push_id always fails) and finally made ATA core thinks the device is not usable and gets disabled.

So the solution can be simply revert to previous behaviour, but I have a question: should _SDD be executed multiple times after the pre-defined event, like COMRESET occured? The ACPI spec doesn't have clear words on this, and from your dmesg, the ata_acpi_push_id succeeded the first time it is invoked after resume, but all failed in the next calls. But during boot time, _SDD always succeeded no matter how many times it is called.

And BTW, there is no _SDD in DSDT table, please attach all the SSDT* tables, thanks.
Comment 23 Dimitris Damigos 2012-11-22 14:28:31 UTC
Hi Aaron,

Here are the tables you asked for. I hope your question is not for me, because I don't have a clue what you are talking about :-)
Comment 24 Dimitris Damigos 2012-11-22 14:29:12 UTC
Created attachment 87001 [details]
Comment 25 Dimitris Damigos 2012-11-22 14:29:34 UTC
Created attachment 87011 [details]
Comment 26 Dimitris Damigos 2012-11-22 14:29:50 UTC
Created attachment 87021 [details]
Comment 27 Dimitris Damigos 2012-11-22 14:30:13 UTC
Created attachment 87031 [details]
Comment 28 Dimitris Damigos 2012-11-22 14:30:30 UTC
Created attachment 87041 [details]
Comment 29 Dimitris Damigos 2012-11-22 14:30:48 UTC
Created attachment 87051 [details]
Comment 30 Aaron Lu 2012-11-23 00:29:53 UTC
Hi Dimitris,

Yeah, I was asking Tejun/Jeff/Alan if they have a clue about this :-)
Comment 31 Aaron Lu 2012-11-26 05:43:08 UTC
Created attachment 87221 [details]
Fix resume problem by restoring the ACPI disable functionality

Hi Dimitris Damigos,

Please test the attached patch, it should be able to fix your problem, and I'll also send it to the mailing list, if everything works, please reply with a Tested-by tag, thanks.

And about the ACPI table, I'm afraid there isn't much I can do about it, it is not easy to see what the firmware people has done to this _SDD method. If possible, I think you can contact Sony about this problem and see if they want to help:
'The _SDD control method will fail the 2nd time it is invoked after system resume from S3.'
And to be precise, this failure has nothing to do with kernel version, that is, in v3.5.7, it has the same failure. So this patch will ignore such failures as v3.5 kernel does, and your system will be able to resume, only ATA ACPI functionality will be lost, which is not a big deal for SATA disks.
Comment 32 Dimitris Damigos 2012-11-26 15:52:56 UTC
(In reply to comment #31)
> Created an attachment (id=87221) [details]
> Fix resume problem by restoring the ACPI disable functionality

Tested-by: Dimitris Damigos <damigos@freemail.gr>

Hi Aaron,
Your patch fixes my problem.
Thank you for your work!
Comment 33 Aaron Lu 2012-12-05 09:00:43 UTC
Patch taken by Jeff, so I think we can close this bug, but I don't know how to...
Comment 34 Dimitris Damigos 2012-12-05 13:09:25 UTC
I did it. Thank you again...
Comment 35 Bahram BahramBeigy 2012-12-15 09:20:40 UTC
I have tried both 3.6.10 and 3.7 versions of Linux kernel and my laptop problem isn't solved ! 
Dimitris is your problem solved in the latest kernel versions ?!
Comment 36 Dimitris Damigos 2012-12-15 09:38:53 UTC
I don't know anything about the release model of linux kernel, but I can see that the commit is in linux-next tree, so I believe we have to wait for 3.8.
Or you can compile your own kernel with this patch included...