Bug 14329

Summary: Sata disk doesn't wake up after S3 suspend
Product: IO/Storage
Reporter: frodone
Component: Serial ATA
Assignee: Tejun Heo (tj)
Status: CLOSED CODE_FIX
Severity: normal
CC: rjw, tj, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31 --> 2.6.32-rc3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 13615, 14616
Attachments:
messages
Output of hdparm -I sda/b/c
Output of hdparm -I sdd
Output of dmesg as requested
messages
Output of lspci -nn
nv-hardreset-resume.patch
dmesg with suspend/resume sequence

Description frodone 2009-10-05 22:58:08 UTC
Created attachment 23274 [details]
messages

After suspend, one of my secondary disks doesn't resume. It works OK under kernel 2.6.30.9 and earlier. I have 4 disks connected to the same Nvidia SATA controller: 3 identical Maxtor drives (sda/b/c) and 1 Hitachi (sdd), and the Hitachi is the one that doesn't wake up.
Attached are the relevant lines from /var/log/messages and the output of hdparm -I.
Thank you.
Comment 1 frodone 2009-10-05 22:59:46 UTC
Created attachment 23275 [details]
Output of hdparm -I sda/b/c
Comment 2 frodone 2009-10-05 23:01:22 UTC
Created attachment 23276 [details]
Output of hdparm -I sdd
Comment 3 Rafael J. Wysocki 2009-10-05 23:45:28 UTC
Can you attach a complete boot log from 2.6.32-rc3, please?
Comment 4 frodone 2009-10-06 13:28:07 UTC
Created attachment 23282 [details]
Output of dmesg as requested
Comment 5 frodone 2009-10-06 13:35:04 UTC
Created attachment 23283 [details]
messages

More output, this time captured with kernel 2.6.32-rc3.
Comment 6 ykzhao 2009-10-12 14:47:14 UTC
Will you please confirm whether the box works well on the 2.6.31 kernel?
If it does, will you please use git bisect to identify the bad commit that caused the regression?
Thanks.
Comment 7 frodone 2009-10-12 19:40:26 UTC
(In reply to comment #6)
> Will you please confirm whether the box works well on the 2.6.31 kernel?
> If it does, will you please use git bisect to identify the bad commit
> that caused the regression?
> Thanks.

OK, I bisected it down to this:

7f4774b38ee6270bbc6c3015cb3fa6c415ffb340 is the first bad commit
commit 7f4774b38ee6270bbc6c3015cb3fa6c415ffb340
Author: Tejun Heo <tj@kernel.org>
Date:   Wed Jun 10 16:29:07 2009 +0900

    sata_nv: use hardreset only for post-boot probing

...
Comment 8 Rafael J. Wysocki 2009-10-12 20:59:48 UTC
First-Bad-Commit : 7f4774b38ee6270bbc6c3015cb3fa6c415ffb340
Comment 9 Tejun Heo 2009-10-13 02:03:45 UTC
Reset protocol implementations on these nv chips are complete jokes.  Good that we aren't gonna see more of them in the future.  :-(

One workaround I can think of is to fall back to hardreset if this is the last reset try for the attached device.  As failure will lead to device disablement anyway, there isn't much to lose.  The problem is that such a workaround would still be visible in your case as extra delay during resume.  Better than losing a disk over a suspend/resume cycle, but still...
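
For illustration only, the fallback idea described above can be sketched in plain C. This is not the attached patch and not libata code: `choose_reset`, `resets_until_alive`, `struct fake_dev`, and the `RESET_*` names are all hypothetical, and the device is simulated by two flags. The sketch assumes softreset is tried on every attempt except the last one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reset kinds; illustrative names, not libata's. */
enum reset_kind { RESET_SOFT, RESET_HARD };

/* Simulated device: which reset methods bring it back after resume. */
struct fake_dev {
    bool soft_works;
    bool hard_works;
};

/*
 * Pick the reset method for one attempt (0-based).
 * Softreset is used until the final try; the final try falls back to
 * hardreset, since failing it would disable the device anyway.
 */
enum reset_kind choose_reset(int attempt, int max_tries)
{
    assert(max_tries > 0 && attempt >= 0 && attempt < max_tries);
    return (attempt == max_tries - 1) ? RESET_HARD : RESET_SOFT;
}

/*
 * Run the retry loop against the simulated device.
 * Returns the number of attempts used, or -1 if every attempt failed
 * (the point at which the device would be disabled).
 */
int resets_until_alive(const struct fake_dev *dev, int max_tries)
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        enum reset_kind kind = choose_reset(attempt, max_tries);
        bool ok = (kind == RESET_HARD) ? dev->hard_works : dev->soft_works;
        if (ok)
            return attempt + 1;
    }
    return -1;
}
```

A simulated device that only responds to hardreset comes back on the final attempt, so with three tries it needs all three. That corresponds to the extra resume delay mentioned above: the earlier softreset attempts have to fail first.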

Can you please attach the output of "lspci -nn"?

Thanks.
Comment 10 frodone 2009-10-13 12:08:39 UTC
Created attachment 23387 [details]
Output of lspci -nn
Comment 11 Tejun Heo 2009-10-13 13:50:02 UTC
Created attachment 23388 [details]
nv-hardreset-resume.patch

Does this patch make any difference?  Regardless of its success, can you please attach the full dmesg output, including both the boot and resume messages, with this patch applied?

Thanks.
Comment 12 frodone 2009-10-13 15:50:34 UTC
Created attachment 23389 [details]
dmesg with suspend/resume sequence

It works!
If you need more testing, I'm here.

Thank you.
Comment 13 Tejun Heo 2009-10-14 02:19:03 UTC
Alright, thanks.  Patch forwarded upstream.  Resolving as CODE_FIX.
Comment 14 Rafael J. Wysocki 2009-10-26 19:20:26 UTC
Fixed by commit 6489e3262e6b188a1a009b65e8a94b7aa17645b7.