Bug 6262

Summary: SATA hard disk with AHCI mode lost after S3
Product: IO/Storage
Reporter: Austin Yuan (yuanshengquan)
Component: Serial ATA
Assignee: Rafael J. Wysocki (rjwysocki)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16-rc6
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
  kernel configuration file
  DSDT table
  error message after resuming from S3
  output of lspci -v
  bios information

Description Austin Yuan 2006-03-21 01:24:23 UTC
Most recent kernel where this bug did not occur: N/A

Distribution: Redflag 5.0 with 2.6.16-rc6 kernel

Hardware Environment:
CPU: P4 3.6G
MB/Chipset: 945GNTR with on-board Gfx/Audio/Lan
(http://www.intel.com/products/motherboard/d945gnt/index.htm)
Hard-disk: ST3120026AS, 120G

Software Environment:
Redflag 5.0 and 2.6.16-rc6 kernel with an AHCI suspend/resume patch from
http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2

Problem Description:
The SATA hard disk is lost after resuming from S3 when the controller is in
AHCI mode.


Steps to reproduce:
1. Set the SATA mode in the BIOS to AHCI (the other two modes are "IDE" and "RAID")

2. Get the latest 2.6.16-rc6 kernel and apply the patch from
http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2

3. Configure the kernel with minimal components (removing module, USB, 1394,
audio and other unnecessary component support, but including SATA/AHCI, ACPI
and network support), then build and install it.

4. Boot the new kernel into console mode (init 3)

5. Log into the system from a remote machine over ssh

6. Enter S3 with "echo mem > /sys/power/state", then press the power button to
resume (using vbetool to re-post the display)

7. The display is then normal and the ssh session is still alive, but the hard
disk is lost (there are many error messages like "sd 0:0:0:0: SCSI error:
return code = 0x40000").
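
The return code in the error message above can be split into its component
bytes. What follows is only a minimal sketch, assuming the usual Linux SCSI
mid-layer layout of the 32-bit result word (status, message, host and driver
bytes, from low to high). The ex_* helper names are hypothetical and are not
kernel functions; the DID_* names and values are the host byte codes as I
understand them from the kernel's scsi.h.

```c
#include <stdint.h>

/* Assumed layout of the 32-bit SCSI result word, low byte to high byte:
 * status byte, message byte, host byte, driver byte. */
static inline unsigned ex_status_byte(uint32_t result) { return result & 0xffu; }
static inline unsigned ex_msg_byte(uint32_t result)    { return (result >> 8) & 0xffu; }
static inline unsigned ex_host_byte(uint32_t result)   { return (result >> 16) & 0xffu; }
static inline unsigned ex_driver_byte(uint32_t result) { return (result >> 24) & 0xffu; }

/* Map a host byte to its DID_* name (hypothetical helper, partial table). */
static inline const char *ex_host_byte_name(unsigned host)
{
    switch (host) {
    case 0x00: return "DID_OK";
    case 0x01: return "DID_NO_CONNECT";
    case 0x02: return "DID_BUS_BUSY";
    case 0x03: return "DID_TIME_OUT";
    case 0x04: return "DID_BAD_TARGET";
    case 0x05: return "DID_ABORT";
    case 0x06: return "DID_PARITY";
    case 0x07: return "DID_ERROR";
    case 0x08: return "DID_RESET";
    default:   return "unknown";
    }
}
```

Under that assumption, 0x40000 has a host byte of 0x04 (DID_BAD_TARGET) and
all other bytes zero, which would fit a device that is no longer reachable
rather than a command the device itself rejected.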
Comment 1 Austin Yuan 2006-03-21 01:31:42 UTC
If the SATA mode is set to "IDE" in the BIOS, the hard disk resumes from S3
without problems.
Comment 2 Austin Yuan 2006-03-21 01:41:56 UTC
Created attachment 7621
kernel configuration file
Comment 3 Austin Yuan 2006-03-21 01:51:41 UTC
Created attachment 7622
DSDT table
Comment 4 Austin Yuan 2006-03-21 01:52:18 UTC
Created attachment 7623
error message after resuming from S3
Comment 5 Austin Yuan 2006-03-21 01:53:16 UTC
Created attachment 7624
output of lspci -v
Comment 6 Austin Yuan 2006-03-21 01:54:13 UTC
Created attachment 7625
bios information
Comment 7 Diego Calleja 2006-03-21 02:36:51 UTC
*** Bug 6260 has been marked as a duplicate of this bug. ***
Comment 8 Diego Calleja 2006-03-21 02:37:13 UTC
*** Bug 6261 has been marked as a duplicate of this bug. ***
Comment 9 Jiang, Brendan 2006-03-23 01:57:37 UTC
I reproduced this bug and found that the ahci interface stays busy
(PxTFD.STS.BSY=1) after resuming, so enabling DMA fails. I then tried to
recover by resetting the port (using ahci_restart_port), but it didn't help.
Section 10.4 of the AHCI spec v1.1 lists three reset levels, of which port
reset is the second. I haven't tried an HBA reset (the deepest level) yet.
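
As a rough illustration of the check and the escalation described above, here
is a minimal sketch in plain C. It assumes the PxTFD layout from the AHCI spec
(STS in bits 7:0, with BSY as bit 7) and the three reset levels of section
10.4 in order of depth (software reset, port reset, HBA reset). All ex_*
names are hypothetical; this is not the ahci driver's code, and it only
decodes a register value that has already been read.

```c
#include <stdbool.h>
#include <stdint.h>

/* PxTFD (Port x Task File Data): STS occupies bits 7:0, ERR bits 15:8. */
#define EX_PXTFD_STS_MASK 0x000000ffu
#define EX_STS_BSY        0x80u   /* interface busy */
#define EX_STS_DRQ        0x08u   /* data transfer requested */
#define EX_STS_ERR        0x01u   /* error during transfer */

/* Reset levels as listed in AHCI 1.1 section 10.4, shallowest first. */
enum ex_reset_level {
    EX_RESET_SOFTWARE = 1,
    EX_RESET_PORT     = 2,
    EX_RESET_HBA      = 3
};

/* True if the raw PxTFD value reports PxTFD.STS.BSY=1. */
static inline bool ex_port_busy(uint32_t pxtfd)
{
    return ((pxtfd & EX_PXTFD_STS_MASK) & EX_STS_BSY) != 0;
}

/* Advance *level to the next deeper reset. Returns false, leaving *level
 * unchanged, when the HBA reset (the deepest level) was already reached. */
static inline bool ex_escalate(enum ex_reset_level *level)
{
    if (*level >= EX_RESET_HBA)
        return false;
    *level = (enum ex_reset_level)(*level + 1);
    return true;
}
```

In these terms, the situation in this comment is ex_port_busy() still
returning true after a reset at EX_RESET_PORT, with EX_RESET_HBA as the one
remaining level to try.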

I also found that the ahci driver's suspend code has changed a lot in the
libata-dev.git tree's upstream tree, including improved error recovery (reset)
functions. I tried the upstream kernel tree plus the patch
(http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2)
but got an oops similar to (http://lkml.org/lkml/2006/1/7/108). It seems
that this oops has not been fixed. It is not a bug in ahci itself, but it is
exposed by the ahci driver when it enters an error handling path.

Anybody got any good idea on this?
Comment 10 Jiang, Brendan 2006-03-24 03:33:11 UTC
Regarding the oops in the libata-dev git tree, I found that it is triggered
by (though not rooted in) the failure of ahci_start_engine(). For an ata port
with no sata device connected, ahci_start_engine() returns an error when
called via
ahci_init_one()->ata_device_add()->ata_host_add()->ahci_port_start() during
kernel initialization at boot. ata_host_add() then calls scsi_host_put(),
which causes the oops. When ahci_start_engine() is forced to return 0, the
system boots normally.

The libata-dev git tree has many changes/improvements to HBA/port/software
reset. At last I can use it as a starting point :-)
Comment 11 Rafael J. Wysocki 2006-10-26 10:07:50 UTC
Can you please verify whether the problem is still present in recent -rc
kernels (e.g. 2.6.19-rc3)?
Comment 12 Rafael J. Wysocki 2006-11-17 09:52:06 UTC
I think this problem has been fixed.  Please reopen if this is not the case.