Bug 6262 - SATA hard disk with AHCI mode lost after S3
Summary: SATA hard disk with AHCI mode lost after S3
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: i386 Linux
Importance: P2 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Duplicates: 6260 6261
Depends on:
Blocks:
 
Reported: 2006-03-21 01:24 UTC by Austin Yuan
Modified: 2007-04-28 12:49 UTC

See Also:
Kernel Version: 2.6.16-rc6
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
kernel configuration file (19.64 KB, text/plain)
2006-03-21 01:41 UTC, Austin Yuan
DSDT table (149.72 KB, text/plain)
2006-03-21 01:51 UTC, Austin Yuan
error message after resuming from S3 (10.67 KB, text/plain)
2006-03-21 01:52 UTC, Austin Yuan
output of lspci -v (7.78 KB, text/plain)
2006-03-21 01:53 UTC, Austin Yuan
bios information (10.46 KB, text/plain)
2006-03-21 01:54 UTC, Austin Yuan

Description Austin Yuan 2006-03-21 01:24:23 UTC
Most recent kernel where this bug did not occur: N/A

Distribution: Redflag 5.0 with 2.6.16-rc6 kernel

Hardware Environment:
CPU: P4 3.6G
MB/Chipset: 945GNTR with on-board Gfx/Audio/Lan
(http://www.intel.com/products/motherboard/d945gnt/index.htm)
Hard-disk: ST3120026AS, 120G

Software Environment:
Redflag 5.0 and 2.6.16-rc6 kernel with an AHCI suspend/resume patch from
http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2

Problem Description:
With the SATA controller in AHCI mode, the hard disk is lost after resuming
from S3.


Steps to reproduce:
1. Set the SATA mode in the BIOS to AHCI (the other two modes are "IDE" and
"RAID").

2. Get the 2.6.16-rc6 kernel and apply the patch from
http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2

3. Configure the kernel with minimal components (remove module, USB, 1394,
audio and other unnecessary support, but keep SATA/AHCI, ACPI and network
support), then build and install it.

4. Boot the new kernel into console mode (init 3).

5. Log into the system from a remote machine over ssh.

6. Enter S3 with "echo mem > /sys/power/state", then press the power button to
resume (using vbetool to re-post the display).

7. The display is normal and the ssh session is still alive, but the hard disk
is lost. (There are many error messages like "sd 0:0:0:0: SCSI error: return
code = 0x40000".)
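
The return code in step 7 can be unpacked by hand. The following is only a
minimal sketch, assuming the usual Linux SCSI result layout in which the host
byte occupies bits 16-23 of the result word. The helper names are hypothetical
and are not kernel functions; DID_BAD_TARGET is the value from the kernel's
include/scsi/scsi.h.

```c
#include <stdint.h>

/* Host-byte code as defined in the kernel's include/scsi/scsi.h. */
#define DID_BAD_TARGET 0x04

/* Hypothetical helper: extract the host byte (bits 16-23) of a SCSI result. */
static inline unsigned int result_host_byte(uint32_t result)
{
    return (result >> 16) & 0xff;
}

/* Hypothetical helper: extract the raw low byte (bits 0-7) of a SCSI result. */
static inline unsigned int result_low_byte(uint32_t result)
{
    return result & 0xff;
}
```

Under that assumption, result_host_byte(0x40000) evaluates to 0x04, which
matches DID_BAD_TARGET, and the low byte is zero. That would be consistent
with the host side reporting that the target is gone rather than the disk
returning an error status of its own.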
Comment 1 Austin Yuan 2006-03-21 01:31:42 UTC
If the SATA mode is set to "IDE" in the BIOS, the hard disk resumes from S3
without problems.
Comment 2 Austin Yuan 2006-03-21 01:41:56 UTC
Created attachment 7621
kernel configuration file
Comment 3 Austin Yuan 2006-03-21 01:51:41 UTC
Created attachment 7622
DSDT table
Comment 4 Austin Yuan 2006-03-21 01:52:18 UTC
Created attachment 7623
error message after resuming from S3
Comment 5 Austin Yuan 2006-03-21 01:53:16 UTC
Created attachment 7624
output of lspci -v
Comment 6 Austin Yuan 2006-03-21 01:54:13 UTC
Created attachment 7625
bios information
Comment 7 Diego Calleja 2006-03-21 02:36:51 UTC
*** Bug 6260 has been marked as a duplicate of this bug. ***
Comment 8 Diego Calleja 2006-03-21 02:37:13 UTC
*** Bug 6261 has been marked as a duplicate of this bug. ***
Comment 9 Jiang, Brendan 2006-03-23 01:57:37 UTC
I reproduced this bug and found that the AHCI interface stays busy
(PxTFD.STS.BSY=1) after resuming, so enabling DMA fails. I then tried to
recover by resetting the port (using ahci_restart_port), but it didn't help.
AHCI spec v1.1 section 10.4 lists three reset levels, of which port reset is
the second. I haven't tried an HBA reset (the deepest level) yet.
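
For reference, the busy condition above can be expressed as a small check. This
is a minimal sketch rather than driver code: the macro and function names are
hypothetical, while the offsets and bit positions follow the AHCI register
layout (port register blocks start at 0x100 and are 0x80 bytes each, PxTFD is
at offset 0x20 within a block, and STS is the low byte of PxTFD with BSY in
bit 7 and DRQ in bit 3).

```c
#include <stdbool.h>
#include <stdint.h>

/* Names below are illustrative only; the values follow the AHCI spec. */
#define PORT_REGS_BASE 0x100u /* offset of port 0 registers in the HBA memory */
#define PORT_REGS_SIZE 0x80u  /* size of each port register block */
#define PX_TFD         0x20u  /* PxTFD offset inside a port block */

#define TFD_STS_ERR    0x01u  /* PxTFD.STS.ERR */
#define TFD_STS_DRQ    0x08u  /* PxTFD.STS.DRQ */
#define TFD_STS_BSY    0x80u  /* PxTFD.STS.BSY */

/* Byte offset of PxTFD for a given port, relative to the HBA memory base. */
static inline uint32_t port_tfd_offset(unsigned int port)
{
    return PORT_REGS_BASE + port * PORT_REGS_SIZE + PX_TFD;
}

/* True if the device still reports busy in the task file status. */
static inline bool tfd_is_busy(uint32_t tfd)
{
    return (tfd & TFD_STS_BSY) != 0;
}

/* The command engine should only be started once BSY and DRQ are both clear. */
static inline bool tfd_idle_for_start(uint32_t tfd)
{
    return (tfd & (TFD_STS_BSY | TFD_STS_DRQ)) == 0;
}
```

A resume path could read the 32-bit value at port_tfd_offset(port) and refuse
to start the engine while tfd_idle_for_start() is false, escalating to a
deeper reset level instead.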

I also found that the ahci driver's suspend code has changed a lot in the
upstream tree of libata-dev.git, including improved error recovery (reset)
functions. I tried the upstream kernel tree plus the patch
(http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2)
but got an oops similar to (http://lkml.org/lkml/2006/1/7/108). It seems
that this oops has not been fixed. It is not a bug in ahci itself, but it is
triggered when the ahci driver enters an error handling path.

Does anybody have any good ideas on this?
Comment 10 Jiang, Brendan 2006-03-24 03:33:11 UTC
Regarding the oops in the libata-dev git tree, I found that it is triggered by
(though not rooted in) a failure of ahci_start_engine(). For ATA ports with no
SATA device connected, ahci_start_engine() returns an error when called via
ahci_init_one()->ata_device_add()->ata_host_add()->ahci_port_start() during
kernel initialization at boot. ata_host_add() then calls scsi_host_put(),
which causes the oops. When ahci_start_engine() is forced to return 0, the
system boots normally.
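
The failure path described above can be illustrated with a toy model. This is
only a sketch of the control flow as reported in this comment, not the libata
code: every toy_* name is hypothetical, and the tolerate_empty flag stands in
for the "force ahci_start_engine() to return 0" experiment.

```c
#include <stdbool.h>

/* Toy stand-in for a port; not a libata structure. */
struct toy_port {
    bool device_present;
    bool engine_running;
};

/* Models the reported behavior: engine start fails when no device is attached. */
static int toy_start_engine(struct toy_port *p)
{
    if (!p->device_present)
        return -1;
    p->engine_running = true;
    return 0;
}

/* With tolerate_empty set, a failed engine start is not treated as fatal. */
static int toy_port_start(struct toy_port *p, bool tolerate_empty)
{
    int rc = toy_start_engine(p);

    if (rc != 0 && tolerate_empty)
        return 0;
    return rc;
}

/* Starts every port; the first error aborts host setup. */
static int toy_host_add(struct toy_port *ports, int n, bool tolerate_empty)
{
    for (int i = 0; i < n; i++) {
        if (toy_port_start(&ports[i], tolerate_empty) != 0)
            return -1; /* the real code tears the host down at this point */
    }
    return 0;
}
```

In this model, a host with one populated and one empty port fails setup when
tolerate_empty is false and succeeds when it is true, which mirrors the
observation that forcing a zero return lets the system boot.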

The libata-dev git tree has many changes and improvements to HBA/port/software
reset. At last, I can use it as a starting point :-)
Comment 11 Rafael J. Wysocki 2006-10-26 10:07:50 UTC
Can you please verify whether the problem is still present in the recent -rc
kernels (e.g. 2.6.19-rc3)?
Comment 12 Rafael J. Wysocki 2006-11-17 09:52:06 UTC
I think this problem has been fixed.  Please reopen if this is not the case.
