Most recent kernel where this bug did not occur: N/A
Distribution: Redflag 5.0 with 2.6.16-rc6 kernel
Hardware Environment:
CPU: P4 3.6G
MB/Chipset: 945GNTR with on-board Gfx/Audio/Lan (http://www.intel.com/products/motherboard/d945gnt/index.htm)
Hard-disk: ST3120026AS, 120G
Software Environment: Redflag 5.0 and 2.6.16-rc6 kernel with an AHCI suspend/resume patch from http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2

Problem Description: A SATA hard disk in AHCI mode is lost after resuming from S3.

Steps to reproduce:
1. Set the SATA mode in the BIOS to AHCI (the other two modes are "IDE" and "RAID").
2. Get the latest 2.6.16-rc6 kernel and apply the patch from http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2
3. Configure the kernel with minimal components (remove support for modules, USB, 1394, audio and other unnecessary components, but include SATA/AHCI, ACPI and network support), then build and install it.
4. Boot the new kernel into console mode (init 3).
5. Log into the system from a remote machine via ssh.
6. Enter S3 with "echo mem > /sys/power/state", then press the power button to resume (using vbetool to re-POST the display).
7. The display is then normal and the ssh session is still alive, but the hard disk is lost. (There are many error messages like "sd 0:0:0:0: SCSI error: return code = 0x40000".)
If the SATA mode is set to "IDE" in the BIOS, the hard disk resumes from S3 without problems.
Created attachment 7621 [details] kernel configuration file
Created attachment 7622 [details] DSDT table
Created attachment 7623 [details] error message after resuming from S3
Created attachment 7624 [details] output of lspci -v
Created attachment 7625 [details] bios information
*** Bug 6260 has been marked as a duplicate of this bug. ***
*** Bug 6261 has been marked as a duplicate of this bug. ***
I reproduced this bug and found that the AHCI interface stays busy (PxTFD.STS.BSY=1) after resuming, so enabling DMA fails. I then tried to reset the port to recover (using ahci_restart_port), but it didn't help. AHCI spec v1.1 section 10.4 lists three reset levels, of which port reset is the second. I haven't tried HBA reset (the deepest level) yet.

I also found that the ahci driver's suspend code has changed a lot in the libata-dev.git upstream tree, including improved error recovery (reset) functions. I tried the upstream kernel tree plus the patch (http://marc.theaimsgroup.com/?l=linux-kernel&m=114122220923417&w=2) but got an oops similar to the one at http://lkml.org/lkml/2006/1/7/108. It seems that this oops has not been fixed. It is not a bug in ahci itself, but it is exposed by the ahci driver when it enters an error handling path. Does anybody have a good idea about this?
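
To make the busy check and the reset levels mentioned above concrete, here is a minimal sketch in plain C. It is not taken from the ahci driver. The bit positions follow the ATA status byte that AHCI exposes in PxTFD.STS, and the three levels follow AHCI 1.1 section 10.4. The names tfd_is_busy() and next_reset_level() are hypothetical helpers made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the status byte held in PxTFD.STS (bits 7:0 of PxTFD). */
#define TFD_STS_ERR (1u << 0)
#define TFD_STS_DRQ (1u << 3)
#define TFD_STS_BSY (1u << 7)

/* Reset levels from AHCI 1.1 section 10.4, mildest first. */
enum reset_level {
    RESET_NONE = 0,
    RESET_SOFTWARE,
    RESET_PORT,
    RESET_HBA
};

/* True if the task file status reports BSY, as seen after resume. */
static inline bool tfd_is_busy(uint32_t tfd)
{
    return (tfd & TFD_STS_BSY) != 0;
}

/*
 * Hypothetical escalation policy: if the port is still busy after the
 * last attempted level, suggest the next deeper one, capped at HBA reset.
 */
static inline enum reset_level next_reset_level(uint32_t tfd,
                                                enum reset_level last_tried)
{
    if (!tfd_is_busy(tfd))
        return RESET_NONE;              /* nothing more to do */
    if (last_tried >= RESET_HBA)
        return RESET_HBA;               /* already at the deepest level */
    return (enum reset_level)(last_tried + 1);
}
```

With a status of 0x80 (BSY set) after a port reset, this policy would suggest HBA reset next, which is the step not yet tried in this comment.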
Regarding the oops in the libata-dev git tree: I found that it is triggered by a failure of ahci_start_engine(), though that is not the root cause. For ATA ports with no SATA device connected, ahci_start_engine() returns an error when called via ahci_init_one()->ata_device_add()->ata_host_add()->ahci_port_start() during kernel initialization at boot. ata_host_add() then calls scsi_host_put(), which causes the oops. When ahci_start_engine() is forced to return 0, the system boots normally. The libata-dev git tree has many changes and improvements to HBA/port/software reset. At last I can use it as a starting point :-)
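
As an illustration of the empty-port case described above, the sketch below shows one possible way to tell "no device attached" apart from a real failure, instead of forcing a return value of 0 everywhere. This is an assumption for discussion, not what the driver does. It relies only on the PxSSTS.DET field (bits 3:0 of PxSSTS), where the value 3 means a device is present and Phy communication is established. The helpers port_has_device() and port_start_result() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* PxSSTS.DET occupies bits 3:0 of the port's SStatus register. */
#define SSTS_DET_MASK        0xFu
#define SSTS_DET_PRESENT_PHY 0x3u   /* device present, Phy link established */

/* True if the port reports an attached device with an established link. */
static inline bool port_has_device(uint32_t ssts)
{
    return (ssts & SSTS_DET_MASK) == SSTS_DET_PRESENT_PHY;
}

/*
 * Hypothetical policy: an engine start failure on an empty port is not
 * treated as fatal, while a failure on a populated port is passed through.
 */
static inline int port_start_result(uint32_t ssts, int engine_rc)
{
    if (engine_rc != 0 && !port_has_device(ssts))
        return 0;                   /* empty port: ignore the failure */
    return engine_rc;               /* populated port: report as-is */
}
```

Whether ignoring the failure on empty ports is the right fix, or whether the error path in ata_host_add() should be repaired instead, is a separate question that this sketch does not answer.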
Can you please verify whether the problem is still present in recent -rc kernels (e.g. 2.6.19-rc3)?
I think this problem has been fixed. Please reopen if this is not the case.