Bug 6840

Summary: HPA needs to be reinitialized on resume
Product: IO/Storage
Reporter: Lee Trager (lt73)
Component: IDE
Assignee: Rafael J. Wysocki (rjwysocki)
Status: CLOSED CODE_FIX
Severity: high
CC: eric, eric, federico, forrestwenner, pavel, rjwysocki
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.17-Present
Subsystem:
Regression: ---
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216
Attachments:
  My attempt at fixing the hpa problem - its broken
  Working patch against 2.6.18-rc4
  Patch using correct Linux design
  Patch for 2.6.21

Description Lee Trager 2006-07-15 18:35:03 UTC
Most recent kernel where this bug did not occur: Unknown
Distribution: Gentoo 2006.0
Hardware Environment: IBM Thinkpad T40
IDE Controller: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
Harddrive: Hitachi IC25N040ATCS05-0
Software Environment:
GLIBC: 2.4
GCC: 4.1.1
Problem Description:
Upon resuming my laptop from sleep mode or hibernate mode, my hard drive shows
full usage and my dmesg is filled with this:

ide: failed opcode was: unknown
hda: task_out_intr: status=0x51 { DriveReady SeekComplete Error }
hda: task_out_intr: error=0x10 { SectorIdNotFound }, LBAsect=74853287,
sector=74853287 

I checked the drive with the IBM hardware analysis tools as well as the Hitachi
Drive Fitness extended test and both say the drive is fine. What is strange is
that according to hdparm my drive only has 72037362 sectors while this error is
on sector 74853287.
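
The mismatch noted above can be checked with a little arithmetic. The following
is a minimal Python sketch, not part of the original report: the helper names
`failing_lba` and `sectors_past_end` are made up for illustration, and the
512-byte sector size is an assumption. It extracts the LBA from the dmesg line
and shows how far it lies beyond the sector count that hdparm reports.

```python
import re

SECTOR_SIZE = 512  # bytes per sector (assumed; not stated in the report)


def failing_lba(dmesg_line):
    """Return the LBA from an IDE error line, or None if there is none."""
    match = re.search(r"LBAsect=(\d+)", dmesg_line)
    return int(match.group(1)) if match else None


def sectors_past_end(lba, visible_sectors):
    """How far past the last visible sector an LBA lies (0 if in range).

    Valid LBAs run from 0 to visible_sectors - 1.
    """
    return max(0, lba - (visible_sectors - 1))


# Values taken from the report above.
line = "hda: task_out_intr: error=0x10 { SectorIdNotFound }, LBAsect=74853287,"
visible = 72037362  # sector count reported by hdparm

lba = failing_lba(line)
excess = sectors_past_end(lba, visible)
print(f"LBA {lba} is {excess} sectors past the visible end "
      f"(about {excess * SECTOR_SIZE / 2**20:.0f} MiB)")
```

A request for an LBA beyond the visible capacity is what one would expect if
part of the disk is hidden from the host, which fits the HPA explanation that
comes up later in this thread.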

This error happens with and without wireless, DMA, and ALSA. The kernel option
in ATA/ATAPI/MFM/RLL support named IDEDISK_MULTI_MODE does not fix the error. I
have also tested this with the mainline kernel, suspend2 sources, and gentoo
sources, and this error happens on all of them.

Steps to reproduce:
1. Put computer into sleep mode or hibernate mode
2. Wake the computer up
Comment 1 Pawel Golik 2006-07-16 05:27:46 UTC
I have the same problem with my ATA (not SATA) drives. Hardware: GigaByte
K8NS-939 MB with nForce3 chipset (amd74xx ide driver) with Athlon 64 3000+, 120
GB SATA drive (/dev/sda, root), 80 GB Samsung ATA drive (/dev/hda, secondary HD),
NEC ATAPI DVDRW drive (/dev/hdc). 2.6.16 kernel (gentoo). The computer
suspends to RAM (ACPI S3) fine and wakes up, but the ATA devices become unusable
and eventually lock up the system. The SATA HDD resumes correctly and continues
to work, but attempting to use the ATA HDD or ATAPI DVDRW gives many timeout
errors and usually (always with the HDD, sometimes with the DVDRW) locks up the
system (NumLock and CapsLock LEDs flash, system completely unresponsive).
Examples of errors:
Jul 15 00:40:17 [kernel] hdc: ide_intr: huh? expected NULL handler on exit
Jul 15 00:40:17 [kernel] hdc: ATAPI reset complete
Jul 15 00:41:53 [kernel] hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jul 15 00:41:53 [kernel] hdc: cdrom_decode_status: error=0x44 { AbortedCommand LastFailedSense=0x04 }
Jul 15 00:41:53 [kernel] ide: failed opcode was: unknown
Jul 15 00:41:59 [kernel] hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jul 15 00:41:59 [kernel] hdc: cdrom_decode_status: error=0x44 { AbortedCommand LastFailedSense=0x04 }
Jul 15 00:41:59 [kernel] ide: failed opcode was: unknown
Jul 15 00:42:06 [kernel] hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jul 15 00:42:06 [kernel] hdc: cdrom_decode_status: error=0x44 { AbortedCommand LastFailedSense=0x04 }
Jul 15 00:42:06 [kernel] ide: failed opcode was: unknown
Jul 15 00:42:12 [kernel] hdc: cdrom_decode_status: status=0x51 { DriveReady SeekComplete Error }
Jul 15 00:42:12 [kernel] hdc: cdrom_decode_status: error=0x44 { AbortedCommand LastFailedSense=0x04 }
Jul 15 00:42:12 [kernel] ide: failed opcode was: unknown
Jul 15 00:42:12 [kernel] hdc: DMA disabled
Jul 15 00:42:12 [kernel] hdc: ide_intr: huh? expected NULL handler on exit
Jul 15 00:42:12 [kernel] hdc: ATAPI reset complete
Jul 15 00:42:12 [kernel] ISO 9660 Extensions: Microsoft Joliet Level 3
Jul 15 00:42:12 [kernel] ISO 9660 Extensions: RRIP_1991A
Jul 15 00:42:42 [kernel] hdc: tray open
Jul 15 00:42:42 [kernel] end_request: I/O error, dev hdc, sector 64
Jul 15 00:42:42 [kernel] Buffer I/O error on device hdc, logical block 8
Jul 15 00:42:42 [kernel] hdc: tray open
Jul 15 00:42:42 [kernel] end_request: I/O error, dev hdc, sector 64
Jul 15 00:42:42 [kernel] Buffer I/O error on device hdc, logical block 8
Jul 15 00:42:42 [kernel] hdc: tray open
Jul 15 00:42:42 [kernel] end_request: I/O error, dev hdc, sector 64
Jul 15 00:42:42 [kernel] Buffer I/O error on device hdc, logical block 8
Jul 15 00:42:42 [kernel] hdc: tray open
Jul 15 00:42:42 [kernel] end_request: I/O error, dev hdc, sector 64
Jul 15 00:42:42 [kernel] Buffer I/O error on device hdc, logical block 8

(hdc is the DVDRW)

Jul  4 11:40:25 [kernel] hda: dma_timer_expiry: dma status == 0x21

(hda is the ATA HDD)

Re-setting DMA with hdparm does not help. Restarting the dbus service locks up
the system. Unloading the ide_cd and ide_disk modules before suspend doesn't
help either; the system locks up when attempting to reload them after resume.
Comment 2 Eric Johnson 2006-08-01 13:08:10 UTC
I also see this bug but I have a configuration almost identical to the original
reporter.  Apparently I don't know my way around the system well enough to
capture information while my system is in the midst of locking up, so I'm glad
the original poster figured it out.

Adding myself as a CC so I can track/test the fix when it rolls out.
Comment 3 Lee Trager 2006-08-01 20:11:38 UTC
I've spoken to a developer of the suspend2 project and apparently this is a
problem with the ide driver. I've joined the linux-ide mailing list and I'll try
to figure out a fix sometime next week; work's a little crazy this week.
Comment 4 Daniel Russel 2006-08-16 08:46:15 UTC
I have this bug too with a Thinkpad T41 after installing Fedora 5. However, I
did not experience it with Fedora 4 even when running the same kernel build (I
use Volker Braun's thinkpad kernel rpms rather than the redhat ones and tried
all the old kernels he had available after experiencing the bug).
Comment 5 Lee Trager 2006-08-16 16:56:41 UTC
Someone on the Linux Kernel IDE mailing list suggested I try libata, which is
included in the mm sources. There was a kernel bug unrelated to this one that
prevented me from testing it. While libata may be a solution, I'm not sure how
great of a solution it is. libata is not stable, and for production use it's not
the best idea. Does anyone know of a way to fix the current driver?
Comment 6 Pavel Machek 2006-08-17 02:12:19 UTC
Do you have host-protected-area enabled?
Comment 7 Pavel Machek 2006-08-17 02:15:24 UTC
Hmm, this is strange:

Most recent kernel where this bug did not occur: Unknown

T40 definitely worked with suspend-to-disk before. Try a few different (vanilla!)
versions and see where it broke.
Comment 8 Lee Trager 2006-08-17 02:24:56 UTC
Honestly, I used this laptop more as a mobile desktop for a year or so. I'll try
the early 2.6 series. Also, I did try this with 2.6.17.x-2.6.18-rcx. I tried the
vanilla, mm-sources, git-sources, gentoo-sources, and knoppix live cd (just to
make sure it wasn't a config error); same problem on all of them.
Comment 9 Lee Trager 2006-08-17 20:28:06 UTC
The oldest kernel I could compile was 2.6.12; that probably has something to do
with my gcc version (4.1.1) or glibc version (2.4). Anyway, the bug is in 2.6.12
as well. How do I know if the host protected area is enabled?
Comment 10 Lee Trager 2006-08-17 20:51:09 UTC
OK, I googled around for it and disabled it in my BIOS, and I get the same thing.
Comment 11 Lee Trager 2006-08-19 01:15:39 UTC
Alan Cox on the LKML suggested that the problem is that the kernel does not
restore the HPA setting on resume. According to him, the only two fixes are to
reformat the drive with HPA turned off or to patch the kernel. I attempted to
make a patch against 2.6.18-rc4, but all it does now is keep the machine from
ever coming out of sleep mode. I'll post the patch here in case someone wants to
play with it, but I'll keep trying to get this to work.
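
To illustrate the explanation attributed to Alan Cox above, here is a toy Python
model. It is only a sketch under assumptions: the class `ToyDrive` and the
function `resume` are hypothetical, it assumes the driver exposes the full
native capacity at boot and that the drive re-applies its HPA limit when power
returns after suspend, and it is unrelated to the actual patches attached to
this bug.

```python
class ToyDrive:
    """Toy model of a disk with a Host Protected Area (not real ATA code)."""

    def __init__(self, native_sectors, hpa_sectors):
        self.native_sectors = native_sectors  # full physical capacity
        self.hpa_sectors = hpa_sectors        # sectors hidden at the end
        self.power_cycle()

    def power_cycle(self):
        # Assumption: after power-up the drive hides the HPA again.
        self.visible_sectors = self.native_sectors - self.hpa_sectors

    def set_max_to_native(self):
        # Stand-in for the driver lifting the HPA limit.
        self.visible_sectors = self.native_sectors

    def write(self, lba):
        # Any access at or beyond the visible limit is rejected by the drive.
        if lba >= self.visible_sectors:
            raise IOError(f"SectorIdNotFound, LBAsect={lba}")


def resume(drive, reinit_hpa):
    drive.power_cycle()            # assumption: suspend cut power to the drive
    if reinit_hpa:
        drive.set_max_to_native()  # the step a fix would have to repeat


drive = ToyDrive(native_sectors=1000, hpa_sectors=100)
drive.set_max_to_native()  # boot: the driver exposes the whole disk
drive.write(950)           # works; a filesystem may now use this range

resume(drive, reinit_hpa=False)
try:
    drive.write(950)
except IOError as err:
    print("after resume without reinit:", err)

resume(drive, reinit_hpa=True)
drive.write(950)           # works again once the limit is lifted on resume
```

In this model the errors only appear for sectors inside the hidden range, which
would match a failing LBA larger than the sector count hdparm shows.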
Comment 12 Lee Trager 2006-08-19 01:16:42 UTC
Created attachment 8833 [details]
My attempt at fixing the hpa problem - its broken

It would be great if someone could shed some light on why this isn't working.
Comment 13 Lee Trager 2006-08-20 17:48:15 UTC
I've created a patch that fixes the problem. I submitted it to the LKML and am
waiting for it to be included in 2.6.18.
Comment 14 Lee Trager 2006-08-20 17:48:49 UTC
Created attachment 8838 [details]
Working patch against 2.6.18-rc4
Comment 15 Lee Trager 2006-08-21 17:17:21 UTC
Renamed for better description.
Comment 16 Pavel Machek 2006-08-24 01:18:07 UTC
Patch looks okay to me, can you push it through usual channels?

(I guess we can CODE_FIX it now, and close it when the patch is merged?)
Comment 17 Eric Johnson 2006-08-24 01:18:27 UTC
I am going to be traveling Wednesday, 8/23, and next Monday (8/28).

On Thursday (8/24) and Friday (8/25), I will be on East Coast time, and for the most part I will not be working.  I may be checking my email.  I should be able to respond to brief phone conversations if needed, so please don't hesitate to call me on my cell phone (650-454-6982) if you need me.

Otherwise, I'll be back in the office on Tuesday, 8/29.

-Eric.
Comment 18 Lee Trager 2006-08-24 12:52:47 UTC
I posted the patch to the Linux-ide mailing list and haven't gotten much response.
Should I post it to the Linux Kernel mailing list? This is my first patch, so I'm
not sure what the proper process is.
Comment 19 Anonymous Emailer 2006-08-25 12:53:44 UTC
Reply-To: pavel@ucw.cz

Hi!

> I posted the patch to the Linux-ide mailing list and haven't gotten much response.
> Should I post it to the Linux Kernel mailing list? This is my first patch, so I'm
> not sure what the proper process is.

Just mail the patch to B.Zolnierkiewicz@elka.pw.edu.pl, cc:
linux-ide@vger.kernel.org, cc: linux-kernel, cc: andrew morton.

									Pavel
Comment 20 Lee Trager 2006-08-28 19:00:44 UTC
Created attachment 8892 [details]
Patch using correct Linux design

This is a redo of the previous patch, since the old one didn't follow correct
Linux design and thus would never get into the kernel (heh, it's my first patch).
Anyway, can someone please post if this patch works correctly for them?

Thanks,

Lee
Comment 21 Eric Johnson 2006-09-27 16:54:00 UTC
For what it is worth, I have applied the latest patch here to my latest stable
gentoo kernel sources (2.6.17-r8), and have not had a problem with sleep since then.
Comment 22 Lee Trager 2006-09-27 18:06:47 UTC
This was in one of the mm-sources (forgot which version) but was taken out
because apparently it made some systems unable to resume from sleep. On a side
note, about a week ago my dad came home with an IBM Thinkpad T40 (same as mine)
that had Fedora Core 5 installed on it, with HPA (according to the boot dmesg).
Resume works fine on this laptop using the redhat sources. I have been unable to
check out the redhat sources on my laptop since I just started college.
Hopefully when I get some time I'll try it out. It would be great if some other
people could test this on different laptops, even if they don't have the problem.
Comment 23 Pavel Kysilka 2006-11-14 12:07:55 UTC
Hi Lee,

I have the same problem with HPA. I will retest your latest patch this week on
my ThinkPad A21m.
Comment 24 Eric Sandall 2007-04-13 12:09:28 UTC
Lee, the patch from Comment #20 fixes this problem for me, many thanks!
Comment 25 Eric Sandall 2007-04-26 09:17:58 UTC
Resume fails in 2.6.20 as well, but the patch does not apply cleanly to test it:
root@thunk:[linux-2.6.21]# patch -p1 < ../ibm-hpa.patch 
patching file include/linux/ide.h
Hunk #1 succeeded at 1005 (offset 18 lines).
patching file drivers/ide/ide.c
Hunk #1 FAILED at 1229.
Hunk #2 succeeded at 1281 (offset 37 lines).
1 out of 2 hunks FAILED -- saving rejects to file drivers/ide/ide.c.rej
patching file drivers/ide/ide-disk.c
Comment 26 Lee Trager 2007-05-01 00:12:14 UTC
Created attachment 11355 [details]
Patch for 2.6.21

I updated the patch for 2.6.21 and have tested it for a day, it works just the
same as the old one does. If this patch continues to work for you guys tell me
and I'll resubmit it to the kernel.

As for libata, I'll try to port this patch over when I get a chance, currently
I'm swamped at school and don't have the time to figure it out.
Comment 27 Eric Johnson 2007-05-01 09:22:00 UTC
For each "stable" release of the "gentoo-sources" of the kernel, I have applied
this patch, and resume works.  That's for 2.6.17 - 2.6.20, which I just upgraded
to yesterday.  Absent the patch, resume from sleep eventually triggers failures.
Comment 28 Eric Sandall 2007-05-02 09:11:14 UTC
The patch in Comment #26 applies cleanly to 2.6.21 and 2.6.21.1 and allows me to
resume after suspending, thanks Lee!
Comment 29 Lee Trager 2007-05-12 13:50:40 UTC
I've submitted the patch for inclusion in the kernel on the LKML (linux-ide).
We'll see if it gets in.
Comment 30 Rafael J. Wysocki 2007-05-30 11:13:48 UTC
Can you please give us a link to the patch?
Comment 31 Rafael J. Wysocki 2007-05-31 09:52:20 UTC
Sorry, I meant a link to the LKML post with the patch, if that's not a problem.
Comment 32 Rafael J. Wysocki 2007-06-18 10:24:14 UTC
Lee's patch was merged before 2.6.22-rc5, AFAICS.

I'm closing the bug, please reopen if necessary.