Bug 5901

Summary: Resume from suspend-to-memory worked in 2.6.12 but broken in 2.6.15
Product: Power Management
Reporter: Ross Boswell (drb)
Component: APM
Assignee: Bartlomiej Zolnierkiewicz (bzolnier)
Status: REJECTED INVALID
Severity: normal
CC: zach
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.15
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
  Revert APM idle code to 2.6.14
  Revert my 2.6.15 GDT changes
  Output from dmesg

Description Ross Boswell 2006-01-15 19:45:46 UTC
Most recent kernel where this bug did not occur: 2.6.12.1
Distribution: Debian Sarge
Hardware Environment: Toshiba Portege 4010
Software Environment: 
Problem Description: "apm -s" still works, but on resume the screen is blank
(backlight on) and hard disk spinning.  

Steps to reproduce:  apm -s; then after suspend press power switch
Comment 1 Zachary Amsden 2006-01-16 09:58:36 UTC
Do you mean the released 2.6.15 kernel, or a -git tree?

There is one obvious bug introduced into APM code between 2.6.12 and 2.6.15; SMP
systems should never call the APM BIOS on CPUs other than zero.  The code that
enforced this was removed; perhaps it is worked around elsewhere, but I did not see it. 
Nevertheless, it is not your bug.

I am suspicious of the changes to apm_console_blank in the latest -git tree. 
The code breaks out of the loop early if an error code of APM_NOT_ENGAGED is
returned, whereas it used to try extra hard to blank or unblank despite any
returned error code.
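
The difference described above can be illustrated with a small simulation. This is a minimal Python sketch, not the kernel's apm_console_blank(): the device IDs, the numeric value of APM_NOT_ENGAGED and the quirky_bios() function are assumptions made up for illustration, and whatever the kernel does after leaving the loop is not modelled.

```python
# Illustrative simulation only; not the kernel's apm_console_blank().
APM_SUCCESS = 0x00
APM_NOT_ENGAGED = 0x0B  # assumed value, for illustration only


def blank_try_all(set_power_state, devices, state):
    """Older behaviour as described: keep trying every device despite errors."""
    tried = []
    for dev in devices:
        tried.append(dev)
        if set_power_state(dev, state) == APM_SUCCESS:
            return True, tried
    return False, tried


def blank_break_early(set_power_state, devices, state):
    """Newer behaviour as described: leave the loop on APM_NOT_ENGAGED."""
    tried = []
    for dev in devices:
        tried.append(dev)
        error = set_power_state(dev, state)
        if error == APM_SUCCESS:
            return True, tried
        if error == APM_NOT_ENGAGED:
            break
    return False, tried


def quirky_bios(dev, state):
    """Hypothetical BIOS: rejects the first device ID, accepts the others."""
    return APM_NOT_ENGAGED if dev == 0x100 else APM_SUCCESS


DEVICES = [0x100, 0x1FF, 0x101]  # hypothetical display device IDs

print(blank_try_all(quirky_bios, DEVICES, "standby"))
print(blank_break_early(quirky_bios, DEVICES, "standby"))
```

With this made-up BIOS the first variant succeeds on the second device, while the second variant gives up after the first one.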

So it is very important to know if this code is in your kernel, since it sounds
like it might explain your symptoms (and it is quite possible the Toshiba BIOS
does not implement the APM spec perfectly - most APM implementations are
notoriously buggy).

It also looks like a patch of mine was misapplied to the latest -git tree; my
diffs assumed that only CPU-0 would be calling the APM BIOS, and some change in
between appears to have violated that assumption.  Again, probably not your bug.
Comment 2 Ross Boswell 2006-01-16 10:12:31 UTC
The kernel in which the bug is manifest is the released 2.6.15 version.  
The laptop has (of course) only one CPU.

FWIW, ACPI resume from suspend-to-memory in 2.6.15 also fails in the same way if
the resume is made after some delay -- blank screen with backlight on and disk
spinning.  If resume is done within a few seconds (~10sec) then it often
succeeds.  I haven't tried ACPI in prior kernels, so don't know if this
phenomenon is new or old.  Combinations of s3_bios and s3_mode as given in
Documentation/power/video.txt don't help.  
Comment 3 Zachary Amsden 2006-01-16 10:24:18 UTC
What does your APM config look like?

Specifically, do you have CONFIG_APM_ALLOW_INTS turned on?  This option can be
dangerous in either setting, but you can try toggling it.

It sounds like you are hitting a more fundamental console blanking problem than
the APM code.  I am testing APM blanking now on 2.6.15, but I have only one APM
BIOS implementation to test against.

Zach
Comment 4 Ross Boswell 2006-01-16 11:56:51 UTC
CONFIG_APM_ALLOW_INTS was OFF.  Changing it to ON made no difference.
But in the process I discovered I had described the symptoms incorrectly.  

On resume from APM suspend-to-RAM in kernel 2.6.15 the console is not blank, it
is live showing the same display as was there at suspend.  The disk is spinning
and there is no response from the OS to keystrokes, but the BIOS seems to be
handling them correctly (eg Fn/F10 turn on the numeric keypad light).  

Sorry for the incorrect information previously.  I was confusing APM symptoms
with ACPI symptoms.  
Comment 5 Zachary Amsden 2006-01-17 16:57:26 UTC
I seem to be able to reproduce an APM regression in 2.6.15 as well.  Console
blanking works with my APM BIOS in 2.6.14, but not in 2.6.15.  2.6.14 + my APM
GDT patches appears to have no problem.  I am testing suspend now, although I
can't guarantee much luck with that, as I am not convinced our suspend to RAM
does anything remotely similar to what your BIOS does.

This seems to highlight the changes to apm_do_idle as a potential problem source.

Comment 6 Zachary Amsden 2006-01-17 16:59:43 UTC
Created attachment 7053 [details]
Revert APM idle code to 2.6.14

Could you try this patch and see if it fixes the problem?
Comment 7 Ross Boswell 2006-01-17 22:43:32 UTC
No, reverting to 2.6.14 code for APM idle did not change the visible behaviour.  
Comment 8 Zachary Amsden 2006-01-18 09:51:44 UTC
2.6.14 and 2.6.15 both with and without my patch seem to suspend to RAM fine for
me.  To rule out my APM segment changes in 2.6.15, please try the following patch.
Comment 9 Zachary Amsden 2006-01-18 09:56:25 UTC
Created attachment 7066 [details]
Revert my 2.6.15 GDT changes

Can you see if reverting my APM GDT changes fixes the bug?
Comment 10 Ross Boswell 2006-01-21 11:29:51 UTC
Zach

Sorry for the delay in replying -- I'm currently travelling. 

Reverting APM GDT changes to 2.6.14 did not change the behaviour.  
The Portege 4010 still hangs on resume from suspend-to-RAM.

Cheers -- Ross
Comment 11 Ross Boswell 2006-01-22 19:50:20 UTC
Further information:-

My default setup is to load the kernel with "quiet" command-line parameter.  If
I boot the 2.6.15 kernel without that parameter, then the hang on resume from
apm suspend gives a continuous spool of error messages (see below).  

If I boot the 2.6.15 kernel without loading modules, then sometimes apm resume
works OK; sometimes it gives a burst of error messages then resumes.  

The error message appears to come from ide which is built-in to my standard
kernel, not loaded as a module.  The modules I routinely load are:

---8<------8<------8<------8<------8<------8<------8<------8<------8<---
#
ohci_hcd
#
snd_trident
snd_ali5451
snd_pcm_oss
#
yenta_socket
#
hermes
orinoco
orinoco_cs
#
serial_core
8250
slhc
ppp_generic
zlib_inflate
zlib_deflate
ppp_deflate
ppp_async
#
ntfs
vfat
loop
#
sd_mod
usb_storage
#
---8<------8<------8<------8<------8<------8<------8<------8<------8<---

The error message on eventually-successful resume is:

---8<------8<------8<------8<------8<------8<------8<------8<------8<---
nomad:~# apm -s
PCI: Found IRQ 11 for device 0000:00:10.0
PCI: Sharing IRQ 11 with 0000:01:00.0
PCI: Found IRQ 11 for device 0000:00:11.0
PCI: Sharing IRQ 11 with 0000:00:12.0
PCI: Found IRQ 11 for device 0000:00:11.1
nomad:~# hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
ide0: reset: success

nomad:~#
---8<------8<------8<------8<------8<------8<------8<------8<------8<---

The "PCI:" messages are given on all resumes.  If the resume fails, the sequence
 hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
 hda: task_in_intr: error=0x04 { DriveStatusError }
 ide: failed opcode was: unknown
repeats indefinitely until hard reset or power cycle.  
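
For readers unfamiliar with these messages, the names in braces are just the set bits of the drive's status byte. The following is a minimal Python sketch of that decoding, assuming the standard ATA status register bit layout; the function name decode_status() is hypothetical and this is not the IDE driver's own code.

```python
# Bit names as they appear in the IDE messages quoted above; the bit
# positions assume the standard ATA status register layout.
BUSY = 0x80
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# In the error register, bit 0x04 is the "command aborted" bit, which the
# log above prints as DriveStatusError.
ERROR_ABORTED = 0x04


def decode_status(value):
    """Return the names of the flags set in an ATA status byte."""
    if value & BUSY:
        # While the drive is busy the other bits are not meaningful.
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if value & mask]


print(hex(0x59), decode_status(0x59))
print("aborted" if 0x04 & ERROR_ABORTED else "not aborted")
```

So status=0x59 means the drive is ready, has data pending and is flagging an error, and error=0x04 says the command was aborted.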
Comment 12 Zachary Amsden 2006-01-23 09:46:35 UTC
Thanks for the update.  It appears my changes are not at fault.  Have you tried
narrowing the interval between kernel versions?  You can binary search the last
working kernel version, then binary search off the commit list until you find
the point of failure.  It looks like maybe suspending with an IDE interrupt
pending is causing some trouble?
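
The binary search suggested above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the release list and the test results are made up, first_bad() is a hypothetical helper, and in practice each test means building and booting a kernel and running "apm -s" by hand (git bisect automates the same search at commit level).

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that once the bug appears it is present in every later version.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Hypothetical data: only the two endpoints come from this report;
# the results for the releases in between are invented for the example.
releases = ["2.6.12", "2.6.13", "2.6.14", "2.6.15"]
pretend_broken = {"2.6.15"}

print(first_bad(releases, lambda v: v in pretend_broken))
```

Each step halves the remaining range, so even a long commit list needs only a handful of test boots.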
Comment 13 Andrew Morton 2006-01-23 13:28:55 UTC
Bart, we think this is an IDE problem.
Comment 14 Bartlomiej Zolnierkiewicz 2006-01-24 00:25:21 UTC
Ross, could you send output of 'dmesg' command?
Comment 15 Ross Boswell 2006-01-24 10:57:36 UTC
Created attachment 7120 [details]
Output from dmesg

Thanks for your interest in this bug.  
Output from dmesg attached.  
The last 20 or so lines (following EXT3) result from  suspend-resume.  

Cheers -- Ross
Comment 16 Bartlomiej Zolnierkiewicz 2006-01-24 14:46:05 UTC
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 66MHz system bus speed for PIO modes
> Probing IDE interface ide0...
> hda: IC25N030ATCS04-0, ATA DISK drive
> Probing IDE interface ide1...
> hdc: HL-DT-STDVD-ROM GDR8081N, ATAPI CD/DVD-ROM drive
> Probing IDE interface ide2...
> Probing IDE interface ide3...
> Probing IDE interface ide4...
> Probing IDE interface ide5...
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide1 at 0x170-0x177,0x376 on irq 15
> hda: max request size: 128KiB
> hda: 58605120 sectors (30005 MB) w/1768KiB Cache, CHS=58140/16/63
> hda: cache flushes not supported
>  hda: hda1 hda2 hda3 hda4
> hdc: ATAPI 24X DVD-ROM drive, 512kB Cache

You are using ide-generic driver instead of proper driver for your chipset.
Generic driver has very limited suspend/resume support - it doesn't know how to
reprogram IDE chipset and devices during resume...
Comment 17 Ross Boswell 2006-01-25 02:06:40 UTC
Problem fixed by configuring kernel with ALi15x3 IDE driver instead of generic
driver.  
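
For anyone hitting the same symptoms, a quick check of the kernel .config may help. The sketch below is a minimal Python example; the option names CONFIG_IDE_GENERIC and CONFIG_BLK_DEV_ALI15X3 are assumptions about how the generic and ALi15x3 drivers are named in this kernel's Kconfig, so verify them against your own tree. The sample text is invented for the example.

```python
def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in a .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


# Assumed symbol names; check drivers/ide/Kconfig in your kernel tree.
GENERIC = "CONFIG_IDE_GENERIC"
CHIPSET = "CONFIG_BLK_DEV_ALI15X3"

# Invented sample resembling the misconfiguration discussed in this bug.
sample = """\
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_ALI15X3 is not set
"""

opts = enabled_options(sample)
if CHIPSET not in opts:
    print("chipset-specific IDE driver not enabled")
```

To check a real configuration, read the text from the .config file in the kernel build directory instead of using the sample string.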

Thanks Zach and Bartlomiej for your help and your patience.

Cheers -- Ross