Bug 8247

Summary: S3: resume fails - 2.6.21 regression
Product: ACPI
Reporter: Tobias Doerffel (tobias.doerffel)
Component: Power-Sleep-Wake
Assignee: Len Brown (lenb)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, anton, rjwysocki
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.21-rc5
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
  Output of dmesg after resume (made using SSH-login)
  My kernel-configuration (also tried with less drivers etc. - same result)
  Output of acpidump
  Hang machine with patch 20070126
  dmesg-output when booting with apic=debug
  output of dmesg after resume

Description Tobias Doerffel 2007-03-21 06:18:36 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.20.3
Distribution: Debian
Hardware Environment: Acer Aspire 5652 (Core Duo, 1 GB RAM, Intel
82801G/ICH7-chipset)
Software Environment: Desktop
Problem Description:

Suspend to RAM used to work fine on my computer up to 2.6.20.3. But no matter
which rc of 2.6.21 I use, suspend to RAM doesn't work anymore. Up to rc3 even
suspending stopped at "suspending console", which apparently has been fixed in
rc4. I tried rc4-git4 with a minimal config (no dynticks, no HRT, no MSI, no
sound, no bluetooth, no PCMCIA, no WLAN, no USB, no cpufreq) but I still can't
resume properly. Caps Lock works and I can log in through SSH. With a more
complete config (sound, MMC, WLAN, PCMCIA - still no dynticks or HRT - see
attachment "config") I get exactly the same behaviour.

When logged in through SSH after resume I saved the output of dmesg (which
includes full power management debug messages), see attachment
"dmesg-resume". The system basically seems to be back, but a lot of things do
not work, such as loading/unloading e.g. my WLAN driver (ipw3945), running
"top" or "dstat" etc. "uptime" always returns 0 min, even with power
management debug disabled.

Kernel:
Linux version 2.6.21-rc4 (gcc version 4.1.2 20061115 (prerelease) (Debian 
4.1.1-21)) #23 SMP PREEMPT Mon Mar 19 12:27:56 CET 2007


A complete bisect between 2.6.20 and 2.6.21-rc4-git4 stops at a stage
(a4bbb810dedaecf74d54b16b6dd3c33e95e1024c) where I'm not able to compile the
kernel anymore because of ACPI-related compile errors in
arch/i386/kernel/setup.c. After stepping back a few revisions until it
compiled again, resume didn't work either.

So I started all over again with a bisect restricted to arch/i386 and ended up
at ceb6c46839021d5c7c338d48deac616944660124 as the bad commit. But this commit
seems to be some kind of finalization of a series of patches ("ACPICA: Remove
duplicate table manager")...


Steps to reproduce:

echo mem > /sys/power/state

and power on the machine again by pressing a key or pushing the power button.
Comment 1 Tobias Doerffel 2007-03-21 06:20:59 UTC
Created attachment 10894 [details]
Output of dmesg after resume (made using SSH-login)
Comment 2 Tobias Doerffel 2007-03-21 06:22:01 UTC
Created attachment 10896 [details]
My kernel-configuration (also tried with less drivers etc. - same result)
Comment 3 Tobias Doerffel 2007-03-21 06:23:10 UTC
Created attachment 10897 [details]
Output of acpidump
Comment 4 Len Brown 2007-03-21 18:51:25 UTC
> I can login through SSH.

So the console doesn't come back up?
What if you use a text console instead of X -- does that work?
(ie. suspend/resume from S3, not from S5?)
Sounds like resume sort of worked, but something basic is broken,
like system time.  What do you see with

date
sleep 10
date
after resume?
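
The date / sleep 10 / date check can also be scripted so that the result is a single number. The following is a minimal sketch in Python, assuming an interpreter is available on the affected machine; the function name `wall_clock_elapsed` is illustrative and not part of any tool mentioned in this report.

```python
import time

def wall_clock_elapsed(interval=10.0):
    """Sleep for `interval` seconds and return how far the wall clock moved."""
    before = time.time()   # wall-clock time, comparable to running `date`
    time.sleep(interval)   # comparable to `sleep 10`
    after = time.time()
    return after - before
```

On a healthy system `wall_clock_elapsed(10)` should return roughly 10. A value near 0, or a call that never returns, would point at broken timekeeping after resume, which would fit the "uptime always returns 0 min" symptom described above.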
Comment 5 Len Brown 2007-03-21 18:52:42 UTC
also, see if this patch helps on top of rc4:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/2.6.21/acpi-release-20070126-2.6.21-rc4.diff.bz2
Comment 6 Tobias Doerffel 2007-03-22 16:06:05 UTC
First of all, your patch does not work at all :( The computer just hangs on
boot. I took a photo of it, see attachment. I also tried with "pci=nomsi"
and "vga=normal". Anything else I could test? BTW, my BIOS is up to date.

The console itself never came up after resume (always black - all
kernel versions) but this was no problem because X always resumed just fine.
However, even without X running, I could type commands (e.g. a find in /
and see the HD LED blinking) and press Alt+Ctrl+Del, for example, which isn't
the case with 2.6.21 kernels.

I couldn't do any further testing right now as I currently have no second 
computer (for SSH). This will change tomorrow, so I'll post more information.
Comment 7 Tobias Doerffel 2007-03-22 16:08:12 UTC
Created attachment 10919 [details]
Hang machine with patch 20070126

sorry for the poor quality, just made with my mobile phone ;-) if you need
more/detailed photos, tell me.
Comment 8 Len Brown 2007-03-25 20:05:33 UTC
Please verify that this hang is not in 2.6.21-rc5

There was a bad C-state patch in acpi-release-20070126-2.6.21-rc4.diff.bz2
You should be able to avoid it by booting with "processor.max_cstate=1",
or simply running 2.6.21-rc5, which includes all of the above, except
this bad bit.
Comment 9 Tobias Doerffel 2007-03-26 06:02:49 UTC
Sorry for disappointing you again, but rc5 doesn't work either. Now I can't even
log in through SSH or ping the machine :-( Caps Lock works but that's all. I
don't know how to debug this now...
Comment 10 Tobias Doerffel 2007-03-29 12:22:56 UTC
After taking a look at my dmesg attachment (#10894) I found the following line:

Calibrating delay using timer specific routine.. 189932.25 BogoMIPS (lpj=94966129)


This looks very strange to me. Is this already fixed with the timer-patch you
mentioned?
Comment 11 Tobias Doerffel 2007-03-29 12:24:24 UTC
"Please verify that this hang is not in 2.6.21-rc5"

Does not hang at startup anymore, that's right.
Comment 12 Len Brown 2007-03-29 21:41:47 UTC
okay, then we are back where we started,
2.6.21-rc5 doesn't resume from S3
on this machine, but 2.6.20.3 did resume.

> Calibrating delay using timer specific routine.. 189932.25 BogoMIPS (lpj=94966129)

Yeah, that is way off.  I don't know what to make of it.
Any better if you boot with maxcpus=1 and "noapic"?
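
To see how far off the quoted value is, the BogoMIPS figure can be recomputed from lpj. The sketch below mirrors the integer arithmetic the kernel uses for this printout as far as I know (lpj/(500000/HZ) for the whole part, (lpj/(5000/HZ)) % 100 for the fraction). HZ=1000 is an assumption; it is not stated in this report, but it reproduces the printed number exactly. The function name is illustrative.

```python
def bogomips_from_lpj(lpj, hz=1000):
    """Recompute the BogoMIPS string from loops-per-jiffy (hz is assumed)."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"

lpj = 94966129                     # lpj value from the quoted dmesg line
print(bogomips_from_lpj(lpj))      # BogoMIPS string for the assumed HZ=1000
print(lpj * 1000)                  # implied delay-loop iterations per second
```

Under that assumption the printout is internally consistent, so the bogus part is the calibrated lpj itself: it implies about 95 billion delay-loop iterations per second, which no Core Duo can execute. That suggests the timer used for calibration was misbehaving rather than the printing code.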
Comment 13 Len Brown 2007-03-30 16:40:02 UTC
Hmmm, this BIOS has two MADTs:

lenb@d975xbx2:~/Documents/8247> /usr/bin/acpixtract -a acpidump
Acpi table [DSDT] -  23955 bytes written to DSDT.dat
Acpi table [FACS] -     64 bytes written to FACS.dat
Acpi table [FACP] -    116 bytes written to FACP.dat
Acpi table [APIC] -    104 bytes written to APIC1.dat
Acpi table [HPET] -     56 bytes written to HPET.dat
Acpi table [MCFG] -     60 bytes written to MCFG.dat
Acpi table [SLIC] -    374 bytes written to SLIC.dat
Acpi table [DBGP] -     52 bytes written to DBGP.dat
Acpi table [APIC] -    104 bytes written to APIC2.dat
Acpi table [BOOT] -     40 bytes written to BOOT.dat
Acpi table [SSDT] -   1615 bytes written to SSDT1.dat
Acpi table [SSDT] -   1682 bytes written to SSDT2.dat
Acpi table [SSDT] -    607 bytes written to SSDT3.dat
Acpi table [SSDT] -    166 bytes written to SSDT4.dat
Acpi table [SSDT] -   1228 bytes written to SSDT5.dat
Acpi table [RSDT] -     88 bytes written to RSDT.dat
Acpi table [RSDP] -     20 bytes written to RSDP.dat
lenb@d975xbx2:~/Documents/8247> madt < APIC1.dat
ACPI: APIC (v001 Acer   Grape    0x06040000 LOHR 0x0000005a) @ 0x(nil)
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] global_irq_base[0x0])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
Length 104 OK
Checksum OK
lenb@d975xbx2:~/Documents/8247> madt < APIC2.dat
ACPI: APIC (v001 PTLTD           APIC   0x06040000  LTP 0x00000000) @ 0x(nil)
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Length 104 OK
Checksum OK

Please boot with apic=debug, and attach the output from
dmesg -s64000 and paste the /proc/interrupts from booting
with and without "acpi_apic_instance=0" or the patch
from bug 8283; and report if it has any effect on the
suspend/resume issue at hand.
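
The two MADT dumps above list their entries in a different order, which makes the differences easy to miss. A minimal sketch in Python that compares them as unordered sets; the entry lines are copied from the dumps above (table header lines omitted), and the variable and function names are illustrative:

```python
APIC1 = """\
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] global_irq_base[0x0])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
"""

APIC2 = """\
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
"""

def entries(dump):
    """Return the non-empty lines of a dump as a set, ignoring order."""
    return {line.strip() for line in dump.splitlines() if line.strip()}

only_in_apic1 = sorted(entries(APIC1) - entries(APIC2))
only_in_apic2 = sorted(entries(APIC2) - entries(APIC1))

for line in only_in_apic1:
    print("APIC1 only:", line)
for line in only_in_apic2:
    print("APIC2 only:", line)
```

The tables differ in two entries: the IOAPIC id (0x01 versus 0x02) and the polarity/trigger of the IRQ 0 override (dfl dfl versus high edge). Which table the kernel picks therefore changes how the timer interrupt is routed, which is why the request above asks for a comparison with and without "acpi_apic_instance=0".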
Comment 14 Tobias Doerffel 2007-03-31 06:27:24 UTC
After waiting a bit after resume I found out that my machine resumes properly
after about 60 seconds (60 seconds after I pressed a key/the power LED turned
on). This also "worked" without your patch or without "acpi_apic_instance=0".
However, this of course is still not satisfying. After I resumed (after these 60
seconds), the load average is at 32. Furthermore, my screen doesn't come back if
I don't use the proprietary NVIDIA driver (which apparently has good
power management). I attached the dmesg output (with the NVIDIA driver) after
suspend as well as the dmesg output when booting with apic=debug.
Comment 15 Tobias Doerffel 2007-03-31 06:29:18 UTC
Created attachment 11014 [details]
dmesg-output when booting with apic=debug
Comment 16 Tobias Doerffel 2007-03-31 06:31:06 UTC
Created attachment 11015 [details]
output of dmesg after resume
Comment 17 Tobias Doerffel 2007-04-03 03:03:31 UTC
Everything seems to work well now with 2.6.21-rc5-git9 - thank you :)