Bug 6270

Summary: Toshiba Satellite (hyperthreaded P4) fails to resume without 'noapic' - "Not enough cpus"
Product: ACPI
Reporter: Holger Macht (holger)
Component: Power-Sleep-Wake
Assignee: Shaohua (shaohua.li)
Status: REJECTED INSUFFICIENT_DATA
Severity: normal
CC: acpi-bugzilla, bunk, pavel
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Image of kernel panic
dmesg before suspend
config.gz
kernel panic with apic=debug
dmesg before disabling cpu1 and suspending
image of disabling cpu1, suspending, resuming and enabling cpu1

Description Holger Macht 2006-03-22 06:39:44 UTC
Distribution: SUSE Linux 10.1 Beta8 with vanilla 2.6.16 kernel

Hardware Environment:

Toshiba Satellite P10-554 Notebook.
Hyperthreaded Pentium 4.

/proc/cpuinfo:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 9
cpu MHz         : 2793.509
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 5595.70

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 9
cpu MHz         : 2793.509
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 5586.31

Problem Description:
On both suspend to disk and suspend to RAM, after waking up the system and 
reloading the image, the kernel panics with the line "kernel panic - not syncing: 
Not enough cpus". I will attach an image of the complete resume process.

Both suspend to disk and suspend to RAM work flawlessly when 'noapic' is given 
as a boot parameter.

Manually disabling and enabling CPU1 by writing values to 
/sys/devices/system/cpu/cpu1/online works, too.

Steps to reproduce:
Boot with init=/bin/bash, mount /sys, swapon -a, echo disk > /sys/power/state 
(or echo mem > /sys/power/state), wake up the system.
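
The sysfs writes described above (toggling cpu1 and requesting a sleep state) could be sketched in Python roughly as follows. This is an illustration only and not part of the original report: the helper names `set_cpu_online` and `request_sleep` and the `sysfs_root` parameter are hypothetical, the paths are the ones quoted above, and on a real system the writes need root.

```python
import os


def set_cpu_online(cpu, online, sysfs_root="/sys"):
    """Write 1 or 0 to devices/system/cpu/cpu<N>/online below sysfs_root."""
    path = os.path.join(sysfs_root, "devices", "system", "cpu",
                        f"cpu{cpu}", "online")
    with open(path, "w") as f:
        f.write("1" if online else "0")
    return path


def request_sleep(state, sysfs_root="/sys"):
    """Write 'disk' or 'mem' to power/state below sysfs_root.

    Only the two states used in this report are accepted here.
    On a real system the write returns only after the machine has resumed.
    """
    if state not in ("disk", "mem"):
        raise ValueError(f"unsupported sleep state: {state!r}")
    path = os.path.join(sysfs_root, "power", "state")
    with open(path, "w") as f:
        f.write(state)
    return path
```

For the test requested later in this thread, the sequence would be `set_cpu_online(1, False)`, `request_sleep("mem")`, then `set_cpu_online(1, True)` after resume.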
Comment 1 Holger Macht 2006-03-22 06:42:40 UTC
Created attachment 7637 [details]
Image of kernel panic
Comment 2 Shaohua 2006-03-22 21:16:29 UTC
Please do this: manually offline a CPU and then do a suspend/resume cycle. Let's 
see if you can online the CPU after resume.

Do you use genapic? (better if you can provide the config file and dmesg 
before resume)
Comment 3 Shaohua 2006-03-22 21:32:06 UTC
Could you please take a photo of the system hang with the boot option 'apic=debug'? 
This will give us more info to root-cause this issue.
Comment 4 Holger Macht 2006-03-23 06:50:56 UTC
> Please do this: manually offline a CPU and then do a suspend/resume cycle. Let's 
> see if you can online the CPU after resume.
System resumes if cpu1 is offlined before suspend. But I can't enable it 
afterwards. I get the same message as the one shown shortly before the line 
"Error taking cpu 1 up: -22" every time I try to enable it, but no kernel panic 
this time.

> Do you use genapic? (better if you can provide the config file and dmesg 
> before resume)
/var/log/messages tells me "Mar 23 14:48:40 (none) kernel: Unknown genapic 
`apic=debug' specified.". Nevertheless, I will attach dmesg and config.gz.

> Could you please take a photo of the system hang with the boot option 'apic=debug'? 
> This will give us more info to root-cause this issue.
Will attach it, but it only contains one additional line.

Comment 5 Holger Macht 2006-03-23 06:51:47 UTC
Created attachment 7649 [details]
dmesg before suspend
Comment 6 Holger Macht 2006-03-23 06:52:27 UTC
Created attachment 7650 [details]
config.gz
Comment 7 Holger Macht 2006-03-23 06:53:10 UTC
Created attachment 7651 [details]
kernel panic with apic=debug
Comment 8 Shaohua 2006-03-23 17:38:19 UTC
OK, you are using genapic. Please try 'CONFIG_X86_PC' instead 
of 'CONFIG_X86_GENERICARCH'.

>System resumes if cpu1 is offlined before suspend. But I can't enable it 
>afterwards. I get the same message as the one shown shortly before the line 
>"Error taking cpu 1 up: -22" every time I try to enable it, but no kernel panic 
>this time.
It seems to me this could only happen when there are two CPUs. What I'd like you 
to try is to offline cpu1 manually and then do a suspend/resume. After resume, 
manually online cpu1. Let's see if it works. Please also give me the dmesg.

>acpic=debug
It appears you spelled it wrong. It's 'apic=debug'.
Comment 9 Holger Macht 2006-03-24 02:30:47 UTC
>OK, you are using genapic. Please try 'CONFIG_X86_PC' instead 
>of 'CONFIG_X86_GENERICARCH'.
All tests are done with the new kernel now.

>It seems to me this could only happen when there are two CPUs. What I'd like you 
>to try is to offline cpu1 manually and then do a suspend/resume. After resume, 
>manually online cpu1. Let's see if it works. Please also give me the dmesg.
That's exactly what I did. Unfortunately, nothing is written to disk after 
resume. So I will attach the dmesg from before setting cpu1 offline and an image 
showing disabling cpu1, suspending, resuming and setting cpu1 online again.

>It appears you spelled it wrong. It's 'apic=debug'.
Yes, I noticed that already but did attach the 'old' dmesg.

Comment 10 Holger Macht 2006-03-24 02:32:18 UTC
Created attachment 7657 [details]
dmesg before disabling cpu1 and suspending
Comment 11 Holger Macht 2006-03-24 02:33:13 UTC
Created attachment 7658 [details]
image of disabling cpu1, suspending, resuming and enabling cpu1
Comment 12 Shaohua 2006-03-26 17:52:20 UTC
How about boot option 'lpj=11172736'?
Comment 13 Shaohua 2006-03-26 18:20:11 UTC
After resume and before online cpu1, please check if time is correct. Thanks!
Comment 14 Shaohua 2006-03-26 18:25:59 UTC
Also, how about boot option 'clock=tsc' or 'clock=pit'?
Comment 15 Len Brown 2006-03-30 18:49:55 UTC
Let's try not to panic when the 2nd CPU fails to start up
and see if the system can come up with 1 CPU...
Comment 16 Holger Macht 2006-04-03 05:21:07 UTC
Well, the time is indeed not correct after resume.

The time varies within a specific range. Hours and minutes always stay the same.

lpj=11172736 --> 1 to 4 seconds (like without a boot param)
clock=tsc    --> 1 to 2 seconds
clock=pit    --> it stays always the same

The ranges may be random, though.

For example:

`date`
14:01:01
as soon as it reaches 14:01:05, it switches back to 14:01:01 and the same game 
starts again.
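
A small Python sketch of how this backward jump could be detected automatically instead of watching `date` by hand. The function names are hypothetical, and the sampler deliberately polls in a tight loop rather than sleeping, on the assumption that timer-driven sleeps may not be reliable on the affected machine.

```python
import time


def find_backward_steps(samples):
    """Return (index, previous, current) for each sample earlier than its predecessor."""
    return [
        (i, samples[i - 1], samples[i])
        for i in range(1, len(samples))
        if samples[i - 1] > samples[i]
    ]


def sample_wall_clock(count):
    """Read the wall clock `count` times back to back, without sleeping."""
    return [time.time() for _ in range(count)]
```

Running `find_backward_steps(sample_wall_clock(n))` for a large enough `n` on a healthy system should normally return an empty list, while a clock that wraps from :05 back to :01 would show up as one entry per wrap.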
Comment 17 Shaohua 2006-04-03 18:45:11 UTC
Looks like we are approaching the root cause ;).
Please deselect 'CONFIG_X86_PM_TIMER' and let's see if resume works. I guess the 
PM timer is the root cause, so don't use it.
Comment 18 Pavel Machek 2006-04-04 03:06:33 UTC
Do interrupts work after resume? I.e. if you do time sleep 1, does it actually
return after one second, or does it hang forever?
Comment 19 Holger Macht 2006-04-28 05:53:07 UTC
Sorry for the delay, I was on vacation for some time and had no access to the 
machine...

I can't disable 'CONFIG_X86_PM_TIMER' because the option got removed some time 
ago, IIRC. At least it gets automatically re-added to .config, or set to 
CONFIG_X86_PM_TIMER=y, if I try.

Interrupts do not work. time sleep 1 hangs forever.

Comment 20 Shaohua 2006-04-28 18:29:54 UTC
OK, let's use a crude method :) You can delete the line 
that includes '&timer_pmtmr_init,' in arch/i386/kernel/timers/timer.c.

Now I know the PM timer doesn't work. I guess the LPC's config space isn't 
completely restored on resume, so the LPC maps the PM timer's I/O port to a 
different position, which means the I/O port can't be decoded.
Comment 21 Shaohua 2006-04-28 18:30:32 UTC
Can you attach the lspci -xxx output before/after resume?
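
One way to compare the two dumps byte by byte is sketched below in Python. It assumes the usual `lspci -xxx` layout (a header line starting with the device slot, followed by lines of the form `40: 01 10 00 00 ...`, with a blank line between devices); the function names are hypothetical and the parser ignores anything that does not fit that layout.

```python
import re

# A hex dump line: offset, colon, then space-separated byte values.
HEX_LINE = re.compile(r"^([0-9a-f]{2,3}): ((?:[0-9a-f]{2}\s?)+)$")


def parse_lspci_dump(text):
    """Map each device slot to a dict of {config-space offset: byte value}."""
    devices = {}
    slot = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            slot = None  # a blank line ends the current device block
            continue
        m = HEX_LINE.match(line)
        if m and slot is not None:
            offset = int(m.group(1), 16)
            for i, byte in enumerate(m.group(2).split()):
                devices[slot][offset + i] = int(byte, 16)
        elif slot is None:
            slot = line.split()[0]  # first word of the header is the slot
            devices[slot] = {}
    return devices


def diff_config_space(before, after):
    """Return (slot, offset, old, new) for every byte that differs."""
    changes = []
    for slot in sorted(before.keys() & after.keys()):
        b, a = before[slot], after[slot]
        for offset in sorted(b.keys() | a.keys()):
            if b.get(offset) != a.get(offset):
                changes.append((slot, offset, b.get(offset), a.get(offset)))
    return changes
```

With the two outputs saved to files (the names before.txt and after.txt are only placeholders), `diff_config_space(parse_lspci_dump(open("before.txt").read()), parse_lspci_dump(open("after.txt").read()))` lists the registers that were not restored.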
Comment 22 Adrian Bunk 2006-08-06 03:52:33 UTC
Please reopen this bug if:
- it is still present in kernel 2.6.17 and
- you can provide the requested data.