Bug 1908

Summary: Kernel panic booting 2.4 on Abit VP6 w/ ACPI
Product: ACPI
Reporter: Ian Pilcher (i.pilcher)
Component: BIOS
Assignee: Luming Yu (luming.yu)
Status: REJECTED DUPLICATE
Severity: normal
CC: acpi-bugzilla
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.4.23 w/ ACPI 20031203
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Kernel panic text
System.map
ksymoops output
kernel config
2.4.27 patch to write SMI_CMD from CPU0
proposed 2.4.27 patch

Description Ian Pilcher 2004-01-18 15:51:50 UTC
Distribution:  Fedora Core 1
Hardware Environment:  Abit VP6, 2x 1GHz Pentium III, 512MB
Software Environment:  Fedora Core 1

Problem Description:

Booting 2.4.23 (or 2.4.22 or 2.4.24) with ACPI on causes a kernel panic.

Steps to reproduce:

Boot kernel 2.4.23 on dual processor Abit VP6.
Comment 1 Ian Pilcher 2004-01-18 15:53:01 UTC
Created attachment 1897
Kernel panic text
Comment 2 Ian Pilcher 2004-01-18 15:54:52 UTC
Created attachment 1898
System.map
Comment 3 Ian Pilcher 2004-01-18 15:56:19 UTC
Created attachment 1899
ksymoops output
Comment 4 Ian Pilcher 2004-01-18 15:59:48 UTC
Created attachment 1900
kernel config
Comment 5 Ian Pilcher 2004-01-18 17:24:14 UTC
This panic does not occur if I compile with "CONFIG_DEBUG_SLAB=y".  It also
doesn't occur if I boot with "idle=poll".
Comment 6 Len Brown 2004-01-18 19:26:36 UTC
Does the boot panic go away if booted with acpi=off 
or if built with CONFIG_ACPI=n? 
 
The backtrace shows default_idle().  However, if the ACPI processor module 
were initialized, pm_idle would be set to acpi_processor_idle() and default_idle() 
would never be called (assuming the processor module isn't unloaded). 
Can you try disabling the ACPI processor driver, either by 
removing the processor driver module or by building with CONFIG_ACPI_PROCESSOR=n, 
and see if that has any effect? 
 
re: no panic if "idle=poll" -- this sets pm_idle to poll_idle. 
Here too, if this takes effect, then default_idle would never be called, 
implying it is default_idle that is causing the crash. 
 
re: no panic with CONFIG_DEBUG_SLAB=y 
Hmm, maybe some code is mistakenly depending on freed data, and the slab debug code 
somehow changes that data in a way that hides the bug? 
 
wild guess: any change if you boot with "nosmp"? 
(if this works, can you attach the dmesg?) 
 
thanks, 
-Len 
 
Comment 7 Ian Pilcher 2004-01-18 20:04:28 UTC
I should have mentioned, booting with "maxcpus=1" suppresses the panic.  So I
would assume that "nosmp" would have the same effect.

Booting with "acpi=off" avoids the panic, but it leaves the IRQ routing for the
built-in USB controllers messed up.

I can easily provide a dmesg log with any of the above methods.  Which do you
prefer?

WRT default_idle and "idle=poll", I'm running with a patch that allows runtime
control of the idle function (default_idle vs. poll_idle).  It allows me to boot
with "idle=poll" and then switch back to the normal idle function once the
system is booted, keeping my CPUs cool.  It definitely works, because I can see
the temperatures of both processors respond when I change the setting.  So I
don't believe that this board supports ACPI processor power management.

I've played with adding "while (1);" infinite loops to the ACPI code.  If I add
it before the call to acpi_os_write_port at line 143 of hwacpi.c, the boot
process simply hangs.  If I add the loop after that line, I get the panic.  I
know that this is not 100% reliable, but I believe that the transition to ACPI
mode is causing the idle thread on the other processor to crash.  If you can
think of a way to verify whether or not this is the case, please let me know.
Comment 8 Luming Yu 2004-01-19 18:30:22 UTC
Would you please give the 2.6 baseline kernel a try? I want to know whether this
is a 2.4-only issue.

Thanks,
Luming
Comment 9 Ian Pilcher 2004-01-19 18:53:32 UTC
It is a 2.4-only issue.  2.6.0 and 2.6.1 boot just fine with ACPI on.
Comment 10 Len Brown 2004-01-20 14:11:35 UTC
Does this still fail if you run the unmodified baseline kernel, i.e. without the idle patch? 
 
thanks, 
-Len 
 
Comment 11 Ian Pilcher 2004-01-20 14:49:38 UTC
Yes.  The idle patch has no impact on the panic; it simply allows me to use
HLT-based idling after I've worked around the panic by booting with "idle=poll".
Comment 12 Len Brown 2004-05-17 21:21:56 UTC
Is this still a problem with a recent kernel? 
Comment 13 Norberto Garc 2004-06-21 08:19:24 UTC
I also have that Abit board, and the same problem.
With kernels 2.4.22 through 2.4.26, if ACPI support is compiled in, the kernel
crashes on boot. Booting with acpi=off prevents the crash.

2.6.x works OK.

I have the processor driver built as a module. 
Comment 14 Anil S Keshavamurthy 2004-07-20 15:31:05 UTC
Even on 2.6.7, the transition to ACPI mode causes the panic. I found that this
happens if and only if the transition is made from a non-boot CPU. If the
transition is made from the boot CPU, you do not see this bug.

I have a fix for this for 2.6.7 at 
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=125841

Please try it; I hope it works for you.
Comment 15 Luming Yu 2004-07-22 03:08:33 UTC
Re comment #14:
  I think the patch for bug 2515 triggers a similar kernel panic on 2.6.
Without that patch, 2.6 has no problem entering ACPI mode.

--Luming
Comment 16 Len Brown 2004-07-30 22:21:30 UTC
Created attachment 3445
2.4.27 patch to write SMI_CMD from CPU0

If comment #14 is correct, then this 2.4.27 patch should
address the boot panic, and maxcpus=1 should no longer be necessary.
Please give it a try; you should see a debug line in dmesg:
LENB: save_cpus_allowed 0xffffffff
Comment 17 Len Brown 2004-08-13 21:04:49 UTC
Created attachment 3506
proposed 2.4.27 patch

Please test this 2.4.27 patch.  It moves the ACPI initialization that
writes SMI_CMD=ACPI_MODE earlier, so that it runs before the other
processors are initialized.  It also addresses a similar issue
with the LAPIC timer on laptop platforms.
Comment 18 Len Brown 2004-08-24 19:52:34 UTC
Please re-open if the patch didn't fix the problem. 

*** This bug has been marked as a duplicate of 2941 ***