Distribution: Fedora Core 1
Hardware Environment: Abit VP6, 2x 1GHz Pentium III, 512MB
Software Environment: Fedora Core 1
Problem Description: Booting 2.4.23 (or 2.4.22 or 2.4.24) with ACPI on causes a kernel panic.
Steps to reproduce: Boot kernel 2.4.23 on dual processor Abit VP6.
Created attachment 1897 [details] Kernel panic text
Created attachment 1898 [details] System.map
Created attachment 1899 [details] ksymoops output
Created attachment 1900 [details] kernel config
This panic does not occur if I compile with "CONFIG_DEBUG_SLAB=y". It also doesn't occur if I boot with "idle=poll".
Does the boot panic go away if booted with acpi=off or if built with CONFIG_ACPI=n?
The backtrace shows default_idle(). However, if the ACPI processor module were initialized, pm_idle would be set to acpi_processor_idle() and default_idle() would never be called (assuming the processor module isn't unloaded). Can you try disabling the ACPI processor driver, either by removing the processor driver module or by building with CONFIG_ACPI_PROCESSOR=n, and see if that has any effect?
re: no panic if "idle=poll" -- this sets pm_idle to poll_idle. Here too, if this takes effect, then default_idle would never be called, implying it is default_idle that is causing the crash.
re: no panic with CONFIG_DEBUG_SLAB=y -- Hmm, maybe some code mistakenly depends on freed data, and the slab debug code somehow invalidates that data and hides the bug?
Wild guess: any change if you boot with "nosmp"? (If this works, can you attach the dmesg?)
thanks, -Len
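The reasoning above rests on the idle loop dispatching through a single function pointer. The following is a minimal userspace sketch of that dispatch, assuming the 2.4 idle loop falls back to default_idle when pm_idle is unset. It is not kernel code: every name ending in `_model`, plus `select_idle` and `idle_once`, is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of idle-routine selection, not kernel code. */
typedef void (*idle_fn)(void);

/* Call counters so the chosen routine can be observed. */
int default_idle_calls, poll_idle_calls, acpi_idle_calls;

void default_idle_model(void) { default_idle_calls++; } /* stands in for HLT-based idle */
void poll_idle_model(void)    { poll_idle_calls++; }    /* stands in for idle=poll */
void acpi_idle_model(void)    { acpi_idle_calls++; }    /* stands in for the ACPI processor driver */

/* NULL means no override has been installed. */
idle_fn pm_idle_model = NULL;

/* Pick the routine the idle loop would call on this pass. */
idle_fn select_idle(void)
{
    return pm_idle_model != NULL ? pm_idle_model : default_idle_model;
}

/* One pass of the idle loop: dispatch through the selected routine. */
void idle_once(void)
{
    idle_fn idle = select_idle();
    assert(idle != NULL);
    idle();
}
```

In this model, once an override is installed the default routine is never reached, which is why a backtrace through default_idle() suggests that neither the ACPI processor driver nor idle=poll had taken over.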
I should have mentioned, booting with "maxcpus=1" suppresses the panic. So I would assume that "nosmp" would have the same effect. Booting with "acpi=off" avoids the panic, but it leaves the IRQ routing for the built-in USB controllers messed up. I can easily provide a dmesg log with any of the above methods. Which do you prefer?
WRT default_idle and "idle=poll", I'm running with a patch that allows runtime control of the idle function (default_idle vs. poll_idle). It allows me to boot with "idle=poll" and then switch back to the normal idle function once the system is booted, keeping my CPUs cool. It definitely works, because I can see the temperatures of both processors respond when I change the setting. So I don't believe that this board supports ACPI processor power management.
I've played with adding "while (1);" infinite loops to the ACPI code. If I add it before the call to acpi_os_write_port at line 143 of hwacpi.c, the boot process simply hangs. If I add the loop after that line, I get the panic. I know that this is not 100% reliable, but I believe that the transition to ACPI mode is causing the idle thread on the other processor to crash. If you can think of a way to verify whether or not this is the case, please let me know.
Would you please give a 2.6 baseline kernel a try? I want to know whether this is a 2.4-only issue. Thanks, Luming
It is a 2.4-only issue. 2.6.0 and 2.6.1 boot just fine with ACPI on.
Does this still fail if you run the unmodified baseline kernel, i.e. without the idle patch? thanks, -Len
Yes. The idle patch has no impact on the panic; it simply allows me to use HLT-based idling after I've worked around the panic by booting with "idle=poll".
is this still a problem with a recent kernel?
I also have that Abit board, and the same problem. With kernels 2.4.22 through 2.4.26, if ACPI support is compiled in, the kernel crashes on boot. Booting with acpi=off prevents the crash. 2.6.x works OK. I have the ACPI processor driver built as a module.
Even on 2.6.7 the transition to ACPI mode causes the panic. I found that this happens if and only if the transition is made from a non-boot CPU; if the transition is made from the boot CPU, you do not see this bug. I have a fix for 2.6.7 at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=125841 Please try it; I hope it works for you.
Re comment #14: I think the patch for bug 2515 triggers a similar kernel panic on 2.6. Without that patch, 2.6 doesn't have a problem entering ACPI mode. --Luming
Created attachment 3445 [details] 2.4.27 patch to write SMI_CMD from CPU0
If comment #14 is correct, then this 2.4.27 patch should address the boot panic, and maxcpus=1 should no longer be necessary. Please give it a try; you should see a debug line in the dmesg:
LENB: save_cpus_allowed 0xffffffff
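The debug line suggests a save, pin, write, restore sequence around the SMI_CMD write. The following userspace sketch models that sequence under that assumption. It is inferred from the comment text, not taken from the attachment, and `struct task_model`, `write_smi_cmd_model`, and `enable_acpi_from_cpu0` are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical model of a task's CPU affinity, not the kernel's task_struct. */
struct task_model {
    unsigned long cpus_allowed; /* bitmask of CPUs the task may run on */
    int current_cpu;            /* CPU the task is running on right now */
};

/* Records which CPU performed the modeled SMI_CMD write (-1 = never). */
int last_write_cpu = -1;

/* Stand-in for the port write that switches the platform to ACPI mode. */
void write_smi_cmd_model(const struct task_model *t)
{
    last_write_cpu = t->current_cpu;
}

/* Pin the task to CPU0, do the write, then restore the saved mask.
 * Returns the saved mask, which is the value the debug line would print. */
unsigned long enable_acpi_from_cpu0(struct task_model *t)
{
    unsigned long saved = t->cpus_allowed;

    t->cpus_allowed = 1UL << 0; /* allow only CPU0 */
    t->current_cpu = 0;         /* models the migration a reschedule would cause */
    assert(t->cpus_allowed & (1UL << t->current_cpu));

    write_smi_cmd_model(t);

    t->cpus_allowed = saved;    /* let the task run anywhere again */
    return saved;
}
```

A task that starts on CPU1 with mask 0xffffffff ends up performing the write on CPU0 and gets its original mask back afterwards, which matches the behaviour comment #14 says avoids the panic.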
Created attachment 3506 [details] proposed 2.4.27 patch
Please test this 2.4.27 patch. It moves the ACPI initialization that writes SMI_CMD=ACPI_MODE earlier, to before the other processors are initialized. It also addresses a similar issue with the LAPIC timer on laptop platforms.
Please re-open if the patch didn't fix the problem.
*** This bug has been marked as a duplicate of 2941 ***