Most recent kernel where this bug did not occur:
Vanilla release 2.6.14 is good; 2.6.15 is affected. I bisected it with git
(with the help of dsd@gentoo irc) down to:
# git bisect bad
cd8e2b48daee891011a4f21e2c62b210d24dcc9e is first bad commit
diff-tree cd8e2b48daee891011a4f21e2c62b210d24dcc9e (from
Author: Venkatesh Pallipadi <email@example.com>
Date: Fri Oct 21 19:22:00 2005 -0400
[ACPI] fix 2.6.13 boot hang regression on HT box w/ broken BIOS
Signed-off-by: Venkatesh Pallipadi <firstname.lastname@example.org>
Signed-off-by: Len Brown <email@example.com>
:040000 040000 9cb687b77dcd64bf82e9a73214db467c964c1266
b1bde4a4ad91720daa6645c60bdc123b824c39b2 M drivers
Hardware Environment: Supermicro 370DER mainboard, dual P3 1 GHz Coppermine,
512 MB ECC registered PC133, SCSI drive
Software Environment: Gentoo
Problem Description: CPU0 and CPU1 are both detected by the kernel, and show in
a program like top. but on any kernel after given commit CPU1 won
Created attachment 7085 [details]
acpidump from affected system
dump made using acpidump
I have seen this too (and reported it to the kernel mailing lists), but I wasn't
able to bisect it down to a commit.
The machine is a dual P3 1 GHz with a Supermicro 370 DE6 (same chipset).
On my box, I can make it work again by setting CONFIG_ACPI_PROCESSOR to "m". The
problem is only reproducible when CONFIG_ACPI_PROCESSOR=y.
Created attachment 7094 [details]
Does attached patch help? Thanks!
And please also provide the dmesg from the buggy case!
Not sure if it is required, but it can do no harm: it is probably a good idea to
turn on CONFIG_ACPI_DEBUG as well as applying that patch.
Created attachment 7100 [details]
dmesg for the buggy case
Created attachment 7101 [details]
dmesg for the buggy case
David, the patch does not fix the problem for me. dmesg attached.
Created attachment 7110 [details]
Does this one help a little? When ACPI returns wrong ID, we might wrongly free
Sorry for making you try so many patches; I don't have a system to reproduce it on.
Created attachment 7111 [details]
Yes, this one definitely works on my box. Attached working dmesg.
Mark this one as resolved. Ronald, does the patch work for you?
I am not sure whether the patch in comment #8 is the right fix.
It is not clear to me why we are ending up in this error case in the first
place. Ideally, we should not get here unless the BIOS has messed up acpi_id.
I want to get more information on this failure.
Diego/Ronald. Can you attach the dmesg from your system after you apply the
Created attachment 7122 [details]
Debug patch number 3
The patch doesn't seem to apply on top of Linus's current git tree.
OK, I applied it by hand; it was just an extra space.
Created attachment 7123 [details]
This is the corresponding dmesg
Thanks for the prompt and quick check of all the debug patches. I have narrowed
down the problem here.
As per my theory, backing out this patch
should also make your system work as before.
Now that I have understood the problem, I will work with Shaohua and Ashok and
get to a clean solution.
Description of the problem.
- This particular BIOS (on both Ronald's and Diego's systems) is unusual in that
its disabled ACPI MADT entries map to the same lapic_id as one of the enabled
MADT entries. Strictly speaking this is not outside the ACPI spec, but it is
uncommon. Typically BIOSes give some lapic_id like 0x80, 0x81, etc. to these
disabled CPUs.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:8 APIC version 17
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 6:8 APIC version 17
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x00] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x00] disabled)
- With the above change from Ashok, we now look at disabled CPUs as well and
store their lapic_id. As this id is the same as that of one of the enabled CPUs,
the ACPI code gets confused while adding CPUs, and that results in an error at a
later point.
- Although this issue was exposed by 5452, that is not the real cause of the problem.
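To check whether another machine has the same MADT layout, the LAPIC lines from its dmesg can be scanned for disabled entries that reuse the lapic_id of an enabled entry. The following is a minimal sketch in Python, not part of any kernel patch; the function names (`parse_lapic_entries`, `find_conflicts`) are made up for illustration, and the regular expression assumes the log format shown in the description above.

```python
import re

# Matches lines such as: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x00] disabled)
LAPIC_RE = re.compile(
    r"ACPI: LAPIC \(acpi_id\[(0x[0-9a-fA-F]+)\] "
    r"lapic_id\[(0x[0-9a-fA-F]+)\] (enabled|disabled)\)"
)


def parse_lapic_entries(dmesg_text):
    """Return a list of (acpi_id, lapic_id, enabled) tuples found in the text."""
    entries = []
    for match in LAPIC_RE.finditer(dmesg_text):
        acpi_id = int(match.group(1), 16)
        lapic_id = int(match.group(2), 16)
        entries.append((acpi_id, lapic_id, match.group(3) == "enabled"))
    return entries


def find_conflicts(entries):
    """Return (acpi_id, lapic_id) for disabled entries whose lapic_id
    is also used by an enabled entry."""
    enabled_ids = {lapic for _, lapic, enabled in entries if enabled}
    return [
        (acpi, lapic)
        for acpi, lapic, enabled in entries
        if not enabled and lapic in enabled_ids
    ]


# The dmesg excerpt quoted in the problem description.
SAMPLE = """\
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:8 APIC version 17
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 6:8 APIC version 17
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x00] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x00] disabled)
"""

if __name__ == "__main__":
    for acpi_id, lapic_id in find_conflicts(parse_lapic_entries(SAMPLE)):
        print(f"disabled acpi_id {acpi_id:#04x} reuses lapic_id {lapic_id:#04x}")
```

On the excerpt above this flags acpi_id 0x02 and 0x03, both of which reuse lapic_id 0x00 from the enabled CPU0 entry.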
These are busy days for me; I will try to get some testing in by tomorrow
evening CET. My apologies, normally I have more time on my hands to respond as
swiftly as you guys did in working on this bug. Thanks for that!
Yes, backing out that change from Linus's git tree also solved the problem.
Created attachment 7139 [details]
Dont record disabled lapic values to avoid conflict in some BIOS's
Could you please apply and let me know if this fixes the problem?
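The idea behind this patch, as described by its title, can be illustrated with a toy model. This is only a sketch in Python and does not reflect the kernel's actual data structures or the contents of the attachment; `build_table` and `acpi_ids_for_lapic` are hypothetical names, and the entries are the ones from the dmesg quoted earlier in this report.

```python
# Toy model only: these are not the kernel's data structures.
# Each entry is (acpi_id, lapic_id, enabled), taken from the reported dmesg.
MADT_ENTRIES = [
    (0x00, 0x00, True),
    (0x01, 0x01, True),
    (0x02, 0x00, False),
    (0x03, 0x00, False),
]


def build_table(entries, record_disabled):
    """Map acpi_id -> lapic_id, optionally skipping disabled entries."""
    return {
        acpi: lapic
        for acpi, lapic, enabled in entries
        if enabled or record_disabled
    }


def acpi_ids_for_lapic(table, lapic_id):
    """Return every acpi_id in the table that claims the given lapic_id."""
    return sorted(acpi for acpi, lapic in table.items() if lapic == lapic_id)


if __name__ == "__main__":
    with_disabled = build_table(MADT_ENTRIES, record_disabled=True)
    without_disabled = build_table(MADT_ENTRIES, record_disabled=False)
    # Recording disabled entries leaves three acpi_ids claiming lapic_id 0.
    print(acpi_ids_for_lapic(with_disabled, 0x00))
    # Skipping them leaves only the enabled CPU0 entry.
    print(acpi_ids_for_lapic(without_disabled, 0x00))
```

In this model, recording the disabled entries makes lapic_id 0x00 ambiguous between acpi_id 0, 2 and 3, while skipping them leaves a one-to-one mapping for the enabled CPUs.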
Patch from comment 19 fixes the problem for me.
Applies cleanly to and fixes both gentoo patched 2.6.15-gentoo-r1 and vanilla
Ditto for me.
Ashok, is this patch OK to apply in the Gentoo kernel, or is a final patch in
Sorry for the delay...
Yes, this is the final patch. I just got an email from Andi Kleen saying that he
pushed it to Linus, so it should be showing up in the git trees pretty soon.
Shipped in linux-2.6.16-rc2, closing.
Also available in 184.108.40.206 stable release.