Bug 5930
Summary: | 2.6.15 regression - 2nd CPU unused - Serverworks OSB4/Supermicro 370DER | ||
---|---|---|---|
Product: | ACPI | Reporter: | Ronald Hummelink (ronald) |
Component: | Config-Processors | Assignee: | Venkatesh Pallipadi (venki) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla, again, ashok.raj, diegocg, dsd, hallbw |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.15 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
acpidump from affected system
debug patch
dmesg for the buggy case
dmesg for the buggy case
debug patch
working dmesg
Debug patch number 3
debug dmesg
Dont record disabled lapic values to avoid conflict in some BIOS's |
Description
Ronald Hummelink
2006-01-20 18:38:33 UTC
Created attachment 7085 [details]
acpidump from affected system
dump made using acpidump
I have seen this (and reported it to the kernel mailing lists), but I wasn't able to bisect the commit. The machine is a dual P3 1 GHz with a Supermicro 370 DE6 (same chipset). In my box, I can make it work again by setting CONFIG_ACPI_PROCESSOR to "m". The problem is only reproducible when CONFIG_ACPI_PROCESSOR=y.
Created attachment 7094 [details]
debug patch
Does the attached patch help? Thanks!
And please also provide the dmesg from the buggy case!
Not sure if it is required, but it can do no harm: it is probably a good idea to turn on CONFIG_ACPI_DEBUG as well as applying that patch.
Created attachment 7100 [details]
dmesg for the buggy case
Created attachment 7101 [details]
dmesg for the buggy case
David, the patch does not fix the problem for me. dmesg attached.
Created attachment 7110 [details]
debug patch
Does this one help a little? When ACPI returns a wrong ID, we might wrongly free some info.
Sorry for making you try so many; I don't have a system to reproduce it on.
Created attachment 7111 [details]
working dmesg
Yes, this one definitely works in my box. Attached working dmesg.
Mark this one as resolved. Ronald, does the patch work for you?
I am not sure whether the patch in comment #8 is the right fix. It is not clear to me why we are ending up in this error case in the first place. Ideally, it should not come here unless the BIOS has messed up acpi_id. I want to get more information on this failure. Diego/Ronald, can you attach the dmesg from your system after you apply the patch below? Thx.
Created attachment 7122 [details]
Debug patch number 3
The patch doesn't seem to apply on top of current Linus's git tree.
OK, I applied it by hand; it was just an extra space.
Created attachment 7123 [details]
debug dmesg
This is the corresponding dmesg
Diego, thanks for the prompt and quick check of all the debug patches. I have narrowed down the problem here. As per my theory, backing out this patch
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fbe83e209ad9c8281e29ac17a60f91119d86fa8c
should also make your system work as before. Now that I have understood the problem, I will work with Shaohua and Ashok and get to a clean solution.
Description of the problem:
- This particular BIOS (on both Ronald's and Diego's systems) is unique in that it has the disabled ACPI MADT entries mapping to one of the enabled MADT entries. Though strictly this is not out of ACPI spec, it is uncommon. Typically BIOSes give some lapic_id like 0x80, 0x81, etc. to these disabled CPUs.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:8 APIC version 17
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 6:8 APIC version 17
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x00] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x00] disabled)
- With the above change from Ashok, we now look at disabled CPUs as well and store their lapic_id. As this id is the same as that of one of the enabled CPUs, ACPI gets confused while adding CPUs, and that results in an error at a later point.
- Though this issue was exposed by 5452, that is not the real cause of the problem.
These are busy days for me; I will try to get some testing in by tomorrow evening CET. My apologies, normally I have more time on my hands to respond as swiftly as you guys did in working on this bug. Thanks for that!
Yes, backing out that change from Linus's git tree also solved the problem.
Created attachment 7139 [details]
Dont record disabled lapic values to avoid conflict in some BIOS's
Could you please apply and let me know if this fixes the problem?
Thanks
ashok
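The conflict described earlier in this thread (disabled MADT entries that reuse the lapic_id of an enabled CPU) and the idea named in the attachment title can be illustrated with a small user-space C sketch. This is not the attached patch and not kernel code: the struct, the function names, MAX_ACPI_ID and the NO_LAPIC marker are all hypothetical, chosen only to model the behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACPI_ID 8     /* hypothetical table size for this sketch */
#define NO_LAPIC    0xFF  /* hypothetical "nothing recorded" marker */

/* Hypothetical, simplified view of one MADT local APIC entry. */
struct lapic_entry {
    uint8_t acpi_id;
    uint8_t lapic_id;
    int     enabled;
};

/*
 * Build an acpi_id -> lapic_id map from the MADT entries.
 * With record_disabled != 0, disabled entries are recorded as well,
 * which is the behaviour that exposed the conflict on this BIOS.
 * With record_disabled == 0, disabled entries are skipped.
 */
void build_map(const struct lapic_entry *e, size_t n,
               int record_disabled, uint8_t map[MAX_ACPI_ID])
{
    size_t i;

    for (i = 0; i < MAX_ACPI_ID; i++)
        map[i] = NO_LAPIC;

    for (i = 0; i < n; i++) {
        assert(e[i].acpi_id < MAX_ACPI_ID);
        if (!e[i].enabled && !record_disabled)
            continue;  /* do not record disabled lapic values */
        map[e[i].acpi_id] = e[i].lapic_id;
    }
}

/*
 * Count how many acpi_ids claim the given lapic_id.
 * A result above 1 means a lookup by lapic_id is ambiguous.
 */
int count_owners(const uint8_t map[MAX_ACPI_ID], uint8_t lapic_id)
{
    int owners = 0;
    size_t i;

    for (i = 0; i < MAX_ACPI_ID; i++)
        if (map[i] == lapic_id)
            owners++;
    return owners;
}
```

Fed the four LAPIC entries from the dmesg excerpt above, recording disabled entries leaves three acpi_ids claiming lapic_id 0x00, while skipping them leaves exactly one owner per enabled CPU.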
Patch from comment 19 fixes the problem for me. It applies cleanly to and fixes both the Gentoo-patched 2.6.15-gentoo-r1 and vanilla 2.6.15.1.
Ditto for me.
Ashok, is this patch OK to apply in the Gentoo kernel, or is a final patch in the works?
Sorry for the delay... Yes, this is the final patch. I just got email from Andi Kleen that he pushed it to Linus, so it should be showing up in git trees pretty soon.
Shipped in linux-2.6.16-rc2, closing.
Also available in the 2.6.15.4 stable release.