Bug 7465 - Duplicate MADT - Sony Vaio VGN-SZ340 Core Duo
Summary: Duplicate MADT - Sony Vaio VGN-SZ340 Core Duo
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: i386 Linux
Importance: P2 normal
Assignee: Len Brown
Reported: 2006-11-07 05:44 UTC by Artur Souza
Modified: 2007-03-23 09:30 UTC

Kernel Version: Kernel 2.6.18.1

Attachments
Dmesg (18.88 KB, text/plain)
2006-11-28 02:51 UTC, Artur Souza
Dmesg (18.88 KB, text/plain)
2006-11-28 02:52 UTC, Artur Souza
DSDT (203.72 KB, application/octet-stream)
2006-11-28 03:04 UTC, Artur Souza
ACPI Dump (141.88 KB, application/octet-stream)
2006-11-29 01:55 UTC, Artur Souza
workaround to parse the second MADT (830 bytes, patch)
2006-12-01 01:27 UTC, Zhang Rui
Dmesg with patch (19.83 KB, application/octet-stream)
2006-12-01 03:31 UTC, Artur Souza
workaround to ignore the first MADT (1.39 KB, patch)
2006-12-01 09:01 UTC, Zhang Rui
Dmesg using the second patch (19.64 KB, application/octet-stream)
2006-12-01 09:45 UTC, Artur Souza
Dmesg with cpufreq.debug=7 (24.55 KB, text/plain)
2006-12-06 10:52 UTC, Artur Souza
Config used with my kernel 2.6.19 (41.87 KB, text/plain)
2006-12-06 10:54 UTC, Artur Souza
2.6.20-rc3 patch adding "acpi_parse_multi_table=1" parameter (10.14 KB, patch)
2007-01-05 23:50 UTC, Len Brown
Dmesg of 2.6.20-rc4 with patch and acpi=off (21.42 KB, application/octet-stream)
2007-01-08 03:58 UTC, Artur Souza
patch-ACPI-parse-multi-table (10.89 KB, patch)
2007-01-09 21:26 UTC, Zhang Rui
updated patch w/o BUG_ON in acpi_table_parse() (10.18 KB, patch)
2007-01-12 15:26 UTC, Len Brown
Dmesg of 2.6.20-rc4 and last patch (21.46 KB, application/octet-stream)
2007-01-15 04:51 UTC, Artur Souza
2.6.21-rc3 patch (3.97 KB, patch)
2007-03-10 23:47 UTC, Len Brown

Description Artur Souza 2006-11-07 05:44:29 UTC
*Most recent kernel where this bug did not occur: -

*Distribution: Slackware

*Hardware Environment:
Sony Vaio VGN-SZ340 - Core 2 Duo T7200 (2 GHz), 1 GB RAM, 100 GB HD, wireless
ipw3945, 2 video cards (intel and nvidia go 7400)

*Software Environment: Slack defaults + KDE

*Problem Description: With ACPI enabled in the kernel, `cat /proc/cpuinfo` shows
only 1 cpu core. Giving "acpi=off" to the kernel shows the 2 cores.

*Steps to reproduce: ACPI enabled and SMP enabled also. Just boot and `cat
/proc/cpuinfo`
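
For reference, the check in the steps above can be scripted. This is only a minimal sketch: the function name count_processors is made up for illustration, and it assumes the x86 /proc/cpuinfo layout in which each logical CPU has its own "processor : N" line.

```python
import re


def count_processors(cpuinfo_text: str) -> int:
    """Count the 'processor : N' lines in /proc/cpuinfo-style text.

    Assumption: one such line per logical CPU, as on x86 kernels.
    """
    pattern = r"^processor[ \t]*:[ \t]*\d+[ \t]*$"
    return len(re.findall(pattern, cpuinfo_text, flags=re.MULTILINE))


if __name__ == "__main__":
    # On the affected machine this is expected to differ between an
    # ACPI boot and an acpi=off boot.
    with open("/proc/cpuinfo") as f:
        print(count_processors(f.read()))
```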
Comment 1 Zhang Rui 2006-11-27 19:21:41 UTC
Please attach the dmesg and acpidump output. :)
Comment 2 Artur Souza 2006-11-28 02:51:23 UTC
Created attachment 9636 [details]
Dmesg
Comment 3 Artur Souza 2006-11-28 02:52:15 UTC
Created attachment 9637 [details]
Dmesg
Comment 4 Zhang Rui 2006-11-28 03:02:58 UTC
From the dmesg, you can see:
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
The ACPI processor driver failed to start the second CPU because the data read from
the MADT shows that the second CPU is disabled.
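
A minimal sketch of how the LAPIC lines quoted above could be picked out of a saved dmesg automatically. The helper names are hypothetical, and the regular expression assumes the exact message format shown in this comment.

```python
import re

# Matches lines such as:
#   ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
LAPIC_RE = re.compile(
    r"ACPI: LAPIC \(acpi_id\[(0x[0-9a-fA-F]+)\] "
    r"lapic_id\[(0x[0-9a-fA-F]+)\] (enabled|disabled)\)"
)


def parse_lapic_lines(dmesg_text):
    """Return a list of (acpi_id, lapic_id, enabled) tuples."""
    return [
        (int(acpi_id, 16), int(lapic_id, 16), state == "enabled")
        for acpi_id, lapic_id, state in LAPIC_RE.findall(dmesg_text)
    ]


def disabled_cpus(dmesg_text):
    """Return the ACPI IDs of processors the MADT marks as disabled."""
    return [a for a, _, enabled in parse_lapic_lines(dmesg_text) if not enabled]
```

Fed the two LAPIC lines from this dmesg, disabled_cpus() would report ACPI ID 1, which matches the missing second core.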
Comment 5 Artur Souza 2006-11-28 03:04:57 UTC
Created attachment 9638 [details]
DSDT

I got this file doing
cat /proc/acpi/dsdt and decompiling it with iasl -d
Comment 6 Zhang Rui 2006-11-28 17:23:31 UTC
The whole acpidump is needed, and the acpidump tool is available here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils
IMO, there should be some BIOS options related to the LAPIC or
multi-core support, and you may have disabled them. It would be nice if you could check them. :)


Comment 7 Shaohua 2006-11-28 19:49:21 UTC
>ACPI: 2 duplicate APIC table ignored.
The BIOS has two MADT tables. Rui, please check which MADT table is valid.
Comment 8 Artur Souza 2006-11-29 01:55:40 UTC
Created attachment 9650 [details]
ACPI Dump

ACPI dump made with the tool you linked to.
I checked the BIOS and there is nothing regarding multi cores or LAPIC. It's a
Phoenix BIOS and it does not have too many options, just a few of them (and
none regarding multi cores or LAPIC =( )

Note that if I put acpi=off it detects both cores.
Thanks
Comment 9 Shaohua 2006-11-29 18:06:22 UTC
Did you check if there is a BIOS update? Or did you try WinXP? This looks like 
a BIOS bug to me.
Comment 10 Zhang Rui 2006-11-29 19:55:23 UTC
to Artur Souza :
It's really strange that your BIOS has two MADTs. The first one disables the CPU1
LAPIC, and this causes the problem you describe.
The kernel parses only the first MADT, so it disables CPU1.
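
To compare the two MADTs by hand, something like the following could be run over each raw table taken from the acpidump. It is a rough sketch, not the kernel's parser: it assumes the layout from the ACPI specification (36-byte table header, 4-byte local APIC address, 4-byte flags, then variable-length entries, where entry type 0 is a Processor Local APIC structure whose flags bit 0 means "enabled"). build_madt is a made-up helper that only fabricates test data.

```python
import struct

MADT_ENTRIES_OFFSET = 44  # 36-byte header + local APIC address + flags
TYPE_LOCAL_APIC = 0       # Processor Local APIC structure
LAPIC_ENABLED = 0x1       # bit 0 of the entry flags


def local_apics(madt: bytes):
    """Return (acpi_id, apic_id, enabled) for each Local APIC entry."""
    if madt[:4] != b"APIC":
        raise ValueError("not a MADT (signature is not 'APIC')")
    (length,) = struct.unpack_from("<I", madt, 4)
    end = min(length, len(madt))
    found = []
    offset = MADT_ENTRIES_OFFSET
    while offset + 2 <= end:
        entry_type, entry_len = struct.unpack_from("<BB", madt, offset)
        if entry_len < 2 or offset + entry_len > end:
            break  # malformed entry, stop rather than loop forever
        if entry_type == TYPE_LOCAL_APIC and entry_len >= 8:
            acpi_id, apic_id, flags = struct.unpack_from("<BBI", madt, offset + 2)
            found.append((acpi_id, apic_id, bool(flags & LAPIC_ENABLED)))
        offset += entry_len
    return found


def build_madt(cpus):
    """Hypothetical helper: build a synthetic MADT for experiments."""
    body = struct.pack("<II", 0xFEE00000, 1)
    for acpi_id, apic_id, enabled in cpus:
        body += struct.pack("<BBBBI", TYPE_LOCAL_APIC, 8, acpi_id, apic_id,
                            LAPIC_ENABLED if enabled else 0)
    header = b"APIC" + struct.pack("<I", 36 + len(body)) + bytes(28)
    return header + body
```

On this BIOS one would expect the first table to report CPU1 as disabled and the second to report it as enabled.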
Comment 11 Artur Souza 2006-11-30 10:19:17 UTC
Well, under WinXP it works perfectly, and I couldn't find a BIOS upgrade on Sony's
website (this model is pretty new... VGN-SZ340).
Do you think there is any way to make the kernel parse the second MADT?

thanks!
Comment 12 Zhang Rui 2006-12-01 01:27:20 UTC
Created attachment 9702 [details]
workaround to parse the second MADT

This is a workaround to make your laptop parse the second MADT.
Please test it and attach the _DMESG_ output with this patch.
:)
Comment 13 Artur Souza 2006-12-01 03:31:20 UTC
Created attachment 9703 [details]
Dmesg with patch

dmesg with patch
Comment 14 Artur Souza 2006-12-01 03:35:53 UTC
Comment on attachment 9703 [details]
Dmesg with patch

The patch didn't work. Here is the dmesg; it still shows only one core.

Thanks for your help! =)
Comment 15 Zhang Rui 2006-12-01 09:01:09 UTC
Created attachment 9706 [details]
workaround to ignore the first MADT

Sorry, the last one is wrong.
How about this one? I think it should work. :)
Comment 16 Artur Souza 2006-12-01 09:45:28 UTC
Created attachment 9707 [details]
Dmesg using the second patch

Now it worked! I'm using ACPI and both cores. I noticed that the fan is still
"working hard" as it was when I ran without ACPI...
but now I can use ACPI to control the monitor brightness and so on,
and it shows both cores.

What is the next step ?
=)

thanks!
Comment 17 Artur Souza 2006-12-01 10:05:37 UTC
The fan was working "hard" because I hadn't set up the correct scaling governor.
But I was only able to set up the scaling governor for cpu0 and not for cpu1. Was
that expected?

thanks
Comment 18 Zhang Rui 2006-12-03 22:53:13 UTC
>But I was able just to setup the scaling governor for cpu0 and not for cpu1.
What do you mean by "can't setup the scaling governor for cpu1"?
Only "cpu0" is shown under /sys/devices/system/cpu/?
Or "cpufreq" is lost under "cpu1"? I'm not quite clear. :)
Comment 19 Artur Souza 2006-12-04 02:44:11 UTC
> What do you mean by "can't setup the scaling governor for cpu1"?
> Only "cpu0" is shown under /sys/devices/system/cpu/?
> Or "cpufreq" is lost under "cpu1"? I'm not quite clear. :)

Ow, sorry. I mean that there is no "cpufreq" under cpu1....
=)
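
The question above (is "cpufreq" present under every cpuN directory?) can be checked with a short script. A minimal sketch, where the function name and the overridable root argument are assumptions made for illustration:

```python
import os
import re


def cpus_missing_cpufreq(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return the cpuN directory names that have no 'cpufreq' subdirectory."""
    names = [n for n in os.listdir(sysfs_cpu_root) if re.fullmatch(r"cpu\d+", n)]
    names.sort(key=lambda n: int(n[3:]))  # numeric order: cpu2 before cpu10
    return [
        n for n in names
        if not os.path.isdir(os.path.join(sysfs_cpu_root, n, "cpufreq"))
    ]


if __name__ == "__main__":
    print(cpus_missing_cpufreq())
```

On the machine in this report it would be expected to list cpu1 only.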
Comment 20 Zhang Rui 2006-12-05 01:23:58 UTC
OK.
Please add the boot option cpufreq.debug=7
and attach the _DMESG_ output. :)
Comment 21 Artur Souza 2006-12-05 11:57:21 UTC
I got the message "Unknown boot option `cpufreq.debug=7': ignoring" with the
option you suggested...
Comment 22 Zhang Rui 2006-12-05 23:42:24 UTC
Check whether CONFIG_CPU_FREQ_DEBUG is set in your kernel config.
If not, set it, recompile the kernel, and try again.
And please attach the kernel config file. :)
IMO, it would be more meaningful if you could run this test on the latest kernel
release, i.e. 2.6.19. :)
Comment 23 Artur Souza 2006-12-06 10:52:34 UTC
Created attachment 9747 [details]
Dmesg with cpufreq.debug=7

Using the last patch you gave me (so both cores are detected). Using kernel
2.6.19 and cpufreq.debug=7
Comment 24 Artur Souza 2006-12-06 10:54:52 UTC
Created attachment 9748 [details]
Config used with my kernel 2.6.19
Comment 25 Zhang Rui 2006-12-07 01:26:10 UTC
Here is some information from your BIOS:
 Name (SSDT, Package (0x0C)
        {
            "CPU0IST ", 
            0x3F68BCF8, 
            0x000001EA, 
            "CPU1IST ", 
            0x00000000, 
            0xF000FF53, 
            "CPU0CST ", 
            0x3F68BAE1, 
            0x00000217, 
            "CPU1CST ", 
            0x00000000, 
            0xF000FF53
        })
"CPU1IST" and "CPU1CST" are dynamic tables loaded by the _PDC method,
and these tables contain the critical methods for P-state support.
But their base_addr and length are totally wrong.
It's really strange that it works well in Windows... :(
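
As an illustration of the reasoning above, the package can be read as (name, base address, length) triples and sanity-checked. This is a sketch under assumptions: the triple layout is inferred from the listing in this comment, and the 1 MiB length limit is an arbitrary threshold chosen for the example, not something taken from the ACPI specification or the kernel.

```python
def decode_ssdt_package(items):
    """Group a flat package into (name, base_addr, length) triples."""
    if len(items) % 3 != 0:
        raise ValueError("expected a multiple of three elements")
    return [
        (items[i].strip(), items[i + 1], items[i + 2])
        for i in range(0, len(items), 3)
    ]


def suspicious_tables(entries, max_len=0x100000):
    """Return names whose address is zero or whose length is implausible.

    max_len is a hypothetical cut-off used only for this example.
    """
    return [
        name for name, addr, length in entries
        if addr == 0 or length > max_len
    ]


# Values copied from the package listed in this comment.
SSDT_PACKAGE = [
    "CPU0IST ", 0x3F68BCF8, 0x000001EA,
    "CPU1IST ", 0x00000000, 0xF000FF53,
    "CPU0CST ", 0x3F68BAE1, 0x00000217,
    "CPU1CST ", 0x00000000, 0xF000FF53,
]
```

With these values, suspicious_tables(decode_ssdt_package(SSDT_PACKAGE)) flags the two CPU1 tables and leaves the CPU0 ones alone.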
Comment 26 Artur Souza 2006-12-07 06:49:26 UTC
Is there any way to fix these errors?
Comment 27 Len Brown 2007-01-05 10:00:00 UTC
Is it possible that the BIOS installed on this board
does not support the processor that is installed --
or did the board+processor+BIOS all come together from Sony?

In any case, the MADT issue clearly moves this into the BIOS bug category.

A long while back, we ran into a Dell system with multiple MADTs
that used to fail.  So we chose the 1st MADT and ignored the other(s).
It is possible that we could just as well have ignored the initial one
and used the last, which would also work for this Sony.
We'll have to dig up the Dell to find out.

Re: P-states on 2 cpus.
Setting the governors is handled from user-space.
It is possible that due to the MADT issue above, the distro installed
a script for 1 processor and not the other.

If the kernel is exporting /sys/devices/system/cpu/cpu0 and cpu1
and cpufreq files appear beneath both, and you can set the governor
in each, and it works,  then I don't think there is a kernel issue
related to P-states on this machine.
Comment 28 Artur Souza 2007-01-05 10:21:29 UTC
> or did the board+processor+BIOS all come together from Sony?

Yes, it all came together from Sony.

> If the kernel is exporting /sys/devices/system/cpu/cpu0 and cpu1
> and cpufreq files appear beneath both, and you can set the governor
> in each, and it works,  then I don't think there is a kernel issue
> related to P-states on this machine.

That's the problem: the cpu1 directory appears inside /sys/devices/system/cpu, but cpufreq
appears only inside cpu0 and not inside cpu1.
Comment 29 Zhang Rui 2007-01-05 21:14:32 UTC
>That's the problem: cpu1 file appear inside /sys/devices/system/cpu but 
>cpufreq appears just inside cpu0 and not inside cpu1.
This is a BIOS problem.
CPU0 can be successfully initialized by the cpufreq driver, while the methods
that support P-states for CPU1 are not found in the BIOS. I think they are
contained in some dynamically loaded tables, "CPU1IST" or "CPU1CST".
But as shown in comment #25, the base_addr and length the BIOS gives for these two
tables are meaningless, so they are not loaded. That's why CPU1 cannot be
initialized by the cpufreq driver.
I think we have root-caused the bug. The only way to fix the P-state problem
is to update the BIOS. :(
Comment 30 Len Brown 2007-01-05 23:50:36 UTC
Created attachment 10014 [details]
2.6.20-rc3 patch adding "acpi_parse_multi_table=1" parameter

Re: duplicate MADT
Please test this patch.

Boot with no parameters, it should print a message alerting
you to the duplicate tables, asking you to...

boot with "acpi_parse_multi_table=1"

Please boot that way too.
Please confirm that the original boot failed to bring up the 2nd core
and that the 2nd boot succeeded in bringing up the 2nd core.
Please attach the output from dmesg -s 64000 from both.
Comment 31 Artur Souza 2007-01-08 03:58:25 UTC
Created attachment 10027 [details]
Dmesg of 2.6.20-rc4 with patch and acpi=off

I couldn't even boot with ACPI turned on. The screen remains black and the
computer just stops after LILO loads the kernel.

With acpi=off I could boot. I used the same .config as with kernel 2.6.19
(above).

Thanks
Comment 32 Zhang Rui 2007-01-09 19:02:48 UTC
I have a similar problem. Boot failed with "acpi_parse_multi_table=1",
with these error messages:
ERROR: Unable to locate IOAPIC for GSI 9
ERROR: Unable to locate IOAPIC for GSI 0
I'll debug further. :)
Comment 33 Zhang Rui 2007-01-09 21:26:42 UTC
Created attachment 10044 [details]
patch-ACPI-parse-multi-table

to Artur Souza:
Can you do the same test with this patch please? thanks. :)
Comment 34 Zhang Rui 2007-01-09 21:47:35 UTC
Len:
Only one change is made in the new patch. It makes my laptop boot
successfully with "acpi_parse_multi_table=1", even though there is only one MADT
here. :)
In acpi_table_parse_entries:
-	table_end = (unsigned long)table_start + sdt_entry[i].size;
+	table_end = (unsigned long)table_start + sdt_entry[index].size;
If acpi_parse_multi_table is greater than the number of MADTs in the BIOS, "i" equals
sdt_count and sdt_entry[i].size doesn't make sense. "index" is the actual
table_index in sdt_entry[] that should be used here.
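
The effect of the one-line fix can be modelled outside the kernel. The sketch below is a simplified illustration, not the patch itself: select_table and its arguments are invented names, and falling back to the last matching table when fewer instances exist than requested is an assumption based on the description above. The point is that the remembered match ("index") stays valid, while the loop counter ("i" in the C code) can run past the end of the array.

```python
def select_table(signatures, wanted, instance):
    """Return the index of the instance-th table with signature `wanted`.

    If fewer matching tables exist than requested, return the last match.
    Return None if nothing matches at all.
    """
    count = 0
    index = None          # last matching position, always a valid index
    for i, signature in enumerate(signatures):
        if signature != wanted:
            continue
        count += 1
        index = i
        if count == instance:
            break
    return index
```

For example, asking for the second "APIC" table on a machine that has only one would return the position of that single table instead of an out-of-range slot.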
Comment 35 Artur Souza 2007-01-10 04:07:25 UTC
Hi, I just tested with the last patch you sent me but got the same error described
above: the screen remains blank and nothing happens when it starts booting.

Thanks again
Comment 36 Len Brown 2007-01-10 15:00:37 UTC
Rui, thanks for fixing the patch.

Artur, are you sure that you're running the fixed patch from comment #33?
It works for me on several machines with and without duplicate APIC tables.
Comment 37 Artur Souza 2007-01-11 03:18:22 UTC
Yeah, I'm sure about that. Today I'll try again.
What happens is that the screen turns black (off) and nothing else happens (the
HD LED doesn't even blink). If I turn ACPI off, then the machine can boot.
But I'll try again today!

Thanks
Comment 38 Len Brown 2007-01-12 15:26:10 UTC
Created attachment 10061 [details]
updated patch w/o BUG_ON in acpi_table_parse()

Here's an updated patch that prints a message if a NULL handler is seen
in acpi_table_parse() instead of hitting a BUG_ON.
Perhaps the BUG_ON was firing before your screen was on.

Please try it out, per above.
Comment 39 Artur Souza 2007-01-15 04:51:58 UTC
Created attachment 10084 [details]
Dmesg of 2.6.20-rc4 and last patch

Now I could boot, both cores were detected (used "acpi_parse_multi_table=1")
but frequency scaling is not enabled for the second core yet.

Thanks
Comment 40 Len Brown 2007-01-19 00:13:20 UTC
> ACPI: acpi_table_parse(17, 00000000) HPET NULL handler!

Ah, that explains it.
boot.c defines acpi_hpet_timer as NULL when CONFIG_HPET_TIMER=n

Who'd a thunk it...
Comment 41 Len Brown 2007-03-10 23:47:00 UTC
Created attachment 10685 [details]
2.6.21-rc3 patch

This patch vs. 2.6.21-rc3 adds cmdline "acpi_apic_instance="
and changes the default for that parameter to 2 in order to
parse the 2nd APIC/MADT by default.  Please test.
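
When testing, it may help to confirm which instance a given boot actually requested. A minimal sketch that reads the value from a kernel command line string such as the contents of /proc/cmdline. The function name is made up; the default of 2 is taken from the description of this patch, and letting a later occurrence of the option override an earlier one is an assumption.

```python
def apic_instance_from_cmdline(cmdline, default=2):
    """Return the acpi_apic_instance= value, or `default` if absent/invalid."""
    instance = default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "acpi_apic_instance" and sep:
            try:
                instance = int(value, 0)  # accepts "2" as well as "0x2"
            except ValueError:
                pass  # ignore an unparsable value and keep the previous one
    return instance


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(apic_instance_from_cmdline(f.read()))
```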
Comment 42 Artur Souza 2007-03-17 15:41:52 UTC
It worked well, just like the other patches. But I still don't have cpufreq for
the second core. Any ideas?

Thanks
Comment 43 Len Brown 2007-03-23 09:30:07 UTC
The patch fixing the primary issue,
the duplicate MADT issue and missing cpu1 in ACPI mode,
shipped in Linux-2.6.21-rc4-git7.

So that issue and this bug report should be closed.

Please open a new bug report against the cpufreq on cpu1 issue.
