Bug 4799 - RSDP checksum error and ACPI malfunction on IBM x346 with latest BIOS/BMC/whatever
Status: REJECTED INVALID
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Tables
Hardware: i386 Linux
Importance: P2 high
Assignee: Alexey Starikovskiy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-06-25 12:28 UTC by Paul P Komkoff Jr
Modified: 2005-07-22 16:12 UTC
3 users

See Also:
Kernel Version: 2.6.12
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
acpidmp output (11.45 KB, application/x-gzip)
2005-06-25 12:30 UTC, Paul P Komkoff Jr
dmidecode output (2.92 KB, application/x-gzip)
2005-06-25 12:30 UTC, Paul P Komkoff Jr
dmesg (13.30 KB, text/plain)
2005-06-25 12:32 UTC, Paul P Komkoff Jr
table dump (54 bytes, application/octet-stream)
2005-07-21 10:21 UTC, Paul P Komkoff Jr

Description Paul P Komkoff Jr 2005-06-25 12:28:42 UTC
Distribution: Fedora

Hardware Environment:
IBM 8840-21Y (dual Xeon with EM64T)

Software Environment:
Both i386 and x86_64 kernels have this problem.

Problem Description:
Both i386 and x86_64 kernels say this at boot:
ACPI: RSDP (v002 IBM                                   ) @ 0x00000000000fdfb0
  >>> ERROR: Invalid checksum

And nothing more. Then it goes crazy (incorrect sibling count, [censored] cache,
...).

Steps to reproduce:
Always reproducible
Comment 1 Paul P Komkoff Jr 2005-06-25 12:30:08 UTC
Created attachment 5213
acpidmp output
Comment 2 Paul P Komkoff Jr 2005-06-25 12:30:48 UTC
Created attachment 5214
dmidecode output
Comment 3 Paul P Komkoff Jr 2005-06-25 12:32:44 UTC
Created attachment 5215
dmesg
Comment 4 Paul P Komkoff Jr 2005-07-03 23:32:11 UTC
I've tried this with acpi-devel as of 2.6.13-rc1-mm1, with the checksum check
commented out. It hangs at SCSI controller init. I will try to grab the console output.
Comment 5 Len Brown 2005-07-20 07:57:15 UTC
It seems that our duplicate RSDP scanning code has finally bitten us,
and we need to delete the Linux copy and always use the correct
one in ACPICA. Assigning to Alexey.
Comment 6 Paul P Komkoff Jr 2005-07-20 12:41:23 UTC
So, where can I grab the updated code to test it on my box?
Comment 7 Robert Moore 2005-07-20 12:54:19 UTC
Just for clarity: the ACPICA RSDP memory scanner will validate the checksum 
and continue the scan until a valid signature/checksum combination is found.  
This code is available immediately at boot time and does not depend on full 
initialization of the ACPI CA subsystem. There should be only one version of 
this rather sensitive code within the operating system kernel.
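The scan-and-validate behaviour described above can be sketched as follows. This is a minimal Python illustration, not ACPICA's code: it assumes the caller has already copied a BIOS memory region (for example 0xE0000-0xFFFFF, one of the areas where the specification says the RSDP lives) into a `bytes` object, and the names `scan_for_rsdp`, `valid_rsdp_at` and `checksum_ok` are hypothetical. It walks 16-byte boundaries and skips any "RSD PTR " candidate whose checksum fails. As an assumption about how a revision 2 candidate should be treated, it also requires the extended checksum to pass.

```python
import struct

RSDP_SIGNATURE = b"RSD PTR "
RSDP_V1_LENGTH = 20   # bytes covered by the original "Checksum" field
RSDP_ALIGNMENT = 16   # the RSDP sits on a 16-byte boundary


def checksum_ok(data: bytes) -> bool:
    """ACPI rule: all bytes of the checksummed range sum to 0 modulo 256."""
    return sum(data) & 0xFF == 0


def valid_rsdp_at(region: bytes, off: int) -> bool:
    """True if the structure at `off` has the signature and valid checksum(s)."""
    candidate = region[off:off + RSDP_V1_LENGTH]
    if len(candidate) < RSDP_V1_LENGTH or candidate[:8] != RSDP_SIGNATURE:
        return False
    if not checksum_ok(candidate):
        return False
    revision = candidate[15]
    if revision >= 2:
        # Assumption: a revision 2 candidate must also pass the extended
        # checksum over the number of bytes given by Length (offset 20).
        if off + 24 > len(region):
            return False
        (length,) = struct.unpack_from("<I", region, off + 20)
        if length < 36 or off + length > len(region):
            return False
        if not checksum_ok(region[off:off + length]):
            return False
    return True


def scan_for_rsdp(region: bytes, base: int):
    """Return the physical address of the first valid RSDP in `region`.

    `base` is the physical address of region[0]. Candidates with a matching
    signature but a bad checksum are skipped and the scan continues.
    Returns None if no valid RSDP is found.
    """
    for off in range(0, len(region), RSDP_ALIGNMENT):
        if valid_rsdp_at(region, off):
            return base + off
    return None
```

Because a bad candidate only causes `valid_rsdp_at` to return False, a stray "RSD PTR " string with a broken checksum does not stop the search; this is the property the comment above relies on.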

I suppose that the next problem that will come up is what to do if there is 
only one RSDP signature and the associated table has an invalid checksum. This 
may be a "what does Windows do" kind of question.
 
Comment 8 Len Brown 2005-07-20 15:58:13 UTC
Alexey and I have noticed that the first 20 bytes of the RSDP
checksum properly. The problem is that the RSDP is version 2,
so Linux then checksums the entire 36 bytes of the RSDP
to get the XSDT, but this checksum fails and the machine
boots with acpi=off. This seems like correct behaviour
on the part of Linux.

Paul, please run a new version of acpidump located here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/acpidump.new
(note it may take an hour to show up on the mirror)
It should dump the entire 36-byte RSDP, and we can
check it to verify the assertion above (i.e., the
first 36 bytes of the first table, when converted to binary,
should sum to 0 modulo 256, but I expect they will not).
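The check requested here is an 8-bit sum over two nested ranges of the same structure: bytes 0-19 for the ACPI 1.0 "Checksum" and bytes 0-35 for the ACPI 2.0 "Extended Checksum". A minimal sketch, assuming the dump begins with the RSDP itself; `check_rsdp_dump` and `byte_sum` are hypothetical names, not part of acpidump.

```python
RSDP_V1_LENGTH = 20  # range covered by the "Checksum" field (ACPI 1.0 part)
RSDP_V2_LENGTH = 36  # range covered by the "Extended Checksum" field (ACPI 2.0)


def byte_sum(data: bytes) -> int:
    """8-bit sum of `data`; a correctly checksummed range gives 0."""
    return sum(data) & 0xFF


def check_rsdp_dump(raw: bytes) -> dict:
    """Report both RSDP checksum sums for a dump that starts with the RSDP."""
    if raw[:8] != b"RSD PTR ":
        raise ValueError("dump does not start with the 'RSD PTR ' signature")
    if len(raw) < RSDP_V2_LENGTH:
        raise ValueError("need at least 36 bytes to check the extended checksum")
    return {
        "revision": raw[15],
        "sum_20": byte_sum(raw[:RSDP_V1_LENGTH]),  # 0 means "Checksum" is valid
        "sum_36": byte_sum(raw[:RSDP_V2_LENGTH]),  # 0 means "Extended Checksum" is valid
    }
```

For a dump saved to a file, `check_rsdp_dump(open(path, "rb").read())` would be the call. If the suspicion in this comment is right, the result shows `sum_20 == 0` but a nonzero `sum_36`.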

Suppose it doesn't check out; then the question is what to do...
Bob, is it legal for a version 2 RSDP to have no XSDT pointer?
If yes, then I suppose we could add a special case to
ignore the 36-byte checksum and carry on with the RSDT
pointer and its good 20-byte checksum. It is unclear whether this
workaround would be very useful on a 64-bit system.
Indeed, hiding this error may be a bad way to go...
If no, then Linux may have discovered a BIOS bug that
Windows was unable to detect. I'm moving this bug to the
BIOS category on that assumption. The owner of the
machine should alert IBM to this issue and ask
for a new BIOS.
Comment 9 Paul P Komkoff Jr 2005-07-20 18:38:48 UTC
Where are those ACPICA headers? In the kernel source?
My box is running x86_64 now, and the binary doesn't show anything on screen;
maybe that's because it's 32-bit...

By the way, I'm at OLS too; we could meet and talk for a while, maybe about this issue as well.
Comment 10 Alexey Starikovskiy 2005-07-21 07:16:47 UTC
The silence from acpidump on i386 means that it cannot find an RSDP with a correct
checksum either. There is a workaround you can try: specify the address
(0x00000000000fdfb0) and length (36) of the RSDP. In this case, the contents are
not verified.
The source code is part of the acpica-unix package, and it needs that package's
headers. Using the kernel headers instead is a good idea, though; I'll try to
modify the code to work that way.
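When the address and length are supplied by hand, nothing is validated, so the raw bytes must be interpreted directly. The sketch below decodes the RSDP fields using the ACPI 2.0 layout: Signature (8), Checksum (1), OEMID (6), Revision (1) and RsdtAddress (4) in the first 20 bytes, then Length (4), XsdtAddress (8), Extended Checksum (1) and 3 reserved bytes. `decode_rsdp` and `Rsdp` are hypothetical names; the code deliberately performs no checksum test, mirroring the unverified mode described above.

```python
import struct
from typing import NamedTuple, Optional


class Rsdp(NamedTuple):
    signature: bytes
    checksum: int
    oem_id: bytes
    revision: int
    rsdt_address: int
    length: Optional[int]             # ACPI 2.0 fields: None if absent
    xsdt_address: Optional[int]
    extended_checksum: Optional[int]


def decode_rsdp(raw: bytes) -> Rsdp:
    """Decode RSDP fields from raw bytes WITHOUT verifying any checksum."""
    # First 20 bytes, little-endian, no padding: 8s B 6s B I
    sig, cksum, oem, rev, rsdt = struct.unpack_from("<8sB6sBI", raw, 0)
    length = xsdt = xcksum = None
    if rev >= 2 and len(raw) >= 36:
        # Bytes 20..32: Length (I), XsdtAddress (Q), Extended Checksum (B)
        length, xsdt, xcksum = struct.unpack_from("<IQB", raw, 20)
    return Rsdp(sig, cksum, oem, rev, rsdt, length, xsdt, xcksum)
```

Decoding the fields first makes it easy to see, for example, whether a revision 2 RSDP carries a NULL `xsdt_address`, before worrying about which checksum is wrong.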
Comment 11 Paul P Komkoff Jr 2005-07-21 10:21:14 UTC
The -b option is not working (the output is always binary).
The dump is attached.
Comment 12 Paul P Komkoff Jr 2005-07-21 10:21:54 UTC
Created attachment 5356
table dump
Comment 13 Chris McDermott 2005-07-21 10:47:10 UTC
The ACPI checksum error has been corrected in the latest version of the x346 BIOS.
The problem was that the BIOS guys updated the ACPI tables to support the ACPI 2.0
specification (to include DBS support, I believe). You can download the latest
x346 BIOS from the firmware link at:

http://www-1.ibm.com/servers/eserver/support/xseries/x346/downloadinghwonly.html
Comment 14 Robert Moore 2005-07-21 11:44:05 UTC
Here is my understanding of the ACPI specification:

It is valid for a revision 2 RSDP to have a NULL XSDT pointer, and in this 
case, the ACPI CA code will correctly fall back to using the original (ACPI 
1.0) RSDT pointer.  However, if the revision is 2, the entire ACPI 2.0 RSDP 
structure must be present, and both the "Checksum" field and the "Extended 
Checksum" field must be correct. 

We could, of course, add a global option to ignore ACPI table checksums and 
blindly go forth into the darkness.
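The rules above can be expressed as a small decision function. This is a sketch of that reading of the specification, not the ACPI CA implementation; `select_root_table`, `BadChecksum` and the `ignore_checksums` flag are hypothetical, the flag standing in for the global option floated above.

```python
import struct


class BadChecksum(Exception):
    """Raised when an RSDP checksum test fails."""


def _sum8(data: bytes) -> int:
    return sum(data) & 0xFF


def select_root_table(raw: bytes, ignore_checksums: bool = False):
    """Pick the root table pointer from a raw RSDP: ("XSDT"|"RSDT", address).

    Rules (one reading of the ACPI specification, as discussed above):
    * the 20-byte "Checksum" must always be valid;
    * for revision >= 2 the full 36-byte structure must be present and the
      "Extended Checksum" must be valid as well;
    * a revision 2 RSDP may carry a NULL XSDT pointer, in which case the
      ACPI 1.0 RSDT pointer is used instead.
    `ignore_checksums` is a hypothetical override, not an existing option.
    """
    if len(raw) < 20 or raw[:8] != b"RSD PTR ":
        raise ValueError("not an RSDP")
    revision = raw[15]
    (rsdt,) = struct.unpack_from("<I", raw, 16)
    if not ignore_checksums and _sum8(raw[:20]) != 0:
        raise BadChecksum("RSDP checksum over the first 20 bytes is invalid")
    if revision < 2:
        return ("RSDT", rsdt)
    if len(raw) < 36:
        raise ValueError("revision 2 RSDP is shorter than 36 bytes")
    if not ignore_checksums and _sum8(raw[:36]) != 0:
        raise BadChecksum("RSDP extended checksum over 36 bytes is invalid")
    (xsdt,) = struct.unpack_from("<Q", raw, 24)
    if xsdt == 0:
        return ("RSDT", rsdt)  # NULL XSDT: fall back to the RSDT
    return ("XSDT", xsdt)
```

Note the ordering: the NULL-XSDT fallback is only reached after the extended checksum has passed, so a NULL XSDT does not excuse a bad extended checksum, which matches the requirement that both fields be correct.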
Comment 15 Chris McDermott 2005-07-21 12:24:03 UTC
That's my understanding as well. The checksum problem in the BIOS had nothing to
do with the NULL XSDT, and everything to do with the fact that the BIOS guys
failed to update the extended checksum when they added support for the 2.0 spec.
I didn't make that clear in my last comment.
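The ordering issue is easy to see in code. Because the 20-byte Checksum field (offset 8) lies inside the 36 bytes covered by the Extended Checksum (offset 32), firmware has to compute the first checksum and then the extended one; recomputing only the first after changing the table leaves a stale extended checksum. A minimal sketch; `fill_rsdp_checksums` and `build_rsdp_v2` are hypothetical helpers, not IBM BIOS code.

```python
import struct


def fill_rsdp_checksums(rsdp: bytearray) -> None:
    """Set both checksum bytes of a 36-byte revision 2 RSDP in place.

    The 20-byte "Checksum" (offset 8) is inside the range covered by the
    "Extended Checksum" (offset 32), so it must be computed first.
    """
    if len(rsdp) < 36:
        raise ValueError("expected a 36-byte ACPI 2.0 RSDP")
    rsdp[8] = 0
    rsdp[8] = (-sum(rsdp[:20])) & 0xFF   # bytes 0..19 now sum to 0 mod 256
    rsdp[32] = 0
    rsdp[32] = (-sum(rsdp[:36])) & 0xFF  # bytes 0..35 now sum to 0 mod 256


def build_rsdp_v2(oem_id: bytes, rsdt: int, xsdt: int) -> bytes:
    """Hypothetical helper: pack a 36-byte ACPI 2.0 RSDP and checksum it."""
    if len(oem_id) != 6:
        raise ValueError("OEM ID must be exactly 6 bytes")
    # Signature, Checksum, OEMID, Revision, RsdtAddress, Length,
    # XsdtAddress, Extended Checksum, 3 reserved bytes (36 bytes total)
    packed = struct.pack("<8sB6sBIIQB3x", b"RSD PTR ", 0, oem_id, 2,
                         rsdt, 36, xsdt, 0)
    rsdp = bytearray(packed)
    fill_rsdp_checksums(rsdp)
    return bytes(rsdp)
```

Editing any field in bytes 20-35 afterwards and refreshing only the byte at offset 8 reproduces the failure mode described here: the 20-byte check still passes while the 36-byte check does not.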
Comment 16 Paul P Komkoff Jr 2005-07-21 17:47:58 UTC
Actually, the IBM BIOS update helped. The server now boots with full ACPI support. I
don't know, though, whether it will again emit scary messages about lost interrupts,
lost timer ticks, or whatever.
But the problem was indeed a buggy BIOS.
Thanks.
Comment 17 Robert Moore 2005-07-22 13:48:45 UTC
I'd like to know if Windows accepted the bad extended checksum in the RSDP.
Comment 18 Paul P Komkoff Jr 2005-07-22 13:54:02 UTC
Hmm.
I think I can try to downgrade the BIOS and install Windows on that box, but it
will take some time: while I could reflash it and reboot Linux over the network from
OLS, anything more will probably involve closer interaction :)
Comment 19 Chris McDermott 2005-07-22 16:09:24 UTC
In response to Bob's question in comment #17, the problem was never reported
against Windows, so I suspect not. 

In response to Paul's concern in comment #16, there is another known ACPI
problem on the x346 that has to do with interrupt routing. All interrupt
routing information for the PCI slots is stored in the SSDT rather than the
DSDT, and the Linux ACPI CA doesn't handle this correctly. Specifically, the kernel
hangs with adapters in PCI slots 3 and 4. I'm supposed to be looking into this, but
haven't had the time, and since there is a workaround, it's been low on my
priority queue.
Comment 20 Chris McDermott 2005-07-22 16:12:16 UTC
Sheesh, I should read my comments before I commit. What I meant to say regarding
comment #17 is that the problem was never reported against Windows. So, I
suspect that Windows either doesn't validate checksums, or simply accepted the
bad extended checksum, perhaps because the XSDT was NULL.
