Kernel Bug Tracker – Bug 4799
RSDP checksum error and ACPI malfunction on IBM x346 with latest BIOS/BMC/whatever
Last modified: 2005-07-22 16:12:16 UTC
IBM 8840-21Y (dual xeon with em64t)
Both i386 and x86_64 kernels have this problem
Both i386 and x86_64 kernels say this at boot:
ACPI: RSDP (v002 IBM ) <at> 0x00000000000fdfb0
>>> ERROR: Invalid checksum
And nothing more. Then, it goes crazy (incorrect sibling count, [censored] cache,
Steps to reproduce:
Created attachment 5213
Created attachment 5214
Created attachment 5215
I've tried this with acpi-devel as of 2.6.13-rc1-mm1 and commented out the checksum
check. It hangs at SCSI controller init. Will try to grab the console output.
It seems that our duplicate RSDP scanning code has finally bitten us,
and we need to delete the Linux one and always use the correct
one in ACPICA. Assigning to Alexey.
So, where can I grab the updated code to test it on my box?
Just for clarity: the ACPICA RSDP memory scanner will validate the checksum
and continue the scan until a valid signature/checksum combination is found.
This code is available immediately at boot time and does not depend on full
initialization of the ACPI CA subsystem. There should be only one version of
this rather sensitive code within the operating system kernel.
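
The scan described above can be sketched in a few lines. This is only an illustration of the technique, not the ACPICA code: it walks a byte buffer standing in for the BIOS memory region, assumes candidates sit on 16-byte boundaries, and checks only the 20-byte ACPI 1.0 checksum. The names find_rsdp, checksum_ok and build_rsdp_v1 are hypothetical helpers invented for this sketch.

```python
RSDP_SIGNATURE = b"RSD PTR "   # 8-byte signature, including the trailing space
RSDP_V1_LENGTH = 20            # bytes covered by the ACPI 1.0 checksum


def checksum_ok(data: bytes) -> bool:
    """True when all bytes sum to zero modulo 256."""
    return sum(data) % 256 == 0


def find_rsdp(memory: bytes, step: int = 16):
    """Return the offset of the first candidate whose signature and
    20-byte checksum are both valid, or None if no candidate qualifies."""
    for offset in range(0, len(memory) - RSDP_V1_LENGTH + 1, step):
        candidate = memory[offset:offset + RSDP_V1_LENGTH]
        if not candidate.startswith(RSDP_SIGNATURE):
            continue
        if checksum_ok(candidate):
            return offset
        # Signature matched but the checksum did not: keep scanning
        # instead of giving up on the first bad candidate.
    return None


def build_rsdp_v1(rsdt_address: int, oem_id: bytes = b"IBM   ") -> bytes:
    """Hypothetical helper: build a 20-byte ACPI 1.0 RSDP with a valid checksum."""
    raw = bytearray(
        RSDP_SIGNATURE
        + b"\x00"                              # checksum placeholder (offset 8)
        + oem_id                               # 6-byte OEM ID
        + b"\x00"                              # revision 0 (ACPI 1.0)
        + rsdt_address.to_bytes(4, "little")   # 32-bit RSDT address
    )
    raw[8] = (-sum(raw)) % 256                 # make the 20 bytes sum to zero
    return bytes(raw)
```

With a buffer that holds a corrupted candidate followed by a good one, find_rsdp skips the first and returns the offset of the second. When the only candidate is corrupt it returns None, which is the open question raised in the next paragraph.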
I suppose that the next problem that will come up is what to do if there is
only one RSDP signature and the associated table has an invalid checksum. This
may be a "what does Windows do" kind of question.
Alexey and I have noticed that the first 20 bytes of the RSDP
checksum properly. The problem is that the RSDP is version 2,
so Linux then checksums the entire 36 bytes of the RSDP
to get the XSDT, but this checksum fails and the machine
boots with acpi=off. This seems like correct behaviour
on the part of Linux.
Paul, please run a new version of acpidump located here:
(note it may take an hour to show up on the mirror)
It should dump the entire 36-byte RSDP and we can
check it to verify the assertion above (i.e. the
first 36 bytes of the first table, when converted to binary,
should sum to 0 modulo 256, but I expect they will not).
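
For anyone wanting to repeat that check by hand, here is a minimal sketch. It assumes the 36 RSDP bytes are already available as raw bytes or as a plain hex string; it does not parse acpidump output, whose exact format is not shown in this report. The function names are made up for the example.

```python
def rsdp_checksums(raw: bytes):
    """Return (sum of first 20 bytes, sum of first 36 bytes), each modulo 256.

    Both values are 0 for a well-formed revision 2 RSDP.
    """
    if len(raw) < 36:
        raise ValueError("need at least 36 bytes for a revision 2 RSDP")
    return sum(raw[:20]) % 256, sum(raw[:36]) % 256


def rsdp_checksums_from_hex(hex_text: str):
    """Same check for a whitespace-separated hex string such as '52 53 44 ...'."""
    return rsdp_checksums(bytes.fromhex("".join(hex_text.split())))
```

A result of (0, 0) means both checksums are good. A zero first value with a non-zero second value is the situation suspected here: a valid 20-byte checksum and a bad extended checksum.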
Say it doesn't check out, then the question is what to do...
Bob, is it legal for a version 2 RSDP to have no XSDT pointer?
If yes, then I suppose we could add a special case to
ignore the 36-byte checksum and carry on with the RSDT
pointer and its good 20-byte checksum. Unclear if this
workaround would be very useful on a 64-bit system.
Indeed, hiding this error may be a bad way to go...
If no, then Linux may have discovered a bad BIOS where
Windows was unable to. I'm moving this bug to
category BIOS on that assumption. The owner of the
machine should alert IBM to this issue and ask
for a new BIOS.
Where are those ACPICA headers? In the kernel source?
My box is running x86_64 now and the binary doesn't show anything on screen; maybe
that's because it's 32-bit...
Btw I'm at OLS too, we can meet and talk for some time, maybe about this issue too.
acpidump's silence on i386 means it cannot find an RSDP with a correct checksum either.
There is a workaround you can try -- specify the address (0x00000000000fdfb0) and
length (36) of the RSDP. In this case the contents are not verified.
The source code is part of the acpica-unix package, and it needs its headers. But it
is a good idea to use the kernel headers. I'll try to modify the code to work this way.
The -b option is not working (output is always binary).
Created attachment 5356
The ACPI checksum error has been corrected in the latest version of the x346 BIOS.
The problem was that the BIOS guys updated the ACPI tables to support the ACPI 2.0
specification (to include DBS support, I believe). You can download the latest
x346 BIOS from the firmware link at:
Here is my understanding of the ACPI specification:
It is valid for a revision 2 RSDP to have a NULL XSDT pointer, and in this
case, the ACPI CA code will correctly fall back to using the original (ACPI
1.0) RSDT pointer. However, if the revision is 2, the entire ACPI 2.0 RSDP
structure must be present, and both the "Checksum" field and the "Extended
Checksum" fields must be correct.
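
As a rough illustration of these rules (not the ACPI CA implementation), the selection logic might look like the sketch below. It assumes the 36-byte ACPI 2.0 RSDP layout and a fixed extended-checksum span of 36 bytes; select_root_table and build_rsdp_v2 are hypothetical names, and the builder exists only to produce sample input.

```python
import struct

# Assumed ACPI 2.0 RSDP layout: signature, checksum, OEM ID, revision,
# RSDT address, length, XSDT address, extended checksum, reserved.
RSDP_V2 = struct.Struct("<8sB6sBIIQB3s")   # 36 bytes, little-endian, no padding


def select_root_table(raw: bytes):
    """Return ("XSDT", address) or ("RSDT", address); raise ValueError if invalid."""
    if len(raw) < 20 or raw[:8] != b"RSD PTR ":
        raise ValueError("not an RSDP")
    if sum(raw[:20]) % 256 != 0:
        raise ValueError("bad checksum over the first 20 bytes")
    revision = raw[15]
    rsdt_address = int.from_bytes(raw[16:20], "little")
    if revision < 2:
        return ("RSDT", rsdt_address)
    # Revision 2: the full structure must be present and its
    # extended checksum must be correct, even if the XSDT pointer is NULL.
    if len(raw) < RSDP_V2.size:
        raise ValueError("revision 2 RSDP is truncated")
    if sum(raw[:RSDP_V2.size]) % 256 != 0:
        raise ValueError("bad extended checksum over all 36 bytes")
    xsdt_address = RSDP_V2.unpack(raw[:RSDP_V2.size])[6]
    if xsdt_address == 0:
        return ("RSDT", rsdt_address)      # NULL XSDT: fall back to the RSDT
    return ("XSDT", xsdt_address)


def build_rsdp_v2(rsdt_address: int, xsdt_address: int,
                  oem_id: bytes = b"IBM   ") -> bytes:
    """Hypothetical helper: build a revision 2 RSDP with both checksums valid."""
    raw = bytearray(RSDP_V2.pack(b"RSD PTR ", 0, oem_id, 2, rsdt_address,
                                 RSDP_V2.size, xsdt_address, 0, b"\x00" * 3))
    raw[8] = (-sum(raw[:20])) % 256        # Checksum (offset 8)
    raw[32] = (-sum(raw)) % 256            # Extended Checksum (offset 32)
    return bytes(raw)
```

A revision 2 RSDP with a NULL XSDT pointer and two good checksums yields the RSDT address. The same structure with a stale byte at offset 32 is rejected, even though its first 20 bytes still sum to zero.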
We could, of course, add a global option to ignore ACPI table checksums and
blindly go forth into the darkness.
That's my understanding as well. The checksum problem in the BIOS had nothing to
do with the NULL XSDT, and everything to do with the fact that the BIOS guys
failed to update the extended checksum when they added support for the 2.0 spec.
I didn't make that clear in my last comment.
Actually, the IBM BIOS update helped. The server now boots with full ACPI support. I
don't know, though, whether it will again emit scary messages about lost interrupts, lost
timer ticks, or whatever.
But the actual problem was indeed a buggy BIOS.
I'd like to know if Windows accepted the bad extended checksum in the RSDP.
I think I can try to downgrade the BIOS and install Windows on that box, but it
will take some time. While I reflashed it and rebooted Linux over the net from
OLS, any other action will probably involve closer interaction :)
In response to Bob's question in comment #17, the problem was never reported
against Windows, so I suspect not.
In response to Paul's concern in comment #16, there is another known ACPI
problem on the x346, that has to do with interrupt routing. All interrupt
routing information for the PCI slots is stored in the SSDT, rather than the
DSDT. The Linux ACPI CA doesn't handle this correctly. Specifically, the kernel
hangs with adapters in PCI slots 3 and 4. I'm supposed to be looking into this, but
haven't had the time, and since there is a workaround, it's been low on my
Sheez, I should read my comments before I commit. What I meant to say regarding
comment #17 is that the problem was never reported against Windows. So, I
suspect that Windows either doesn't validate checksums, or simply accepted the
bad extended checksum, perhaps because the XSDT was NULL.