Distribution: Fedora
Hardware Environment: IBM 8840-21Y (dual Xeon with EM64T)
Software Environment: Both i386 and x86_64 kernels have this problem
Problem Description: Both i386 and x86_64 kernels say this at boot:
ACPI: RSDP (v002 IBM ) @ 0x00000000000fdfb0
>>> ERROR: Invalid checksum
And nothing more. Then it goes crazy (incorrect sibling count, [censored] cache, ...).
Steps to reproduce: Always reproducible
Created attachment 5213 [details] acpidmp output
Created attachment 5214 [details] dmidecode output
Created attachment 5215 [details] dmesg
I've tried this with acpi-devel as of 2.6.13-rc1-mm1 with the checksum check commented out. It hangs at SCSI controller init. I will try to grab the console output.
It seems that our duplicate RSDP scanning code has finally bitten us, and we need to delete the Linux one and always use the correct one in ACPICA. Assigning to Alexey.
So, where can I grab the updated code to test on my box?
Just for clarity: the ACPICA RSDP memory scanner will validate the checksum and continue the scan until a valid signature/checksum combination is found. This code is available immediately at boot time and does not depend on full initialization of the ACPI CA subsystem. There should be only one version of this rather sensitive code within the operating system kernel. I suppose that the next problem that will come up is what to do if there is only one RSDP signature and the associated table has an invalid checksum. This may be a "what does Windows do" kind of question.
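For illustration only, the scan described above can be sketched in Python over a raw memory image. This is not the ACPICA code: `find_rsdp` and `byte_sum_ok` are hypothetical names, the image is assumed to start on a 16-byte boundary (the ACPI specification places the RSDP on one), and checking the extended checksum during the scan is an assumption about what "valid signature/checksum combination" means for revision 2.

```python
RSDP_SIG = b"RSD PTR "  # 8-byte signature, including the trailing space


def byte_sum_ok(data: bytes) -> bool:
    """ACPI checksum rule: all bytes must sum to 0 modulo 256."""
    return sum(data) & 0xFF == 0


def find_rsdp(mem: bytes, step: int = 16):
    """Return the offset of the first valid RSDP in mem, or None.

    A candidate with a matching signature but a bad checksum is
    skipped, and the scan continues with the next 16-byte boundary.
    """
    for off in range(0, len(mem) - 20 + 1, step):
        if mem[off:off + 8] != RSDP_SIG:
            continue
        # ACPI 1.0 part: first 20 bytes must checksum to zero.
        if not byte_sum_ok(mem[off:off + 20]):
            continue
        # Revision byte sits at offset 15 of the structure.
        if mem[off + 15] >= 2:
            # ACPI 2.0 part (assumed here): all 36 bytes must checksum too.
            if off + 36 > len(mem) or not byte_sum_ok(mem[off:off + 36]):
                continue
        return off
    return None
```

With only one signature in memory and a bad checksum on it, this returns None, which is exactly the open question raised above.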
Alexey and I have noticed that the first 20 bytes of the RSDP checksum properly. The problem is that the RSDP is version 2, so Linux then checksums the entire 36 bytes of the RSDP to get the XSDT, but this checksum fails and the machine boots with acpi=off. This seems like correct behaviour on the part of Linux. Paul, please run a new version of acpidump located here: http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/acpidump.new (note it may take an hour to show up on the mirror). It should dump the entire 36-byte RSDP, and we can check it to verify the assertion above (i.e. the first 36 bytes of the first table, when converted to binary, should total to 0, but I expect they will not). Say it doesn't check out, then the question is what to do... Bob, is it legal for a version 2 RSDP to have no XSDT pointer? If yes, then I suppose we could add a special case to ignore the 36-byte checksum and carry on with the RSDT pointer and its good 20-byte checksum. It is unclear whether this workaround would be very useful on a 64-bit system. Indeed, hiding this error may be a bad way to go... If no, then Linux may have discovered a bad BIOS where Windows was unable to. I'm moving this bug to category BIOS on that assumption. The owner of the machine should alert IBM to this issue and ask for a new BIOS.
Where are those ACPICA headers? In the kernel source? My box is running x86_64 now and the binary won't show anything on screen, maybe because it's 32-bit... By the way, I'm at OLS too; we can meet and talk for some time, maybe about this issue too.
Acpidump's silence on i386 means it cannot find an RSDP with a correct checksum either. There is a workaround you can try: specify the address (0x00000000000fdfb0) and length (36) of the RSDP. In this case the contents are not verified. The source code is part of the acpica-unix package, and it needs its headers. But it is a good idea to use the kernel headers. I'll try to modify the code to work this way.
The -b option is not working (output is always binary). The dump is attached.
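A minimal sketch of the check proposed above, assuming the dumped RSDP has already been converted to binary. `rsdp_sums` is a hypothetical helper name, not part of acpidump. For a valid revision 2 RSDP both sums should be 0; the suspicion in this bug is that `sum20` is 0 while `sum36` is not.

```python
def rsdp_sums(rsdp: bytes) -> dict:
    """Report the revision and both byte sums (mod 256) of a raw RSDP.

    sum20 covers the ACPI 1.0 part (validated by the Checksum field),
    sum36 covers the whole ACPI 2.0 structure (validated by the
    Extended Checksum field). Each should be 0 when correct.
    """
    if len(rsdp) < 36:
        raise ValueError("need the full 36-byte RSDP")
    return {
        "revision": rsdp[15],           # revision byte at offset 15
        "sum20": sum(rsdp[:20]) & 0xFF,
        "sum36": sum(rsdp[:36]) & 0xFF,
    }
```

Feeding it the first 36 bytes of the binary dump would be enough to confirm or refute the assertion.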
Created attachment 5356 [details] table dump
The ACPI checksum error has been corrected in the latest version of the x346 BIOS. The problem was that the BIOS guys updated the ACPI tables to support the ACPI 2.0 specification (to include DBS support, I believe). You can download the latest x346 BIOS from the firmware link at: http://www-1.ibm.com/servers/eserver/support/xseries/x346/downloadinghwonly.html
Here is my understanding of the ACPI specification: It is valid for a revision 2 RSDP to have a NULL XSDT pointer, and in this case, the ACPI CA code will correctly fall back to using the original (ACPI 1.0) RSDT pointer. However, if the revision is 2, the entire ACPI 2.0 RSDP structure must be present, and both the "Checksum" field and the "Extended Checksum" fields must be correct. We could, of course, add a global option to ignore ACPI table checksums and blindly go forth into the darkness.
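The rules as stated in this comment could be sketched as follows. This is a reading of the comment in Python, not the ACPI CA implementation; `select_root_table` is a hypothetical name, and the field layout is the standard 36-byte RSDP structure.

```python
import struct


def select_root_table(rsdp: bytes):
    """Return ("XSDT", addr) or ("RSDT", addr) for a raw RSDP.

    Raises ValueError when the signature or a required checksum is bad.
    """
    if len(rsdp) < 20:
        raise ValueError("RSDP too short")
    # ACPI 1.0 part: signature, checksum, OEM ID, revision, RSDT address.
    sig, _cksum, _oem, rev, rsdt = struct.unpack_from("<8sB6sBI", rsdp, 0)
    if sig != b"RSD PTR ":
        raise ValueError("bad signature")
    if sum(rsdp[:20]) & 0xFF != 0:
        raise ValueError("bad checksum")
    if rev < 2:
        return ("RSDT", rsdt)
    # Revision 2: the whole 36-byte structure must be present and valid,
    # even when the XSDT pointer is NULL.
    if len(rsdp) < 36 or sum(rsdp[:36]) & 0xFF != 0:
        raise ValueError("bad extended checksum")
    # ACPI 2.0 part: length, XSDT address, extended checksum, 3 reserved bytes.
    _length, xsdt, _ext = struct.unpack_from("<IQB3x", rsdp, 20)
    if xsdt != 0:
        return ("XSDT", xsdt)
    return ("RSDT", rsdt)  # NULL XSDT: fall back to the ACPI 1.0 pointer
```

A global "ignore checksums" option would amount to skipping the two `ValueError` checksum branches.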
That's my understanding as well. The checksum problem in the BIOS had nothing to do with the NULL XSDT, and everything to do with the fact that the BIOS guys failed to update the extended checksum when they added support for the 2.0 spec. I didn't make that clear in my last comment.
Actually, the IBM BIOS update helped. The server now boots with full ACPI support. I don't know, though, whether it will again emit scary messages about lost interrupts, lost timer ticks, or whatever. But the problem was indeed the buggy BIOS. Thanks.
I'd like to know if Windows accepted the bad extended checksum in the RSDP.
Hmm, I think I can try to downgrade the BIOS and install Windows on that box, but it will take some time. While I reflashed it and rebooted Linux over the net from OLS, any other action will probably involve closer interaction :)
In response to Bob's question in comment #17, the problem was never reported against Windows, so I suspect not. In response to Paul's concern in comment #16, there is another known ACPI problem on the x346 that has to do with interrupt routing. All interrupt routing information for the PCI slots is stored in the SSDT, rather than the DSDT. The Linux ACPI CA doesn't handle this correctly. Specifically, the kernel hangs with adapters in PCI slots 3 and 4. I'm supposed to be looking into this, but haven't had the time, and since there is a workaround, it's been low on my priority queue.
Sheez, I should read my comments before I commit. What I meant to say regarding comment #17 is that the problem was never reported against Windows. So, I suspect that Windows either doesn't validate checksums, or simply accepted the bad extended checksum, perhaps because the XSDT was NULL.