Most recent kernel where this bug did not occur: ?
Distribution: Debian Sarge 3.1 + custom kernel
Hardware Environment: HP XW6200
Software Environment:
Problem Description:

When booting normally I get these errors in dmesg (with either a 2.6.16.x kernel or 2.6.17.11):

EXT3-fs: mounted filesystem with ordered data mode.
Adding 2048276k swap on /dev/sda3. Priority:-1 extents:1 across:2048276k
EXT3 FS on sda2, internal journal
ACPI Error (evgpe-0688): No handler or method for GPE[ 0], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ 1], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ 2], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ 5], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ 6], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ 7], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ 9], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ A], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[ F], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[10], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[11], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[12], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[13], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[14], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[15], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[16], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[17], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[19], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[1A], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[1B], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[1C], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[1D], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[1E], disabling event [20060127]
ACPI Error (evgpe-0688): No handler or method for GPE[1F], disabling event [20060127]
mice: PS/2 mouse device common for all mice
usbcore: registered new driver usbmouse

Also, if I add acpi=off, the ACPI Error messages go away, but then I see only two CPUs instead of the usual 4 (2 + hyper-threading). I really don't know whether I'm actually seeing two physical CPUs or just one with its hyper-threading sibling.

Here is one cpuinfo entry:

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 1
cpu MHz : 3200.728
cache size : 1024 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips : 6409.57

Top with acpi=off:

top - 17:57:51 up 1:28, 3 users, load average: 2.56, 2.61, 2.41
Tasks: 120 total, 4 running, 116 sleeping, 0 stopped, 0 zombie
Cpu0 : 56.3% us, 5.5% sy, 0.0% ni, 31.4% id, 1.5% wa, 0.2% hi,5.1% si
Cpu1 : 57.3% us, 5.7% sy, 0.0% ni, 33.1% id, 1.7% wa, 0.2% hi,2.0% si

Top without acpi=off:

top - 17:58:42 up 8 min, 1 user, load average: 0.00, 0.05, 0.03
Tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi,0.0% si
Cpu1 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi,0.0% si
Cpu2 : 0.0% us, 0.0% sy, 0.0% ni, 99.0% id, 0.0% wa, 0.0% hi,1.0% si
Cpu3 : 0.0% us, 0.0% sy, 0.0% ni, 94.4% id, 0.0% wa, 0.0% hi,5.6% si

Note that the noapic option did not change this behaviour.
Please post the entire dmesg and the acpidump for the machine.
Created attachment 8948 [details] dmesg on a 2.6.17.11 kernel
Created attachment 8949 [details] acpidump of a 2.6.17.11 kernel I used acpidump from http://packages.debian.org/unstable/source/acpidump compiled from sources.
There appears to be a problem with the GPE0 block in the FADT.

First, the checksum for the table is incorrect:

Checksum : D2 /* Incorrect checksum, should be EB */

Second, there is a mismatch between the 32-bit GPE0 block address and the 64-bit X version of the field:

[050h 080 4] GPE0 Event Address : 0000F828
...
GPE0 Block Register : <Generic Address Structure>
[0DCh 220 1] Space ID : 01 (SystemIO)
[0DDh 221 1] Bit Width : 20
[0DEh 222 1] Bit Offset : 00
[0DFh 223 1] Access Width : 00
[0E0h 224 8] Address : 000000000001F030

In this situation, the software will use the 64-bit generic address structure, since the address is non-zero and this field supersedes the older 32-bit field. The problem may be that the 64-bit address (1f030) is incorrect, and the 32-bit address (f828) is the correct address. Also, the BitWidth in the 64-bit structure is 0x20, and it should be 0x08.

It may be that something has scribbled on the FADT (thus the incorrect checksum), overwriting the 64-bit GPE block with a garbage address. You'll need to determine which address is the real address for the GPE block first. Does Windows run correctly on this machine?
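For anyone who wants to check the checksum claim against a raw table dump: an ACPI table is valid when all of its bytes, including the checksum byte at offset 9 of the header, sum to zero modulo 256. Here is a minimal Python sketch of that rule. The function names are my own, and the bytes in the demo are a synthetic header, not this machine's FADT.

```python
def acpi_checksum(table: bytes) -> int:
    """Return the 8-bit sum of every byte in the table; 0 means valid."""
    return sum(table) & 0xFF


def expected_checksum_byte(table: bytes, checksum_offset: int = 9) -> int:
    """Return the value the checksum byte must hold so the table sums to 0."""
    # Sum of everything except the current checksum byte, modulo 256.
    others = (sum(table) - table[checksum_offset]) & 0xFF
    return (-others) & 0xFF


# Synthetic 36-byte header (hypothetical data, for illustration only).
demo = bytearray(36)
demo[0:4] = b"FACP"
demo[9] = 0xD2  # deliberately wrong checksum byte
print("sum of bytes:", hex(acpi_checksum(bytes(demo))))
print("checksum byte should be:", hex(expected_checksum_byte(bytes(demo))))
```

Run against a real dump (for example the fadt.bin requested later in this report), a non-zero result from acpi_checksum would confirm that the table contents and the stored checksum disagree.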
I don't know... We never even tried. :) We bought about 10 of those boxes without having Windows in mind. These are operational forecasters' workstations, really not intended to run Windows. Anyhow, if it is really needed I might be able to find somebody in the Office technologies department to install it.
In the SSDT, there are control methods for GPEs 3,4,8,B,C,D,E,18. These are the exact GPEs missing from the block of error messages. What seems to be happening is that the ACPI code thinks that ALL of the GPEs 0-1F have fired, presumably because it is looking in the wrong place in I/O space. This appears to be either a bug in the actual FADT as presented by the BIOS, or some code is writing on the FADT. I believe that the ACPICA code is doing the right thing by using the 64-bit address instead of the 32-bit address, but I will investigate further just to make sure this is correct. In any case, something is very odd about having two different addresses for the GPE0 block.
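The "exact GPEs missing" observation can be checked mechanically. This short Python sketch takes the GPE numbers from the error lines in the original report and the list of SSDT control methods above, and confirms that together they cover the whole 0x00-0x1F block with no overlap. The variable and function names are mine.

```python
# GPE numbers from the "No handler or method" lines in the original dmesg.
REPORTED = {
    0x00, 0x01, 0x02, 0x05, 0x06, 0x07, 0x09, 0x0A, 0x0F,
    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
    0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
}

# GPE numbers that have control methods in the SSDT, as listed above.
WITH_METHODS = {0x03, 0x04, 0x08, 0x0B, 0x0C, 0x0D, 0x0E, 0x18}


def missing_from_report(reported, block_size=0x20):
    """Return the GPE numbers in 0..block_size-1 that were NOT reported."""
    return set(range(block_size)) - set(reported)


print([hex(n) for n in sorted(missing_from_report(REPORTED))])
print("matches SSDT methods:", missing_from_report(REPORTED) == WITH_METHODS)
```

Every GPE without a method produced an error, which is what you would expect if all 32 status bits read as set.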
You kinda lost me here! Just to add a few things: this is running a 32-bit kernel (although the CPUs are probably able to run 64-bit). Also, the BIOS is at revision 2.08. We tried a few 1.0x versions of it and the same thing occurred (before finally updating all of them to the latest 2.08; we have around a dozen of those workstations). We also have 3 XW8200s that are running Fedora Core 4. They will soon be migrated to Debian Sarge as well, using our local 2.6.17.x kernel. I presume the same kind of error might also happen there. What about the CPU detection without ACPI? Is it normal that only one processor + hyper-threading OR two processors without hyper-threading show up (I don't know which case it is)? BTW, thanks for digging into this!
Created attachment 8968 [details] Disassembly of FADT Attached the disassembled FADT for this machine. You may need to report this problem (2 different addresses for the GPE0 block) to the manufacturer. However, I would first try to look at the table in memory after reset, before the OS loads, to determine whether it is the fault of the BIOS or the OS. I can't speak to any non-ACPI issues.
Just tell me how to do so and I'll try to gather the info for you. In case it's a BIOS problem I'll report it to HP, unless there are people at kernel.org to whom we could report this. For the CPU detection problem, should I simply open a new bug?
ACPI: FADT (v003 COMPAQ TUMWATER 0x00000001 0x00000000) @ 0xbfff52ac You need to dump the 0xF4 bytes of memory at location 0xbfff52ac. Open a new, non-ACPI bug for cpu detection.
The CPU detection you describe is normal and correct. HT siblings are enumerated with ACPI only -- so with acpi=off, you are going to see only what the older MPS tables enumerate -- one sibling/package. Sometimes there is a BIOS option to enumerate HT with MPS (though this is rare b/c it is non-standard). You also have the option of booting with "acpi=ht", which will disable ACPI on this machine, except for the early boot stuff which is necessary to enable HT. The GPE issue is the problem at hand and should remain the focus of this report.
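To answer the "two physical CPUs or one with HT?" question directly from a running system, one option is to group the logical CPUs in /proc/cpuinfo by their "physical id" field. A rough Python sketch follows; the function name is mine, and the sample text is hypothetical (it is not output from this machine).

```python
from collections import Counter


def logical_cpus_per_package(cpuinfo_text: str) -> Counter:
    """Count logical CPUs per 'physical id' value in /proc/cpuinfo text."""
    counts = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "physical id":
            counts[value.strip()] += 1
    return counts


# Hypothetical sample: two packages, each exposing two HT siblings.
SAMPLE = """\
processor : 0
physical id : 0
processor : 1
physical id : 0
processor : 2
physical id : 3
processor : 3
physical id : 3
"""

for package, n in sorted(logical_cpus_per_package(SAMPLE).items()):
    print(f"package {package}: {n} logical CPU(s)")
```

On a real system you would pass open("/proc/cpuinfo").read() instead of SAMPLE. Two distinct physical ids with one logical CPU each would mean two packages with HT siblings not enumerated.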
I've reconfigured all our XW6200s to use acpi=ht; this works without any errors and I now see all four processors. Thanks for this info. Now, how can I dump the 0xF4 bytes of memory at location 0xbfff52ac? Can somebody point me in the right direction (URL?) ... I'm willing to gather this info, but a hint would help! Thanks.
> ACPI: FADT (v003 COMPAQ TUMWATER 0x00000001 0x00000000) @ 0xbfff52ac
> You need to dump the 0xF4 bytes of memory at location 0xbfff52ac

# dd if=/dev/mem skip=$[0xbfff52ac] bs=1c count=244 2> /dev/null > fadt.bin

Please attach the resulting fadt.bin.

Also, can you try 2.6.21-rc? It has completely different FADT code in the kernel.
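If dd is awkward to use, the same dump can be attempted with a few lines of Python. This is only a sketch under some assumptions: it must run as root, the kernel must allow reads of that physical range through /dev/mem (which may not be the case on every kernel or configuration), and the helper name read_physical is my own. Unlike the dd command above, which sends errors to /dev/null, it raises an error on a short or empty read instead of silently producing an empty file.

```python
import os


def read_physical(path: str, offset: int, length: int) -> bytes:
    """Read exactly `length` bytes starting at byte `offset` of `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        chunks = []
        remaining = length
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:
                break  # end of file, or the region could not be read
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    data = b"".join(chunks)
    if len(data) != length:
        raise OSError(f"short read: wanted {length} bytes, got {len(data)}")
    return data


# Intended use on the affected machine (needs root; not executed here):
#   fadt = read_physical("/dev/mem", 0xBFFF52AC, 0xF4)
#   with open("fadt.bin", "wb") as out:
#       out.write(fadt)
```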
Created attachment 10775 [details] dmesg on a 2.6.18.8 kernel We have been running 2.6.18.x kernels for a few weeks. The problem is still there; just for info, here is the dmesg.
On Debian Sarge it might be quite hard to try a 2.6.21-rc kernel, mainly because of the libata changes, which would require a lot of modifications to our local configuration. Anyhow, I'll try to find the time in the next few days. I tried to dump the memory region but it did not work (I ran exactly this command string): dd if=/dev/mem skip=$[0xbfff52ac] bs=1c count=244 2> /dev/null > fadt.bin The output file had a size of 0. - vin
Created attachment 11024 [details] dmesg running a 2.6.20.4 kernel Here is a dmesg running a 2.6.20.4 kernel. I've also tried to dump the FADT once again without any success: dd if=/dev/mem skip=0xbfff52ac bs=1c count=244 2>/dev/null > fadt.bin Once 2.6.21 comes out I will give it a try.
Created attachment 11672 [details] dmesg running a 2.6.21.3 Still getting the same error output messages using latest 2.6.21.3 kernel
ACPI: RSDP 000E8E10, 0024 (r2 COMPAQ)
ACPI: XSDT BFFF50EC, 0064 (r1 COMPAQ CPQ0063 20051222 0)
ACPI: FACP BFFF52AC, 010C (r3 COMPAQ TUMWATER 1 0)
ACPI Warning (tbfadt-0227): FADT (revision 3) is longer than ACPI 2.0 version, truncating length 0x10C to 0xF4 [20070126]

FADT r3 and r4 (ACPI 2.0 and 3.0) should both have length 244 (0xF4). It is unclear where this length of 268 (0x10C) came from; it is larger than even the very latest ACPI spec.

ACPI Error (tbfadt-0445): 32/64X address mismatch in "Gpe0Block": [0000F828] [000000000001F030], using 64X [20070126]

We should try a test patch to use the 32-bit address and see if that works better on this machine. (Note: the 32 vs 64-bit ACPI table address is related to the ACPI revision supported; it is independent of the 32 vs 64-bit Linux kernel.)

ACPI: DSDT BFFF54E2, 1098 (r1 COMPAQ DSDT 1 MSFT 100000E)
ACPI: FACS BFFF5000, 0040
ACPI: SSDT BFFF657A, 2BA1 (r1 COMPAQ PROJECT 1 MSFT 100000E)
ACPI Error (tbutils-0219): Null physical address for ACPI table [<NULL>] [20070126]
ACPI Error (tbutils-0219): Null physical address for ACPI table [<NULL>] [20070126]
ACPI Error (tbutils-0219): Null physical address for ACPI table [<NULL>] [20070126]

Okay, that is just simply bizarre.

ACPI: APIC BFFF53B8, 0084 (r1 COMPAQ TUMWATER 1 0)
ACPI: ASF! BFFF543C, 006A (r32 COMPAQ TUMWATER 1 0)
ACPI: MCFG BFFF54A6, 003C (r1 COMPAQ TUMWATER 1 0)
Created attachment 12034 [details] debug patch to use 32-bit FADT address on mis-match v 2.6.22 Please try this debug patch and attach the new dmesg. With this patch, the kernel should prefer the 32-bit address when there is a mis-match with the 64-bit address in the FADT.
The XSDT really does have three entries with NULL addresses:

[024h 036 8] ACPI Table Address 0 : 00000000BFFF52AC
[02Ch 044 8] ACPI Table Address 1 : 00000000BFFF657A // SSDT
[034h 052 8] ACPI Table Address 2 : 0000000000000000
[03Ch 060 8] ACPI Table Address 3 : 0000000000000000
[044h 068 8] ACPI Table Address 4 : 0000000000000000
[04Ch 076 8] ACPI Table Address 5 : 00000000BFFF53B8 // APIC
[054h 084 8] ACPI Table Address 6 : 00000000BFFF543C
[05Ch 092 8] ACPI Table Address 7 : 00000000BFFF54A6

So that explains the "Null physical address for ACPI table" messages.
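For reference, the entry list above follows directly from the XSDT layout: a 36-byte header followed by 8-byte physical addresses, so a table of length 0x64 holds (0x64 - 36) / 8 = 8 entries. A small Python sketch that walks such a table and reports NULL entries is shown here; the function names are mine, and the table in the demo is rebuilt from the addresses listed above with a zeroed header, so it is not a byte-exact copy of the real XSDT.

```python
import struct
from typing import List

HEADER_SIZE = 36  # standard ACPI table header


def xsdt_entries(xsdt: bytes) -> List[int]:
    """Return the 64-bit table addresses stored after the XSDT header."""
    (length,) = struct.unpack_from("<I", xsdt, 4)  # length field at offset 4
    count = (length - HEADER_SIZE) // 8
    return list(struct.unpack_from(f"<{count}Q", xsdt, HEADER_SIZE))


def null_entry_indexes(entries: List[int]) -> List[int]:
    """Return the indexes of entries whose address is zero."""
    return [i for i, addr in enumerate(entries) if addr == 0]


# Demo table rebuilt from the addresses quoted above (header mostly zeroed).
addresses = [0xBFFF52AC, 0xBFFF657A, 0, 0, 0, 0xBFFF53B8, 0xBFFF543C, 0xBFFF54A6]
header = bytearray(HEADER_SIZE)
header[0:4] = b"XSDT"
struct.pack_into("<I", header, 4, HEADER_SIZE + 8 * len(addresses))
demo_xsdt = bytes(header) + struct.pack("<8Q", *addresses)

print("entries:", len(xsdt_entries(demo_xsdt)))
print("NULL entries at indexes:", null_entry_indexes(xsdt_entries(demo_xsdt)))
```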
I've updated acpidump to dump the 1.0 tables, even on a 2.0 system. http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools-20070714.tar.gz This should give us a peek at the RSDP and its 32-bit FADT in addition to the XSDT and its 64-bit FADT that we see today. Also, I'm still puzzled about the "truncating length 0x10C to 0xF4" bit above. The previous acpidump showed a length of 0xF4 -- so it isn't clear where this 0x10C length came from. So please run the acpidump above twice, once when booted normally in ACPI mode, and once when booted with "acpi=off". Attach the acpidump output here -- one copy if they are identical, else both.
Created attachment 12035 [details] 2.6.22 patch adding "acpi_version=1" capability Please apply this debug patch to linux-2.6.22 and boot with acpi_version=1. This will cause Linux to use the RSDT and its 32-bit FADT instead of the XSDT and its 64-bit FADT. Please attach the resulting dmesg.
Please reopen this bug if:
- it is still present with kernel 2.6.22, and
- you can provide the requested information.
This is surely a duplicate of bug # 8630
It still occurs with a 2.6.22 kernel on both an XW6200 and an XW8200 (we just started using XW8200s as well). I noticed that some fixes were included in the 2.6.22.5 kernel: http://lkml.org/lkml/2007/8/22/519

Bob Moore (2):
ACPICA: Fixed possible corruption of global GPE list
ACPICA: Clear reserved fields for incoming ACPI 1.0 FADTs

I'll make sure to test this new kernel (2.6.22.6 should come out in the next few days, so I'll give it a try) and I'll test the "acpi_version=1" patch. Sorry for the long delay; I've been working full-time on a completely different project lately, but I'll be back to sysadmin work within the next few weeks and will be able to put more time into resolving this.
Please attach the output from the debug version of acpidump here: http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools-20070714-debug.tar.gz Please also test the latest patch in bug # 8630.
Created attachment 12804 [details] dmesg - vanilla 2.6.22.6 dmesg from a vanilla 2.6.22.6.
Created attachment 12805 [details] acpidump using proposed package on a running vanilla 2.6.22.6 acpidump using proposed package on a running vanilla 2.6.22.6
Hoping to have time to recompile a kernel patched with the fix from bug #8630 within a day or two.
(In reply to comment #29)
> Hoping to have time to recompile a kernel patched with the fix from bug #8630
> within a day or two.

Thanks for the info. There are at least two BIOS bugs on your computer:
a. The XSDT has NULL entries.
b. Some addresses in the FADT don't match.

Anyway, please use the latest kernel (the patch for 8630 is included in 2.6.23-rc5) and upload the dmesg.
Created attachment 12819 [details] dmesg - vanilla 2.6.22.6 + patch from bug #8630 This seems to solve all the problems! A dmesg analysis is still needed to fully confirm this. If this solves the problem, can this patch be sent to the stable team for the next 2.6.22.7 release? Thanks!
Created attachment 12820 [details] dmesg - vanilla 2.6.23-rc6-git3 Success once again, just as with the previous dmesg from 2.6.22.6 + patch. Just confirming the same behaviour on both kernels. Thanks.
Created attachment 12821 [details] XW6200 - dmesg - vanilla 2.6.22.6 + patch from bug #8630 + acpi ON I had forgotten to remove the acpi=ht boot option for the last two dmesgs, so I'm reposting. Again, this seems to solve all the problems! A full dmesg analysis is still needed to fully confirm this. If this solves the problem, can this patch be sent to the stable team for the next 2.6.22.7 release? Thanks!
Created attachment 12822 [details] XW6200 - dmesg - vanilla 2.6.23-rc6-git3 - acpi ON Reposting, this time without the acpi=ht boot option :) Success once again, just as with the previous dmesg from 2.6.22.6 + patch. Just confirming the same behaviour on both kernels. Thanks.
Created attachment 12824 [details] Get the acpidump info using the attached binary Hi, Vincent. Thanks for the info and the testing. From the dmesg it seems that the system works well. The errors about no method/handler for GPEs have disappeared. The patch for #8630 has now shipped in 2.6.23-rc5. Maybe it will be included in the 2.6.22.7 stable release. Will you please get the acpidump info using the attached binary? Thanks.
Getting this message:

[root@isobare /root]# /root/acpidump
zsh: floating point exception /root/acpidump
(In reply to comment #36)
> Getting this message:
> [root@isobare /root]# /root/acpidump
> zsh: floating point exception /root/acpidump

Hi, Vincent. Please ignore the attached binary in comment #35. Instead, please get the info using the following command and upload it:

./acpidump --addr 0xbfff5238 --length 0x78 -o fadt

Thanks.
Hi, Vincent. The info in comment #28 is enough to analyze the cause, so it is unnecessary to upload anything further. Please ignore comments #37 and #35.

The cause of this bug is now clear. There are at least two bugs in this system:
1. The XSDT has NULL entries.
2. The FADT obtained through the XSDT has three problems:
   - Incorrect checksum.
   - Mismatched GPE address.
   - GPE0 base address (0x1F030) exceeds the limit of the I/O port space.

After applying the patch, the RSDT is used instead of the XSDT, and the FADT obtained through the RSDT is correct. So the system works well.
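The third FADT problem is easy to express as a check: on x86 the I/O port space is 16 bits wide, so a SystemIO register block must lie entirely within 0x0000-0xFFFF. Here is a small Python sketch of that test, using the two GPE0 addresses from this report. The function name is mine and this is not how the kernel patch decides things; it only illustrates why 0x1F030 cannot be a valid port address.

```python
IO_PORT_MAX = 0xFFFF  # x86 I/O port space is 16 bits wide


def fits_in_io_space(address: int, bit_width: int) -> bool:
    """Return True if a register block lies entirely inside I/O port space."""
    length = max(1, (bit_width + 7) // 8)  # block size in bytes
    return 0 <= address and address + length - 1 <= IO_PORT_MAX


# 64-bit field from the XSDT's FADT, with the bit width it declares.
print("0x1F030:", fits_in_io_space(0x1F030, 0x20))
# Legacy 32-bit field, with the bit width suggested earlier in this report.
print("0xF828: ", fits_in_io_space(0xF828, 0x08))
```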