Bug 8706 - ACPI errors (possibly) make CPU frequency scaling hang - Dell i5150
Summary: ACPI errors (possibly) make CPU frequency scaling hang - Dell i5150
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core
Hardware: All Linux
Importance: P1 high
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-02 12:45 UTC by Federico Flego
Modified: 2007-07-24 13:50 UTC

See Also:
Kernel Version: 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
output of acpidump (43.13 KB, application/octet-stream)
2007-07-02 12:47 UTC, Federico Flego
acpidump from BIOS A38 (42.71 KB, text/plain)
2007-07-03 01:36 UTC, Len Brown
dmesg-2.6.22-rc7 (44.50 KB, text/plain)
2007-07-03 01:38 UTC, Len Brown
dmesg 2.6.20 (15.31 KB, text/plain)
2007-07-05 09:56 UTC, Federico Flego
acpidump bios A38 (41.19 KB, text/plain)
2007-07-05 09:56 UTC, Federico Flego
dmesg 2.6.22_1 (14.22 KB, application/octet-stream)
2007-07-15 05:24 UTC, Federico Flego
acpidump bios A38 kernel 2.6.22.1 (41.19 KB, text/plain)
2007-07-15 05:24 UTC, Federico Flego
kernel 2.6.22.1 config (35.69 KB, application/octet-stream)
2007-07-15 05:25 UTC, Federico Flego
output of dmesg with acpi_debug=on (318.89 KB, application/x-bzip)
2007-07-24 13:50 UTC, Federico Flego

Description Federico Flego 2007-07-02 12:45:58 UTC
Most recent kernel where this bug did not occur:

I haven't tested kernels older than 2.6.20. I ran one test with 2.6.21 and had the same problem.

Distribution: 

Arch Linux (but I downloaded the stable vanilla kernel from kernel.org and compiled it myself).

Hardware Environment:

Dell Inspiron 5150 with a Mobile Intel(R) Pentium(R) 4 CPU 3.06GHz (with Hyper-Threading). I believe it is the following model (512 kB L2 cache):

http://www.intel.com/cd/products/services/emea/ita/processors/mobilepentium4/133307.htm

Software Environment:

Arch Linux running Xfce on X. cpufreq with the ondemand governor and the speedstep-ich driver enabled.

Problem Description:

Since I started using CPU frequency scaling, the machine hangs from time to time and becomes completely unresponsive (a hard reboot is needed). Several ACPI errors/exceptions show up in the kernel log:

[flego@gardelito linux]$ dmesg |grep -i 'acpi\|speedstep'
ACPI: RSDP (v000 DELL                                  ) @ 0x000fdea0
ACPI: RSDT (v001 DELL    CPi R   0x27d4071e ASL  0x00000061) @ 0x3fff0000
ACPI: FADT (v001 DELL    CPi R   0x27d4071e ASL  0x00000061) @ 0x3fff0400
ACPI: MADT (v001 DELL    CPi R   0x27d4071e ASL  0x00000047) @ 0x3fff0c00
ACPI: DSDT (v001 INT430 SYSFexxx 0x00001001 MSFT 0x0100000e) @ 0x00000000
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Using ACPI (MADT) for SMP configuration information
ACPI: Core revision 20060707
ACPI: bus type pci registered
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI quirk: region 1000-107f claimed by ICH4 ACPI/GPIO/TCO
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 9 10 *11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 7) *11
ACPI: PCI Interrupt Link [LNKC] (IRQs 9 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 7 9 10 *11)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGP_._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIE._PRT]
PCI: Using ACPI for IRQ routing
ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 16 (level, low) -> IRQ 16
speedstep: frequency transition measured seems out of range (0 nSec), falling back to a safe one of 500000 nSec.
ACPI: AC Adapter [AC] (on-line)
ACPI: Battery Slot [BAT0] (battery present)
ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
ACPI Error (dswload-0333): [SMIX] Namespace lookup failure, AE_ALREADY_EXISTS
ACPI Exception (psloop-0285): AE_ALREADY_EXISTS, During name lookup/catalog [20060707]
ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU0._PDC] (Node c1903cfc), AE_ALREADY_EXISTS
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI Error (dswload-0333): [GETC] Namespace lookup failure, AE_ALREADY_EXISTS
ACPI Exception (psloop-0285): AE_ALREADY_EXISTS, During name lookup/catalog [20060707]
ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU1._PDC] (Node c1903cd4), AE_ALREADY_EXISTS
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: Thermal Zone [THM] (40 C)
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 16 (level, low) -> IRQ 16
ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 17
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 16
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 18
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 19
ACPI: PCI Interrupt 0000:00:1f.5[B] -> GSI 17 (level, low) -> IRQ 20
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 20 (level, low) -> IRQ 21
Time: acpi_pm clocksource has been installed.
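
For triage, the ACPI error lines in a log like the one above can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the function name `acpi_errors` and both regular expressions are assumptions derived only from the message format visible in this dmesg output.

```python
import re

# Matches lines such as:
#   ACPI Error (dswload-0333): [SMIX] Namespace lookup failure, AE_ALREADY_EXISTS
#   ACPI Exception (psloop-0285): AE_ALREADY_EXISTS, During name lookup/catalog [20060707]
# Both patterns are assumptions based on the log format shown in this report.
ACPI_ERR = re.compile(
    r"^ACPI (Error|Exception) \((?P<module>[a-z]+)-(?P<line>\d+)\): (?P<text>.*)$"
)
STATUS = re.compile(r"\bAE_[A-Z_]+\b")


def acpi_errors(dmesg_text):
    """Return (kind, module, status, message) tuples for ACPI error lines."""
    found = []
    for raw in dmesg_text.splitlines():
        m = ACPI_ERR.match(raw.strip())
        if not m:
            continue  # ordinary "ACPI: ..." informational lines are skipped
        status = STATUS.search(m.group("text"))
        found.append((
            m.group(1),
            m.group("module"),
            status.group(0) if status else None,
            m.group("text"),
        ))
    return found


# A few lines copied from the log above.
sample = "\n".join([
    "ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)",
    "ACPI Error (dswload-0333): [SMIX] Namespace lookup failure, AE_ALREADY_EXISTS",
    "ACPI Exception (psloop-0285): AE_ALREADY_EXISTS, During name lookup/catalog [20060707]",
    r"ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU0._PDC] (Node c1903cfc), AE_ALREADY_EXISTS",
    "ACPI: Processor [CPU0] (supports 8 throttling states)",
])

for kind, module, status, message in acpi_errors(sample):
    print(kind, module, status)
```

Grouping by module and status makes it easier to see that all failures here carry the same AE_ALREADY_EXISTS status and occur while the _PDC methods run.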


acpidump >/tmp/acpidump
cat /tmp/acpidump



Steps to reproduce:

The problem seems to be more frequent when running on battery and using mplayer, although the machine also hangs when only other applications (such as Skype or Firefox) are running.

I also seemed to get more frequent hangs when the panel battery monitor was active (though I'm not sure).

Once, after a hang, I could see the weather report plugin still cycling through its messages, but I still couldn't interact with the machine. Maybe just the keyboard and mouse are involved?

Extra:

This is the first time I have filed a (possible) bug. I hope I filled in all the fields correctly and provided the necessary information. Thank you for your precious work!
Comment 1 Federico Flego 2007-07-02 12:47:50 UTC
Created attachment 11924
output of acpidump
Comment 2 Thomas Renninger 2007-07-02 14:10:44 UTC
The acpidump file looks corrupt.
Did you run, e.g.: acpidump >/tmp/acpidump
and then upload /tmp/acpidump as the attachment?
It's best to also set the MIME type to text/plain.
Comment 3 Federico Flego 2007-07-02 14:58:32 UTC
Subject: Re: ACPI errors (possibly) make cpu frequency scaling hangs up the machine

Yes, I attached the acpidump file.
But I forgot to remove these lines from the description:

 > acpidump >/tmp/acpidump
 > cat /tmp/acpidump

which I wrote because at first I didn't know how to attach a file.
The file I attached seems readable; if it is not, please tell me.

Thank you very much,

Federico.


Comment 4 Len Brown 2007-07-03 01:36:40 UTC
Created attachment 11931
acpidump from BIOS A38
Comment 5 Len Brown 2007-07-03 01:38:32 UTC
Created attachment 11932
dmesg-2.6.22-rc7
Comment 6 Len Brown 2007-07-03 01:41:47 UTC
please upgrade to the latest BIOS
please try disabling HT in the BIOS (as a test)
please don't run i8kfan (as a test)

if you still have a problem please re-open.
Comment 7 Federico Flego 2007-07-05 09:56:22 UTC
Created attachment 11947
dmesg 2.6.20
Comment 8 Federico Flego 2007-07-05 09:56:55 UTC
Created attachment 11948
acpidump bios A38
Comment 9 Federico Flego 2007-07-05 09:59:17 UTC
I upgraded to BIOS A38 and got rid of i8kfan. So far no more hangs on my machine :)
ACPI errors/exceptions still appear in dmesg, though.
I attached the whole output of dmesg and acpidump.

Thanx,

Federico.
Comment 10 Thomas Renninger 2007-07-05 12:10:23 UTC
Doing:
acpixtract -a acpidump
acpiexec -x 0x1F DSDT.dat

gives:
Completing Region/Field/Buffer/Package initialization:..dsobject-0472 [11] DsBuildInternalPackage: Package List length larger than NumElements count (2), truncated
.dsobject-0472 [11] DsBuildInternalPackage: Package List length larger than NumElements count (D), truncated
.dsobject-0472 [11] DsBuildInternalPackage: Package List length larger than NumElements count (1), truncated

and a lot more of these...

Later it hangs when trying to execute _INI methods.
This was tested with the acpica AML Execution/Debug Utility version 20060912.

Work has been done in this area, see commit:
commit 8f9337c88335846b01801b1047a4caf10527a320
Author: Bob Moore <robert.moore@intel.com>
Date:   Fri Feb 2 19:48:18 2007 +0300

    ACPICA: Handle case NumElements > Package length
    
    Additional update for NumElements fix. Must handle
    case where NumElements > Package list length, pad package
    with null elements.
    
    Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>


Argghh, can someone tell me how I can find the -rcX-gitY snapshot in which this patch was committed?

OK, I checked manually; it seems not to be in 2.6.21 yet. I wonder how you can boot at all. Better try 2.6.22-rc7.
Comment 11 Thomas Renninger 2007-07-06 02:47:56 UTC
Sorry, I was wrong.
On Linus git tree, one could see the patch going in:
Follows: v2.6.20-rc7
Precedes: v2.6.21-rc1

The methods are very similar and I looked at the wrong one. This patch is already in 2.6.21.

However, the current official acpica version is quite old (09-11-2006; Bob wanted to publish another one, but it seems he hasn't gotten to it before the holidays...), so it's not worth trying with an old acpica package. Sorry I can't help here; Intel people need to have a look at this.
Comment 12 Len Brown 2007-07-11 20:21:50 UTC
> I upgraded to BIOS A38 and got rid of i8kfan.
> So far no more hangs on my machine :)

Okay, so the primary issue that this bug is filed against (a hang) is gone.
It would be interesting if you re-enable i8kfan and hangs come back.
That, of course, would be an i8kfan bug...

> Still ACPI errors/exceptions appear in dmesg though.

So this is what remains of this bug report.

ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU0._PDC] (Node c1903cfc), AE_ALREADY_EXISTS

But your dmesg is from 2.6.20.  The dmesg from 2.6.22 I posted above
has these gone, so it appears this issue has already been addressed,
and I believe that this bug should be closed.

Re: DsBuildInternalPackage: Package List length larger than
NumElements count (D), truncated

I'm not sure why this is suspected to be related to the subject
of this bug report -- which was apparently the repeated _PDC issue.

Also, one would not expect a raw DSDT to necessarily survive
acpiexec, as the system hardware that the DSDT talks to is not
emulated.

But to answer your question,
git checkout -b foo 8f9337c88335846b01801b1047a4caf10527a320
would give you a tree with that commit on the top.
Alternatively, you can get just that patch with
git show 8f9337c88335846b01801b1047a4caf10527a320
and
gitk 8f9337c88335846b01801b1047a4caf10527a320
will bring up the GUI which will have links indicating
the preceding and next tags.  This one shipped in 2.6.21.
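
As a command-line alternative to the gitk view, `git describe --contains <commit>` prints a tag that contains the commit, followed by an ancestry suffix of the form `~N` or `^N`. The sketch below is only an illustration: the helper names `tag_from_describe` and `first_tag_containing` are hypothetical, and the sample string in the comment is made up to show the format rather than taken from the kernel tree.

```python
import subprocess


def tag_from_describe(describe_output):
    """Strip the ~N / ^N ancestry suffix from `git describe --contains` output."""
    name = describe_output.strip()
    # e.g. "v1.2-rc3~40^2~7" (made-up sample) becomes "v1.2-rc3"
    for sep in ("~", "^"):
        name = name.split(sep, 1)[0]
    return name


def first_tag_containing(commit, repo="."):
    """Run git in `repo` (a local clone is assumed) and return the containing tag."""
    out = subprocess.run(
        ["git", "-C", repo, "describe", "--contains", commit],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return tag_from_describe(out)
```

Calling `first_tag_containing("8f9337c88335846b01801b1047a4caf10527a320", "/path/to/linux")` requires a kernel clone at that (hypothetical) path; `git tag --contains <commit>` lists every containing tag instead of just one.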

closed
Comment 13 Federico Flego 2007-07-15 05:14:01 UTC
Thank you all for your efforts! I have had no more hangs, and that's very positive. Just to provide extra feedback: I tested kernel 2.6.22.1 and still get the kernel error messages.
I include as attachments the .config, dmesg and acpidump of this last test.

Thanks once more,

Federico.
Comment 14 Federico Flego 2007-07-15 05:24:00 UTC
Created attachment 12039
dmesg 2.6.22_1
Comment 15 Federico Flego 2007-07-15 05:24:37 UTC
Created attachment 12040
acpidump bios A38 kernel 2.6.22.1
Comment 16 Federico Flego 2007-07-15 05:25:11 UTC
Created attachment 12041
kernel 2.6.22.1 config
Comment 17 Zhang Rui 2007-07-15 23:04:05 UTC
>ACPI: SSDT 3FFF320B, 066C (r1   DELL CPU0HTCI     1001 MSFT  100000E)
>ACPI Error (dswload-0334): [SMIX] Namespace lookup failure, AE_ALREADY_EXISTS
>ACPI Exception (psloop-0225): AE_ALREADY_EXISTS, During name lookup/catalog
>[20070126]
>ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU0._PDC]
>(Node c1908cfc), AE_ALREADY_EXISTS
>ACPI: Marking method _PDC as Serialized
>ACPI: Processor [CPU0] (supports 8 throttling states)
>ACPI: SSDT 3FFF3877, 00C6 (r1   DELL CPU1HTCI     1001 MSFT  100000E)
>ACPI Error (dswload-0334): [GETC] Namespace lookup failure, AE_ALREADY_EXISTS
>ACPI Exception (psloop-0225): AE_ALREADY_EXISTS, During name lookup/catalog
>[20070126]
>ACPI Error (psparse-0537): Method parse/execution failed [\_PR_.CPU1._PDC]
>(Node c1908cd4), AE_ALREADY_EXISTS
>ACPI: Marking method _PDC as Serialized

This problem is not clear yet.
Could you please boot with acpi.debug_level=0xf00fffff acpi.debug_layer=0xffff3ff0 and attach the dmesg output?
Comment 18 Thomas Renninger 2007-07-16 01:53:34 UTC
About comment 17: If you increase the debug output, the kernel's ring buffer might fill up before everything has been logged, and parts of the log might get overwritten (dmesg then no longer shows the whole boot log; the beginning might be missing).
You can increase the buffer with the boot param:
log_buf_len=
(The value is in bytes, e.g. log_buf_len=33554432 increases the buffer to 32MB).
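
Since the value is given in bytes, it is worth checking the arithmetic: 32 MiB is 32 × 1024 × 1024 = 33554432 bytes. A minimal sketch of that calculation follows; the helper names are hypothetical, and the power-of-two check reflects my understanding of the kernel parameter documentation, so treat it as an assumption.

```python
def log_buf_len_bytes(mebibytes):
    """Return the byte count for a log_buf_len= value of the given size in MiB."""
    size = mebibytes * 1024 * 1024
    # Assumption: the kernel expects a power-of-two size, so reject anything
    # else here instead of guessing how it would be rounded.
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return size


def boot_param(mebibytes):
    """Format the kernel command-line fragment for the requested buffer size."""
    return "log_buf_len=%d" % log_buf_len_bytes(mebibytes)


print(boot_param(32))
```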
Comment 19 Federico Flego 2007-07-24 13:43:50 UTC
I ran kernel 2.6.22 with acpi_debug and other options. When I tried dmesg > dmesg_6_22_1_acpi_debug, I obtained just a truncated version of the whole thing, so I took /var/log/kernel.log and excerpted the last log session. I hope it is useful... let me know.

Thank you!

Federico.
Comment 20 Federico Flego 2007-07-24 13:50:11 UTC
Created attachment 12125
output of dmesg with acpi_debug=on
