Bug 4700 - Boot hang due to BIOS reporting phantom 2nd IOAPIC - Acer Travelmate 3002, 915gm Chipset
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: i386 Linux
Importance: P2 high
Assignee: Len Brown
 
Reported: 2005-06-03 04:28 UTC by Tilo Lutz
Modified: 2006-03-25 16:13 UTC

Kernel Version: 2.6.12-rc5 - 2.6.16


Attachments
Makes kernel not panic on trying to activate non-existent IOAPICs (1.52 KB, patch)
2006-02-21 08:09 UTC, Andreas Deresch

Description Tilo Lutz 2005-06-03 04:28:37 UTC
Distribution: Suse Linux 9.3  
Hardware Environment: Acer Travelmate 3002, 915gm Chipset  
Software Environment: Suse Linux 9.3  
Problem Description:  
IRQ assignment doesn't work correctly.
Many connected devices don't work
although they are recognized.
  
Steps to reproduce:  
Just start the kernel  
  
Kernel parameter: noacpi. Without it the kernel won't boot up.
 
I stored all the information at http://home.wms-hn.de/~tilo/kernel because
this report can only be 65000 characters long.
  
I had to modify the DSDT because the original has some errors:
Jun  1 20:10:28 Notebook kernel:     ACPI-0352: *** Error: Looking up [Z00B] in namespace, AE_NOT_FOUND
Jun  1 20:10:28 Notebook kernel: search_node df384400 start_node df384400 return_node 00000000
Jun  1 20:10:28 Notebook kernel:     ACPI-1138: *** Error: Method execution failed [\_SB_.BAT1._BST] (Node df384300), AE_NOT_FOUND
Jun  1 20:10:28 Notebook kernel:     ACPI-0352: *** Error: Looking up [Z00B] in namespace, AE_NOT_FOUND
Jun  1 20:10:28 Notebook kernel: search_node df384400 start_node df384400 return_node 00000000
Jun  1 20:10:28 Notebook kernel:     ACPI-1138: *** Error: Method execution failed [\_SB_.BAT1._BST] (Node df384300), AE_NOT_FOUND
  
I don't know the programming language, but I was able to make the required
changes by looking at other patched Acer DSDT tables.
I have attached the original DSDT and the patch.
  
lspci -vv:  
http://home.wms-hn.de/~tilo/kernel/lspci  
  
/proc/interrupts:  
           CPU0         
  0:      22594          XT-PIC  timer  
  1:        294          XT-PIC  i8042  
  2:          0          XT-PIC  cascade  
  8:          2          XT-PIC  rtc  
  9:        402          XT-PIC  acpi  
 11:          0          XT-PIC  ohci1394  
 14:       6358          XT-PIC  ide0  
NMI:          0   
LOC:      13094   
ERR:          0  
MIS:          0  
  
dmesg: 
http://home.wms-hn.de/~tilo/kernel/dmesg 
  
.config:  
http://home.wms-hn.de/~tilo/kernel/config 
  
original_dsdt:  
http://home.wms-hn.de/~tilo/kernel/DSDT-Acer-Travelmate-3000-orig 
  
Patch for dsdt:  
211a212  
>     External (CFGD)  
573c574  
<             If (LEqual (And (PDC0, 0x0A), 0x0A))  
---  
>             If (LEqual (And (\_SB.PCI0.RP01.PDC0, 0x0A), 0x0A))  
578c579  
<             If (LEqual (And (PDC1, 0x0A), 0x0A))  
---  
>             If (LEqual (And (\_SB.PCI0.RP01.PDC1, 0x0A), 0x0A))  
1879c1880  
<                         ,   2,  
---  
>                     PDC0,   2,  
2149a2151  
>                                 Return (0x00)  
3676c3678  
<                             \_SB.BAT1.Z007 ()  
---  
>                             \_SB.BAT1.Z00B ()  
5910c5912  
<             Method (Z007, 0, NotSerialized)  
---  
>             Method (Z00B, 0, NotSerialized)  
  
output of acpidmt:  
http://home.wms-hn.de/~tilo/kernel/acpidmt
Comment 1 Tilo Lutz 2005-06-05 07:31:27 UTC
After googling a while I found someone with related problems: 
http://www.ussg.iu.edu/hypermail/linux/kernel/0502.1/1450.html 
 
I applied the patch to both, suse kernel sources and 2.6.12-rc5. 
Now apic seems to work and my problems with acpi are gone. 
 
I got everything to work except IrDA. I don't think IrDA is related to
that problem in any way.

The IRQ table looks strange, but I think this is only cosmetic:
 
Notebook:/home/tilo # cat /proc/interrupts 
           CPU0 
  0:     365550    IO-APIC-edge  timer 
  1:        517    IO-APIC-edge  i8042 
  7:          2    IO-APIC-edge  parport0 
  8:          2    IO-APIC-edge  rtc 
  9:       1579   IO-APIC-level  acpi 
 12:      25149    IO-APIC-edge  i8042 
 14:      15446    IO-APIC-edge  ide0 
169:       1225   IO-APIC-level  ipw2200 
193:       3458   IO-APIC-level  uhci_hcd, HDA Intel, eth0 
209:        763   IO-APIC-level  ohci1394, uhci_hcd, yenta 
225:          0   IO-APIC-level  uhci_hcd, ehci_hcd 
233:          0   IO-APIC-level  uhci_hcd 
NMI:          0 
LOC:     108519 
ERR:          0 
MIS:          0 
 
 
Comment 2 Shaohua 2005-06-05 19:00:13 UTC
So in APIC mode the system now works with the IOAPIC patches, and in PIC
mode the system doesn't work, right?
I checked the BIOS. For the devices below, the IRQ routing table is
correct in APIC mode, but in PIC mode the table is very strange, and I think
it's buggy. Could you please check whether you have the latest BIOS?
0000:06:06.0
0000:06:07.0
0000:06:07.2
0000:06:07.3
0000:06:08.0
Comment 3 Tilo Lutz 2005-06-05 23:46:35 UTC
In PIC mode not every device works, e.g. sound.
The notebook is a very new model and there are no
BIOS updates for it.
But as soon as an update is available I will
check it and post the result.
Comment 4 Shaohua 2005-07-03 18:37:54 UTC
Could you please try the latest kernel version? It includes a workaround for
the case where the interrupt routing table is wrong but the BIOS has set up
the device's interrupt line correctly. This workaround may fix your issue.
Comment 5 Tilo Lutz 2005-07-04 00:23:07 UTC
I tried 2.6.12.2 and the problem is still there. 
When booting the kernel it crashes right after startup 
when apic is enabled. 
The screen is blank and I don't see anything. 
I will take a look at 2.6.13-rc1-git5. 
 
Comment 6 Shaohua 2005-07-07 19:39:10 UTC
I guess with the patch you mentioned in comment 1, your system works in APIC
mode. Right?
Did you ever try the latest kernel with APIC disabled? If that doesn't work,
either wait for a BIOS update or keep using APIC mode; it is definitely
a BIOS bug.
Comment 7 Tilo Lutz 2005-07-13 23:21:39 UTC
Without APIC the system boots up, but some devices, e.g. sound,
won't work.
The ALSA driver complains about an unassigned IRQ.

I think I have to wait until the device drivers for my system
become more stable or my vendor releases a new BIOS :(
 
Comment 8 Alexander G 2005-07-15 09:13:30 UTC
I have a similar problem here:
Distribution: Gentoo Linux
Hardware: Acer Travelmate 4101WLMI (915GM Chipset)
Affected kernels: all of them, I think; everything from 2.6.11.9 to 2.6.13-rc2
was tested, including the appropriate mm and acpi patches.

Problem:
I have to boot the kernel in PIC mode (acpi=noirq or pci=noacpi) to get the
kernel up and running. But I can't use the devices "behind" the PCI Express to
PCI bridge in PIC mode. For example, if I try to load the ipw2200 module (Intel
Wireless LAN), I get an IRQ error (dmesg):

ieee80211_crypt: registered algorithm 'NULL'
ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, 1.0.4
ipw2200: Copyright(c) 2003-2004 Intel Corporation
ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
ipw2200: Radio Frequency Kill Switch is On:
Kill switch must be turned off for wireless networking to work.
ieee80211_crypt: registered algorithm 'WEP'
ieee80211_crypt: registered algorithm 'CCMP'
ieee80211_crypt: registered algorithm 'TKIP'
irq 10: nobody cared!
 [<c013d75a>] __report_bad_irq+0x2a/0xa0
 [<c013d100>] handle_IRQ_event+0x30/0x70
 [<c013d860>] note_interrupt+0x70/0xb0
 [<c013d250>] __do_IRQ+0x110/0x120
 [<c01057e9>] do_IRQ+0x19/0x30
 [<c0103b52>] common_interrupt+0x1a/0x20
 [<c01228be>] __do_softirq+0x2e/0x90
 [<c0122946>] do_softirq+0x26/0x30
 [<c0122a15>] irq_exit+0x35/0x40
 [<c01057ee>] do_IRQ+0x1e/0x30
 [<c0103b52>] common_interrupt+0x1a/0x20
 [<c01e007b>] simple_strtoul+0xcb/0xf0
 [<c013d502>] setup_irq+0xb2/0x130
 [<e0bcea80>] usb_hcd_irq+0x0/0x70 [usbcore]
 [<c013d715>] request_irq+0x85/0xa0
 [<e0bcee42>] usb_add_hcd+0x1d2/0x2a0 [usbcore]
 [<e0bcea80>] usb_hcd_irq+0x0/0x70 [usbcore]
 [<c01e5276>] pci_set_master+0x46/0x80
 [<e0bd361a>] usb_hcd_pci_probe+0x22a/0x370 [usbcore]
 [<c01e6d62>] pci_device_probe_static+0x52/0x70
 [<c01e6dbc>] __pci_device_probe+0x3c/0x50
 [<c01e6dfc>] pci_device_probe+0x2c/0x50
 [<c023c0ff>] driver_probe_device+0x2f/0x80
 [<c023c24c>] driver_attach+0x5c/0xa0
 [<c023c7bd>] bus_add_driver+0x9d/0xd0
 [<c01e6f40>] pci_device_shutdown+0x0/0x30
 [<c01e70ae>] pci_register_driver+0x6e/0x90
 [<e0ab909b>] uhci_hcd_init+0x9b/0xe8 [uhci_hcd]
 [<c0139598>] sys_init_module+0x148/0x1f0
 [<c0103195>] syscall_call+0x7/0xb
handlers:
[<e0bcea80>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #10

If I try to start the kernel without "acpi=noirq", the system hangs after the
"Booting the Kernel ..." line.

Any ideas?
Comment 9 Pedro Ramalhais 2005-08-23 10:23:17 UTC
I have an Acer Aspire 1690 (1691WLMi) with the same problems. The workaround I
use is to boot with the parameter "noapic", which makes it boot, and
"pci=routeirq", which seems to help with interrupt problems. The boot hang seems
to happen while the IOAPIC is being probed and enabled. I'd like to help solve
this problem.
Comment 10 Pedro Ramalhais 2005-08-23 14:20:17 UTC
The patch in comment 1 from Tilo Lutz fixes the boot hang, and I now see IRQs up
to 23. There is no need to use the "noapic" and "pci=routeirq" boot parameters.

cat /proc/interrupts
           CPU0
  0:    2033117    IO-APIC-edge  timer
  1:       4329    IO-APIC-edge  i8042
  8:          5    IO-APIC-edge  rtc
  9:     347763   IO-APIC-level  acpi
 12:     109983    IO-APIC-edge  i8042
 14:      16521    IO-APIC-edge  ide0
 16:      20265   IO-APIC-level  eth0, uhci_hcd:usb5
 17:      23902   IO-APIC-level  Intel ICH6, ipw2200
 18:          3   IO-APIC-level  yenta, ohci1394, uhci_hcd:usb4
 19:          0   IO-APIC-level  uhci_hcd:usb3
 23:          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2
NMI:          0
LOC:     271889
ERR:          0
MIS:          0
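
For comparing tables like the one above with the XT-PIC table from the original report, a minimal Python sketch follows. It groups the numbered IRQ lines by interrupt controller. The function name and the embedded sample are illustrative only, and the parser assumes the single-CPU layout shown in this report (one count column).

```python
import re
from collections import defaultdict

# Sample copied from the /proc/interrupts output above (single CPU).
SAMPLE = """\
           CPU0
  0:    2033117    IO-APIC-edge  timer
  1:       4329    IO-APIC-edge  i8042
  8:          5    IO-APIC-edge  rtc
  9:     347763   IO-APIC-level  acpi
 12:     109983    IO-APIC-edge  i8042
 14:      16521    IO-APIC-edge  ide0
 16:      20265   IO-APIC-level  eth0, uhci_hcd:usb5
 17:      23902   IO-APIC-level  Intel ICH6, ipw2200
 18:          3   IO-APIC-level  yenta, ohci1394, uhci_hcd:usb4
 19:          0   IO-APIC-level  uhci_hcd:usb3
 23:          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2
NMI:          0
LOC:     271889
"""

# IRQ number, one count column, controller name, device list.
LINE_RE = re.compile(r"\s*(\d+):\s+\d+\s+(\S+)\s+(.*)$")


def group_by_controller(text):
    """Map controller name (e.g. 'XT-PIC') to a list of (irq, devices)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # skips the header and the NMI/LOC/ERR/MIS summary lines
            groups[m.group(2)].append((int(m.group(1)), m.group(3).strip()))
    return dict(groups)


if __name__ == "__main__":
    for controller, rows in sorted(group_by_controller(SAMPLE).items()):
        print(controller, [irq for irq, _ in rows])
```

If every line is reported under XT-PIC, the kernel is running in PIC mode; IO-APIC-edge and IO-APIC-level entries indicate that the IOAPIC is in use.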
Comment 11 Len Brown 2005-09-15 00:13:06 UTC
The APIC (MADT) table in the acpidump output suggests this system
has two IOAPICs:

ACPI: APIC (v001 INTEL  ALVISO   0x06040000 LOHR 0x0000005f) @ 0x(nil)
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] global_irq_base[0x0])
ACPI: IOAPIC (id[0x02] address[0xfec20000] global_irq_base[0x18])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
Length 102 OK
Checksum OK

How the heck did this system escape from QA?
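
To make the two entries easier to compare, here is a small Python sketch that pulls the IOAPIC lines out of such a dump and converts the fields to integers. The regular expression and function name are my own, written only against the line format quoted in this report; other kernel versions may print the table differently.

```python
import re

# The two IOAPIC lines quoted above.
MADT_LINES = """\
ACPI: IOAPIC (id[0x01] address[0xfec00000] global_irq_base[0x0])
ACPI: IOAPIC (id[0x02] address[0xfec20000] global_irq_base[0x18])
"""

# Accepts both "global_irq_base[0x18]" and the "gsi_base[24]" spelling
# that appears in the boot log later in this report.
IOAPIC_RE = re.compile(
    r"ACPI: IOAPIC \(id\[(0x[0-9a-fA-F]+)\] "
    r"address\[(0x[0-9a-fA-F]+)\] "
    r"(?:global_irq_base|gsi_base)\[(\w+)\]\)"
)


def parse_ioapics(text):
    """Return one dict per IOAPIC entry found in the text."""
    entries = []
    for m in IOAPIC_RE.finditer(text):
        entries.append({
            "id": int(m.group(1), 16),
            "address": int(m.group(2), 16),
            "gsi_base": int(m.group(3), 0),  # base 0 handles "0x18" and "24"
        })
    return entries


if __name__ == "__main__":
    for entry in parse_ioapics(MADT_LINES):
        print("id=%d address=%#x gsi_base=%d"
              % (entry["id"], entry["address"], entry["gsi_base"]))
```

The second entry's base of 0x18 is 24 decimal, so the table claims a second controller whose interrupts start at global IRQ 24.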
Comment 12 Tilo Lutz 2005-09-22 08:37:20 UTC
I'm using 2.6.13.2, and many things like DRI are working out of the box.
Hardware support is much better.
I still have to apply those patches so that Linux doesn't use the 2nd APIC.

I have been thinking about the 2nd APIC that is reported but not there.
I can connect a docking station to my notebook.
The docking station includes some PCIe ports.
During bootup I get some error messages.
Because I can't use those devices, I don't care about them.
But maybe the 2nd APIC has something to do with this.
 
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) 
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0]) 
IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23 
ACPI: IOAPIC (id[0x02] address[0xfec20000] gsi_base[24]) 
IOAPIC[1]: Unable change apic_id! 
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) 
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) 
ACPI: IRQ0 used by override. 
ACPI: IRQ2 used by override. 
ACPI: IRQ9 used by override. 
[..] 
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.0 
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.0 
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.0 
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.1 
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.1 
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.1 
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.2 
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.2 
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.2 
TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca) 
PCI: Ignore bogus resource 6 [0:0] of 0000:00:02.0 
PCI: Bridge: 0000:00:1c.0 
  IO window: disabled. 
  MEM window: disabled. 
  PREFETCH window: disabled. 
PCI: Bridge: 0000:00:1c.1 
  IO window: disabled. 
  MEM window: disabled. 
  PREFETCH window: disabled. 
PCI: Bridge: 0000:00:1c.2 
  IO window: disabled. 
  MEM window: disabled. 
  PREFETCH window: disabled. 
PCI: Bus 7, cardbus bridge: 0000:06:07.0 
  IO window: 00002000-00002fff 
  IO window: 00003000-00003fff 
  PREFETCH window: 20000000-21ffffff 
  MEM window: 24000000-25ffffff 
PCI: Bridge: 0000:00:1e.0 
  IO window: 2000-3fff 
  MEM window: b0100000-b01fffff 
  PREFETCH window: 20000000-21ffffff 
PCI: Device 0000:00:1c.0 not available because of resource collisions 
PCI: Setting latency timer of device 0000:00:1c.0 to 64 
PCI: Device 0000:00:1c.1 not available because of resource collisions 
PCI: Setting latency timer of device 0000:00:1c.1 to 64 
PCI: Device 0000:00:1c.2 not available because of resource collisions 
PCI: Setting latency timer of device 0000:00:1c.2 to 64 
PCI: Setting latency timer of device 0000:00:1e.0 to 64 
 
lspci: 
[...] 
0000:00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 04)
0000:00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2 (rev 04)
0000:00:1c.2 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 3 (rev 04)
 
Cheers, Tilo 
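
For anyone who wants to check their own boot log for the same symptom, a rough Python sketch follows. It pairs each "ACPI: IOAPIC" line with the "IOAPIC[n]:" line that follows it and reports the declared IDs whose registration failed. Treating any line containing "Unable" as a failure is my own heuristic based on the message quoted above, not something taken from the kernel source, and the function name is illustrative.

```python
import re

# The relevant lines from the boot log quoted above.
BOOT_LOG = """\
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x02] address[0xfec20000] gsi_base[24])
IOAPIC[1]: Unable change apic_id!
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
"""

DECLARED_RE = re.compile(r"ACPI: IOAPIC \(id\[(0x[0-9a-fA-F]+)\]")
REGISTERED_RE = re.compile(r"IOAPIC\[\d+\]:")


def find_suspect_ioapics(log):
    """Return the MADT IOAPIC ids whose registration line reports a failure."""
    suspects = []
    declared_id = None
    for raw in log.splitlines():
        line = raw.strip()
        m = DECLARED_RE.match(line)
        if m:
            declared_id = int(m.group(1), 16)
            continue
        if declared_id is not None and REGISTERED_RE.match(line):
            # Heuristic (assumption): a failure message contains "Unable".
            if "Unable" in line:
                suspects.append(declared_id)
            declared_id = None
    return suspects


if __name__ == "__main__":
    print("suspect IOAPIC ids:", find_suspect_ioapics(BOOT_LOG))
```

On the log above this flags only the entry with id 0x02, which matches the IOAPIC that is reported but not there.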
Comment 13 Jools Wills 2005-12-17 08:41:32 UTC
I have the same problem. Using the mentioned patch or the
kernel option noapic allows the kernel to boot. (But it would be nice
to be able to boot without these additional options or patching the kernel.)

I'm running an Acer Travelmate 8104. Do all the Acer laptops have completely
broken BIOSes or what?!
Comment 14 Stefan Ciobaca 2006-01-14 00:21:52 UTC
I too have a Travelmate 8100 (more exactly 8104WMLi) running the latest Gentoo
with kernel 2.6.14, which locks up immediately after the message "Ok, booting the
kernel." unless I pass acpi=off or noapic. I enabled kernel/power management
debugging and still no message is printed. I suspect it may be related to this
bug. Can I help fix this somehow?
Comment 15 Jools Wills 2006-01-17 08:41:08 UTC
This bug is set to 2.6.12-rc5-mm but the bug affects even the most recent  
stable kernel 2.6.15.1. Should the kernel version be changed to show it 
affects more recent kernels ? 
Comment 16 Shaohua 2006-01-17 18:25:14 UTC
Reassign this to Len. Hopefully he can merge the patch mentioned in comment 1 
with some tweaks.
Comment 17 Tilo Lutz 2006-02-19 02:38:56 UTC
I tested kernel 2.6.16 from opensuse 10.1 beta 3  
and the problem still exists.  
  
I tried the current stable kernel 2.6.15.4 and took  
a look at the code.  
I don't see any changes.  
  
This fix is still necessary:
http://www.ussg.iu.edu/hypermail/linux/kernel/0502.1/1450.html  
  
  
Comment 18 Jools Wills 2006-02-19 12:38:31 UTC
When did the bug get switched to resolved anyway? I've been watching both the 
kernel development changelogs and saw nothing to suggest it had been fixed and 
got no notification from bugzilla that the status had been modified to 
resolved. 
Comment 19 Andreas Deresch 2006-02-21 08:09:00 UTC
Created attachment 7429
Makes kernel not panic on trying to activate non-existent IOAPICs

Just for the record, this is the current version of the above-mentioned patch.
Apart from some line offsets, it applies to any kernel version since 2.6.13-rc4,
so no tweaking should be required.
Both versions can be found at http://www.fs.tum.de/~aderesch/, and I will make
new patches should the need arise.
Comment 20 Andi Kleen 2006-02-25 07:21:14 UTC
Since there is no activity and it's clearly a useful and needed patch 
I put it into the x86-64 patchkit and will submit it with that.
Comment 21 Adrian Bunk 2006-02-26 21:41:13 UTC
I'm closing this bug since the patch was included in 2.6.16-rc5.
Comment 22 Nan Wang 2006-03-07 17:04:23 UTC
You can download the latest BIOS here:

http://csd.acer.com.tw/SI/Download2.nsf/1815c7c6f8aff65d48256bdd0035cffd/3e1c0adb04e82428482570910038d7ee?OpenDocument


For all Acer users, you can download the latest BIOS here:
http://csd.acer.com.tw/SI/Download2.nsf/NotebookWeb

username: guest
password: guest
Comment 23 Nan Wang 2006-03-07 17:21:08 UTC
Does this patch also fix the battery problem, the sound problem (no recording),
and the touchpad problem (the cursor sometimes jumps to the wrong position when
I use the touchpad)?

I have an Acer TM3001 laptop. I just turned on the APIC option but turned off
the IO_APIC option, and then it boots. ;)
Comment 24 Tilo Lutz 2006-03-08 00:32:22 UTC
Your problems are not related to this fix
battery problem: You have to use a fixed DSDT, http://acpi.sourceforge.net/
sound problem: Recording works with current alsa drivers
touchpad problem: I never had any problems with it.
Comment 25 Nan Wang 2006-03-25 16:13:12 UTC
My Travelmate 3002 hangs at the coldplug/hotplug stage when using 2.6.16, but I
can reboot with CTRL+DEL+ALT. Now I have to roll back to 2.6.14.

Has anybody else encountered this issue?

I'm using Gentoo Linux.
