Bug 4700
Summary: | Boot hang due to BIOS reporting phantom 2nd IOAPIC - Acer Travelmate 3002, 915gm Chipset | ||
---|---|---|---|
Product: | ACPI | Reporter: | Tilo Lutz (TiloLutz) |
Component: | BIOS | Assignee: | Len Brown (lenb) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | aa_web, acpi-bugzilla, aderesch, bunk, compnerd, jools, rodd |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.12-rc5 - 2.6.16 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | Makes kernel not panic on trying to activate non-existent IOAPICs |
Description
Tilo Lutz
2005-06-03 04:28:37 UTC
After googling a while I found someone with related problems:
http://www.ussg.iu.edu/hypermail/linux/kernel/0502.1/1450.html

I applied the patch to both the SUSE kernel sources and 2.6.12-rc5. Now apic seems to work and my problems with acpi are gone. I got everything to work except irda. I don't think irda is related in any way to that problem. The irq table looks strange, but I think this is only a cosmetic thing:

Notebook:/home/tilo # cat /proc/interrupts
           CPU0
  0:     365550    IO-APIC-edge  timer
  1:        517    IO-APIC-edge  i8042
  7:          2    IO-APIC-edge  parport0
  8:          2    IO-APIC-edge  rtc
  9:       1579   IO-APIC-level  acpi
 12:      25149    IO-APIC-edge  i8042
 14:      15446    IO-APIC-edge  ide0
169:       1225   IO-APIC-level  ipw2200
193:       3458   IO-APIC-level  uhci_hcd, HDA Intel, eth0
209:        763   IO-APIC-level  ohci1394, uhci_hcd, yenta
225:          0   IO-APIC-level  uhci_hcd, ehci_hcd
233:          0   IO-APIC-level  uhci_hcd
NMI:          0
LOC:     108519
ERR:          0
MIS:          0

So in apic mode the system now works with the IOAPIC patches, and in pic mode the system can't work, right? I checked the BIOS. For the devices below, the IRQ routing table is correct in apic mode, but in pic mode the table is very strange, and I think it's buggy. Could you please check if you have the latest BIOS?

0000:06:06.0
0000:06:07.0
0000:06:07.2
0000:06:07.3
0000:06:08.0

In pic mode not every device is working, e.g. sound. The notebook is a very new model and there are no bios updates for it. But as soon as an update is available I will check it and post the result.

Could you please try the latest kernel version? It includes a workaround for the case where the interrupt routing table is wrong but the bios has set up the device's interrupt line correctly. This workaround possibly fixes your issue.

I tried 2.6.12.2 and the problem is still there. When booting the kernel with apic enabled, it crashes right after startup. The screen is blank and I don't see anything. I will take a look at 2.6.13-rc1-git5.

I guess with the patch you mentioned in comment 1, your system works in apic mode. Right?
Did you ever try the latest kernel with apic disabled? If it can't work, either you wait for a BIOS update or you keep using apic mode; it definitely is a BIOS bug.

Without apic the system boots up but some devices, e.g. sound, won't work. The alsa driver is complaining about an unassigned IRQ. I think I have to wait until device drivers for my system become more stable or my vendor releases a new bios :(

I've a similar problem here:

Distribution: Gentoo Linux
Hardware: Acer Travelmate 4101WLMI (915GM Chipset)
Affected Kernels: I think all, from 2.6.11.9 to 2.6.13-rc2. All tested incl. the appropriate mm and acpi patches.

Problem: I have to boot the kernel in PIC mode (acpi=noirq or pci=noacpi) to get the kernel up and running. But I can't use the devices "behind" the PCI-Express to PCI bridge in PIC mode. For example, if I try to load the module ipw2200 (Intel Wireless LAN) I get an IRQ error:

(dmesg)
ieee80211_crypt: registered algorithm 'NULL'
ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, 1.0.4
ipw2200: Copyright(c) 2003-2004 Intel Corporation
ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
ipw2200: Radio Frequency Kill Switch is On:
Kill switch must be turned off for wireless networking to work.
ieee80211_crypt: registered algorithm 'WEP'
ieee80211_crypt: registered algorithm 'CCMP'
ieee80211_crypt: registered algorithm 'TKIP'
irq 10: nobody cared!
[<c013d75a>] __report_bad_irq+0x2a/0xa0
[<c013d100>] handle_IRQ_event+0x30/0x70
[<c013d860>] note_interrupt+0x70/0xb0
[<c013d250>] __do_IRQ+0x110/0x120
[<c01057e9>] do_IRQ+0x19/0x30
[<c0103b52>] common_interrupt+0x1a/0x20
[<c01228be>] __do_softirq+0x2e/0x90
[<c0122946>] do_softirq+0x26/0x30
[<c0122a15>] irq_exit+0x35/0x40
[<c01057ee>] do_IRQ+0x1e/0x30
[<c0103b52>] common_interrupt+0x1a/0x20
[<c01e007b>] simple_strtoul+0xcb/0xf0
[<c013d502>] setup_irq+0xb2/0x130
[<e0bcea80>] usb_hcd_irq+0x0/0x70 [usbcore]
[<c013d715>] request_irq+0x85/0xa0
[<e0bcee42>] usb_add_hcd+0x1d2/0x2a0 [usbcore]
[<e0bcea80>] usb_hcd_irq+0x0/0x70 [usbcore]
[<c01e5276>] pci_set_master+0x46/0x80
[<e0bd361a>] usb_hcd_pci_probe+0x22a/0x370 [usbcore]
[<c01e6d62>] pci_device_probe_static+0x52/0x70
[<c01e6dbc>] __pci_device_probe+0x3c/0x50
[<c01e6dfc>] pci_device_probe+0x2c/0x50
[<c023c0ff>] driver_probe_device+0x2f/0x80
[<c023c24c>] driver_attach+0x5c/0xa0
[<c023c7bd>] bus_add_driver+0x9d/0xd0
[<c01e6f40>] pci_device_shutdown+0x0/0x30
[<c01e70ae>] pci_register_driver+0x6e/0x90
[<e0ab909b>] uhci_hcd_init+0x9b/0xe8 [uhci_hcd]
[<c0139598>] sys_init_module+0x148/0x1f0
[<c0103195>] syscall_call+0x7/0xb
handlers:
[<e0bcea80>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #10

If I try to start the kernel without "acpi=noirq" the system hangs after the "Booting the Kernel ..." line. Any ideas?

I have an Acer Aspire 1690 (1691WLMi) with the same problems. The workaround I use is to boot with the parameter "noapic", which makes it boot, and "pci=routeirq", which seems to help with interrupt problems. The boot hang seems to happen when the IOAPIC is being probed to be enabled. I'd like to help with solving this problem.

The patch in Comment #1 From Tilo Lutz fixes the boot hang and I now see IRQs up to 23. No need to use the "noapic" and "pci=routeirq" boot parameters.
cat /proc/interrupts
           CPU0
  0:    2033117    IO-APIC-edge  timer
  1:       4329    IO-APIC-edge  i8042
  8:          5    IO-APIC-edge  rtc
  9:     347763   IO-APIC-level  acpi
 12:     109983    IO-APIC-edge  i8042
 14:      16521    IO-APIC-edge  ide0
 16:      20265   IO-APIC-level  eth0, uhci_hcd:usb5
 17:      23902   IO-APIC-level  Intel ICH6, ipw2200
 18:          3   IO-APIC-level  yenta, ohci1394, uhci_hcd:usb4
 19:          0   IO-APIC-level  uhci_hcd:usb3
 23:          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2
NMI:          0
LOC:     271889
ERR:          0
MIS:          0

The APIC (MADT) table in the acpidump output suggests this system has two IOAPICs:

ACPI: APIC (v001 INTEL ALVISO 0x06040000 LOHR 0x0000005f) @ 0x(nil)
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] global_irq_base[0x0])
ACPI: IOAPIC (id[0x02] address[0xfec20000] global_irq_base[0x18])
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
Length 102 OK
Checksum OK

How the heck did this system escape from QA?

I'm using 2.6.13.2 and many things like dri are working out of the box. Hardware support is much better. I still have to add those patches so linux doesn't use the 2nd apic.

I have thought about the 2nd apic that is reported but not there. I can connect a docking station to my notebook. The docking station includes some PCIe ports. During bootup I get some error messages. Because I can't use those devices I don't care about them. But maybe the 2nd apic has something to do with this.

ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x02] address[0xfec20000] gsi_base[24])
IOAPIC[1]: Unable change apic_id!
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
[..]
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.0
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.0
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.0
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.1
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.1
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.1
PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.2
PCI: Cannot allocate resource region 8 of bridge 0000:00:1c.2
PCI: Cannot allocate resource region 9 of bridge 0000:00:1c.2
TC classifier action (bugs to netdev@vger.kernel.org cc hadi@cyberus.ca)
PCI: Ignore bogus resource 6 [0:0] of 0000:00:02.0
PCI: Bridge: 0000:00:1c.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.1
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.2
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bus 7, cardbus bridge: 0000:06:07.0
  IO window: 00002000-00002fff
  IO window: 00003000-00003fff
  PREFETCH window: 20000000-21ffffff
  MEM window: 24000000-25ffffff
PCI: Bridge: 0000:00:1e.0
  IO window: 2000-3fff
  MEM window: b0100000-b01fffff
  PREFETCH window: 20000000-21ffffff
PCI: Device 0000:00:1c.0 not available because of resource collisions
PCI: Setting latency timer of device 0000:00:1c.0 to 64
PCI: Device 0000:00:1c.1 not available because of resource collisions
PCI: Setting latency timer of device 0000:00:1c.1 to 64
PCI: Device 0000:00:1c.2 not available because of resource collisions
PCI: Setting latency timer of device 0000:00:1c.2 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64

lspci:
[...]
0000:00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 04)
0000:00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2 (rev 04)
0000:00:1c.2 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 3 (rev 04)

Cheers,
Tilo

I have the same problem. Using the mentioned patch or the kernel option noapic allows the kernel to boot. (But it would be nice to be able to boot without these additional options or patching the kernel.) I'm running an Acer Travelmate 8104. Do all the Acer laptops have completely broken BIOSes or what?!

I too have a Travelmate 8100 (more exactly 8104WMLi) running the latest Gentoo with kernel 2.6.14, which locks up immediately after the message "Ok, booting the kernel." unless I pass acpi=off or noapic. I enabled kernel/power management debugging and still no message is printed. I suspect it may be related to this bug. Can I help fix this somehow?

This bug is set to 2.6.12-rc5-mm but the bug affects even the most recent stable kernel, 2.6.15.1. Should the kernel version be changed to show it affects more recent kernels?

Reassign this to Len. Hopefully he can merge the patch mentioned in comment 1 with some tweaks.

I tested kernel 2.6.16 from openSUSE 10.1 beta 3 and the problem still exists.

I tried the current stable kernel 2.6.15.4 and took a look at the code. I don't see any changes. This fix is still necessary:
http://www.ussg.iu.edu/hypermail/linux/kernel/0502.1/1450.html

When did the bug get switched to resolved anyway? I've been watching the kernel development changelogs and saw nothing to suggest it had been fixed, and I got no notification from bugzilla that the status had been changed to resolved.

Created attachment 7429
Makes kernel not panic on trying to activate non-existent IOAPICs

Just for the record, this is the current version of the above-mentioned patch.
Except for some offset it applies to any kernel version since 2.6.13-rc4. So there should be no tweaking required. Both versions can be found at http://www.fs.tum.de/~aderesch/, and I will make new patches should the need arise.

Since there is no activity and it's clearly a useful and needed patch, I put it into the x86-64 patchkit and will submit it with that.

I'm closing this bug since the patch was included in 2.6.16-rc5.

You can download the latest BIOS here:
http://csd.acer.com.tw/SI/Download2.nsf/1815c7c6f8aff65d48256bdd0035cffd/3e1c0adb04e82428482570910038d7ee?OpenDocument

For all Acer users, you can download the latest BIOS here:
http://csd.acer.com.tw/SI/Download2.nsf/NotebookWeb
username: guest
password: guest

Does this patch also fix the battery problem, the sound problem (no recording) and the touchpad problem (the cursor sometimes moves to the wrong position when I use the touchpad)? I have an ACER TM3001 laptop, and I just turned on APIC but turned off the IO_APIC option, then it boots. ;)

Your problems are not related to this fix.
battery problem: You have to use a fixed DSDT, http://acpi.sourceforge.net/
sound problem: Recording works with current alsa drivers.
touchpad problem: I never had any problems with it.

My Travelmate 3002 hangs at the Coldplug/hotplug stage when using 2.6.16, but can be rebooted with CTRL+ALT+DEL. Now I have to roll back to 2.6.14. Has anybody encountered this issue? I'm using Gentoo Linux.
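
For anyone triaging a similar report, the boot-log excerpt quoted earlier in this thread can be checked mechanically. What follows is only a rough sketch in Python and has nothing to do with the kernel patch itself. It assumes the 2.6-era dmesg format shown above, in which each "ACPI: IOAPIC (...)" line declared by the MADT is normally followed by an "IOAPIC[n]: apic_id ..., GSI a-b" line once the kernel has actually set up the chip. The function name find_phantom_ioapics, the SAMPLE text and the regular expressions are illustrative assumptions, not an existing tool.

```python
import re

# MADT declaration as printed at boot, e.g.
#   ACPI: IOAPIC (id[0x02] address[0xfec20000] gsi_base[24])
# Some kernels print global_irq_base[...] instead of gsi_base[...].
DECLARED = re.compile(
    r"ACPI: IOAPIC \(id\[(0x[0-9a-fA-F]+)\] address\[(0x[0-9a-fA-F]+)\] "
    r"(?:gsi_base|global_irq_base)\[(0x[0-9a-fA-F]+|\d+)\]\)"
)

# Line printed once an IOAPIC has been set up, e.g.
#   IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
CONFIRMED = re.compile(
    r"IOAPIC\[\d+\]: apic_id \d+, version \d+, "
    r"address (0x[0-9a-fA-F]+), GSI \d+-\d+"
)


def find_phantom_ioapics(log_text):
    """Return declared IOAPICs that have no matching setup line in the log."""
    declared = {}
    for match in DECLARED.finditer(log_text):
        address = int(match.group(2), 16)
        declared[address] = {
            "id": int(match.group(1), 16),
            "address": address,
            # int(..., 0) accepts both "24" and "0x18"
            "gsi_base": int(match.group(3), 0),
        }
    confirmed = {int(m.group(1), 16) for m in CONFIRMED.finditer(log_text)}
    return [entry for addr, entry in declared.items() if addr not in confirmed]


# Boot-log lines copied from the report above.
SAMPLE = """\
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x02] address[0xfec20000] gsi_base[24])
IOAPIC[1]: Unable change apic_id!
"""

if __name__ == "__main__":
    for entry in find_phantom_ioapics(SAMPLE):
        print("suspect IOAPIC id %d at 0x%08x (gsi_base %d)"
              % (entry["id"], entry["address"], entry["gsi_base"]))
```

On the lines quoted in this report, the sketch flags only the second IOAPIC (id 2 at 0xfec20000). It is only meaningful on a full boot log: an acpidump-style excerpt that lists the MADT entries without the kernel's "IOAPIC[n]: ..." lines would flag every entry.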