Bug 1581
Summary: | disabled PCI Interrupt Link devices. | ||
---|---|---|---|
Product: | ACPI | Reporter: | Luming Yu (luming.yu) |
Component: | Config-Interrupts | Assignee: | Len Brown (lenb) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | acpi-bugzilla, andi-bz, ccheney, jkohen, lenb, pcnet32, tony |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.0-test9 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
a patch for fixing this issue
same patch -- ported to 2.4.23 and newer 2.6.0
x86_64 VIA chipset IOAPIC fix
debug patch against 2.6.5
eMachines M6807 - 2.6.6-rc2-bk3 + patch - dmesg output
eMachines M6807 - 2.6.6-rc2-bk3 + patch - lspci output
eMachines M6807 - 2.6.6-rc2-bk3 + patch - /proc/interrupts output
updated 2.6.5 debug patch
updated 2.6.5 debug patch
proposed 2.6.5 patch
m6805 hangs with this patch |
Description
Luming Yu
2003-11-24 01:15:10 UTC
Created attachment 1514 [details]
a patch for fixing this issue
Created attachment 1591 [details]
same patch -- ported to 2.4.23 and newer 2.6.0
One would think this was an obvious fix, but it isn't so simple -- we may actually want to do the opposite of the suggestion in this patch, to ignore that links are disabled rather than pay closer attention to them being disabled.

Experimenting with 2.4.23 on my Intel 440GX with acpi=force and noapic, where device 00:0c.0 is the on-board SCSI, which is covered by link "PRQ3":

  pci_link-0405 [20] acpi_pci_link_set : Link disabled
  ACPI: Unable to set IRQ for PCI Interrupt Link [PRQ3] (likely buggy ACPI BIOS).
  Aborting ACPI-based IRQ routing. Try pci=noacpi or acpi=off
  pci_irq-0266 [17] acpi_pci_irq_lookup : Invalid IRQ link routing entry
  pci_irq-0305 [17] acpi_pci_irq_derive : Unable to derive IRQ for device 00:0c.0
  PCI: No IRQ known for interrupt pin A of device 00:0c.0

Then the system actually runs properly with the device on IRQ11.

Apply a patch to ignore disabled links:

  pci_link-0605 [12] acpi_pci_link_get_irq : Invalid link context
  pci_irq-0266 [11] acpi_pci_irq_lookup : Invalid IRQ link routing entry
  pci_irq-0305 [11] acpi_pci_irq_derive : Unable to derive IRQ for device 00:0c.0
  PCI: No IRQ known for interrupt pin A of device 00:0c.0

So we followed a _PRT entry to a link device that does not exist.

Do the reverse: apply a patch to acpi_pci_link_set to proceed in the face of disabled links, and use these options to try to program the PIRQ off the default of 11:

  Kernel command line: root=/dev/sda2 console=tty0 console=ttyS0,115200n8 acpi=force noapic acpi_irq_balance acpi_irq_isa=11

  pci_link-0405 [20] acpi_pci_link_set : Link disabled
  pci_link-0407 [20] acpi_pci_link_set : but continuing anyway
  pci_link-0292 [21] acpi_pci_link_try_get_: No active IRQ resource found
  _CRS returns NULL! Using IRQ 10 fordevice (PCI Interrupt Link [PRQ3]).
  ACPI: PCI Interrupt Link [PRQ3] enabled at IRQ 10

and the system successfully routes the SCSI to IRQ10 and runs properly.
The fact that when ACPI couldn't get an IRQ for SCSI it worked at IRQ11 anyway suggests that our derive function is broken. The fact that ACPI successfully programs this IRQ when the link is disabled suggests that not only should we not ignore disabled link devices, it might be useful to allow programming them when they're referenced by an active _PRT entry.

Created attachment 2401 [details]
x86_64 VIA chipset IOAPIC fix

The attached patch is needed on x86_64 machines based on VIA chipset. It fixes ACPI bug 2090, but may need modifications to be safe on other systems.

On x86_64, apic still needs to be specified in the kernel cmdline:

  root=/dev/hda3 ro psmouse.proto=imps apic console=tty0

And then cat /proc/interrupts shows:

    0:      70843    IO-APIC-edge  timer
    1:          9    IO-APIC-edge  i8042
    2:          0          XT-PIC  cascade
    8:          0    IO-APIC-edge  rtc
   10:          0   IO-APIC-level  acpi
   12:         44    IO-APIC-edge  i8042
   14:       2734    IO-APIC-edge  ide0
   15:         19    IO-APIC-edge  ide1
   17:          0   IO-APIC-level  yenta
   18:          0   IO-APIC-level  eth0
   21:        565   IO-APIC-level  ehci_hcd, uhci_hcd, uhci_hcd, uhci_hcd
   22:          0   IO-APIC-level  VIA8233
   23:          6   IO-APIC-level  eth1
  NMI:         12
  LOC:      70752
  ERR:          0
  MIS:          0

And things are just working :)

Len, what's the status of Tony's link patch? Is it going into mainline any time soon? I would like to have some solution for the eMachines laptop.

Another example of the disabled PCI link device issue from Don Fry:

  ACPI: PCI Interrupt Link [LMVI] (IRQs 18)
  ACPI: Unable to set IRQ for PCI Interrupt Link [LMVI] to 18 (likely buggy ACPI BIOS). Try pci=noacpi or acpi=off
  ACPI: No IRQ known for interrupt pin A of device 0000:00:06.0 - using IRQ 255

  00:06.0 VGA compatible controller: S3 Inc. Savage 4 (rev 04) (prog-if 00 [VGA])
        Subsystem: IBM: Unknown device 01c5
        Flags: bus master, medium devsel, latency 248, IRQ 255
        Memory at feb00000 (32-bit, non-prefetchable) [size=512K]
        Memory at f0000000 (32-bit, prefetchable) [size=128M]
        Expansion ROM at <unassigned> [disabled] [size=64K]
        Capabilities: [dc] Power Management version 1

Certainly confusing to users to have messages about disabled links on the console...

Created attachment 2677 [details]
debug patch against 2.6.5
please apply this debug patch to 2.6.5 and attach the
resulting dmesg and /proc/interrupts.
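
When comparing /proc/interrupts captures from several machines, it can help to group the numbered IRQ lines by interrupt controller. The following is a minimal sketch, not part of any patch in this bug. It assumes the single-CPU column layout quoted earlier in this report (IRQ, count, controller, devices); the function name `irqs_by_controller` and the embedded sample are illustrative only, and multi-CPU layouts would need a different pattern.

```python
import re
from collections import defaultdict

# Sample rows copied from the x86_64 VIA /proc/interrupts output quoted in this bug.
SAMPLE = """\
  0:      70843    IO-APIC-edge  timer
  2:          0          XT-PIC  cascade
 10:          0   IO-APIC-level  acpi
 21:        565   IO-APIC-level  ehci_hcd, uhci_hcd, uhci_hcd, uhci_hcd
 23:          6   IO-APIC-level  eth1
NMI:         12
"""

# Assumed layout: "<irq>: <count> <controller> <devices>" (one CPU column only).
LINE = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\S+)\s+(.*)$")


def irqs_by_controller(text):
    """Group numbered IRQ rows by controller name.

    Rows such as NMI/LOC/ERR/MIS do not start with a number and are skipped.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        irq, count, controller, devices = match.groups()
        groups[controller].append((int(irq), int(count), devices.strip()))
    return dict(groups)


if __name__ == "__main__":
    for controller, rows in irqs_by_controller(SAMPLE).items():
        print(controller, [irq for irq, _, _ in rows])
```

A device that still shows up under XT-PIC, or a level-triggered IRQ whose count stays at 0 while the device is in use, is worth a closer look in the matching dmesg.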
I've now found in practice all 4 combinations of enabled vs. functional _CRS,
so it is clear that we can't rely on the _STA enabled bit to mean anything.
Len, the patch from this bug seems to fix most of the problems I reported wrt #2090; I tried it with 2.6.6-rc2-bk3. The one remaining issue I see is that it still doesn't use the correct irq for via-rhine. In WinXP the irq is 23, but with the acpi patch the irq gets set to 11, which doesn't work. I seem to recall a bug report about via-rhine and irq before but I couldn't locate it in bugzilla.

Chris

Created attachment 2690 [details]
eMachines M6807 - 2.6.6-rc2-bk3 + patch - dmesg output
Created attachment 2691 [details]
eMachines M6807 - 2.6.6-rc2-bk3 + patch - lspci output
Created attachment 2692 [details]
eMachines M6807 - 2.6.6-rc2-bk3 + patch - /proc/interrupts output
Oh yea, I also noticed that the via ide pci device got assigned the irq 23. I'm not sure if that was supposed to actually happen or not since normally it would be 14/15 right?

Re: eMachines M6807 VIA Rhine eth0 dead on IRQ11 instead of IRQ23

lspci:

  00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
        Interrupt: pin A routed to IRQ 11

DSDT:

  Name (APIC, Package (0x15) {
  ...
  Package (0x04) { 0x0012FFFF, 0x00, \_SB.PCI0.PIB.ALKB, 0x00 },

  Device (ALKB) {...
  Method (_SRS, 1, NotSerialized) { }

No editing error there, the Set Resource Setting method for ALKB used by eth0 is a NOP.

dmesg:

  ACPI: PCI Interrupt Link [ALKB] (IRQs 23) <*11>, disabled.

BIOS bug #1 is that it didn't mark ALKB as enabled in its _STA method.

BIOS bug #2 is that it returned _CRS (current setting) 11, while at the same time it returned _PRS (possible settings) 23. One of 'em must be incorrect, which one? Linux uses the _CRS value (11), we check that it worked and _CRS still returns 11, so we believe that we succeeded: We got burnt by BIOS bug #2 and the result is a dead ethernet. I believe that ignoring that _CRS is not in _PRS is a workaround for another broken system. Can't have it both ways, but it seems that VIA has a history of an unreliable _CRS...

  pci_link-0423 [31] acpi_pci_link_set : Set IRQ 11
  ACPI: PCI Interrupt Link [ALKB] enabled at IRQ 11
  IOAPIC[0]: Set PCI routing entry (1-11 -> 0x81 -> IRQ 11 Mode:1 Active:1)
  00:00:11[B] -> 1-11 -> IRQ 11

Re: IDE

  00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235
        Interrupt: pin A routed to IRQ 23

  Package (0x04) { 0x0011FFFF, 0x00, \_SB.PCI0.PIB.ALKA, 0x00 },

  ACPI: PCI Interrupt Link [ALKA] (IRQs 16 17 18 19 20 21 22 23) <*9>, disabled.
  LENB: extended resource for 23
  LENB: setting disabled link
  pci_link-0416 [29] acpi_pci_link_set : Attempt to enable at IRQ 23 resulted in IRQ 9, using 23
  pci_link-0423 [29] acpi_pci_link_set : Set IRQ 23
  ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 23
  IOAPIC[0]: Set PCI routing entry (1-23 -> 0xb1 -> IRQ 23 Mode:1 Active:1)
  00:00:11[A] -> 1-23 -> IRQ 23

BIOS bug #3, IDE uses ALKA, which is not only disabled and returns _CRS=9 that is not in _PRS, but doesn't even support IRQ14 where IDE really lives.

IDE is saved by a combination of bugs: ALKA _SRS is a big NO-OP, so it doesn't matter we tried to set it to 23. IDE driver is hard-coded to IRQ14 and ignores what we tell it.

So what becomes of the devices actually on ALKA?:

  LENB: extended resource for 23
  LENB: setting disabled link
  pci_link-0416 [29] acpi_pci_link_set : Attempt to enable at IRQ 23 resulted in IRQ 9, using 23
  pci_link-0423 [29] acpi_pci_link_set : Set IRQ 23
  ACPI: PCI Interrupt Link [ALKA] enabled at IRQ 23
  IOAPIC[0]: Set PCI routing entry (1-23 -> 0xb1 -> IRQ 23 Mode:1 Active:1)
  00:00:11[A] -> 1-23 -> IRQ 23

We tell them that they're on 23 because we did a _SRS on 23 -- even though _CRS still returns 9. Which method tells the truth? Apparently we escape unscathed because IDE is the only device on device 11 pinA.

Please verify that this system is running the latest BIOS, because this sure doesn't look production quality.

Also, please verify that the ACPI interrupt is working by seeing if it responds to your power button. (disable acpid first to avoid a system shutdown).

Created attachment 2707 [details]
updated 2.6.5 debug patch
please test this updated 2.6.5 debug patch.
When the IRQs are being re-programmed anyway (IOAPIC mode),
it verifies that _CRS is a member of _PRS before programming.
This should fix the Rhine on the M6807, hopefully it doesn't break anybody
else.
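
To see which links on a given machine would be affected by this check, the "PCI Interrupt Link" lines in dmesg can be scanned for a current IRQ that is missing from the possible list. The sketch below is only an illustration: it assumes the `<*N>` and `, disabled.` forms exactly as they are quoted in this report (printed by the debug-patched kernel), other kernels may format the line differently, and `scan_links` is a hypothetical helper name.

```python
import re

# Lines quoted from the dmesg excerpts in this report.
SAMPLE = """\
ACPI: PCI Interrupt Link [ALKB] (IRQs 23) <*11>, disabled.
ACPI: PCI Interrupt Link [ALKA] (IRQs 16 17 18 19 20 21 22 23) <*9>, disabled.
ACPI: PCI Interrupt Link [LMVI] (IRQs 18)
"""

# Assumed format: "[NAME] (IRQs a b c)" optionally followed by "<*current>"
# and ", disabled." as seen in the logs above.
LINK = re.compile(
    r"PCI Interrupt Link \[(?P<name>\w+)\] \(IRQs(?P<possible>[\d\s]*)\)"
    r"(?:\s*<\*(?P<current>\d+)>)?(?P<disabled>, disabled\.)?"
)


def scan_links(text):
    """Return one dict per link line found in a dmesg capture."""
    links = []
    for match in LINK.finditer(text):
        possible = [int(irq) for irq in match.group("possible").split()]
        current = match.group("current")
        current = int(current) if current is not None else None
        links.append({
            "name": match.group("name"),
            "possible": possible,      # what _PRS offered
            "current": current,        # what _CRS reported, if printed
            "disabled": match.group("disabled") is not None,
            # None when no current IRQ was printed on the line
            "current_in_possible": (current in possible) if current is not None else None,
        })
    return links


if __name__ == "__main__":
    for link in scan_links(SAMPLE):
        if link["current_in_possible"] is False:
            print(f"{link['name']}: _CRS {link['current']} not in _PRS {link['possible']}")
```

On the M6807 lines above this flags both ALKB (11 not in 23) and ALKA (9 not in 16-23), which matches BIOS bugs #2 and #3 as described earlier in the thread.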
Created attachment 2710 [details]
updated 2.6.5 debug patch
Improved debug patch. This version validates _CRS against _PRS
for both PIC and IOAPIC mode.
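
The idea of validating _CRS against _PRS can be modelled in a few lines. This is a rough sketch of the decision only, not the patch itself: `choose_link_irq` and the `penalty` map are hypothetical, and the kernel's real selection weighs IRQ penalties in ways that are not reproduced here.

```python
def choose_link_irq(crs, prs, penalty=None):
    """Pick the IRQ to program for an interrupt link.

    crs     -- IRQ reported by _CRS (None or 0 if nothing usable was reported)
    prs     -- list of IRQs reported by _PRS
    penalty -- hypothetical {irq: cost} map; lower cost is preferred

    _CRS is trusted only when it is a member of _PRS. Otherwise a
    candidate from _PRS is chosen instead.
    """
    if not prs:
        raise ValueError("no possible IRQs reported by _PRS")
    if crs in prs:
        return crs
    penalty = penalty or {}
    # Lowest-cost candidate; ties go to the first entry in _PRS order.
    return min(prs, key=lambda irq: penalty.get(irq, 0))


if __name__ == "__main__":
    # M6807 ALKB from this report: _CRS says 11, _PRS only offers 23.
    print(choose_link_irq(11, [23]))
```

With the M6807 values, ALKB (_CRS 11, _PRS 23) resolves to 23 instead of the dead IRQ 11, while a link whose _CRS is already listed in _PRS is left where it is.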
Created attachment 2767 [details]
proposed 2.6.5 patch
checked this version into the acpi-test tree
The proposed patch works for me on the eMachines M6807 with the via-rhine as well. Thanks! :)

Created attachment 2790 [details]
m6805 hangs with this patch
Still some problems on my m6805 laptop. Sorry I could not try this patch
earlier.
System hangs when loading processor.ko if via-rhine is loaded. System also
hangs during boot if ACPI processor module is compiled in.
ACPI button seems to work now though, at least the machine does not power off
immediately after pressing it.
Poweroff command does not work, just says acpi_power_off.
Tony, did you remember to apply the latest patch from bug #2090 if you are running in x86-64 mode? It is still needed afaik. I was running in x86 mode when doing my testing.

Chris

Thanks for the tip, but the patch from bug #2090 does not help either. System still hangs when loading processor module. Yes, I'm running in x86_64 mode.

Tony

I tried this patch on kernel 2.6.6-rc3 with and without additionally using the patch supplied for bug 2090, but when I boot with both ioapic and acpi enabled the CD-ROM drive bundled with the eMachines M6805 is not correctly configured. The laptop hangs when the ide-cd module is loaded, reporting that interrupts are being lost. Earlier in boot, when the initial information regarding ide interfaces is displayed, I see it can detect the harddisk and the cd-rom properly, but even then I get lost interrupts for hdc and hdd (hdc seems to be the interface where the CD unit is plugged). The error message reads: "hdc: IRQ probe failed (0xbafa)" (likewise for hdd). Passing the "pci=noacpi noapic" parameters leaves me with a so far working system, but I don't seem to need either of the aforementioned patches in that case. I'm not using the processor module (I removed the file, just in case), and I'm running a 32-bit kernel.

Also, I don't know if it's related to the ACPI bug, but when I close the notebook lid it produces a constant flow of the following message and becomes unusable (the CPU usage climbs to 100%):

  evgpe-0403: *** Error: acpi_ev_gpe_dispatch: Unable to queue handler for GPE[ B], event is disabled

I'm still using "pci=noacpi", and the noapic parameter doesn't seem to have an effect on the outcome.

Sorry, comment #21 is invalid. I thought I had IO-APIC enabled on the kernel, but it turns out that I didn't. Now ACPI/APIC work. Comment #22 is still valid, with and without an acpid daemon.
From http://www.muru.com/linux/amd64/: "Pavel Machek's patch for fix PCMCIA with ACPI here (http://www.muru.com/linux/amd64/patches/patch-m680x-pcmcia-acpi-fix). This fixes the problem where plugging/unplugging the power cord with yenta_socket hangs the machine." This additionally fixed the lid issue I wrote about in comment #22.

For me yenta socket locks up even when power state isn't changed. The issue is that the ACPI registers are located in the 0x4000-0x407F range, which Linux tries to use for cardbus. The pcmcia patch is useful as a temporary solution, but I have filed bug #2641 to get the i/o port 0x4000 issue resolved properly. It appears that WinXP somehow knows which ports to reserve for ACPI, whereas under Linux they get trampled on. My current thought is that WinXP uses some combination of the FADT and DSDT tables to reserve the ports.

closing -- the disabled link patch shipped in 2.6.6, and is on top of 2.4.27-pre2. if these boxes still have other problems, open other bugs if you haven't already.

thanks,
-Len |