Distribution: Fedora Core
Hardware Environment: Dell Optiplex (P3 - 600)
Problem Description: Starting from 2.6.10-rc2 I am no longer able to use my 3c59x NIC in one of the PCI slots on my machine. It seems to be an IRQ routing problem since I can use the workaround suggested in that kernel (pci=routeirq). It also works if I use pci=noacpi.

Tested setups:
* Just the NIC in the bad slot (breaks).
* NIC in good slot (works).
* NICs in both slots (bad slot breaks, good slot works).
* NIC in bad slot, sound card in good slot (NIC breaks).

Kernels up to 2.6.13-rc5-mm have been tested.
Created attachment 5592 [details] dmesg 2.6.10-rc1
dmesg on a working kernel
Created attachment 5593 [details] dmesg 2.6.10-rc2
dmesg on a broken kernel
Created attachment 5594 [details] lspci -vv
System with one NIC and one sound card in the slots. The extension slots are 0e.0 and 11.0. Broken slot is 11.0
Sorry, mixed up the ports. 0d.0 and 0e.0 are the expansion ports and 0d.0 is the one causing problems. (11.0 is onboard as seen by subsystem id).
1c1
< Linux version 2.6.10-rc1 (root@natasha.craffe.se) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #3 Sun Jul 17 03:03:28 CEST 2005
---
> Linux version 2.6.10-rc2 (root@natasha.craffe.se) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #8 Wed Jul 20 02:57:15 CEST 2005
56c56
< ACPI: Subsystem revision 20040816
---
> ACPI: Subsystem revision 20041105
66a67,68
> pnp: PnP ACPI init
> pnp: PnP ACPI: found 12 devices
70,78c72,79
< ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
< ACPI: PCI interrupt 0000:00:07.2[D] -> GSI 11 (level, low) -> IRQ 11
< ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
< ACPI: PCI interrupt 0000:00:0d.0[A] -> GSI 10 (level, low) -> IRQ 10
< ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
< ACPI: PCI interrupt 0000:00:0e.0[A] -> GSI 11 (level, low) -> IRQ 11
< ACPI: PCI interrupt 0000:00:11.0[A] -> GSI 11 (level, low) -> IRQ 11
< ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
< ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 10 (level, low) -> IRQ 10
---
> ** PCI interrupts are no longer routed automatically. If this
> ** causes a device to stop working, it is probably because the
> ** driver failed to call pci_enable_device(). As a temporary
> ** workaround, the "pci=routeirq" argument restores the old
> ** behavior. If this argument makes the device work again,
> ** please email the output of "lspci" to bjorn.helgaas@hp.com
> ** so I can fix the driver.
> pnp: 00:0b: ioport range 0x800-0x85f could not be reserved
162a163
> ibm_acpi: ec object not found
163a165
> ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
181a184
> ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
194c197
---
> NETDEV WATCHDOG: eth0: transmit timed out

Looks like we no longer set up LNKC and LNKA (unless pci=routeirq). Please attach the output of acpidump so we can verify which devices use those links.
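The observation that LNKA and LNKC are no longer enabled can be checked mechanically. What follows is a minimal sketch, not a tool anyone in this thread used: it scans dmesg text for the "Interrupt Link ... enabled" lines and reports links present in the working log but absent from the broken one. The function name `enabled_links` and the two sample strings are illustrative; the log lines themselves are copied from the diff above.

```python
import re

# Matches lines such as:
#   ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
LINK_RE = re.compile(r"ACPI: PCI Interrupt Link \[(LNK[A-Z])\] enabled at IRQ (\d+)")


def enabled_links(dmesg_text):
    """Return {link_name: irq} for every interrupt link the kernel enabled."""
    return {m.group(1): int(m.group(2)) for m in LINK_RE.finditer(dmesg_text)}


# Link lines from the 2.6.10-rc1 (working) side of the diff.
working = """\
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
"""

# Link lines from the 2.6.10-rc2 (broken) side of the diff.
broken = """\
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
"""

# Links that were enabled on the working kernel but never on the broken one.
missing = sorted(set(enabled_links(working)) - set(enabled_links(broken)))
print(missing)  # ['LNKA', 'LNKC']
```

To use it on real logs, the two strings would be replaced with the contents of the attached dmesg files.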
acpidump is available in the latest pmtools:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Also, please boot with pnpacpi=off -- just so we have an apples/apples comparison. And please include the /proc/interrupts for failure and success cases.

BTW, does this machine have an IOAPIC? If it does then it appears to be excluded from the kernel config. (Yes, we do need to fix PIC mode even if the machine has an IOAPIC that works.)
Created attachment 5600 [details] dsdt
The machine does not have an APIC as far as I know. Kernel doesn't report anything and the BIOS is not configurable enough to tell. I'll get /proc/interrupts for you tonight. The machine I was using got allocated for other stuff so I'm setting up tests on a new one (identical though, minus the sound card).
Sorry for the delay, the last couple of days have been hectic. Len, are you sure I should do the test with pnpacpi=off? That seems to imply pci=routeirq since the warning disappears and the slot works fine. Do you still want /proc/interrupts from that run or should I get it without pnpacpi=off?
*poke*
As Len pointed out, slot 00:0d broke when we stopped enabling LNKA and LNKC.

We recently debugged a problem where the BIOS supplied the wrong LNK device for one of the slot IRQ lines (http://bugzilla.kernel.org/show_bug.cgi?id=4773), and this could be a similar problem. So first, check to see if there are any BIOS updates from Dell.

Your DSDT has four _PRTs (RISA, RISM, RIST, and RISL), and one is selected based on \CHAS. I have no idea what \CHAS is, but maybe it means something to you or Len.

Your dmesg logs show that 00:0d[A] is connected to LNKB, so you must be using either RISA or RIST.

iasl was unable to compile your DSDT due to some syntax errors. But I binary-patched your DSDT to connect 00:0d[A] to LNKA. Can you build a current kernel with CONFIG_ACPI_CUSTOM_DSDT=y and point CONFIG_ACPI_CUSTOM_DSDT_FILE at DSDT-LNKA.hex? Boot the resulting kernel without "pci=routeirq".

If the NIC doesn't work with that one, try the DSDT-LNKC.hex one. In either case, please post the complete dmesg log.
Created attachment 5919 [details] DSDT patched so 00:0d[A] connects to LNKA
Created attachment 5920 [details] DSDT patched so 00:0d[A] connects to LNKC
Dell had an updated BIOS but it didn't fix the bug, I'm afraid. I've started a kernel compilation now with the modified DSDT. Should have some results later tonight.
The first DSDT (LNKA) tested and solves the problem. So does this mean that I'm stuck using either a custom DSDT or pci=routeirq?
> The first DSDT (LNKA) tested and solves the problem.

Great. Thanks for testing this!

> So does this mean that I'm stuck using either a custom DSDT or pci=routeirq?

At the moment, yes (and I'd prefer using "pci=routeirq", because it's possible that a BIOS switch or configuration change would cause a different _PRT to be used, and then the custom DSDT may break).

The other possibilities are to bug Dell to fix the BIOS (probably not likely, since it looks like a fairly old box), or to work around this somehow in Linux. But there's no "quirks"-type mechanism for ACPI, as there is for PCI, so there's no simple way to work around it.
> Len, are you sure I should do the test with pnpacpi=off?
> That seems to imply pci=routeirq since the warning
> disappears and the slot works fine. Do you still
> want /proc/interrupts from that run or should I get it without pnpacpi=off?

Assuming the failure still exists with a recent kernel...
Yes, please test with pnpacpi=off. If that fixes the problem please attach the dmesg and /proc/interrupts for both the pnpacpi=off and on cases. No, I don't expect pnpacpi=off to help, but if it did I'd sure want to know about it.

> Your DSDT has four _PRTs (RISA, RISM, RIST, and RISL), and one
> is selected based on \CHAS

RISA == RIST
RISM changes 0:d[a] to LNKD from LNKB, and adds support for device 0:f.
RISA and RISL differ in that they exchange LNKA and LNKB for devices 0:d[A] and 0:d[B]

*** RISA 2007-08-18 23:59:06.000000000 -0400
--- RISL 2007-08-18 23:58:19.000000000 -0400
***************
*** 1,4 ****
! Name (RISA, Package (0x0E)
  {
      Package (0x04)
      {
--- 1,4 ----
! Name (RISL, Package (0x0E)
  {
      Package (0x04)
      {
***************
*** 84,90 ****
      {
          0x000DFFFF,
          0x00,
!         LNKB,
          0x00
      },
--- 84,90 ----
      {
          0x000DFFFF,
          0x00,
!         LNKA,
          0x00
      },
***************
*** 92,98 ****
      {
          0x000DFFFF,
          0x01,
!         LNKA,
          0x00
      },
--- 92,98 ----
      {
          0x000DFFFF,
          0x01,
!         LNKB,
          0x00
      },

< ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
< ACPI: PCI interrupt 0000:00:0d.0[A] -> GSI 10 (level, low) -> IRQ 10

Based on the experiment above, yes, it looks like we are using RISA (or RIST) when perhaps we wish we were using RISL.

Method (_PRT, 0, NotSerialized)
{
    Store (\CHAS, Local1)
    If (LEqual (Local1, 0x00))
    {
        Store (RISL, Local0)
    }
    If (LEqual (Local1, 0x01))
    {
        Store (RIST, Local0)
    }
    If (LEqual (Local1, 0x02))
    {
        Store (RISM, Local0)
    }
    If (LEqual (Local1, 0x03))
    {
        Store (RISA, Local0)
    }
    Return (Local0)
}

So we want CHAS to be 0, but it is 1 or 3.
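For readers unfamiliar with _PRT entries, here is a small sketch of how the two differing packages in the RISA/RISL diff map to slot and pin. It assumes the standard ACPI _PRT layout (address, pin, source, source index), with the device number in the high word of the address and 0xFFFF in the low word meaning "any function". The helper name `describe_prt_entry` is made up for illustration, and the "00:" bus prefix is an assumption chosen to match the dmesg notation used in this report.

```python
# Sketch only: decode _PRT package entries quoted in the RISA/RISL diff.
# Assumed layout per entry: (address, pin, source, source_index).

PIN_NAMES = "ABCD"  # pin 0 is INTA, pin 1 is INTB, and so on


def describe_prt_entry(entry):
    """Render one _PRT entry in the dmesg-style 00:dd[P] notation."""
    address, pin, source, _source_index = entry
    device = (address >> 16) & 0xFFFF  # high word: PCI device number
    # The low word (0xFFFF here) means the entry applies to any function.
    return f"00:{device:02x}[{PIN_NAMES[pin]}] -> {source}"


# The two entries for device 0x0D as they appear in each table.
RISA_ENTRIES = [
    (0x000DFFFF, 0x00, "LNKB", 0x00),
    (0x000DFFFF, 0x01, "LNKA", 0x00),
]
RISL_ENTRIES = [
    (0x000DFFFF, 0x00, "LNKA", 0x00),
    (0x000DFFFF, 0x01, "LNKB", 0x00),
]

for table_name, entries in (("RISA", RISA_ENTRIES), ("RISL", RISL_ENTRIES)):
    for entry in entries:
        print(table_name, describe_prt_entry(entry))
```

Under these assumptions RISA routes 00:0d[A] to LNKB, which matches the dmesg lines quoted above, while RISL routes it to LNKA, which is the routing that the patched DSDT showed to work.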
OperationRegion (PXY0, SystemIO, 0x0831, 0x01)
Field (PXY0, ByteAcc, NoLock, Preserve)
{
        ,   6,
    CHAS,   2
}

So at IO(0x831) there is a 1 byte register, and the top two bits are either 1 or 3, but we want 0.

The next operation region is used by SST to blink an LED:

OperationRegion (GPOZ, SystemIO, 0x0834, 0x02)
Field (GPOZ, ByteAcc, NoLock, Preserve)
{
    BLNK,   1,
    GPO,    15
}

So I'm going to go out on a limb here and guess that CHAS reflects a jumper or switch on the motherboard. Only Dell knows. Perhaps you have a manual for the motherboard?

It might be fun from the cmdline to use inb(1) and out(1) to see if you can scribble on address 0x831 and have it stick. Note that AML never writes here, but does specify that if it did, it would preserve the bottom 6 bits.
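To make the bit layout concrete, the sketch below extracts CHAS from a byte read at 0x831 and maps it to the _PRT table the firmware would return. It does no port I/O; the byte values passed in are hypothetical, since the actual register contents were never posted. The names `chas_from_register` and `selected_prt` are illustrative. The mapping is taken from the _PRT method quoted earlier, and the bit position follows from the Field declaration (six skipped bits, then a two-bit CHAS).

```python
# Mapping copied from the _PRT method: CHAS value -> table returned.
PRT_BY_CHAS = {0: "RISL", 1: "RIST", 2: "RISM", 3: "RISA"}


def chas_from_register(value):
    """Extract the 2-bit CHAS field (bits 6..7) from the byte at I/O 0x831."""
    return (value >> 6) & 0x3


def selected_prt(value):
    """Name of the _PRT table the firmware would select for this byte."""
    return PRT_BY_CHAS[chas_from_register(value)]


# Hypothetical register values, for illustration only.
for value in (0x00, 0x40, 0x80, 0xC0):
    print(f"0x{value:02x}: CHAS={chas_from_register(value)} -> {selected_prt(value)}")
```

The low six bits do not affect the result, so any byte from 0x00 to 0x3f would select RISL, the table that routes 00:0d[A] to LNKA.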
It's quite clear this is a BIOS bug. Ping bug reporter for test result of pnpacpi=off, or we can close this bug...
I agree this is a BIOS bug. But I don't think we should just close this bug report. This IRQ used to work under Linux, and it's fair to assume that it works under Windows. We just haven't been smart enough to figure out a nice way to do a Linux workaround, and I think we should regard that as a Linux defect. If we ever do figure out a mechanism for this, it might be enabled by a dmi_check_system() table, so I think it would be useful if Pierre could attach the output of dmidecode.
Sure, let's wait for Pierre's response.
Sorry I've been a bit unresponsive. The machines in question are in production right now, but I'm working on getting one replaced so that I can do the testing. Bear with me one more week and I should have some answers.
hi, Pierre, would you be able to give us some dmidecode output?
I'm very sorry for being so unresponsive. Time is an all too scarce resource. :/ I've done some tests now on 2.6.22, and pnpacpi=off no longer works. Only pci=routeirq gets the machine fully functional.
Created attachment 13312 [details] dmesg with pnpacpi=on
Created attachment 13313 [details] /proc/interrupts with pnpacpi=on
Created attachment 13314 [details] dmesg with pnpacpi=off
Created attachment 13315 [details] /proc/interrupts with pnpacpi=off
Created attachment 13316 [details] dmesg with pci=routeirq
Created attachment 13317 [details] /proc/interrupts with pci=routeirq
This is a duplicate of bug 4773. It seems that a DMI check is the only way to fix the bug for your machine. Please attach the dmidecode output or else we'll close this bug.
Pierre, any chance you still have this machine and could collect the dmidecode output? I'm collecting _PRT problems and cooking up a patch to work around them.
Created attachment 14094 [details] dmidecode
Output from dmidecode. Sorry about the delay. As always, a severe lack of time...
bjorn, we count on you for the patch then. thanks! :)
Created attachment 14271 [details] quirk to move 00:0d[A] from LNKB to LNKA
Pierre, please try this patch and post the results and dmesg log. Thanks!
Just so you know, I have noticed your patch. I just haven't had time to compile my own kernel and test it yet.
I've now managed to test the patch and it works perfectly. Nice work. :)
Patch in comment #33 fixes the problem. Close this bug and mark it as PATCH_ALREADY_AVAILABLE.
391df5dce30a5aab477b9e55ea65a3e83bae96b1
ACPI: add _PRT quirks to work around broken firmware

shipped in linux-2.6.25-rc6
closed.