Bug 5044

Summary: PCI card in slot 0:d[A]: no interrupt unless pci=routeirq - 440BX/ZX/DX: Dell Optiplex
Product: ACPI
Reporter: Pierre Ossman (pierre-bugzilla)
Component: BIOS
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: acpi-bugzilla, bjorn.helgaas, bunk, Matt_Domsch, rui.zhang
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.10-rc2 and up
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg 2.6.10-rc1
dmesg 2.6.10-rc2
lspci -vv
dsdt
DSDT patched so 00:0d[A] connects to LNKA
DSDT patched so 00:0d[A] connects to LNKC
dmesg with pnpacpi=on
/proc/interrupts with pnpacpi=on
dmesg with pnpacpi=off
/proc/interrupts with pnpacpi=off
dmesg with pci=routeirq
/proc/interrupts with pci=routeirq
dmidecode
quirk to move 00:0d[A] from LNKB to LNKA

Description Pierre Ossman 2005-08-10 13:21:41 UTC
Distribution: Fedora Core
Hardware Environment: Dell Optiplex (P3 - 600)
Problem Description:

Starting from 2.6.10-rc2 I am no longer able to use my 3c59x NIC in one of the
PCI slots on my machine. It seems to be an IRQ routing problem since I can use
the workaround suggested in that kernel (pci=routeirq). It also works if I use
pci=noacpi.

Tested setups:
 * Just the NIC in the bad slot (breaks).
 * NIC in good slot (works).
 * NICs in both slots (bad slot breaks, good slot works).
 * NIC in bad slot, sound card in good slot (NIC breaks).

Kernels up to 2.6.13-rc5-mm have been tested.
Comment 1 Pierre Ossman 2005-08-10 13:22:53 UTC
Created attachment 5592 [details]
dmesg 2.6.10-rc1

dmesg on a working kernel
Comment 2 Pierre Ossman 2005-08-10 13:23:27 UTC
Created attachment 5593 [details]
dmesg 2.6.10-rc2

dmesg on a broken kernel
Comment 3 Pierre Ossman 2005-08-10 13:25:02 UTC
Created attachment 5594 [details]
lspci -vv

System with one NIC and one sound card in the slots. The extension slots are
0e.0 and 11.0. Broken slot is 11.0
Comment 4 Pierre Ossman 2005-08-10 13:31:42 UTC
Sorry, mixed up the ports. 0d.0 and 0e.0 are the expansion ports and 0d.0 is the
one causing problems. (11.0 is onboard as seen by subsystem id).
Comment 5 Len Brown 2005-08-10 23:02:01 UTC
1c1
< Linux version 2.6.10-rc1 (root@natasha.craffe.se) (gcc version 3.3.3 20040412
(Red Hat Linux 3.3.3-7)) #3 Sun Jul 17 03:03:28 CEST 2005
---
> Linux version 2.6.10-rc2 (root@natasha.craffe.se) (gcc version 3.3.3 20040412
(Red Hat Linux 3.3.3-7)) #8 Wed Jul 20 02:57:15 CEST 2005
56c56
< ACPI: Subsystem revision 20040816
---
> ACPI: Subsystem revision 20041105
66a67,68
> pnp: PnP ACPI init
> pnp: PnP ACPI: found 12 devices
70,78c72,79
< ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
< ACPI: PCI interrupt 0000:00:07.2[D] -> GSI 11 (level, low) -> IRQ 11
< ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
< ACPI: PCI interrupt 0000:00:0d.0[A] -> GSI 10 (level, low) -> IRQ 10
< ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
< ACPI: PCI interrupt 0000:00:0e.0[A] -> GSI 11 (level, low) -> IRQ 11
< ACPI: PCI interrupt 0000:00:11.0[A] -> GSI 11 (level, low) -> IRQ 11
< ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
< ACPI: PCI interrupt 0000:01:00.0[A] -> GSI 10 (level, low) -> IRQ 10
---
> ** PCI interrupts are no longer routed automatically.  If this
> ** causes a device to stop working, it is probably because the
> ** driver failed to call pci_enable_device().  As a temporary
> ** workaround, the "pci=routeirq" argument restores the old
> ** behavior.  If this argument makes the device work again,
> ** please email the output of "lspci" to bjorn.helgaas@hp.com
> ** so I can fix the driver.
> pnp: 00:0b: ioport range 0x800-0x85f could not be reserved
162a163
> ibm_acpi: ec object not found
163a165
> ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
181a184
> ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
194c197
---
> NETDEV WATCHDOG: eth0: transmit timed out

Looks like we no longer set up LNKC and LNKA
(unless pci=routeirq).  Please attach the output
of acpidump so we can verify which devices use
those links.  acpidump is available in the latest pmtools:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Also, please boot with pnpacpi=off -- just so we have
an apples/apples comparison.

And please include the /proc/interrupts for failure
and success cases.

BTW. does this machine have an IOAPIC?
If it does then it appears to be excluded from
the kernel config.  (yes, we do need to fix PIC mode
even if the machine has an IOAPIC that works).
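
The comparison above can be reproduced mechanically. The following is a minimal sketch, assuming the two dmesg logs are available as text; the function names are made up for illustration, and the sample lines are copied from the diff above. It lists the interrupt links that the working kernel enabled and the broken kernel did not.

```python
import re

# Matches lines such as:
#   ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
LINK_RE = re.compile(r"ACPI: PCI Interrupt Link \[(\w+)\] enabled at IRQ (\d+)")


def enabled_links(dmesg_text):
    """Return {link name: IRQ} for every link the log reports as enabled."""
    return {name: int(irq) for name, irq in LINK_RE.findall(dmesg_text)}


def missing_links(working_dmesg, broken_dmesg):
    """Return links enabled in the working log but absent from the broken one."""
    broken = enabled_links(broken_dmesg)
    return sorted(set(enabled_links(working_dmesg)) - set(broken))


# Sample lines copied from the diff above (2.6.10-rc1 vs. 2.6.10-rc2).
WORKING = """\
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
"""
BROKEN = """\
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
"""

print(missing_links(WORKING, BROKEN))  # ['LNKA', 'LNKC']
```

With the full attached logs in place of the sample strings, the same call should show whether any link other than LNKA and LNKC is affected.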
Comment 6 Pierre Ossman 2005-08-11 04:22:54 UTC
Created attachment 5600 [details]
dsdt
Comment 7 Pierre Ossman 2005-08-11 04:30:35 UTC
The machine does not have an APIC as far as I know. Kernel doesn't report
anything and the BIOS is not configurable enough to tell.

I'll get /proc/interrupts for you tonight. The machine I was using got allocated
for other stuff so I'm setting up tests on a new one (identical though, minus
the sound card).
Comment 8 Pierre Ossman 2005-08-13 07:19:59 UTC
Sorry for the delay, the last couple of days have been hectic.

Len, are you sure I should do the test with pnpacpi=off? That seems to imply
pci=routeirq since the warning disappears and the slot works fine. Do you still
want /proc/interrupts from that run or should I get it without pnpacpi=off?
Comment 9 Pierre Ossman 2005-09-06 09:01:24 UTC
*poke*
Comment 10 Bjorn Helgaas 2005-09-06 13:38:28 UTC
As Len pointed out, slot 00:0d broke when we stopped enabling   
LNKA and LNKC.  We recently debugged a problem where the BIOS   
supplied the wrong LNK device for one of the slot IRQ lines   
(http://bugzilla.kernel.org/show_bug.cgi?id=4773), and this   
could be a similar problem.   
   
So first, check to see if there are any BIOS updates from Dell.   
   
Your DSDT has four _PRTs (RISA, RISM, RIST, and RISL), and one   
is selected based on \CHAS.  I have no idea what \CHAS is, but   
maybe it means something to you or Len.  Your dmesg logs show  
that 00:0d[A] is connected to LNKB, so you must be using either   
RISA or RIST.  
  
iasl was unable to compile your DSDT due to some syntax errors. 
 
But I binary-patched your DSDT to connect 00:0d[A] to LNKA.  Can 
you build a current kernel with CONFIG_ACPI_CUSTOM_DSDT=y and point 
CONFIG_ACPI_CUSTOM_DSDT_FILE at DSDT-LNKA.hex?  Boot the resulting 
kernel without "pci=routeirq". 
 
If the NIC doesn't work with that one, try the DSDT-LNKC.hex one. 
 
In either case, please post the complete dmesg log. 
Comment 11 Bjorn Helgaas 2005-09-06 13:40:13 UTC
Created attachment 5919 [details]
DSDT patched so 00:0d[A] connects to LNKA
Comment 12 Bjorn Helgaas 2005-09-06 13:41:09 UTC
Created attachment 5920 [details]
DSDT patched so 00:0d[A] connects to LNKC
Comment 13 Pierre Ossman 2005-09-11 09:06:23 UTC
Dell had an updated BIOS, but it didn't fix the bug, I'm afraid.

I've started a kernel compilation now with the modified DSDT. Should have some
results later tonight.
Comment 14 Pierre Ossman 2005-09-11 11:45:50 UTC
The first DSDT (LNKA) tested and solves the problem.

So does this mean that I'm stuck using either a custom DSDT or pci=routeirq?
Comment 15 Bjorn Helgaas 2005-09-12 08:01:48 UTC
> The first DSDT (LNKA) tested and solves the problem.    
    
Great.  Thanks for testing this!    
    
> So does this mean that I'm stuck using either a custom DSDT or pci=routeirq?    
    
At the moment, yes (and I'd prefer using "pci=routeirq", because it's   
possible that a BIOS switch or configuration change would cause a different   
_PRT to be used, and then the custom DSDT may break).  
  
The other possibilities are to bug Dell to fix the BIOS (probably not  
likely, since it looks like a fairly old box), or to work around this 
somehow in Linux.  But there's no "quirks"-type mechanism for ACPI, as 
there is for PCI, so there's no simple way to work around it. 
Comment 16 Len Brown 2007-08-18 22:05:29 UTC
> Len, are you sure I should do the test with pnpacpi=off?
> That seems to imply pci=routeirq since the warning
> disappears and the slot works fine. Do you still
> want /proc/interrupts from that run or should I get it without pnpacpi=off?

Assuming the failure still exists with a recent kernel...

Yes, please test with pnpacpi=off.
If that fixes the problem please attach the dmesg and
/proc/interrupts for both the pnpacpi=off and on cases.
No, I don't expect pnpacpi=off to help, but if it did
I'd sure want to know about it.

> Your DSDT has four _PRTs (RISA, RISM, RIST, and RISL), and one   
> is selected based on \CHAS

RISA == RIST

RISM changes 0:d[A] to LNKD from LNKB, and adds support
for device 0:f.

RISA and RISL differ in that they exchange LNKA and LNKB for
devices 0:d[A] and 0:d[B]
*** RISA        2007-08-18 23:59:06.000000000 -0400
--- RISL        2007-08-18 23:58:19.000000000 -0400
***************
*** 1,4 ****
!             Name (RISA, Package (0x0E)
              {
                  Package (0x04)
                  {
--- 1,4 ----
!             Name (RISL, Package (0x0E)
              {
                  Package (0x04)
                  {
***************
*** 84,90 ****
                  {
                      0x000DFFFF,
                      0x00,
!                     LNKB,
                      0x00
                  },

--- 84,90 ----
                  {
                      0x000DFFFF,
                      0x00,
!                     LNKA,
                      0x00
                  },

***************
*** 92,98 ****
                  {
                      0x000DFFFF,
                      0x01,
!                     LNKA,
                      0x00
                  },

--- 92,98 ----
                  {
                      0x000DFFFF,
                      0x01,
!                     LNKB,
                      0x00
                  },
< ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
< ACPI: PCI interrupt 0000:00:0d.0[A] -> GSI 10 (level, low) -> IRQ 10

Based on the experiment above, yes, it looks like we are using
RISA (or RIST) when perhaps we wish we were using RISL.

            Method (_PRT, 0, NotSerialized)
            {
                Store (\CHAS, Local1)
                If (LEqual (Local1, 0x00))
                {
                    Store (RISL, Local0)
                }

                If (LEqual (Local1, 0x01))
                {
                    Store (RIST, Local0)
                }

                If (LEqual (Local1, 0x02))
                {
                    Store (RISM, Local0)
                }

                If (LEqual (Local1, 0x03))
                {
                    Store (RISA, Local0)
                }

                Return (Local0)
            }

So we want CHAS to be 0, but it is 1 or 3.
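
The selection logic can be restated as a small lookup. This is only a sketch: the two tables are transcribed from the ASL and the RISA/RISL/RISM comparison above, and the function name is made up for illustration. It shows which \CHAS values are consistent with 00:0d[A] being routed to LNKB.

```python
# Which _PRT package the _PRT method returns for each \CHAS value,
# transcribed from the ASL above.
PRT_BY_CHAS = {0: "RISL", 1: "RIST", 2: "RISM", 3: "RISA"}

# Link each package assigns to 00:0d[A], per the comparison above:
# RIST is identical to RISA, RISM uses LNKD, and RISL uses LNKA.
LINK_FOR_0D_A = {"RISA": "LNKB", "RIST": "LNKB", "RISM": "LNKD", "RISL": "LNKA"}


def link_for_slot_0d_a(chas):
    """Return (package name, link) that _PRT yields for 00:0d[A]."""
    table = PRT_BY_CHAS[chas]
    return table, LINK_FOR_0D_A[table]


# dmesg shows 00:0d[A] on LNKB, so only these CHAS values are possible.
candidates = [c for c in PRT_BY_CHAS if link_for_slot_0d_a(c)[1] == "LNKB"]
print(candidates)  # [1, 3]
```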

    OperationRegion (PXY0, SystemIO, 0x0831, 0x01)
    Field (PXY0, ByteAcc, NoLock, Preserve)
    {
            ,   6,
        CHAS,   2
    }

So at IO(0x831) there is a 1 byte register,
and the top two bits are either 1 or 3, but we want 0.

The next operation region is used by SST to blink an LED:
    OperationRegion (GPOZ, SystemIO, 0x0834, 0x02)
    Field (GPOZ, ByteAcc, NoLock, Preserve)
    {
        BLNK,   1,
        GPO,    15
    }

So I'm going to go out on a limb here and guess that
CHAS reflects a jumper or switch on the motherboard.  Only Dell knows.
Perhaps you have a manual for the motherboard?

It might be fun from the cmdline to use inb(1) and out(1)
to see if you can scribble on address 0x831 and have it stick.
Note that AML never writes here, but it does specify that, if it did,
it would preserve the bottom 6 bits.
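
For reference, the bit arithmetic implied by the PXY0 field declaration can be sketched as follows. This is pure arithmetic and does not touch any hardware; the helper names and the sample byte values are made up for illustration. ACPI fields are laid out from the least significant bit, so skipping 6 bits places CHAS in bits 7:6.

```python
def chas_from_register(byte):
    """Extract the 2-bit CHAS field (bits 7:6) from the byte at I/O 0x831."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    return (byte >> 6) & 0x3


def with_chas(byte, chas):
    """Return the byte with CHAS replaced and the bottom 6 bits preserved."""
    if not 0 <= byte <= 0xFF or not 0 <= chas <= 3:
        raise ValueError("byte must fit in 8 bits and chas in 2 bits")
    return (byte & 0x3F) | (chas << 6)


# Hypothetical register values, chosen only to exercise the helpers.
print(chas_from_register(0x40))   # 1
print(chas_from_register(0xC5))   # 3
print(hex(with_chas(0xC5, 0)))    # 0x5
```

If a read of port 0x831 returns some value, `with_chas(value, 0)` gives the byte one would try writing back to see whether CHAS sticks at 0.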
Comment 17 Fu Michael 2007-09-26 01:02:28 UTC
It's quite clear this is a BIOS bug. Ping bug reporter for test result of pnpacpi=off, or we can close this bug...
Comment 18 Bjorn Helgaas 2007-09-26 14:46:12 UTC
I agree this is a BIOS bug.  But I don't think we should just close this bug report.  This IRQ used to work under Linux, and it's fair to assume that it works under Windows.  We just haven't been smart enough to figure out a nice way to do a Linux workaround, and I think we should regard that as a Linux defect.

If we ever do figure out a mechanism for this, it might be enabled by a dmi_check_system() table, so I think it would be useful if Pierre could attach the output of dmidecode.
Comment 19 Fu Michael 2007-09-26 20:16:02 UTC
Sure, let's wait for Pierre's response.
Comment 20 Pierre Ossman 2007-09-27 03:04:20 UTC
Sorry I've been a bit unresponsive. The machines in question are in production right now, but I'm working on getting one replaced so that I can do the testing. Bear with me one more week and I should have some answers.
Comment 21 Fu Michael 2007-10-22 17:31:09 UTC
Hi Pierre, would you be able to give us some dmidecode output?
Comment 22 Pierre Ossman 2007-10-29 11:06:42 UTC
I'm very sorry for being so unresponsive. Time is an all too scarce resource. :/

I've done some tests now on 2.6.22, and pnpacpi=off no longer works. Only pci=routeirq gets the machine fully functional.
Comment 23 Pierre Ossman 2007-10-29 11:08:08 UTC
Created attachment 13312 [details]
dmesg with pnpacpi=on
Comment 24 Pierre Ossman 2007-10-29 11:08:27 UTC
Created attachment 13313 [details]
/proc/interrupts with pnpacpi=on
Comment 25 Pierre Ossman 2007-10-29 11:08:45 UTC
Created attachment 13314 [details]
dmesg with pnpacpi=off
Comment 26 Pierre Ossman 2007-10-29 11:09:06 UTC
Created attachment 13315 [details]
/proc/interrupts with pnpacpi=off
Comment 27 Pierre Ossman 2007-10-29 11:09:22 UTC
Created attachment 13316 [details]
dmesg with pci=routeirq
Comment 28 Pierre Ossman 2007-10-29 11:09:43 UTC
Created attachment 13317 [details]
/proc/interrupts with pci=routeirq
Comment 29 ykzhao 2007-12-13 22:30:25 UTC
This is a duplicate of bug 4773.
It seems that a dmi check is the only way to fix the bug for your laptop.
Please attach the dmidecode or else we'll close this bug.
Comment 30 Bjorn Helgaas 2007-12-17 13:23:09 UTC
Pierre, any chance you still have this machine and could collect the dmidecode output?  I'm collecting _PRT problems and cooking up a patch to work around them.
Comment 31 Pierre Ossman 2007-12-17 13:37:31 UTC
Created attachment 14094 [details]
dmidecode

Output from dmidecode.

Sorry about the delay. As always, a severe lack of time...
Comment 32 Fu Michael 2007-12-23 22:31:15 UTC
Bjorn, we're counting on you for the patch then. Thanks! :)
Comment 33 Bjorn Helgaas 2008-01-03 11:02:42 UTC
Created attachment 14271 [details]
quirk to move 00:0d[A] from LNKB to LNKA

Pierre, please try this patch and post the results and dmesg log.  Thanks!
Comment 34 Pierre Ossman 2008-01-11 00:20:22 UTC
Just so you know, I have noticed your patch. I just haven't had time to compile my own kernel and test it yet.
Comment 35 Pierre Ossman 2008-02-10 11:09:54 UTC
I've now managed to test the patch and it works perfectly. Nice work. :)
Comment 36 Zhang Rui 2008-03-17 01:50:19 UTC
Patch in comment #33 fixes the problem.
Close this bug and mark it as PATCH_ALREADY_AVAILABLE.
Comment 37 Len Brown 2008-03-25 17:25:23 UTC
391df5dce30a5aab477b9e55ea65a3e83bae96b1
ACPI: add _PRT quirks to work around broken firmware
shipped in linux-2.6.25-rc6

closed.
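
The idea behind a DMI-keyed _PRT quirk, as discussed in comments 18 and 33, can be illustrated roughly as below. This is a conceptual sketch in Python, not the kernel code from the commit above: the `PrtQuirk` type, the `fix_prt_entry` function, and the product string are all hypothetical, and the product string in particular is a placeholder rather than a value taken from the attached dmidecode output. Only the device, pin, and link names come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrtQuirk:
    product: str      # DMI product name to match (placeholder below)
    device: int       # PCI device number on bus 0
    pin: str          # interrupt pin, 'A' through 'D'
    bios_link: str    # link the BIOS _PRT reports for this pin
    actual_link: str  # link the pin is really wired to


# Hypothetical table with one entry modelled on this bug:
# 00:0d[A] is reported as LNKB but works when routed via LNKA.
QUIRKS = [PrtQuirk("EXAMPLE-OPTIPLEX", 0x0D, "A", "LNKB", "LNKA")]


def fix_prt_entry(product, device, pin, link):
    """Return the link to use, substituting it when a quirk matches."""
    for q in QUIRKS:
        if (q.product, q.device, q.pin, q.bios_link) == (product, device, pin, link):
            return q.actual_link
    return link


print(fix_prt_entry("EXAMPLE-OPTIPLEX", 0x0D, "A", "LNKB"))  # LNKA
print(fix_prt_entry("EXAMPLE-OPTIPLEX", 0x0E, "A", "LNKC"))  # LNKC
```

Matching on the BIOS-reported link as well as the device and pin keeps the quirk from firing if a different _PRT package (for example RISL) is selected and the entry is already correct.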