Bug 42659

Summary: PCI extension cards not working (Disabling IRQ) with Asus Z68 board based on Sandy Bridge chipset and ASM1083 PCIe to PCI bridge
Product: Platform Specific/Hardware
Reporter: kaneda
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED DUPLICATE
Severity: high
CC: aklhfex, alan, edward.donovan, jeroen.vandenkeybus
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32, 3.1.8, 3.2.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: /proc/interrupts after IRQ disabled
lspci_vvv after IRQ got disabled
lspci -nvv after IRQ got disabled
Stack trace when disabling IRQ.
dmesg of a fresh boot

Description kaneda 2012-01-26 09:09:56 UTC
Overview:
Several different types of PCI extension cards stop working or run at degraded speed on an Asus P8Z68-V/GEN3 motherboard after an uptime ranging from 100 seconds to several hours.

Steps to reproduce:
Plug a PCI extension card into one of the PCI extension slots. It will be detected and assigned IRQ #18 or IRQ #19. Soon after the card is used (e.g. pinging through a network card), the kernel emits:
irq 19: nobody cared (try booting with the "irqpoll" option)
... module specific stack trace ...
Disabling IRQ #19
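
To confirm the symptom on another machine, the kernel log can be scanned for the two messages quoted above. The following is a minimal sketch, not part of the original report; the function name find_disabled_irqs is hypothetical, and the patterns only match the message wording shown in this bug.

```python
import re

# Patterns taken from the two kernel messages quoted in this report.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def find_disabled_irqs(log_text):
    """Return the IRQ numbers reported as unhandled and as disabled.

    log_text is the full kernel log as one string, for example the
    output of dmesg. Timestamps or other line prefixes are ignored
    because the patterns are searched anywhere in each line.
    """
    reported = set()
    disabled = set()
    for line in log_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            reported.add(int(match.group(1)))
        match = DISABLING.search(line)
        if match:
            disabled.add(int(match.group(1)))
    return {"nobody_cared": sorted(reported), "disabled": sorted(disabled)}
```

One way to use it is to save the output of dmesg to a file, read that file into a string, and pass the string to find_disabled_irqs.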

Actual results:
After the IRQ is disabled, the extension card either does not work or works very slowly. For example, a 100 Mbit Ethernet card shows ping times above 100 ms to the next hop, and the transfer speed drops to 100 kbps.
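
To see which handlers sit on the affected line and how many interrupts were counted, /proc/interrupts can be summarised per IRQ. This is a rough sketch that assumes the usual x86 layout (a header row of CPU names, then one row per interrupt); parse_interrupts is a hypothetical helper, and the attached file is not reproduced here.

```python
def parse_interrupts(text):
    """Map each interrupt label to (total count, description).

    The first line is expected to list the CPUs (CPU0 CPU1 ...).
    Each following line is "<label>: <count per CPU> <description>".
    Rows with fewer counts than CPUs (such as ERR) are tolerated.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        tokens = rest.split()
        counts = []
        for token in tokens[:cpu_count]:
            if not token.isdigit():
                break
            counts.append(int(token))
        description = " ".join(tokens[len(counts):])
        result[label.strip()] = (sum(counts), description)
    return result
```

Calling it on the contents of /proc/interrupts and looking up the key "19" (or "18") gives the total count and the names of the drivers registered on that line.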

Expected results:
The IRQ should not be disabled. The extension card should continue working.

Kernel environment:
Debian/Squeeze stock kernel: 2.6.32+29 
Squeeze backports 3.1 kernel: 3.1.8-2~bpo60+1
Squeeze backports 3.2 kernel: Linux version 3.2.0-0.bpo.1-amd64 (Debian 3.2.1-2~bpo60+1) (ben@decadent.org.uk) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed Jan 25 00:15:47 UTC 2012

Additional information:
* Searching the web suggests that this is a widespread issue affecting users of PCI cards in recent Asus motherboards.
* Booting with PCI / ACPI command line options does not fix the problem, but can delay it.
* Many people running into this issue have an ASM1083 PCIe to PCI bridge controller (1b21:1080, http://www.asmedia.com.tw/eng/e_show_products.php?item=114&cate_index=112) between the Sandy Bridge chipset and the PCI cards.
* This issue is not limited to network cards, but it is easiest to demonstrate with PCI network cards.
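
To check whether a given machine has the bridge mentioned above, the vendor and device IDs can be read from sysfs. The sketch below is an illustration only; find_pci_devices and read_hex are hypothetical helper names, and the sysfs_root parameter exists so the function can be pointed at a test directory.

```python
import os

# IDs quoted in this report for the ASM1083 PCIe to PCI bridge (1b21:1080).
ASMEDIA_VENDOR = 0x1B21
ASM1083_DEVICE = 0x1080


def read_hex(path):
    """Read a sysfs attribute such as "0x1b21" and return it as an int."""
    with open(path) as handle:
        return int(handle.read().strip(), 16)


def find_pci_devices(vendor, device, sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses whose vendor and device IDs both match."""
    matches = []
    if not os.path.isdir(sysfs_root):
        return matches
    for address in sorted(os.listdir(sysfs_root)):
        device_dir = os.path.join(sysfs_root, address)
        try:
            found_vendor = read_hex(os.path.join(device_dir, "vendor"))
            found_device = read_hex(os.path.join(device_dir, "device"))
        except (OSError, ValueError):
            continue  # entry without readable ID files
        if found_vendor == vendor and found_device == device:
            matches.append(address)
    return matches
```

Calling find_pci_devices(ASMEDIA_VENDOR, ASM1083_DEVICE) on an affected system would be expected to list the bridge address, which lspci shows as 07:00.0 on the reporter's board.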

Possibly related bug reports:
https://bugzilla.kernel.org/show_bug.cgi?id=38632
https://bugzilla.kernel.org/show_bug.cgi?id=41322
https://bugzilla.kernel.org/show_bug.cgi?id=35332

Related threads:
http://phoronix.com/forums/showthread.php?63509-Sandy-Bridge-PCI-Card-Drivers-Fail-with-quot-Disabling-IRQ-quot
Comment 1 kaneda 2012-01-26 09:12:03 UTC
Created attachment 72197 [details]
/proc/interrupts after IRQ disabled
Comment 2 kaneda 2012-01-26 09:15:30 UTC
Created attachment 72198 [details]
lspci_vvv after IRQ got disabled

PCI device is 08:01.0 Network controller: Techsan Electronics Co Ltd B2C2 FlexCopII DVB chip / Technisat SkyStar2 DVB card (rev 02)
PCI device is connected via
07:00.0 PCI bridge: Device 1b21:1080 (rev 01) (prog-if 01 [Subtractive decode])
Comment 3 kaneda 2012-01-26 09:21:07 UTC
Created attachment 72199 [details]
lspci -nvv after IRQ got disabled

PCI card is 08:01.0 0280: 13d0:2103 (rev 02)
PCI card is behind PCIe to PCI bridge 07:00.0 0604: 1b21:1080 (rev 01) (prog-if 01 [Subtractive decode])
Comment 4 kaneda 2012-01-26 09:32:16 UTC
Created attachment 72200 [details]
Stack trace when disabling IRQ.
Comment 5 kaneda 2012-01-26 09:44:09 UTC
Created attachment 72201 [details]
dmesg of a fresh boot
Comment 6 Jeroen Van den Keybus 2012-01-30 20:20:08 UTC
The issue also occurs on an AMD Fusion platform, suggesting that Sandy Bridge is not involved in this problem.
Comment 7 Edward Donovan 2012-02-14 04:23:14 UTC
This bug looks like the same problem as numbers 38632 and 39122.  

  https://bugzilla.kernel.org/show_bug.cgi?id=38632
  https://bugzilla.kernel.org/show_bug.cgi?id=39122

If bugzilla would let me, I'd mark the two later ones as dupes of the first.  Or do something to pull them together.

It looks like the ASM1083 chip is bad.  It's been discussed on LKML, as seen here:

  https://lkml.org/lkml/2012/2/2/370

where Linus and others say we may be able to do limited workarounds.  No code has come from that, yet.

I'm posting a version of this note on all three bugs.
Comment 8 Alan 2012-08-30 14:08:31 UTC

*** This bug has been marked as a duplicate of bug 38632 ***