Bug 39122

Summary: IRQ issues with ASUS E35M1-M
Product: Platform Specific/Hardware
Reporter: bjorn.ottervik
Component: x86-64
Assignee: other_other
Status: RESOLVED DUPLICATE
Severity: high
CC: aklhfex, alan, bjorn.ottervik, edward.donovan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0-rc6
Subsystem:
Regression: No
Bisected commit-id:
Attachments: 3.0-rc6 .config

Description bjorn.ottervik 2011-07-10 18:38:24 UTC
After a period of time, this or a very similar error appears, and both networking and disk I/O on the system get painfully slow. It seems to happen more frequently when both network and disks are under heavier load.
BIOS version 1002.
Also happens with 2.6.39 and 2.6.38. Booting with irqpoll doesn't help.

3.0-rc6 example:
[  156.926259] ------------[ cut here ]------------
[  156.926286] WARNING: at drivers/tty/tty_ldisc.c:766 tty_ldisc_reinit+0x8c/0xa0()
[  156.926294] Hardware name: System Product Name
[  156.926299] Modules linked in: xt_mark
[  156.926312] Pid: 2619, comm: login Not tainted 3.0.0-rc6 #1
[  156.926318] Call Trace:
[  156.926334]  [<ffffffff8104850a>] warn_slowpath_common+0x7a/0xb0
[  156.926345]  [<ffffffff81069190>] ? wake_up_bit+0x40/0x40
[  156.926355]  [<ffffffff81048555>] warn_slowpath_null+0x15/0x20
[  156.926366]  [<ffffffff813d882c>] tty_ldisc_reinit+0x8c/0xa0
[  156.926377]  [<ffffffff813d8d28>] tty_ldisc_hangup+0xf8/0x220
[  156.926387]  [<ffffffff813d0b04>] __tty_hangup+0xf4/0x3a0
[  156.926396]  [<ffffffff81044f1e>] ? sched_move_task+0x8e/0x190
[  156.926406]  [<ffffffff813d26c4>] disassociate_ctty+0x84/0x240
[  156.926416]  [<ffffffff8104c50c>] do_exit+0x81c/0x910
[  156.926427]  [<ffffffff81114713>] ? vfs_write+0x123/0x180
[  156.926437]  [<ffffffff8104c88f>] do_group_exit+0x4f/0xc0
[  156.926447]  [<ffffffff8104c912>] sys_exit_group+0x12/0x20
[  156.926458]  [<ffffffff817e14fb>] system_call_fastpath+0x16/0x1b
[  156.926466] ---[ end trace e2dade1c1439d6bc ]---
[  919.616370] flush-9:127 used greatest stack depth: 2816 bytes left
[ 3917.088830] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 3917.088847] Pid: 0, comm: kworker/0:0 Tainted: G        W   3.0.0-rc6 #1
[ 3917.088853] Call Trace:
[ 3917.088859]  <IRQ>  [<ffffffff810a1ff5>] __report_bad_irq+0x35/0xd0
[ 3917.088887]  [<ffffffff810a23c6>] note_interrupt+0x126/0x1e0
[ 3917.088898]  [<ffffffff810a05d0>] handle_irq_event_percpu+0xb0/0x200
[ 3917.088909]  [<ffffffff810a0755>] handle_irq_event+0x35/0x60
[ 3917.088919]  [<ffffffff810a2c91>] handle_fasteoi_irq+0x51/0xd0
[ 3917.088929]  [<ffffffff810040d4>] handle_irq+0x44/0xa0
[ 3917.088938]  [<ffffffff81003d58>] do_IRQ+0x58/0xe0
[ 3917.088949]  [<ffffffff817da1d3>] common_interrupt+0x13/0x13
[ 3917.088955]  <EOI>  [<ffffffff8100105e>] ? __exit_idle+0xe/0x40
[ 3917.088970]  [<ffffffff81001225>] cpu_idle+0x55/0x90
[ 3917.088981]  [<ffffffff817d2cc4>] start_secondary+0x18b/0x190
[ 3917.088988] handlers:
[ 3917.088996] [<ffffffff815217a0>] ahci_interrupt
[ 3917.089006] [<ffffffff81529d50>] rtl8139_interrupt
[ 3917.089012] Disabling IRQ #19

2.6.39 example:
[ 8882.602995] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 8882.603006] Pid: 2664, comm: xbmc.bin Not tainted 2.6.39.2 #4
[ 8882.603006] Call Trace:
[ 8882.603006]  <IRQ>  [<ffffffff810a9b35>] __report_bad_irq+0x35/0xc0
[ 8882.603006]  [<ffffffff810a9f64>] note_interrupt+0x194/0x1d0
[ 8882.603006]  [<ffffffff810a814c>] handle_irq_event_percpu+0xac/0x200
[ 8882.603006]  [<ffffffff81051d1e>] ? __do_softirq+0x10e/0x1f0
[ 8882.603006]  [<ffffffff810a82d5>] handle_irq_event+0x35/0x60
[ 8882.603006]  [<ffffffff810aa6e1>] handle_fasteoi_irq+0x51/0xd0
[ 8882.603006]  [<ffffffff810042e4>] handle_irq+0x44/0x90
[ 8882.603006]  [<ffffffff81003a48>] do_IRQ+0x58/0xe0
[ 8882.603006]  [<ffffffff817e8b13>] common_interrupt+0x13/0x13
[ 8882.603006]  <EOI>  [<ffffffff817efe7b>] ? system_call_fastpath+0x16/0x1b
[ 8882.603006] handlers:
[ 8882.603006] [<ffffffff81526b50>] (ahci_interrupt+0x0/0x690)
[ 8882.603006] [<ffffffff8152f7f0>] (rtl8139_interrupt+0x0/0x520)
[ 8882.603006] Disabling IRQ #19
[13674.336681] irq 18: nobody cared (try booting with the "irqpoll" option)
[13674.336692] Pid: 0, comm: kworker/0:0 Not tainted 2.6.39.2 #4
[13674.336695] Call Trace:
[13674.336699]  <IRQ>  [<ffffffff810a9b35>] __report_bad_irq+0x35/0xc0
[13674.336716]  [<ffffffff810a9f64>] note_interrupt+0x194/0x1d0
[13674.336722]  [<ffffffff810a814c>] handle_irq_event_percpu+0xac/0x200
[13674.336727]  [<ffffffff810a82d5>] handle_irq_event+0x35/0x60
[13674.336732]  [<ffffffff810aa6e1>] handle_fasteoi_irq+0x51/0xd0
[13674.336738]  [<ffffffff810042e4>] handle_irq+0x44/0x90
[13674.336742]  [<ffffffff81003a48>] do_IRQ+0x58/0xe0
[13674.336748]  [<ffffffff817e8b13>] common_interrupt+0x13/0x13
[13674.336751]  <EOI>  [<ffffffff813bfce3>] ? acpi_idle_enter_simple+0xbc/0xee
[13674.336762]  [<ffffffff813bfcde>] ? acpi_idle_enter_simple+0xb7/0xee
[13674.336769]  [<ffffffff815ddbcd>] cpuidle_idle_call+0xbd/0x210
[13674.336773]  [<ffffffff81001ff9>] cpu_idle+0x59/0xb0
[13674.336779]  [<ffffffff817e2119>] start_secondary+0x181/0x185
[13674.336782] handlers:
[13674.336785] [<ffffffff81550490>] (usb_hcd_irq+0x0/0x60)
[13674.336792] [<ffffffff81550490>] (usb_hcd_irq+0x0/0x60)
[13674.336797] [<ffffffff81550490>] (usb_hcd_irq+0x0/0x60)
[13674.336802] [<ffffffff81550490>] (usb_hcd_irq+0x0/0x60)
[13674.336807] [<ffffffff81536030>] (rtl8169_interrupt+0x0/0x3a0)
[13674.336814] Disabling IRQ #18
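
For anyone comparing several of these reports, the IRQ number and the registered handlers can be pulled out of a saved kernel log mechanically. The following is a minimal sketch, assuming only the two log layouts shown above (bare handler names in 3.0-rc6, parenthesised `name+0x0/0x...` in 2.6.39); the function name `parse_bad_irqs` is hypothetical and not part of any kernel tooling.

```python
import re

# Patterns are based only on the log excerpts in this report.
NOBODY = re.compile(r"irq (\d+): nobody cared")
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+\(?([A-Za-z_]\w*)")
DISABLE = re.compile(r"Disabling IRQ #(\d+)")


def parse_bad_irqs(log):
    """Return one {"irq": N, "handlers": [...]} dict per completed report."""
    reports = []
    current = None
    in_handlers = False
    for line in log.splitlines():
        m = NOBODY.search(line)
        if m:
            # Start of a new "nobody cared" report.
            current = {"irq": int(m.group(1)), "handlers": []}
            in_handlers = False
            continue
        if current is None:
            continue
        if line.rstrip().endswith("handlers:"):
            # Lines after this one name the registered handlers.
            in_handlers = True
            continue
        if DISABLE.search(line):
            # "Disabling IRQ #N" closes the report.
            reports.append(current)
            current = None
            in_handlers = False
            continue
        if in_handlers:
            h = HANDLER.search(line)
            if h:
                current["handlers"].append(h.group(1))
    return reports


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): dmesg | python3 parse_bad_irqs.py
    for report in parse_bad_irqs(sys.stdin.read()):
        print(report["irq"], ", ".join(report["handlers"]))
```

Call-trace lines are ignored on purpose, since only the lines between `handlers:` and `Disabling IRQ` identify the drivers sharing the line.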

/proc/interrupts (2.6.39; 3.0-rc6 is the same):
           CPU0       CPU1       
  0:        132         86   IO-APIC-edge      timer
  1:          0          2   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          0          4   IO-APIC-edge      i8042
 16:          5        660   IO-APIC-fasteoi   sata_sil24, hda_intel
 17:       7644     334717   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:      15864    3584182   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, eth1
 19:      21789     879852   IO-APIC-fasteoi   ahci, eth0
 40:      19651    1545084   PCI-MSI-edge      radeon
 41:          0         21   PCI-MSI-edge      hda_intel
NMI:          0          0   Non-maskable interrupts
LOC:    6409420    4278752   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:    1913584    1620188   Rescheduling interrupts
CAL:       3607       2570   Function call interrupts
TLB:       1279        897   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         88         88   Machine check polls
ERR:          0
MIS:          0
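
The shared lines in the table above (16 through 19) are the ones that matter for a "nobody cared" report. A minimal sketch for listing them, assuming the column layout of these kernels (per-CPU counts, one interrupt-type column, then a comma-separated device list); newer kernels print additional columns, so this would need adjusting there. The name `shared_irqs` is hypothetical.

```python
import re


def shared_irqs(text):
    """Return {irq: [devices]} for numbered IRQ lines with more than one device."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        m = re.match(r"\s*(\d+):\s+(.*)", line)
        if not m:
            continue  # skip NMI, LOC, ERR and other non-numeric rows
        # Fields: one count per CPU, the interrupt type, then the device list.
        fields = m.group(2).split(None, ncpu + 1)
        if len(fields) < ncpu + 2:
            continue
        devices = [d.strip() for d in fields[ncpu + 1].split(",")]
        if len(devices) > 1:
            shared[int(m.group(1))] = devices
    return shared


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        for irq, devices in sorted(shared_irqs(f.read()).items()):
            print(irq, devices)
```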
Comment 1 bjorn.ottervik 2011-07-10 19:35:36 UTC
Created attachment 65182 [details]
3.0-rc6 .config
Comment 2 bjorn.ottervik 2011-07-12 21:15:17 UTC
Perhaps related to https://bugzilla.kernel.org/show_bug.cgi?id=32492? I'm running mdadm RAID5. Still happens with 3.0-rc7 (trace below).
Giving up and getting another mainboard model. I will keep this one for testing in case anyone comes up with any ideas.

[25160.202294] irq 18: nobody cared (try booting with the "irqpoll" option)
[25160.202305] Pid: 0, comm: kworker/0:0 Not tainted 3.0.0-rc7 #1
[25160.202308] Call Trace:
[25160.202312]  <IRQ>  [<ffffffff810a10b5>] __report_bad_irq+0x35/0xd0
[25160.202328]  [<ffffffff810a1295>] note_interrupt+0x145/0x200
[25160.202336]  [<ffffffff8109f230>] handle_irq_event_percpu+0xb0/0x200
[25160.202342]  [<ffffffff8109f3b5>] handle_irq_event+0x35/0x60
[25160.202346]  [<ffffffff810a1a8d>] handle_fasteoi_irq+0x4d/0xc0
[25160.202352]  [<ffffffff810041d4>] handle_irq+0x44/0x90
[25160.202356]  [<ffffffff810039b8>] do_IRQ+0x58/0xe0
[25160.202362]  [<ffffffff817eabd3>] common_interrupt+0x13/0x13
[25160.202365]  <EOI>  [<ffffffff813b9daf>] ? acpi_idle_do_entry+0x36/0x57
[25160.202374]  [<ffffffff813b9e20>] acpi_idle_enter_c1+0x50/0x99
[25160.202381]  [<ffffffff8161f577>] cpuidle_idle_call+0xb7/0x240
[25160.202385]  [<ffffffff81001ff9>] cpu_idle+0x59/0xb0
[25160.202391]  [<ffffffff817e3256>] start_secondary+0x181/0x185
[25160.202394] handlers:
[25160.202398] [<ffffffff815475f0>] usb_hcd_irq
[25160.202402] [<ffffffff815475f0>] usb_hcd_irq
[25160.202406] [<ffffffff815475f0>] usb_hcd_irq
[25160.202409] [<ffffffff815475f0>] usb_hcd_irq
[25160.202416] [<ffffffff815284f0>] rtl8169_interrupt
[25160.202419] Disabling IRQ #18
Comment 3 bjorn.ottervik 2011-08-01 02:25:12 UTC
Moving to hardware specific and bumping to high because <10MB disk transfer speeds make the system useless until rebooted.

Found out that the two PCI slots on the system are not "native" slots, but controlled by an ASMedia ASM1083 [1] PCIe→PCI bridge chipset. Possible cause?
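
Which devices actually sit behind the bridge can be checked from sysfs. A minimal sketch, assuming the bridge reports PCI vendor ID `0x1b21` and device ID `0x1080` (values as commonly shown by lspci for the ASM1083/1085, not taken from this report) and that sysfs is mounted at the usual location; the function name is hypothetical.

```python
import os

ASMEDIA_VENDOR = "0x1b21"  # assumption: ASMedia's PCI vendor ID
ASM1083_DEVICE = "0x1080"  # assumption: ID reported by the ASM1083/1085 bridge


def _read(path):
    """Read a small sysfs attribute, returning '' if it is missing."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def devices_behind_asm1083(root="/sys/bus/pci/devices"):
    """Map each matching bridge address to the PCI addresses found below it."""
    names = sorted(os.listdir(root)) if os.path.isdir(root) else []
    bridges = [
        n for n in names
        if _read(os.path.join(root, n, "vendor")) == ASMEDIA_VENDOR
        and _read(os.path.join(root, n, "device")) == ASM1083_DEVICE
    ]
    result = {b: [] for b in bridges}
    for n in names:
        # Each entry is a symlink into /sys/devices; a device behind the
        # bridge has the bridge's address among its parent directories.
        parents = os.path.realpath(os.path.join(root, n)).split(os.sep)[:-1]
        for b in bridges:
            if b in parents:
                result[b].append(n)
    return result


if __name__ == "__main__":
    for bridge, children in devices_behind_asm1083().items():
        print(bridge, "->", ", ".join(children) or "(no devices)")
```

If the two PCI NICs show up under the bridge while the on-board controllers do not, that would support the suspicion that the bridge is involved.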

A thread on the ASUS forums suggested that disabling the built-in NIC might help. [2] A Silicon Image SiI 3132 storage controller was still in the PCIe x1 slot, and an RTL-8169 and an added RTL-8139 Realtek NIC occupied the two PCI slots. Same problem.

[1] http://www.asmedia.com.tw/eng/e_show_products.php?item=114&cate_index=112
[2] http://vip.asus.com/forum/view.aspx?board_id=1&model=E35M1-M+PRO&id=20110507053520320&page=1&SLanguage=en-us
Comment 4 Edward Donovan 2012-02-14 04:21:55 UTC
This bug looks like the same problem as numbers 38632 and 42659.  

  https://bugzilla.kernel.org/show_bug.cgi?id=38632
  https://bugzilla.kernel.org/show_bug.cgi?id=42659

If bugzilla would let me, I'd mark the two later ones as dupes of the first.  Or do something to pull them together.

It looks like the chip is bad.  It's been discussed on LKML, as seen here:

  https://lkml.org/lkml/2012/2/2/370

where Linus and others say we may be able to do limited workarounds.  No code has come from that, yet.

I'm posting a version of this note on all three bugs.
Comment 5 Alan 2012-08-24 15:21:11 UTC

*** This bug has been marked as a duplicate of bug 38632 ***