Bug 12356

Summary: [i915 drm] irq 16: nobody cared with 2.6.28 kernel
Product: Drivers
Reporter: Peter Volkov (peter.volkov)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: gordon.jin, kernel-bugzilla
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28
Subsystem:
Regression: Yes
Bisected commit-id:

Description Peter Volkov 2009-01-04 00:00:38 UTC
Latest working kernel version: unknown, since this is a new PC
Distribution: Gentoo
Hardware Environment: ThinkPad X61 Tablet 7762-CTO
Software Environment: Gentoo Linux
Problem Description: 

This problem has already been reported in a number of places:
https://bugs.freedesktop.org/show_bug.cgi?id=18609
http://bugzilla.kernel.org/show_bug.cgi?id=12161
http://lkml.org/lkml/2008/12/2/115
http://marc.info/?l=linux-kernel&m=122822444615724&w=4

Although the first two links state that this bug is fixed, I've checked that the commit that should fix this problem is applied in 2.6.28, but the bug is still there.

The problem is that sometimes (yes, high system load probably makes it more likely) I see the following in the kernel log:

Jan  3 18:45:56 tablet irq 16: nobody cared (try booting with the "irqpoll" option)
Jan  3 18:45:56 tablet Pid: 4587, comm: cc1plus Not tainted 2.6.28-gentoo-noswap #1
Jan  3 18:45:56 tablet Call Trace:
Jan  3 18:45:56 tablet <IRQ>  [<ffffffff802559b8>] __report_bad_irq+0x30/0x7d
Jan  3 18:45:56 tablet [<ffffffff80255b0a>] note_interrupt+0x105/0x16b
Jan  3 18:45:56 tablet [<ffffffff8025618e>] handle_fasteoi_irq+0xa6/0xcf
Jan  3 18:45:56 tablet [<ffffffff8020dcb0>] do_IRQ+0x75/0xe5
Jan  3 18:45:56 tablet [<ffffffff8020b866>] ret_from_intr+0x0/0xa
Jan  3 18:45:56 tablet <EOI> <3>handlers:
Jan  3 18:45:56 tablet [<ffffffff80440d68>] (ahci_interrupt+0x0/0x45c)
Jan  3 18:45:56 tablet [<ffffffff803e3672>] (i915_driver_irq_handler+0x0/0x1e2)
Jan  3 18:45:56 tablet Disabling IRQ #16

After that my system became very slow and unusable.
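The "nobody cared" / "Disabling IRQ #16" sequence above comes from the kernel's spurious-interrupt detector. The Python model below is a simplified sketch, loosely following `note_interrupt()` in `kernel/irq/spurious.c`: every handler registered on a shared line is called for each interrupt, and after 100,000 interrupts the line is disabled if more than 99,900 of them were claimed by no handler. `SharedIrqLine` and its methods are hypothetical names for illustration; the real code also resets the unhandled count after quiet periods and handles `irqpoll`, both omitted here.

```python
"""Simplified model of the kernel's "irq N: nobody cared" check (a sketch, not kernel code).

Roughly follows note_interrupt() in kernel/irq/spurious.c: each interrupt on a line is
counted, and after 100,000 interrupts the line is disabled if more than 99,900 of them
were claimed by no handler. Details such as the time-based reset of the unhandled
counter and irqpoll handling are deliberately left out.
"""
from typing import Callable, List

IRQ_NONE, IRQ_HANDLED = 0, 1  # handler return values, as in <linux/irqreturn.h>
Handler = Callable[[], int]


class SharedIrqLine:
    """Hypothetical model of one legacy IRQ line shared by several drivers."""

    def __init__(self, irq: int, handlers: List[Handler]) -> None:
        self.irq = irq
        self.handlers = handlers
        self.count = 0        # interrupts seen in the current window
        self.unhandled = 0    # interrupts no handler claimed in this window
        self.disabled = False
        self.log: List[str] = []

    def fire(self) -> None:
        """Deliver one interrupt: every handler on a shared line is asked in turn."""
        if self.disabled:
            return
        ret = IRQ_NONE
        for handler in self.handlers:
            ret |= handler()
        self.note_interrupt(ret)

    def note_interrupt(self, ret: int) -> None:
        if ret != IRQ_HANDLED:
            self.unhandled += 1
        self.count += 1
        if self.count < 100_000:
            return
        self.count = 0
        if self.unhandled > 99_900:
            # The line looks stuck: report it and stop delivering interrupts.
            self.log.append(f"irq {self.irq}: nobody cared")
            self.log.append(f"Disabling IRQ #{self.irq}")
            self.disabled = True
        self.unhandled = 0


if __name__ == "__main__":
    # Two handlers share the line and neither recognises the interrupt source.
    line = SharedIrqLine(16, [lambda: IRQ_NONE, lambda: IRQ_NONE])
    for _ in range(100_000):
        line.fire()
    print("\n".join(line.log))
```

In the log above, `ahci_interrupt` and `i915_driver_irq_handler` share IRQ 16, so an interrupt that neither handler claims counts as unhandled. Once the line is disabled, the SATA controller on the same line loses its interrupt as well, which would fit the slowdown described.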
Comment 1 Peter Volkov 2009-01-05 02:55:16 UTC
Answering the question Niku asked me:

> Could you try booting with pci=noacpi ?

I'm unable to boot this notebook with pci=noacpi, as I get a kernel panic:
VFS: Cannot open root device "sda2" or unknown-block(0,0)

During the boot process I see the following:

===============================================================
ahci 0000:00:1f.2 AHCI 0001.0100 32 slots 3 ports 1.5 Gbps 0x1 impl SATA mode
ahci 0000:00:1f.2 flags: 64bit ncq sntf pm led clo pio slum part
scsi0: ahci
scsi1: ahci
scsi2: ahci
ata1: SATA max UDMA/133 abar m2048@0xfe226000 port 0xfe226100 irq10
ata2: DUMMY
ata3: DUMMY
ata1: SATA link up 1.5Gbps (SStatus 113 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O ERROR, err_mask=0x4)
ata1: SATA link up 1.5Gbps (SStatus 113 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O ERROR, err_mask=0x4)
===============================================================

I thought this could be BIOS-related, so I updated the BIOS from 1.11 to 1.21, but I'm still unable to boot with pci=noacpi. Since this latest BIOS update (1.21) includes an Intel Video BIOS update, I'll try to reproduce the initial problem again... I'll be back ASAP.

BIOS ChangeLog from Lenovo site:

Version 7SET35WW (1.21)
    * (New) Intel Video BIOS update.
Version 7SET34WW (1.20)
    * (Fix) Buzzing sound around the fan with battery.
          o Note: To use this feature, you need to install following drivers.
                + Power Manager for Windows 2000, XP (1.40 or later)
                + Power Manager for Windows Vista (2.30 or later)
                + Power Management driver (1.44 or later)
    * (Fix) The computer may hang if Rescue and Recovery is installed on Windows Vista 64-bit SP1.
Version 7SET33WW (1.19)
    * (New) Support for Intel ICH-8 step B2.
Version 7SET31WW (1.17)
    * (New) Security function was enhanced.
    * (New) Firmware update of Intel AMT ME.
    * (Fix) Data transmission speed of 1394 device gets slow.
Version 7SET30WW (1.16)
    * (Fix) System hangs while formatting CardBus ATA HDD drive under Windows Vista 64-bit SP1.
Version 7SET28WW (1.14)
    * (Fix) WOL (Wake on LAN) may fail. (BIOS)
    * (Fix) Boot error occurs at Intel ICH8M SATA initialization. (BIOS)
Version 7SET25WW (1.11)
Comment 2 Peter Volkov 2009-01-05 13:06:15 UTC
It looks like the BIOS update did not fix this problem. I again got the IRQ 16 disabled error, but this time the error message is different:

[drm:i915_gem_idle] *ERROR* hardware wedged
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.28-gentoo-noswap #1
Call Trace:
 <IRQ>  [<ffffffff802559b8>] __report_bad_irq+0x30/0x7d
 [<ffffffff80255b0a>] note_interrupt+0x105/0x16b
 [<ffffffff8025618e>] handle_fasteoi_irq+0xa6/0xcf
 [<ffffffff8020dcb0>] do_IRQ+0x75/0xe5
 [<ffffffff8020b866>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff8048064c>] menu_select+0x3f/0x8a
 [<ffffffff803c9175>] acpi_idle_enter_simple+0x1c7/0x237
 [<ffffffff803c916b>] acpi_idle_enter_simple+0x1bd/0x237
 [<ffffffff8048064c>] menu_select+0x3f/0x8a
 [<ffffffff8047fa97>] cpuidle_idle_call+0x8b/0xc8
 [<ffffffff8020a5cc>] cpu_idle+0x4a/0xac
handlers:
[<ffffffff80440d68>] (ahci_interrupt+0x0/0x45c)
[<ffffffffa003969d>] (yenta_interrupt+0x0/0xc1 [yenta_socket])
Disabling IRQ #16


This happened after X locked up with lots of
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop
messages in Xorg.0.log, and I had to kill the X server with `kill -9`. After that, on restart I got this IRQ 16 disabling error in the kernel log.

  (Heh, I don't know why, but gdb failed to attach to the X server with the
  "linux-nat.c:988: internal-error: linux_nat_attach: Assertion `pid == GET_PID
  (inferior_ptid) && WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP'
  failed" error, but that is a different problem...)

That said, I'll downgrade to 2.6.27 and check whether the problem is reproducible there. Other posters never had the problem there, so I'll probably work around my problem this way.
Comment 3 Eric Anholt 2009-01-05 13:45:19 UTC
Something is broken in your configuration that is preventing MSI.  You need MSI for stable graphics on this chipset.
Comment 4 Peter Volkov 2009-01-06 02:33:28 UTC
Eric, do you mean Message Signaled Interrupts? Sorry, MSI has too many meanings... If so, you are right: I had it disabled. If it is required, the kernel configuration could probably be improved to select MSI automatically when Intel DRM is enabled... That said, I'll enable MSI and do further checks.

I've also noticed that Keith said in
http://bugs.freedesktop.org/show_bug.cgi?id=18896#c9
that I can't use any framebuffer together with DRI, so I'll disable framebuffer support for now and try 2.6.28 again.
Comment 5 Peter Volkov 2009-02-03 01:31:33 UTC
OK, enabling MSI fixed it for me. I think it's worth forcing MSI in the code whenever the user enables i915 DRM...
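Whether the i915 interrupt is really delivered as MSI can be checked in /proc/interrupts: an MSI vector gets its own row with an MSI interrupt type, instead of sharing an IO-APIC line with ahci as in the first log. The Python sketch below assumes the usual layout (a header row of CPU names, then the IRQ number, one count per CPU, the interrupt type, and the device names). The exact type strings, such as `IO-APIC-fasteoi` or `PCI-MSI-edge`, vary between kernel versions, and the helper names are hypothetical.

```python
"""Hedged sketch: does a driver's interrupt use MSI or a (possibly shared) legacy line?

Assumes the usual /proc/interrupts layout: a header row of CPU names, then rows of
"<irq>: <one count per CPU> <interrupt type> <device names...>". Type strings such as
"IO-APIC-fasteoi" or "PCI-MSI-edge" differ between kernel versions.
"""
from typing import List, NamedTuple


class IrqLine(NamedTuple):
    irq: str      # IRQ number as printed, e.g. "16"
    kind: str     # interrupt type column, e.g. "IO-APIC-fasteoi"
    devices: str  # remaining text: handlers registered on this IRQ


def parse_interrupts(text: str) -> List[IrqLine]:
    """Parse numbered IRQ rows; summary rows such as NMI: or ERR: are skipped."""
    rows = text.splitlines()
    if not rows:
        return []
    ncpus = len(rows[0].split())  # header row lists one column per CPU
    entries = []
    for row in rows[1:]:
        fields = row.split()
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue
        rest = fields[1 + ncpus:]  # everything after the per-CPU counters
        if not rest:
            continue
        entries.append(IrqLine(fields[0][:-1], rest[0], " ".join(rest[1:])))
    return entries


def driver_irqs(entries: List[IrqLine], driver: str) -> List[IrqLine]:
    """Return the IRQ rows whose device list mentions `driver` (prefix match)."""
    return [e for e in entries
            if any(tok.rstrip(",").startswith(driver) for tok in e.devices.split())]


def is_msi(entry: IrqLine) -> bool:
    """Treat any interrupt type containing "MSI" as message-signaled."""
    return "MSI" in entry.kind


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            entries = parse_interrupts(f.read())
    except OSError as err:  # not Linux, or /proc not mounted
        print(f"cannot read /proc/interrupts: {err}")
    else:
        for e in driver_irqs(entries, "i915"):
            mode = "MSI" if is_msi(e) else "legacy INTx line (may be shared)"
            print(f"IRQ {e.irq}: {e.kind} [{e.devices}] -> {mode}")
```

A row where i915 appears next to ahci on an IO-APIC type corresponds to the shared IRQ 16 setup in the original report; a dedicated MSI row means the graphics interrupt no longer goes through the shared line.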
Comment 6 Gordon Jin 2009-09-17 08:09:57 UTC
Closing.