Bug 9782

Summary: IRQ problems in usb_hdc_irq (bluetooth usb device) & in usb
Product: Drivers Reporter: Milan Kocian (milan.kocian)
Component: BluetoothAssignee: Greg Kroah-Hartman (greg)
Status: REJECTED INVALID    
Severity: normal CC: dbrownell, marcel, protasnb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24-rc8-git3 Subsystem:
Regression: --- Bisected commit-id:
Bug Depends on:    
Bug Blocks: 5089    
Attachments: boot log

Description Milan Kocian 2008-01-20 13:17:51 UTC
Latest working kernel version: 
Earliest failing kernel version:
Distribution: debian
Hardware Environment: Thinkpad x61s
Software Environment: latest unstable with vanilla kernel
Problem Description:
After enabling internal USB bluetooth modul I see after a few minutes this message in log:

irq 23: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-git3 #6
 [<c0168c24>] __report_bad_irq+0x24/0x80
 [<c0168efb>] note_interrupt+0x27b/0x2c0
 [<c0168125>] handle_IRQ_event+0x25/0x50
 [<c01695f7>] handle_fasteoi_irq+0x87/0xe0
 [<c01070eb>] do_IRQ+0x3b/0x70
 [<c028bd4e>] acpi_hw_register_read+0xa9/0x12a
 [<c0104df3>] common_interrupt+0x23/0x28
 [<c011007b>] mtrr_del_page+0x10b/0x1b0
 [<c029fc5e>] acpi_idle_enter_simple+0x13e/0x1a8
 [<c029f91c>] acpi_idle_enter_bm+0xac/0x2b0
 [<c030cd14>] menu_select+0x84/0xa0
 [<c030bd70>] cpuidle_idle_call+0x70/0xa0
 [<c0102653>] cpu_idle+0x73/0xd0
 [<c046ba0a>] start_kernel+0x2da/0x320
 [<c046b130>] unknown_bootoption+0x0/0x1f0
 =======================
handlers:
[<f88b6630>] (usb_hcd_irq+0x0/0x60 [usbcore])
Disabling IRQ #23

lsusb (BT is broadcom):
Bus 003 Device 001: ID 0000:0000  
Bus 005 Device 001: ID 0000:0000  
Bus 002 Device 001: ID 0000:0000  
Bus 006 Device 001: ID 0000:0000  
Bus 001 Device 007: ID 0a5c:2110 Broadcom Corp. 
Bus 001 Device 004: ID 0483:2016 SGS Thomson Microelectronics Fingerprint Reader
Bus 001 Device 001: ID 0000:0000  
Bus 004 Device 001: ID 0000:0000

dmesg (IRQ 23):
ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 19 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 6
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 23, io mem 0xfe227000
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb6: configuration #1 chosen from 1 choice

I tested 2.6.23.12 and I the same problem. When I connect other USB devices,  don't see this problem (external camera). Internal fingerprint reader is working without problem too.

Steps to reproduce:
Enable bluetooth on thinkpad X61s (Product Name: 76693KG).
Comment 1 Milan Kocian 2008-01-25 08:41:22 UTC
I tried the version 2.6.24-rc8-git7 and I don't see this segfault anymore. Interesting. (I only enabled debug statistics for powertop). I will test new 2.6.24 and do report.
Comment 2 Milan Kocian 2008-02-05 13:22:29 UTC
I tested 2.6.24 and I see this segfault too. I returned to 2.6.24-rc8-git7 and I see segfault again. So the state is that sometimes I see segfault during few minutes after boot and sometimes is all ok till next reboot. I enabled debug function in hci_usb but I see nothing sensible. So what to test next ?
Comment 3 Milan Kocian 2008-04-28 05:32:17 UTC
I tested 2.6.25:

irq 19: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.25 #1
 [<c0168497>] __report_bad_irq+0x27/0x90
 [<c0168756>] note_interrupt+0x256/0x290
 [<c0167a71>] ? handle_IRQ_event+0x31/0x60
 [<c0168e16>] handle_fasteoi_irq+0x86/0xe0
 [<c0106f60>] do_IRQ+0x40/0x80
 [<c0104b13>] common_interrupt+0x23/0x28
 [<c0299155>] ? acpi_idle_enter_simple+0x165/0x1d0
 [<c0298db6>] acpi_idle_enter_bm+0xba/0x2f4
 [<c0306ad0>] cpuidle_idle_call+0x70/0xb0
 [<c0306a60>] ? cpuidle_idle_call+0x0/0xb0
 [<c0102b1a>] cpu_idle+0x5a/0xe0
 [<c038f14f>] rest_init+0x5f/0x70
 =======================
handlers:
[<f88b7d40>] (usb_hcd_irq+0x0/0x60 [usbcore])
Disabling IRQ #19
Comment 4 Milan Kocian 2008-05-05 01:04:05 UTC
I play a little with it. And now I can reproduce it immediately:
1) load all modules without ehci-hcd
2) load ehci-hcd
3) segfault appears during 1-2 seconds

Unload/load ehci-hcd and I see this segfault again. Sometimes unloading/loading begin to work without segfaults then I do suspend-to-ram/back and I see it again.

Second procedure:
1) load all modules without hci_usb (include ehci-hcd)
2) load hci-usb
3) start bluetooth
4) segfault

I hope it helps.
Comment 5 Natalie Protasevich 2008-06-02 17:57:53 UTC
Milan,
Did any earlier kernel version work for you by chance?
Comment 6 David Brownell 2008-06-03 01:07:06 UTC
Note that this is *NOT* a segfault; please correct the summary.

This looks to be some kind of IRQ reporting issue.  /proc/interrupts output would be relevant.  I suspect ACPI is an issue here; notice how sometimes it uses irq 23, sometimes irq 19.  Full "dmesg" output during boot, including the BT adapter (and until ehci is loaded) should be useful too.

One theory is that the internal BT device has some IRQ wired up, and it happens to share the line used by EHCI.  But that BT IRQ isn't being handled.
Comment 7 Milan Kocian 2008-06-04 00:06:07 UTC
(In reply to comment #5)
> Milan,
> Did any earlier kernel version work for you by chance?
> 
Sorry, I didn't tested older kernel due to missing some drivers.
Comment 8 Milan Kocian 2008-06-04 00:36:28 UTC
Created attachment 16387 [details]
boot log

irq 19: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.25.3 #8
 [<c016dc87>] __report_bad_irq+0x27/0x90
 [<c016df3f>] note_interrupt+0x24f/0x290
 [<c016d441>] ? handle_IRQ_event+0x31/0x60
 [<c016e64b>] handle_fasteoi_irq+0xab/0xd0
 [<c0107090>] do_IRQ+0x40/0x80
 [<c0104bf3>] common_interrupt+0x23/0x28
 [<c03bba37>] ? _spin_unlock_irqrestore+0x7/0x30
 [<c01466ec>] tick_broadcast_oneshot_control+0x6c/0x120
 [<c0145f8d>] tick_notify+0x29d/0x3e0
 [<c012d5bd>] ? irq_exit+0x5d/0x90
 [<c0107095>] ? do_IRQ+0x45/0x80
 [<c0140f5d>] notifier_call_chain+0x2d/0x70
 [<c014107a>] raw_notifier_call_chain+0x1a/0x20
 [<c0145871>] clockevents_notify+0x21/0x80
 [<c02aaaae>] acpi_state_timer_broadcast+0x31/0x34
 [<c02ab28a>] acpi_idle_enter_simple+0x1aa/0x1ff
 [<c028b657>] ? acpi_os_release_lock+0x8/0xa
 [<c02aae52>] acpi_idle_enter_bm+0xba/0x348
 [<c0141806>] ? pm_qos_requirement+0x26/0x30
 [<c031dccb>] cpuidle_idle_call+0x7b/0xc0
 [<c031dc50>] ? cpuidle_idle_call+0x0/0xc0
 [<c01030a1>] cpu_idle+0x61/0xe0
 [<c03b6e82>] start_secondary+0x195/0x1a3
 =======================
handlers:
[<f88c41e0>] (usb_hcd_irq+0x0/0x60 [usbcore])
Disabling IRQ #19
Comment 9 Milan Kocian 2008-06-04 00:37:59 UTC
(In reply to comment #6)
> Note that this is *NOT* a segfault; please correct the summary.

ok, summary corrected.

> 
> This looks to be some kind of IRQ reporting issue.  /proc/interrupts output
> would be relevant.  I suspect ACPI is an issue here; notice how sometimes it
> uses irq 23, sometimes irq 19.  Full "dmesg" output during boot, including
> the
> BT adapter (and until ehci is loaded) should be useful too.
> 
> One theory is that the internal BT device has some IRQ wired up, and it
> happens
> to share the line used by EHCI.  But that BT IRQ isn't being handled.
> 
milon@ntm:~$ cat /proc/interrupts 
           CPU0       CPU1       
  0:      47696      51098   IO-APIC-edge      timer
  1:        573        689   IO-APIC-edge      i8042
  8:          4          1   IO-APIC-edge      rtc
  9:      17281      16820   IO-APIC-fasteoi   acpi
 12:       1395        332   IO-APIC-edge      i8042
 16:          0          0   IO-APIC-fasteoi   yenta, uhci_hcd:usb5, i915@pci:0000:00:02.0
 17:        159        157   IO-APIC-fasteoi   uhci_hcd:usb6, HDA Intel, firewire_ohci
 18:          0          0   IO-APIC-fasteoi   sdhc0:slot0
 19:      75119      88942   IO-APIC-fasteoi   ehci_hcd:usb2
 20:      45860      36973   IO-APIC-fasteoi   uhci_hcd:usb3
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          1          2   IO-APIC-fasteoi   ehci_hcd:usb1
220:      17348        223   PCI-MSI-edge      eth0
221:       4538       1600   PCI-MSI-edge      ahci
NMI:          0          0   Non-maskable interrupts
LOC:      52961      36619   Local timer interrupts
RES:       9716      11084   Rescheduling interrupts
CAL:         99      12764   function call interrupts
TLB:        180        259   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

Log is in attachment.
Comment 10 David Brownell 2008-06-05 14:33:35 UTC
For the record, the limited public info on the http://www.broadcom.com/products/Bluetooth/Bluetooth-RF-Silicon-and-Software-Solutions/BCM2045 webpage does indeed suggest that some Bluetooth IRQs could be wired up parallel to the USB signal wires ... alternative data paths include UART and SPI connections, which are routinely done with parallel IRQ paths.

I suggest you see whether various BIOS tables -- DMI, PNPACPI, DSDT, even the info in /proc/acpi/wakeup -- list the Bluetooth device.  If so, look for IRQ or GPE hookups.  If the Bluetooth device lists such a thing, one issue will be how to tell the its device driver that it's got a "real" IRQ.  Another will be how to make the hci_usb driver listen to it ... and ideally use the IRQ for something sensible.

Can you confirm that there are no high speed USB devices on this system, by the way?  Sure doesn't look like there are any.  If so, the hardware seems to be sending an absurdly high number of spurious IRQs to the EHCI controller.  As in, almost all of those 164K+ irqs are spurious ... one possible explanation is a BIOS that's interfering.  Make sure it's told to ignore USB.  The /sys/kernel/debug/ehci/*/registers data might help too.

If the BT chip doesn't have a "real" IRQ then I'll start wondering about the ACPI IRQ routing.  Something switched EHCI from irq 23 to irq 19, which isn't usual.  
Comment 11 Milan Kocian 2008-06-06 01:42:47 UTC
(In reply to comment #10)
> I suggest you see whether various BIOS tables -- DMI, PNPACPI, DSDT, even the
> info in /proc/acpi/wakeup -- list the Bluetooth device.  If so, look for IRQ
> or
> GPE hookups.  If the Bluetooth device lists such a thing, one issue will be
> how
> to tell the its device driver that it's got a "real" IRQ.  Another will be
> how
> to make the hci_usb driver listen to it ... and ideally use the IRQ for
> something sensible.

I don't understand what you exactly want. List of /proc/acpi/wakeup ? Here is it:

milon@ntm:~$ cat /proc/acpi/wakeup 
Device	S-state	  Status   Sysfs node
LID	  S3	*enabled   
SLPB	  S3	*enabled   
IGBE	  S4	 disabled  pci:0000:00:19.0
EXP0	  S4	 disabled  pci:0000:00:1c.0
EXP1	  S4	 disabled  pci:0000:00:1c.1
EXP2	  S4	 disabled  
EXP3	  S4	 disabled  
EXP4	  S4	 disabled  
PCI1	  S4	 disabled  pci:0000:00:1e.0
USB0	  S3	 disabled  pci:0000:00:1d.0
USB1	  S3	 disabled  pci:0000:00:1d.1
USB2	  S3	 disabled  
USB3	  S3	 disabled  pci:0000:00:1a.0
USB4	  S3	 disabled  pci:0000:00:1a.1
EHC0	  S3	 disabled  pci:0000:00:1d.7
EHC1	  S3	 disabled  pci:0000:00:1a.7
HDEF	  S4	 disabled  pci:0000:00:1b.0


> 
> Can you confirm that there are no high speed USB devices on this system, by
> the
> way?  Sure doesn't look like there are any.  If so, the hardware seems to be
> sending an absurdly high number of spurious IRQs to the EHCI controller.  As
> in, almost all of those 164K+ irqs are spurious ... one possible explanation
> is
> a BIOS that's interfering.  Make sure it's told to ignore USB.  The
> /sys/kernel/debug/ehci/*/registers data might help too.

I have no high speed  USB device (I think that Fingerprint and BT controller are not high speed devices. :). 
I don't have /sys/kernel/debug directory in my system. I will try to enable it.


milon@ntm:~$ lsusb
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 021: ID 0483:2016 SGS Thomson Microelectronics Fingerprint Reader
Bus 003 Device 020: ID 0a5c:2110 Broadcom Corp. Bluetooth Controller
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

> 
> If the BT chip doesn't have a "real" IRQ then I'll start wondering about the
> ACPI IRQ routing.  Something switched EHCI from irq 23 to irq 19, which isn't
> usual.  
> 
Due to kernel's version change ?
Comment 12 Milan Kocian 2008-12-15 01:29:12 UTC
Hi,

I upgraded BIOS in my Thinkpad X61s. Here is piece of changelog:

(Fix) Unexpected interrupts from the USB controller may occur. (BIOS)

From this time is all ok. Many thanks to Pavel Kysilka (goldenfish@linuxsoft.cz) for pointing to this bug.