Bug 4966

Summary: ehci_hcd on x86_64 causes more than 100000 bogus and missed interrupts
Product: Drivers Reporter: Harald Welte (laforge)
Component: USB Assignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX    
Severity: normal CC: akpm, dbrownell
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.13-rc2-4 Subsystem:
Regression: --- Bisected commit-id:
Bug Depends on:    
Bug Blocks: 5089    
Attachments: dmesg of kernel 2.6.13-rc3 booting up
dmesg of 2.6.12.2 booting up (usb-ehci working)

Description Harald Welte 2005-07-29 10:02:15 UTC
Distribution: debian unstable
Hardware Environment: TARGA Traveller 826 MT32 (turion64 notebook)
Software Environment: 2.6.13-rc2, 2.6.13-rc3, 2.6.13-rc4, current linus git tree
Problem Description: ehci_hcd causes severe breakage

My notebook is a Turion 64 with an ATI chipset (lspci below). ehci_hcd
runs perfectly fine in the 2.6.12.2 and 2.6.12.3 kernels. However, with
2.6.13-rc2 and later (including today's Linus git tree) it causes severe
breakage, rendering more than just the USB subsystem unusable.

The notebook has OHCI, EHCI and CardBus on one IRQ line (#11). At boot time,
yenta is loaded first, then ohci-hcd, then ehci-hcd. With 2.6.12.3 everything
works fine, and after boot /proc/interrupts shows:
 11:    0    XT-PIC  yenta, ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3

On any later kernel, however, loading ehci-hcd at boot makes the kernel report:

Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: PCI device 1002:4373 (ATI Technologies Inc)
Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 3
Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: irq 11, io mem 0xfbdff000
Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: USB 2.0 initialized, EHCI 1.00, driver 10 Dec 2004
Jul 29 15:28:04 localhost kernel: hub 3-0:1.0: USB hub found
Jul 29 15:28:04 localhost kernel: hub 3-0:1.0: 8 ports detected
Jul 29 15:28:04 localhost kernel: irq 11: nobody cared (try booting with the "irqpoll" option)
Jul 29 15:28:04 localhost kernel:
Jul 29 15:28:04 localhost kernel: Call Trace: <IRQ> <ffffffff80156c45>{__report_bad_irq+53} <ffffffff80156e57>{note_interrupt+439}
Jul 29 15:28:04 localhost kernel:        <ffffffff801567bf>{__do_IRQ+207} <ffffffff80111518>{do_IRQ+72}
Jul 29 15:28:04 localhost kernel:        <ffffffff8010ef62>{ret_from_intr+0}  <EOI>
Jul 29 15:28:04 localhost kernel: handlers:
Jul 29 15:28:04 localhost kernel: [<ffffffff880ff580>] (yenta_interrupt+0x0/0xc0 [yenta_socket])
Jul 29 15:28:04 localhost kernel: [<ffffffff802965b0>] (usb_hcd_irq+0x0/0x70)
Jul 29 15:28:04 localhost last message repeated 2 times
Jul 29 15:28:04 localhost kernel: Disabling IRQ #11

So IRQ 11 fires 100,000 times with no handler claiming it, and the kernel
finally disables it. /proc/interrupts at this point:

 11: 100000    XT-PIC  yenta, ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3

The same happens when no other drivers are bound to IRQ #11, i.e. when I load
only ehci_hcd after booting with "init=/bin/sh".

This is definitely a regression over previous kernels. I've tried to backport
the 2.6.12.3 USB code into 2.6.13-rc4, but there are too many device-model
changes for that to be feasible :(

lspci:

0000:00:00.0 Host bridge: ATI Technologies Inc: Unknown device 5951
0000:00:02.0 PCI bridge: ATI Technologies Inc: Unknown device 5a34
0000:00:13.0 USB Controller: ATI Technologies Inc: Unknown device 4374
0000:00:13.1 USB Controller: ATI Technologies Inc: Unknown device 4375
0000:00:13.2 USB Controller: ATI Technologies Inc: Unknown device 4373
Comment 1 Harald Welte 2005-07-29 10:03:43 UTC
If I do not load ehci-hcd and use only yenta and ohci-hcd on IRQ 11, both
CardBus and USB 1.0 devices work fine, with no bogus interrupts whatsoever.
Comment 2 Harald Welte 2005-07-29 10:05:29 UTC
This might be related to bug #4866.
Comment 3 Andrew Morton 2005-07-29 12:45:55 UTC
Can you please generate the dmesg output for good and bad kernels
and diff them?

My money's on acpi :(
Comment 4 David Brownell 2005-07-29 13:07:17 UTC
Agreed, this is likely an ACPI or BIOS problem.  That's where 
they usually come up.  Another experiment:  try with a 32bit 
kernel. 
 
Comment 5 Harald Welte 2005-07-30 05:12:09 UTC
Both kernels were booted with "acpi=off" on the kernel command line.

ACPI causes so many problems on this device (invalid IRQ routing, 50% softirq
load in ACPI code while the CPU is idle, ...) that I don't bother enabling it.

I'll post dmesg shortly.
Comment 6 Harald Welte 2005-07-30 05:13:51 UTC
I can't try 32bit kernels since I don't have the space for installing a 32bit
userspace onto a separate partition [and this is a notebook] :(
Comment 7 Harald Welte 2005-07-30 11:42:04 UTC
Created attachment 5425
dmesg of kernel 2.6.13-rc3 booting up
Comment 8 Harald Welte 2005-07-30 11:42:54 UTC
Created attachment 5426
dmesg of 2.6.12.2 booting up (usb-ehci working)
Comment 9 Greg Kroah-Hartman 2005-08-04 13:06:50 UTC
Ugh, fun.  This looks like it _might_ be a pci resource issue.

Since you have git, care to use 'git bisect' to try to see if you can find
this bug?  It would be most appreciated :)
Comment 10 Andrew Morton 2005-08-04 13:28:55 UTC
We need an easy git-bisection HOWTO.  I have a few emails from Linus saved
away, but they're gobbledygook.

Comment 11 Greg Kroah-Hartman 2005-08-04 13:45:27 UTC
http://www.livejournal.com/users/kernelslacker/22371.html is a good start on
such a HOWTO.
Comment 12 Greg Kroah-Hartman 2006-02-14 17:38:07 UTC
I had this very same problem.  It should be fixed in the latest 2.6.16-rc3 kernel.

It was due to a bug in the EHCI handoff code.

Please reopen this bug if it is still present after testing.