Bug 4966 - ehci_hcd on x86_64 causes more than 100000 bogus and missed interrupts
Summary: ehci_hcd on x86_64 causes more than 100000 bogus and missed interrupts
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: i386 Linux
Importance: P2 normal
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks: USB
 
Reported: 2005-07-29 10:02 UTC by Harald Welte
Modified: 2006-02-14 17:38 UTC

See Also:
Kernel Version: 2.6.13-rc2-4
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg of kernel 2.6.13-rc3 booting up (18.66 KB, text/plain)
2005-07-30 11:42 UTC, Harald Welte
dmesg of 2.6.12.2 booting up (usb-ehci working) (17.22 KB, text/plain)
2005-07-30 11:42 UTC, Harald Welte

Description Harald Welte 2005-07-29 10:02:15 UTC
Distribution: debian unstable
Hardware Environment: TARGA Traveller 826 MT32 (turion64 notebook)
Software Environment: 2.6.13-rc2, 2.6.13-rc3, 2.6.13-rc4, current linus git tree
Problem Description: ehci_hcd causes severe breakage

The notebook I have is a turion64 with an ATI chipset (lspci below).  ehci_hcd
runs perfectly fine in the 2.6.12.2 and 2.6.12.3 kernels.  However, with
2.6.13-rc2 and later (including today's linus git tree) it causes severe
breakage that makes more than just the usb subsystem unusable.

The notebook has OHCI, EHCI and Cardbus on one IRQ line (#11).  At boot time,
first yenta, then ohci-hcd, then ehci-hcd are loaded.  With 2.6.12.3 everything
works fine, and after bootup, /proc/interrupts shows:
 11:    0    XT-PIC  yenta, ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
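
For readers who want to check which drivers share an IRQ line, here is a minimal Python sketch that splits a row like the one above into its fields. It assumes the 2.6-era layout shown in this report (IRQ number, one or more counts, controller name, comma-separated handlers); the function name parse_interrupt_line is made up for illustration, and newer kernels print extra columns that this does not handle.

```python
def parse_interrupt_line(line):
    """Split one /proc/interrupts row into (irq, counts, controller, handlers)."""
    # The IRQ label ends at the first colon; handler names may contain colons too.
    label, _, rest = line.partition(":")
    fields = rest.split()
    counts = []
    # Leading numeric fields are the per-CPU interrupt counts.
    while fields and fields[0].isdigit():
        counts.append(int(fields.pop(0)))
    controller = fields.pop(0) if fields else ""
    # Whatever remains is the comma-separated list of handlers sharing the line.
    handlers = [h.strip() for h in " ".join(fields).split(",") if h.strip()]
    return label.strip(), counts, controller, handlers


sample = " 11:    0    XT-PIC  yenta, ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3"
irq, counts, controller, handlers = parse_interrupt_line(sample)
print(irq, sum(counts), controller, handlers)
```

On a live system the same function could be applied to each line of /proc/interrupts after skipping the CPU header row.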

However, when booting and loading the modules on any later kernel, after loading
ehci-hcd the kernel says:

Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: PCI device 1002:4373
(ATI Technologies Inc)
Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: new USB bus registered,
assigned bus number 3
Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: irq 11, io mem 0xfbdff000
Jul 29 15:28:04 localhost kernel: ehci_hcd 0000:00:13.2: USB 2.0 initialized,
EHCI 1.00, driver 10 Dec 2004
Jul 29 15:28:04 localhost kernel: hub 3-0:1.0: USB hub found
Jul 29 15:28:04 localhost kernel: hub 3-0:1.0: 8 ports detected
Jul 29 15:28:04 localhost kernel: irq 11: nobody cared (try booting with the
"irqpoll" option)
Jul 29 15:28:04 localhost kernel:
Jul 29 15:28:04 localhost kernel: Call Trace: <IRQ>
<ffffffff80156c45>{__report_bad_irq+53} <ffffffff80156e57>{note_interrupt+439}
Jul 29 15:28:04 localhost kernel:        <ffffffff801567bf>{__do_IRQ+207}
<ffffffff80111518>{do_IRQ+72}
Jul 29 15:28:04 localhost kernel:        <ffffffff8010ef62>{ret_from_intr+0}  <EOI>
Jul 29 15:28:04 localhost kernel: handlers:
Jul 29 15:28:04 localhost kernel: [<ffffffff880ff580>] (yenta_interrupt+0x0/0xc0
[yenta_socket])
Jul 29 15:28:04 localhost kernel: [<ffffffff802965b0>] (usb_hcd_irq+0x0/0x70)
Jul 29 15:28:04 localhost last message repeated 2 times
Jul 29 15:28:04 localhost kernel: Disabling IRQ #11

So IRQ11 gets issued more than 100,000 times, and the kernel finally disables
it.  /proc/interrupts at this time:

 11 100000    XT-PIC  yenta, ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb3
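
Why the count stops at 100000: the "nobody cared" message comes from the kernel's stuck-interrupt check. The Python sketch below is a toy model of that check, not kernel code. It assumes the kernel evaluates each IRQ line in windows of 100,000 interrupts and disables the line when more than 99,900 of them were not claimed by any handler; those thresholds and the class name IrqLine are assumptions for illustration and should be checked against the kernel source for the version in question.

```python
class IrqLine:
    """Toy model of the kernel's stuck-interrupt check (not kernel code)."""

    WINDOW = 100_000       # assumed number of interrupts per evaluation window
    STUCK_LIMIT = 99_900   # assumed unhandled count above which the line is disabled

    def __init__(self):
        self.count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Record one interrupt; `handled` is True if any handler claimed it."""
        if self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < self.WINDOW:
            return
        # End of window: decide whether the line is stuck, then reset counters.
        if self.unhandled > self.STUCK_LIMIT:
            self.disabled = True   # corresponds to "Disabling IRQ #11"
        self.count = 0
        self.unhandled = 0


line = IrqLine()
for _ in range(100_000):
    # Every interrupt goes unclaimed, as in the trace above.
    line.note_interrupt(handled=False)
print(line.disabled)
```

Under this model a line on which handlers claim even a small fraction of interrupts is never disabled, which fits the observation that yenta and ohci-hcd alone work fine.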

The same happens when no other drivers are bound to IRQ #11, i.e. when I only
load ehci_hcd after booting with "init=/bin/sh".

This is definitely a regression over previous kernels.  I've tried to backport
the 2.6.12.3 usb code into 2.6.13-rc4, but there are too many device model
changes to make this feasible :(

lspci:

0000:00:00.0 Host bridge: ATI Technologies Inc: Unknown device 5951
0000:00:02.0 PCI bridge: ATI Technologies Inc: Unknown device 5a34
0000:00:13.0 USB Controller: ATI Technologies Inc: Unknown device 4374
0000:00:13.1 USB Controller: ATI Technologies Inc: Unknown device 4375
0000:00:13.2 USB Controller: ATI Technologies Inc: Unknown device 4373
Comment 1 Harald Welte 2005-07-29 10:03:43 UTC
If I do not load ehci-hcd, and only use yenta and ohci-hcd on IRQ11, both
cardbus and USB 1.x devices work fine, with no bogus interrupts whatsoever.
Comment 2 Harald Welte 2005-07-29 10:05:29 UTC
This might be related to bug #4866.
Comment 3 Andrew Morton 2005-07-29 12:45:55 UTC
Can you please generate the dmesg output for good and bad kernels
and diff them?

My money's on acpi :(
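
A minimal Python sketch of the requested comparison, using the standard difflib module. The helper name diff_dmesg and the two sample lists are illustrative assumptions: the "bad" lines are taken from the log in the description, and the "good" list simply assumes the working kernel prints the same driver lines without the IRQ complaint. Real dmesg captures would be read from files instead.

```python
import difflib


def diff_dmesg(good_lines, bad_lines):
    """Return a unified diff between two dmesg captures given as lists of lines."""
    return list(
        difflib.unified_diff(
            good_lines,
            bad_lines,
            fromfile="dmesg-good",
            tofile="dmesg-bad",
            lineterm="",
        )
    )


# Illustrative input only: assumed shape of the good log, real lines for the bad one.
good = [
    "ehci_hcd 0000:00:13.2: irq 11, io mem 0xfbdff000",
    "hub 3-0:1.0: 8 ports detected",
]
bad = good + [
    'irq 11: nobody cared (try booting with the "irqpoll" option)',
    "Disabling IRQ #11",
]

for row in diff_dmesg(good, bad):
    print(row)
```

If the captures carry timestamps or hostnames, stripping those prefixes first keeps the diff limited to real differences.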
Comment 4 David Brownell 2005-07-29 13:07:17 UTC
Agreed, this is likely an ACPI or BIOS problem.  That's where 
they usually come up.  Another experiment:  try with a 32bit 
kernel. 
 
Comment 5 Harald Welte 2005-07-30 05:12:09 UTC
Both kernels were booted with "acpi=off" as kernel bootup argument.

ACPI causes so many problems on this device (like invalid IRQ routing, 50%
softirq load in ACPI code when CPU is idle, ...) that I don't bother enabling it.

I'll post dmesg shortly.
Comment 6 Harald Welte 2005-07-30 05:13:51 UTC
I can't try 32bit kernels since I don't have the space for installing a 32bit
userspace onto a separate partition [and this is a notebook] :(
Comment 7 Harald Welte 2005-07-30 11:42:04 UTC
Created attachment 5425
dmesg of kernel 2.6.13-rc3 booting up
Comment 8 Harald Welte 2005-07-30 11:42:54 UTC
Created attachment 5426
dmesg of 2.6.12.2 booting up (usb-ehci working)
Comment 9 Greg Kroah-Hartman 2005-08-04 13:06:50 UTC
Ugh, fun.  This looks like it _might_ be a pci resource issue.

Since you have git, care to use 'git bisect' to try to see if you can find
this bug?  It would be most appreciated :)
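
For anyone unfamiliar with the idea behind bisection, here is a hedged Python sketch of the search it performs. It is a plain binary search over a linear list of commits, assuming the oldest is known good, the newest is known bad, and every commit after the first bad one is also bad. Real git history is a graph rather than a list, so this only shows the principle; the commit names and the is_bad test are hypothetical stand-ins for "build, boot, load ehci-hcd, check for the IRQ 11 message".

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid               # regression is at mid or earlier
        else:
            lo = mid               # regression is after mid
    return commits[hi]


history = [f"c{i}" for i in range(1, 17)]   # hypothetical commit names c1..c16
# Hypothetical test: pretend the regression was introduced in c11.
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 11)
print(culprit)
```

Each step halves the remaining range, so even a few thousand commits between two releases need only around a dozen test boots.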
Comment 10 Andrew Morton 2005-08-04 13:28:55 UTC
We need an easy git-bisection HOWTO.  I have a few emails from Linus saved
away, but they're gobbledygook.

Comment 11 Greg Kroah-Hartman 2005-08-04 13:45:27 UTC
http://www.livejournal.com/users/kernelslacker/22371.html is a good start of 
such a HOWTO
Comment 12 Greg Kroah-Hartman 2006-02-14 17:38:07 UTC
I had this very same problem.  It should be fixed in the latest 2.6.16-rc3 kernel.

It was due to a bug in the EHCI handoff code.

Please reopen this bug if it is still present after testing.
