Bug 6809 - uhci_hcd: host controller halted, very bad!
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: i386 Linux
Importance: P2 normal
Assignee: Alan Stern
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-10 12:59 UTC by Chris Brien
Modified: 2007-08-20 12:29 UTC

See Also:
Kernel Version: 2.6.16
Subsystem:
Regression: ---
Bisected commit-id:


Description Chris Brien 2006-07-10 12:59:38 UTC
Most recent kernel where this bug did not occur: 2.4.something
Distribution: Debian Unstable
Hardware Environment: Athlon thunderbird, Abit KT7-RAID, Via host controller
Software Environment: Linux 2.6.16, and others. No tainting.
Problem Description:

This bug is a deliberate duplicate of bug 4776. It is also a duplicate of bug
479 (at least it no longer oopses), bug 488 (closed due to lack of interest),
bug 1373 (fixed with device reset after resume), bug 1578 (dupe of 1373), bug
3735 (closed due to lack of interest), bug 3856 (closed with "something was
changed, so it must be ok now"), bug 4390 (dupe of 1373), bug 4405 (closed due
to lack of interest), bug 4414 (closed due to lack of interest), bug 5765
(closed for no obvious reason), and bug 6752 (still open(!) at the moment).

This could possibly be the most duped and most ignored bug in the Linux 
kernel.

The reason I am opening a new bug is that, after reopening bug 4776, which 
seemed the most appropriate thing to do, I was told off. Apparently spamming 
bugzilla with duplicate bugs is ok in this installation.

Anyway. Details are as at bug 4776. My system is *not* broken. This is a 
kernel bug. Judging by the number of reports, I highly doubt this is a 
hardware fault, unless it is a systemic error, common to dozens of 
motherboards. More likely it is either a Linux bug (in which case it 
desperately needs fixed), or it is a specific chipset bug (in which case it 
needs worked around).

This issue has happened for too long and for too many people to continue to be 
brushed under the carpet.
Comment 1 Alan Stern 2006-07-10 15:45:26 UTC
The first thing to do is get some useful information.  To start with, you should
be running at least 2.6.17 and preferably the current 2.6.18-rc1-git.  Compile
your kernel with CONFIG_USB_DEBUG turned on, and load uhci-hcd with "debug=2". 
Then you should get a nice dump of the controller's schedule in the system log
the next time this failure occurs.

I appreciate your list of earlier incarnations of this problem.  :-)  After I've
had a chance to look through them carefully I'll give some intelligent feedback.
For now, you must realize that a lot of the time it's impossible to follow up
on bug reports because of lack of response.  It's also the case that the
uhci-hcd driver has undergone a lot of changes in the recent past, so a problem
appearing two OS versions ago may be quite different now.

Finally, you should realize that although a lot of those bugs may appear to be
similar or duplicates to you, in fact they may be quite different.  This
difference shows up mainly in the kernel log, in the line before "host
controller halted, very bad!"  The fact that the controller stopped is merely
the symptom; the cause is what matters.

In your case the cause is listed as "host system error, PCI problems?"  That's
very specific and it indicates a _hardware_ problem.  Not a _software_ problem
(the message for software driver problems is "host controller process error,
something bad happened!").  It means that the PCI bus isn't able to carry all
the data needed by the USB controller, or it encountered an error while trying
to do so.
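
As a rough illustration of the distinction above, here is a minimal Python
sketch that looks at the log line just before each "host controller halted,
very bad!" message and labels the likely cause. The two cause strings are the
ones quoted in this comment; the function name, the labels, and the sample log
lines (including the PCI address) are made up for illustration and are not
taken from a real log or from the driver source.

```python
# Cause messages quoted in this thread, mapped to the interpretation given here.
CAUSES = {
    "host system error, PCI problems?": "hardware (PCI bus)",
    "host controller process error, something bad happened!": "software (driver)",
}
HALT = "host controller halted, very bad!"


def classify_halts(log_lines):
    """Return one label per halt message, based on the line just before it."""
    results = []
    previous = ""
    for line in log_lines:
        if HALT in line:
            label = "unknown"
            for message, meaning in CAUSES.items():
                if message in previous:
                    label = meaning
                    break
            results.append(label)
        previous = line
    return results


# Hypothetical log excerpt; the device address is invented.
sample = [
    "uhci_hcd 0000:00:07.2: host system error, PCI problems?",
    "uhci_hcd 0000:00:07.2: host controller halted, very bad!",
]
print(classify_halts(sample))
```

This assumes the cause is logged on the line immediately before the halt
message; if other messages are interleaved, the result will be "unknown" and
the log needs to be read by hand.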

What were you doing with your computer at the time the error occurred?
Comment 2 Alan Stern 2006-07-11 08:46:12 UTC
I have looked through those old bug reports you listed.  They don't support your
contention that this is a persistent bug that has never been fixed properly.

The reports that died out because of lack of response don't really prove
anything one way or another.

Bug 479 was before Linux 2.6, which makes it completely irrelevant.

Bug 1373 was resolved with a code fix, so if your problem was a duplicate then
it would have been solved already by that same code fix.

Bug 3735 involved audio, which was known to have other problems in other bug
reports from around the same time.  In any case, the reporter indicated by
omission that the problem had been fixed in 2.6.13.

Bug 3856 was closed due to lack of response, not "something was changed, so it
must be ok now".

Bug 4390 implicated several different subsystems and turned out not to be a USB
problem at all.  And it is marked as a duplicate of 2439, not 1373 as you stated.

Bug 4405 appeared to involve motherboard hardware issues in addition to dying
for lack of response.

Bug 4414 was not closed because of lack of interest; it was determined to be a
hardware problem and was fixed by the user purchasing replacement hardware.

Bug 5765 was closed because there was nothing else to do.  That same computer
had caused problems in at least one other bug report, and it was clear that it
had funky firmware and hardware.

Bug 4776 was closed because the computer died!  It seemed pretty clear that the
problem was a symptom of progressive hardware failure.  If your bug really is a
duplicate of that one, you should be busily backing up your hard disks and
ordering a replacement computer instead of filing bug reports.

Bug 6752 mentions nothing about PCI problems.  I don't know why you think it has
anything in common with your report.
Comment 3 Chris Brien 2006-07-16 13:31:39 UTC
Still happens with 2.6.17
Comment 4 Natalie Protasevich 2007-06-13 13:09:21 UTC
Chris, could you please try the latest kernel, such as 2.6.22-rc4?
Thanks.
