Bug 4776 - uhci_hcd: host controller halted, very bad!
Status: REJECTED WILL_NOT_FIX
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: i386 Linux
Importance: P2 high
Assigned To: Alan Stern
Blocks: USB
Reported: 2005-06-21 18:44 UTC by Thomas R. (TRauMa)
Modified: 2006-07-09 10:40 UTC

Kernel Version: 2.6.12 vanilla
Tree: Mainline
Regression: ---


Attachments
dmesg output till uhci error and rmmod uhci_hcd (13.20 KB, text/plain)
2005-06-21 18:46 UTC, Thomas R. (TRauMa)
lspci -vv (8.29 KB, text/plain)
2005-06-21 18:48 UTC, Thomas R. (TRauMa)
Kernel log (10.32 KB, text/plain)
2006-07-08 14:11 UTC, Chris Brien
dmesg log (10.63 KB, text/plain)
2006-07-08 14:12 UTC, Chris Brien

Description Thomas R. (TRauMa) 2005-06-21 18:44:00 UTC
Distribution: Gentoo Linux
Problem Description: From time to time, the kernel writes something like this:

uhci_hcd 0000:00:07.2: host system error, PCI problems?
uhci_hcd 0000:00:07.2: host controller halted, very bad!

and shuts down my usb keyboard + mouse. I still can access the PC via ssh,
everything seems to work, but

rmmod uhci_hcd

produces this output

uhci_hcd 0000:00:07.2: remove, state 1
usb usb1: USB disconnect, address 1
usb 1-1: USB disconnect, address 2
usb 1-1.1: USB disconnect, address 4

and then hangs.

OK, it worked with 2.6.10 vanilla. There are some things I changed, though:

 * usb-hid: Was a module, is now compiled in (since I lost keyb & mouse when
loading uhci_hcd but not usb-hid)
 * I went udev (from devfsd)
 * New version of nvidia evil binary drivers (1.0.6629) (X was running with
them, so I'm afraid my kernel is tainted)

Things I can try:
 * compile usb-hid as module again
 * upgrade nvidia drivers to 1.0.7664
 * downgrade them to 1.0.55whateverwasbefore
 * go back to devfsd

Which of those make sense/should I try first? Is there any more info you need?
Comment 1 Thomas R. (TRauMa) 2005-06-21 18:46:03 UTC
Created attachment 5201
dmesg output till uhci error and rmmod uhci_hcd

The dmesg output as described.
Comment 2 Thomas R. (TRauMa) 2005-06-21 18:48:17 UTC
Created attachment 5202
lspci -vv

My hardware (after crash, if that makes any difference)
Comment 3 Greg Kroah-Hartman 2005-06-21 21:03:58 UTC
If you can reproduce this, without the nvidia driver, please reopen this bug.

Until then you are on your own, sorry.
Comment 4 Thomas R. (TRauMa) 2005-06-25 08:23:44 UTC
OK, the one change I didn't remember was the culprit (it seems):
 
Preempting the big kernel lock is not a good idea with my setup.

I tried to reproduce without nvidia-closed btw, but the problem is it only
happens every two days or so and I use this machine for opengl development :-).

Anyway now if someone finds this bug they see the solution, too.
Comment 5 Thomas R. (TRauMa) 2005-07-09 18:53:23 UTC
Premature success report.

Seems like this bug just got triggered more often with PREEMPT-B-K-L. I recently
plugged a SATA controller in my box, and the bug is almost reproducible now. Yay. 

What I do: Sound, Video, Network, Disk, (USB)Mouse. No Nvidia-evil module loaded
in kernel.

What happens: either cold hard lock (no magic sysrq) OR just USB going down and
the usual UHCI error, ten seconds, then kernel going down (as above).

What changed: uhci is now compiled in, along with libsata. Perhaps the error
message is right and I got PCI/ACPI woes?

Just tell me what info you want.
Comment 6 Andrew Morton 2005-07-28 22:03:55 UTC
Is this still happening in 2.6.13-rc4?   Any updates?

Thanks.
Comment 7 Alan Stern 2005-08-04 13:56:17 UTC
Yes, please let us know how 2.6.13 is doing.

You should indeed take seriously the warning about PCI problems.  It looks like
you have a lot of devices installed; they could use up a lot of PCI bandwidth. 
If there isn't enough bandwidth left, UHCI controllers will stop working.  I
don't know how this might be related to PREEMPT-B-K-L, unless somehow that
causes bus utilization to increase.

However, even when the controller stops working the kernel isn't supposed to
crash when you do rmmod uhci-hcd.  If this is still happening, I will ask you to
test some patches.
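
For readers trying to relate the two log messages to the hardware: they correspond to bits in the UHCI status register (USBSTS). The following is only a minimal sketch, assuming the bit layout from the UHCI specification; the dictionary and the function name decode_usbsts are hypothetical and are not taken from the driver source.

```python
# Sketch: decode a UHCI USBSTS value into human-readable flags.
# Bit positions follow the UHCI specification's status register layout;
# the descriptive strings and the helper name are illustrative assumptions.
USBSTS_BITS = {
    0x01: "USB interrupt",
    0x02: "USB error interrupt",
    0x04: "resume detect",
    0x08: "host system error",
    0x10: "host controller process error",
    0x20: "host controller halted",
}


def decode_usbsts(value):
    """Return the names of all status bits set in `value`, lowest bit first."""
    return [name for bit, name in USBSTS_BITS.items() if value & bit]


if __name__ == "__main__":
    # A status with both "host system error" and "halted" set, which would
    # match the pair of messages quoted in this report.
    print(decode_usbsts(0x28))
```

A host system error is reported when the controller hits a serious failure on the host side of its interface (PCI, in this case), which is why the driver's message asks about PCI problems.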
Comment 8 Andrew Morton 2005-09-14 22:40:05 UTC
Five weeks, no update: I propose we close this.
Comment 9 Thomas R. (TRauMa) 2005-09-15 01:59:34 UTC
Sorry, I wasn't in town. I'm installing 2.6.13 just now, and btw, I narrowed it
down to snd-cmipci - if I load that module after using the system happily for
some time, kernel freezes, sometimes USB going down first. Updates following.
Comment 10 Thomas R. (TRauMa) 2005-09-15 05:41:49 UTC
Still crashes with 2.6.13.1 if using X (no cs nvidia), disk, network and
external soundcard.

Works fine without using ext. sound (and everything else), though.
Comment 11 Greg Kroah-Hartman 2006-02-14 17:30:10 UTC
Is this fixed in 2.6.16-rc3?
Comment 12 Thomas R. (TRauMa) 2006-02-19 09:52:26 UTC
Wow, now that's lucky: after reading the bugzilla mail yesterday, I switched
the PC on today to install the new kernel; an hour later the PSU went down,
before I had a chance to test it. It'll take some time to get a new one, but if
the mainboard is still working I'll get back on this bug.
Comment 13 Thomas R. (TRauMa) 2006-03-01 13:18:52 UTC
The machine seems to be dead and I'm looking for a replacement. It wasn't the
PSU and the mainboard doesn't even beep, so this bug can be fixed as WONTFIX
DEAD. :-(
Comment 14 Chris Brien 2006-07-08 14:07:32 UTC
This happens for me. It has happened with every 2.6-series kernel (I don't 
think it ever happened with 2.4, I can't remember). After reading the comments 
on this thread, I have uninstalled the nvidia driver, and *still* it happens, 
with an untainted kernel.

Symptoms: Apple USB keyboard and mouse (mighty mouse) stop responding. I go 
fetch my spare PS/2 keyboard (already plugged in because of this), and find 
the log saying:

Jul  8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: host system error, PCI problems?
Jul  8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: host controller halted, very bad!
Jul  8 21:53:53 orthanc kernel: drivers/usb/input/hid-core.c: can't resubmit intr, 0000:00:07.2-1.1/input0, status -108
Jul  8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: HC died; cleaning up


So I log in as root and do:
#rmmod uhci-hcd; modprobe uhci-hcd
Lo and behold, Linux redetects my USB keyboard and mouse.
I will attach my system log.

System: Debian Unstable
Kernel: 2.6.16-2-k7 (happens with others, too) *Not* tainted.
CPU: AMD Athlon thunderbird @ 1.1GHz
Motherboard: Abit KT7-RAID
USB host controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller 
(rev 10) (according to lspci)
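
Since the failure is intermittent, it can help to pull the relevant lines out of a long kernel log. Below is a minimal sketch, assuming the message formats quoted in this report; the regular expression and the function name find_uhci_halts are hypothetical. It only detects the messages and does not reload any module.

```python
import re

# Matches the two uhci_hcd messages quoted in this report. An optional
# syslog prefix ("Jul  8 21:53:53 orthanc kernel: ") is skipped because
# search() is used rather than match().
UHCI_ERROR = re.compile(
    r"uhci_hcd (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<msg>host system error, PCI problems\?"
    r"|host controller halted, very bad!)"
)


def find_uhci_halts(lines):
    """Return (pci_address, message) pairs for each matching log line."""
    hits = []
    for line in lines:
        m = UHCI_ERROR.search(line)
        if m:
            hits.append((m.group("pci"), m.group("msg")))
    return hits


if __name__ == "__main__":
    sample = [
        "Jul  8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: "
        "host system error, PCI problems?",
        "Jul  8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: "
        "host controller halted, very bad!",
        "Jul  8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: "
        "HC died; cleaning up",
    ]
    for pci, msg in find_uhci_halts(sample):
        print(pci, msg)
```

The function takes any iterable of strings, so it can be fed the lines of a saved dmesg or syslog file. Counting hits per PCI address over a few days would show how often each controller halts.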
Comment 15 Chris Brien 2006-07-08 14:11:26 UTC
Created attachment 8508
Kernel log
Comment 16 Chris Brien 2006-07-08 14:12:33 UTC
Created attachment 8509
dmesg log
Comment 17 Alan Stern 2006-07-09 10:40:03 UTC
Please don't change the status of a bug entry if you aren't the submitter or the
owner.  You can always start a new bug report of your own, if you want.

In this case, though, I don't think there's much point.  It seems pretty clear
that either your system has some sort of hardware problem on the motherboard or
its PCI bus is overloaded.  Changing the kernel won't make any difference.
