Distribution: Gentoo Linux

Problem Description: From time to time, the kernel writes something like this:

  uhci_hcd 0000:00:07.2: host system error, PCI problems?
  uhci_hcd 0000:00:07.2: host controller halted, very bad!

and shuts down my USB keyboard and mouse. I can still access the PC via ssh and everything else seems to work, but rmmod uhci_hcd produces this output

  uhci_hcd 0000:00:07.2: remove, state 1
  usb usb1: USB disconnect, address 1
  usb 1-1: USB disconnect, address 2
  usb 1-1.1: USB disconnect, address 4

and then hangs.

OK, it worked with 2.6.10 vanilla. There are some things I changed, though:
* usb-hid: Was a module, is now compiled in (since I lost keyboard and mouse when uhci_hcd was loaded but usb-hid was not)
* I went to udev (from devfsd)
* New version of the nvidia evil binary drivers (1.0.6629) (X was running with them, so I'm afraid my kernel is tainted)

Things I can try:
* compile usb-hid as a module again
* upgrade nvidia drivers to 1.0.7664
* downgrade nvidia drivers to 1.0.55whateverwasbefore
* go back to devfsd

Which of those make sense, and which should I try first? Is there any more info you need?
Created attachment 5201: dmesg output till uhci error and rmmod uhci_hcd
The dmesg output as described.
Created attachment 5202: lspci -vv
My hardware (after crash, if that makes any difference)
If you can reproduce this without the nvidia driver, please reopen this bug. Until then you are on your own, sorry.
OK, the one change I didn't remember was the culprit (it seems): a preemptible big kernel lock is not a good idea with my setup. I tried to reproduce without the closed nvidia driver, btw, but the problem is that it only happens every two days or so and I use this machine for OpenGL development :-). Anyway, now if someone finds this bug they will see the solution, too.
Premature success report. It seems this bug just got triggered more often with PREEMPT-B-K-L. I recently plugged a SATA controller into my box, and the bug is almost reproducible now. Yay. What I do: Sound, Video, Network, Disk, (USB) Mouse. No nvidia-evil module loaded in the kernel. What happens: either a cold hard lock (no magic sysrq) OR just USB going down with the usual UHCI error, then ten seconds later the kernel going down (as above). What changed: uhci is now compiled in, along with libsata. Perhaps the error message is right and I have PCI/ACPI woes? Just tell me what info you want.
Is this still happening in 2.6.13-rc4? Any updates? Thanks.
Yes, please let us know how 2.6.13 is doing. You should indeed take seriously the warning about PCI problems. It looks like you have a lot of devices installed; they could use up a lot of PCI bandwidth. If there isn't enough bandwidth left, UHCI controllers will stop working. I don't know how this might be related to PREEMPT-B-K-L, unless somehow that causes bus utilization to increase. However, even when the controller stops working the kernel isn't supposed to crash when you do rmmod uhci-hcd. If this is still happening, I will ask you to test some patches.
Five weeks, no update: I propose we close this.
Sorry, I wasn't in town. I'm installing 2.6.13 just now, and btw, I narrowed it down to snd-cmipci: if I load that module after using the system happily for some time, the kernel freezes, sometimes with USB going down first. Updates following.
Still crashes with 2.6.13.1 when using X (no closed-source nvidia), disk, network and the external soundcard. Works fine without the external soundcard (with everything else in use), though.
Is this fixed in 2.6.16-rc3?
Wow, now that's lucky: after reading the bugzilla mail yesterday, I switched the PC on today to install the new kernel; an hour later the PSU went down, before I had a chance to test it. It'll take some time to get a new one, but if the mainboard is still working I'll get back to this bug.
The machine seems to be dead and I'm looking for a replacement. It wasn't the PSU and the mainboard doesn't even beep, so this bug can be fixed as WONTFIX DEAD. :-(
This happens for me. It has happened with every 2.6-series kernel (I don't think it ever happened with 2.4, but I can't remember for sure). After reading the comments on this thread, I have uninstalled the nvidia driver, and *still* it happens, with an untainted kernel.

Symptoms: Apple USB keyboard and mouse (mighty mouse) stop responding. I go fetch my spare PS/2 keyboard (already plugged in because of this), and find the log saying:

  Jul 8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: host system error, PCI problems?
  Jul 8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: host controller halted, very bad!
  Jul 8 21:53:53 orthanc kernel: drivers/usb/input/hid-core.c: can't resubmit intr, 0000:00:07.2-1.1/input0, status -108
  Jul 8 21:53:53 orthanc kernel: uhci_hcd 0000:00:07.2: HC died; cleaning up

So I log in as root and do:

  #rmmod uhci-hcd; modprobe uhci-hcd

Lo and behold, Linux redetects my USB keyboard and mouse. I will attach my system log.

System: Debian Unstable
Kernel: 2.6.16-2-k7 (happens with others, too) *Not* tainted.
CPU: AMD Athlon thunderbird @ 1.1GHz
Motherboard: Abit KT7-RAID
USB host controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 10) (according to lspci)
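For anyone who wants to spot this condition automatically rather than by reading the log, here is a minimal sketch in Python. It only matches the three uhci_hcd messages quoted in this report; the function names (find_uhci_failures, halted_controllers) are made up for illustration, and the regular expression is an assumption based on these log lines, not on the driver source. It does not reload the module itself.

```python
import re

# Matches the uhci_hcd failure lines quoted in this report, e.g.
# "uhci_hcd 0000:00:07.2: host controller halted, very bad!"
# The pattern is an assumption derived from those lines only.
UHCI_FAILURE = re.compile(
    r"uhci_hcd (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<msg>host system error|host controller halted|HC died)"
)


def find_uhci_failures(lines):
    """Return (pci_address, message) pairs, one per matching log line."""
    hits = []
    for line in lines:
        match = UHCI_FAILURE.search(line)
        if match:
            hits.append((match.group("pci"), match.group("msg")))
    return hits


def halted_controllers(lines):
    """Return sorted PCI addresses of controllers reported halted or dead."""
    return sorted(
        {pci for pci, msg in find_uhci_failures(lines)
         if msg != "host system error"}
    )
```

Feeding it the lines of the kernel log (for example, the saved output of dmesg) gives the PCI address of each controller that has halted, which is the point where the rmmod/modprobe workaround above would apply.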
Created attachment 8508: Kernel log
Created attachment 8509: dmesg log
Please don't change the status of a bug entry if you aren't the submitter or the owner. You can always start a new bug report of your own, if you want. In this case, though, I don't think there's much point. It seems pretty clear that either your system has some sort of hardware problem on the motherboard or its PCI bus is overloaded. Changing the kernel won't make any difference.