Kernel Bug Tracker – Bug 13405
USB OHCI freezes after few minutes of heavy use. EHCI is OK. noapic or acpi=noirq almost provides workaround
Last modified: 2012-02-22 21:07:17 UTC
Created attachment 21635 [details]
original DSDT table extracted and disassembled.
USB 1.1 devices freeze after a few minutes of use. The more intensive the communication, the sooner the USB OHCI controller freezes. USB 2.0, which uses the EHCI controller, works perfectly. USB 1.1 works perfectly in Windows XP SP3, so this is a software problem between BIOS/ACPI and Linux.
Please look at the attached ACPI dump and tell me how to fix USB OHCI.
Kernel parameters which almost work around the bug in USB 1.1: noapic or acpi=noirq
Kernel parameters which do NOT work around the problem:
maxcpus=1 (so this is not an SMP problem)
kernels tested: 2.6.27-29.4
bioses tested: P1.10, P1.60-1.90
When USB 1.1 is frozen:
Plugging or unplugging devices on any port makes no difference: they get no power and are not detected by the kernel (I checked dmesg). rmmod ohci_hcd hangs the rmmod command. lsusb also hangs. Any application that talks to USB devices hangs as soon as it tries to use them. Only a reboot gets USB working again. If a USB device such as a hard disk has its own power supply, it stays frozen after the reboot.
"Almost" means USB 1.1 devices become usable in 95% of cases. For example, with my SpeedTouch 330 ADSL modem (1 Mbit link) and without noapic/acpi=noirq, I can receive mail from one mailbox or open one web page before USB freezes. If I open the Akregator RSS reader, which checks several RSS channels at once, the USB freeze happens immediately. If I start the Internet connection via the modem but do not use it, it keeps working without freezing.
If I use ktorrent to download the CentOS DVD, it hangs immediately because there is a huge number of seeders/leechers and my Linux machine connects to many IPs at once.
After booting the kernel with noapic or acpi=noirq:
-Akregator RSS always works perfectly
-web/mail works perfectly too. Opening 20 tabs in Firefox with Flash content/video causes no problems.
-git/svn works perfectly
-downloading the CentOS DVD image from an FTP site freezes USB 1.1 after a random time, once hundreds of megabytes have already been downloaded. I have seen freezes after about 300/900/1200 MB of downloaded data. (I use the -c switch in wget to continue the download after reboot.)
-using ktorrent to download CentOS still freezes USB after some time of downloading.
Rebooting into Windows XP SP3 allows me to download the CentOS DVD ISO without any problems or retries, either from torrent (using uTorrent) or from the same FTP site.
This means this is not a hardware issue.
Recently I upgraded my machine:
cpu: Athlon64 3000+ Venice -> Phenom 9550
ram: 4x256MiB DDR400 -> 2x1GiB DDR2 800 CL4
mobo: Asus A8N-VM CSM (Geforce 6150/nForce430) -> ASrock K10N78FullHD-hSLI 3.0
There was no problem with usb on my previous configuration.
USB devices which hang with this ASrock mainboard (but work on the Asus):
-Speedtouch 330 ADSL modem,
-ZTE ZXDSL2 ADSL modem (unicorn II chipset),
-external USB 2.0 SATA hard drive (when USB 2.0 is disabled in the BIOS, it freezes in USB 1.1 mode: sometimes when I plug it in, sometimes when I mount it and copy files, in which case it freezes on the first file),
-USB IrDA dongle (sigmatel stir 4200 chipset)
-Kingston 8GiB pendrive (the same problem as with the USB hard drive)
-HP Deskjet 5940 printer
Created attachment 21636 [details]
Compilation log for DSDT (1 error, 22 warnings, 16 remarks)
Created attachment 21637 [details]
I fixed all errors/warnings/remarks but it does not resolve the USB freezing
Created attachment 21638 [details]
Compilation log for fixed DSDT (0 errors, 0 warnings, 0 remarks)
Created attachment 21639 [details]
Full ACPI binary dump of all ACPI tables. You can use Intel acpi tools to extract and disassemble/compile it.
With every BIOS release, dmesg shows the ACPI interpreter reporting a checksum error in the [OEMB] table:
ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] - 97, should be 8F 
I tried almost every BIOS release. The bad checksum is always too big by 8.
For example, here: 0x97 - 0x8F = 8
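For anyone who wants to check a dumped table by hand, here is a minimal sketch of the ACPI checksum rule: all bytes of a table, including the checksum byte at offset 9 of the standard header, must sum to 0 modulo 256. The function names are my own, and I am assuming that in the warning above "97" is the stored byte and "8F" the value the kernel computed.

```python
CHECKSUM_OFFSET = 9  # position of the checksum byte in the standard ACPI table header


def table_sum(table: bytes) -> int:
    """Sum of all table bytes modulo 256; a valid table gives 0."""
    return sum(table) % 256


def expected_checksum(table: bytes) -> int:
    """Checksum byte that would make the whole table sum to 0 modulo 256."""
    others = sum(table) - table[CHECKSUM_OFFSET]
    return (-others) % 256


def checksum_excess(stored: int, expected: int) -> int:
    """How far the stored checksum byte is above the expected one (mod 256)."""
    return (stored - expected) % 256
```

Reading a table file extracted from the ACPI dump as bytes and passing it to table_sum() should give the same excess as the kernel message if the assumption holds.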
Similar problem with USB. This mainboard is built on similar Geforce 8300 chipset:
Created attachment 21640 [details]
Many logs, dmesg dumps, IRQ status dumps, ACPI dumps, error logs - everything I tested
OK. I have sent you everything I tried in order to fix this myself. I now load the fixed ACPI table statically during boot. However, although I fixed all the bugs in it, this does not solve the USB 1.1 hanging. I'm out of ideas on how to fix this. The noapic option allows me to use USB for everyday tasks in a limited way: I cannot download or transfer big files using torrent, FTP or local file transfer. Keeping Windows XP only for transferring big files is a sad option.
noapic and acpi=noirq clearly show this is a pure software issue, and there must be something wrong with the ACPI IRQ/BIOS IRQ tables or timings.
This is my first mainboard which does not have the BIOS setup option known from all my older mainboards: "Assign IRQ for USB" [Enable]
This is also my first multicore SMP mainboard/CPU, but I doubt the USB problem is connected to SMP.
Kernel boot parameters I tried which make no difference to the USB problem:
Is it possible for you to run Windows and check the interrupt assignment for OHCI?
Created attachment 22181 [details]
IRQ assignment in Windows XP SP3
I installed Windows XP SP3 Polish for a while to see how IRQs were assigned.
I tested OHCI USB in Windows and it works perfectly. I even plugged all USB devices into every USB port together and used them simultaneously. It still works perfectly. I uninstalled every Nvidia/AMD CPU driver to make sure it works on the reference drivers. It still works perfectly. Linux 2.6.30 gives up with only one USB device plugged into the back of the mainboard (ASrock says these ports are the best powered).
1. The hardware is OK.
2. The power supply is sufficient.
3. Very slow OHCI USB devices always work on Linux without the noapic or acpi=noirq workarounds: keyboards, mice, trackballs.
4. A single fast OHCI USB device causes trouble: IrDA 4 Mbit, ADSL modem 1 Mbit (2 different ones tested), pendrive 12 Mbit, USB HDD 12 Mbit.
5. The ADSL modem hangs USB only when it begins to download something big (the 4.3 MB e2fsprogs-1.41.7.tar.gz archive is enough to hang USB in the middle of the download on a 1 Mbit ADSL connection).
6. A USB HDD or pendrive can crash USB on plug-in because they use 12 Mbit.
7. The more bandwidth-hungry the device, the earlier the USB hang occurs.
8. noapic or acpi=noirq works around 95% of the trouble (except torrent and >1 GB FTP downloads).
9. Today I watched what happens to the IRQs after a USB hang
(watch -n1 /proc/interrupts).
After the USB hang, the USB OHCI controller to which the USB device is connected is still generating interrupts as before the hang. I did not see any interrupt storm or anything like that. The interrupt count keeps increasing in the same manner and rhythm as before. It is just that no application can access the USB bus. To me it looks like the USB controller is screaming at the ohci_hcd driver via IRQs, 'service me!', but the driver does not listen.
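The watch -n1 observation can be made a bit more precise by diffing two snapshots of /proc/interrupts. This is only a rough sketch: the function names are mine, and it assumes the usual layout (a header line with one column per CPU, then one line per IRQ with the per-CPU counts followed by the controller type and driver name).

```python
def irq_counts(text, needle="ohci_hcd"):
    """Return {irq_label: total count over all CPUs} for lines mentioning needle."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header line holds CPU0, CPU1, ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or needle not in rest:
            continue
        total = 0
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # fewer count columns than CPUs on this line
            total += int(token)
        counts[label.strip()] = total
    return counts


def irq_deltas(before, after):
    """Interrupts received per IRQ between two irq_counts() snapshots."""
    return {irq: after[irq] - before.get(irq, 0) for irq in after}
```

To use it, read /proc/interrupts, sleep for a second, read it again and pass both results through irq_counts() and then irq_deltas(). A non-zero delta while lsusb is already hanging matches what I describe above: the controller keeps raising interrupts but nothing gets through.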
When I boot the kernel with APIC, the USB OHCI interrupt controller looks like this:
When I boot the kernel WITHOUT APIC, the USB OHCI interrupt controller looks like this:
So maybe the ohci_hcd driver hangs because IO-APIC-fasteoi is too fast, while XT-PIC-XT is slow enough for the driver to service it in time.
APIC is a newer technology, while XT-PIC-XT dates from the early 1980s IBM XT computer.
From the info in comment #10, it seems that the interrupt pin of OHCI is routed to 20/22 of the I/O APIC on Windows.
But from the info in comment #6, it seems that the interrupt pin of OHCI is sometimes routed to 21/22 of the I/O APIC and sometimes to 20/22.
When the boot option "noapic" or "acpi=noirq" is added, the I8259 is used instead of the I/O APIC.
It is strange that OHCI works well when the I8259 is used but does not work well in I/O APIC mode.
And from the acpidump, it seems that the link device is still used even when it works in I/O APIC mode. I don't know whether this is related to the BIOS/hardware.
Thank you for the response :)
The USB bug I encounter does not disappear if I use the I8259 by passing noapic. It only becomes very, very rare that way (it appears immediately only when using torrent, fast FTP downloads of big files, or another very fast and intensive job).
The APIC seems to be faster than the PIC, so the bug is more visible.
Because slow USB devices work perfectly and, on Linux, the IRQs are never shared, the only thing left to check is latency. Fast USB devices use bulk/iso transfers, which differ from the transfers used by keyboards, mice and other slow USB devices.
For example, while downloading the 4.3 MB e2fsprogs-1.41.7.tar.gz using wget from a fast FTP server, I get a constant 124kb/s. After a short time USB hangs for a second, which wget shows as a drop in transfer rate to 23kb/s, but the transfer soon goes back to 124kb/s. The download continues, the same thing happens again after a while, and the download still continues at 124kb/s. The third hang is deadly: USB does not respond until reboot. It looks to me as if the OHCI controller is so slow that it cannot keep up with the system. When overloaded with data, it hangs.
So I modified the OHCI USB driver source code (ohci-q.c, ohci-hub.c) by adding mdelay(5); before most calls to ohci_writel(...) and ohci_readl(...). After recompiling and rebooting, I used OHCI USB without the noapic workaround for a much longer time without crashing. However, after this modification USB hangs immediately after hotplugging another USB device, so my ugly hack did not fix all the issues. :(
I will ask on the mailing list how to slow down OHCI controller I/O reads and writes in a smart way during bulk/iso transfers without hanging hotplug.
Thanks for your detailed test.
From the test, it seems that the issue can also be reproduced with the boot option "noapic"/"acpi=noirq" added, but it is not easily reproduced.
Maybe this is related to the OHCI driver.
Can we assign this bug to the driver/USB category? Maybe we can get help from there.
I have tried the 2.6.30 kernel and got a freeze with my USB serial modem (Alcatel X100). This problem occurs both on my PC and on a laptop with the same serial modem, but using kernel 2.6.27.
I checked my logs and it seems to use the ehci driver.
Moreover, it seems that the freeze happens on kernel 22.214.171.124, too, but in this case I get a complete freeze and have to hard reboot my PC. In detail:
1. The connection starts
2. For a random time the connection works perfectly
3. The connection freezes
4. After a while, the pppd daemon tries to kill the connection
5. The connection cannot be shut down, because the device (/dev/ttyUSB2) is reported as temporarily unavailable
6. At this point I have to reboot, but Linux is not able to kill the pppd process
7. If I don't reboot, after a while I get a complete hang
The freeze happens at (apparently) random times and it seems related to the concurrent use of the device.
My motherboard is a VIA KT400.
Unfortunately, I'm not skilled with the Linux kernel, so feel free to ask for any details I can provide.
This problem is still present on my system as well, using 2.6.31 on a Geforce 8200 chipset (Pegatron M2N78-LA commonly found in AMD-based HP desktops).
Keyboard and mouse work, for the most part, and the USB 1.1 freeze only crops up in the event that significant data is sent. Peripherals that can cause this issue include:
1. The Novatel U727 mobile broadband modem
2. Logitech USB Microphones
3. Older storage devices that use USB 1.1
The freeze will sometimes include stalling either the keyboard or mouse or both. lsusb becomes non-responsive, and repeated attempts to run lsusb after inserting the device can cause the keyboard and mouse to fail as well.
USB 2.0 devices, including hard drives, are not affected.
Created attachment 25135 [details]
dmesg output of hung USB system
dmesg output after USB hang when DS9490 connected to GeForce 8200 USB (PCI-e USB controller is present)
Created attachment 25136 [details]
dmesg output when USB operating normally
dmesg output when DS9490 connected to PCI-e USB interface. This configuration does not hang USB device.
Referring to Comments 16 & 17:
We are running ~25 Shuttle SN78SH7s in production that exhibit this problem. The SN78SH7s use GeForce 8200 chipsets. Our systems run FC10 with only a DS9490R 1-Wire bus interface connected to USB. There is only one device, a DS2405 addressable switch, on the 1-Wire bus. The program that reads the state of the DS2405 about every 0.5 seconds will hang in an uninterruptible kernel sleep after a few hours. If a PCI-e USB interface is plugged in and used for the DS9490R, then the system runs without USB hangs for at least 100 hours. We are VERY interested in anything we can do to get this problem resolved soon.
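On unattended production boxes it may help to detect the hang automatically. A rough sketch, assuming a Linux /proc filesystem: it lists processes in state "D" (uninterruptible sleep) by parsing /proc/<pid>/stat. The function names are made up for this example, and a process can sit in "D" briefly during normal disk I/O, so a hit only means something if it persists over several checks.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so the split is done on the last ')'.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """Return [(pid, comm)] for every process currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile or the line was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling blocked_processes() from a cron job and alerting when the DS2405 reader shows up repeatedly would at least tell you when a box needs a reboot.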
(In reply to comment #19)
> We are VERY
> interested in anything we can do to get this problem resolved soon.
You can work around this problem right now by adding the acpi=noirq or noapic parameter to the kernel boot parameters.
Both parameters do the same thing: they replace the APIC with the PIC interrupt controller. Your USB problem will be solved, but you will have fewer interrupts available for devices (16 instead of 24) and a slightly slower system.
I have used this solution since February 2009 and my USB HDDs, pendrives, ADSL modems, IrDA modules, USB radios and all other USB devices work perfectly.
I tested this bug on OpenSolaris. Using the OpenSolaris 2009.06 LiveCD, I copied a RHEL5 ISO file (2.9 GB) from SATA to a USB drive, then from USB back to SATA, and finally compared them twice: SATA-USB and USB-SATA.
OpenSolaris passed all tests.
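If someone wants to repeat this round-trip test on another OS, it can be scripted roughly as follows. This is a sketch, not the exact procedure I used: the function names and paths are placeholders, and it compares SHA-256 digests instead of comparing the files byte by byte.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so a large ISO fits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_ok(source, usb_copy, back_copy):
    """Copy source -> usb_copy -> back_copy and check that all three match."""
    shutil.copyfile(source, usb_copy)
    shutil.copyfile(usb_copy, back_copy)
    reference = sha256_of(source)
    return sha256_of(usb_copy) == reference and sha256_of(back_copy) == reference
```

One caveat: right after copying, reads of the USB copy may be served from the page cache rather than from the device, so unmounting and remounting the USB drive before the comparison gives a more honest result.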
Here is my IRQ dump from OpenSolaris:
IRQ  Vect  IPL  Bus  Trg  Type   CPU  Share  APIC/INT#  ISR(s)
1    0x40  5    ISA  Edg  Fixed  3    1      0x0/0x1    i8042_intr
6    0x42  5    ISA  Edg  Fixed  0    1      0x0/0x6    fdc_intr
9    0x81  9    PCI  Lvl  Fixed  1    1      0x0/0x9    acpi_wrapper_isr
12   0x41  5    ISA  Edg  Fixed  0    1      0x0/0xc    i8042_intr
16   0x85  9    PCI  Lvl  Fixed  3    1      0x0/0x10   hci1394_isr
20   0x82  9    PCI  Lvl  Fixed  2    1      0x0/0x14   nv_intr_aif
21   0x83  9    PCI  Lvl  Fixed  1    1      0x0/0x15   ohci_intr
22   0x84  9    PCI  Lvl  Fixed  2    1      0x0/0x16   ohci_intr
23   0x43  5    PCI  Lvl  Fixed  1    1      0x0/0x17   ahci_intr
160  0xa0  0         Edg  IPI    all  0      -          poke_cpu
192  0xc0  13        Edg  IPI    all  1      -          xc_serv
208  0xd0  14        Edg  IPI    all  1      -          kcpc_hw_overflow_intr
209  0xd1  14        Edg  IPI    all  1      -          cbe_fire
210  0xd3  14        Edg  IPI    all  1      -          cbe_fire
240  0xe0  15        Edg  IPI    all  1      -          xc_serv
241  0xe1  15        Edg  IPI    all  1      -          apic_error_intr
I tried the RedHat Enterprise Linux 5 install DVD. When a USB drive is connected, the installer hangs on USB storage access.
I tried Linux 126.96.36.199 - bug still present.
It looks like the bug appears earlier with the tickless kernel enabled. With it disabled, the bug appears a little later but still happens.
A very similar problem, but in network cards:
"2.6.20->2.6.21 - networking dies after random time"
Maybe this could also be a solution for our USB problems?
Unfortunately I cannot test kernel 2.6.20: it is too old for MCP78 mainboards, and the SATA disk is not detected in either IDE or AHCI mode, so I cannot boot. Even when I backported the PCI IDs, neither the SATA disk nor the SATA CD drive was detected.
I tried to revert "[PATCH] genirq: do not mask interrupts by default" in 188.8.131.52, but this kernel is so new that the revert is not possible for me (IRQ_DELAYED_DISABLE is not present in current kernels).
If someone skilled in Linux interrupts would write a revert patch for "genirq: do not mask interrupts by default" for kernel 184.108.40.206, I would be happy to test it.
Switching the kernel to use the level interrupt handler instead of fasteoi makes the bug occur much less often.
The bug is present not only on the Geforce 8200 but on the nForce 730a too:
Created attachment 27007 [details]
Makes the USB hang occur less often or not at all. The hang depends on USB load.
I wrote this patch for myself. When applied and activated, the hang happens after hours rather than minutes. If USB is not overloaded with data, the hang will not happen.
All USB bugs should be sent to the firstname.lastname@example.org mailing
list, and not entered into bugzilla. Please bring this issue up there,
if it is still a problem in the latest kernel release.