Bug 13405 - USB OHCI freezes after few minutes of heavy use. EHCI is OK. noapic or acpi=noirq almost provides workaround
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P1 normal
Assignee: Greg Kroah-Hartman
URL: http://www.asrock.com/mb/overview.asp...
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-30 20:32 UTC by Zbigniew Luszpinski
Modified: 2012-02-22 21:07 UTC
CC List: 6 users

See Also:
Kernel Version: 2.6.29.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
original DSDT table extracted and disassembled. (327.67 KB, application/octet-stream)
2009-05-30 20:32 UTC, Zbigniew Luszpinski
Compilation log for DSDT (1Error, 22 warnings, 16 remarks) (6.04 KB, application/octet-stream)
2009-05-30 20:37 UTC, Zbigniew Luszpinski
I fixed all errors/warnings/remarks but it does not resolve USB freezing (327.97 KB, application/octet-stream)
2009-05-30 20:39 UTC, Zbigniew Luszpinski
Compilation log for fixed DSDT (0Error, 0 warnings, 0 remarks) (400 bytes, application/octet-stream)
2009-05-30 20:40 UTC, Zbigniew Luszpinski
Full ACPI binary dump of all ACPI tables. You can use Intel acpi tools to extract and disassemble/compile it. (170.17 KB, application/octet-stream)
2009-05-30 20:47 UTC, Zbigniew Luszpinski
Many logs, dmesg dumps, irq status dumps, acpi dumps, error logs - everything I tested (150.35 KB, application/octet-stream)
2009-05-30 21:23 UTC, Zbigniew Luszpinski
IRQ assignment in Windows XP SP3 (25.34 KB, image/png)
2009-07-02 22:40 UTC, Zbigniew Luszpinski
dmesg output of hung USB system (58.32 KB, text/plain)
2010-02-20 21:28 UTC, Karl Johnson
dmesg output when USB operating normally (46.60 KB, text/plain)
2010-02-20 21:31 UTC, Karl Johnson
Makes the USB hang occur less often or not at all. The hang depends on USB load. (2.51 KB, patch)
2010-07-03 12:44 UTC, Zbigniew Luszpinski

Description Zbigniew Luszpinski 2009-05-30 20:32:33 UTC
Created attachment 21635 [details]
original DSDT table extracted and disassembled.

USB 1.1 devices freeze after a few minutes of use. The more intensive the communication, the earlier the USB OHCI controller freezes. USB 2.0, which uses the EHCI controller, works perfectly. USB 1.1 works perfectly in Windows XP SP3, so this is a software problem between BIOS/ACPI and Linux.

Please look at the attached ACPI dump and tell me how to fix USB OHCI.

Kernel parameters that almost work around the bug in USB 1.1:
noapic
acpi=noirq

Kernel parameters that do NOT work around the problem:
maxcpus=1 (so this is not SMP problem)
pci=nomsi
usb-handoff  i8042.nomux

kernels tested: 2.6.27-29.4
bioses tested: P1.10, P1.60-1.90

When USB 1.1 is frozen:
Plugging or unplugging devices on any port makes no difference: they get no power and are not detected by the kernel (I checked dmesg). Running rmmod ohci_hcd hangs the rmmod command. lsusb also hangs. Any application that talks to USB devices hangs as soon as it tries to use them. Only a reboot makes USB work again. If a USB device such as a hard disk has its own power supply, it stays frozen after the reboot.

"Almost" means that USB 1.1 devices become usable in 95% of cases. For example, with my Speedtouch 330 ADSL modem (1Mbit link): without noapic/acpi=noirq I can receive mail from one mailbox or open one web page before USB freezes. If I open the Akregator RSS reader, which checks several RSS channels at once, the freeze happens immediately. If I start the Internet connection via the modem but do not use it, it can keep working without freezing.
If I use ktorrent to download the CentOS DVD it hangs immediately, because there is a huge number of seeds/leechers and my Linux box connects to many IPs at once.

After booting the kernel with noapic or acpi=noirq:
-Akregator RSS always works perfectly
-web/mail works perfectly too. Opening 20 tabs in Firefox with Flash content/video causes no problems.
-git/svn works perfectly
but:
-downloading a CentOS DVD image from an FTP site freezes USB 1.1 at a random point, after hundreds of megabytes have already been downloaded. I have seen freezes after about 300, 900 and 1200 MB of downloaded data. (I use the -c switch in wget to continue the download after a reboot.)
-using ktorrent to download CentOS still freezes USB after some time of downloading.

Rebooting into Windows XP SP3 lets me download the CentOS DVD ISO without any problems or retries, either from torrent (using uTorrent) or from the same FTP site.
This means it is not a hardware issue.

Recently I upgraded my machine:
cpu: Athlon64 3000+ Venice -> Phenom 9550
ram: 4x256MiB DDR400 -> 2x1GiB DDR2 800 CL4
mobo:Asus A8N-VM CSM (Geforce 6150/nForce430)-> ASrock K10N78FullHD-hSLI 3.0
There was no problem with usb on my previous configuration.

USB devices which hang with this ASRock mainboard (but work on the Asus):
-Speedtouch 330 ADSL modem,
-ZTE ZXDSL2 ADSL modem (unicorn II chipset),
-external USB 2.0 SATA hard drive (when USB 2.0 is disabled in BIOS it freezes in USB 1.1 mode: sometimes when I plug it in, sometimes when I mount it and copy files, where it freezes on the first file)
-USB IrDA dongle (sigmatel stir 4200 chipset)
-Kingston 8GiB pendrive (the same problem as with the USB hard drive)
-HP Deskjet 5940 printer
Comment 1 Zbigniew Luszpinski 2009-05-30 20:37:14 UTC
Created attachment 21636 [details]
Compilation log for DSDT (1Error, 22 warnings, 16 remarks)
Comment 2 Zbigniew Luszpinski 2009-05-30 20:39:28 UTC
Created attachment 21637 [details]
I fixed all errors/warnings/remarks but it does not resolve USB freezing
Comment 3 Zbigniew Luszpinski 2009-05-30 20:40:36 UTC
Created attachment 21638 [details]
Compilation log for fixed DSDT (0Error, 0 warnings, 0 remarks)
Comment 4 Zbigniew Luszpinski 2009-05-30 20:47:55 UTC
Created attachment 21639 [details]
Full ACPI binary dump of all ACPI tables. You can use Intel acpi tools to extract and disassemble/compile it.

With every BIOS release, dmesg shows the ACPI interpreter reporting a checksum error in the [OEMB] table:
ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] - 97, should be 8F [20080926]

I tried almost every BIOS release. The bad checksum is always too big by 8.
See for example 0x97 - 0x8F = 8
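
For context, the check behind this warning is plain byte arithmetic: an ACPI table is valid when all of its bytes, including the checksum byte at offset 9 of the standard table header, sum to zero modulo 256. The following is a minimal sketch of that rule, not kernel code. The function names are made up for illustration, and the table is synthetic rather than the real OEMB table from the attachment.

```python
CHECKSUM_OFFSET = 9  # checksum byte position in the standard ACPI table header


def table_sum(table: bytes) -> int:
    """Return the byte sum modulo 256; 0 means the checksum is valid."""
    return sum(table) & 0xFF


def expected_checksum(table: bytes) -> int:
    """Return the checksum byte that would make the whole table sum to 0."""
    without_checksum = sum(table) - table[CHECKSUM_OFFSET]
    return (-without_checksum) & 0xFF


# Synthetic 36-byte table (hypothetical content) whose stored checksum
# is deliberately 8 too large, mirroring the warning quoted above.
table = bytearray(36)
table[0:4] = b"OEMB"
table[CHECKSUM_OFFSET] = (expected_checksum(bytes(table)) + 8) & 0xFF

stored = table[CHECKSUM_OFFSET]
wanted = expected_checksum(bytes(table))
print(f"stored {stored:02X}, should be {wanted:02X}, "
      f"off by {(stored - wanted) & 0xFF}")
```

A constant offset across BIOS releases would be consistent with the vendor computing the checksum over slightly different bytes than the kernel does, but that is only a guess; the thread does not establish the cause.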
Comment 5 Zbigniew Luszpinski 2009-05-30 21:04:20 UTC
A similar USB problem is reported here. That mainboard is built on the similar GeForce 8300 chipset:
https://bugs.launchpad.net/ubuntu/+bug/350065
Comment 6 Zbigniew Luszpinski 2009-05-30 21:23:20 UTC
Created attachment 21640 [details]
Many logs, dmesg dumps, irq status dumps, acpi dumps, error logs - everything I tested
Comment 7 Zbigniew Luszpinski 2009-05-30 21:39:16 UTC
OK. I have sent you everything I tried in order to fix this myself. I now load the fixed ACPI table statically during boot. However, although I fixed all the bugs in it, this does not solve the USB 1.1 hang. I am out of ideas on how to fix this. The noapic option lets me use USB for everyday work in a limited way: I cannot download or transfer big files using torrent, FTP or local file transfer. Keeping Windows XP only for transferring big files is a sad option.

noapic and acpi=noirq clearly show that this is a pure software issue and that there must be something wrong with the ACPI IRQ/BIOS IRQ tables or timings.

This is my first mainboard whose BIOS setup lacks an option I know from all my older mainboards: "Assign IRQ for USB" [Enable]
This is also my first multicore SMP mainboard/CPU, but I doubt the USB problem is connected to SMP.
Comment 8 Zbigniew Luszpinski 2009-05-31 12:52:28 UTC
Kernel boot parameters I tried which make no difference to the USB problem:
usb_handoff i8042.nomux
pci=nomsi
maxcpus=1
clocksource=acpi_pm
clocksource=hpet
clocksource=tsc
clocksource=jiffies
acpi=force irqpoll
Comment 9 Shaohua 2009-07-01 03:29:25 UTC
Is it possible for you to run Windows and check the interrupt assignment for OHCI?
Comment 10 Zbigniew Luszpinski 2009-07-02 22:40:25 UTC
Created attachment 22181 [details]
IRQ assignment in Windows XP SP3

I installed Windows XP SP3 Polish for a while to see how IRQs were assigned.
Comment 11 Zbigniew Luszpinski 2009-07-02 23:15:11 UTC
I tested OHCI USB in Windows and it works perfectly. I even plugged all my USB devices into all the USB ports at once and used them simultaneously. It still works perfectly. I uninstalled every Nvidia/AMD CPU driver to make sure it works with the reference drivers. It still works perfectly. Linux 2.6.30 gives up with only one USB device plugged into the back of the mainboard (ASRock says these ports are the best powered).

So:
1. The hardware is OK.
2. The power supply is sufficient.
3. Very slow OHCI USB devices always work on Linux without the noapic or acpi=noirq workarounds: keyboards, mice, trackballs.
4. A single fast OHCI USB device causes trouble: IrDA 4Mbit, ADSL modem 1Mbit (2 different ones tested), pendrive 12Mbit, USB HDD 12Mbit.
5. The ADSL modem hangs USB only when it begins to download something big (the 4.3 MB e2fsprogs-1.41.7.tar.gz archive is enough to hang USB in the middle of the download on a 1Mbit ADSL connection).
6. A USB HDD or pendrive can crash USB on plug-in because they use 12Mbit.
7. The more bandwidth-hungry the device, the earlier the USB hang occurs.
8. noapic or acpi=noirq works around 95% of the trouble (except torrent and >1GB FTP downloads).
9. Today I watched what happens to the IRQs after a USB hang
(watch -n1 cat /proc/interrupts).
After the hang, the USB OHCI controller to which the device is connected keeps generating interrupts as before. I did not see any interrupt storm or anything like that. The interrupt count keeps increasing in the same manner and rhythm as before. Applications simply cannot access the USB bus. To me it looks like the USB controller is screaming 'service me!' at the ohci_hcd driver through IRQs, but the driver does not listen.

When I boot the kernel with APIC, the interrupt controller for USB OHCI looks like this:
IO-APIC-fasteoi
When I boot the kernel WITHOUT APIC, the interrupt controller for USB OHCI looks like this:
XT-PIC-XT

So maybe the ohci_hcd driver hangs because IO-APIC-fasteoi is too fast, while XT-PIC-XT is slow enough for the driver to service it in time.
APIC is a newer technology, and XT-PIC-XT dates from the early-1980s IBM XT computer.
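
The /proc/interrupts observation above can be made repeatable by taking two snapshots and comparing the counters on the ohci_hcd lines. The following is a minimal sketch of that idea. It assumes the 2.6-era column layout (per-CPU counts, then the interrupt chip name, then handler names); newer kernels format these lines differently. The snapshots are made-up sample data, and the function names are hypothetical.

```python
def parse_interrupts(text: str) -> dict:
    """Map IRQ label -> (total count across CPUs, chip name, handler names)."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:ncpus]
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue  # skip rows such as ERR: that carry a single value
        chip = fields[ncpus] if len(fields) > ncpus else ""
        handlers = " ".join(fields[ncpus + 1:])
        result[label.strip()] = (sum(map(int, counts)), chip, handlers)
    return result


def ohci_deltas(before: str, after: str) -> dict:
    """Return interrupts gained per ohci_hcd line between two snapshots."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    return {
        irq: (new[irq][0] - old[irq][0], new[irq][1])
        for irq in new
        if irq in old and "ohci_hcd" in new[irq][2]
    }


# Made-up snapshots for a two-CPU machine, taken some seconds apart.
SNAP1 = """           CPU0       CPU1
  0:        150          0   IO-APIC-edge      timer
 21:       1000        200   IO-APIC-fasteoi   ohci_hcd:usb3
 22:         40          0   IO-APIC-fasteoi   ehci_hcd:usb1
"""
SNAP2 = """           CPU0       CPU1
  0:        400          0   IO-APIC-edge      timer
 21:       1300        260   IO-APIC-fasteoi   ohci_hcd:usb3
 22:         40          0   IO-APIC-fasteoi   ehci_hcd:usb1
"""

print(ohci_deltas(SNAP1, SNAP2))
```

On a real system the two strings would come from reading /proc/interrupts twice with a pause in between. A delta that keeps growing while applications are stuck would match the behaviour described in this comment.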
Comment 12 ykzhao 2009-07-06 03:49:12 UTC
Hi, Zbigniew
    From the info in comment #10 it seems that the interrupt pin of OHCI is routed to 20/22 of the I/O APIC on Windows.
    But from the info in comment #6 it seems that the interrupt pin of OHCI is sometimes routed to 21/22 of the I/O APIC and sometimes to 20/22.

    When the boot option "noapic" or "acpi=noirq" is added, the kernel uses the I8259 instead of the I/O APIC.
    It is strange that OHCI works well when the I8259 is used but does not work well in I/O APIC mode.
    
    And from the acpidump it seems that the link device is still used even in I/O APIC mode. I don't know whether this is related to the BIOS/hardware.
    
thanks.
Comment 13 Zbigniew Luszpinski 2009-07-06 20:00:44 UTC
Hi ykzhao,

thank you for response :)
The USB bug I see does not disappear when I use the I8259 via noapic. It only becomes very, very rare (it then shows up only under very fast and intensive load, such as torrent or a fast FTP download of a big file).
The APIC seems to be faster than the PIC, so the bug is more visible with it.

Because slow USB devices work perfectly and, on Linux, the IRQs are never shared, the only thing left to check is latency. Fast USB devices use bulk/iso transfers, which differ from the transfers used by keyboards, mice and other slow USB devices.

For example, while downloading the 4.3 MB e2fsprogs-1.41.7.tar.gz with wget from a fast FTP server I get a constant 124kb/s. After a short time USB hangs for a second, which wget shows as a drop to 23kb/s, but the transfer soon goes back to 124kb/s. The download continues, the same thing happens again after a while, and the download still continues at 124kb/s. The third hang is deadly: USB does not respond until reboot. It looks to me as if the OHCI controller is so slow that it cannot keep up with the system. When overloaded with data, it hangs.

So I modified the OHCI USB driver source code (ohci-q.c, ohci-hub.c) by adding mdelay(5); before most of the ohci_writel(...) and ohci_readl(...) calls. After recompiling and rebooting I could use OHCI USB without the noapic workaround for a much longer time without crashing. However, after this modification USB hangs immediately when another USB device is hotplugged, so my ugly hack did not fix all the issues. :(

I will ask on the mailing list how to slow down OHCI controller I/O reads and writes in a smart way during bulk/iso transfers without hanging hotplug.
Comment 14 ykzhao 2009-07-07 00:39:07 UTC
Hi, Zbigniew
    Thanks for your detailed test.
    From the test it seems that the issue can also be reproduced with the boot option "noapic"/"acpi=noirq", just not as easily.
    Maybe this is related to the OHCI driver.
    Can we assign this bug to the driver/USB category? Maybe we can get help from there.

    Thanks.
Comment 15 Mirko 2009-07-08 09:18:51 UTC
Greetings,

I tried the 2.6.30 kernel and got a freeze with my USB serial modem (Alcatel X100). The problem occurs both on my PC and on a laptop with the same serial modem, but running kernel 2.6.27.
I checked my logs and it seems to use the ehci driver.

Moreover, the freeze seems to happen on kernel 2.6.24.5 too, but in that case I get a complete freeze and have to hard reboot my PC. In detail:
1. The connection starts
2. For a random time the connection works perfectly
3. The connection freezes
4. After a while, the pppd daemon tries to kill the connection
5. The connection cannot be shut down, because the device (/dev/ttyUSB2) is reported as temporarily unavailable
6. At this point I have to reboot, but Linux is not able to kill the pppd process
7. If I don't reboot, after a while I get a complete hang
I've checked 
The freeze happens at (apparently) random times and seems related to concurrent use of the device.
My motherboard is a VIA KT400.

Unfortunately, I am not skilled with the Linux kernel, so feel free to request any detail I can give you.
Comment 16 Jason Ditz 2010-01-19 19:37:16 UTC
This problem is still present on my system as well, using 2.6.31 on a Geforce 8200 chipset (Pegatron M2N78-LA commonly found in AMD-based HP desktops).

Keyboard and mouse work, for the most part, and the USB 1.1 freeze only crops up in the event that significant data is sent. Peripherals that can cause this issue include: 

1. The Novatel U727 mobile broadband modem

2. Logitech USB Microphones

3. Older storage devices that use USB 1.1

The freeze will sometimes include stalling either the keyboard or mouse or both. lsusb becomes non-responsive, and repeated attempts to run lsusb after inserting the device can cause the keyboard and mouse to fail as well. 

USB 2.0 devices, including hard drives, are not affected.
Comment 17 Karl Johnson 2010-02-20 21:28:40 UTC
Created attachment 25135 [details]
dmesg output of hung USB system

dmesg output after USB hang when DS9490 connected to GeForce 8200 USB (PCI-e USB controller is present)
Comment 18 Karl Johnson 2010-02-20 21:31:41 UTC
Created attachment 25136 [details]
dmesg output when USB operating normally

dmesg output when DS9490 connected to PCI-e USB interface. This configuration does not hang USB device.
Comment 19 Karl Johnson 2010-02-20 21:42:24 UTC
Referring to Comments 16 & 17:
We are running ~25 Shuttle SN78SH7s in production that exhibit this problem. The SN78SH7 uses the GeForce 8200 chipset. Our systems run FC10 with only a DS9490R 1-Wire bus interface connected to USB. There is only one device on the 1-Wire bus, a DS2405 addressable switch. The program that reads the state of the DS2405 roughly every 0.5 seconds hangs in an uninterruptible kernel sleep after a few hours. If a PCI-e USB interface is plugged in and used for the DS9490R, the system runs free of USB hangs for at least 100 hours. We are VERY interested in anything we can do to get this problem resolved soon.
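
A process stuck in uninterruptible sleep shows state 'D' in the third field of /proc/<pid>/stat, so a hang like the one described here could be detected by a watchdog. The following is a minimal sketch of the parsing step only. The stat lines are made-up samples, and 'owpoll' is a hypothetical name for the polling program, which the comment does not name.

```python
def process_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def is_uninterruptible(stat_line: str) -> bool:
    """True if the process is in uninterruptible sleep (state 'D')."""
    return process_state(stat_line) == "D"


# Made-up, shortened stat lines for illustration.
hung = "4242 (owpoll) D 1 4242 4242 0 -1 4194560 120 0 0 0"
fine = "4243 (my (odd) name) S 1 4243 4243 0 -1 4194560 90 0 0 0"

print(is_uninterruptible(hung), is_uninterruptible(fine))
```

A short-lived 'D' state is normal during I/O, so a real watchdog would only react if the state persists across several samples.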
Comment 20 Zbigniew Luszpinski 2010-02-21 11:07:01 UTC
(In reply to comment #19)
> We are VERY
> interested in anything we can do to get this problem resolved soon.

You can work around this problem right now by adding acpi=noirq or noapic to the kernel boot parameters.

Both parameters have the same effect: they replace the APIC with the PIC interrupt controller. Your USB problem will be solved, but fewer interrupts will be available for devices (16 instead of 24) and the system will be a little slower.

I have used this solution since February 2009 and my USB HDDs, pendrives, ADSL modems, IrDA modules, USB radios and all other USB devices work perfectly.
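
When rolling the workaround out to several machines, it can help to confirm that the running kernel was actually booted with one of these parameters. The following is a minimal sketch that scans a kernel command line for them. The function name is hypothetical and the sample command line is made up; on a real system the string would be read from /proc/cmdline.

```python
def interrupt_workarounds(cmdline: str) -> list:
    """Return the workaround parameters from this thread found on a command line."""
    wanted = {"noapic", "acpi=noirq"}
    return [param for param in cmdline.split() if param in wanted]


# Made-up command line; the root device and other options are placeholders.
sample = "root=/dev/sda1 ro quiet noapic"
print(interrupt_workarounds(sample))
```

An empty list means neither parameter is active, in which case the interrupt chip shown for ohci_hcd in /proc/interrupts would be expected to be the I/O APIC rather than the XT-PIC.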
Comment 21 Zbigniew Luszpinski 2010-02-21 12:39:46 UTC
I tested this bug on OpenSolaris. Using the OpenSolaris 2009.06 LiveCD I copied a RHEL5 ISO file (2.9G) from SATA to a USB drive, then from USB back to SATA, and finally compared the copies twice: SATA to USB and USB to SATA.
OpenSolaris passed all tests.

Here is my IRQ dump from OpenSolaris:
IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
1    0x40 5   ISA    Edg Fixed  3   1     0x0/0x1   i8042_intr
6    0x42 5   ISA    Edg Fixed  0   1     0x0/0x6   fdc_intr
9    0x81 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
12   0x41 5   ISA    Edg Fixed  0   1     0x0/0xc   i8042_intr
16   0x85 9   PCI    Lvl Fixed  3   1     0x0/0x10  hci1394_isr
20   0x82 9   PCI    Lvl Fixed  2   1     0x0/0x14  nv_intr_aif
21   0x83 9   PCI    Lvl Fixed  1   1     0x0/0x15  ohci_intr
22   0x84 9   PCI    Lvl Fixed  2   1     0x0/0x16  ohci_intr
23   0x43 5   PCI    Lvl Fixed  1   1     0x0/0x17  ahci_intr
160  0xa0 0          Edg IPI    all 0     -         poke_cpu
192  0xc0 13         Edg IPI    all 1     -         xc_serv
208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
209  0xd1 14         Edg IPI    all 1     -         cbe_fire
210  0xd3 14         Edg IPI    all 1     -         cbe_fire
240  0xe0 15         Edg IPI    all 1     -         xc_serv
241  0xe1 15         Edg IPI    all 1     -         apic_error_intr

I tried the Red Hat Enterprise Linux 5 install DVD. When a USB drive is connected, the installer hangs on USB storage access.

I tried Linux 2.6.32.8: the bug is still present.

It looks like the bug appears earlier with the tickless kernel enabled. With it disabled the bug appears a little later, but it still happens.
Comment 22 Zbigniew Luszpinski 2010-03-14 19:29:18 UTC
A very similar problem, but with network cards:
"2.6.20->2.6.21 - networking dies after random time"
http://www.mail-archive.com/linux-net@vger.kernel.org/msg01535.html

Maybe this could also be the solution for our USB problems?

Unfortunately I cannot test kernel 2.6.20: it is too old for MCP78 mainboards, and the SATA disk is not detected in either IDE or AHCI mode, so I cannot boot. Even when I backported the PCI IDs, neither the SATA disk nor the SATA CD drive was detected.

I tried to revert "[PATCH] genirq: do not mask interrupts by default" in 2.6.32.8, but this kernel is so new that I could not do the revert (IRQ_DELAYED_DISABLE is no longer present in current kernels).

If someone skilled in Linux interrupt handling would write a revert patch for "genirq: do not mask interrupts by default" for kernel 2.6.32.8, I would be happy to test it.
Comment 23 Zbigniew Luszpinski 2010-05-23 12:42:59 UTC
Switching the kernel to use the level interrupt handler instead of fasteoi makes the bug occur much less often.
Comment 24 Zbigniew Luszpinski 2010-05-23 12:44:37 UTC
The bug is present not only on the GeForce 8200 but on the nForce 730a too:
http://www.nvnews.net/vbulletin/showthread.php?p=2254508
Comment 25 Zbigniew Luszpinski 2010-07-03 12:44:47 UTC
Created attachment 27007 [details]
Makes the USB hang occur less often or not at all. The hang depends on USB load.

I wrote this patch for myself. When it is applied and activated, the hang happens after hours rather than minutes. If USB is not overloaded with data, the hang does not happen.
Comment 26 Greg Kroah-Hartman 2012-02-22 21:07:17 UTC
All USB bugs should be sent to the linux-usb@vger.kernel.org mailing 
list, and not entered into bugzilla.  Please bring this issue up there,
if it is still a problem in the latest kernel release.
