Bug 6994
Description
Felix Schueller
2006-08-12 14:59:15 UTC
One thing here looks very suspicious. What's up in the second log (/var/log/debug from 2.6.17.8) with the long delay between 23:08:09 and 23:31:39? Was the usb-audio driver just spinning its wheels, or was it something more sinister? Unfortunately the log from 2.6.18-rc4 didn't include the analogous time period so we can't compare against it. What happens if you don't run the speaker-test program, but just unplug the speaker? Can you then plug in the USB stick successfully? Something to try: Set CONFIG_DEBUG_FS in the kernel configuration. Then before repeating your test, do mount -t debugfs none /sys/kernel/debug echo 2 >/sys/module/uhci_hcd/parameters/debug Then after unplugging the speaker but before plugging in anything else, make a copy of /sys/kernel/debug/uhci/0000:00:07.2. Make another copy of that file after plugging in the USB stick, and attach both copies here. Just to keep things simple, let's focus only on 2.6.18-rc4 for now. Created attachment 8785 [details]
copy of /sys/kernel/debug/uhci/0000:00:07.2 after unpluging the speaker
Created attachment 8786 [details]
copy of /sys/kernel/debug/uhci/0000:00:07.2 after pluging in the stick
> One thing here looks very suspicious. What's up in the second log > (/var/log/debug from 2.6.17.8) with the long delay between 23:08:09 and > 23:31:39? Was the usb-audio driver just spinning its wheels, or was it > something more sinister? Sorry for that, thats my fault. I started to write the bugreport but forgot that alsamixergui was still runing. I think at 23:31:39 alsamixergui was killed. > What happens if you don't run the speaker-test program, but just unplug the > speaker? Can you then plug in the USB stick successfully? Yes, then I also can plug the speaker back. The files are attatched. Produced by: Reboot, Pluging the speaker in, speaker-test, ^C, unplug the speaker, cp, plugin the stick. Created attachment 8789 [details]
Improved frame-related diagnostics
Please try out the attached diagnostic patch. It tests for certain kinds of
errors that the current driver doesn't look for, errors which you may be
getting.
After applying the patch, run the same test as before. If the first file copy
(before plugging in the stick) contains a bunch of warnings about links not
matching, then there's no need to make the second copy.
Created attachment 8795 [details]
copy of /sys/kernel/debug/uhci/0000:00:07.2 after unpluging the speaker 2
Created attachment 8796 [details]
8786: copy of /sys/kernel/debug/uhci/0000:00:07.2 after pluging in the stick 2
No big changes, afais. I doublechecked that the Patch is applied.
Created attachment 8799 [details]
Diagnostic patch for UHCI
I can tell this is going to be a hard problem to solve. :-( The logs didn't
point to anything in particular, but they did show the controller was in a
weird state.
The attached patch is an improved version of the previous one. Along with the
new diagnostics, it also suppresses all those "usb_check_bandwidth" messages
that clutter up the system log.
In addition to the usual copies, this time try making a copy of the file while
the test tone is playing, and then again after the test tone fizzles out but
before you unplug the device. You don't have any other USB devices attached,
do you? If yes, it would be best to unplug them before running the test.
Also attach the dmesg output for the test. (You can clear the dmesg buffer
before starting the test by doing "dmesg -c >/dev/null".)
Finally, it might help to see the output from usbmon throughout the test.
There are instructions for running usbmon in the kernel source file
Documentation/usb/usbmon.txt. To build usbmon you have to set CONFIG_USB_MON.
The file you will want to cat is /sys/kernel/debug/usbmon/1t.
I had a usb-hub and a cardreader connected in the other usb-port, and during the first test i testet all combinations (speaker at hub, speaker alone, hub and speaker at different ports) and got no change (no sound in all 3 settings) so i thought that doesn't matter. But now I have the following situation: 1. speaker only: speacker-test give a tone, but stops after a while (specially if i work with the machine). After unplug I can plug in any other device, this includes the speaker, which are then functionable, again. 2. Speaker and some other devices: Situation from the first report, but if I disconnect every device, the bus recovers. Shame on me, I totally forgot the hub during writing the first bugreport, and during the second test I don't want to change the setup. I still think it's a bug, but my inital bugreport wasn't the best. I thought that might happen. When there are no devices attached, the driver automatically stops the controller. Then when you plug in a new device, the controller is restarted and it works okay. Apparently while the speaker-test program runs, the controller gets into a state where it thinks it is running but it doesn't actually do anything. That's why I asked you to make copies of that file while the program was running. Stopping and restarting the controller gets it going again. But you're right, this is a bug. Probably a bug in the driver, although just possibly a bug in the hardware. Have you tried using a 2.6.16 kernel? There was a significant change in uhci-hcd just before 2.6.17 came out, affecting exactly this sort of operation. There's a very good chance the change had an error in it. Not having any USB audio cards, I couldn't test it very well. Created attachment 8807 [details]
copy of /sys/kernel/debug/uhci/0000:00:07.2 during test tone
Created attachment 8808 [details]
copy of /sys/kernel/debug/uhci/0000:00:07.2 after test tone stops
Created attachment 8809 [details]
copy of /sys/kernel/debug/uhci/0000:00:07.2 after plugout in the speaker
Created attachment 8810 [details]
output of dmesg
Created attachment 8811 [details]
/sys/kernel/debug/usbmon/1t (bzip2)
Comment on attachment 8811 [details]
/sys/kernel/debug/usbmon/1t (bzip2)
I test the speaker with 2.6.16 later, I have to kompile the kernel for that, an
the machine ist not the fastest.
The error is also reproducible with kernel version 2.6.16 The usbmon log is interesting. Here's where the errors start occurring: d3b406a0 390495195 C Zo:002:06 -63 64 > d3b406a0 390495225 S Zo:002:06 -115 192 = 20c020c0 00c000c0 16c016c0 62c062c0 d3b40720 390496176 C Zo:002:06 -63 64 > d3b40720 390496184 S Zo:002:06 -115 192 = e839e839 613b613b a83ca83c bb3dbb3d d3b401a0 390497218 C Zo:002:06 -63 128 > d3b401a0 390497248 S Zo:002:06 -115 192 = 30d430d4 93d193d1 1dcf1dcf d1ccd1cc d3b40220 390522968 C Zo:002:06 -18 0 Those -63 values mean the USB controller couldn't read the data over the PCI bus in time to send it out the USB bus. The -18 error indicates that the controller just stopped sending data entirely. It's possible that your computer's PCI bus is getting saturated, but I think it's more likely that the USB controller has a hardware problem. The total amount of data being sent isn't all that large (192 bytes every millisecond); I see transfer rates higher than that all the time. This could mean that the controller really isn't working right. It does not look like a problem in the driver, after all. Can you try out the speakers on a different Linux computer, hopefully one with a UHCI controller? Created attachment 8821 [details]
Don't stop when an ISO packet gets an error
On second reading, the usbmon log did show up a small bug in the driver. It
didn't cause your problem, but I would like to make sure it gets fixed anyway.
Please apply this patch (keep the other one too) and try the test again. This
time I don't need to see the /sys/kernel/debug/uhci files; only the usbmon log
is important.
Created attachment 8826 [details]
/sys/kernel/debug/usbmon/1t 2. version (bzip2)
Your Patch changes to little things:
The output of speakertest is a bit different ( " Time per period = x")
x stays now 2.5 <= x <= 3.5, before it Jumps to >10 if the tone stops.
The "playback drain error (DMA or IRQ trouble?)" message in /var/log/messages
is away.
Every thing else is as before.
"nrpacks=1" was set (hidden in /etc/modprobe.conf) from my first tests, but
unsetting it doesn't change anything.
I found a different computer with an uhci, but theres no Linux installed, so at
first i have to download some recent knoppix or something else.
Maybe i can test this on saturday oder sunday.
Created attachment 8827 [details]
Don't stop on ISO error and early rejection of late ISO URBs
All right. The usbmon information points pretty definitely to a hardware
problem rather than a software problem. Only testing on a different computer
will tell us for sure.
Your last test confirmed that the patch works as desired. Like I expected, it
didn't solve the problem. Here's an updated version of that patch. It won't
solve the problem either, but it may change the symptoms a little. Again,
please send just the usbmon log for this test.
Created attachment 8836 [details]
/sys/kernel/debug/usbmon/1t 3. version (bzip2)
As you said, the patch didn't solve the problem, only one little change:
If i put load on the machine, and if the tone stops, the " Time per period = x"
message of speacker-test decreases to x ~ 0.5
Okay, thanks for trying it out. Although the visible symptoms didn't change much, there's a significant difference in the usbmon output. It confirms that the patch works as desired. At this point there's nothing left for me to do but wait until you can try the device on the other computer. You know, it's possible that this isn't exactly a hardware issue in the USB controller. It could be there's something funny about the chipset on the motherboard and it's not getting set right by the kernel. However that would be outside my area of expertise. It really looks like a hardware problem, I tested the speaker on an other Computer with an uhci, and I couldn't trigger the bug. And also a quick test (just copying a bunch of files) with an external harddisk attached to usb on my computer gave something: [see logs below] Is it possible that the usb_storage driver also triggers the problem, but finds a way to recover? And one question: If it's a hardwareproblem, why does it only occurs if I put load an the machine? (at least if the speaker are the only devices at the bus?) /var/log/messages: Aug 20 21:45:41 frodo kernel: usb 1-2: reset full speed USB device using uhci_hcd and address 2 Aug 20 21:46:59 frodo kernel: usb 1-2: reset full speed USB device using uhci_hcd and address 2 Aug 20 21:48:37 frodo last message repeated 3 times Aug 20 21:50:22 frodo last message repeated 2 times Aug 20 21:51:56 frodo last message repeated 2 times Aug 20 21:53:25 frodo kernel: usb 1-2: reset full speed USB device using uhci_hcd and address 2 Aug 20 21:53:50 frodo kernel: usb 1-2: reset full speed USB device using uhci_hcd and address 2 Aug 20 21:58:36 frodo kernel: usb 1-2: USB disconnect, address 2 /var/log/debug: Aug 20 21:45:41 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:45:41 frodo kernel: hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0004 Aug 20 21:45:41 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:45:41 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:46:59 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:46:59 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:46:59 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:47:25 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:47:26 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:47:26 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:47:33 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:47:34 frodo kernel: hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0004 Aug 20 21:47:34 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:47:34 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:48:37 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:48:38 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:48:38 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:49:34 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:49:35 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:49:35 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:50:21 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:50:22 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:50:22 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:51:02 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:51:02 frodo kernel: hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0004 Aug 20 21:51:03 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:51:03 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:51:56 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:51:56 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:51:56 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:53:24 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:53:25 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:53:25 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:53:50 frodo kernel: usb 1-2: uhci_result_common: failed with status 440000 Aug 20 21:53:50 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:53:50 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:58:36 frodo kernel: hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0004 Aug 20 21:58:36 frodo kernel: uhci_hcd 0000:00:07.2: port 2 portsc 008a,00 Aug 20 21:58:36 frodo kernel: hub 1-0:1.0: port 2, status 0100, change 0003, 12 Mb/s Aug 20 21:58:36 frodo kernel: usb 1-2: usb_disable_device nuking all URBs Aug 20 21:58:36 frodo kernel: usb 1-2: unregistering interface 1-2:1.0 Aug 20 21:58:36 frodo kernel: usbdev1.2_ep01: ep_device_release called for usbdev1.2_ep01 Aug 20 21:58:36 frodo kernel: usbdev1.2_ep82: ep_device_release called for usbdev1.2_ep82 Aug 20 21:58:36 frodo kernel: usb 1-2:1.0: uevent Aug 20 21:58:36 frodo kernel: usb 1-2: unregistering device Aug 20 21:58:36 frodo kernel: usbdev1.2_ep00: ep_device_release called for usbdev1.2_ep00 Aug 20 21:58:36 frodo kernel: usb 1-2: uevent Aug 20 21:58:36 frodo kernel: hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100 Aug 20 21:58:37 frodo kernel: usb usb1: suspend_rh (auto-stop) > Is it possible that the usb_storage driver also triggers the problem, but > finds a way to recover? Yes, it is quite possible. usb-storage has very good error recovery -- although it's aimed more at recovering from errors in the attached device than errors in the USB controller. Notice all the reset messages in your log; each one is an attempt to recover from an error. > And one question: If it's a hardwareproblem, why does it only occurs if I put > load an the machine? (at least if the speaker are the only devices at the > bus?) I don't know the answer. Perhaps some component on your computer (like the UHCI controller) is slightly defective and is operating near its limit, and increasing the load pushes it over the edge. Anyway, it looks like you can close out this bug report. If anything changes, you can re-open it. Yes, I think its time to do so (And for me to buy an Usb hostcontroller :( ) Thanks for your help. Thanks Alan for your help on this. |