Bug 212955 - Possible kernel regression USB2 (not USB3) port EDIROL UA-101 (in USB 1.1 mode, not USB2) error -110
Summary: Possible kernel regression USB2 (not USB3) port EDIROL UA-101 (in USB 1.1 mod...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-05 19:34 UTC by Lucas Endres
Modified: 2021-06-07 19:06 UTC (History)
1 user (show)

See Also:
Kernel Version: 5.12.0-rc8-next-20210422
Subsystem:
Regression: No
Bisected commit-id:


Attachments
usbmon USB 3 port working (3.40 MB, text/plain)
2021-05-06 05:12 UTC, Lucas Endres
Details
usbmon USB 2 port non-working (85.25 KB, text/plain)
2021-05-06 05:14 UTC, Lucas Endres
Details
dmesg USB 3 port working (2.51 KB, text/plain)
2021-05-06 05:17 UTC, Lucas Endres
Details
dmesg USB 2 port non-working (3.53 KB, text/plain)
2021-05-06 05:17 UTC, Lucas Endres
Details
usbmon USB 2 port working on 5.9.0 (2.33 MB, text/plain)
2021-05-07 02:12 UTC, Lucas Endres
Details
dmesg USB 2 port working on 5.9.0 (5.51 KB, text/plain)
2021-05-07 02:21 UTC, Lucas Endres
Details

Description Lucas Endres 2021-05-05 19:34:59 UTC
While working helping to figure out universal EDIROL/Roland/BOSS snd-usb-audio support, I noticed a new issue with the last kernel.  Prior to using kernel 5.12.0-rc8-next-20210422, I'd been using 5.9.0-1-rt-amd64 from Debian's package, and it worked on it.  I have a recollection of it working on kernel 5.11 too, but I'm away from the testing computer and don't have any notes on it at the moment.

The EDIROL UA-101 has a physical switch which sets it to either "Full speed" (USB 1.1) mode, or "High speed" (USB 2) mode.  High Speed mode works on all kernels, but its Full speed mode only works either with prior kernels (both USB2 & USB3 port versions) or only USB3 ports on the current kernel.  These results are unvaried and quite reproducible on my hardware.

Whatever causes it, breaks both capture and playback, but here's just a capture example:

arecord -D plughw:UA101 -f S24_3LE -r 48000 -c 2 ./file.wav
arecord: main:830: audio open error: Connection timed out

(dmesg:)
[  188.956481] usb 4-1.5: cannot initialize interface; error -110: unknown error
[  194.076450] usb 4-1.5: cannot initialize interface; error -110: unknown error
[  199.196407] usb 4-1.5: cannot initialize interface; error -110: unknown error

I have yet to narrow down the cause in the code, but plan to try eventually.

Takashi Iwai on the alsa-devel list, had this suggestion for me:
The error -110 is ETIMEDOUT, and the symptom implies that the problem
is rather in USB core side.  You'd better ask USB devs.  It might be
worth to run git bisect if it's a kernel regression and reliably
reproducible.

It seems I need the git repository to use bisect, and the sheer bandwidth required would take me days or perhaps a week downloading, so if you can suggest a method of downloading just the past 5.11.0 to 5.12.0 history I'd be happy to do that.

I'll be glad to test whatever you wish, thanks!
Comment 1 Alan Stern 2021-05-05 20:20:24 UTC
Something else you can do is to collect and attach a couple of usbmon traces under a 5.12 kernel: a working trace (device attached to a USB-3 port) and a non-working trace (device attached to a USB-2 port).  Comparing them might show where the problem is.

In addition, before plugging in the device you should enable USB dynamic debugging by doing:

    echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control

and then afterwards attach the relevant parts of the dmesg logs.
Comment 2 Lucas Endres 2021-05-06 05:12:41 UTC
Created attachment 296661 [details]
usbmon USB 3 port working

Sorry, I didn't know how to eliminate other bus/device entries (since the non-working log would contain only seemingly nonsensical selections, if ":3:" is grepped).
Comment 3 Lucas Endres 2021-05-06 05:14:23 UTC
Created attachment 296663 [details]
usbmon USB 2 port non-working

Sorry, I didn't know how to eliminate other bus/device entries (since this non-working log would contain only seemingly nonsensical selections, if ":3:" is grepped).
Comment 4 Lucas Endres 2021-05-06 05:17:06 UTC
Created attachment 296665 [details]
dmesg USB 3 port working

I hope this helps you identify the usbmon entries, and eventually leads to a conclusion.
Comment 5 Lucas Endres 2021-05-06 05:17:55 UTC
Created attachment 296667 [details]
dmesg USB 2 port non-working

I hope this helps you identify the usbmon entries, and eventually leads to a conclusion.
Comment 6 Alan Stern 2021-05-06 15:45:40 UTC
Unfortunately the logs and traces didn't show anything definitive.  The working and non-working versions are essentially the same up to the point where in the non-working version, the devices fails to respond to a Set-Interface request.

There's no indication of why the failure occurred.  The same request succeeded earlier in the trace, and it succeeds in the working trace.

Incidentally, you can restrict the usbmon trace to include only devices on a particular bus by copying the appropriate debugfs file.  For example, to include only communications with devices on bus 4, you would copy the 4u file.

Can you collect a similar usbmon trace for the USB-2 port under an earlier kernel where it was working?

Although there have been a few changes to the ehci-hcd driver in the last couple of years, I don't see any changes that would have this effect.  The only commit which might be relevant is 64cc3f12d1c7 ("USB: EHCI: Do not return -EPIPE when hub is disconnected"), and that would affect only what happens after the error has occurred -- and it is already present in 5.9.  You can try reverting the commit anyway, just to see if it really does make any difference.

In the end, the only was to solve the puzzle may be bisection.  You can cut down the size of a git clone operation by using the --shallow-exclude= option.  In your case, you would exclude everything before v5.9 (or whatever kernel version you identify as most recently working).
Comment 7 Lucas Endres 2021-05-07 02:12:17 UTC
Created attachment 296681 [details]
usbmon USB 2 port working on 5.9.0

Here is the usbmon output from the same USB2 port, but working on the older 5.9.0 kernel.
Comment 8 Lucas Endres 2021-05-07 02:21:07 UTC
Created attachment 296683 [details]
dmesg USB 2 port working on 5.9.0

Here is the dmesg output from the same USB2 port, but working on the older 5.9.0 kernel.

Thanks for the tip about "--shallow-exclude=".  I used it so far to limit it to 5.11 (since I think it had been working on it), like so:
git clone --shallow-exclude=v5.11 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

It was much smaller, being only 332.27 MiB in total.  Now I just have to read up about how to use bisect.  Hopefully I'll be able to easily "make uninstall" when all through testing, as I prefer using debian's package management, and don't want leftover pieces taking up space.
Comment 9 Alan Stern 2021-05-07 15:07:38 UTC
Interesting.  I don't see any particular differences as far as the USB core is concerned, but there are major differences in the way the device gets used.  You can even see it in the dmesg output:

[  963.375444] usb 4-1.5: 0:1: add audio endpoint 0x2
[  963.376273] usb 4-1.5: 1:1: add audio endpoint 0x81

Those lines are present in 5.9 but not in 5.12, and they occur before the problem happens.  These messages may not be directly relevant to the problem at hand, but they do point to changes in the audio drivers.  Those changes may expose a bug somewhere else.

Judging from these results, I will guess that your bisection tests will pinpoint a commit that affects the snd-ua101 audio driver.
Comment 10 Lucas Endres 2021-05-08 03:10:20 UTC
Thanks for your thoughts!  I helped to test a change on snd-ua101 to fix its MIDI port not being created, but think it just blacklists the UA-101 from the snd-usb-audio driver, since it was conflicting.  Here's a link to that discussion: https://bugzilla.kernel.org/show_bug.cgi?id=212477

Also, there have been substantial quality Roland/EDIROL/BOSS detection quirk tweaks to snd-usb-audio (also with my championing and testing), but if I understand correctly, they shouldn't be affecting the UA-101, due to its use of only snd-ua101: https://bugzilla.kernel.org/show_bug.cgi?id=212519

Sorry I hadn't mentioned these before, but until now I thought they could have nothing to do with this issue.

In the next few days I hope to have time to attempt bisection to get to the bottom of it all.
Comment 11 Lucas Endres 2021-06-02 04:07:40 UTC
(In reply to Lucas Endres from comment #10)
> In the next few days I hope to have time to attempt bisection to get to the
> bottom of it all.

Sorry it took me so long (please let me know if it's been noticed and fixed already), but I finally have gotten through all the bisects to reveal this commit (I don't think it's what anyone was expecting):

51e6f07cb12e50bfc71181bd08bad5f0a9b5e9e2 is the first bad commit
commit 51e6f07cb12e50bfc71181bd08bad5f0a9b5e9e2
Merge: 5e321ded30 6b3788e5fb
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue May 4 10:48:05 2021 -0700

    Merge tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

    Pull m68knommu updates from Greg Ungerer:

     - a fix for interrupt number range checking for the ColdFire SIMR
       interrupt controller.

     - changes for the binfmt_flat binary loader to allow RISC-V nommu
       support it needs to be able to accept flat binaries that have no gap
       between the text and data sections.

    * tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
      m68k: coldfire: fix irq ranges
      riscv: Disable data start offset in flat binaries
      binfmt_flat: allow not offsetting data start

 arch/m68k/coldfire/intc-simr.c | 12 ++++++------
 arch/riscv/Kconfig             |  1 +
 fs/Kconfig.binfmt              |  3 +++
 fs/binfmt_flat.c               | 18 +++++++++++++-----
 4 files changed, 23 insertions(+), 11 deletions(-)


Thanks for any suggestions I may be needing to test,

  Lucas
Comment 12 Lucas Endres 2021-06-02 04:25:46 UTC
(In reply to Alan Stern from comment #9)
> Interesting.  I don't see any particular differences as far as the USB core
> is concerned, but there are major differences in the way the device gets
> used.  You can even see it in the dmesg output:
> 
> [  963.375444] usb 4-1.5: 0:1: add audio endpoint 0x2
> [  963.376273] usb 4-1.5: 1:1: add audio endpoint 0x81
> 
> Those lines are present in 5.9 but not in 5.12, and they occur before the
> problem happens.  These messages may not be directly relevant to the problem
> at hand, but they do point to changes in the audio drivers.  Those changes
> may expose a bug somewhere else.
> 
> Judging from these results, I will guess that your bisection tests will
> pinpoint a commit that affects the snd-ua101 audio driver.

I'm not on a RISC processor, by the way, just a run-of-the-mill intel i5-2400. So, yes, to me this seems to be a wider commit that may affect much more.
Comment 13 Alan Stern 2021-06-02 16:15:53 UTC
THat commit you found, 51e6f07cb12e, cannot be the real culprit.  Merge commits don't change any code (as a general rule).  Something must have gone wrong with the bisection procedure.

The way to test the result of a bisection is: First, check out the commit that the bisection identified, build it and boot that kernel, and verify that the problem occurs.  Second, check out the commit's parent (or revert the commit), build and boot the resulting kernel, and verify that the problem does not occur.

One thing you could do is get a list of commits affecting the snd-ua101 driver and try checking out and testing various ones, to find the first commit which exhibits the problem.
Comment 14 Lucas Endres 2021-06-02 18:15:26 UTC
(In reply to Alan Stern from comment #13)
> THat commit you found, 51e6f07cb12e, cannot be the real culprit.  Merge
> commits don't change any code (as a general rule).  Something must have gone
> wrong with the bisection procedure.
> 
> The way to test the result of a bisection is: First, check out the commit
> that the bisection identified, build it and boot that kernel, and verify
> that the problem occurs.  Second, check out the commit's parent (or revert
> the commit), build and boot the resulting kernel, and verify that the
> problem does not occur.
> 
> One thing you could do is get a list of commits affecting the snd-ua101
> driver and try checking out and testing various ones, to find the first
> commit which exhibits the problem.

I probably was doing it wrong, then, but thought I was being quite thorough.  Here's the procedure I'd been following (probably from https://git-scm.com/docs/git-bisect):

0. cd /usr/src/linux-next

0.1 On the first run, I ran these: "git bisect start", "git bisect bad", "git bisect good $OriginalKnownGoodVersion" (I think the original known good was v5.12.0 or the first commit to 5.12.0)

1. Copy my .config file to the source directory, in case it changes with bisect

2. make -j4 menuconfig, and then just exit and save, just to save .config with any automatic changes needed for this source

3. Remove space hogs and any prior configuration: rm -r /usr/src/linux-next/debian/ /boot/vmlinuz-5.12* /boot/initrd-5.12* /usr/src/linux-image-5.12*

4. make -j4 deb-pkg

5. Install the kernel just built, and update grub accordingly: dpkg -i ../linux-image-5.12*_*.deb

6. Power off the computer

7. Power on and boot the new kernel

8. Test the bug (if its designation changes to "hw:USB1" I expect it'll be working, but still test it): arecord -D hw:UA101 -f S24_3LE -r 48000 -c 2 ./ua101.wav

9. If it records without the error: cd /usr/src/linux-next ; git bisect good,
Or with the error: cd /usr/sr/src/linux-next; git bisect bad

10. Repeat 1-10 until I got the output I gave you above.

I was under the assumption that running "git bisect {good,bad}" was enough to checkout the next version that needed to be tested, and it seems that it must have been. This is because I had quite a few that didn't work, then one that did and another that didn't, eventually ending with the remaining few all working.

If I, indeed, needed to be running checkout (as, say step 0.5), /usr/src/linux-next/.git/refs/bisect contains a logged list of the first bad and some good commits.  Reverting each commit previously tested (logged) and retesting sounds even more time consuming, but if you don't mind waiting and think it's necessary, I will continue this endeavor, and eventually get back to you.
Comment 15 Lucas Endres 2021-06-02 19:51:41 UTC
(In reply to Lucas Endres from comment #14)
> Reverting each commit previously tested (logged) and
> retesting sounds even more time consuming, but if you don't mind waiting and
> think it's necessary, I will continue this endeavor, and eventually get back
> to you.

Tonight, before I attempt anything that major, I'll build the commit just before 51e6f07cb12e to see if it's working (although it probably already was tested during one of the aforesaid bisect iterations).

Hopefully, that will be enough to prove that the glitch-causing commit was 51e6f07cb12e, or prove to me that you're right and I'll have to spend many more days at this.
Comment 16 Alan Stern 2021-06-02 19:54:09 UTC
You do not have to go through that time-consuming procedure (checking out and reverting) for every commit in the bisection process.  Those instructions I gave you were just for verifying that the final commit found by the bisection was indeed the one causing the problem.
Comment 17 Lucas Endres 2021-06-02 20:00:10 UTC
(In reply to Alan Stern from comment #16)
> You do not have to go through that time-consuming procedure (checking out
> and reverting) for every commit in the bisection process.  Those
> instructions I gave you were just for verifying that the final commit found
> by the bisection was indeed the one causing the problem.

Okay, that will save my sanity, thanks for the clarification. :)
Comment 18 Lucas Endres 2021-06-03 04:42:18 UTC
(In reply to Lucas Endres from comment #15)
> Tonight, before I attempt anything that major, I'll build the commit just
> before 51e6f07cb12e to see if it's working (although it probably already was
> tested during one of the aforesaid bisect iterations).
> 
> Hopefully, that will be enough to prove that the glitch-causing commit was
> 51e6f07cb12e, or prove to me that you're right and I'll have to spend many
> more days at this.

Well, thanks Alan, you've been proven correct, as neither it or its prior commit worked.  So I'm back to the drawing board, but have begun with my closest logged and tested commits.  I've only just begun to compile the first bisect suggestion, although it hasn't started to estimate how many commits I'll need to test yet:

git bisect reset
Updating files: 100% (12637/12637), done.
Previous HEAD position was bf05bf16c7 Linux 5.12-rc8
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
git bisect start
git bisect bad b208108638c4bd3215792415944467c36f5dfd97 # May 3rd
git bisect good 698f99ed5e06946764c3be035ce9d62a2691e08c # April 29
Bisecting: a merge base must be tested
[6daa755f813e6aa0bcc97e352666e072b1baac25] Merge tag 's390-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux # April 27 so prior to known-good for to test a merge base?...  I'm building it anyway.
Comment 19 Lucas Endres 2021-06-05 03:36:08 UTC
(In reply to Lucas Endres from comment #18)
> (In reply to Lucas Endres from comment #15)
> > Tonight, before I attempt anything that major, I'll build the commit just
> > before 51e6f07cb12e to see if it's working (although it probably already
> was
> > tested during one of the aforesaid bisect iterations).
> > 
> > Hopefully, that will be enough to prove that the glitch-causing commit was
> > 51e6f07cb12e, or prove to me that you're right and I'll have to spend many
> > more days at this.
> 
> Well, thanks Alan, you've been proven correct, as neither it or its prior
> commit worked.  So I'm back to the drawing board, but have begun with my
> closest logged and tested commits.  I've only just begun to compile the
> first bisect suggestion, although it hasn't started to estimate how many
> commits I'll need to test yet:
> 
> git bisect reset
> Updating files: 100% (12637/12637), done.
> Previous HEAD position was bf05bf16c7 Linux 5.12-rc8
> Switched to branch 'master'
> Your branch is up to date with 'origin/master'.
> git bisect start
> git bisect bad b208108638c4bd3215792415944467c36f5dfd97 # May 3rd
> git bisect good 698f99ed5e06946764c3be035ce9d62a2691e08c # April 29
> Bisecting: a merge base must be tested
> [6daa755f813e6aa0bcc97e352666e072b1baac25] Merge tag 's390-5.13-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux # April 27 so prior
> to known-good for to test a merge base?...  I'm building it anyway.

Sorry to bother you over something so trivial, but the last "bad" point I chose turned out good in the end after the couple compilations I needed to test, so I have a re-pretested truly bad one that will be a much longer journey, but I hope will lead to the ultimate truth:


git bisect reset
Previous HEAD position was 51e6f07cb1 Merge tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
git bisect start
git bisect good b208108638c4bd3215792415944467c36f5dfd97
git bisect bad 51e6f07cb12e50bfc71181bd08bad5f0a9b5e9e2
Bisecting: a merge base must be tested
[6daa755f813e6aa0bcc97e352666e072b1baac25] Merge tag 's390-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

That already definitely tested good during the previous stint, so:
git bisect good
Bisecting: 5295 revisions left to test after this (roughly 12 steps)

git bisect good
Bisecting: 2647 revisions left to test after this (roughly 11 steps)
[2342eb1afe00586a018536c4bf9e04d7aa4bf63e] llc2: Remove redundant assignment to rc

This is what I'm currently building, but I'll get back to you when I'm finally finished again, this time with even the suspect and prior commit tested.

Thanks!
Comment 20 Lucas Endres 2021-06-07 03:47:36 UTC
I sped through it this time with definite success:

The commit causing it was one I helped diagnose so that the UA-101's MIDI device became available, but since I told Takashi Iwai about the issue and he wanted me to check with you USB gurus, maybe we can help eachother in figuring out why the snd-ua101 module doesn't work with its "Hi-SPEED" switch set to "off" (USB1 mode) only. The snd-usb-audio module apparently did work for USB1 mode, but caused the MIDI port to be unavailable.

And, yes, its parent commit tested good (473d5ae82d73c3b21160b725af0e247fa94d7832), and this really was tested as bad:

d763145312582c08c4e9ed99d61276cde8488256 is the first bad commit
commit d763145312582c08c4e9ed99d61276cde8488256
Author: Takashi Iwai <tiwai@suse.de>
Date:   Thu Apr 8 09:56:56 2021 +0200

    ALSA: usb-audio: Skip probe of UA-101 devices

    UA-101 device and co are supported by another driver, snd-ua101, but
    the USB audio class driver (snd-usb-audio) catches all and this
    resulted in the lack of functionality like missing MIDI devices.

    This patch introduces a sort of deny-listing for those devices to just
    return -ENODEV at probe in snd-usb-audio driver, so that it falls back
    to the probe by snd-ua101.

    BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=212477
    Link: https://lore.kernel.org/r/20210408075656.30184-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>

 sound/usb/card.c         |  2 ++
 sound/usb/quirks-table.h | 21 +++++++++++++++++++++
 sound/usb/usbaudio.h     |  1 +
 3 files changed, 24 insertions(+)


I think it will, but can only hope that this time, it will lead to the solution.

Thanks!
Comment 21 Lucas Endres 2021-06-07 04:39:14 UTC
According to my comment here: https://bugzilla.kernel.org/show_bug.cgi?id=212477#c13

USB 1.1 mode was working for me there, but I was probably testing using the previous patch (not the finalized version), and I had just reverted the other BOSS/EDIROL/Roland quirk patches we'd been working on to rule out any conflicts with them.  Although, they are included in the linux-next branch I've been diagnosing here, they seem to appear after this, and otherwise shouldn't be causing it, due to this patch blacklisting the UA-101 from snd-usb-audio.

Here's the culmination of the other BOSS/EDIROL/Roland quirk table patches: 
https://patchwork.kernel.org/project/alsa-devel/patch/20210422120413.457-2-tiwai@suse.de/
And my bugzilla report leading to it:
https://bugzilla.kernel.org/show_bug.cgi?id=212519

Thanks for any light you can shed on the matter, but it seems to me that I will have to contact Takashi Iwai again,

  Lucas
Comment 22 Alan Stern 2021-06-07 15:16:48 UTC
Using two different drivers would definitely explain the different patterns of communication with the device in your two usbmon traces.

But it doesn't explain why, with one of the drivers, the device seems to stop responding when used with EHCI (and works okay with xHCI!).  Maybe this is caused by a bug in the device's firmware.  It that's true, changing the communication pattern might work around the problem.
Comment 23 Lucas Endres 2021-06-07 19:06:36 UTC
(In reply to Alan Stern from comment #22)
> Using two different drivers would definitely explain the different patterns
> of communication with the device in your two usbmon traces.
> 
> But it doesn't explain why, with one of the drivers, the device seems to
> stop responding when used with EHCI (and works okay with xHCI!).  Maybe this
> is caused by a bug in the device's firmware.  It that's true, changing the
> communication pattern might work around the problem.

Sorry, but here's another pertinent factor that I should have mentioned earlier if I haven't:
It could very well be a bug in the UA-101's firmware, since if I connect an audible input and watch it on the UA-101's builtin levels meter (with it already on), I think it does freeze.  The levels remain when the input is unplugged, after plugging in USB while running a nonworking commit.

Thanks for the insight!

I suppose I should start comparing the code to each module, since I think that's what you're suggesting.

Note You need to log in before you can comment on or make changes to this bug.