Bug 212955
Summary: | Possible kernel regression USB2 (not USB3) port EDIROL UA-101 (in USB 1.1 mode, not USB2) error -110 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Lucas Endres (jaffa225man) |
Component: | USB | Assignee: | Default virtual assignee for Drivers/USB (drivers_usb) |
Status: | NEW --- | ||
Severity: | normal | CC: | jaffa225man |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.12.0-rc8-next-20210422 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
usbmon USB 3 port working
usbmon USB 2 port non-working dmesg USB 3 port working dmesg USB 2 port non-working usbmon USB 2 port working on 5.9.0 dmesg USB 2 port working on 5.9.0 |
Description
Lucas Endres
2021-05-05 19:34:59 UTC
Something else you can do is to collect and attach a couple of usbmon traces under a 5.12 kernel: a working trace (device attached to a USB-3 port) and a non-working trace (device attached to a USB-2 port). Comparing them might show where the problem is. In addition, before plugging in the device you should enable USB dynamic debugging by doing: echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control and then afterwards attach the relevant parts of the dmesg logs. Created attachment 296661 [details]
usbmon USB 3 port working
Sorry, I didn't know how to eliminate other bus/device entries (since the non-working log would contain only seemingly nonsensical selections, if ":3:" is grepped).
Created attachment 296663 [details]
usbmon USB 2 port non-working
Sorry, I didn't know how to eliminate other bus/device entries (since this non-working log would contain only seemingly nonsensical selections, if ":3:" is grepped).
Created attachment 296665 [details]
dmesg USB 3 port working
I hope this helps you identify the usbmon entries, and eventually leads to a conclusion.
Created attachment 296667 [details]
dmesg USB 2 port non-working
I hope this helps you identify the usbmon entries, and eventually leads to a conclusion.
Unfortunately the logs and traces didn't show anything definitive. The working and non-working versions are essentially the same up to the point where in the non-working version, the devices fails to respond to a Set-Interface request. There's no indication of why the failure occurred. The same request succeeded earlier in the trace, and it succeeds in the working trace. Incidentally, you can restrict the usbmon trace to include only devices on a particular bus by copying the appropriate debugfs file. For example, to include only communications with devices on bus 4, you would copy the 4u file. Can you collect a similar usbmon trace for the USB-2 port under an earlier kernel where it was working? Although there have been a few changes to the ehci-hcd driver in the last couple of years, I don't see any changes that would have this effect. The only commit which might be relevant is 64cc3f12d1c7 ("USB: EHCI: Do not return -EPIPE when hub is disconnected"), and that would affect only what happens after the error has occurred -- and it is already present in 5.9. You can try reverting the commit anyway, just to see if it really does make any difference. In the end, the only was to solve the puzzle may be bisection. You can cut down the size of a git clone operation by using the --shallow-exclude= option. In your case, you would exclude everything before v5.9 (or whatever kernel version you identify as most recently working). Created attachment 296681 [details]
usbmon USB 2 port working on 5.9.0
Here is the usbmon output from the same USB2 port, but working on the older 5.9.0 kernel.
Created attachment 296683 [details] dmesg USB 2 port working on 5.9.0 Here is the dmesg output from the same USB2 port, but working on the older 5.9.0 kernel. Thanks for the tip about "--shallow-exclude=". I used it so far to limit it to 5.11 (since I think it had been working on it), like so: git clone --shallow-exclude=v5.11 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git It was much smaller, being only 332.27 MiB in total. Now I just have to read up about how to use bisect. Hopefully I'll be able to easily "make uninstall" when all through testing, as I prefer using debian's package management, and don't want leftover pieces taking up space. Interesting. I don't see any particular differences as far as the USB core is concerned, but there are major differences in the way the device gets used. You can even see it in the dmesg output: [ 963.375444] usb 4-1.5: 0:1: add audio endpoint 0x2 [ 963.376273] usb 4-1.5: 1:1: add audio endpoint 0x81 Those lines are present in 5.9 but not in 5.12, and they occur before the problem happens. These messages may not be directly relevant to the problem at hand, but they do point to changes in the audio drivers. Those changes may expose a bug somewhere else. Judging from these results, I will guess that your bisection tests will pinpoint a commit that affects the snd-ua101 audio driver. Thanks for your thoughts! I helped to test a change on snd-ua101 to fix its MIDI port not being created, but think it just blacklists the UA-101 from the snd-usb-audio driver, since it was conflicting. Here's a link to that discussion: https://bugzilla.kernel.org/show_bug.cgi?id=212477 Also, there have been substantial quality Roland/EDIROL/BOSS detection quirk tweaks to snd-usb-audio (also with my championing and testing), but if I understand correctly, they shouldn't be affecting the UA-101, due to its use of only snd-ua101: https://bugzilla.kernel.org/show_bug.cgi?id=212519 Sorry I hadn't mentioned these before, but until now I thought they could have nothing to do with this issue. In the next few days I hope to have time to attempt bisection to get to the bottom of it all. (In reply to Lucas Endres from comment #10) > In the next few days I hope to have time to attempt bisection to get to the > bottom of it all. Sorry it took me so long (please let me know if it's been noticed and fixed already), but I finally have gotten through all the bisects to reveal this commit (I don't think it's what anyone was expecting): 51e6f07cb12e50bfc71181bd08bad5f0a9b5e9e2 is the first bad commit commit 51e6f07cb12e50bfc71181bd08bad5f0a9b5e9e2 Merge: 5e321ded30 6b3788e5fb Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue May 4 10:48:05 2021 -0700 Merge tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: - a fix for interrupt number range checking for the ColdFire SIMR interrupt controller. - changes for the binfmt_flat binary loader to allow RISC-V nommu support it needs to be able to accept flat binaries that have no gap between the text and data sections. * tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: coldfire: fix irq ranges riscv: Disable data start offset in flat binaries binfmt_flat: allow not offsetting data start arch/m68k/coldfire/intc-simr.c | 12 ++++++------ arch/riscv/Kconfig | 1 + fs/Kconfig.binfmt | 3 +++ fs/binfmt_flat.c | 18 +++++++++++++----- 4 files changed, 23 insertions(+), 11 deletions(-) Thanks for any suggestions I may be needing to test, Lucas (In reply to Alan Stern from comment #9) > Interesting. I don't see any particular differences as far as the USB core > is concerned, but there are major differences in the way the device gets > used. You can even see it in the dmesg output: > > [ 963.375444] usb 4-1.5: 0:1: add audio endpoint 0x2 > [ 963.376273] usb 4-1.5: 1:1: add audio endpoint 0x81 > > Those lines are present in 5.9 but not in 5.12, and they occur before the > problem happens. These messages may not be directly relevant to the problem > at hand, but they do point to changes in the audio drivers. Those changes > may expose a bug somewhere else. > > Judging from these results, I will guess that your bisection tests will > pinpoint a commit that affects the snd-ua101 audio driver. I'm not on a RISC processor, by the way, just a run-of-the-mill intel i5-2400. So, yes, to me this seems to be a wider commit that may affect much more. THat commit you found, 51e6f07cb12e, cannot be the real culprit. Merge commits don't change any code (as a general rule). Something must have gone wrong with the bisection procedure. The way to test the result of a bisection is: First, check out the commit that the bisection identified, build it and boot that kernel, and verify that the problem occurs. Second, check out the commit's parent (or revert the commit), build and boot the resulting kernel, and verify that the problem does not occur. One thing you could do is get a list of commits affecting the snd-ua101 driver and try checking out and testing various ones, to find the first commit which exhibits the problem. (In reply to Alan Stern from comment #13) > THat commit you found, 51e6f07cb12e, cannot be the real culprit. Merge > commits don't change any code (as a general rule). Something must have gone > wrong with the bisection procedure. > > The way to test the result of a bisection is: First, check out the commit > that the bisection identified, build it and boot that kernel, and verify > that the problem occurs. Second, check out the commit's parent (or revert > the commit), build and boot the resulting kernel, and verify that the > problem does not occur. > > One thing you could do is get a list of commits affecting the snd-ua101 > driver and try checking out and testing various ones, to find the first > commit which exhibits the problem. I probably was doing it wrong, then, but thought I was being quite thorough. Here's the procedure I'd been following (probably from https://git-scm.com/docs/git-bisect): 0. cd /usr/src/linux-next 0.1 On the first run, I ran these: "git bisect start", "git bisect bad", "git bisect good $OriginalKnownGoodVersion" (I think the original known good was v5.12.0 or the first commit to 5.12.0) 1. Copy my .config file to the source directory, in case it changes with bisect 2. make -j4 menuconfig, and then just exit and save, just to save .config with any automatic changes needed for this source 3. Remove space hogs and any prior configuration: rm -r /usr/src/linux-next/debian/ /boot/vmlinuz-5.12* /boot/initrd-5.12* /usr/src/linux-image-5.12* 4. make -j4 deb-pkg 5. Install the kernel just built, and update grub accordingly: dpkg -i ../linux-image-5.12*_*.deb 6. Power off the computer 7. Power on and boot the new kernel 8. Test the bug (if its designation changes to "hw:USB1" I expect it'll be working, but still test it): arecord -D hw:UA101 -f S24_3LE -r 48000 -c 2 ./ua101.wav 9. If it records without the error: cd /usr/src/linux-next ; git bisect good, Or with the error: cd /usr/sr/src/linux-next; git bisect bad 10. Repeat 1-10 until I got the output I gave you above. I was under the assumption that running "git bisect {good,bad}" was enough to checkout the next version that needed to be tested, and it seems that it must have been. This is because I had quite a few that didn't work, then one that did and another that didn't, eventually ending with the remaining few all working. If I, indeed, needed to be running checkout (as, say step 0.5), /usr/src/linux-next/.git/refs/bisect contains a logged list of the first bad and some good commits. Reverting each commit previously tested (logged) and retesting sounds even more time consuming, but if you don't mind waiting and think it's necessary, I will continue this endeavor, and eventually get back to you. (In reply to Lucas Endres from comment #14) > Reverting each commit previously tested (logged) and > retesting sounds even more time consuming, but if you don't mind waiting and > think it's necessary, I will continue this endeavor, and eventually get back > to you. Tonight, before I attempt anything that major, I'll build the commit just before 51e6f07cb12e to see if it's working (although it probably already was tested during one of the aforesaid bisect iterations). Hopefully, that will be enough to prove that the glitch-causing commit was 51e6f07cb12e, or prove to me that you're right and I'll have to spend many more days at this. You do not have to go through that time-consuming procedure (checking out and reverting) for every commit in the bisection process. Those instructions I gave you were just for verifying that the final commit found by the bisection was indeed the one causing the problem. (In reply to Alan Stern from comment #16) > You do not have to go through that time-consuming procedure (checking out > and reverting) for every commit in the bisection process. Those > instructions I gave you were just for verifying that the final commit found > by the bisection was indeed the one causing the problem. Okay, that will save my sanity, thanks for the clarification. :) (In reply to Lucas Endres from comment #15) > Tonight, before I attempt anything that major, I'll build the commit just > before 51e6f07cb12e to see if it's working (although it probably already was > tested during one of the aforesaid bisect iterations). > > Hopefully, that will be enough to prove that the glitch-causing commit was > 51e6f07cb12e, or prove to me that you're right and I'll have to spend many > more days at this. Well, thanks Alan, you've been proven correct, as neither it or its prior commit worked. So I'm back to the drawing board, but have begun with my closest logged and tested commits. I've only just begun to compile the first bisect suggestion, although it hasn't started to estimate how many commits I'll need to test yet: git bisect reset Updating files: 100% (12637/12637), done. Previous HEAD position was bf05bf16c7 Linux 5.12-rc8 Switched to branch 'master' Your branch is up to date with 'origin/master'. git bisect start git bisect bad b208108638c4bd3215792415944467c36f5dfd97 # May 3rd git bisect good 698f99ed5e06946764c3be035ce9d62a2691e08c # April 29 Bisecting: a merge base must be tested [6daa755f813e6aa0bcc97e352666e072b1baac25] Merge tag 's390-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux # April 27 so prior to known-good for to test a merge base?... I'm building it anyway. (In reply to Lucas Endres from comment #18) > (In reply to Lucas Endres from comment #15) > > Tonight, before I attempt anything that major, I'll build the commit just > > before 51e6f07cb12e to see if it's working (although it probably already > was > > tested during one of the aforesaid bisect iterations). > > > > Hopefully, that will be enough to prove that the glitch-causing commit was > > 51e6f07cb12e, or prove to me that you're right and I'll have to spend many > > more days at this. > > Well, thanks Alan, you've been proven correct, as neither it or its prior > commit worked. So I'm back to the drawing board, but have begun with my > closest logged and tested commits. I've only just begun to compile the > first bisect suggestion, although it hasn't started to estimate how many > commits I'll need to test yet: > > git bisect reset > Updating files: 100% (12637/12637), done. > Previous HEAD position was bf05bf16c7 Linux 5.12-rc8 > Switched to branch 'master' > Your branch is up to date with 'origin/master'. > git bisect start > git bisect bad b208108638c4bd3215792415944467c36f5dfd97 # May 3rd > git bisect good 698f99ed5e06946764c3be035ce9d62a2691e08c # April 29 > Bisecting: a merge base must be tested > [6daa755f813e6aa0bcc97e352666e072b1baac25] Merge tag 's390-5.13-1' of > git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux # April 27 so prior > to known-good for to test a merge base?... I'm building it anyway. Sorry to bother you over something so trivial, but the last "bad" point I chose turned out good in the end after the couple compilations I needed to test, so I have a re-pretested truly bad one that will be a much longer journey, but I hope will lead to the ultimate truth: git bisect reset Previous HEAD position was 51e6f07cb1 Merge tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Switched to branch 'master' Your branch is up to date with 'origin/master'. git bisect start git bisect good b208108638c4bd3215792415944467c36f5dfd97 git bisect bad 51e6f07cb12e50bfc71181bd08bad5f0a9b5e9e2 Bisecting: a merge base must be tested [6daa755f813e6aa0bcc97e352666e072b1baac25] Merge tag 's390-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux That already definitely tested good during the previous stint, so: git bisect good Bisecting: 5295 revisions left to test after this (roughly 12 steps) git bisect good Bisecting: 2647 revisions left to test after this (roughly 11 steps) [2342eb1afe00586a018536c4bf9e04d7aa4bf63e] llc2: Remove redundant assignment to rc This is what I'm currently building, but I'll get back to you when I'm finally finished again, this time with even the suspect and prior commit tested. Thanks! I sped through it this time with definite success: The commit causing it was one I helped diagnose so that the UA-101's MIDI device became available, but since I told Takashi Iwai about the issue and he wanted me to check with you USB gurus, maybe we can help eachother in figuring out why the snd-ua101 module doesn't work with its "Hi-SPEED" switch set to "off" (USB1 mode) only. The snd-usb-audio module apparently did work for USB1 mode, but caused the MIDI port to be unavailable. And, yes, its parent commit tested good (473d5ae82d73c3b21160b725af0e247fa94d7832), and this really was tested as bad: d763145312582c08c4e9ed99d61276cde8488256 is the first bad commit commit d763145312582c08c4e9ed99d61276cde8488256 Author: Takashi Iwai <tiwai@suse.de> Date: Thu Apr 8 09:56:56 2021 +0200 ALSA: usb-audio: Skip probe of UA-101 devices UA-101 device and co are supported by another driver, snd-ua101, but the USB audio class driver (snd-usb-audio) catches all and this resulted in the lack of functionality like missing MIDI devices. This patch introduces a sort of deny-listing for those devices to just return -ENODEV at probe in snd-usb-audio driver, so that it falls back to the probe by snd-ua101. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=212477 Link: https://lore.kernel.org/r/20210408075656.30184-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> sound/usb/card.c | 2 ++ sound/usb/quirks-table.h | 21 +++++++++++++++++++++ sound/usb/usbaudio.h | 1 + 3 files changed, 24 insertions(+) I think it will, but can only hope that this time, it will lead to the solution. Thanks! According to my comment here: https://bugzilla.kernel.org/show_bug.cgi?id=212477#c13 USB 1.1 mode was working for me there, but I was probably testing using the previous patch (not the finalized version), and I had just reverted the other BOSS/EDIROL/Roland quirk patches we'd been working on to rule out any conflicts with them. Although, they are included in the linux-next branch I've been diagnosing here, they seem to appear after this, and otherwise shouldn't be causing it, due to this patch blacklisting the UA-101 from snd-usb-audio. Here's the culmination of the other BOSS/EDIROL/Roland quirk table patches: https://patchwork.kernel.org/project/alsa-devel/patch/20210422120413.457-2-tiwai@suse.de/ And my bugzilla report leading to it: https://bugzilla.kernel.org/show_bug.cgi?id=212519 Thanks for any light you can shed on the matter, but it seems to me that I will have to contact Takashi Iwai again, Lucas Using two different drivers would definitely explain the different patterns of communication with the device in your two usbmon traces. But it doesn't explain why, with one of the drivers, the device seems to stop responding when used with EHCI (and works okay with xHCI!). Maybe this is caused by a bug in the device's firmware. It that's true, changing the communication pattern might work around the problem. (In reply to Alan Stern from comment #22) > Using two different drivers would definitely explain the different patterns > of communication with the device in your two usbmon traces. > > But it doesn't explain why, with one of the drivers, the device seems to > stop responding when used with EHCI (and works okay with xHCI!). Maybe this > is caused by a bug in the device's firmware. It that's true, changing the > communication pattern might work around the problem. Sorry, but here's another pertinent factor that I should have mentioned earlier if I haven't: It could very well be a bug in the UA-101's firmware, since if I connect an audible input and watch it on the UA-101's builtin levels meter (with it already on), I think it does freeze. The levels remain when the input is unplugged, after plugging in USB while running a nonworking commit. Thanks for the insight! I suppose I should start comparing the code to each module, since I think that's what you're suggesting. |