Bug 199327

Summary: [Bisected][Regression] with duplex sound introduced since 4.15.5 to present (5e35dc0338d85ccebacf3f77eca1e5dea73155e8)
Product: Drivers Reporter: Sylvain (brutus_dansereau)
Component: Sound(ALSA)Assignee: Jaroslav Kysela (perex)
Status: RESOLVED CODE_FIX    
Severity: normal CC: alexander, brutus_dansereau, geraldogabriel, lapeyl, senderberlin, tiwai
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.15.5 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: config.txt
Include_Linux_log2.patch
makefile_pie.patch
tools_include_linux_log2.patch
emerge-info.txt
alsa-info.txt
ver_linux.txt
cpuinfo.txt
ioports.txt
iomem.txt
interrupts.txt
lspci.txt
scsi.txt
devices.txt
lsusb -v of Behringer UFX1604
max_packs_per_urb.patch
packs_per_urb_sync.patch
Patch to set clock selector to default clock upon rate change
More complete patch disabling unneeded implicit feedback and setting clock selector to default clock on rate change for UFX1604
lsusb -v of UFS1204
Patch to set the UFX1204 clock selector to default clock upon rate change. Removes hacky implicit feedback entries
Patch to fix clock selector setup

Description Sylvain 2018-04-08 22:46:25 UTC
Created attachment 275171 [details]
config.txt

[1.] One line summary of the problem:

duplex profiles are no longer possible in pulseaudio since commit on UFX1204



[2.] Full description of the problem/report:

I lost the duplex functionality in pulse audio on my UFX1204 (usb connection) since commit : 5e35dc0338d85ccebacf3f77eca1e5dea73155e8

this applies to pulseaudio's (auto-profiles = yes) setting in default.conf as well as custom profiles using a udev rule
as soon both input and output mappings are defined in a profile, it is rejected with a device busy error like
    D: [pulseaudio] alsa-util.c: Set neither period nor buffer size.
    I: [pulseaudio] alsa-util.c: snd_pcm_hw_params failed: Device or resource busy


as a result, pacmd list-cards is missing profiles.

	profiles before commit:
		input:multichannel-input: Multichannel Input (priority 1, available: unknown)
		output:analog-surround-40: Analog Surround 4.0 Output (priority 700, available: unknown)
		output:analog-surround-40+input:multichannel-input: Analog Surround 4.0 Output + Multichannel Input (priority 701, available: unknown)

	profiles after commit:
		input:multichannel-input: Multichannel Input (priority 1, available: unknown)
		output:analog-surround-40: Analog Surround 4.0 Output (priority 700, available: unknown)



4aacd757d564dbe76e0b6caed0412da241cf9ea3 is the first bad commit
commit 4aacd757d564dbe76e0b6caed0412da241cf9ea3
Author: Lassi Ylikojola <lassi.ylikojola@gmail.com>
Date:   Fri Feb 9 16:51:36 2018 +0200

    ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204
    
    commit 5e35dc0338d85ccebacf3f77eca1e5dea73155e8 upstream.
    
    Add quirk to ensure a sync endpoint is properly configured.
    This patch is a fix for same symptoms on Behringer UFX1204 as patch
    from Albertto Aquirre on Dec 8 2016 for Axe-Fx II.
    
    Signed-off-by: Lassi Ylikojola <lassi.ylikojola@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 a3ddb37e4106e77565835e0e6596b88a766e371c e9c8ab955451cf5a6f0b96d42d316b043b262c55 M	sound



[3.] Keywords (i.e., modules, networking, kernel):

sound
usb


[4.] Kernel information


#The bisect tests were done using 
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
#I slightly patched the bisect sources so it compiles from my gentoo system 
see Include_Linux_log2.patch
see makefile_pie.patch
see tools_include_linux_log2.patch


[4.2.] Kernel .config file:

see config.txt


[5.] Most recent kernel version which did not have the bug:

4.14.4


[6.] Output of Oops.. message (if applicable) with symbolic information
     resolved (see Documentation/oops-tracing.txt)
----


[7.] A small shell script or example program which triggers the
     problem (if possible)
----



[8.] Environment

see emerge-info.txt
see alsa-info.txt


[8.1.] Software (add the output of the ver_linux script here)

see ver_linux.txt


[8.2.] Processor information (from /proc/cpuinfo):

see cpuinfo.txt


[8.3.] Module information (from /proc/modules):

efivarfs 16384 1 - Live 0x          (null)


[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

see ioports.txt
see iomem.txt
see interrupts.txt


[8.5.] PCI information ('lspci -vvv' as root)

see lspci.txt


[8.6.] SCSI information (from /proc/scsi/scsi)

see scsi.txt


[8.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):


#cat /proc/asound/devices
see devices.txt
Comment 1 Sylvain 2018-04-08 22:49:07 UTC
Created attachment 275173 [details]
Include_Linux_log2.patch
Comment 2 Sylvain 2018-04-08 22:49:45 UTC
Created attachment 275175 [details]
makefile_pie.patch
Comment 3 Sylvain 2018-04-08 22:50:08 UTC
Created attachment 275177 [details]
tools_include_linux_log2.patch
Comment 4 Sylvain 2018-04-08 22:50:42 UTC
Created attachment 275179 [details]
emerge-info.txt
Comment 5 Sylvain 2018-04-08 22:51:02 UTC
Created attachment 275181 [details]
alsa-info.txt
Comment 6 Sylvain 2018-04-08 22:51:22 UTC
Created attachment 275183 [details]
ver_linux.txt
Comment 7 Sylvain 2018-04-08 22:51:41 UTC
Created attachment 275185 [details]
cpuinfo.txt
Comment 8 Sylvain 2018-04-08 22:52:02 UTC
Created attachment 275187 [details]
ioports.txt
Comment 9 Sylvain 2018-04-08 22:52:19 UTC
Created attachment 275189 [details]
iomem.txt
Comment 10 Sylvain 2018-04-08 22:52:39 UTC
Created attachment 275191 [details]
interrupts.txt
Comment 11 Sylvain 2018-04-08 22:52:59 UTC
Created attachment 275193 [details]
lspci.txt
Comment 12 Sylvain 2018-04-08 22:53:19 UTC
Created attachment 275195 [details]
scsi.txt
Comment 13 Sylvain 2018-04-08 22:53:38 UTC
Created attachment 275197 [details]
devices.txt
Comment 14 senderberlin 2019-08-24 15:21:07 UTC
The problem occurs because the endpoint 0x81 (acting as implicit feedback ep) was already started for the output stream as sync_endpoint (use_count = 1).

If the driver then tries to set the format for the input stream for endpoint 0x81, this will fail because the endpoint is already running (use_count = 1 via output stream), however, the SUBSTREAM_FLAG_DATA_EP_STARTED flag is not set for this stream. The endpoint 0x81 can no longer be stopped and reports -EBUSY.

What is the correct way for the driver to handle such circumstances? Does this possibly affect other devices with implicit feedback quirk?

A quick-fix would be to force the stop of endpoints via stop_endpoints()
in pcm.c in line 268:

```
if (test_and_clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags))
    snd_usb_endpoint_stop(subs->data_endpoint);

```

change to:

```
if (test_and_clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags) || 
   (subs->data_endpoint && subs->data_endpoint->use_count != 0))
    snd_usb_endpoint_stop(subs->data_endpoint);
```
Comment 15 Sylvain 2019-08-25 23:05:50 UTC
I previously compiled the gentoo kernel surces 4.20.16 and the problem does not seem to be present.   I did not check the vanilla sources 4.20.16 but previously Gentoo sources (post bisect till somewhere before 4.20.16) had the issue same as the vanilla sources.
Comment 16 senderberlin 2019-08-27 05:58:44 UTC
(In reply to Sylvain from comment #15)

I use Debian with kernel 4.19.0. After your comment I tested Kernel 5.3.0 which has the same error for me: 

$ pulseaudio -v 

I: [pulseaudio] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_PREPARE failed (-16)
I: [pulseaudio] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_PREPARE failed (-16)
I: [pulseaudio] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_PREPARE failed (-16)
I: [pulseaudio] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_PREPARE failed (-16)
I: [pulseaudio] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_PREPARE failed (-16)
I: [pulseaudio] alsa-util.c: snd_pcm_hw_params failed: Device or resource busy
...

$ sudo dmesg

usb 1-1: Unable to change format on ep #81: already in use
usb 1-1: Unable to change format on ep #81: already in use
usb 1-1: Unable to change format on ep #81: already in use
...

However, I use the Behringer UFX1604, but I think it is almost identical to the UFX1204. 

Could it be that you have temporarily updated alsa-lib or pulseaudio and this solved your problem? As far as I understand the code, nothing has changed significantly in this regard on the snd-usb-audio driver.

alsa-lib 1.1.8

$ pulseaudio --version
pulseaudio 12.2
Comment 17 Takashi Iwai 2019-08-27 10:19:20 UTC
Is your UFX1604 1397:0001?  There is a patch that makes UFX1604 behavior compatible with UFX1204, which will land into upstream 5.3-rc7.
In short, it's:

--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -339,6 +339,7 @@ static int set_sync_ep_implicit_fb_quirk(struct snd_usb_substream *subs,
                ep = 0x81;
                ifnum = 2;
                goto add_sync_ep_from_ifnum;
+       case USB_ID(0x1397, 0x0001): /* Behringer UFX1604 */
        case USB_ID(0x1397, 0x0002): /* Behringer UFX1204 */
                ep = 0x81;
                ifnum = 1;


About the full duplex behavior.  Basically you can't change the capture stream status (format, rate, etc) during the implicit fb usage.
A quick workaround would be just to skip instead of -EBUSY, something like:

--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -858,10 +858,10 @@ int snd_usb_endpoint_set_params(struct snd_usb_endpoint *ep,
 	int err;
 
 	if (ep->use_count != 0) {
-		usb_audio_warn(ep->chip,
+		usb_audio_dbg(ep->chip,
 			 "Unable to change format on ep #%x: already in use\n",
 			 ep->ep_num);
-		return -EBUSY;
+		return 0; /* FIXME: assuming the unchanged condition */
 	}
 
 	/* release old buffers, if any */

But this may result in inconsistent state, so we'd need a restriction via hw_constraint as a proper fix, I guess.
Comment 18 senderberlin 2019-08-27 13:22:08 UTC
(In reply to Takashi Iwai from comment #17)

Thank you for your message.

> Is your UFX1604 1397:0001?  There is a patch that makes UFX1604 behavior
> compatible with UFX1204, which will land into upstream 5.3-rc7.
> In short, it's:
> ...

Yes, that is the USB device id. I know about the patch because I had created the corresponding bug report and have this patch in my code. Exactly this implicit_fb_quirk leads to the problem of this bug report in the way pulseaudio acts with the devices.

> About the full duplex behavior.  Basically you can't change the capture
> stream status (format, rate, etc) during the implicit fb usage.
> A quick workaround would be just to skip instead of -EBUSY, something like:
> ...
> But this may result in inconsistent state, so we'd need a restriction via
> hw_constraint as a proper fix, I guess.

So the problem seems to be the use of shared endpoints between interfaces and potentially can affect all implicit_fb_quirk devices ?!

UFX1204/1604 uses:

```
ifnum 1 (out):
  data EP: 0x02
  sync EP: 0x81

ifnum 2 (in):
  data EP: 0x81
```

This special case should be recognizable in order to handle set_format differently. I came to two possible ways to deal with it:

1. Stop endpoints if use_count 1 to set the format (potentially interrupting already created stream?).
2. Do not set the format if endpoint use_count 1 (assuming the unchanged condition), so your solution.

In this case, is it anyway hard request that the shared endpoint must have the same format? I do not know enough about the USB spec and what the more correct way is atm.
Comment 19 Takashi Iwai 2019-08-27 14:07:17 UTC
It's possible to add some hw_constraint to restrict the parameters dynamically depending on the counterpart stream state.  I'm going to cook that once after confirming that the culprit is really that.   That is, if the simple fix I suggested in comment 17 really helps, I'll start looking at it.
Comment 20 Sylvain 2019-08-29 14:02:05 UTC
(In reply to senderberlin from comment #16)

My mistake, Please disregard my comment #15, the issue is still present
Comment 21 senderberlin 2019-08-30 23:20:23 UTC
Yes, the fix from comment #17 seems to tackle the problem.

Quick tests Behringer UFX1604:

without fix:

Pulseaudio: has only playback or capture device
Reaper via Pulseaudio: only in or out works, not both at the same time
Reaper via ALSA: in and out at the same time (full duplex) ok
Renoise via ALSA: only in or out are acivatable

with fix:

PulseAudio: device is listed as Full Duplex
Reaper via Pulseaudio: in and out at the same worked glitchy, most time reaper freezes (maybe settings in pa)
Reaper via ALSA: in and out at the same time ok
Renoise via ALSA: in and out at least activatable at the same time with fix (but freezes when trying to stream audio when both active)

Renoise log:

Renoise LOG> ALSA: Opening ALSA Playback Device 'hw:1,0 (USB Audio)'...
Renoise LOG> ALSA: Open ALSA Output Device OK
Renoise LOG> ALSA: Max channels is 4 for Playback...
...
Renoise LOG> ALSA: Opening ALSA Capture Device 'hw:1,0 (USB Audio)'...
Renoise LOG> ALSA: Open ALSA Capture Device OK
Renoise LOG> ALSA: Max channels is 16 for Capture...
...
Renoise LOG> ALSA: Setup Buffers...
Renoise LOG> ALSA: Configuring the Player...
Renoise LOG> ALSA: Creating the poll thread...
Renoise LOG> ALSA: Successfully created an ALSA RT thread
Renoise LOG> ALSA: Up and running...
Renoise LOG> Threads: Timeout while waiting for Thread2 to finish its commands!!

Info: Focusrite Scarlet 6i6 can be used in all cases mentioned above as full duplex on the same machine, same usb port

Somehow I had unstable behavior in the tests with the fix with no reproducibility or clear error logs. I am a bit busy atm because of work but I will try to further investigate the problem.
Comment 22 Takashi Iwai 2019-09-03 13:20:49 UTC
OK, could you give lsusb -v output of the corresponding device?
I'll see what can be done in the driver, but would like to check the actual endpoints beforehand.
Comment 23 senderberlin 2019-09-03 18:53:34 UTC
Created attachment 284805 [details]
lsusb -v of Behringer UFX1604

lsusb -v of Behringer UFX1604
Comment 24 Alexander Tsoy 2020-03-30 17:03:32 UTC
The driver currently doesn't properly support SYNC endpoints. So I believe implicit feedback quirk was enabled for several devices as a workaround. That way playback stream is synchronized to capture stream which in turn is synchronized to SOF by the device. %)
Comment 25 Alexander Tsoy 2020-04-11 02:16:45 UTC
(In reply to Alexander Tsoy from comment #24)
> The driver currently doesn't properly support SYNC endpoints. So I believe
> implicit feedback quirk was enabled for several devices as a workaround.
> That way playback stream is synchronized to capture stream which in turn is
> synchronized to SOF by the device. %)
Sorry, I was wrong. It seems isochronous transfers are always synchronized to SOF.
Comment 26 Alexander Tsoy 2020-04-11 02:36:17 UTC
Created attachment 288337 [details]
max_packs_per_urb.patch

Please try attached patch. Decreasing the number of packs per URB fixes pops and clicks for my MOTU Microbook IIc which also has synchronous enpoints. This is especially noticeable on high sample rates like 96 kHz.
Comment 27 senderberlin 2020-04-14 22:05:37 UTC
(In reply to Alexander Tsoy from comment #26)
> Created attachment 288337 [details]
> max_packs_per_urb.patch
> 
> Please try attached patch. Decreasing the number of packs per URB fixes pops
> and clicks for my MOTU Microbook IIc which also has synchronous enpoints.
> This is especially noticeable on high sample rates like 96 kHz.

With the use of the implicit_fb_quirk I did not notice pops and clicks with the Behringer UFX. However, the use of full duplex creates problems. If I'm not mistaken it's about the same issue as in bug #103751 and bug #207023?!
Comment 28 Alexander Tsoy 2020-04-14 23:20:44 UTC
(In reply to senderberlin from comment #27)
Yes, I also can workaround pops and clicks issues by using implicit fb quirk. But it shouldn't be needed for devices that works is synchronous mode.

What I noticed is that these glitches happens only when driver packs a lot of iso frames in URB, e.g. when pulseaudio with tsched or jack with very large buffer is used. So maybe this is a bug in scheduling code in hcd drivers or something like that.
Comment 29 Alexander Tsoy 2020-04-14 23:42:22 UTC
And of course it is possible that Behringer UFX* devices and Microbook II/IIc actually have different problems. We can say for sure if somebody with Behringer device would check if decreasing MAX_PACKS resolves pops and clicks issue.
Comment 30 Sylvain 2020-04-15 18:41:08 UTC
I reported this bug concerning the full duplex issue on a UFX1204. 
I am currently using the ful duplex work around proposed in comment #17 which brings back full duplex capabilities. 

I can also confirm that I see no issue with pops and clicks using this workaround.  
I am not using a microbook computer but rather an intel DZ77GA-70K motherboard.
Comment 31 senderberlin 2020-04-15 19:31:18 UTC
(In reply to Alexander Tsoy from comment #29)
> And of course it is possible that Behringer UFX* devices and Microbook
> II/IIc actually have different problems. We can say for sure if somebody
> with Behringer device would check if decreasing MAX_PACKS resolves pops and
> clicks issue.

The max_packs_per_urb.patch from comment #26 did not resolves pops and clicks issue for me. I tested values 4, 2 and 1 for MAX_PACKS, all had the issue. So far, only the implicit_feedback_quirk has helped against pops and clicks.
Comment 32 Alexander Tsoy 2020-04-15 19:43:41 UTC
Created attachment 288509 [details]
packs_per_urb_sync.patch

That unfortunate. :(

OK, this is my last attempt. For me it gives even better result than with MAX_PACKS = 4. Could you give it a try?
Comment 33 senderberlin 2020-04-15 20:16:41 UTC
(In reply to Alexander Tsoy from comment #32)
> Created attachment 288509 [details]
> packs_per_urb_sync.patch
> 
> That unfortunate. :(
> 
> OK, this is my last attempt. For me it gives even better result than with
> MAX_PACKS = 4. Could you give it a try?

I really appreciate your effort! Unfortunately, this patch does not improve either: constant pops and clicks every few 100 ms or so (not loud, sounds like some sort of scheduling / buffer size mismatch stuff). If implicit_feedback_quirk is used, the sound is completely clear.
Comment 34 Geraldo Nascimento 2021-04-09 21:04:05 UTC
Created attachment 296313 [details]
Patch to set clock selector to default clock upon rate change

Could someone else test the attached patch and see if it resolves the pops and clicks on full-duplex mode for the UFX1604?

I own one of those and it took me quite some time to track down the problem. This patch resolves the problem for me.

I also turned off implicit feedback for my UFX1604 and I consider it unneeded.

Also, I'm interested in seeing an lsusb -v of the UFX1204.
Comment 35 Geraldo Nascimento 2021-04-10 01:26:50 UTC
Created attachment 296319 [details]
More complete patch disabling unneeded implicit feedback and setting clock selector to default clock on rate change for UFX1604

After re-reading the thread it is clear to me that implicit feedback for the UFX1604/UFX1204 needs to be disabled.

This is a more complete patch that disables that and for the UFX1604 only sets the clock selector to its pin 1 default clock synced to the USB SOF upon rate change. This is needed because apparently the endpoints are hardwired to the clock selector and after we change the rate on the main USB SOF synced clock the clock selector is left in a halfway state in regards to the sampling rate.

That's why the pops and clicks aren't evident at stock 48000Hz, become slightly audible at 44100Hz and detestable at 96000Hz. Seems the clock selector needs a nudge or it will screw up the sync.

Unfortunately I don't have access to the lsusb -v of the UFX1204 so please someone could share it here? This patch needs some more love from the community.
Comment 36 Sylvain 2021-04-10 03:10:28 UTC
As the OP of this bug I would like to point out that it is regarding the inability to use duplex mode  on an UFC1204 as a result of a regression.  For me Pops and clicks were never an issue prior or after the regression.  I still use comment #17 as a temporary fix.  Please open a different bug for pops and clicks issues.
Comment 37 Geraldo Nascimento 2021-04-10 05:36:27 UTC
Sylvain, I don't think you understand, the only reason implicit feedback was engaged for both the UFX1204 and the UFX1604 was because there were numerous complaints and Takashi Iwai thought it could solve the pops and clicks... see https://patchwork.kernel.org/project/alsa-devel/patch/20190820070110.4361-1-tiwai@suse.de/

Not only it doesn't solve pops and clicks it creates additional problems. I believe I tracked the crux of the problem with my patch and then we can solve your problem without the need for quick hacks.

Could you provide the lsusb -v of your UFX1204? It would mean much to me!
Comment 38 Sylvain 2021-04-11 01:10:27 UTC
Created attachment 296321 [details]
lsusb -v of UFS1204

My point exactly, if others are complaining about other stuff let them open bugs relating to their own issues.  This bug might be a consequence of changes made in attempt to fix other bugs but this regression bug is about loosing Duplex capabilities.  I add the lsusb of the UFX1204 as it can help you as well as this bug.
Comment 39 Geraldo Nascimento 2021-04-11 01:34:32 UTC
Thanks for the feedback, Sylvain. Your lsusb -v confirms that the clocks of the UFX1204 are similar to the UFX1604. I submitted my preliminary patch to alsa-devel and Cc'ed Takashi Iwai. He chastised me a bit for hard-coding the clock number into the kernel and provided a more generic approach that should work for the UFX1204 as well as for the UFX1604.

This all means that there's a good chance your regression can be soon fixed in mainline. The root cause of your regression is that the first bad commit you tracked down (thanks for that!) engages implicit feedback for a synchronous endpoint. The bad commit was done to try to fix complaints of noise / pops / clicks when these devices first for the UFX1204 then later for the UFX1604.

The bad commit is unneeded now that a proper fix is available.
Comment 40 Geraldo Nascimento 2021-04-11 01:42:21 UTC
Created attachment 296323 [details]
Patch to set the UFX1204 clock selector to default clock upon rate change. Removes hacky implicit feedback entries

Sylvain, could you please try the attached patch on top of a vanilla kernel and see if your PulseAudio issue is resolved without any nasty side effects?
Comment 41 Takashi Iwai 2021-04-13 08:40:24 UTC
I'm going to submit the clock selector setup patch and merge it for 5.13 kernel.

The removal of implicit feedback quirks is done in a separate patch, once after confirmed that it doesn't do anything harmful.  In principle, the implicit fb mode itself shouldn't do much bad.  There have been a few bugs at the beginning of 5.11 kernel that broke PA on some devices, and this could be the case, too.  Those should have been already fixed in the latest code, so please check it.
Comment 42 Takashi Iwai 2021-04-13 08:43:07 UTC
Created attachment 296369 [details]
Patch to fix clock selector setup
Comment 43 Takashi Iwai 2021-06-18 16:13:13 UTC
All should have been merged in 5.13.  Let's close the bug.
Comment 44 Sylvain 2021-11-07 15:52:21 UTC
It's working again, Thanks!