Bug 199235
Summary: | Playback delay reporting broken for kernels >= 4.14 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Georg Chini (georg) |
Component: | Sound(ALSA) | Assignee: | Jaroslav Kysela (perex) |
Status: | RESOLVED INVALID | ||
Severity: | normal | CC: | georg, tanuk, tiwai |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.14 and above | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
oscilloscope recording with 4.13 ALSA code
oscilloscope recording with 4.16-rc6 ALSA code Always update tstamp for playback streams oscilloscope recording with 4.16 using snd_pcm_status_get_avail() |
Created attachment 274983 [details]
oscilloscope recording with 4.16-rc6 ALSA code
The green line is the input and trigger signal, the yellow line is the output signal.
Hm, through a quick glance of git output, between 4.13 and 4.14, there aren't so significant changes that may influence on the PCM delay reporting. Lots of PCM core change happened between 4.12 and 4.13, but between 4.13 and 4.14, it's mostly cosmetic. Could you try bisection? Created attachment 275023 [details]
Always update tstamp for playback streams
I did a bisection and the offending patch is between 4.14 and 4.15, not as I thought between 4.13 and 4.14. This is the commit:
20e3f985bb875fea4f86b04eba4b6cc29bfd6b71 ALSA: pcm: update tstamp only if audio_tstamp changed
I attached a patch that fixes the issue for me. I don't know however if it is the right way to fix the bug.
It was the stable backport to 4.14.x, then. Did you set report_delay bit in the audio_tstamp_config? Without this, the driver like USB-audio won't give the precise position, so such a problem can happen. Not sure. I am using standard pulseaudio and I guess it sets the bit. What needs to be done to set it? Then I can check what PA does. At any rate, without the change (or with the patch I attached) I am getting stable results again. The "stable" timestamp update isn't what it's supposed to be reported back. The timestamp is corresponding to the audio position, and in the case of USB-audio, the primary position report via hwptr is updated only in the period level, hence the timestamp is updated only coarsely, too. This is the designed behavior, and it's no regression but rather a fix. That is, the behavior you expected was a side-effect of the new feature addition, and rather a bug that was fixed at a later point. Meanwhile, some drivers track the "delay" at the more accurate position from hwptr, and USB-audio is one of such drivers. If the mentioned audio_tstamp_config.report_delay flag is set, the API calculates this delay and it should update the timestamp as well. So, I suspect PA doesn't set it. If it even sets the flag but the timestamp and the audio_tstamp aren't updated, again, it's the restriction of the hardware. Both timestamp and audio_tstamp are tied, and updated together. Oh, what needs to be done: it's a flag set via snd_pcm_status_set_audio_htimestamp_config(), and it has to be set before the call of snd_pcm_status(). Sorry, but it is definitely not a "fix". I am measuring the resulting end-to-end latency with an oscilloscope, so I know that the reported values have been correct before the "fix". I know that the hw_ptr is only updated coarsely, but there is also the delay value (which currently is updated only on frame boundary). So the reports should at least be exact to 2ms. It looks like with the "fix" the reported latency alternates between two values. Also I am not using the delay directly but use a filtered version of it. This works perfect for all kind of devices and virtual sinks and only broke for kernels after 4.14. BTW, I used the same algorithm before the patch that the "fix" wants to fix and I did not have any such problems before. I have done a lot of debugging and measuring, so I am rather sure of what I am saying. I have even replaced the current logic to calculate the delay by using time stamps, which again improves stability of the latency reports. Where is the function you mention documented? Google does not return any results. If that patch fixes an issue that has been introduced by another commit, maybe that commit has a problem? See https://www.freedesktop.org/software/pulseaudio/misc/rate_estimator.odt for the algorithm I am using to get stable delay reports. The document has never been reviewed, so it may be wrong in places, but the general approach works very well. This document is from 2016 and I guess the bug that you are "fixing" was not there by that time. One more remark and then I shut up: If the reported latencies are wrong without the fix, why am I able to keep the end-to end latency correct and stable to 90 microseconds without the fix? It works even overnight - I just run a test last night. With the "fix" it breaks immediately. I have re-read what you were saying and I can confirm that the delay value reported by the driver is used. So the flag you are referring to must be set. As said, I changed the logic of the delay calculation to deliver more exact values. I'll send the patch I am using to you privately because it is not directly related to this bug. Be aware that my mail might end up in your junk folder because my domain has a .tk ending. A likely missed point is that you need both audio timestamp and timestamp in sync. If only one of two is updated while another remains, it's basically useless because you can't judge at which point it's taken. The timestamp isn't only for the current time, but it's rather to get the time coupled with the audio frame. If you expected only the timestamp, that's the problem. So, the previous fix was to correct this -- now the update of the timestamp is done only when the audio timestamp (i.e. the audio position) gets updated. Without this fix, only the timestamp may be updated while the audio position stalls. This is not the designed behavior. Now, a part of the problem is that some drivers can't provide the delay in high accuracy. And correcting this can't be done easily with the system timer as used in your patch, unfortunately. It's because the system timer is a different timing source and the time flows in a different pace from the audio device. The system time 1 second is not the audio device 1 second. That's the reason why we have two timestamps in sync. PA uses snd_pcm_status_get_delay() to get the delay from ALSA and snd_pcm_status_get_htstamp() to get the time stamp corresponding to the delay. snd_pcm_avail() is called immediately before retrieving the status. Should that not deliver the right delay? Are we doing something wrong? Actually I think the fix is wrong for the following reason: If I understand right, the driver provided delay is updated at the time I retrieve the status. This time differs from the audio time stamp and the delay reported by the driver does not refer to the audio time stamp but to the current time. The complete delay is delay_at_audio_time_stamp + delay_reported_by_driver_at_current_time. So the delay snapshot is NOT taken at the time of the audio time stamp but at the time of the snd_pcm_status() call. Therefore the time stamp can (and must) be different from the audio time stamp. Some more questions/remarks: First you do not explain why it worked with kernels since 2016 when the change in question was not implemented. The results in the document I referenced are from 2016 and I know it works with 4.12 and 4.13. Again, I think the fix is wrong and should be reverted. Second you did not answer my question, how it is possible that I can keep the latency stable with wrong latency reports while the right latency reports make it impossible. Within module loopback I have to predict the next latency which is seen after a second. This prediction is right to an average of around +/-2 microseconds without the "fix" and with the "fix" the error goes up to +/-70 microseconds and more. This prediction error has proven to be a very good indicator for the reliability and correctness of the latency reports in other situations. Third, the mismatch between soundcard time and system time should not matter on the timescale we are talking about. With the current setup of PA, hw_ptr is updated every 6ms. Assume that the deviation between sound card and system time is 1% (which should be more than you will ever see). Then using system time, the computed delay can be off by 60 microseconds at maximum, while the delay computed using USB frames can be off by 2ms. I would say, this is a big improvement. Since PA configures ALSA for 2s delay, I don't think there is a situation where a significantly larger number of USB frames is used. I just tested what happens if I always return the delay at audio time stamp from the driver and it seems that my reasoning above why the fix should be reverted is in fact to be correct. I am getting much better results if the driver reported delay is not used at all. BTW, I did my tests above also without my patch for the USB delay reporting to exclude some bug in my patch. Yeah, the timestamp at the snd_pcm_status() might be after the delay update, but also that's the point you're expecting wrongly: you cannot rely on the accuracy more than the defined granularity. Measuring more than that cannot work, per definition. Before introducing the htimestamp infrastructure, the timestamp behaved like that we're seeing now after the fix. IOW, the regression was introduced by the htimestamp change. That is, the behavior relying on the former htimestamp behavior doesn't work as planned. It may work on the hardware that can report the precise link and timestamp information (like HD-audio), but for others, it's no reliable information, after all. That said, what worked in the past doesn't mean that it's doing correctly. Especially if it relies on the information that can't be used always 100%. I do not understand what you mean by granularity. If you mean the time between hw_ptr updates then I think you can expect more accuracy - that's what the driver delay reporting is for, because it works between the hw_ptr updates. What sense would it make otherwise? You are saying yourself that some drivers - including USB - support higher accuracy via the internal delay reporting in comment #6. I also think you still do not understand my argument: You are saying the time stamp must be equal to the point in time when the hw_ptr was last updated. I think that is wrong. The time stamp must be equal to the point in time when the complete delay was last updated and that happens during a snd_pcm_status() call. So keeping the time stamp at audio time stamp makes the driver internal delay reporting useless because you do not know at which time that delay was last updated. Applying the delay to the time when the audio time stamp was taken is simply wrong because the delay was not retrieved at the same time that the audio time stamp was taken. For drivers which do not support delay reporting the fix is correct, but NOT for drivers that support higher accuracy. Also all my tests show that my argument is right: 1) I can keep the end-to-end latency of a loopback correct and stable with delay reports that should according to you be wrong but I can't do that with the reports you consider correct. How can you explain that away? 2) If I keep the "fix" and ignore the driver reported delay (by always returning last_delay from snd_usb_pcm_delay()) the results are much better than with delay reporting. I do not see any weird behavior like with fix + delay reporting. Then the granularity is higher but that's not a problem for the algorithm I am using. The issue that suddenly the delay apparently jumps up and then back down is gone, just the precision is a bit less. This also shows that the usage of the delay is not correct. Let me try to explain somewhat better what I think happens: - When hw_ptr changes, hw_ptr, runtime->delay, audio time stamp and time stamp are all updated together. At this point in time, we have the exact delay at the time of time stamp. - Now snd_pcm_status() is called, which changes runtime->delay to reflect the internal delay at the current time. Together with the hw_ptr this gives the complete delay at the CURRENT time, not at the time when the audio time stamp was taken. So the corresponding time stamp needs to be updated to the current time and should no longer reflect the time of the hw_ptr change. - As said above, the internal delay is useless if you do not know at what time it was updated. - If you don't have delay reporting, then the time stamp should NOT be updated when snd_pcm_status() is called, because in this case we do not have any information about the current delay. (In reply to Georg Chini from comment #18) > Let me try to explain somewhat better what I think happens: > > - When hw_ptr changes, hw_ptr, runtime->delay, audio time stamp and time > stamp are all updated together. At this point in time, we have the exact > delay at the time of time stamp. > > - Now snd_pcm_status() is called, which changes runtime->delay to reflect > the internal delay at the current time. Together with the hw_ptr this gives > the complete delay at the CURRENT time, not at the time when the audio time > stamp was taken. So the corresponding time stamp needs to be updated to the > current time and should no longer reflect the time of the hw_ptr change. > > - As said above, the internal delay is useless if you do not know at what > time it was updated. > > - If you don't have delay reporting, then the time stamp should NOT be > updated when snd_pcm_status() is called, because in this case we do not have > any information about the current delay. If the hwptr update is coarse and happening only at period level, the audio timestamp is kept, thus the time stamp isn't updated, either. That's the fix that broke your use case. Now, when the hwptr update happens during periods, it means that the device can provide more-or-less accurate position. Then we rely on it and update the timestamp accordingly. The same is true for the delay. It's a sort of "best-effort" timestamp. So, indeed, the accuracy of the timestamp you can get in the latter two cases depends on the granularity of hwptr update and delay update. This has been same from the old days, and the htimestamp stuff doesn't improve the situation for the default mode, either. IOW, for some devices, you can't get the exact audio timestamp. That's true, and it's so per design. This is no "bug", per se. If you need more improvement, it means that we need to extend the API. And, this must be discussed on ML. So, please go to alsa-devel ML, submit your proposals, etc. Thanks. Sorry for being insistent. I will stop after that mail, but now I have a definite proof that snd_pcm_status_get_delay() returns wrong values, so please think again about what I am saying. I changed the PA code with your "fix" in place to use snd_pcm_status_get_avail() and the time that has passed since the hstamp to calculate the delay. With this I am again able to stabilize the end-to-end latency of a loopback. The result is not as good as when I remove the "fix" and use snd_pcm_status_get_delay(), but it is still reasonable. This shows, that the problem is definitely not my algorithm or the error of a single delay report, but it is a systematical error in snd_pcm_status_get_delay() when the "fix" is in place. Let me try to explain (again): snd_pcm_status() calculates the delay as follows: status->delay = runtime->buffer_size - status->avail; status->delay += runtime->delay; and - with your fix - always returns the audio time stamp. At the time of audio time stamp, runtime->delay is maximal because the driver just consumed a chunk of data. Consider the following two situations: 1) snd_pcm_status() is called shortly before the hw_ptr update. In that case, runtime->delay will be near zero. Now you add this "near zero" delay to the buffer delay and report this as the delay at the audio stamp. So you are saying "At the time of the audio time stamp, the delay was buffer delay + something near zero". 2) snd_pcm_status() is called shortly after the hw_ptr update. In that case, runtime->delay is near maximal. Now you add this "near maximum" delay to the buffer delay. So you are saying "At the time of the audio time stamp, the delay was buffer delay + something near maximal" What sense does that make? At the time of the audio time stamp, the delay was in both cases buffer delay + maximum runtime->delay, so the current value of runtime->delay should not be used at all when calculating the delay at audio time stamp. This also explains what I am seeing - sudden apparent jumps in the end-to-end latency when the timing slightly changes within PA. These jumps have nothing to do with how correct a single delay report from ALSA is. My algorithm does not rely on the precision of a single delay report. On the contrary, it is designed to smooth out just those errors. But a systematical error like described above cannot easily be smoothed out. Therefore I do not need more improvement - I just need the same precision that has been available since 2016. Created attachment 275035 [details]
oscilloscope recording with 4.16 using snd_pcm_status_get_avail()
To visualize the results I add a screen shot of what I am getting using snd_pcm_status_get_avail(). Refer to the first two images for the other results.
- without fix and using snd_pcm_status_get_delay(): stable to 90 microseconds
- with fix and using snd_pcm_status_get_delay(): stable to 800 microseconds
- with fix and using snd_pcm_status_get_avail(): stable to 120 microseconds
The fix commit corrects only about the timestamp, and runtime->delay value itself isn't touched at all. If the delay value itself changes from the former case, it must be something else. Do you mean that, or just talking about the behavior change of timestamp? And, one more note: as you already noticed, the timestamp in snd_pcm_status can't be reliable at all unless you operate it with the specific timestamp mode for the specific hardware. In the default generic case, the least accuracy is the granularity of hwptr or delay update. The behavior before the previous fix (what you're referring as "the correct one") means nothing but "just always calling the system time query". This doesn't improve the accuracy at all, per se. That is, you could get the same result by just inquiring the system time from user-space as well. The change around 2016 was to add the support of high precision sample / time reporting for HD-audio. For that specific setup, you can get the precise result. But for others, as already said, it's nothing but a "best effort", and you can't rely on it for calculating something finer than the expected granularity. We may extend the API of timestamp update mode to do only at the hwptr update, which is the most accurate; but it'd be an API change, so it has to be discussed on ML, not on bugzilla, in anyway. OK, one final mail. You are saying: The fix commit corrects only about the timestamp, and runtime->delay value itself isn't touched at all. That is exactly my point. The time stamp is wrong, not the delay. I have the final confirmation that what I am saying is in fact true: When I use the delay returned by snd_pcm_status_get_delay() together with the CURRENT time, it works perfect and I am back to my 90 microsecond stability. If I am using the hstamp it breaks. The reported delay is valid at the CURRENT time, not at the audio tstamp time. That is what I have been saying all along. The last sentence is only true if runtime->delay is updated by the driver. Then the total computed delay value changes with runtime->delay between updates of hw_ptr. If the driver does not support delay reporting, then runtime->delay never changes and therefore the reported delay always reflects the delay at audio time stamp. (In reply to Georg Chini from comment #23) > That is exactly my point. The time stamp is wrong, not the delay. Yes and no. It's not "wrong". You CANNOT get any timestamp accurately for the arbitrary audio position. So this is somewhat estimated timestamp, and it's been so from the very beginning for the default case. > I have the final confirmation that what I am saying is in fact true: When I > use the delay returned by snd_pcm_status_get_delay() together with the > CURRENT time, it works perfect and I am back to my 90 microsecond stability. > > If I am using the hstamp it breaks. > > The reported delay is valid at the CURRENT time, not at the audio tstamp > time. That is what I have been saying all along. The audio timestamp isn't updated, and this means it has the limited accuracy. And even the current time you get from the system doesn't overcome that point. In anyway, what you were reading was nothing but the current time a la gettimeofday(). And this isn't what API was intended for. Again, if you need another type of timestamping, feel free to propose on ML. We may still extend the API if anything is missing. You still don't understand my point. What I am saying is that you take a delay snapshot at time X (the time when runtime->delay was updated) and then pretend it is from time Y (the time when hw_ptr was updated). This is plain wrong and has nothing to do with accuracy. How do you explain, that it works perfect, when I simply replace the time from htstamp by the current time I am reading from the system? It is exactly the same code otherwise, I am only replacing one single line. With the htstamp I am getting 800 microseconds stability with weird jumps in the latency while with system time I get 90 microseconds stability with smooth changes. I even double checked that this is indeed true and recompiled everything twice. For me personally the problem is solved - using system time delivers what I expect, but somehow I am reluctant to leave it at that. Regarding accuracy of delay reports: You are saying that you can't expect better accuracy than the granularity. This is true for a single read. But take a look at the document I referenced in comment #10 or take a look at the zita-ajbridge documentation if you do not believe me. It is very well possible to estimate the delay to a higher precision than that of a single read by using filtering. As already said, the algorithm I am using does not rely on the precision of a single read. (In reply to Georg Chini from comment #26) > You still don't understand my point. What I am saying is that you take a > delay snapshot at time X (the time when runtime->delay was updated) and then > pretend it is from time Y (the time when hw_ptr was updated). This is plain > wrong and has nothing to do with accuracy. Hm? The timestamp gets updated when runtime->delay is updated, too, as long as you set the timestamp config properly. This was already mentioned in comment 14. Did you really confirm that it's set properly? And, one more thing: the audio "position" corresponding to the reported timestamp is the audio timestamp. It's reported via snd_pcm_status_get_audio_htstamp(). When htimestamp is referred, the audio timestamp must be used as the base, instead of snd_pcm_status_get_delay() value. I still can't find the function you mentioned and you did not answer my question where it is documented. Maybe I misunderstood. It may indeed be possible that PA is doing things wrong because what I am seeing is that the time stamp is not updated to the time when runtime->delay changed. Found it. The name you gave was slightly wrong. I'll test. Yes, you were right. With the flag it works as expected. I misunderstood you (see comment #12). Sorry for the noise and thanks a lot for your patience. But then there is still a small bug in the code which lead to the weird behavior I have been seeing: When report_delay is not set, runtime->delay should also not be included in the delay estimation. Otherwise it is true what I stated in the first paragraph of comment #26. The current behavior is for compatibility reason. Even before introducing htimestamp extension, we've already reported the delay count via snd_pcm_status(). And the timestamp update was done only at the hwptr update. This is still the default behavior. With the htimestamp extension, a slight improvement was done regarding timestamping. But for keeping the compatibility with the old behavior, yet a new flag was introduced to adjust *only* the timestamp behavior. This is the report_delay flag in snd_pcm_audio_tstamp_config. Without the flag on, the behavior is kept as same as in many years ago. And, as mentioned, for the timestamp, the base audio position to be referred is not the reported delay, but the reported audio_htimestamp. The latter may or may not include the delay depending on the snd_pcm_audio_tstamp_config, and it always points to the audio position corresponding to the timestamp. Now I figured out, why I never hit this problem before. In all kernels since 4.1 the htstamp was updated unconditionally when snd_pcm_update_hw_ptr0() was called. That behavior changed in 4.15 and in the 4.14 backport. |
Created attachment 274981 [details] oscilloscope recording with 4.13 ALSA code Hello, I have been working on improving latency reports within Pulseaudio. This worked very well for kernels below 4.14. From kernel 4.14 onwards the playback delay reported by ALSA shows much larger variance in some situations than it did in previous kernel versions. This is especially obvious when using the PA loopback module to play back a HDA source to an USB sink. To reproduce the issue, I am running the loopback module at 200ms end-to-end latency and record input and output signal on an oscilloscope for about 10 minutes. Kernel versions I have been testing with are 4.13.0 which works and 4.14.28/4.16-rc6 which don't work. To verify that this is really an issue in the ALSA code, I did the following: 1) I first thought it was something in the USB code, so I replaced the sound/usb subtree in 4.14.28 with the code from 4.13.0. This did not change anything. 2) I replaced the whole sound and include/sound subtree in 4.14.28 with the code from 4.13.0. This worked again, so the problem is clearly located in the ALSA changes between 4.13 and 4.14. As a demonstration of the issue I attached two pictures with the oscilloscope recordings from a 4.14.28 kernel using the 4.13.0 ALSA code and an unmodified 4.16-rc6. As you can see, with the first kernel I am able to keep the end-to-end latency stable to around 90 microseconds while with newer kernels this is impossible and I can only achieve around 800 microseconds stability. Time scale of the first recording is 100 usec/unit while it is 200 usec/unit for the second. I suspect the reason is a missing lock somewhere in the core ALSA code, but with my limited kernel knowledge I am unable to track it down further. Regards Georg