Bug 16300

Summary: [2.6.34 regression] mplayer gets out of sync due to problems with ALSA
Product: Drivers Reporter: Artem S. Tashkinov (aros)
Component: Sound(ALSA)Assignee: Ingo Molnar (mingo)
Status: CLOSED CODE_FIX    
Severity: blocking CC: clemens, perex, rjw, tiwai
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.34 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 15310    
Attachments: 2.6.34 .config
mplayer output while playing a video
dmesg, lspci and various information from /proc
mplayer's strace for an audio file
mplayer -v -v
dmesg with xrun_debug=3
dmesg with xrun_debug=0,29,0,29
Patch to correct invalid position at snd_pcm_period_elapsed()
Fix hwptr on emu10k1
2.6.35 rc4, emu10k1-position-fix2.diff
Patch to delay pcm irqs for specifies amount of samples (0-31)
dmesg with all three patches applied
dmesg with all three patches applied and delay_pcm_irq=8
dmesg with xrun_debug=11 and delay_pcm_irq=16
dmesg with xrun_debug=11, delay_pcm_irq=64 and 0xff change
dmesg with xrun_debug=11 and delay_pcm_irq=16
dmesg with xrun_debug=11 and delay_pcm_irq=64 (!)
Patch to delay pcm irqs for specifies amount of samples (0-31)
dmesg with xrun_debug=11, delay_pcm_irq=0 and emu10k1-delay-irq2.patch
xrun_debug=11 log, delay_pcm_irq=8 and emu10k1-delay-irq2.patch
xrun_debug=27

Description Artem S. Tashkinov 2010-06-26 20:39:38 UTC
Created attachment 26954 [details]
2.6.34 .config

I have the following problem:

I'm unable to watch movies in mplayer because mplayer randomly skips long chunks of video/audio thus video is randomly accelerated to a point where you cannot even understand what's being played.

Caveats:

1) This bug occurs in kernel 2.6.34 with Intel drivers
2) This bug does NOT occur in kernel 2.6.34 with NVIDIA drivers (on the same PC)
3) This bug does NOT occur in kernel 2.6.33 with any drivers (on the same PC)

I have a very fast CPU (Intel Core i5), playing videos in question consumes less than 5% of CPU, films are stored locally, so I do NOT have issues with IO or system load.

When I play videos my PC is usually 100% idle.

I have captured mplayer's output, you can download it from here: http://www.mediafire.com/?y2ymtyz3yzt
Comment 1 Artem S. Tashkinov 2010-06-26 21:05:34 UTC
Created attachment 26955 [details]
mplayer output while playing a video

As you can see mplayer plays this film with random speeds, sometimes reaching 3-4 realtime speed.
Comment 2 Artem S. Tashkinov 2010-06-26 22:15:36 UTC
I've just compiled 2.6.35-rc3 and the bug is still here :(
Comment 3 Artem S. Tashkinov 2010-06-26 22:50:59 UTC
Created attachment 26956 [details]
dmesg, lspci and various information from /proc
Comment 4 Artem S. Tashkinov 2010-06-26 23:19:57 UTC
OK, I've talked to mplayer's developers, it's a problem with ALSA which is used as a timer source:

"it is most likely bug in the sound driver.
by mesuring the remaining buffer.
maybe alsa drops the whole buffer, mplayer thinks it have played it and fills another, to be dropped again."

As soon as I use `mplayer -ao oss video` (use OSS instead of ALSA) the bug is gone.
Comment 5 Artem S. Tashkinov 2010-06-26 23:45:18 UTC
Created attachment 26957 [details]
mplayer's strace for an audio file

This bug occurs even when playing plain audio files.
Comment 6 Artem S. Tashkinov 2010-06-27 00:20:58 UTC
I have several audio cards, but this problem *only* occurs with:

04:00.0 Generic system peripheral [0841]: Creative Labs SB Live! EMU10k1 (rev
07)

It does *not* occur with e.g.:

HDA Intel at 0xfbdf8000 irq 44 (Codec: VIA VT1718S)

So, probably some changes to emu10k1 code during 2.6.34 merge window time have broken mplayer.
Comment 7 Artem S. Tashkinov 2010-06-29 09:03:16 UTC
I've managed to compile sound/ directory of kernel 2.6.33 on top of kernel 2.6.34 and the bug has gone. emu10k1 code has no changes between these two kernels, so something inside a core ALSA subsystem was broken.

I would be glad to bisect this problem but I hate the prospects of rebooting my PC 25 times to find a bad commit. If anyone has ideas how to bisect _only_ /sound, I'd be very grateful.
Comment 8 Clemens Ladisch 2010-06-29 09:16:52 UTC
This is likely to be some change in the core PCM code.

Please show the output when you run your mplayer command with the -v option.
Comment 9 Artem S. Tashkinov 2010-06-29 09:41:22 UTC
Created attachment 26970 [details]
mplayer -v -v
Comment 10 Artem S. Tashkinov 2010-06-29 09:44:20 UTC
This is exactly what I'm experiencing:

A:   4.7 (04.7) of 212.0 (03:32.0)  0.4% .[J
A:   5.1 (05.1) of 212.0 (03:32.0)  0.4% .[J

A:   6.1 (06.1) of 212.0 (03:32.0)  0.4% .[J
A:   6.5 (06.5) of 212.0 (03:32.0)  0.4% .[J

A:   6.8 (06.8) of 212.0 (03:32.0)  0.4% .[J
A:   7.2 (07.1) of 212.0 (03:32.0)  0.4% .[J

A:   7.4 (07.4) of 212.0 (03:32.0)  0.4% .[J
A:   7.8 (07.8) of 212.0 (03:32.0)  0.4% .[J

etc.

mplayer skips data completely instead of playing it.
Comment 11 Clemens Ladisch 2010-06-30 07:44:38 UTC
I haven't been able to reproduce this with my Emu10k1 card on kernel 2.6.34, but I'm using radeon.

There have been some recent fixes (which should appear in 2.6.34.x sometime); please try 2.6.35.rc3orsomething.

Bisecting in certain subdirectories is possible:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html#_cutting_down_bisection_by_giving_more_parameters_to_bisect_start
Comment 12 Artem S. Tashkinov 2010-07-01 17:23:58 UTC
16 reboots and here it is:

# git bisect good
7b3a177b0d4f92b3431b8dca777313a07533a710 is the first bad commit
commit 7b3a177b0d4f92b3431b8dca777313a07533a710
Author: Jaroslav Kysela <perex@perex.cz>
Date:   Fri Jan 8 08:43:01 2010 +0100

    ALSA: pcm_lib: fix "something must be really wrong" condition
    
    When runtime->periods == 1 or when pointer crosses end of ring buffer,
    the delta might be greater than buffer_size.
    
    Signed-off-by: Jaroslav Kysela <perex@perex.cz>

:040000 040000 a0835badfe6fe1b8b8f947432b2dea5250e31b65 d92abb531cd473d9a59ee275d35f6e3857afa83e M      sound
Comment 13 Artem S. Tashkinov 2010-07-01 17:47:50 UTC
Double confirmation: reverting this patch on top of 2.6.34 fixes the issue.
Comment 14 Artem S. Tashkinov 2010-07-01 17:59:08 UTC
And now I wonder why the $^%# ALSA developers don't pay a single bit of attention to this bug, because it was known since 13 Apr 23:54:

http://comments.gmane.org/gmane.linux.kernel/973516

> Hello,
> 
> Since 2.6.34-rc*, I have a regression on alsa which prevents the sound
> to be played correctly. When playing, the music goes too fast, skipping
> some parts. Typically, it's very easy to reproduce by doing:
> time mplayer -endpos 30 sound-file-which-lasts-more-than-thirty-sec.mp3

and the issue already has a filed bug report, which is bug 15796.

I'm very upset.
Comment 15 Jaroslav Kysela 2010-07-01 21:14:57 UTC
Use value 29 for xrun_debug and attach a log with some comments where the problem occurs:

http://www.alsa-project.org/main/index.php/XRUN_Debug

It would be also good to know, if the latest ALSA driver (http://www.alsa-project.org/snapshot/) has same problem.
Comment 16 Artem S. Tashkinov 2010-07-02 19:26:46 UTC
Created attachment 27000 [details]
dmesg with xrun_debug=3

alsa-driver-1.0.23.35.g8bc80.297.g5e011e
Comment 17 Artem S. Tashkinov 2010-07-02 19:29:53 UTC
Created attachment 27001 [details]
dmesg with xrun_debug=0,29,0,29

Now, it gets really interesting.

As soon as I set xrun_debug to 29, the bug disappears, thus I'm not sure if this dmesg output is of any use.
Comment 18 Takashi Iwai 2010-07-08 06:24:41 UTC
It works now because you enabled the automatic correction using jiffies check via proc file.

The log shows that the returned hwptr values are almost all wrong; every time it's positioned one sample below the target.

Do you run your app with 44.1kHZ samples?  Does the problem occur with 48kHz, too?
Comment 19 Rafael J. Wysocki 2010-07-09 22:52:19 UTC
Handled-By : Takashi Iwai <tiwai@suse.de>
Comment 20 Artem S. Tashkinov 2010-07-10 08:51:11 UTC
(In reply to comment #18)
> It works now because you enabled the automatic correction using jiffies check
> via proc file.
> 
> The log shows that the returned hwptr values are almost all wrong; every time
> it's positioned one sample below the target.
> 
> Do you run your app with 44.1kHZ samples?  Does the problem occur with 48kHz,
> too?

Yes, the problem occurs with 48KHz source audio too.

However there's one thing I've noticed: when playing FLAC files the problem occurs almost instantly, but I can play MP3 files for 5-10 minutes straight without getting this loathsome problem.
Comment 21 Artem S. Tashkinov 2010-07-10 08:54:59 UTC
Anticipating your question about kernel release that I'm running - it's now 2.6.35-rc4.

> However there's one thing I've noticed: when playing FLAC files the problem
> occurs almost instantly, but I can play MP3 files for 5-10 minutes straight
> without getting this loathsome problem.

Answering my own assumption - FLAC samples are probably shorter than MP3 samples, thus the problem is more pronounced.
Comment 22 Artem S. Tashkinov 2010-07-10 10:27:03 UTC
BTW, here's a thread on a Russian forum where people experience the same problem:

http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fwww.linux.org.ru%2Fforum%2Ftalks%2F5097377&sl=auto&tl=en
Comment 23 Takashi Iwai 2010-07-10 18:21:00 UTC
Could you try the patch below?
Comment 24 Takashi Iwai 2010-07-10 18:22:11 UTC
Created attachment 27065 [details]
Patch to correct invalid position at snd_pcm_period_elapsed()
Comment 25 Jaroslav Kysela 2010-07-10 19:20:37 UTC
Looking to the patch - is not more easy to add a constant to the extra voice CCCA register setup in the snd_emu10k1_pcm_init_voice() routine? (line 369 - emupcm.c).
Comment 26 Takashi Iwai 2010-07-10 20:42:17 UTC
Well, the CCCA read value might not be constantly wrong.  If the value is sometimes correct, increasing the value unconditionally may give click noises especially when used with dmix or PA, which rely on the exact hwptr position.

OTOH, correcting only at snd_pcm_period_elapsed() like the patch in comment 24 would be safe at this moment.  The patch is a proof of concept, a quick fix.  After that, we can measure the real time and register value whether it has the constant offset, and make things less complex.  But, we'd need to measure on different platforms for it.  As Clemens reported, there are still good-working emu10k1 systems...

BTW, the patch above isn't perfect, but should be enough for mplayer.  I'll put a revised version once when the current patch works.
Comment 27 Artem S. Tashkinov 2010-07-10 21:31:47 UTC
(In reply to comment #24)
> Created an attachment (id=27065) [details]
> Patch to correct invalid position at snd_pcm_period_elapsed()

This patch didn't help, the issue persists.

And according to the thread I've given to you people experience this problem on almost all HDA codecs and even on Audigy2 sound cards.

( URLs in Russian: http://www.linux.org.ru/forum/talks/5097889 and http://www.linux.org.ru/forum/talks/5097377 ).
Comment 28 Jaroslav Kysela 2010-07-11 07:32:10 UTC
(In reply to comment #27)
> (In reply to comment #24)
> > Created an attachment (id=27065) [details] [details]
> > Patch to correct invalid position at snd_pcm_period_elapsed()
> 
> This patch didn't help, the issue persists.

Attach xrun_debug = 29 log.

> And according to the thread I've given to you people experience this problem
> on
> almost all HDA codecs and even on Audigy2 sound cards.
> 
> ( URLs in Russian: http://www.linux.org.ru/forum/talks/5097889 and
> http://www.linux.org.ru/forum/talks/5097377 ).

Is this problem also in v2.6.35-rc4 ? Anyway, if these problems occurs, we need log for xrun_debug = 29 to see what's wrong.

These patches fixes the snd-hda-intel issues:

commit edb39935c8b19fcd9a8f619d0bc1e9d04594cd2b
Author: Jaroslav Kysela <perex@perex.cz>
Date:   Wed Jun 2 13:29:17 2010 +0200

    ALSA: hda-intel - fix wallclk variable update and condition
    
    This patch fixes thinko introduced in "last minutes" before commiting
    of the last wallclk patch.
    
    It also fixes the condition checking if the first period after last
    wallclk update is processed. There is a little rounding error in
    period_wallclk.
    
    Signed-off-by: Jaroslav Kysela <perex@perex.cz>

commit 9ef04066b3e7c51ed7edc6010ac039f18f9f3617
Author: Clemens Ladisch <clemens@ladisch.de>
Date:   Tue May 25 09:03:40 2010 +0200

    ALSA: hda_intel: fix handling of non-completion stream interrupts
    
    Check that the interrupt raised for a stream is actually a buffer
    completion interrupt before handling it as one.  Otherwise, memory
    errors or FIFO xruns would be interpreted as a pointer update and could
    break the stream timing.
    
    Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
    Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit f48f606d9f85a8b25c716082c73c36e9ff6f0cde
Author: Jaroslav Kysela <perex@perex.cz>
Date:   Tue May 11 12:10:47 2010 +0200

    [ALSA] snd-hda-intel: Improve azx_position_ok()
    
    Add back the zero return value (activate workqueue) when
    bdl_pos_adj is nonzero for position check.
    
    Do the position related check only for first next period
    using wallclk counter.
    
    Return -1 value (ignore interrupt) when period_bytes
    variable is zero.
    
    Signed-off-by: Jaroslav Kysela <perex@perex.cz>

commit e54637205b00837bf00de916b0ae361c6aa0b139
Author: Jaroslav Kysela <perex@perex.cz>
Date:   Tue May 11 10:21:46 2010 +0200

    [ALSA] snd-hda-intel: use WALLCLK register to check for early irqs
    
    Use 24Mhz WALLCLK register to ignore too early interrupts and
    wrong interrupt status. The bad timing confuses the higher ALSA
    layer and causes audio skipping. More information about behaviour
    and debugging can be found in kernel bz#15912.
    
    https://bugzilla.kernel.org/show_bug.cgi?id=15912
    
    Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Comment 29 Takashi Iwai 2010-07-11 08:59:21 UTC
First try the revised patch below.  It'll spew many debug messages, too.
If this still doesn't work, attach the kernel log.  Then run with xrun_debug = 29 (together with the patch), and attach the log, too.

Regarding HD-audio: this is a different hardware.  And the bug you see is about emu10k1.  So, it's a different problem even though the same git commit triggers.  For HD-audio, create another bug report, and don't mix them up here.  And as Jaroslav pointed, better to check whether 2.6.35-rc works.

This bug is revealed recently because the recent code checks the positions the hardware reports more strictly (mainly because of PulseAudio).  That is, this kind of problem is primarily a hardware issue.

Many HD-audio controllers are so crappy, and often give wrong positions at wrong time.  On your board, emu10k1 gives the wrong position, too.
Comment 30 Takashi Iwai 2010-07-11 09:01:29 UTC
Created attachment 27066 [details]
Fix hwptr on emu10k1
Comment 31 Artem S. Tashkinov 2010-07-11 09:45:09 UTC
Created attachment 27067 [details]
2.6.35 rc4, emu10k1-position-fix2.diff

The bug persists.

What kind of xrun_debug should I add to a bug report? Or dmesg is enough?
Comment 32 Takashi Iwai 2010-07-11 14:52:49 UTC
What apps did you test?  Apparently, there are a few different runs in the log.
The first one or two use 4096 x 2 frames.  The driver seemed to run well for these.

Then, it ran with 1024 x 16 frames, which is supposed to by mplayer.  And at some point, suddenly the wrong pointers are reported.  Something got wrong with the hardware.
Comment 33 Takashi Iwai 2010-07-11 16:55:03 UTC
(In reply to comment #31)
> Created an attachment (id=27067) [details]
> 2.6.35 rc4, emu10k1-position-fix2.diff
> 
> The bug persists.
> 
> What kind of xrun_debug should I add to a bug report? Or dmesg is enough?

xrun_debug=29 means to set 29 to /proc/asound/card*/pcm*/xrun_debug (needs to set only to corresponding /proc/asound/card* entries like card0).  /proc/asound/card*/pcm0p/xrun_debug should suffice for checking only playback.  Then you'll have much more kernel debug messages (so yes, dmesg should suffice).

But, before that, could you set just 4 to xrun_debug proc file and check whether this resolves the problem (with / without patch)?  This bit will activate the jiffies check in PCM core.

Note that filtering in PCM core is barely a right solution but a temporary workaround.  The real problem is either the wrong interrupt timing or the bogus hwptr value.  I suspected the latter, but judging from the fact that my patch doesn't work, it might be the former...
Comment 34 Artem S. Tashkinov 2010-07-11 17:15:22 UTC
(In reply to comment #32)
> What apps did you test?  Apparently, there are a few different runs in the
> log.
> The first one or two use 4096 x 2 frames.  The driver seemed to run well for
> these.
> 
> Then, it ran with 1024 x 16 frames, which is supposed to by mplayer.  And at
> some point, suddenly the wrong pointers are reported.  Something got wrong
> with
> the hardware.

The first application was a KDE logon sound (so probably artsd played it), the second was mplayer (which I usually pause after 1-1,5 seconds of playing a FLAC file - that makes the bug appear instantly).

I'm away from my computer right now, so I will post the required information next week.
Comment 35 Jaroslav Kysela 2010-07-11 17:25:00 UTC
Also try to change line 'epcm->position_fix = epcm->next_position - 2;' to 'epcm->position_fix = epcm->next_position - 8;' in patch emu10k1-position-fix2.diff.
Comment 36 Jaroslav Kysela 2010-07-12 09:19:53 UTC
Created attachment 27070 [details]
Patch to delay pcm irqs for specifies amount of samples (0-31)

This patch will delay the PCM interrupts by specified amount of samples. I tested this code with SB Live! Value (emu10k1). This is the clean solution to fix reported issue (reporter - please try value 4 or 8 for your system for the introduced delay_pcm_irq module option).

Note that in my environment (IbexPeak platform) the default value (0) (old behaviour) works perfectly (the periods are acknowledged with hwptr positions 4 or 5 samples behing the period boundary - I saw few cases when hwptr was 1 sample behind).

I think that the combination emu10k1 chip version + PCI bridge setup/hardware causes that PCM interrupts are issued too early on some hw.

The question left is: How we can detect the wrong behaviour automatically.
Comment 37 Artem S. Tashkinov 2010-07-12 16:39:49 UTC
I'm confused, what patches I should apply? Or all of them in this order?

1. Fix hwptr on emu10k1 v2
then 
2. epcm->position_fix = epcm->next_position - 8;
then
3. Patch to delay pcm irqs for specifies amount of samples
Comment 38 Artem S. Tashkinov 2010-07-12 17:41:09 UTC
Created attachment 27076 [details]
dmesg with all three patches applied

The bug persists, I will now try delay_pcm_irq.
Comment 39 Artem S. Tashkinov 2010-07-12 17:47:46 UTC
Created attachment 27077 [details]
dmesg with all three patches applied and delay_pcm_irq=8

With both delay_pcm_irq = 4 and = 8 I can reproduce the problem, however with delay_pcm_irq = 8 it took some time for bug to appear.
Comment 40 Jaroslav Kysela 2010-07-13 06:26:33 UTC
Apply only patch from comment#36. Try values 4, 8, 16, 31 for the delay_pcm_irq module parameter. If you encouter the problem with all values, attach log for delay_pcm_irq=16 - set xrun_debug to 3 (echo 3 > /proc/asound/card*/pcm*/xrun_debug - replace * with card number and device number).
Comment 41 Jaroslav Kysela 2010-07-13 06:28:15 UTC
(In reply to comment #40)
> Apply only patch from comment#36. Try values 4, 8, 16, 31 for the
> delay_pcm_irq
> module parameter. If you encouter the problem with all values, attach log for
> delay_pcm_irq=16 - set xrun_debug to 3 (echo 3 >
> /proc/asound/card*/pcm*/xrun_debug - replace * with card number and device
> number).

Sorry, the xrun_debug value should be 11 (to dump the hwptr position on the period update call).
Comment 42 Artem S. Tashkinov 2010-07-13 10:33:04 UTC
Created attachment 27084 [details]
dmesg with xrun_debug=11 and delay_pcm_irq=16

The bug occurs with all suggested delay_pcm_irq values.
Comment 43 Artem S. Tashkinov 2010-07-13 10:34:14 UTC
P.S. I'm now running 2.6.35-rc5 with patch from comment 36.
Comment 44 Jaroslav Kysela 2010-07-13 16:43:34 UTC
Please, change line in patch 'emu->delay_pcm_irq = delay_pcm_irq[dev] & 0x1f;' to 'emu->delay_pcm_irq = delay_pcm_irq[dev] & 0xff;' and use delay_pcm_irq=64 (attach xrun_debug=11 log).
Comment 45 Artem S. Tashkinov 2010-07-13 18:40:14 UTC
Created attachment 27091 [details]
dmesg with xrun_debug=11, delay_pcm_irq=64 and 0xff change

The bug is still here.
Comment 46 Jaroslav Kysela 2010-07-13 20:05:02 UTC
I don't think that you pass the delay_pcm_irq value to the module. What is contents of 'cat /sys/module/snd_emu10k1/parameters/delay_pcm_irq' when the snd_emu10k1 module is loaded?
Comment 47 Artem S. Tashkinov 2010-07-13 20:12:36 UTC
cat /sys/module/snd_emu10k1/parameters/delay_pcm_irq
0,0,0,0,0,0,0,0
Comment 48 Artem S. Tashkinov 2010-07-13 20:14:07 UTC
I insert the emu10k1 module this way:

# modprobe snd_emu10k1 delay_pcm_irq=64

am I wrong?
Comment 49 Jaroslav Kysela 2010-07-13 22:16:08 UTC
I'm not sure what's wrong. Just check, if snd_emu10k1 module is not loaded (modprobe does not return an error, if module is already loaded).

Other way is to change the initial value in patch:

static uint delay_pcm_irq[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS - 1)] = 64};
Comment 50 Artem S. Tashkinov 2010-07-13 22:52:07 UTC
I rmmod modules using this custom script: rmmod `lsmod | grep snd | awk '{print $1}'` and it effectively removes all ALSA modules from the running kernel.

OK, now with this patch
(In reply to comment #49)
> static uint delay_pcm_irq[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS - 1)] = 64};

I finally get the desired values in /sys/module/snd_emu10k1/parameters/delay_pcm_irq:
64,64,64,64,64,64,64,64

So, probably there are not enough pieces in a patch which should accept 'modprobe' parameters or something else is missing/not working correctly.

And ... after applying this patch the bug has finally disappeared (at least I cannot reproduce it any longer)! Thanks, I hope kernel 2.6.35 will be released with patches which allow using ALSA backend in mplayer.
Comment 51 Jaroslav Kysela 2010-07-14 09:10:21 UTC
Please, determine the lowest value which works for you (I think it would be 4 or something like that).

I tested my patch on my machine and the module parameter was accepted correctly, so I doubt that it's a problem in the patch.
Comment 52 Takashi Iwai 2010-07-14 10:05:01 UTC
BTW, one possible automation way is to check the validity of hwptr like my patch, set / increase the correction offset, and gives XRUN to restart the stream.  This won't conflict with delay_pcm_irq modparm patch, but will be an addition.
Comment 53 Jaroslav Kysela 2010-07-14 10:14:08 UTC
The other solution might be to start with value delay_pcm_irq=4 or so (after verification) by default. My measurements show that on my system the irq ack delay is usually 4 or 5 samples so it does not harm (8 or 9 samples with delay_pcm_irq=4). Experienced users can decrease this value.
Comment 54 Artem S. Tashkinov 2010-07-14 15:50:41 UTC
I still get this bug with delay_pcm_irq equal to 0, 4 and 8 and it only vanishes with 16, so 16 is a bare minimum for this problem to go away.

"emu->delay_pcm_irq = delay_pcm_irq[dev] & 0xff;" patch is not necessesary.

P.S. Guys!

$ modinfo snd_emu10k1
...
parm:  delay_pcm_irq:Delay PCM interrupt by specified number of samples (default 0). (array of uint)
...

That is why my attempts to pass an integer to this module failed,  I should have passed an _array_ of unsigned integers! I don't know this array dimensions (and I have no idea how to find that out), so I strongly believe you need to show how many numbers are required.
Comment 55 Artem S. Tashkinov 2010-07-14 15:58:35 UTC
Created attachment 27103 [details]
dmesg with xrun_debug=11 and delay_pcm_irq=16

Too early!

I've just started watching a movie, been watching it for 5 minutes, paused it, then after a 3 minutes pause the bug is again here with delay_pcm_irq=16.

I will recheck higher values to make sure at least 31 or 64 are usable and bug free.

(BTW, I've never got this error with NVIDIA driver running 2.6.34, or maybe I just didn't test it enough)
Comment 56 Artem S. Tashkinov 2010-07-14 18:08:23 UTC
Created attachment 27104 [details]
dmesg with xrun_debug=11 and delay_pcm_irq=64 (!)

OK, I can reproduce this bug with any delay_pcm_irq (now running 2.6.35-rc5 with just emu10k1-delay-irq.patch)

Sigh, I don't know what to do.

cat /sys/module/snd_emu10k1/parameters/delay_pcm_irq
64,64,64,64,64,64,64,64

Steps to reproduce: start watching a movie in mplayer, press and don't release "space" button so that mplayer keeps pausing and resuming tens of times - then this bug reoccurs or might not reoccur - it happens quite randomly - at times you need to watch a movie for some time, try this trick, nothing will happen, randomly fast forward, rewind, then use "space" trick again and so on and so forth. However with lower delay_pcm_irq values like 0,4 and 8 this bug can be triggered almost simultaneously just after pausing an audio/video stream just once.

Anyway PAUSING playing is a sure way to trigger the bug, I mean during normal playing of a any file this bug almost never happens, you must pause then continue playing to get a "desired" effect. I don't know if this information is of any use, I just wanted you to know.
Comment 57 Jaroslav Kysela 2010-07-14 19:30:53 UTC
Could you try to move line 'snd_emu10k1_playback_stop_voice(emu, epcm->extra);' before 'snd_emu10k1_playback_stop_voice(emu, epcm->voices[0]);' in sound/pci/emu10k1/emupcm.c file? Only in the snd_emu10k1_playback_trigger() function.
Comment 58 Artem S. Tashkinov 2010-07-14 21:44:24 UTC
No, this change didn't help (tried delay_pcm_irq=0 and 64).

[25292.843779] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=13311/1024/16384, hwptr=36/1405915/1405951/1392640
[25292.866949] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=14335/1024/16384, hwptr=36/1406939/1406975/1392640
[25292.890092] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=15359/1024/16384, hwptr=36/1407963/1407999/1392640
[25292.913257] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=16383/1024/16384, hwptr=35/1408988/1409023/1392640
[25292.936398] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=1023/1024/16384, hwptr=18/1410029/1410047/1409024
[25292.959569] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=2047/1024/16384, hwptr=17/1411054/1411071/1409024
[25292.982733] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=3071/1024/16384, hwptr=16/1412079/1412095/1409024
[25293.005867] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=4095/1024/16384, hwptr=17/1413102/1413119/1409024
[25293.029032] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=5119/1024/16384, hwptr=0/1414143/1414143/1409024
[25293.052176] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=6143/1024/16384, hwptr=575/1414592/1415167/1409024
[25293.075341] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=7167/1024/16384, hwptr=1019/1415172/1416191/1409024
[25293.098480] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=8191/1024/16384, hwptr=1/1417214/1417215/1409024
[25293.121668] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=9216/1024/16384, hwptr=34/1418206/1418240/1409024
[25293.144787] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=10239/1024/16384, hwptr=16416/1419231/1435647/1409024
[25293.144799] Pid: 0, comm: swapper Tainted: G  R         2.6.35-rc5-ic #3
[25293.144803] Call Trace:
[25293.144816]  [<fa1f797e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm]
[25293.144826]  [<c10269d8>] ? __wake_up_common+0x48/0x70
[25293.144834]  [<fa1f7b7b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm]
[25293.144844]  [<fa4984ad>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1]
[25293.144851]  [<fa496386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1]
[25293.144858]  [<fa49334a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1]
[25293.144877]  [<f8665719>] ? e100_disable_irq+0x49/0x60 [e100]
[25293.144884]  [<c10607fe>] handle_IRQ_event+0x2e/0xc0
[25293.144889]  [<c10635d4>] ? move_native_irq+0x14/0x50
[25293.144894]  [<c106253f>] handle_fasteoi_irq+0x5f/0xc0
[25293.144899]  [<c10624e0>] ? handle_fasteoi_irq+0x0/0xc0
[25293.144902]  <IRQ>  [<c1004687>] ? do_IRQ+0x47/0xc0
[25293.144911]  [<c1003169>] ? common_interrupt+0x29/0x30
[25293.144921]  [<f875d1a7>] ? acpi_idle_enter_simple+0x100/0x134 [processor]
[25293.144929]  [<f875cf10>] ? acpi_idle_enter_bm+0xbe/0x255 [processor]
[25293.144935]  [<c1215ab6>] ? cpuidle_idle_call+0x76/0xd0
[25293.144940]  [<c1001c17>] ? cpu_idle+0x47/0x90
[25293.144945]  [<c128e9d8>] ? start_secondary+0x1c8/0x1ce
[25293.167950] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=11263/1024/16384, hwptr=23/1436648/1436671/1425408
[25293.191089] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=12287/1024/16384, hwptr=21/1437674/1437695/1425408
Comment 59 Jaroslav Kysela 2010-07-15 12:46:15 UTC
Created attachment 27115 [details]
Patch to delay pcm irqs for specifies amount of samples (0-31)

Improved patch implementing delay_pcm_irq module option. This version includes position correction for the interrupt voice for the pause/resume trigger.

Please, test.
Comment 60 Artem S. Tashkinov 2010-07-17 06:50:24 UTC
I'm unable to reproduce the bug with the latest patch applied, probably it's finally resolved (even setting 'delay_pcm_irq' wasn't necessary).

Give me a few days for testing and I will tell for sure if this problem is gone.

Thank you.
Comment 61 Artem S. Tashkinov 2010-07-18 08:02:36 UTC
Created attachment 27134 [details]
dmesg with xrun_debug=11, delay_pcm_irq=0 and emu10k1-delay-irq2.patch

With emu10k1-delay-irq2.patch patch I can still reproduce the problem, however it has become less severe. mplayer now accelerates playing not as much as it did it before.

What delay_pcm_irq values should I try this time? As the problem is now less pronounced, maybe you can come up even with a better patch which doesn't require fiddling with delay_pcm_irq parameter?
Comment 62 Jaroslav Kysela 2010-07-18 16:23:02 UTC
Try to dermine the lowest value. I would test 2,4,8.
Comment 63 Artem S. Tashkinov 2010-07-30 11:41:58 UTC
(In reply to comment #62)
> Try to dermine the lowest value. I would test 2,4,8.

With the last patch the bug is very difficult to trigger, so I think it would be best to apply it for now, unless I come up with a better way of reproducing it.
Comment 64 Artem S. Tashkinov 2010-08-04 10:51:15 UTC
I've found out that Audacius media player has terrible problems playing FLAC files under 2.6.35 even with delay_pcm_irq equal to 8.

Audacious intermittently stutters and skips up to a second of an audio stream but that doesn't cause audio stream acceleration. Meanwhile such stumblings happen up to 10 times during a 3 minutes single song playing so I cannot say this bug is fixed.

I will now reboot into 2.6.33 to check out if this kernel is bug free, then reboot into 2.6.35 and enable XRUN debugging to see what's happening.
Comment 65 Artem S. Tashkinov 2010-08-05 09:21:00 UTC
Audacious media player, xrun_debug=3, delay_pcm_irq=8:

[ 4383.615116] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=53, new_hw_ptr=666415, old_hw_ptr=666362)
[ 4383.615134] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2
[ 4383.615141] Call Trace:
[ 4383.615168]  [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm]
[ 4383.615185]  [<c120466d>] ? input_event+0x5d/0x70
[ 4383.615194]  [<c1208062>] ? atkbd_interrupt+0x622/0x6d0
[ 4383.615208]  [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm]
[ 4383.615223]  [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1]
[ 4383.615237]  [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1]
[ 4383.615250]  [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1]
[ 4383.615267]  [<f87e3719>] ? e100_disable_irq+0x49/0x60 [e100]
[ 4383.615277]  [<c106086e>] handle_IRQ_event+0x2e/0xc0
[ 4383.615287]  [<c1063644>] ? move_native_irq+0x14/0x50
[ 4383.615296]  [<c10625af>] handle_fasteoi_irq+0x5f/0xc0
[ 4383.615305]  [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0
[ 4383.615311]  <IRQ>  [<c1004687>] ? do_IRQ+0x47/0xc0
[ 4383.615325]  [<c1003169>] ? common_interrupt+0x29/0x30
[ 4383.615340]  [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor]
[ 4383.615352]  [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0
[ 4383.615360]  [<c1001c17>] ? cpu_idle+0x47/0x90
[ 4383.615370]  [<c128ee78>] ? start_secondary+0x1c8/0x1ce
[ 4470.381208] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=64, new_hw_ptr=4503673, old_hw_ptr=4503609)
[ 4470.381219] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2
[ 4470.381222] Call Trace:
[ 4470.381247]  [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm]
[ 4470.381260]  [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm]
[ 4470.381270]  [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1]
[ 4470.381279]  [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1]
[ 4470.381287]  [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1]
[ 4470.381299]  [<f87a1e0e>] ? ehci_irq+0xfe/0x1e0 [ehci_hcd]
[ 4470.381306]  [<c106086e>] handle_IRQ_event+0x2e/0xc0
[ 4470.381312]  [<c1063644>] ? move_native_irq+0x14/0x50
[ 4470.381317]  [<c10625af>] handle_fasteoi_irq+0x5f/0xc0
[ 4470.381322]  [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0
[ 4470.381325]  <IRQ>  [<c1004687>] ? do_IRQ+0x47/0xc0
[ 4470.381334]  [<c1003169>] ? common_interrupt+0x29/0x30
[ 4470.381345]  [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor]
[ 4470.381352]  [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0
[ 4470.381356]  [<c1001c17>] ? cpu_idle+0x47/0x90
[ 4470.381362]  [<c128ee78>] ? start_secondary+0x1c8/0x1ce
[ 4472.373884] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=33, new_hw_ptr=4591799, old_hw_ptr=4591766)
[ 4472.373894] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2
[ 4472.373897] Call Trace:
[ 4472.373923]  [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm]
[ 4472.373936]  [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm]
[ 4472.373946]  [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1]
[ 4472.373955]  [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1]
[ 4472.373963]  [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1]
[ 4472.373975]  [<f87a1e0e>] ? ehci_irq+0xfe/0x1e0 [ehci_hcd]
[ 4472.373983]  [<f87e3719>] ? e100_disable_irq+0x49/0x60 [e100]
[ 4472.373990]  [<c106086e>] handle_IRQ_event+0x2e/0xc0
[ 4472.373996]  [<c1063644>] ? move_native_irq+0x14/0x50
[ 4472.374001]  [<c10625af>] handle_fasteoi_irq+0x5f/0xc0
[ 4472.374006]  [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0
[ 4472.374009]  <IRQ>  [<c1004687>] ? do_IRQ+0x47/0xc0
[ 4472.374018]  [<c1003169>] ? common_interrupt+0x29/0x30
[ 4472.374024]  [<c103007b>] ? mmput+0x7b/0xb0
[ 4472.374032]  [<c12936b8>] ? _raw_spin_unlock_irq+0x8/0x30
[ 4472.374041]  [<f87c7224>] ? rtl8169_phy_timer+0x94/0xf0 [r8169]
[ 4472.374046]  [<c12936fe>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[ 4472.374057]  [<c103d215>] ? run_timer_softirq+0xe5/0x1e0
[ 4472.374065]  [<f87c7190>] ? rtl8169_phy_timer+0x0/0xf0 [r8169]
[ 4472.374070]  [<c1037b59>] ? __do_softirq+0x89/0x110
[ 4472.374075]  [<c1037ad0>] ? __do_softirq+0x0/0x110
[ 4472.374077]  <IRQ>  [<c103788d>] ? irq_exit+0x6d/0x70
[ 4472.374086]  [<c1019e86>] ? smp_apic_timer_interrupt+0x56/0x90
[ 4472.374092]  [<c1293ede>] ? apic_timer_interrupt+0x2a/0x30
[ 4472.374101]  [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor]
[ 4472.374108]  [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0
[ 4472.374112]  [<c1001c17>] ? cpu_idle+0x47/0x90
[ 4472.374117]  [<c128ee78>] ? start_secondary+0x1c8/0x1ce
[ 4472.375829] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=33, new_hw_ptr=4591885, old_hw_ptr=4591852)
[ 4472.375838] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2
[ 4472.375842] Call Trace:
[ 4472.375856]  [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm]
[ 4472.375869]  [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm]
[ 4472.375879]  [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1]
[ 4472.375888]  [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1]
[ 4472.375896]  [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1]
[ 4472.375905]  [<f87a1e0e>] ? ehci_irq+0xfe/0x1e0 [ehci_hcd]
[ 4472.375912]  [<f87e3719>] ? e100_disable_irq+0x49/0x60 [e100]
[ 4472.375918]  [<c106086e>] handle_IRQ_event+0x2e/0xc0
[ 4472.375924]  [<c1063644>] ? move_native_irq+0x14/0x50
[ 4472.375929]  [<c10625af>] handle_fasteoi_irq+0x5f/0xc0
[ 4472.375934]  [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0
[ 4472.375936]  <IRQ>  [<c1004687>] ? do_IRQ+0x47/0xc0
[ 4472.375945]  [<c1003169>] ? common_interrupt+0x29/0x30
[ 4472.375954]  [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor]
[ 4472.375960]  [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0
[ 4472.375964]  [<c1001c17>] ? cpu_idle+0x47/0x90
[ 4472.375969]  [<c128ee78>] ? start_secondary+0x1c8/0x1ce

However enabling xrun_debug make the bug go away :)
Comment 66 Jaroslav Kysela 2010-08-05 09:41:13 UTC
You mean that with xrun_debug=11 this problem is not visible? Attach the output using same conditions (program) with xrun_debug=11 to check the stream parameters.
Comment 67 Artem S. Tashkinov 2010-08-05 10:08:04 UTC
Created attachment 27354 [details]
xrun_debug=11 log, delay_pcm_irq=8 and emu10k1-delay-irq2.patch

(In reply to comment #66)
> You mean that with xrun_debug=11 this problem is not visible? 

Yes.

> Attach the output using same conditions (program) with xrun_debug=11
> to check the stream parameters.

Here it is.
Comment 68 Jaroslav Kysela 2010-08-05 10:24:56 UTC
The report in comment#65 contains only warnings. The period size is just 21 samples (Audacious media player is crazy), so the interrupt skipping is very probable.

Anyway, audio should not skip when xrun_debug=0 (please, try). I need probably more detailed log (xrun_debug=27) to check it.
Comment 69 Artem S. Tashkinov 2010-08-05 10:33:07 UTC
Created attachment 27355 [details]
xrun_debug=27

With xrun_debug=0 audio skips, like I've already told you.

Here comes an xrun_debug=27 log.
Comment 70 Jaroslav Kysela 2010-08-05 11:22:03 UTC
I don't see any problem in log and from your comments I deduce that even with xrun_debug=27 the problem is not visible.

Could you change line:

.period_bytes_min =     64,

to:

.period_bytes_min =     512,

in sound/pci/emu10k1/emupcm.c (around line 964 - snd_emu10k1_playback structure) and do your tests again?
Comment 71 Artem S. Tashkinov 2010-08-05 11:52:54 UTC
1. With xrun_debug=27 there must have been something visible/detectable in a log file I attached, because during a short 12 seconds playing session I had a sound skip near the end.

2. With .period_bytes_min = 512 I can no longer reproduce the problem in Audacious. Hopefully this change doesn't affect mplayer.

3. Jaroslav, how can I pass delay_pcm_irq=8 to a module? I ended up manually editing the sources and recompiling because this command

# modprobe snd_emu10k1 delay_pcm_irq=8,8,8,8,8,8,8,8

doesn't work as intended, /sys/module/snd_emu10k1/parameters/delay_pcm_irq still has a set of zeros.

4. Or maybe let's make delay_pcm_irq=8 default? Because I do believe that Linux drivers are supposed to work without fiddling.

5. I hope emu10k1-delay-irq2.patch and the last fix you proposed will land in mainline and in stable too (2.6.34.x and 2.6.35.x).

Thank you.
Comment 72 Jaroslav Kysela 2010-08-05 12:30:09 UTC
(In reply to comment #71)
> 1. With xrun_debug=27 there must have been something visible/detectable in a
> log file I attached, because during a short 12 seconds playing session I had
> a
> sound skip near the end.

Yes, there is. Thanks. Please, write any information you can, I gave up after reading of 40000 lines. It appears like another issue (double irq acknowledge for small periods from the emu10k1 chip).

> 2. With .period_bytes_min = 512 I can no longer reproduce the problem in
> Audacious. Hopefully this change doesn't affect mplayer.

Could you retest with value 128?

> 3. Jaroslav, how can I pass delay_pcm_irq=8 to a module? I ended up manually
> editing the sources and recompiling because this command
> 
> # modprobe snd_emu10k1 delay_pcm_irq=8,8,8,8,8,8,8,8
> 
> doesn't work as intended, /sys/module/snd_emu10k1/parameters/delay_pcm_irq
> still has a set of zeros.

It works for me, it must be a bug in your distro or system configuration. Try to add these parameters to /etc/modprobe.conf or so (/etc/modprobe.d tree).

> 4. Or maybe let's make delay_pcm_irq=8 default? Because I do believe that
> Linux
> drivers are supposed to work without fiddling.

It's big value. Could you do all tests with delay_pcm_irq=2?

> 5. I hope emu10k1-delay-irq2.patch and the last fix you proposed will land in
> mainline and in stable too (2.6.34.x and 2.6.35.x).

We need to find lowest stable values which are working for you at first.
Comment 73 Artem S. Tashkinov 2010-08-05 12:40:42 UTC
With value 128 the skipping issue immediately took effect.

delay_pcm_irq needs a lot of testing in mplayer (sometimes the bug triggers only after 30 minutes during the course of playback), so I'll report later.
Comment 74 Jaroslav Kysela 2010-08-05 12:54:13 UTC
period_size value = 256?
Comment 75 Artem S. Tashkinov 2010-08-05 13:03:18 UTC
With 256 Audacious plays audio just fine.
Comment 76 Artem S. Tashkinov 2010-08-11 09:05:14 UTC
(In reply to comment #74)
> period_size value = 256?

OK, I suppose I've tested enough and period_size=256 and delay_pcm_irq=2 are enough for all bugs to go away.

Please, submit these changes upstream, I can send my Tested-by: if necessary, just CC me when sending out the patches.

P.S.

Jaroslav, I still don't understand why everything worked just fine in <=2.6.33 without any fuss and why these two patches are required 2.6.34 and onwards.

Another question: you mentioned that Audacious is somehow broken the way it (ab)uses ALSA. What if there are some other applications out there which (ab)use ALSA even more and this bug resurfaces even with these two patches applied?
Comment 77 Takashi Iwai 2010-08-11 10:11:17 UTC
Well, I'm against to set period_size=256 as unconditional default.  Such a high value is needed only for crappy hardware that don't give IRQs property.  On most hardware, it's no problem as is now.

The old driver "worked" in earlier versions because it didn't check the condition strictly.  Most apps don't care about it, but apps like PulseAudio pretty rely on the accuracy of PCM runtime status, so the new code was introduced after 2.6.33 to rule out the wrong conditions.

About app-brokenness: if an app sets up the audio in *that* crazy way like Audacious, it's just an application bug to be fixed there.
Comment 78 Artem S. Tashkinov 2010-08-11 11:07:59 UTC
(In reply to comment #77)
> 
> About app-brokenness: if an app sets up the audio in *that* crazy way like
> Audacious, it's just an application bug to be fixed there.

I know you are a busy person, so can you at least tell me what's wrong with audacious ALSA output plug-in, so that I could report this bug to its developers. ALSA output plug-in code can be seen online:

http://hg.atheme.org/audacious-plugins/audacious-plugins/file/cd6114b7955d/src/alsa

or downloaded as a part of of http://distfiles.atheme.org/audacious-plugins-2.4-rc1.tgz archive.

Thank you.
Comment 79 Jaroslav Kysela 2010-08-11 11:41:44 UTC
(In reply to comment #77)
> Well, I'm against to set period_size=256 as unconditional default.  Such a
> high
> value is needed only for crappy hardware that don't give IRQs property.  On
> most hardware, it's no problem as is now.

Note that 256 bytes for standard usage (44kHz, 16-bit samples, stereo) means interrupt latency for 64 frames -> 0.0015 seconds which is enough for all applications.

I think that we can put 256 as default and add a module parameter to allow user modifications.

> The old driver "worked" in earlier versions because it didn't check the
> condition strictly.  Most apps don't care about it, but apps like PulseAudio
> pretty rely on the accuracy of PCM runtime status, so the new code was
> introduced after 2.6.33 to rule out the wrong conditions.

Agreed.

> About app-brokenness: if an app sets up the audio in *that* crazy way like
> Audacious, it's just an application bug to be fixed there.

Yes, Audacious is really crazy.
Comment 80 Takashi Iwai 2010-08-11 12:37:11 UTC
(In reply to comment #79)
> (In reply to comment #77)
> > Well, I'm against to set period_size=256 as unconditional default.  Such a
> high
> > value is needed only for crappy hardware that don't give IRQs property.  On
> > most hardware, it's no problem as is now.
> 
> Note that 256 bytes for standard usage (44kHz, 16-bit samples, stereo) means
> interrupt latency for 64 frames -> 0.0015 seconds which is enough for all
> applications.

OTOH, if another user has used emu10k1 with JACK and period_size=64, this can be seen as a regression, too.  Yes, it's adjustable.  But, this can be a counter-argument, too :)

My concern is that you picked up a value 256 just because a broken app like Audacious worked coincidentally on one machine magically.  If this value were chosen by some proper calculation, I'd be happily for it, though...
Comment 81 Takashi Iwai 2010-08-11 12:39:21 UTC
(In reply to comment #78)
> (In reply to comment #77)
> > 
> > About app-brokenness: if an app sets up the audio in *that* crazy way like
> > Audacious, it's just an application bug to be fixed there.
> 
> I know you are a busy person, so can you at least tell me what's wrong with
> audacious ALSA output plug-in, so that I could report this bug to its
> developers. ALSA output plug-in code can be seen online:
> 
>
> http://hg.atheme.org/audacious-plugins/audacious-plugins/file/cd6114b7955d/src/alsa
> 
> or downloaded as a part of of
> http://distfiles.atheme.org/audacious-plugins-2.4-rc1.tgz archive.
> 
> Thank you.

The problem is that Audacious doesn't set the period_size or period_time before snd_pcm_hw_params() call.  It can set, at least, the lowest value to avoid an insane value.
Comment 82 Jaroslav Kysela 2010-08-11 13:19:38 UTC
> My concern is that you picked up a value 256 just because a broken app like
> Audacious worked coincidentally on one machine magically.  If this value were
> chosen by some proper calculation, I'd be happily for it, though...

The real problem is that the emu10k1 sends double acks for fast interrupts on reporter machine - thus we get two snd_period_elapsed() calls with only little time delay which confuses the update pointer function in the pcm_lib code.

This problem will be seen in jack, too.

So I would propose just to use safe values and experienced users might get lower latencies.

Another possibility is to use the emu10k1 wallclock to check if interrupts are not comming so fast to avoid double acks, but it's just another slow PCI I/O read operation - limiting the period interrupt rate is more elegant in this case.

And last note: All hw configuration offered by the driver should work including the smallest periods. If the parameter setup is not ideal is another question.
Comment 83 Takashi Iwai 2010-08-11 14:08:41 UTC
(In reply to comment #82)
> > My concern is that you picked up a value 256 just because a broken app like
> > Audacious worked coincidentally on one machine magically.  If this value
> were
> > chosen by some proper calculation, I'd be happily for it, though...
> 
> The real problem is that the emu10k1 sends double acks for fast interrupts on
> reporter machine - thus we get two snd_period_elapsed() calls with only
> little
> time delay which confuses the update pointer function in the pcm_lib code.

Actually, in such a situation, the earlier driver version worked better.  It was designed to be robust for lost IRQs rather than the strict ptr update.
 
> This problem will be seen in jack, too.
> 
> So I would propose just to use safe values and experienced users might get
> lower latencies.
> 
> Another possibility is to use the emu10k1 wallclock to check if interrupts
> are
> not comming so fast to avoid double acks, but it's just another slow PCI I/O
> read operation - limiting the period interrupt rate is more elegant in this
> case.

Hm, OK, using period_size_min=256 sounds more reasonable as a quick fix than other smarter but more intrusive solutions.

> And last note: All hw configuration offered by the driver should work
> including
> the smallest periods.

How can you guarantee that period_size=256 works on all machines? :)
Comment 84 Jaroslav Kysela 2010-08-11 18:30:10 UTC
The period limit code is now in alsa-lib:

http://git.alsa-project.org/?p=alsa-lib.git;a=commitdiff;h=09879a4bb58199f64abcb8df506f917c8efc2383

I'm going to apply the last patch with delay_pcm_irq=2 as default without the period limiting for Audacious in the driver, because we have the user space solution for it.
Comment 85 Takashi Iwai 2010-08-18 11:55:49 UTC
Jaroslav, is your final patch ready for merge?
Comment 86 Jaroslav Kysela 2010-08-18 12:28:47 UTC
I put this emu10k1 (with pcm_irq_delay=2 as default) and pcm midlevel patches (for double interrupt acks) to my devel tree:

http://git.alsa-project.org/?p=alsa-kernel.git;a=commitdiff;h=566e41e91c1de4cae625e42ef3f32e5a964be62f

http://git.alsa-project.org/?p=alsa-kernel.git;a=commitdiff;h=b990ae57452628e9f422b8c5bf6543e77cb9633e
Comment 87 Artem S. Tashkinov 2010-08-18 14:29:55 UTC
(In reply to comment #86)
> I put this emu10k1 (with pcm_irq_delay=2 as default) and pcm midlevel patches
> (for double interrupt acks) to my devel tree:

As far as I remember, we decided to increase period_size to 256 but I don't see this change, why is it so?