Bug 16300
Description
Artem S. Tashkinov
2010-06-26 20:39:38 UTC
Created attachment 26955 [details]
mplayer output while playing a video
As you can see mplayer plays this film with random speeds, sometimes reaching 3-4 realtime speed.
I've just compiled 2.6.35-rc3 and the bug is still here :( Created attachment 26956 [details]
dmesg, lspci and various information from /proc
OK, I've talked to mplayer's developers, it's a problem with ALSA which is used as a timer source: "it is most likely bug in the sound driver. by mesuring the remaining buffer. maybe alsa drops the whole buffer, mplayer thinks it have played it and fills another, to be dropped again." As soon as I use `mplayer -ao oss video` (use OSS instead of ALSA) the bug is gone. Created attachment 26957 [details]
mplayer's strace for an audio file
This bug occurs even when playing plain audio files.
I have several audio cards, but this problem *only* occurs with: 04:00.0 Generic system peripheral [0841]: Creative Labs SB Live! EMU10k1 (rev 07) It does *not* occur with e.g.: HDA Intel at 0xfbdf8000 irq 44 (Codec: VIA VT1718S) So, probably some changes to emu10k1 code during 2.6.34 merge window time have broken mplayer. I've managed to compile sound/ directory of kernel 2.6.33 on top of kernel 2.6.34 and the bug has gone. emu10k1 code has no changes between these two kernels, so something inside a core ALSA subsystem was broken. I would be glad to bisect this problem but I hate the prospects of rebooting my PC 25 times to find a bad commit. If anyone has ideas how to bisect _only_ /sound, I'd be very grateful. This is likely to be some change in the core PCM code. Please show the output when you run your mplayer command with the -v option. Created attachment 26970 [details]
mplayer -v -v
This is exactly what I'm experiencing: A: 4.7 (04.7) of 212.0 (03:32.0) 0.4% .[J A: 5.1 (05.1) of 212.0 (03:32.0) 0.4% .[J A: 6.1 (06.1) of 212.0 (03:32.0) 0.4% .[J A: 6.5 (06.5) of 212.0 (03:32.0) 0.4% .[J A: 6.8 (06.8) of 212.0 (03:32.0) 0.4% .[J A: 7.2 (07.1) of 212.0 (03:32.0) 0.4% .[J A: 7.4 (07.4) of 212.0 (03:32.0) 0.4% .[J A: 7.8 (07.8) of 212.0 (03:32.0) 0.4% .[J etc. mplayer skips data completely instead of playing it. I haven't been able to reproduce this with my Emu10k1 card on kernel 2.6.34, but I'm using radeon. There have been some recent fixes (which should appear in 2.6.34.x sometime); please try 2.6.35.rc3orsomething. Bisecting in certain subdirectories is possible: http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html#_cutting_down_bisection_by_giving_more_parameters_to_bisect_start 16 reboots and here it is: # git bisect good 7b3a177b0d4f92b3431b8dca777313a07533a710 is the first bad commit commit 7b3a177b0d4f92b3431b8dca777313a07533a710 Author: Jaroslav Kysela <perex@perex.cz> Date: Fri Jan 8 08:43:01 2010 +0100 ALSA: pcm_lib: fix "something must be really wrong" condition When runtime->periods == 1 or when pointer crosses end of ring buffer, the delta might be greater than buffer_size. Signed-off-by: Jaroslav Kysela <perex@perex.cz> :040000 040000 a0835badfe6fe1b8b8f947432b2dea5250e31b65 d92abb531cd473d9a59ee275d35f6e3857afa83e M sound Double confirmation: reverting this patch on top of 2.6.34 fixes the issue. And now I wonder why the $^%# ALSA developers don't pay a single bit of attention to this bug, because it was known since 13 Apr 23:54: http://comments.gmane.org/gmane.linux.kernel/973516 > Hello, > > Since 2.6.34-rc*, I have a regression on alsa which prevents the sound > to be played correctly. When playing, the music goes too fast, skipping > some parts. Typically, it's very easy to reproduce by doing: > time mplayer -endpos 30 sound-file-which-lasts-more-than-thirty-sec.mp3 and the issue already has a filed bug report, which is bug 15796. I'm very upset. Use value 29 for xrun_debug and attach a log with some comments where the problem occurs: http://www.alsa-project.org/main/index.php/XRUN_Debug It would be also good to know, if the latest ALSA driver (http://www.alsa-project.org/snapshot/) has same problem. Created attachment 27000 [details]
dmesg with xrun_debug=3
alsa-driver-1.0.23.35.g8bc80.297.g5e011e
Created attachment 27001 [details]
dmesg with xrun_debug=0,29,0,29
Now, it gets really interesting.
As soon as I set xrun_debug to 29, the bug disappears, thus I'm not sure if this dmesg output is of any use.
It works now because you enabled the automatic correction using jiffies check via proc file. The log shows that the returned hwptr values are almost all wrong; every time it's positioned one sample below the target. Do you run your app with 44.1kHZ samples? Does the problem occur with 48kHz, too? Handled-By : Takashi Iwai <tiwai@suse.de> (In reply to comment #18) > It works now because you enabled the automatic correction using jiffies check > via proc file. > > The log shows that the returned hwptr values are almost all wrong; every time > it's positioned one sample below the target. > > Do you run your app with 44.1kHZ samples? Does the problem occur with 48kHz, > too? Yes, the problem occurs with 48KHz source audio too. However there's one thing I've noticed: when playing FLAC files the problem occurs almost instantly, but I can play MP3 files for 5-10 minutes straight without getting this loathsome problem. Anticipating your question about kernel release that I'm running - it's now 2.6.35-rc4.
> However there's one thing I've noticed: when playing FLAC files the problem
> occurs almost instantly, but I can play MP3 files for 5-10 minutes straight
> without getting this loathsome problem.
Answering my own assumption - FLAC samples are probably shorter than MP3 samples, thus the problem is more pronounced.
BTW, here's a thread on a Russian forum where people experience the same problem: http://translate.google.com/translate?js=y&prev=_t&hl=en&ie=UTF-8&layout=1&eotf=1&u=http%3A%2F%2Fwww.linux.org.ru%2Fforum%2Ftalks%2F5097377&sl=auto&tl=en Could you try the patch below? Created attachment 27065 [details]
Patch to correct invalid position at snd_pcm_period_elapsed()
Looking to the patch - is not more easy to add a constant to the extra voice CCCA register setup in the snd_emu10k1_pcm_init_voice() routine? (line 369 - emupcm.c). Well, the CCCA read value might not be constantly wrong. If the value is sometimes correct, increasing the value unconditionally may give click noises especially when used with dmix or PA, which rely on the exact hwptr position. OTOH, correcting only at snd_pcm_period_elapsed() like the patch in comment 24 would be safe at this moment. The patch is a proof of concept, a quick fix. After that, we can measure the real time and register value whether it has the constant offset, and make things less complex. But, we'd need to measure on different platforms for it. As Clemens reported, there are still good-working emu10k1 systems... BTW, the patch above isn't perfect, but should be enough for mplayer. I'll put a revised version once when the current patch works. (In reply to comment #24) > Created an attachment (id=27065) [details] > Patch to correct invalid position at snd_pcm_period_elapsed() This patch didn't help, the issue persists. And according to the thread I've given to you people experience this problem on almost all HDA codecs and even on Audigy2 sound cards. ( URLs in Russian: http://www.linux.org.ru/forum/talks/5097889 and http://www.linux.org.ru/forum/talks/5097377 ). (In reply to comment #27) > (In reply to comment #24) > > Created an attachment (id=27065) [details] [details] > > Patch to correct invalid position at snd_pcm_period_elapsed() > > This patch didn't help, the issue persists. Attach xrun_debug = 29 log. > And according to the thread I've given to you people experience this problem > on > almost all HDA codecs and even on Audigy2 sound cards. > > ( URLs in Russian: http://www.linux.org.ru/forum/talks/5097889 and > http://www.linux.org.ru/forum/talks/5097377 ). Is this problem also in v2.6.35-rc4 ? Anyway, if these problems occurs, we need log for xrun_debug = 29 to see what's wrong. These patches fixes the snd-hda-intel issues: commit edb39935c8b19fcd9a8f619d0bc1e9d04594cd2b Author: Jaroslav Kysela <perex@perex.cz> Date: Wed Jun 2 13:29:17 2010 +0200 ALSA: hda-intel - fix wallclk variable update and condition This patch fixes thinko introduced in "last minutes" before commiting of the last wallclk patch. It also fixes the condition checking if the first period after last wallclk update is processed. There is a little rounding error in period_wallclk. Signed-off-by: Jaroslav Kysela <perex@perex.cz> commit 9ef04066b3e7c51ed7edc6010ac039f18f9f3617 Author: Clemens Ladisch <clemens@ladisch.de> Date: Tue May 25 09:03:40 2010 +0200 ALSA: hda_intel: fix handling of non-completion stream interrupts Check that the interrupt raised for a stream is actually a buffer completion interrupt before handling it as one. Otherwise, memory errors or FIFO xruns would be interpreted as a pointer update and could break the stream timing. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Takashi Iwai <tiwai@suse.de> commit f48f606d9f85a8b25c716082c73c36e9ff6f0cde Author: Jaroslav Kysela <perex@perex.cz> Date: Tue May 11 12:10:47 2010 +0200 [ALSA] snd-hda-intel: Improve azx_position_ok() Add back the zero return value (activate workqueue) when bdl_pos_adj is nonzero for position check. Do the position related check only for first next period using wallclk counter. Return -1 value (ignore interrupt) when period_bytes variable is zero. Signed-off-by: Jaroslav Kysela <perex@perex.cz> commit e54637205b00837bf00de916b0ae361c6aa0b139 Author: Jaroslav Kysela <perex@perex.cz> Date: Tue May 11 10:21:46 2010 +0200 [ALSA] snd-hda-intel: use WALLCLK register to check for early irqs Use 24Mhz WALLCLK register to ignore too early interrupts and wrong interrupt status. The bad timing confuses the higher ALSA layer and causes audio skipping. More information about behaviour and debugging can be found in kernel bz#15912. https://bugzilla.kernel.org/show_bug.cgi?id=15912 Signed-off-by: Jaroslav Kysela <perex@perex.cz> First try the revised patch below. It'll spew many debug messages, too. If this still doesn't work, attach the kernel log. Then run with xrun_debug = 29 (together with the patch), and attach the log, too. Regarding HD-audio: this is a different hardware. And the bug you see is about emu10k1. So, it's a different problem even though the same git commit triggers. For HD-audio, create another bug report, and don't mix them up here. And as Jaroslav pointed, better to check whether 2.6.35-rc works. This bug is revealed recently because the recent code checks the positions the hardware reports more strictly (mainly because of PulseAudio). That is, this kind of problem is primarily a hardware issue. Many HD-audio controllers are so crappy, and often give wrong positions at wrong time. On your board, emu10k1 gives the wrong position, too. Created attachment 27066 [details]
Fix hwptr on emu10k1
Created attachment 27067 [details]
2.6.35 rc4, emu10k1-position-fix2.diff
The bug persists.
What kind of xrun_debug should I add to a bug report? Or dmesg is enough?
What apps did you test? Apparently, there are a few different runs in the log. The first one or two use 4096 x 2 frames. The driver seemed to run well for these. Then, it ran with 1024 x 16 frames, which is supposed to by mplayer. And at some point, suddenly the wrong pointers are reported. Something got wrong with the hardware. (In reply to comment #31) > Created an attachment (id=27067) [details] > 2.6.35 rc4, emu10k1-position-fix2.diff > > The bug persists. > > What kind of xrun_debug should I add to a bug report? Or dmesg is enough? xrun_debug=29 means to set 29 to /proc/asound/card*/pcm*/xrun_debug (needs to set only to corresponding /proc/asound/card* entries like card0). /proc/asound/card*/pcm0p/xrun_debug should suffice for checking only playback. Then you'll have much more kernel debug messages (so yes, dmesg should suffice). But, before that, could you set just 4 to xrun_debug proc file and check whether this resolves the problem (with / without patch)? This bit will activate the jiffies check in PCM core. Note that filtering in PCM core is barely a right solution but a temporary workaround. The real problem is either the wrong interrupt timing or the bogus hwptr value. I suspected the latter, but judging from the fact that my patch doesn't work, it might be the former... (In reply to comment #32) > What apps did you test? Apparently, there are a few different runs in the > log. > The first one or two use 4096 x 2 frames. The driver seemed to run well for > these. > > Then, it ran with 1024 x 16 frames, which is supposed to by mplayer. And at > some point, suddenly the wrong pointers are reported. Something got wrong > with > the hardware. The first application was a KDE logon sound (so probably artsd played it), the second was mplayer (which I usually pause after 1-1,5 seconds of playing a FLAC file - that makes the bug appear instantly). I'm away from my computer right now, so I will post the required information next week. Also try to change line 'epcm->position_fix = epcm->next_position - 2;' to 'epcm->position_fix = epcm->next_position - 8;' in patch emu10k1-position-fix2.diff. Created attachment 27070 [details]
Patch to delay pcm irqs for specifies amount of samples (0-31)
This patch will delay the PCM interrupts by specified amount of samples. I tested this code with SB Live! Value (emu10k1). This is the clean solution to fix reported issue (reporter - please try value 4 or 8 for your system for the introduced delay_pcm_irq module option).
Note that in my environment (IbexPeak platform) the default value (0) (old behaviour) works perfectly (the periods are acknowledged with hwptr positions 4 or 5 samples behing the period boundary - I saw few cases when hwptr was 1 sample behind).
I think that the combination emu10k1 chip version + PCI bridge setup/hardware causes that PCM interrupts are issued too early on some hw.
The question left is: How we can detect the wrong behaviour automatically.
I'm confused, what patches I should apply? Or all of them in this order? 1. Fix hwptr on emu10k1 v2 then 2. epcm->position_fix = epcm->next_position - 8; then 3. Patch to delay pcm irqs for specifies amount of samples Created attachment 27076 [details]
dmesg with all three patches applied
The bug persists, I will now try delay_pcm_irq.
Created attachment 27077 [details]
dmesg with all three patches applied and delay_pcm_irq=8
With both delay_pcm_irq = 4 and = 8 I can reproduce the problem, however with delay_pcm_irq = 8 it took some time for bug to appear.
Apply only patch from comment#36. Try values 4, 8, 16, 31 for the delay_pcm_irq module parameter. If you encouter the problem with all values, attach log for delay_pcm_irq=16 - set xrun_debug to 3 (echo 3 > /proc/asound/card*/pcm*/xrun_debug - replace * with card number and device number). (In reply to comment #40) > Apply only patch from comment#36. Try values 4, 8, 16, 31 for the > delay_pcm_irq > module parameter. If you encouter the problem with all values, attach log for > delay_pcm_irq=16 - set xrun_debug to 3 (echo 3 > > /proc/asound/card*/pcm*/xrun_debug - replace * with card number and device > number). Sorry, the xrun_debug value should be 11 (to dump the hwptr position on the period update call). Created attachment 27084 [details]
dmesg with xrun_debug=11 and delay_pcm_irq=16
The bug occurs with all suggested delay_pcm_irq values.
P.S. I'm now running 2.6.35-rc5 with patch from comment 36. Please, change line in patch 'emu->delay_pcm_irq = delay_pcm_irq[dev] & 0x1f;' to 'emu->delay_pcm_irq = delay_pcm_irq[dev] & 0xff;' and use delay_pcm_irq=64 (attach xrun_debug=11 log). Created attachment 27091 [details]
dmesg with xrun_debug=11, delay_pcm_irq=64 and 0xff change
The bug is still here.
I don't think that you pass the delay_pcm_irq value to the module. What is contents of 'cat /sys/module/snd_emu10k1/parameters/delay_pcm_irq' when the snd_emu10k1 module is loaded? cat /sys/module/snd_emu10k1/parameters/delay_pcm_irq 0,0,0,0,0,0,0,0 I insert the emu10k1 module this way: # modprobe snd_emu10k1 delay_pcm_irq=64 am I wrong? I'm not sure what's wrong. Just check, if snd_emu10k1 module is not loaded (modprobe does not return an error, if module is already loaded). Other way is to change the initial value in patch: static uint delay_pcm_irq[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS - 1)] = 64}; I rmmod modules using this custom script: rmmod `lsmod | grep snd | awk '{print $1}'` and it effectively removes all ALSA modules from the running kernel. OK, now with this patch (In reply to comment #49) > static uint delay_pcm_irq[SNDRV_CARDS] = {[0 ... (SNDRV_CARDS - 1)] = 64}; I finally get the desired values in /sys/module/snd_emu10k1/parameters/delay_pcm_irq: 64,64,64,64,64,64,64,64 So, probably there are not enough pieces in a patch which should accept 'modprobe' parameters or something else is missing/not working correctly. And ... after applying this patch the bug has finally disappeared (at least I cannot reproduce it any longer)! Thanks, I hope kernel 2.6.35 will be released with patches which allow using ALSA backend in mplayer. Please, determine the lowest value which works for you (I think it would be 4 or something like that). I tested my patch on my machine and the module parameter was accepted correctly, so I doubt that it's a problem in the patch. BTW, one possible automation way is to check the validity of hwptr like my patch, set / increase the correction offset, and gives XRUN to restart the stream. This won't conflict with delay_pcm_irq modparm patch, but will be an addition. The other solution might be to start with value delay_pcm_irq=4 or so (after verification) by default. My measurements show that on my system the irq ack delay is usually 4 or 5 samples so it does not harm (8 or 9 samples with delay_pcm_irq=4). Experienced users can decrease this value. I still get this bug with delay_pcm_irq equal to 0, 4 and 8 and it only vanishes with 16, so 16 is a bare minimum for this problem to go away. "emu->delay_pcm_irq = delay_pcm_irq[dev] & 0xff;" patch is not necessesary. P.S. Guys! $ modinfo snd_emu10k1 ... parm: delay_pcm_irq:Delay PCM interrupt by specified number of samples (default 0). (array of uint) ... That is why my attempts to pass an integer to this module failed, I should have passed an _array_ of unsigned integers! I don't know this array dimensions (and I have no idea how to find that out), so I strongly believe you need to show how many numbers are required. Created attachment 27103 [details]
dmesg with xrun_debug=11 and delay_pcm_irq=16
Too early!
I've just started watching a movie, been watching it for 5 minutes, paused it, then after a 3 minutes pause the bug is again here with delay_pcm_irq=16.
I will recheck higher values to make sure at least 31 or 64 are usable and bug free.
(BTW, I've never got this error with NVIDIA driver running 2.6.34, or maybe I just didn't test it enough)
Created attachment 27104 [details]
dmesg with xrun_debug=11 and delay_pcm_irq=64 (!)
OK, I can reproduce this bug with any delay_pcm_irq (now running 2.6.35-rc5 with just emu10k1-delay-irq.patch)
Sigh, I don't know what to do.
cat /sys/module/snd_emu10k1/parameters/delay_pcm_irq
64,64,64,64,64,64,64,64
Steps to reproduce: start watching a movie in mplayer, press and don't release "space" button so that mplayer keeps pausing and resuming tens of times - then this bug reoccurs or might not reoccur - it happens quite randomly - at times you need to watch a movie for some time, try this trick, nothing will happen, randomly fast forward, rewind, then use "space" trick again and so on and so forth. However with lower delay_pcm_irq values like 0,4 and 8 this bug can be triggered almost simultaneously just after pausing an audio/video stream just once.
Anyway PAUSING playing is a sure way to trigger the bug, I mean during normal playing of a any file this bug almost never happens, you must pause then continue playing to get a "desired" effect. I don't know if this information is of any use, I just wanted you to know.
Could you try to move line 'snd_emu10k1_playback_stop_voice(emu, epcm->extra);' before 'snd_emu10k1_playback_stop_voice(emu, epcm->voices[0]);' in sound/pci/emu10k1/emupcm.c file? Only in the snd_emu10k1_playback_trigger() function. No, this change didn't help (tried delay_pcm_irq=0 and 64). [25292.843779] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=13311/1024/16384, hwptr=36/1405915/1405951/1392640 [25292.866949] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=14335/1024/16384, hwptr=36/1406939/1406975/1392640 [25292.890092] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=15359/1024/16384, hwptr=36/1407963/1407999/1392640 [25292.913257] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=16383/1024/16384, hwptr=35/1408988/1409023/1392640 [25292.936398] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=1023/1024/16384, hwptr=18/1410029/1410047/1409024 [25292.959569] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=2047/1024/16384, hwptr=17/1411054/1411071/1409024 [25292.982733] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=3071/1024/16384, hwptr=16/1412079/1412095/1409024 [25293.005867] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=4095/1024/16384, hwptr=17/1413102/1413119/1409024 [25293.029032] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=5119/1024/16384, hwptr=0/1414143/1414143/1409024 [25293.052176] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=6143/1024/16384, hwptr=575/1414592/1415167/1409024 [25293.075341] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=7167/1024/16384, hwptr=1019/1415172/1416191/1409024 [25293.098480] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=8191/1024/16384, hwptr=1/1417214/1417215/1409024 [25293.121668] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=9216/1024/16384, hwptr=34/1418206/1418240/1409024 [25293.144787] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=10239/1024/16384, hwptr=16416/1419231/1435647/1409024 [25293.144799] Pid: 0, comm: swapper Tainted: G R 2.6.35-rc5-ic #3 [25293.144803] Call Trace: [25293.144816] [<fa1f797e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm] [25293.144826] [<c10269d8>] ? __wake_up_common+0x48/0x70 [25293.144834] [<fa1f7b7b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm] [25293.144844] [<fa4984ad>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1] [25293.144851] [<fa496386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1] [25293.144858] [<fa49334a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1] [25293.144877] [<f8665719>] ? e100_disable_irq+0x49/0x60 [e100] [25293.144884] [<c10607fe>] handle_IRQ_event+0x2e/0xc0 [25293.144889] [<c10635d4>] ? move_native_irq+0x14/0x50 [25293.144894] [<c106253f>] handle_fasteoi_irq+0x5f/0xc0 [25293.144899] [<c10624e0>] ? handle_fasteoi_irq+0x0/0xc0 [25293.144902] <IRQ> [<c1004687>] ? do_IRQ+0x47/0xc0 [25293.144911] [<c1003169>] ? common_interrupt+0x29/0x30 [25293.144921] [<f875d1a7>] ? acpi_idle_enter_simple+0x100/0x134 [processor] [25293.144929] [<f875cf10>] ? acpi_idle_enter_bm+0xbe/0x255 [processor] [25293.144935] [<c1215ab6>] ? cpuidle_idle_call+0x76/0xd0 [25293.144940] [<c1001c17>] ? cpu_idle+0x47/0x90 [25293.144945] [<c128e9d8>] ? start_secondary+0x1c8/0x1ce [25293.167950] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=11263/1024/16384, hwptr=23/1436648/1436671/1425408 [25293.191089] ALSA sound/core/pcm_lib.c:365: period_update: pcmC0D0p:0: pos=12287/1024/16384, hwptr=21/1437674/1437695/1425408 Created attachment 27115 [details]
Patch to delay pcm irqs for specifies amount of samples (0-31)
Improved patch implementing delay_pcm_irq module option. This version includes position correction for the interrupt voice for the pause/resume trigger.
Please, test.
I'm unable to reproduce the bug with the latest patch applied, probably it's finally resolved (even setting 'delay_pcm_irq' wasn't necessary). Give me a few days for testing and I will tell for sure if this problem is gone. Thank you. Created attachment 27134 [details]
dmesg with xrun_debug=11, delay_pcm_irq=0 and emu10k1-delay-irq2.patch
With emu10k1-delay-irq2.patch patch I can still reproduce the problem, however it has become less severe. mplayer now accelerates playing not as much as it did it before.
What delay_pcm_irq values should I try this time? As the problem is now less pronounced, maybe you can come up even with a better patch which doesn't require fiddling with delay_pcm_irq parameter?
Try to dermine the lowest value. I would test 2,4,8. (In reply to comment #62) > Try to dermine the lowest value. I would test 2,4,8. With the last patch the bug is very difficult to trigger, so I think it would be best to apply it for now, unless I come up with a better way of reproducing it. I've found out that Audacius media player has terrible problems playing FLAC files under 2.6.35 even with delay_pcm_irq equal to 8. Audacious intermittently stutters and skips up to a second of an audio stream but that doesn't cause audio stream acceleration. Meanwhile such stumblings happen up to 10 times during a 3 minutes single song playing so I cannot say this bug is fixed. I will now reboot into 2.6.33 to check out if this kernel is bug free, then reboot into 2.6.35 and enable XRUN debugging to see what's happening. Audacious media player, xrun_debug=3, delay_pcm_irq=8: [ 4383.615116] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=53, new_hw_ptr=666415, old_hw_ptr=666362) [ 4383.615134] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2 [ 4383.615141] Call Trace: [ 4383.615168] [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm] [ 4383.615185] [<c120466d>] ? input_event+0x5d/0x70 [ 4383.615194] [<c1208062>] ? atkbd_interrupt+0x622/0x6d0 [ 4383.615208] [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm] [ 4383.615223] [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1] [ 4383.615237] [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1] [ 4383.615250] [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1] [ 4383.615267] [<f87e3719>] ? e100_disable_irq+0x49/0x60 [e100] [ 4383.615277] [<c106086e>] handle_IRQ_event+0x2e/0xc0 [ 4383.615287] [<c1063644>] ? move_native_irq+0x14/0x50 [ 4383.615296] [<c10625af>] handle_fasteoi_irq+0x5f/0xc0 [ 4383.615305] [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0 [ 4383.615311] <IRQ> [<c1004687>] ? do_IRQ+0x47/0xc0 [ 4383.615325] [<c1003169>] ? common_interrupt+0x29/0x30 [ 4383.615340] [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor] [ 4383.615352] [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0 [ 4383.615360] [<c1001c17>] ? cpu_idle+0x47/0x90 [ 4383.615370] [<c128ee78>] ? start_secondary+0x1c8/0x1ce [ 4470.381208] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=64, new_hw_ptr=4503673, old_hw_ptr=4503609) [ 4470.381219] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2 [ 4470.381222] Call Trace: [ 4470.381247] [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm] [ 4470.381260] [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm] [ 4470.381270] [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1] [ 4470.381279] [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1] [ 4470.381287] [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1] [ 4470.381299] [<f87a1e0e>] ? ehci_irq+0xfe/0x1e0 [ehci_hcd] [ 4470.381306] [<c106086e>] handle_IRQ_event+0x2e/0xc0 [ 4470.381312] [<c1063644>] ? move_native_irq+0x14/0x50 [ 4470.381317] [<c10625af>] handle_fasteoi_irq+0x5f/0xc0 [ 4470.381322] [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0 [ 4470.381325] <IRQ> [<c1004687>] ? do_IRQ+0x47/0xc0 [ 4470.381334] [<c1003169>] ? common_interrupt+0x29/0x30 [ 4470.381345] [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor] [ 4470.381352] [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0 [ 4470.381356] [<c1001c17>] ? cpu_idle+0x47/0x90 [ 4470.381362] [<c128ee78>] ? start_secondary+0x1c8/0x1ce [ 4472.373884] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=33, new_hw_ptr=4591799, old_hw_ptr=4591766) [ 4472.373894] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2 [ 4472.373897] Call Trace: [ 4472.373923] [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm] [ 4472.373936] [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm] [ 4472.373946] [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1] [ 4472.373955] [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1] [ 4472.373963] [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1] [ 4472.373975] [<f87a1e0e>] ? ehci_irq+0xfe/0x1e0 [ehci_hcd] [ 4472.373983] [<f87e3719>] ? e100_disable_irq+0x49/0x60 [e100] [ 4472.373990] [<c106086e>] handle_IRQ_event+0x2e/0xc0 [ 4472.373996] [<c1063644>] ? move_native_irq+0x14/0x50 [ 4472.374001] [<c10625af>] handle_fasteoi_irq+0x5f/0xc0 [ 4472.374006] [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0 [ 4472.374009] <IRQ> [<c1004687>] ? do_IRQ+0x47/0xc0 [ 4472.374018] [<c1003169>] ? common_interrupt+0x29/0x30 [ 4472.374024] [<c103007b>] ? mmput+0x7b/0xb0 [ 4472.374032] [<c12936b8>] ? _raw_spin_unlock_irq+0x8/0x30 [ 4472.374041] [<f87c7224>] ? rtl8169_phy_timer+0x94/0xf0 [r8169] [ 4472.374046] [<c12936fe>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [ 4472.374057] [<c103d215>] ? run_timer_softirq+0xe5/0x1e0 [ 4472.374065] [<f87c7190>] ? rtl8169_phy_timer+0x0/0xf0 [r8169] [ 4472.374070] [<c1037b59>] ? __do_softirq+0x89/0x110 [ 4472.374075] [<c1037ad0>] ? __do_softirq+0x0/0x110 [ 4472.374077] <IRQ> [<c103788d>] ? irq_exit+0x6d/0x70 [ 4472.374086] [<c1019e86>] ? smp_apic_timer_interrupt+0x56/0x90 [ 4472.374092] [<c1293ede>] ? apic_timer_interrupt+0x2a/0x30 [ 4472.374101] [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor] [ 4472.374108] [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0 [ 4472.374112] [<c1001c17>] ? cpu_idle+0x47/0x90 [ 4472.374117] [<c128ee78>] ? start_secondary+0x1c8/0x1ce [ 4472.375829] ALSA sound/core/pcm_lib.c:433: PCM: Lost interrupts? [Q] (stream=0, delta=33, new_hw_ptr=4591885, old_hw_ptr=4591852) [ 4472.375838] Pid: 0, comm: swapper Not tainted 2.6.35-ic #2 [ 4472.375842] Call Trace: [ 4472.375856] [<f86b699e>] snd_pcm_update_hw_ptr0+0x59e/0x6f0 [snd_pcm] [ 4472.375869] [<f86b6b9b>] snd_pcm_period_elapsed+0xab/0x100 [snd_pcm] [ 4472.375879] [<f8c0950d>] ? snd_emu10k1_ptr_read+0x5d/0xd0 [snd_emu10k1] [ 4472.375888] [<f8c07386>] snd_emu10k1_pcm_interrupt+0x16/0x20 [snd_emu10k1] [ 4472.375896] [<f8c0434a>] snd_emu10k1_interrupt+0x26a/0x590 [snd_emu10k1] [ 4472.375905] [<f87a1e0e>] ? ehci_irq+0xfe/0x1e0 [ehci_hcd] [ 4472.375912] [<f87e3719>] ? e100_disable_irq+0x49/0x60 [e100] [ 4472.375918] [<c106086e>] handle_IRQ_event+0x2e/0xc0 [ 4472.375924] [<c1063644>] ? move_native_irq+0x14/0x50 [ 4472.375929] [<c10625af>] handle_fasteoi_irq+0x5f/0xc0 [ 4472.375934] [<c1062550>] ? handle_fasteoi_irq+0x0/0xc0 [ 4472.375936] <IRQ> [<c1004687>] ? do_IRQ+0x47/0xc0 [ 4472.375945] [<c1003169>] ? common_interrupt+0x29/0x30 [ 4472.375954] [<f846108a>] ? acpi_idle_enter_bm+0x238/0x26c [processor] [ 4472.375960] [<c1215e36>] ? cpuidle_idle_call+0x76/0xd0 [ 4472.375964] [<c1001c17>] ? cpu_idle+0x47/0x90 [ 4472.375969] [<c128ee78>] ? start_secondary+0x1c8/0x1ce However enabling xrun_debug make the bug go away :) You mean that with xrun_debug=11 this problem is not visible? Attach the output using same conditions (program) with xrun_debug=11 to check the stream parameters. Created attachment 27354 [details] xrun_debug=11 log, delay_pcm_irq=8 and emu10k1-delay-irq2.patch (In reply to comment #66) > You mean that with xrun_debug=11 this problem is not visible? Yes. > Attach the output using same conditions (program) with xrun_debug=11 > to check the stream parameters. Here it is. The report in comment#65 contains only warnings. The period size is just 21 samples (Audacious media player is crazy), so the interrupt skipping is very probable. Anyway, audio should not skip when xrun_debug=0 (please, try). I need probably more detailed log (xrun_debug=27) to check it. Created attachment 27355 [details]
xrun_debug=27
With xrun_debug=0 audio skips, like I've already told you.
Here comes an xrun_debug=27 log.
I don't see any problem in log and from your comments I deduce that even with xrun_debug=27 the problem is not visible. Could you change line: .period_bytes_min = 64, to: .period_bytes_min = 512, in sound/pci/emu10k1/emupcm.c (around line 964 - snd_emu10k1_playback structure) and do your tests again? 1. With xrun_debug=27 there must have been something visible/detectable in a log file I attached, because during a short 12 seconds playing session I had a sound skip near the end. 2. With .period_bytes_min = 512 I can no longer reproduce the problem in Audacious. Hopefully this change doesn't affect mplayer. 3. Jaroslav, how can I pass delay_pcm_irq=8 to a module? I ended up manually editing the sources and recompiling because this command # modprobe snd_emu10k1 delay_pcm_irq=8,8,8,8,8,8,8,8 doesn't work as intended, /sys/module/snd_emu10k1/parameters/delay_pcm_irq still has a set of zeros. 4. Or maybe let's make delay_pcm_irq=8 default? Because I do believe that Linux drivers are supposed to work without fiddling. 5. I hope emu10k1-delay-irq2.patch and the last fix you proposed will land in mainline and in stable too (2.6.34.x and 2.6.35.x). Thank you. (In reply to comment #71) > 1. With xrun_debug=27 there must have been something visible/detectable in a > log file I attached, because during a short 12 seconds playing session I had > a > sound skip near the end. Yes, there is. Thanks. Please, write any information you can, I gave up after reading of 40000 lines. It appears like another issue (double irq acknowledge for small periods from the emu10k1 chip). > 2. With .period_bytes_min = 512 I can no longer reproduce the problem in > Audacious. Hopefully this change doesn't affect mplayer. Could you retest with value 128? > 3. Jaroslav, how can I pass delay_pcm_irq=8 to a module? I ended up manually > editing the sources and recompiling because this command > > # modprobe snd_emu10k1 delay_pcm_irq=8,8,8,8,8,8,8,8 > > doesn't work as intended, /sys/module/snd_emu10k1/parameters/delay_pcm_irq > still has a set of zeros. It works for me, it must be a bug in your distro or system configuration. Try to add these parameters to /etc/modprobe.conf or so (/etc/modprobe.d tree). > 4. Or maybe let's make delay_pcm_irq=8 default? Because I do believe that > Linux > drivers are supposed to work without fiddling. It's big value. Could you do all tests with delay_pcm_irq=2? > 5. I hope emu10k1-delay-irq2.patch and the last fix you proposed will land in > mainline and in stable too (2.6.34.x and 2.6.35.x). We need to find lowest stable values which are working for you at first. With value 128 the skipping issue immediately took effect. delay_pcm_irq needs a lot of testing in mplayer (sometimes the bug triggers only after 30 minutes during the course of playback), so I'll report later. period_size value = 256? With 256 Audacious plays audio just fine. (In reply to comment #74) > period_size value = 256? OK, I suppose I've tested enough and period_size=256 and delay_pcm_irq=2 are enough for all bugs to go away. Please, submit these changes upstream, I can send my Tested-by: if necessary, just CC me when sending out the patches. P.S. Jaroslav, I still don't understand why everything worked just fine in <=2.6.33 without any fuss and why these two patches are required 2.6.34 and onwards. Another question: you mentioned that Audacious is somehow broken the way it (ab)uses ALSA. What if there are some other applications out there which (ab)use ALSA even more and this bug resurfaces even with these two patches applied? Well, I'm against to set period_size=256 as unconditional default. Such a high value is needed only for crappy hardware that don't give IRQs property. On most hardware, it's no problem as is now. The old driver "worked" in earlier versions because it didn't check the condition strictly. Most apps don't care about it, but apps like PulseAudio pretty rely on the accuracy of PCM runtime status, so the new code was introduced after 2.6.33 to rule out the wrong conditions. About app-brokenness: if an app sets up the audio in *that* crazy way like Audacious, it's just an application bug to be fixed there. (In reply to comment #77) > > About app-brokenness: if an app sets up the audio in *that* crazy way like > Audacious, it's just an application bug to be fixed there. I know you are a busy person, so can you at least tell me what's wrong with audacious ALSA output plug-in, so that I could report this bug to its developers. ALSA output plug-in code can be seen online: http://hg.atheme.org/audacious-plugins/audacious-plugins/file/cd6114b7955d/src/alsa or downloaded as a part of of http://distfiles.atheme.org/audacious-plugins-2.4-rc1.tgz archive. Thank you. (In reply to comment #77) > Well, I'm against to set period_size=256 as unconditional default. Such a > high > value is needed only for crappy hardware that don't give IRQs property. On > most hardware, it's no problem as is now. Note that 256 bytes for standard usage (44kHz, 16-bit samples, stereo) means interrupt latency for 64 frames -> 0.0015 seconds which is enough for all applications. I think that we can put 256 as default and add a module parameter to allow user modifications. > The old driver "worked" in earlier versions because it didn't check the > condition strictly. Most apps don't care about it, but apps like PulseAudio > pretty rely on the accuracy of PCM runtime status, so the new code was > introduced after 2.6.33 to rule out the wrong conditions. Agreed. > About app-brokenness: if an app sets up the audio in *that* crazy way like > Audacious, it's just an application bug to be fixed there. Yes, Audacious is really crazy. (In reply to comment #79) > (In reply to comment #77) > > Well, I'm against to set period_size=256 as unconditional default. Such a > high > > value is needed only for crappy hardware that don't give IRQs property. On > > most hardware, it's no problem as is now. > > Note that 256 bytes for standard usage (44kHz, 16-bit samples, stereo) means > interrupt latency for 64 frames -> 0.0015 seconds which is enough for all > applications. OTOH, if another user has used emu10k1 with JACK and period_size=64, this can be seen as a regression, too. Yes, it's adjustable. But, this can be a counter-argument, too :) My concern is that you picked up a value 256 just because a broken app like Audacious worked coincidentally on one machine magically. If this value were chosen by some proper calculation, I'd be happily for it, though... (In reply to comment #78) > (In reply to comment #77) > > > > About app-brokenness: if an app sets up the audio in *that* crazy way like > > Audacious, it's just an application bug to be fixed there. > > I know you are a busy person, so can you at least tell me what's wrong with > audacious ALSA output plug-in, so that I could report this bug to its > developers. ALSA output plug-in code can be seen online: > > > http://hg.atheme.org/audacious-plugins/audacious-plugins/file/cd6114b7955d/src/alsa > > or downloaded as a part of of > http://distfiles.atheme.org/audacious-plugins-2.4-rc1.tgz archive. > > Thank you. The problem is that Audacious doesn't set the period_size or period_time before snd_pcm_hw_params() call. It can set, at least, the lowest value to avoid an insane value. > My concern is that you picked up a value 256 just because a broken app like
> Audacious worked coincidentally on one machine magically. If this value were
> chosen by some proper calculation, I'd be happily for it, though...
The real problem is that the emu10k1 sends double acks for fast interrupts on reporter machine - thus we get two snd_period_elapsed() calls with only little time delay which confuses the update pointer function in the pcm_lib code.
This problem will be seen in jack, too.
So I would propose just to use safe values and experienced users might get lower latencies.
Another possibility is to use the emu10k1 wallclock to check if interrupts are not comming so fast to avoid double acks, but it's just another slow PCI I/O read operation - limiting the period interrupt rate is more elegant in this case.
And last note: All hw configuration offered by the driver should work including the smallest periods. If the parameter setup is not ideal is another question.
(In reply to comment #82) > > My concern is that you picked up a value 256 just because a broken app like > > Audacious worked coincidentally on one machine magically. If this value > were > > chosen by some proper calculation, I'd be happily for it, though... > > The real problem is that the emu10k1 sends double acks for fast interrupts on > reporter machine - thus we get two snd_period_elapsed() calls with only > little > time delay which confuses the update pointer function in the pcm_lib code. Actually, in such a situation, the earlier driver version worked better. It was designed to be robust for lost IRQs rather than the strict ptr update. > This problem will be seen in jack, too. > > So I would propose just to use safe values and experienced users might get > lower latencies. > > Another possibility is to use the emu10k1 wallclock to check if interrupts > are > not comming so fast to avoid double acks, but it's just another slow PCI I/O > read operation - limiting the period interrupt rate is more elegant in this > case. Hm, OK, using period_size_min=256 sounds more reasonable as a quick fix than other smarter but more intrusive solutions. > And last note: All hw configuration offered by the driver should work > including > the smallest periods. How can you guarantee that period_size=256 works on all machines? :) The period limit code is now in alsa-lib: http://git.alsa-project.org/?p=alsa-lib.git;a=commitdiff;h=09879a4bb58199f64abcb8df506f917c8efc2383 I'm going to apply the last patch with delay_pcm_irq=2 as default without the period limiting for Audacious in the driver, because we have the user space solution for it. Jaroslav, is your final patch ready for merge? I put this emu10k1 (with pcm_irq_delay=2 as default) and pcm midlevel patches (for double interrupt acks) to my devel tree: http://git.alsa-project.org/?p=alsa-kernel.git;a=commitdiff;h=566e41e91c1de4cae625e42ef3f32e5a964be62f http://git.alsa-project.org/?p=alsa-kernel.git;a=commitdiff;h=b990ae57452628e9f422b8c5bf6543e77cb9633e (In reply to comment #86) > I put this emu10k1 (with pcm_irq_delay=2 as default) and pcm midlevel patches > (for double interrupt acks) to my devel tree: As far as I remember, we decided to increase period_size to 256 but I don't see this change, why is it so? |