Most recent kernel where this bug did not occur:
Distribution: Gentoo
Hardware Environment:
Software Environment:

Linux e275d 2.6.15-rc7 #1 Fri Dec 30 03:58:06 EET 2005 x86_64 AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux

Gnu C                  3.4.4
Gnu make               3.80
binutils               2.16.1
util-linux             2.12r
mount                  2.12r
module-init-tools      3.0
e2fsprogs              1.38
jfsutils               1.1.8
reiserfsprogs          line
reiser4progs           line
xfsprogs               2.6.25
nfs-utils              1.0.6
Linux C Library        2.3.5
Dynamic linker (ldd)   2.3.5
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   070
Modules Loaded

The kernel does not have pre-empt and it has a 250 Hz timer.

Problem Description:

Currently firefox + X can starve two of my processes so that they do not get any timeslice during some wall clock seconds. I am getting huge buffer underruns when playing sound with uade123 (uade 2.01 at http://zakalwe.virtuaalipalvelin.net/uade/ ).

uadecore process is attached to the uade123 by two pipes. uadecore synthesizes sound data and passes that data to uade123. uade123 pushes the sound data to libao which pushes it for ALSA.

Here's a small strace dump of what happens when I open 3 tabs to firefox and push down ctrl-page up so that firefox starts to change tabs rapidly (consuming lots of CPU). Normally uadecore+uade123 consume only 4.0% of CPU but when starved they only get a fraction. 'make soundcheck' should produce the problem well.

22796 21:40:54 write(1, "Playing time position 2.9s in su"..., 60) = 60
22796 21:40:54 select(1, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
22796 21:40:54 write(4, "\0\0\0\5\0\0\0\4\0\0\17\370", 12) = 12
22796 21:40:54 write(4, "\0\0\0\20\0\0\0\0", 8) = 8
22796 21:40:54 ioctl(6, 0x40184150, 0x7fffffc285b0) = 0
22796 21:40:54 read(5, <unfinished ...>

* A full second without system calls

22797 21:40:56 <... read resumed> "\0\0\0\5\0\0\0\4", 8) = 8
22797 21:40:56 read(3, "\0\0\17\370", 4) = 4
22797 21:40:56 read(3, "\0\0\0\20\0\0\0\0", 8) = 8
22797 21:40:56 write(6, "\0\0\0\31\0\0\17\370\31\313\23\256\31\313\0261\31\313\$
22796 21:40:56 <... read resumed> "\0\0\0\31\0\0\17\370", 8) = 8
22797 21:40:56 <... write resumed> ) = 4096
22796 21:40:56 read(5, "\31\313\23\256\31\313\0261\31\313\27&\31\313\27\24\31\3$
22796 21:40:56 read(5, <unfinished ...>
22797 21:40:56 write(6, "\0\0\0\20\0\0\0\0", 8 <unfinished ...>
22796 21:40:56 <... read resumed> "\0\0\0\20\0\0\0\0", 8) = 8
22797 21:40:56 <... write resumed> ) = 8
22796 21:40:56 write(1, "Playing time position 3.0s in su"..., 60) = 60
22796 21:40:56 select(1, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
22796 21:40:56 write(4, "\0\0\0\5\0\0\0\4\0\0\17\370", 12) = 12
22796 21:40:56 write(4, "\0\0\0\20\0\0\0\0", 8) = 8
22796 21:40:56 ioctl(6, 0x40184150, 0x7fffffc285b0) = -1 EPIPE (Broken pipe)
22796 21:40:56 write(2, "ALSA: underrun, at least 0ms.\n", 30) = 30
22796 21:40:56 ioctl(6, 0x4140, 0x1e) = 0
22796 21:40:56 read(5, <unfinished ...>
22797 21:40:56 read(3, "\0\0\0\5\0\0\0\4", 8) = 8
22797 21:40:56 read(3, "\0\0\17\370", 4) = 4
22797 21:40:56 read(3, "\0\0\0\20\0\0\0\0", 8) = 8
22797 21:40:56 write(6, "\0\0\0\31\0\0\17\370\324Y\353C\324Y\352u\324Y\351(\324$
22796 21:40:56 <... read resumed> "\0\0\0\31\0\0\17\370", 8) = 8
22797 21:40:56 <... write resumed> ) = 4096
22796 21:40:56 read(5, "\324Y\353C\324Y\352u\324Y\351(\324Y\347\250\324Y\3462\3$
22796 21:40:56 read(5, <unfinished ...>
22797 21:40:56 write(6, "\0\0\0\20\0\0\0\0", 8 <unfinished ...>
22796 21:40:56 <... read resumed> "\0\0\0\20\0\0\0\0", 8) = 8
22797 21:40:56 <... write resumed> ) = 8

I think I need to try this on BSDs and 2.4.x kernel too, but I do not have such systems at hand.

Steps to reproduce:

'make soundcheck' for uade123 (uade 2.01), open 3 tabs to firefox and press down ctrl-page up so that firefox switches tabs rapidly. This will cause huge underruns.
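A stall like the one marked in the trace can also be located mechanically. The following is a minimal Python sketch, assuming the trace uses the same "PID HH:MM:SS syscall" layout as the dump in the report, with one-second timestamp resolution. The names find_gaps and threshold_s are illustrative only and are not part of strace or uade.

```python
import re
from datetime import datetime

# One trace line: "<pid> <HH:MM:SS> <rest>", the layout used in the report.
LINE_RE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2})\s+(.*)$")


def find_gaps(lines, threshold_s=1):
    """Return (gap_seconds, line_before, line_after) for each silent interval.

    Timestamps have one-second resolution, so a difference of exactly 1 may
    hide a much shorter real gap; only differences above threshold_s count.
    """
    gaps = []
    prev_time = None
    prev_line = None
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip annotations and blank lines
        stamp = datetime.strptime(match.group(2), "%H:%M:%S")
        if prev_time is not None:
            delta = int((stamp - prev_time).total_seconds())
            if delta < 0:
                delta += 24 * 60 * 60  # trace crossed midnight
            if delta > threshold_s:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = stamp, line
    return gaps


if __name__ == "__main__":
    # A few lines copied from the trace in the report.
    sample = [
        "22796 21:40:54 ioctl(6, 0x40184150, 0x7fffffc285b0) = 0",
        "22796 21:40:54 read(5, <unfinished ...>",
        "* A full second without system calls",
        '22797 21:40:56 <... read resumed> "\\0\\0\\0\\5\\0\\0\\0\\4", 8) = 8',
    ]
    for seconds, before, after in find_gaps(sample):
        print(f"{seconds}s gap between:\n  {before}\n  {after}")
```

The threshold is strictly greater than one second because two adjacent stamps that differ by 1 can be only milliseconds apart. A difference of 2, as between 21:40:54 and 21:40:56 in the report, guarantees more than one second with no system calls from either process.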
This looks like it is related to the TASK_NONINTERACTIVE flag for pipes. Can you check to see if the problem existed prior to this change? 2.6.12 had the new fatter deeper pipes but did not have the TASK_NONINTERACTIVE flag if I recall correctly.
It seems I don't get any underruns on 2.6.12, which is great. I hope you can fix this problem soon.
Created attachment 7007
sched improve task noninteractive patch

Alter the activated mechanism to count all sleep time in a linear fashion and move the TASK_NONINTERACTIVE flagged tasks to gain sleep average from this instead of no sleep average.
Please try the patch I attached here to see if it helps 2.6.15. Also I am interested in any detrimental interactivity effects of this patch.
> Please try the patch I attached here to see if it helps 2.6.15. Also I
> am interested in any detrimental interactivity effects of this patch.

Do you have any tips on what to test? Are there scheduling test suites?
I wrote an interactivity benchmark which covers some of the basics (interbench.kolivas.org). You can use this for some hard measurements. A lot is still up to you to test in your normal environment to see how smooth windows move about and audio and video plays back etc under your _normal_ workloads. I don't really care how it feels with 'make -j16' in the background because optimising for something like that is pointless and tends to favour unfair scheduling.
I compiled 2.6.15 with your patch and I'm not getting any underruns anymore :)
Created attachment 7008
2.6.15 interactivity rollup

Great. That patch was part of a series I'm working on to correct a few current quirks. Can you test this rolled up patch which contains all those in the series to ensure it still fixes your problem? You will need to back out the previous patch first.
I'll try it in the evening. -> work
I tested your newer patch too. It also worked well; no underruns. I will post interbench results later for 2.6.15-rc7 and 2.6.15-interactive-patch.
Created attachment 7019
2.6.15-rc7 interbench results without any patch
Created attachment 7020
2.6.15 interbench results with the second patch applied

Here are both of the interbench results. The first one (2.6.15-rc7) is without any patch, and the second one is with the second interactivity patch.
Thanks for the results. The changes are consistent with what we would expect: the CPU-heavy interactive tasks (like X) suffer more under I/O load, since these patches also increase the bonuses of I/O-bound tasks (see the lkml thread).

OK, these patches have been queued up for the next -mm, so I'm marking this bug as fixed.
The bug occurs with 2.6.16 too. Is this going to be fixed in the future? Is merging the 2.6.15 interactivity patch safe for 2.6.16?
The patches were merged into -mm and thus are following the normal cycle for mainstream inclusion. They have been in the -mm kernel since then and I was planning on pushing them for 2.6.17. There are some changes in 2.6.16 that prevent the patch from applying cleanly. I closed this bug because a code fix was pushed to -mm and will eventually be merged upstream. Please reopen the bug only if the problem is present in the current -mm kernel.
> I was planning on pushing them for 2.6.17

Thanks. Will it also be pushed into the new 2.6.16.* series?
No, because it was too big a change to go into 2.6.16, therefore it is definitely too big a change to go into 2.6.16.x.

For convenience I've posted a patch for 2.6.16 here:
http://ck.kolivas.org/patches/interactivity/2.6.16-O22.1int.patch
> No, because it was too big a change to go into 2.6.16, therefore it is
> definitely too big a change to go into 2.6.16.x

According to this:
http://lkml.org/lkml/2005/12/3/55
2.6.16.x might be maintained for as long as 2 to 3 years. Having a deficient scheduler for that long would tremendously decrease the usability of that kernel series. No doubt this will cause more bug reports, and I dislike the idea of working around this problem in the application.
Being maintained doesn't make it the "current" stable kernel. I'm not going to argue the development model here. Feel free to debate this issue in the appropriate place.