Subject : hackbench regression with 2.6.26-rc2 on tulsa machine
Submitter : "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Date : 2008-05-20 8:09
References : http://marc.info/?l=linux-kernel&m=121127121813708&w=2
Handled-By : Mike Galbraith <efault@gmx.de>

This entry is being used for tracking a regression from 2.6.25. Please don't close it until the problem is fixed in the mainline.
Probably caused by:

commit 46151122e0a2e80e5a6b2889f595e371fe2b600d
Author: Mike Galbraith <efault@gmx.de>
Date:   Thu May 8 17:00:42 2008 +0200

    sched: fix weight calculations

    Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith says that the tested configuration is known broken. Closing.
We were not that far from giving GROUP_SCHED a dependency on BROKEN during 2.6.25-rc. Can we consider this now instead of adding yet another problem to a known problematic configuration?
"known problematic" for us - a user who once enabled it in his kernel cannot know that it could cause such problems.
(In reply to comment #3)
> We were not that far from giving GROUP_SCHED a dependency on BROKEN during
> 2.6.25-rc.

I sort of agree with this. What was the reason, actually?
(In reply to comment #5)
> (In reply to comment #3)
> > We were not that far from giving GROUP_SCHED a dependency on BROKEN during
> > 2.6.25-rc.
>
> I sort of agree with this. What was the reason, actually?

It was discussed in the thread around http://lkml.org/lkml/2008/3/28/273

In 2.6.25-rc we had 6 CPU scheduler regressions.
3 or 4 of them were caused by group scheduling.
Including one that is still unfixed.

In 2.6.26-rc we already have 6 CPU scheduler regressions, 5 of them still unfixed.
3 of them seem to be group scheduler regressions.

The CPU scheduler is currently regressing horribly often, and half of the regressions are in group scheduling.
On Wed, 2008-05-21 at 05:54 -0700, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10761
>
> ------- Comment #6 from bunk@kernel.org 2008-05-21 05:54 -------
> The CPU scheduler is currently regressing horribly often, and half of the
> regressions are in group scheduling.

That is because group scheduling is horribly complex and was never feature complete - trying to solve that is high on my list of priorities.
(In reply to comment #7)
> That is because group scheduling is horribly complex and was never
> feature complete - trying to solve that is high on my list of
> priorities.

The current question is what to do for 2.6.26.

And getting it feature complete is nothing that would suit for 2.6.26.

Can we agree to add to GROUP_SCHED a dependency on BROKEN and keep this dependency in Linus' tree until the code is feature complete and considered ready for production use?

Currently it seems to be more of a pitfall (for users who enable it) than a useful feature.
On Wed, 2008-05-21 at 06:18 -0700, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #8 from bunk@kernel.org 2008-05-21 06:18 -------
> The current question is what to do for 2.6.26.
>
> And getting it feature complete is nothing that would suit for 2.6.26.
>
> Can we agree to add to GROUP_SCHED a dependency on BROKEN and keep this
> dependency in Linus' tree until the code is feature complete and considered
> ready for production use?
>
> Currently it seems to be more of a pitfall (for users who enable it) than a
> useful feature.

I think we changed the default to 'N' - isn't that enough?
(In reply to comment #9)
> I think we changed the default to 'N' - isn't that enough?

It does already default to N, and we know how many people run into problems with it.

How many hours have people wasted on bisecting regressions that turned out to be group scheduling problems?

If a feature isn't ready for being used on production systems it shouldn't be in stable kernels.
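For anyone unsure whether the kernel they are about to bisect was built with group scheduling, a check along the following lines may save some time. It is only a rough sketch and not something posted in this report: it assumes the usual `.config` syntax (`CONFIG_FOO=y` for enabled options, `# CONFIG_FOO is not set` for disabled ones), the function names are hypothetical, and `CONFIG_EXAMPLE_OPTION` in the sample text is a made-up placeholder.

```python
def parse_kernel_config(text):
    """Map option names to values from the text of a kernel .config file."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Disabled options are written as comments:
            # "# CONFIG_FOO is not set"
            name = line[len("# "):-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            # Enabled options look like "CONFIG_FOO=y" or "CONFIG_FOO=m"
            name, value = line.split("=", 1)
            options[name] = value
    return options


def group_sched_enabled(text):
    """Return True if CONFIG_GROUP_SCHED=y appears in the config text."""
    return parse_kernel_config(text).get("CONFIG_GROUP_SCHED", "n") == "y"


# Hypothetical sample; CONFIG_EXAMPLE_OPTION is a placeholder name.
SAMPLE = """\
CONFIG_EXPERIMENTAL=y
CONFIG_GROUP_SCHED=y
# CONFIG_EXAMPLE_OPTION is not set
"""

if __name__ == "__main__":
    print("group scheduling enabled:", group_sched_enabled(SAMPLE))
```

To check a real build, pass the contents of that kernel's `.config` file to `group_sched_enabled()` instead of `SAMPLE`.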
(IMHO, bugzilla shouldn't be used for tracking EXPERIMENTAL code, so I shouldn't be replying to bugme-daemon, but...)

On Wed, 2008-05-21 at 06:18 -0700, bugme-daemon@bugzilla.kernel.org wrote:

> The current question is what to do for 2.6.26.

My $.02 is that since it defaults to 'N' _and_ depends on EXPERIMENTAL, all is just fine.

> And getting it feature complete is nothing that would suit for 2.6.26.

Heartily disagree given the above.

> Can we agree to add to GROUP_SCHED a dependency on BROKEN and keep this
> dependency in Linus' tree until the code is feature complete and considered
> ready for production use?

Why mark it BROKEN? It's only 'broken' in so far as it has known performance issues, which is quite normal for complex code under active development. BROKEN means "this gizmo don't work, and ain't being fixed". That does not apply to group scheduling.

> Currently it seems to be more of a pitfall (for users who enable it) than a
> useful feature.

If you explicitly enable features marked EXPERIMENTAL, you might indeed encounter a developmental pitfall or two. Nothing unusual here.

-Mike
(In reply to comment #11)
> (IMHO, bugzilla shouldn't be used for tracking EXPERIMENTAL code, so I
> shouldn't be replying to bugme-daemon, but...)
>
> On Wed, 2008-05-21 at 06:18 -0700, bugme-daemon@bugzilla.kernel.org
> wrote:
>
> > The current question is what to do for 2.6.26.
>
> My $.02 is that since it defaults to 'N' _and_ depends on EXPERIMENTAL,
> all is just fine.

My €.02 (which is more than $.03) is that it's very common that it's impossible to use a kernel with CONFIG_EXPERIMENTAL=n (e.g. for hardware drivers), and users having CONFIG_EXPERIMENTAL=n set are therefore _very_ rare.

> > And getting it feature complete is nothing that would suit for 2.6.26.
>
> Heartily disagree given the above.
>
> > Can we agree to add to GROUP_SCHED a dependency on BROKEN and keep this
> > dependency in Linus' tree until the code is feature complete and considered
> > ready for production use?
>
> Why mark it BROKEN? It's only 'broken' in so far as it has known
> performance issues, which is quite normal for complex code under active
> development. BROKEN means "this gizmo don't work, and ain't being
> fixed". That does not apply to group scheduling.

Is it ready for being used in production today or not?

> > Currently it seems to be more of a pitfall (for users who enable it) than a
> > useful feature.
>
> If you explicitly enable features marked EXPERIMENTAL, you might indeed
> encounter a developmental pitfall or two. Nothing unusual here.

Please name one distribution that builds its kernels with CONFIG_EXPERIMENTAL=n.

Your expectations of CONFIG_EXPERIMENTAL do not match reality.
On Wed, 2008-05-21 at 09:24 -0700, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #12 from bunk@kernel.org 2008-05-21 09:24 -------
> > My $.02 is that since it defaults to 'N' _and_ depends on EXPERIMENTAL,
> > all is just fine.
>
> My €.02 (which is more than $.03) is that it's very common that it's
> impossible to use a kernel with CONFIG_EXPERIMENTAL=n (e.g. for hardware
> drivers), and users having CONFIG_EXPERIMENTAL=n set are therefore _very_
> rare.

Yes, some hardware needs experimental drivers. That doesn't change the definition of EXPERIMENTAL. It still means "Aunt Tilly beware!", as it always has.

> > Why mark it BROKEN? It's only 'broken' in so far as it has known
> > performance issues, which is quite normal for complex code under active
> > development. BROKEN means "this gizmo don't work, and ain't being
> > fixed". That does not apply to group scheduling.
>
> Is it ready for being used in production today or not?

Depends on the production load I suppose. There are loads where EXT3 doesn't perform well. Rhetorical: Shall we mark EXT3 BROKEN?

> > If you explicitly enable features marked EXPERIMENTAL, you might indeed
> > encounter a developmental pitfall or two. Nothing unusual here.
>
> Please name one distribution that builds its kernels with
> CONFIG_EXPERIMENTAL=n.

Rhetorical: Your point is? If distros were hiring Aunt Tilly to configure and test their kernels, they could run into trouble enabling EXPERIMENTAL. I don't think that's the case.

> Your expectations of CONFIG_EXPERIMENTAL do not match reality.

No. Your redefinition thereof doesn't match past or current reality.

I've stated my position, and rebutted yours for the record. Bugzilla wasn't intended to be a debate podium, so I'm outta here ;-)

EOT,

-Mike
Confirmed to have been improved recently:

References : http://lkml.org/lkml/2008/6/2/10
The problem appears to be fixed in the mainline.

References : http://lkml.org/lkml/2008/6/7/227

Closing.