Bug 215045

| Summary: | Serious performance downgrade caused by "sched/fair: Ensure _sum and _avg values stay consistent" | | |
|---|---|---|---|
| Product: | Process Management | Reporter: | Rick Yiu (rickyiu) |
| Component: | Scheduler | Assignee: | Ingo Molnar (mingo) |
| Status: | NEW | | |
| Severity: | high | CC: | dietmar.eggemann, joni_lee, rickyiu, vincent.guittot, wvw |
| Priority: | P1 | | |
| Hardware: | ARM | | |
| OS: | Linux | | |
| Kernel Version: | 5.10.51 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |

Attachments:
- test patch 1
- test patch 2
- use highest estimate of util_sum
- sched/fair: Sync _avg/_sum in update_cfs_rq_load_avg() only when called from update_blocked_averages()
- Don't sync hardly util_sum with util_avg
- Don't sync hardly util_sum with util_avg (v2 with fixes)
Description (Rick Yiu, 2021-11-16 13:42:24 UTC)
I added some log, it shows that all the values of *_sum are much lower with the patch "sched/fair: Ensure _sum and _avg values stay consistent", take cpu 4 as an example below. I also confirmed that it is not related to our vendor hooks by disabling it and checking 1080p 60fps video recording performance (check Non-error FPS part). Before revert: Runner2:2-8520 ( 770) [006] d..3 683.168993: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=13958406 sa->util_sum=2255904 sa->runnable_sum=4464810 CallbackHndlr-8490 ( 8114) [004] d..2 683.169180: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=12743233 sa->util_sum=1974966 sa->runnable_sum=3244587 SensorService-1747 ( 1437) [000] d..6 683.171991: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=14783428 sa->util_sum=2152018 sa->runnable_sum=3602291 MediaCodec_loop-8485 ( 8114) [004] d..2 683.175912: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=14255360 sa->util_sum=2083840 sa->runnable_sum=2936320 SensorService-1747 ( 1437) [004] d..2 683.176402: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=13847224 sa->util_sum=2039146 sa->runnable_sum=2513366 <idle>-0 (-------) [004] dN.2 683.178186: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=36803382 sa->util_sum=2557764 sa->runnable_sum=4689234 CallbackHndlr-8490 ( 8114) [004] d..6 683.183457: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=21229615 sa->util_sum=1955987 sa->runnable_sum=2671592 After revert: Runner2:0-8163 ( 773) [006] d..3 78.146132: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=78239610 sa->util_sum=7208535 sa->runnable_sum=14883914 app-613 ( 513) [004] d..2 78.147979: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=83904222 sa->util_sum=7433533 sa->runnable_sum=15076531 PreviewChoreogr-8167 ( 944) [005] d..6 78.148714: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=82727397 sa->util_sum=7371652 sa->runnable_sum=14877039 Runner2:1-7893 ( 773) [004] d..2 78.148824: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=74312631 sa->util_sum=7011316 sa->runnable_sum=14156367 provider@2.7-se-1145 ( 773) [005] d..3 78.150369: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=74334964 sa->util_sum=7239902 sa->runnable_sum=14924267 LwisEvent:Ois-8166 ( 773) [003] d..3 78.151618: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=73960791 sa->util_sum=7225835 sa->runnable_sum=14827028 Runner2:3-7835 ( 773) [004] d..3 78.152320: update_cfs_rq_load_avg: rickyiu cpu=4 sa->load_sum=83386625 sa->util_sum=7632947 sa->runnable_sum=14296668 Disable vendor hook, before revert: ver the past 2.024902019s there were 109 responses sent and 95 of them had no error. FPS: 53.8298 Non-error FPS: 46.9159 11-19 15:54:47.700 773 8005 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.020045777s there were 112 responses sent and 95 of them had no error. FPS: 55.4443 Non-error FPS: 47.0286 11-19 15:54:49.731 773 7991 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.031139568s there were 111 responses sent and 95 of them had no error. FPS: 54.6491 Non-error FPS: 46.7718 11-19 15:54:51.741 773 7987 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.005086793s there were 107 responses sent and 94 of them had no error. FPS: 53.3643 Non-error FPS: 46.8808 11-19 15:54:53.753 773 7930 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.012023072s there were 107 responses sent and 87 of them had no error. 
FPS: 53.1803 Non-error FPS: 43.2401 11-19 15:54:55.776 773 7991 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.022858643s there were 109 responses sent and 87 of them had no error. FPS: 53.8841 Non-error FPS: 43.0084 Disable vendor hook, after revert: 11-19 16:07:43.448 763 7689 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.014838462s there were 114 responses sent and 108 of them had no error. FPS: 56.5802 Non-error FPS: 53.6023 11-19 16:07:45.464 763 7775 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.015823284s there were 112 responses sent and 109 of them had no error. FPS: 55.5604 Non-error FPS: 54.0722 11-19 16:07:47.498 763 7764 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.033886313s there were 113 responses sent and 108 of them had no error. FPS: 55.5587 Non-error FPS: 53.1003 11-19 16:07:49.507 763 7689 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.008588095s there were 112 responses sent and 101 of them had no error. FPS: 55.7606 Non-error FPS: 50.2841 11-19 16:07:51.529 763 7689 I Lyric : retiming_node.cc:590: Camera: Rear, Over the past 2.022211183s there were 109 responses sent and 97 of them had no error. FPS: 53.9014 Non-error FPS: 47.9673 So you reverted: 1c35b07e6d39 - sched/fair: Ensure _sum and _avg values stay consistent Where did you place your log exactly in update_cfs_rq_load_avg()? Within or outside the `if (cfs_rq->removed.nr)` condition? CPU4 on your system. What's the original CPU capacity? # cat /sys/devices/system/cpu/cpu4/cpu_capacity So the story is that the util_sum related change is causing that lower CPU frequencies via schedutil CPU governor are chosen for your camera app? You're using the mainline Schedutil CPUfreq governor, right? Not sure that people understand the concept of Android vendor hooks here. Which vendor hook is it you disabled? There might be more vendor hooks in the area (CFS, PELT -> Schedutil) Can you make sure that this is indeed because of the util_sum change by only reverting: diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 884f29d07963..156fe0108ae8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3682,7 +3682,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq) r = removed_util; sub_positive(&sa->util_avg, r); - sa->util_sum = sa->util_avg * divider; + sub_positive(&sa->util_sum, r * divider); r = removed_runnable; sub_positive(&sa->runnable_avg, r); and run your test again? (In reply to dietmar.eggemann from comment #2) > So you reverted: > > 1c35b07e6d39 - sched/fair: Ensure _sum and _avg values stay consistent > > Where did you place your log exactly in update_cfs_rq_load_avg()? Within or > outside the `if (cfs_rq->removed.nr)` condition? It's with the `if (cfs_rq->removed.nr)` condition, after all calculations done. > > CPU4 on your system. What's the original CPU capacity? > # cat /sys/devices/system/cpu/cpu4/cpu_capacity It is 427. (In reply to dietmar.eggemann from comment #3) > So the story is that the util_sum related change is causing that lower CPU > frequencies via schedutil CPU governor are chosen for your camera app? > You're using the mainline Schedutil CPUfreq governor, right? We are using a vendor Schedutil CPUfreq governor, with some modification of it. Let me check if it is related. > > Not sure that people understand the concept of Android vendor hooks here. > Which vendor hook is it you disabled? 
There might be more vendor hooks in > the area (CFS, PELT -> Schedutil) I am disabling all sched vendor hooks, so basically it is running upstream EAS. > > Can you make sure that this is indeed because of the util_sum change by only > reverting: > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 884f29d07963..156fe0108ae8 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3682,7 +3682,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq) > > r = removed_util; > sub_positive(&sa->util_avg, r); > - sa->util_sum = sa->util_avg * divider; > + sub_positive(&sa->util_sum, r * divider); > > r = removed_runnable; > sub_positive(&sa->runnable_avg, r); > > and run your test again? Ok, let me test it and get back to you. Thanks. (In reply to Rick Yiu from comment #5) > (In reply to dietmar.eggemann from comment #3) > > So the story is that the util_sum related change is causing that lower CPU > > frequencies via schedutil CPU governor are chosen for your camera app? > > You're using the mainline Schedutil CPUfreq governor, right? > We are using a vendor Schedutil CPUfreq governor, with some modification of > it. Let me check if it is related. > > > > Not sure that people understand the concept of Android vendor hooks here. > > Which vendor hook is it you disabled? There might be more vendor hooks in > > the area (CFS, PELT -> Schedutil) > I am disabling all sched vendor hooks, so basically it is running upstream > EAS. > > > > Can you make sure that this is indeed because of the util_sum change by > only > > reverting: > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 884f29d07963..156fe0108ae8 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -3682,7 +3682,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq > *cfs_rq) > > > > r = removed_util; > > sub_positive(&sa->util_avg, r); > > - sa->util_sum = sa->util_avg * divider; > > + sub_positive(&sa->util_sum, r * divider); > > > > r = removed_runnable; > > sub_positive(&sa->runnable_avg, r); > > > > and run your test again? > Ok, let me test it and get back to you. Thanks. A rough comparison of the cpu frequency in general is that: Android sched vendor hook + vendor schedutil > Android sched vendor hook + upstream schedutil > upstream EAS For both Android sched vendor hook and upstream EAS: revert whole patch = partial revert patch > not revert patch (In reply to Rick Yiu from comment #6) > (In reply to Rick Yiu from comment #5) > > (In reply to dietmar.eggemann from comment #3) > > > So the story is that the util_sum related change is causing that lower > CPU > > > frequencies via schedutil CPU governor are chosen for your camera app? > > > You're using the mainline Schedutil CPUfreq governor, right? > > We are using a vendor Schedutil CPUfreq governor, with some modification of > > it. Let me check if it is related. > > > > > > Not sure that people understand the concept of Android vendor hooks here. > > > Which vendor hook is it you disabled? There might be more vendor hooks in > > > the area (CFS, PELT -> Schedutil) > > I am disabling all sched vendor hooks, so basically it is running upstream > > EAS. 
> > > > > > Can you make sure that this is indeed because of the util_sum change by > > only > > > reverting: > > > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > > index 884f29d07963..156fe0108ae8 100644 > > > --- a/kernel/sched/fair.c > > > +++ b/kernel/sched/fair.c > > > @@ -3682,7 +3682,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq > > *cfs_rq) > > > > > > r = removed_util; > > > sub_positive(&sa->util_avg, r); > > > - sa->util_sum = sa->util_avg * divider; > > > + sub_positive(&sa->util_sum, r * divider); > > > > > > r = removed_runnable; > > > sub_positive(&sa->runnable_avg, r); > > > > > > and run your test again? > > Ok, let me test it and get back to you. Thanks. > > A rough comparison of the cpu frequency in general is that: > > Android sched vendor hook + vendor schedutil > > Android sched vendor hook + upstream schedutil > > upstream EAS > > For both Android sched vendor hook and upstream EAS: > > revert whole patch = partial revert patch > not revert patch Thanks for the test results! So we know it's related to rq->cfs.avg.util_avg and the related change in update_cfs_rq_load_avg(). divider = LOAD_MAX_AVG - 1024 + period_contrib r = \Sum se->avg.util_avg over all removed se's util_avg = util_avg - r w/ patch: util_sum = util_avg * divider w/o patch: util_sum = util_sum - r * divider So w/ the patch we also remove the part of util_sum (1) which was accrued out of period_contrib when the cfs_rq was running (accumulate_sum()). I think this is what the patch wanted to achieve, syncing util_sum and util_avg. But update_cfs_rq_load_avg() with cfs_rq->removed.nr might be called too often and we loose too much util_avg so this explains the difference you're experiencing. (1) is not reflected in util_avg yet. Max error here is 1023 << SCHED_CAPACITY_SHIFT = 1047552 which corresponds to util_avg of ~22. I guess you run most of your tasks in 1. level taskgroups such as /foreground, /background, /system etc? Let me instrument the code to be able to see the difference when rq->cfs.avg.util_avg is actually used by schedutil. Thanks for your detailed explanation! And yes, we are running our tasks in 1. level taskgroups. I've put the code to maintain a pre- and post-patch cfs->avg.util_[sum\|avg] and trace them plus the diff (in case > 64) plus the max diff in sugov_get_util()->cpu_util_cfs(). I see max diffs up to ~200 when running `hackbench` (in root taskgroup) on my Juno-r0 (bL Arm64 [l B B l l l] on all CPUs. git.gitlab.arm.com:linux-arm/linux-de.git upstream/bugzilla_kernel_org_215045 Could you run the same on your device with your workload please to finally make sure that we're talking about the same issue here? Thanks! I got access issues, can you please share it as a patch or an open git repository? I tested both upstream EAS and Android vendor hooks, and get the last logs of the trace as below. I'm not sure why util_diff_max values are so large, but at least util_diff values are not small. 
upstream EAS:

cpu_util_cfs: CPU0 util_avg=58 util_avg2=459 util_diff=401 util_diff_max=15729
cpu_util_cfs: CPU1 util_avg=79 util_avg2=728 util_diff=649 util_diff_max=7221
cpu_util_cfs: CPU2 util_avg=182 util_avg2=504 util_diff=322 util_diff_max=25273
cpu_util_cfs: CPU3 util_avg=43 util_avg2=991 util_diff=948 util_diff_max=11456
cpu_util_cfs: CPU4 util_avg=69 util_avg2=1154 util_diff=1085 util_diff_max=18188
cpu_util_cfs: CPU5 util_avg=33 util_avg2=1062 util_diff=1029 util_diff_max=15620
cpu_util_cfs: CPU6 util_avg=264 util_avg2=870 util_diff=606 util_diff_max=36654
cpu_util_cfs: CPU7 util_avg=102 util_avg2=1470 util_diff=1368 util_diff_max=24534

Android vendor hooks:

cpu_util_cfs: CPU0 util_avg=31 util_avg2=559 util_diff=528 util_diff_max=86579
cpu_util_cfs: CPU1 util_avg=29 util_avg2=544 util_diff=515 util_diff_max=90896
cpu_util_cfs: CPU2 util_avg=19 util_avg2=995 util_diff=976 util_diff_max=16069
cpu_util_cfs: CPU3 util_avg=20 util_avg2=705 util_diff=685 util_diff_max=23924
cpu_util_cfs: CPU4 util_avg=234 util_avg2=3 util_diff=-231 util_diff_max=90758
cpu_util_cfs: CPU5 util_avg=176 util_avg2=3957 util_diff=3781 util_diff_max=94174
cpu_util_cfs: CPU6 util_avg=372 util_avg2=0 util_diff=-372 util_diff_max=66596
cpu_util_cfs: CPU7 util_avg=531 util_avg2=2945 util_diff=2414 util_diff_max=63987

Created attachment 299789 [details]
test patch 1
Created attachment 299791 [details]
test patch 2
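For readers following along, here is a small standalone C sketch (not kernel code; the example numbers are made up, only LOAD_AVG_MAX and SCHED_CAPACITY_SHIFT are the real PELT constants) of the arithmetic Dietmar describes above: with 1c35b07e6d39 the removed-entity path resyncs util_sum from util_avg, while the pre-patch code only subtracts the removed contribution, so each removal can shave up to 1023 << SCHED_CAPACITY_SHIFT off util_sum (about 22 in util_avg terms).

    #include <stdio.h>

    #define LOAD_AVG_MAX         47742  /* maximum possible PELT sum */
    #define SCHED_CAPACITY_SHIFT 10

    int main(void)
    {
            /* made-up example values, only to illustrate the two formulas */
            unsigned long period_contrib = 512;
            unsigned long divider = LOAD_AVG_MAX - 1024 + period_contrib;
            unsigned long util_avg = 300;
            unsigned long util_sum = 14500000;
            unsigned long r = 250;  /* \Sum util_avg of the removed entities */

            unsigned long new_avg = util_avg - r;

            /* with 1c35b07e6d39: resync util_sum from the new util_avg */
            unsigned long sum_patched = new_avg * divider;

            /* without it: only subtract the removed contribution */
            unsigned long sum_reverted = util_sum - r * divider;

            printf("util_sum w/  patch: %lu\n", sum_patched);
            printf("util_sum w/o patch: %lu\n", sum_reverted);
            printf("lost per removal  : %lu (bound: %d, ~%lu of util_avg)\n",
                   sum_reverted - sum_patched,
                   1023 << SCHED_CAPACITY_SHIFT,
                   ((1023UL << SCHED_CAPACITY_SHIFT) + LOAD_AVG_MAX / 2) / LOAD_AVG_MAX);
            return 0;
    }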
I attached the 2 test patches.

Thanks for testing this on your device! On my Juno-r0 with an ~idle Debian system I reach 'util_diff_max' up to 300 during boot, but with `hackbench -g40 -l1000` (1600 tasks) I can reach ~20000 easily. I think we have enough evidence to ask whether 1c35b07e6d39 (1) is really necessary to avoid triggering the SCHED_WARN_ON() in cfs_rq_is_decayed() (9e077b52d86a), which was the aim of this patch. I can't get it to trigger when I run w/o (1), but fcf6631f3736 (2) is definitely needed. Which makes sense, since before a cfs_rq has no util_avg left, the last task has to be detached from it. I want to run more tests on an Ubuntu server (lots of multi-level taskgroups) before I can be sure we can possibly revert (1).

So w/o 1c35b07e6d39 ("sched/fair: Ensure _sum and _avg values stay consistent") I didn't run into the SCHED_WARN_ON() in cfs_rq_is_decayed() (9e077b52d86a) so far on Juno-r0 or Ampere Altra with Ubuntu Server:

[ 13.874486] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
[ 13.874519] WARNING: CPU: 5 PID: 259 at kernel/sched/fair.c:3299 update_blocked_averages+0x640/0x67c

So it looks like we don't need 1c35b07e6d39 to make sure all cfs_rq _avg's are 0 when the _sum's are 0.

Dietmar reported this bug to me. According to the bug description reported for commit 1c35b07e6d39, I'm not sure that we can simply revert it. IIUC, you are facing a decrease of util_avg since the patch was applied. Although I'm a bit surprised by the large difference with vs. without the patch, this can probably happen if we have a huge amount of task migrations between the CPUs of a task group. I have attached a patch that will make sure that we use the higher bound of util_sum when we estimate it from util_avg.

Created attachment 299853 [details]
use highest estimate of util_sum
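The patch body itself isn't inlined in this report; going by the hunk Dietmar quotes near the end of the thread, the idea for the removed-util path in update_cfs_rq_load_avg() is roughly the following (a sketch of the approach, not the literal attachment):

    	r = removed_util;
    	sub_positive(&sa->util_avg, r);
    	sub_positive(&sa->util_sum, r * divider);
    	/*
    	 * Never let util_sum drop below what util_avg implies, so it
    	 * cannot reach zero while util_avg is still non-zero, without
    	 * re-inflating it the way the full resync did.
    	 */
    	sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * divider);

Compared with reverting 1c35b07e6d39 outright, this keeps the _sum/_avg consistency guarantee but only uses the resynced value as a lower bound.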
Thanks for all the info and the new test patch (#18). I have tested it and it seems we could get back to our previous performance level. We will check if there is any power difference. Thanks. I compared 2 versions - applying patch of #18 and reverting the patch in question. It seems that applying patch of #18 performs better, but it also has more power consumption. I guess the higher bound of util_sum results in higher cpu frequency and the test result as mentioned. Yes, that's the downside of this version. I'm preparing a version that would stay better aligned with reality whenever possible but not regress regarding the bug fixed with 1c35b07e6d39 I found a setup in which I can see the SCHED_WARN_ON() from cfs_rq_is_decayed() in case I run on tip/sched/core with commit 28f620977975 ("sched/fair: Ensure _sum and _avg values stay consistent") reverted. on Juno-r0: TG={/A/B /C/D /E/F} cgcreate -g cpu:$TG; echo $$ > /sys/fs/cgroup/cpu/$TG/tasks; while(true); do hackbench; done With extra debug in cfs_rq_is_decayed(): [ 2840.288479] CPU2 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [ 2840.295039] ------------[ cut here ]------------ [ 2840.299662] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg [ 2840.299673] WARNING: CPU: 2 PID: 2804 at kernel/sched/fair.c:3310 update_blocked_averages+0x6d4/0x6e4 [ 2840.316678] Modules linked in: [ 2840.319737] CPU: 2 PID: 2804 Comm: hackbench Not tainted 5.16.0-rc1-00013-g28f620977975-dirty #33 [ 2840.328631] Hardware name: ARM Juno development board (r0) (DT) [ 2840.334560] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2840.341539] pc : update_blocked_averages+0x6d4/0x6e4 [ 2840.346513] lr : update_blocked_averages+0x6d4/0x6e4 [ 2840.351485] sp : ffff80001c82b680 [ 2840.354801] x29: ffff80001c82b680 x28: ffff00097ef65040 x27: ffff000800af9c00 [ 2840.361958] x26: 000000000000032b x25: 000002907cd947d8 x24: ffff000800af6a00 [ 2840.369113] x23: 0000000000000000 x22: ffff00097ef65940 x21: 0000000000000010 [ 2840.376267] x20: 0000000000000000 x19: ffff000800af6b40 x18: fffffffffffe6830 [ 2840.383420] x17: 6e6e75722e677661 x16: 3e2d71725f736663 x15: 207c7c206776615f [ 2840.390573] x14: 6c6974752e677661 x13: ffff8000120abeb0 x12: 00000000000004e3 [ 2840.397727] x11: 00000000000001a1 x10: ffff8000121065c8 x9 : fffffffffffe6830 [ 2840.404880] x8 : ffff8000120abeb0 x7 : ffff800012103eb0 x6 : 0000000000002718 [ 2840.412034] x5 : ffff00097ef56998 x4 : 0000000000000000 x3 : 0000000000000027 [ 2840.419187] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00081403e900 [ 2840.426341] Call trace: [ 2840.428787] update_blocked_averages+0x6d4/0x6e4 [ 2840.433411] newidle_balance+0x120/0x440 [ 2840.437340] pick_next_task_fair+0x30/0x364 [ 2840.441529] __schedule+0x178/0x854 [ 2840.445024] schedule+0x58/0xf0 [ 2840.448168] schedule_hrtimeout_range_clock+0x1c4/0x1e0 [ 2840.453404] schedule_hrtimeout_range+0x18/0x24 [ 2840.457942] do_sys_poll+0x378/0x594 [ 2840.461526] __arm64_sys_ppoll+0xa0/0x130 [ 2840.465543] invoke_syscall+0x48/0x114 [ 2840.469300] el0_svc_common.constprop.0+0x44/0xec [ 2840.474013] do_el0_svc+0x28/0x90 [ 2840.477333] el0_svc+0x20/0x60 [ 2840.480390] el0t_64_sync_handler+0xec/0xf0 [ 2840.484578] el0t_64_sync+0x1a0/0x1a4 [ 2840.488245] ---[ end trace e492db65168e971c ]--- [ 2862.399211] CPU0 cfs_rq path=/A/B load_avg=3 util_avg=0 runnable_avg=0 [ 3089.926846] CPU5 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [ 3211.275202] CPU0 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 
3336.981919] CPU5 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 3603.649114] CPU4 cfs_rq path=/C/D load_avg=3 util_avg=0 runnable_avg=0 [ 4031.307261] CPU3 cfs_rq path=/C/D load_avg=2 util_avg=0 runnable_avg=1 [ 4572.845112] CPU3 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 5743.849159] CPU4 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 7969.595851] CPU0 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 8919.577350] CPU5 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [ 8936.024917] CPU4 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 9253.304513] CPU3 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [ 9319.116493] CPU5 cfs_rq path=/A/B load_avg=0 util_avg=0 runnable_avg=1 it seems we never hit the issue for util_avg? From our setup, we still see a difference behavior with the path, since dietmar@ can reproduce our issue, is it possible that we can compare the util_diff between the old code (reverted) to the new proposal also on your side? It also happens with util_avg. I ran the some test over night and there are occurrences now where util_avg != 0. It also seems to happen for autogroups (CONFIG_SCHED_AUTOGROUP=y). root@juno:~# [ 1066.485845] CPU4 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 1066.493221] ------------[ cut here ]------------ [ 1066.497851] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg [ 1066.497866] WARNING: CPU: 4 PID: 2114 at kernel/sched/fair.c:3309 cfs_rq_is_decayed.part.0+0x28/0x40 [ 1066.514810] Modules linked in: [ 1066.517877] CPU: 4 PID: 2114 Comm: hackbench Not tainted 5.16.0-rc1-00016-g13433e31f41a-dirty #59 [ 1066.526781] Hardware name: ARM Juno development board (r0) (DT) [ 1066.532716] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1066.539703] pc : cfs_rq_is_decayed.part.0+0x28/0x40 [ 1066.544600] lr : cfs_rq_is_decayed.part.0+0x28/0x40 [ 1066.549497] sp : ffff8000163db660 [ 1066.552817] x29: ffff8000163db660 x28: ffff00097ef9d040 x27: ffff000801e39000 [ 1066.559984] x26: 0000000000000199 x25: 000000d8dfe7a40f x24: ffff000801e3a400 [ 1066.567150] x23: 0000000000000000 x22: ffff00097ef9d980 x21: 0000000000000020 [ 1066.574315] x20: 0000000000000000 x19: 0000000000000001 x18: fffffffffffe6c20 [ 1066.581481] x17: 6e6e75722e677661 x16: 3e2d71725f736663 x15: 207c7c206776615f [ 1066.588647] x14: 6c6974752e677661 x13: ffff8000120ac470 x12: 0000000000000510 [ 1066.595813] x11: 00000000000001b0 x10: ffff800012106cf0 x9 : fffffffffffe6c20 [ 1066.602978] x8 : ffff8000120ac470 x7 : ffff800012104470 x6 : 0000000000002880 [ 1066.610144] x5 : ffff00097ef8e998 x4 : 0000000000000000 x3 : 0000000000000027 [ 1066.617309] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080b194b00 [ 1066.624475] Call trace: [ 1066.626925] cfs_rq_is_decayed.part.0+0x28/0x40 [ 1066.631474] update_blocked_averages+0x658/0x670 [ 1066.636108] newidle_balance+0x120/0x440 [ 1066.640045] pick_next_task_fair+0x54/0x394 [ 1066.644242] __schedule+0x178/0x854 [ 1066.647745] schedule+0x58/0xf0 [ 1066.650897] schedule_hrtimeout_range_clock+0x1c4/0x1e0 [ 1066.656142] schedule_hrtimeout_range+0x18/0x24 [ 1066.660690] do_sys_poll+0x378/0x594 [ 1066.664282] __arm64_sys_ppoll+0xa0/0x130 [ 1066.668304] invoke_syscall+0x48/0x114 [ 1066.672070] el0_svc_common.constprop.0+0x44/0xec [ 1066.676792] do_el0_svc+0x28/0x90 [ 1066.680120] el0_svc+0x20/0x60 [ 1066.683184] el0t_64_sync_handler+0xec/0xf0 [ 1066.687381] el0t_64_sync+0x1a0/0x1a4 [ 1066.691056] ---[ end trace aebe3e8308c9e02a ]--- [ 1628.606043] 
CPU3 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [ 1935.577636] CPU5 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 2986.786826] CPU2 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [ 3539.998270] CPU5 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [ 4064.283506] CPU4 cfs_rq path=/C/D load_avg=2 util_avg=0 runnable_avg=0 [ 4138.144467] CPU0 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [ 4355.892939] CPU3 cfs_rq path=/C/D load_avg=3 util_avg=0 runnable_avg=0 [ 4822.404014] CPU2 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [ 5292.037811] CPU2 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0 [ 5554.329865] CPU4 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [ 5683.135217] CPU1 cfs_rq path=/A/B load_avg=3 util_avg=0 runnable_avg=0 [ 6716.362956] CPU1 cfs_rq path=/C/D load_avg=2 util_avg=3 runnable_avg=2 [ 9011.000614] CPU4 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [ 9240.703551] CPU5 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [ 9271.355775] CPU3 cfs_rq path=/C/D load_avg=2 util_avg=0 runnable_avg=2 [ 9710.115189] CPU0 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [10443.006634] CPU4 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [10581.886655] CPU0 cfs_rq path=/A/B load_avg=2 util_avg=0 runnable_avg=0 [11459.211727] CPU0 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [11468.425439] CPU1 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [11800.857444] CPU5 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [12104.986785] CPU0 cfs_rq path=/E/F load_avg=3 util_avg=0 runnable_avg=0 [12523.282196] CPU0 cfs_rq path=/A/B load_avg=8 util_avg=0 runnable_avg=0 [13953.285925] CPU1 cfs_rq path=/C/D load_avg=0 util_avg=1 runnable_avg=1 [14339.367705] CPU3 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [15099.604232] CPU3 cfs_rq path=/C/D load_avg=2 util_avg=0 runnable_avg=2 [16384.614556] CPU1 cfs_rq path=/C/D load_avg=4 util_avg=0 runnable_avg=0 [16752.505681] CPU0 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=1 [17505.691538] CPU0 cfs_rq path=/autogroup-28 load_avg=1 util_avg=0 runnable_avg=0 [18808.949903] CPU1 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [19763.671012] CPU5 cfs_rq path=/A/B load_avg=0 util_avg=0 runnable_avg=1 [20215.309354] CPU3 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [21194.694144] CPU3 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [21903.211030] CPU0 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [22314.040376] CPU4 cfs_rq path=/C/D load_avg=2 util_avg=0 runnable_avg=0 [22817.700703] CPU5 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [23271.921624] CPU5 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [24854.397233] CPU2 cfs_rq path=/autogroup-28 load_avg=4 util_avg=1 runnable_avg=1 [27022.508470] CPU1 cfs_rq path=/A/B load_avg=2 util_avg=0 runnable_avg=3 [27213.875019] CPU3 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [27790.516282] CPU5 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [29493.742868] CPU0 cfs_rq path=/C/D load_avg=2 util_avg=0 runnable_avg=0 [30298.648778] CPU1 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [31474.036813] CPU5 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [31510.215954] CPU0 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [31548.768221] CPU4 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [32945.462321] CPU2 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [34353.294144] CPU2 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 
[34376.961609] CPU3 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [34523.068040] CPU0 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [35222.645859] CPU0 cfs_rq path=/A/B load_avg=1 util_avg=0 runnable_avg=0 [35462.971562] CPU4 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [36394.186385] CPU4 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [36596.134208] CPU5 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [37376.955270] CPU3 cfs_rq path=/C/D load_avg=2 util_avg=0 runnable_avg=0 [37386.003371] CPU0 cfs_rq path=/C/D load_avg=3 util_avg=0 runnable_avg=2 [38711.712458] CPU4 cfs_rq path=/E/F load_avg=1 util_avg=0 runnable_avg=0 [40188.770061] CPU4 cfs_rq path=/C/D load_avg=1 util_avg=0 runnable_avg=0 [40210.565037] CPU3 cfs_rq path=/A/B load_avg=5 util_avg=0 runnable_avg=0 I added `Vincent's patch (3)` and `my new patch (4)*` to the logs in cpu_util_cfs(): * The issue always seems to happen when we call update_cfs_rq_load_avg() from update_blocked_averages(). But the majority of calls to update_cfs_rq_load_avg() are happening from update_load_avg() though. There a _avg/_sum sync doesn't seem to be necessary. So only do the sync from update_blocked_averages() -> __update_blocked_fair(). <...>-1931 [005] d..5. 190.656831: sugov_get_util: CPU0 util_avg=[721 8988 2592 8987] util_diff=[8267 1871 8266] util_diff_max=[8267 1871 8266] What does the trace show? util_avg=[721 8988 2592 8987] = [mainline (1), reverted (2), Vincent's patch (3), my new patch (4)*] util_diff=[8267 1871 8266] = [(2) - (1), (3) - (1), (4) - (1)] util_diff_max = [max((2) - (1)), max((3) - (1)), max((4) - (1))] running simple hackbench in '/'. <...>-1931 [005] d..5. 190.656831: sugov_get_util: CPU0 util_avg=[721 8988 2592 8987] util_diff=[8267 1871 8266] util_diff_max=[8267 1871 8266] <...>-1931 [005] d..5. 190.656839: sugov_get_util: CPU3 util_avg=[313 862 410 862] util_diff=[549 97 549] util_diff_max=[5576 3269 5576] <...>-1931 [005] d..5. 190.656846: sugov_get_util: CPU4 util_avg=[237 1283 412 1283] util_diff=[1046 175 1046] util_diff_max=[5548 1717 5548] <...>-1931 [005] d..5. 190.656853: sugov_get_util: CPU5 util_avg=[608 7961 4247 7961] util_diff=[7353 3639 7353] util_diff_max=[7353 3639 7353] <...>-1931 [005] d..5. 190.656959: sugov_get_util: CPU0 util_avg=[721 8988 2592 8987] util_diff=[8267 1871 8266] util_diff_max=[8267 1871 8266] <...>-1931 [005] d..5. 190.656966: sugov_get_util: CPU3 util_avg=[313 862 410 862] util_diff=[549 97 549] util_diff_max=[5576 3269 5576] <...>-1931 [005] d..5. 190.656973: sugov_get_util: CPU4 util_avg=[237 1283 412 1283] util_diff=[1046 175 1046] util_diff_max=[5548 1717 5548] <...>-1931 [005] d..5. 190.656979: sugov_get_util: CPU5 util_avg=[611 8018 4275 8018] util_diff=[7407 3664 7407] util_diff_max=[7407 3664 7407] hackbench-1913 [000] d..3. 190.656988: sugov_get_util: CPU0 util_avg=[718 9050 2605 9049] util_diff=[8332 1887 8331] util_diff_max=[8332 1887 8331] hackbench-1913 [000] d..3. 190.656995: sugov_get_util: CPU3 util_avg=[313 862 410 862] util_diff=[549 97 549] util_diff_max=[5576 3269 5576] hackbench-1913 [000] d..3. 190.657002: sugov_get_util: CPU4 util_avg=[237 1283 412 1283] util_diff=[1046 175 1046] util_diff_max=[5548 1717 5548] hackbench-1913 [000] d..3. 190.657009: sugov_get_util: CPU5 util_avg=[611 8018 4275 8018] util_diff=[7407 3664 7407] util_diff_max=[7407 3664 7407] hackbench-1933 [000] d..5. 
190.657109: sugov_get_util: CPU0 util_avg=[721 9118 2624 9117] util_diff=[8397 1903 8396] util_diff_max=[8397 1903 8396] hackbench-1933 [000] d..5. 190.657117: sugov_get_util: CPU3 util_avg=[313 862 410 862] util_diff=[549 97 549] util_diff_max=[5576 3269 5576] hackbench-1933 [000] d..5. 190.657124: sugov_get_util: CPU4 util_avg=[237 1283 412 1283] util_diff=[1046 175 1046] util_diff_max=[5548 1717 5548] hackbench-1933 [000] d..5. 190.657130: sugov_get_util: CPU5 util_avg=[608 8069 4297 8069] util_diff=[7461 3689 7461] util_diff_max=[7461 3689 7461] hackbench-1908 [005] d..3. 190.657140: sugov_get_util: CPU0 util_avg=[721 9118 2624 9117] util_diff=[8397 1903 8396] util_diff_max=[8397 1903 8396] hackbench-1908 [005] d..3. 190.657148: sugov_get_util: CPU3 util_avg=[313 862 410 862] util_diff=[549 97 549] util_diff_max=[5576 3269 5576] hackbench-1908 [005] d..3. 190.657155: sugov_get_util: CPU4 util_avg=[237 1283 412 1283] util_diff=[1046 175 1046] util_diff_max=[5548 1717 5548] hackbench-1908 [005] d..3. 190.657162: sugov_get_util: CPU5 util_avg=[608 8069 4297 8069] util_diff=[7461 3689 7461] util_diff_max=[7461 3689 7461] hackbench-1927 [003] d.h2. 190.657189: sugov_get_util: CPU0 util_avg=[721 9118 2624 9117] util_diff=[8397 1903 8396] util_diff_max=[8397 1903 8396] hackbench-1927 [003] d.h2. 190.657197: sugov_get_util: CPU3 util_avg=[315 862 411 862] util_diff=[547 96 547] util_diff_max=[5576 3269 5576] hackbench-1927 [003] d.h2. 190.657204: sugov_get_util: CPU4 util_avg=[240 1313 419 1313] util_diff=[1073 179 1073] util_diff_max=[5548 1717 5548] hackbench-2062 [002] d..5. 190.657205: sugov_get_util: CPU1 util_avg=[968 1081 2044 1081] util_diff=[113 1076 113] util_diff_max=[534 1531 534] hackbench-2062 [002] d..5. 190.657209: sugov_get_util: CPU2 util_avg=[890 1120 1818 1120] util_diff=[230 928 230] util_diff_max=[4896 1657 4896] hackbench-1927 [003] d.h2. 190.657211: sugov_get_util: CPU5 util_avg=[608 8069 4297 8069] util_diff=[7461 3689 7461] util_diff_max=[7461 3689 7461] Created attachment 299981 [details]
sched/fair: Sync _avg/_sum in update_cfs_rq_load_avg() only when called from update_blocked_averages()
(In reply to dietmar.eggemann from comment #25)

> I added `Vincent's patch (3)` and `my new patch (4)*` to the logs in
> cpu_util_cfs():

My previous patch was buggy, as util_avg was set twice instead of util_sum for the 2nd update.

> * The issue always seems to happen when we call update_cfs_rq_load_avg()
> from update_blocked_averages(). But the majority of calls to
> update_cfs_rq_load_avg() are happening from update_load_avg() though. There
> a _avg/_sum sync doesn't seem to be necessary. So only do the sync from
> update_blocked_averages() -> __update_blocked_fair().

Why is it not necessary? The original problem happens when propagating the util_avg change down the hierarchy. We have to make sure not to remove more util_sum than what is currently accounted. I have pushed a new version which only updates util_avg and doesn't touch load and runnable. I have replaced the hard sync of util_sum (i.e. util_sum = util_avg * divider) by ensuring that we don't remove more than the actual util_sum.

> <...>-1931 [005] d..5. 190.656831: sugov_get_util: CPU0 util_avg=[721 8988 2592 8987] util_diff=[8267 1871 8266] util_diff_max=[8267 1871 8266]
>
> What does the trace show?
>
> util_avg=[721 8988 2592 8987] = [mainline (1), reverted (2), Vincent's patch (3), my new patch (4)*]
>
> util_diff=[8267 1871 8266] = [(2) - (1), (3) - (1), (4) - (1)]
>
> util_diff_max = [max((2) - (1)), max((3) - (1)), max((4) - (1))]
>
> running simple hackbench in '/'.

Created attachment 299983 [details]
Don't sync hardly util_sum with util_avg
I tested the patch "Don't sync hardly util_sum with util_avg", and the power and performance are similar to reverting the patch in question. Thanks for the great work from you guys!

I still see the SCHED_WARN_ON() triggering in cfs_rq_is_decayed() because of util_avg != 0 with the patch from attachment 299983 [details]:
root@juno:~# [ 477.107078] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 477.114447] ------------[ cut here ]------------
[ 477.119079] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
[ 477.119098] WARNING: CPU: 0 PID: 16113 at kernel/sched/fair.c:3309 cfs_rq_is_decayed.part.0+0x28/0x40
[ 477.136139] Modules linked in:
[ 477.139208] CPU: 0 PID: 16113 Comm: hackbench Not tainted 5.16.0-rc1-00015-gf86c224d1a26 #76
[ 477.147683] Hardware name: ARM Juno development board (r0) (DT)
[ 477.153622] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 477.160613] pc : cfs_rq_is_decayed.part.0+0x28/0x40
[ 477.165516] lr : cfs_rq_is_decayed.part.0+0x28/0x40
[ 477.170418] sp : ffff80001847bad0
[ 477.173741] x29: ffff80001847bad0 x28: ffff000802e1c200 x27: ffff00097ef2d040
[ 477.180915] x26: 0000006b9e571261 x25: ffff000801cbce00 x24: 0000000000000000
[ 477.188087] x23: ffff00097ef2d980 x22: 0000000000000000 x21: 0000000000000000
[ 477.195258] x20: ffff000801cbcf40 x19: 0000000000000001 x18: fffffffffffe6860
[ 477.202430] x17: 6e6e75722e677661 x16: 3e2d71725f736663 x15: 207c7c206776615f
[ 477.209602] x14: 6c6974752e677661 x13: ffff8000120ac470 x12: 00000000000004e6
[ 477.216774] x11: 00000000000001a2 x10: ffff800012106ba0 x9 : fffffffffffe6860
[ 477.223946] x8 : ffff8000120ac470 x7 : ffff800012104470 x6 : 0000000000002730
[ 477.231118] x5 : ffff00097ef1e998 x4 : 0000000000000000 x3 : 0000000000000027
[ 477.238289] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000813f98f00
[ 477.245462] Call trace:
[ 477.247914] cfs_rq_is_decayed.part.0+0x28/0x40
[ 477.252469] update_blocked_averages+0x664/0x674
[ 477.257106] newidle_balance+0x120/0x440
[ 477.261046] pick_next_task_fair+0x54/0x394
[ 477.265247] __schedule+0x178/0x854
[ 477.268753] do_task_dead+0x4c/0x50
[ 477.272257] do_exit+0x628/0xa20
[ 477.275501] do_group_exit+0x3c/0xa0
[ 477.279092] __arm64_sys_exit_group+0x1c/0x20
[ 477.283467] invoke_syscall+0x48/0x114
[ 477.287236] el0_svc_common.constprop.0+0x44/0xec
[ 477.291963] do_el0_svc+0x28/0x90
[ 477.295294] el0_svc+0x20/0x60
[ 477.298363] el0t_64_sync_handler+0xec/0xf0
[ 477.302563] el0t_64_sync+0x1a0/0x1a4
[ 477.306242] ---[ end trace 31e9af861b1c7ed8 ]---
root@juno:~#
root@juno:~#
root@juno:~#
root@juno:~#
root@juno:~#
root@juno:~# [ 835.392995] CPU1 cfs_rq path=/E/F load_avg=0 util_avg=2 runnable_avg=0
[ 880.985178] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=3 runnable_avg=0
[ 1025.272039] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=2 runnable_avg=0
[ 1134.509873] CPU1 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0
[ 1136.867879] CPU3 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 1272.447674] CPU0 cfs_rq path=/autogroup-17 load_avg=0 util_avg=1 runnable_avg=0
[ 1321.013673] CPU3 cfs_rq path=/A/B load_avg=0 util_avg=1 runnable_avg=0
[ 1363.695423] CPU2 cfs_rq path=/C/D load_avg=0 util_avg=1 runnable_avg=0
[ 1909.162797] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 1957.484775] CPU5 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 1962.405342] CPU4 cfs_rq path=/A/B load_avg=0 util_avg=1 runnable_avg=0
[ 2029.380496] CPU2 cfs_rq path=/C/D load_avg=0 util_avg=1 runnable_avg=0
[ 2339.386024] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 2373.174107] CPU2 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0
[ 2502.927291] CPU3 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0
[ 2643.261132] CPU0 cfs_rq path=/C/D load_avg=0 util_avg=1 runnable_avg=0
[ 2696.341857] CPU5 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 2732.358723] CPU1 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0
[ 3173.668700] CPU3 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 3173.798572] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 3292.889893] CPU4 cfs_rq path=/C/D load_avg=0 util_avg=1 runnable_avg=0
[ 3415.400678] CPU5 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 3420.371131] CPU1 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0
[ 3598.737192] CPU5 cfs_rq path=/A/B load_avg=0 util_avg=1 runnable_avg=0
[ 3900.997003] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=2 runnable_avg=0
[ 4218.346232] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=2 runnable_avg=0
[ 4238.125713] CPU3 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 4255.044767] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 4558.395665] CPU5 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 4592.389457] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 4671.489079] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 runnable_avg=0
[ 4869.572101] CPU4 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0
Will look into the code more deeply this afternoon.
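For reference, the check that keeps firing here is the SCHED_WARN_ON() added by 9e077b52d86a in cfs_rq_is_decayed(): once every *_sum has decayed to zero, the corresponding *_avg values must be zero as well. It has roughly this shape (paraphrased from that commit, so details may differ slightly in the tree under test):

    static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
    {
    	if (cfs_rq->load.weight)
    		return false;

    	if (cfs_rq->avg.load_sum)
    		return false;

    	if (cfs_rq->avg.util_sum)
    		return false;

    	if (cfs_rq->avg.runnable_sum)
    		return false;

    	/* _avg must be 0 when _sum is 0, because _avg = _sum / divider. */
    	SCHED_WARN_ON(cfs_rq->avg.load_avg ||
    		      cfs_rq->avg.util_avg ||
    		      cfs_rq->avg.runnable_avg);

    	return true;
    }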
(In reply to dietmar.eggemann from comment #30) > I still see the SCHED_WARN_ON triggering in cfs_rq_is_decayed() because of > util_avg != 0 with patch from attachment 299983 [details]: OK, I will have a deeper look at your setup to try to reproduce the problem [snip] > [ 4592.389457] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 > runnable_avg=0 > [ 4671.489079] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 > runnable_avg=0 > [ 4869.572101] CPU4 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0 > > Will look into the code more deeply this afternoon. (In reply to vincent.guittot from comment #31) > (In reply to dietmar.eggemann from comment #30) > > I still see the SCHED_WARN_ON triggering in cfs_rq_is_decayed() because of > > util_avg != 0 with patch from attachment 299983 [details]: > > OK, I will have a deeper look at your setup to try to reproduce the problem I have been able to reproduce the Warn and it seems that there is 2 different problems. One fixed by the patch above and another one where the util_sum of a se about to be detach is often 1 above the the cfs. This is not a major problem when there is 1-1 map between cfs and se but if you have lot of tasks attached on the cfs and a lot of detach between 2 update of the util_avg, you can end up with cfs->util_sum being cleared whereas util_avg is not. Will continue to investigate this +1 diff > > [snip] > > > [ 4592.389457] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 > > runnable_avg=0 > > [ 4671.489079] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 > > runnable_avg=0 > > [ 4869.572101] CPU4 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0 > > > > Will look into the code more deeply this afternoon. (In reply to vincent.guittot from comment #32) > (In reply to vincent.guittot from comment #31) > > (In reply to dietmar.eggemann from comment #30) > > > I still see the SCHED_WARN_ON triggering in cfs_rq_is_decayed() because > of > > > util_avg != 0 with patch from attachment 299983 [details]: > > > > OK, I will have a deeper look at your setup to try to reproduce the problem > > I have been able to reproduce the Warn and it seems that there is 2 > different problems. One fixed by the patch above and another one where the > util_sum of a se about to be detach is often 1 above the the cfs. > This is not a major problem when there is 1-1 map between cfs and se but if > you have lot of tasks attached on the cfs and a lot of detach between 2 > update of the util_avg, you can end up with cfs->util_sum being cleared > whereas util_avg is not. > > Will continue to investigate this +1 diff I have updated the patch with more checks on util_sum. The WARN has not raised with this patch whereas it was with previous and the performance should be restored > > > > > [snip] > > > > > [ 4592.389457] CPU0 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 > > > runnable_avg=0 > > > [ 4671.489079] CPU4 cfs_rq path=/autogroup-25 load_avg=0 util_avg=1 > > > runnable_avg=0 > > > [ 4869.572101] CPU4 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0 > > > > > > Will look into the code more deeply this afternoon. Created attachment 300033 [details]
Don't sync hardly util_sum with util_avg
v2 with fixes
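The v2 body isn't quoted in this report either; given the off-by-one Vincent describes above (an se's util_sum being slightly larger than what is left in the cfs_rq), the extra util_sum checks presumably apply the same lower bound on the detach/propagation paths, along the lines of the following sketch (an assumption for illustration, not the attachment itself):

    	/* e.g. in detach_entity_load_avg(), util part only */
    	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
    	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
    	/*
    	 * An se's util_sum can be a little larger than the cfs_rq's
    	 * remaining util_sum; keep util_sum >= util_avg * divider so a
    	 * string of detaches cannot clear util_sum while util_avg > 0.
    	 */
    	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
    				     cfs_rq->avg.util_avg * divider);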
I tested attachment 299983 [details]:
I see these occurrences in detach_entity_load_avg() in which util_sum=0 and util_avg!=0.
But they are not related to the occurrences of the SCHED_WARN_ON (or my extra log) in cfs_rq_is_decayed().
[ 243.293249] detach_entity_load_avg: CPU3 cfs_rq path=/C/D util_avg=1 util_sum=0 nr_running=0 p->comm=hackbench p->pid=10878
[ 389.643737] detach_entity_load_avg: CPU3 cfs_rq path=/C/D util_avg=2 util_sum=0 nr_running=0 p->comm=hackbench p->pid=9373
[ 447.497054] detach_entity_load_avg: CPU3 cfs_rq path=/C/D util_avg=1 util_sum=0 nr_running=0 p->comm=hackbench p->pid=15451
[ 452.778746] cfs_rq_is_decayed: CPU5 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0 <-- there was no issue in detach_entity_load_avg() for the cfs_rq for /E/F on CPU5
[ 452.786987] ------------[ cut here ]------------
[ 452.791625] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
[ 452.791643] WARNING: CPU: 5 PID: 17580 at kernel/sched/fair.c:3309 cfs_rq_is_decayed.part.0+0x28/0x40
[ 452.808688] Modules linked in:
[ 452.811760] CPU: 5 PID: 17580 Comm: hackbench Not tainted 5.16.0-rc1-00015-gf86c224d1a26-dirty #80
[ 452.820758] Hardware name: ARM Juno development board (r0) (DT)
[ 452.826699] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 452.833691] pc : cfs_rq_is_decayed.part.0+0x28/0x40
[ 452.838598] lr : cfs_rq_is_decayed.part.0+0x28/0x40
[ 452.843501] sp : ffff800012d93810
[ 452.846826] x29: ffff800012d93810 x28: ffff000800196a00 x27: ffff000800199680
[ 452.853999] x26: 00000064b48ec965 x25: ffff000800199600 x24: 0000000000000000
[ 452.861171] x23: ffff00097efb9980 x22: 0000000000000028 x21: 0000000000000000
[ 452.868343] x20: ffff000800199740 x19: 0000000000000001 x18: fffffffffffe69e8
[ 452.875515] x17: 6e6e75722e677661 x16: 3e2d71725f736663 x15: 207c7c206776615f
[ 452.882687] x14: 6c6974752e677661 x13: ffff8000120ac470 x12: 00000000000004ef
[ 452.889859] x11: 00000000000001a5 x10: ffff800012106be8 x9 : fffffffffffe69e8
[ 452.897030] x8 : ffff8000120ac470 x7 : ffff800012104470 x6 : 0000000000002778
[ 452.904202] x5 : ffff00097efaa998 x4 : 0000000000000000 x3 : 0000000000000027
[ 452.911373] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080d392d00
[ 452.918544] Call trace:
[ 452.920999] cfs_rq_is_decayed.part.0+0x28/0x40
[ 452.925558] update_blocked_averages+0x664/0x674
[ 452.930200] newidle_balance+0x120/0x440
[ 452.934145] pick_next_task_fair+0x54/0x394
[ 452.938350] __schedule+0x178/0x854
[ 452.941861] schedule+0x58/0xf0
[ 452.945021] schedule_timeout+0x17c/0x1c0
[ 452.949054] unix_stream_read_generic+0x590/0x9e0
[ 452.953782] unix_stream_recvmsg+0x4c/0x74
[ 452.957898] sock_read_iter+0x94/0xf0
[ 452.961584] new_sync_read+0x16c/0x180
[ 452.965358] vfs_read+0x110/0x1b0
[ 452.968691] ksys_read+0xdc/0xfc
[ 452.971935] __arm64_sys_read+0x20/0x30
[ 452.975790] invoke_syscall+0x48/0x114
[ 452.979564] el0_svc_common.constprop.0+0x44/0xec
[ 452.984296] do_el0_svc+0x28/0x90
[ 452.987632] el0_svc+0x20/0x60
[ 452.990707] el0t_64_sync_handler+0xec/0xf0
[ 452.994911] el0t_64_sync+0x1a0/0x1a4
[ 452.998592] ---[ end trace cdb2a0b6d32bfb33 ]---
[ 562.944024] cfs_rq_is_decayed: CPU2 cfs_rq path=/E/F load_avg=0 util_avg=5 runnable_avg=0 <-- similar here
[ 633.640478] detach_entity_load_avg: CPU0 cfs_rq path=/C/D util_avg=2 util_sum=0 nr_running=0 p->comm=hackbench p->pid=7964
[ 786.980570] detach_entity_load_avg: CPU3 cfs_rq path=/A/B util_avg=1 util_sum=0 nr_running=0 p->comm=hackbench p->pid=12011
[ 799.717605] detach_entity_load_avg: CPU3 cfs_rq path=/E/F util_avg=1 util_sum=0 nr_running=0 p->comm=hackbench p->pid=20358
[ 1012.779051] detach_entity_load_avg: CPU5 cfs_rq path=/A/B util_avg=1 util_sum=0 nr_running=17 p->comm=hackbench p->pid=30642
[ 1020.304032] detach_entity_load_avg: CPU4 cfs_rq path=/C/D util_avg=1 util_sum=0 nr_running=0 p->comm=hackbench p->pid=3275
[ 1253.697545] cfs_rq_is_decayed: CPU2 cfs_rq path=/E/F load_avg=0 util_avg=2 runnable_avg=0 <-- similar here
[ 1352.828710] detach_entity_load_avg: CPU0 cfs_rq path=/C/D util_avg=1 util_sum=0 nr_running=6 p->comm=hackbench p->pid=27432
[ 1352.839943] detach_entity_load_avg: CPU0 cfs_rq path=/C/D util_avg=1 util_sum=0 nr_running=5 p->comm=hackbench p->pid=27428
[ 1365.869758] detach_entity_load_avg: CPU0 cfs_rq path=/C/D util_avg=1 util_sum=0 nr_running=0 p->comm=hackbench p->pid=4555
[ 1649.494085] detach_entity_load_avg: CPU3 cfs_rq path=/A/B util_avg=2 util_sum=0 nr_running=3 p->comm=hackbench p->pid=28936
[ 1670.581267] detach_entity_load_avg: CPU3 cfs_rq path=/A/B util_avg=1 util_sum=0 nr_running=0 p->comm=hackbench p->pid=11141
[ 1679.517397] cfs_rq_is_decayed: CPU2 cfs_rq path=/E/F load_avg=0 util_avg=1 runnable_avg=0 <-- similar here
This still happens if I enforce `cfs_rq->avg.util_avg = 0` in case `cfs_rq->avg.util_sum == 0` in detach_entity_load_avg() (a sketch of this experiment follows after the log below):
root@juno:~# [ 340.826411] detach_entity_load_avg: CPU3 cfs_rq path=/C/D util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=19386
[ 448.784662] detach_entity_load_avg: CPU0 cfs_rq path=/A/B util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=24555
[ 488.811991] detach_entity_load_avg: CPU3 cfs_rq path=/C/D util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=18641
[ 575.915494] cfs_rq_is_decayed: CPU2 cfs_rq path=/A/B load_avg=0 util_avg=2 runnable_avg=0
[ 575.923722] ------------[ cut here ]------------
[ 575.928345] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
[ 575.928355] WARNING: CPU: 2 PID: 11447 at kernel/sched/fair.c:3309 cfs_rq_is_decayed.part.0+0x28/0x40
[ 575.945368] Modules linked in:
[ 575.948429] CPU: 2 PID: 11447 Comm: hackbench Not tainted 5.16.0-rc1-00016-g2ae337cb9a3b-dirty #81
[ 575.957413] Hardware name: ARM Juno development board (r0) (DT)
[ 575.963343] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 575.970323] pc : cfs_rq_is_decayed.part.0+0x28/0x40
[ 575.975215] lr : cfs_rq_is_decayed.part.0+0x28/0x40
[ 575.980105] sp : ffff800018a43810
[ 575.983422] x29: ffff800018a43810 x28: ffff0008003af000 x27: ffff0008019ab480
[ 575.984787] cpufreq: __target_index: Failed to change cpu frequency: -5
[ 575.990583] x26: 000000852b1c9b4a x25: ffff0008019ab400 x24: 0000000000000000
[ 575.990595] x23: ffff00097ef65980 x22: 0000000000000010 x21: 0000000000000000
[ 576.011518] x20: ffff0008019ab540 x19: 0000000000000001 x18: fffffffffffe69f0
[ 576.018675] x17: 6e6e75722e677661 x16: 3e2d71725f736663 x15: 207c7c206776615f
[ 576.025832] x14: 6c6974752e677661 x13: ffff8000120ac470 x12: 00000000000004ef
[ 576.032987] x11: 00000000000001a5 x10: ffff800012106be8 x9 : fffffffffffe69f0
[ 576.036783] cpufreq: __target_index: Failed to change cpu frequency: -5
[ 576.040144] x8 : ffff8000120ac470 x7 : ffff800012104470 x6 : 0000000000002778
[ 576.053941] x5 : ffff00097ef56998 x4 : 0000000000000000 x3 : 0000000000000027
[ 576.061098] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080bdb5a00
[ 576.068254] Call trace:
[ 576.070701] cfs_rq_is_decayed.part.0+0x28/0x40
[ 576.075244] update_blocked_averages+0x664/0x674
[ 576.079873] newidle_balance+0x120/0x440
[ 576.083804] pick_next_task_fair+0x54/0x394
[ 576.087996] __schedule+0x178/0x854
[ 576.091493] schedule+0x58/0xf0
[ 576.092791] cpufreq: __target_index: Failed to change cpu frequency: -5
[ 576.094639] schedule_timeout+0x17c/0x1c0
[ 576.105296] unix_stream_read_generic+0x590/0x9e0
[ 576.110011] unix_stream_recvmsg+0x4c/0x74
[ 576.114114] sock_read_iter+0x94/0xf0
[ 576.117787] new_sync_read+0x16c/0x180
[ 576.121545] vfs_read+0x110/0x1b0
[ 576.124866] ksys_read+0xdc/0xfc
[ 576.128099] __arm64_sys_read+0x20/0x30
[ 576.131942] invoke_syscall+0x48/0x114
[ 576.135701] el0_svc_common.constprop.0+0x44/0xec
[ 576.140417] do_el0_svc+0x28/0x90
[ 576.143739] el0_svc+0x20/0x60
[ 576.146798] el0t_64_sync_handler+0xec/0xf0
[ 576.148777] cpufreq: __target_index: Failed to change cpu frequency: -5
[ 576.150989] el0t_64_sync+0x1a0/0x1a4
[ 576.150997] ---[ end trace fdcc5837352241ef ]---
[ 649.748470] detach_entity_load_avg: CPU3 cfs_rq path=/C/D util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=28197
[ 787.941272] detach_entity_load_avg: CPU5 cfs_rq path=/E/F util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=20989
[ 950.895367] detach_entity_load_avg: CPU4 cfs_rq path=/E/F util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=32216
[ 1099.549972] detach_entity_load_avg: CPU3 cfs_rq path=/C/D util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=427
[ 1123.271798] detach_entity_load_avg: CPU3 cfs_rq path=/E/F util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=15682
[ 1287.703202] detach_entity_load_avg: CPU5 cfs_rq path=/C/D util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=26131
[ 1416.188977] cfs_rq_is_decayed: CPU4 cfs_rq path=/C/D load_avg=0 util_avg=1 runnable_avg=0
[ 1740.078759] detach_entity_load_avg: CPU0 cfs_rq path=/C/D util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=32253
[ 1775.830270] detach_entity_load_avg: CPU0 cfs_rq path=/C/D util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=23562
[ 1781.149965] detach_entity_load_avg: CPU0 cfs_rq path=/E/F util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=27758
[ 1968.274934] detach_entity_load_avg: CPU3 cfs_rq path=/E/F util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=21688
[ 2005.686897] cfs_rq_is_decayed: CPU1 cfs_rq path=/E/F load_avg=0 util_avg=2 runnable_avg=0
[ 2176.589197] cfs_rq_is_decayed: CPU5 cfs_rq path=/A/B load_avg=0 util_avg=2 runnable_avg=0
[ 2315.395927] detach_entity_load_avg: CPU3 cfs_rq path=/E/F util_avg=0 util_sum=0 nr_running=0 p->comm=hackbench p->pid=24056
Let me look at your new patch (attachment 300033 [details]) now.
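(For concreteness, the enforcement experiment Dietmar mentions before the last log would look something like this in detach_entity_load_avg(); shown only to make the experiment explicit, it is not a proposed fix:)

    	if (!cfs_rq->avg.util_sum)
    		cfs_rq->avg.util_avg = 0;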
The new patch (attachment 300033 [details]) has run since last evening on hikey/hikey960 and db845 without triggering the warning, whereas previously it was happening in less than an hour.
Have you been able to trigger the WARN?
I did run it now for ~4h on juno-r0 w/o any warning.

IMHO, the hunk in update_cfs_rq_load_avg()

-	sa->util_sum = sa->util_avg * divider;
+	sub_positive(&sa->util_sum, r * divider);
+	/* See update_tg_cfs_util() */
+	sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * divider);

removes the performance regression.

So util is fixed. But what about load & runnable? For them we would still do `sa->xxx_sum = sa->xxx_avg * divider`.

There are a couple of other questions, but it might be better to discuss them directly on lkml.

(In reply to dietmar.eggemann from comment #38)

> So util is fixed. But what about load & runnable? For them we would still do
> `sa->xxx_sum = sa->xxx_avg * divider`

That's the next step, but I wanted to make sure that util_avg is fixed first. I should be able to apply the exact same changes to runnable_avg. It will be less straightforward for load, which behaves differently.

> There are a couple of other questions, but it might be better to discuss them
> directly on lkml.

Yeah, I was waiting for feedback before preparing an official patch for lkml that will fix util_avg and runnable_avg at least, so we can continue the discussion there. I will also have a look at the rounding problem in parallel.

I have sent a new version of the patch which includes runnable_sum and load_sum. The patchset sent on the mailing list is different from the one tested here, as it uses (LOAD_AVG_MAX - 1024) instead of (LOAD_AVG_MAX - 1024 + period_contrib).

Message-ID of the patchset: 20211221094653.10776-1-vincent.guittot@linaro.org

I tested the new patches, and the power and performance are similar. Thanks.