Bug 7476 - migration cost estimator inaccurate due to touch_cache() defects
Summary: migration cost estimator inaccurate due to touch_cache() defects
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Process Management
Classification: Unclassified
Component: Scheduler
Hardware: i386
OS: Linux
Importance: P2 normal
Assignee: Ingo Molnar
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-11-08 16:52 UTC by Bela Lubkin
Modified: 2007-11-17 19:12 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.19-rc5-mm1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Test program to demonstrate touch_cache() algorithm flaws (1.85 KB, text/plain)
2006-11-09 01:27 UTC, Bela Lubkin
Test program with new test_cache() routine (4.66 KB, text/plain)
2006-11-10 16:51 UTC, Bela Lubkin
Test script for updated touch_cache() (664 bytes, text/plain)
2006-11-10 16:52 UTC, Bela Lubkin
Test program with new test_cache() routine (updated) (5.95 KB, text/plain)
2006-12-11 16:55 UTC, Bela Lubkin
Test script for updated touch_cache() (860 bytes, text/plain)
2006-12-11 16:57 UTC, Bela Lubkin
2.6.19-git18 patch (4.12 KB, patch)
2006-12-11 16:59 UTC, Bela Lubkin

Description Bela Lubkin 2006-11-08 16:52:55 UTC
This loop in kernel/sched.c:touch_cache():

        for (i = 0; i < size / 6; i += 8) {
                switch (i % 6) {
                        case 0: cache[i]++;         /* 1st third, forward  */
                        case 1: cache[size-1-i]++;  /* 3rd third, backward */
                        case 2: cache[chunk1-i]++;  /* 1st third, backward */
                        case 3: cache[chunk1+i]++;  /* 2nd third, forward  */
                        case 4: cache[chunk2-i]++;  /* 2nd third, backward */
                        case 5: cache[chunk2+i]++;  /* 3rd third, forward  */
                }
        }

can only touch the first two thirds of the buffer because `i % 6' will always be 
even.  In order to visit the odd cases of the switch, the for loop's stride 
should be relatively prime to the modulus, e.g. 7.

This at least partially explains <http://lkml.org/lkml/2005/6/21/473>.
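
For reference, a small userland sketch (mine, not part of this report or its
attachments) that enumerates which `i % 6' values a stride-8 loop can actually
produce, under the assumption that each case ends in `break;':

#include <stdio.h>

/*
 * Illustrative only.  Since i is always a multiple of 8, i % 6 is
 * always even (0, 2 or 4), so cases 1, 3 and 5 would never be
 * selected if the switch used break statements.
 */
int main(void)
{
        unsigned long size = 4096, i;
        int seen[6] = { 0 };

        for (i = 0; i < size / 6; i += 8)
                seen[i % 6] = 1;

        for (i = 0; i < 6; i++)
                printf("case %lu: %s\n", i,
                       seen[i] ? "selectable" : "never selected");
        return 0;
}
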
Comment 1 Ingo Molnar 2006-11-09 00:13:27 UTC
* Andrew Morton <akpm@osdl.org> wrote:

> This loop in kernel/sched.c:touch_cache():
> 
>         for (i = 0; i < size / 6; i += 8) {
>                 switch (i % 6) {
>                         case 0: cache[i]++;         /* 1st third, forward  */
>                         case 1: cache[size-1-i]++;  /* 3rd third, backward */
>                         case 2: cache[chunk1-i]++;  /* 1st third, backward */
>                         case 3: cache[chunk1+i]++;  /* 2nd third, forward  */
>                         case 4: cache[chunk2-i]++;  /* 2nd third, backward */
>                         case 5: cache[chunk2+i]++;  /* 3rd third, forward  */
>                 }
>         }
> 
> can only touch the first two thirds of the buffer because `i % 6' will 
> always be even.  In order to visit the odd cases of the switch, the 
> for loop's stride should be relatively prime to the modulus, e.g. 7.
> 
> This at least partially explains <http://lkml.org/lkml/2005/6/21/473>.

indeed - good catch. The patch below should fix this. Boot-tested. Also 
added some comments.

	Ingo

---------------------->
Subject: [patch] sched: fix migration cost estimator
From: Ingo Molnar <mingo@elte.hu>

blubkin@vmware.com found the following bug: touch_cache()
only touched 2 thirds of the cache buffer (hence wasting
memory and possibly making calibration less accurate).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

Index: linux-rt-new.q/kernel/sched.c
===================================================================
--- linux-rt-new.q.orig/kernel/sched.c
+++ linux-rt-new.q/kernel/sched.c
@@ -5721,14 +5721,24 @@ static void touch_cache(void *__cache, u
 	unsigned long *cache = __cache;
 	int i;
 
+	/*
+	 * We access the buffer via 6 independent 'streams' of
+	 * read/write access which are interleaved:
+	 *
+	 * [--->     <---|--->     <---|--->     <---]
+	 *
+	 * We touch every cacheline in the buffer. (iteration
+	 * step is 32 bytes on 32-bit systems, 64 bytes on
+	 * 64-bit systems)
+	 */
 	for (i = 0; i < size/6; i += 8) {
-		switch (i % 6) {
-			case 0: cache[i]++;
-			case 1: cache[size-1-i]++;
-			case 2: cache[chunk1-i]++;
-			case 3: cache[chunk1+i]++;
-			case 4: cache[chunk2-i]++;
-			case 5: cache[chunk2+i]++;
+		switch (i % 6*8) {
+			case 0*8: cache[i]++;
+			case 1*8: cache[size-1-i]++;
+			case 2*8: cache[chunk1-i]++;
+			case 3*8: cache[chunk1+i]++;
+			case 4*8: cache[chunk2-i]++;
+			case 5*8: cache[chunk2+i]++;
 		}
 	}
 }



Comment 2 Bela Lubkin 2006-11-09 01:27:23 UTC
Created attachment 9440 [details]
Test program to demonstrate touch_cache() algorithm flaws
Comment 3 Bela Lubkin 2006-11-09 01:46:49 UTC
The proposed patch (multiplying everything by 8) doesn't change it at all.  Now 
instead of comparing even numbers to 0,1,2,3,4,5, we compare multiples of 16 to 
multiples [0..5] of 8.  Half still can never be matched.
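
For what it's worth, a tiny demonstration (mine, not from the thread): `%' and
`*' have equal precedence in C and associate left, so `i % 6*8' parses as
`(i % 6) * 8', which for i a multiple of 8 only ever yields 0, 16 or 32, while
the case labels are 0, 8, 16, 24, 32 and 40:

#include <stdio.h>

/* Shows how the patched switch expression evaluates for the values of
 * i that the loop actually produces. */
int main(void)
{
        unsigned long i;

        for (i = 0; i < 48; i += 8)
                printf("i = %2lu   i %% 6*8 = %2lu\n", i, i % 6*8);
        return 0;
}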

But it turns out that's irrelevant because I also didn't notice the lack of 
`break;' statements in the original code.  The original _does_ touch all the 
cache, by mistake.

I tacked on an attachment, a trivial C program that shows how the original and 
proposed patch are conceptually flawed (when implemented with `break;').

Meanwhile, the original loop goes from 0..size/6 by steps of 8, so it generates 
1/48 of the cache addresses.  `cache' is declared `unsigned long *cache' in the 
function, so on 32-bit this is doing an average of one 32-bit write per 192 
bytes of test cache -- one in 6 cache lines, if the comment about cache line 
size is correct.
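
Spelling out the arithmetic behind those figures (counting loop iterations
only, i.e. as if each iteration made a single write):

        iterations     = (size/6) / 8 = size/48 distinct values of i
        one write each -> one write per 48 longwords
                        = 48 * 4 bytes = 192 bytes on 32-bit
                        = one write per 6 cache lines of 32 bytes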

Conclusion: needs a little work to ensure this is actually doing what the 
comments think it's doing...
Comment 4 Bela Lubkin 2006-11-09 02:07:35 UTC
Ok, now I see.  I think this code must have been right at one point, then the 
`switch' got wrapped around it and made it wrong.

This loop actually touches all the cache lines (on 32-bit):

        for (i = 0; i < size/6; i += 8) {
                cache[i]++;
                cache[size-1-i]++;
                cache[chunk1-i]++;
                cache[chunk1+i]++;
                cache[chunk2-i]++;
                cache[chunk2+i]++;
        }

That is, it writes to at least one longword within each 8 longwords.  Not all at 
the same offsets, since `chunk1' and `chunk2' can be a bit off and since 
`cache[size - 1]' is at the end of a cache line while `cache[0]' is at the 
beginning.
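
As a quick cross-check (my own sketch, not the attached test program; it
assumes chunk1 = size/3 and chunk2 = 2*size/3 as in the kernel's touch_cache(),
and an arbitrary 64K-longword buffer), one can mark every 8-longword block the
break-free loop writes and confirm that none are missed:

#include <stdio.h>

#define SIZE (64 * 1024)        /* longwords; arbitrary test size */

static unsigned char touched[SIZE / 8];

int main(void)
{
        unsigned long size = SIZE, chunk1 = size / 3, chunk2 = 2 * size / 3;
        unsigned long i, missed = 0;

        /* mark the 8-longword block hit by each of the six streams */
        for (i = 0; i < size / 6; i += 8) {
                touched[i / 8] = 1;
                touched[(size - 1 - i) / 8] = 1;
                touched[(chunk1 - i) / 8] = 1;
                touched[(chunk1 + i) / 8] = 1;
                touched[(chunk2 - i) / 8] = 1;
                touched[(chunk2 + i) / 8] = 1;
        }
        for (i = 0; i < size / 8; i++)
                if (!touched[i])
                        missed++;
        printf("blocks never written: %lu of %lu\n", missed, size / 8);
        return 0;
}
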

Kenneth W Chen's observation that:

> I'm consistently getting a smaller than expected cache migration cost 
> as measured by Ingo's scheduler-cache-hot-autodetect.patch currently 
> in -mm tree.

makes even more sense now.  This was pulling at most 1/6 of the test memory into 
cache on 32-bit, and either 1/6 or 1/12 on 64-bit.

Although my initial analysis was a bunch of bollocks, I was right when I thought 
I smelled something...
Comment 5 Andrew Morton 2006-11-10 00:31:05 UTC
Queued a fix from Ingo
Comment 6 Bela Lubkin 2006-11-10 16:51:23 UTC
Created attachment 9453 [details]
Test program with new test_cache() routine

This is a more robust touch_cache() routine, embedded in a test program.  Next
attachment is a small shell script that runs touch_cache() through its paces.
Comment 7 Bela Lubkin 2006-11-10 16:52:18 UTC
Created attachment 9454 [details]
Test script for updated touch_cache()
Comment 8 Bela Lubkin 2006-11-10 16:54:03 UTC
The patch in comment #1 doesn't do anything to this bug.

The 2nd attachment ("Test program with new test_cache() routine") contains a 
touch_cache() that is robust over various test buffer sizes and with various 
cache line lengths.
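
The attached routine itself isn't reproduced here; purely as a sketch of the
general idea (names and parameters are mine), one way to make the coverage
independent of the buffer size is to take the cache line size as an explicit
stride and walk the whole buffer with it:

#include <stdlib.h>

/*
 * Sketch only -- NOT the attached test_cache()/touch_cache() routine.
 * Walks the entire buffer in line_size steps so every cache line is
 * written once, whatever the buffer size; the real routine also keeps
 * the interleaved forward/backward streams.
 */
static void touch_cache_sketch(void *__cache, unsigned long __size,
                               unsigned long line_size)
{
        unsigned long *cache = __cache;
        unsigned long size = __size / sizeof(long);
        unsigned long stride = line_size / sizeof(long);
        unsigned long i;

        for (i = 0; i < size; i += stride)
                cache[i]++;
}

int main(void)
{
        unsigned long bytes = 1024 * 1024;      /* hypothetical buffer size */
        void *buf = calloc(1, bytes);

        if (!buf)
                return 1;
        touch_cache_sketch(buf, bytes, 64);     /* assume 64-byte lines */
        free(buf);
        return 0;
}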
Comment 9 Bela Lubkin 2006-12-11 16:55:59 UTC
Created attachment 9790 [details]
Test program with new test_cache() routine (updated)

This has been tested more, and made adequately compatible with the kernel
environment.
Comment 10 Bela Lubkin 2006-12-11 16:57:04 UTC
Created attachment 9791 [details]
Test script for updated touch_cache()
Comment 11 Bela Lubkin 2006-12-11 16:59:23 UTC
Created attachment 9792 [details]
2.6.19-git18 patch

2.6.19-git18 patch.  NOT TESTED IN KERNEL ENVIRONMENT, only in my userland test
harness.
Comment 12 Bela Lubkin 2006-12-11 17:02:08 UTC
This activity reflects an email conversation with Ingo Molnar & Andrew Morton, whom 
I will also now Cc on the bug.  It is intended to be applied in lieu of the previous 
"sched-fix-migration-cost-estimator.patch", which had no actual effect on the bug.
Comment 13 Bela Lubkin 2007-06-01 16:13:38 UTC
This is still broken in the newest 2.6.22 snapshots.
Comment 14 Ingo Molnar 2007-11-17 19:10:28 UTC
This bug should now be moot because we removed the estimator in 2.6.23.  The fix might still be worth applying to 2.6.22, although it could change the calibration results and trigger a change in behavior.

If we introduce something like this in the future then we'll pick up your fix.
