Bug 194231
Summary: | Group Imbalance bug - performance drop up to factor 10x | ||
---|---|---|---|
Product: | Process Management | Reporter: | Jirka Hladky (hladky.jiri) |
Component: | Scheduler | Assignee: | Ingo Molnar (mingo) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | hladky.jiri, skarmarkar |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.10.0-0.rc6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Paper describing bug (see chapter 3.1); Reproducer for Group Imbalance bug. | ||
Description
Jirka Hladky
2017-02-06 23:31:53 UTC
Created attachment 254401: Reproducer for Group Imbalance bug.
This is the reproducer for the Group Imbalance bug.
Requires:
4-node NUMA server
ssh server running on the test machine
tmux
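Before running the reproducer, it may help to confirm how many NUMA nodes the machine exposes. The following is a minimal sketch, assuming the usual Linux sysfs layout under /sys/devices/system/node. The helper name count_numa_nodes and its optional directory argument are illustrative and are not part of the attached tarball.

```shell
#!/bin/sh
# Count NUMA nodes by counting nodeN directories in sysfs.
# The optional argument (default: /sys/devices/system/node) is a
# hypothetical convenience so the helper can be pointed elsewhere.
count_numa_nodes() {
    dir="${1:-/sys/devices/system/node}"
    count=0
    for node in "$dir"/node[0-9]*; do
        # An unmatched glob stays literal and is not a directory; skip it.
        if [ -d "$node" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

nodes=$(count_numa_nodes)
if [ "$nodes" -lt 2 ]; then
    echo "warning: found $nodes NUMA node(s), the test needs at least 2" >&2
fi
```

If numactl is installed, `numactl --hardware` reports the same information in more detail.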
1) Compiling the tests (needed only once, to install them)
./compile.sh
2) Running the test - use a server with at least 2 NUMA nodes
Start TWO ssh connections to the server.
* In the first connection, start tmux. It will be used to start jobs later.
* In the second connection, run ./reproduce.sh from this tarball. Do not attempt to start tmux in this second ssh shell!
./reproduce.sh will automatically start stress --cpu 1 jobs in the tmux session started in the first ssh connection.
Results are stored in a directory named <kernel_name>_<timestamp>, for example:
4.10.0-0.rc6.git0.1.el7.x86_64_2017-Feb-06_23h12m54s
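A name of this form can be produced with uname and date. This is only a sketch of the naming pattern: how reproduce.sh actually builds the name, and whether it uses local time or UTC, is not stated here, and the function name result_dir_name is hypothetical.

```shell
#!/bin/sh
# Build a name of the form <kernel_name>_<timestamp>,
# e.g. 4.10.0-0.rc6.git0.1.el7.x86_64_2017-Feb-06_23h12m54s.
# The optional first argument overrides the kernel name (default: uname -r).
result_dir_name() {
    kernel="${1:-$(uname -r)}"
    # LC_ALL=C keeps the month abbreviation in English ("Feb", not localized).
    stamp=$(LC_ALL=C date +%Y-%b-%d_%Hh%Mm%Ss)
    echo "${kernel}_${stamp}"
}

result_dir_name
```

Setting NAME to the kernel part of the directory name is one plausible way to use the ${NAME} variable in the grep commands of step 3, but that is an assumption.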
3) Examine results
grep -H total ${NAME}*log
grep -H seconds ${NAME}*log
grep -H -i Average *numa
Files with "GROUP" in the name were produced by using different job groups (different ssh connections).
Files with "NORMAL" in the name were produced by starting the whole workload from one ssh connection.
Both results should be the same. The typical bug symptoms are:
- uneven load across NUMA nodes (check the *GROUP*numa file)
- much longer (factor 5x-10x) runtimes for the lu.C.x benchmark
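To put a number on the runtime symptom, the GROUP runtime can be divided by the NORMAL runtime. The sketch below assumes the two runtimes in seconds have already been read by hand from the grep output, since the log format is not described here. The helper name slowdown_factor and the sample values are hypothetical.

```shell
#!/bin/sh
# Print group_seconds / normal_seconds with one decimal place.
# Exits non-zero if the NORMAL runtime is not positive.
slowdown_factor() {
    LC_ALL=C awk -v g="$1" -v n="$2" \
        'BEGIN { if (n <= 0) exit 1; printf "%.1f\n", g / n }'
}

# Hypothetical runtimes: 1500 s in the GROUP run, 200 s in the NORMAL run.
slowdown_factor 1500 200   # prints 7.5
```

A value close to 1.0 means both runs behaved the same; values in the 5 to 10 range match the symptom described above.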
Results for kernel 4.10.0-0.rc6 are included (directory 4.10.0-0.rc6.git0.1.el7.x86_64_2017-Feb-06_23h12m54s).
It was fixed in kernel v5.5.