Created attachment 89101 [details] Example script

On my AMD Phenom II X6 1045T, a process that uses multiple cores has a very high CPU overhead (an average of 50%) compared to limiting the same process to 1 CPU core. The attachments are:
- The script used for the test. It creates 2 threads that are semi-idle (each using ~10-20% of one core on my system).
- A log showing top for the process (the script), which needed more CPU power before it was limited to 1 core with taskset.
Created attachment 89111 [details] Log of top using 1 and 2 cores
You need to profile your application and interpreter at a low level - most likely you have a contention problem. This doesn't seem to be a kernel bug.
I have now ported the script to C to exclude Python as the source of the problem. The bug appears there too: my CPU usage with 2 threads is ~14% on 1 core and ~21% on 2 cores.
Created attachment 90241 [details] Example in C
Please close the bug; this is not a kernel problem. An infinite loop of usleep(0.0000001) calls will in fact cause 'slight overhead'. usleep() is defined for the range [0, 1000000] microseconds; I am not sure how that float argument went unnoticed by the compiler.
nevermind my last comment.
Don't worry - you have helped me find a potential error in my testcase. I have updated it to use nanosleep() instead, but the result is still the same: multiple cores produce much more overhead.
Created attachment 99371 [details] Example in C
Here is an updated version of the testcase. It now uses (number of online cores * 2) threads, because the overhead gets much higher with more threads. Here are the results on my system with 12 threads: running ./threads results in a CPU usage of ~185%/600%; running taskset -c 0 ./threads results in a CPU usage of ~30%/600%.
Created attachment 115241 [details] Testcase. Compile with "gcc -o threads ./threads.c -lpthread".