Bug 6074 - K6/K6-II/K6-III kernel optimization needlessly bundled?
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: i386 Linux
Importance: P2 low
Assignee: Adrian Bunk
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-14 14:50 UTC by Carl Englund
Modified: 2006-02-15 14:01 UTC
0 users

See Also:
Kernel Version: 2.6.15
Subsystem:
Regression: ---
Bisected commit-id:


Description Carl Englund 2006-02-14 14:50:55 UTC
The Gentoo guys told me to take it elsewhere, so here I am :)
(http://bugs.gentoo.org/show_bug.cgi?id=122844)

Whether you have a K6, K6-II, K6-II+ or K6-III
processor, trying to optimize the kernel always results in -march=k6. I tried
changing the Makefile to compile with -march=k6-2 (on a K6-2) and it worked
just fine. Why not let GCC use the matching -march value?

While I'm at it, if Pentium II is selected, why -march=i686 -mcpu=pentium2
instead of just -march=pentium2?
Comment 1 Adrian Bunk 2006-02-14 17:16:06 UTC
There is no reason for -march=k6-2 in the kernel since the only difference
compared to -march=k6 is the availability of 3dnow - and you can't use floating
point in the kernel.

In the pentium case, gcc knows how to tune best for each of the CPUs, but
there's also nothing -march changes that would make a difference inside the kernel.
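
A minimal sketch of how the 3dnow difference shows up from user space, assuming
a GCC build that still accepts -march=k6-2 and predefines the __3dNOW__ macro
when 3DNow! code generation is enabled. The function names built_with_3dnow and
print_3dnow_status are made up for this illustration and are not part of the
kernel or of GCC.

```c
#include <stdio.h>

/* Returns 1 if the compiler predefined __3dNOW__ for this translation unit
 * (assumed to happen with -march=k6-2 or -m3dnow), otherwise 0. */
int built_with_3dnow(void)
{
#ifdef __3dNOW__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints which of the two cases was compiled in. */
void print_3dnow_status(void)
{
    printf("3DNow! code generation: %s\n",
           built_with_3dnow() ? "enabled" : "disabled");
}
```

Calling print_3dnow_status() from a small main() and building the file once
with -march=k6 and once with -march=k6-2 should show which build has 3DNow!
enabled, if the assumption about the macro holds for the GCC in use.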
Comment 2 Carl Englund 2006-02-15 03:11:07 UTC
I'm sorry, I was mistaken. I thought the K6-2 had a couple of extra registers
to play with, but they are only for storing data for 3DNOW! instructions.

But I wonder if GCC's -march=k6-2 really only does -march=k6 with 3DNOW!
instructions? Digging around, I found this:

"they just added 16 wait states to the execution of the LOOPcc and thus caused
it to slow to the speed of a Pentium. AMD didn't just do this however. They
added a special case (speculation, might be coincidence) for the DEC (E)CX; Jcc
combination, which is semantically equivalent with the LOOPcc instruction, but
this semantic equivalency and the loop being faster on Intels caused the loop
instruction to always be used. Nobody used the DEC/Jcc combo. They kept the
original speed for this combo and specified in their optimization manuals that
this was the preferred method over the loopcc instruction."
(http://www.mega-tokyo.com/osfaq2/index.php/The%20IA32%20Architecture%20Family)

So -march=k6 should make use of LOOPcc, but -march=k6-2 would hopefully go
with DEC/Jcc. I'll have to try and dig around GCC to find out if that is so. But
I'm really not good at this.. :) I think(?) the K6 and the K6-2 have the same
amount of cache (64KiB), so there probably shouldn't be any difference in
optimizing with that in mind..
Comment 3 Adrian Bunk 2006-02-15 10:39:56 UTC
If you look at the gcc sources, you see that gcc doesn't know about any
differences between k6 and k6-2 except for 3dnow.

Comment 4 Carl Englund 2006-02-15 14:01:19 UTC
I stand corrected. I dug around and found this, which strengthens your claim:
http://gcc.gnu.org/ml/gcc/2003-02/msg01518.html

Still, the wait states on LOOPcc versus DEC/Jcc puzzle me. But if the GCC people
didn't take them into account, they must have had a reason. If they did, I should
be able to see it by compiling some suitable code (time, perhaps?) and comparing
the output for the different -march values. The difference in cache size between
the processors is accounted for (I guess) by using different -O options; it's
nothing that -march or -mtune is "aware" of.

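The comparison described above could be tried with a small count-down loop, the
kind of code where a compiler might choose between LOOP and a DEC/Jcc pair. This
is only a sketch: the function name sum_countdown and the file name loop.c are
made up, and it assumes a GCC that can target 32-bit x86 and still accepts both
-march values. Something like

    gcc -m32 -O2 -S -march=k6 loop.c -o loop-k6.s
    gcc -m32 -O2 -S -march=k6-2 loop.c -o loop-k6-2.s
    diff loop-k6.s loop-k6-2.s

would then show whether the generated assembly differs at all.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n values while counting the index down to zero, so the loop condition
 * is a plain "decrement and branch if not zero" pattern. */
long sum_countdown(const int *values, size_t n)
{
    long total = 0;

    /* A NULL pointer is only acceptable when there is nothing to read. */
    assert(values != NULL || n == 0);

    while (n != 0) {
        n--;
        total += values[n];
    }
    return total;
}
```

If Comment 3 is right that gcc knows no difference between k6 and k6-2 apart
from 3dnow, the two assembly files would be expected to match for integer code
like this, but that is something to check rather than assume.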