Codepoet wrote:
What do you think about integrating that library - or parts of it - into Horde after doing real world tests instead of writing your own version?
The math library provided by Sony is an all-in-one library and a bit complex, so we could integrate roughly 20-30% of it into our utmath.
But that needs changes to the overall structure of the engine [everywhere vec3f and matrix4f are used]. With the current utmath_rcx there is no need to change the engine's structure, but we can't get the real power of SSE, because utmath_rcx wastes a lot of CPU cycles on simple operations such as loading floats into __m128 and storing __m128 back into floats.
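To make the overhead concrete, here is a minimal sketch of what a per-operation load/store wrapper looks like. `Vec3f` and `addViaSse` are hypothetical names for illustration, not the actual utmath_rcx code; the point is only that one useful `_mm_add_ps` is surrounded by two loads and a store on every call.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical float-based vector (an assumption, not the engine's real vec3f).
struct Vec3f {
    float x, y, z;
};

// Each call loads both operands into __m128 registers, does one add,
// then stores the result back into plain floats.
inline Vec3f addViaSse(const Vec3f& a, const Vec3f& b) {
    __m128 va = _mm_set_ps(0.0f, a.z, a.y, a.x);  // load a (x ends up in lane 0)
    __m128 vb = _mm_set_ps(0.0f, b.z, b.y, b.x);  // load b
    __m128 vr = _mm_add_ps(va, vb);               // the only "real" work
    alignas(16) float out[4];
    _mm_store_ps(out, vr);                        // store back to floats
    return Vec3f{out[0], out[1], out[2]};
}
```

For a single add, the packing and unpacking can easily cost more than the arithmetic itself, which is why this style tends to end up close to plain FPU code.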
If you take a closer look at other libraries such as Sony's or Nebula Device's, you'll see that they load the floats into __m128 once, when the vector and matrix classes are constructed. After that everything is performed on __m128 types without wasting any CPU cycles, and they gain another ~20% performance boost.
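A rough sketch of that "load once, stay in __m128" idea follows. `Vec4Simd` is a made-up class for illustration and is not taken from the Sony or Nebula sources; it only shows the shape of the approach, where the floats are packed in the constructor and every operator works directly on the register type.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical SIMD-native vector: the data lives in a __m128 for its whole lifetime.
struct alignas(16) Vec4Simd {
    __m128 v;

    // Floats are packed into the register once, at construction time.
    Vec4Simd(float x, float y, float z, float w = 0.0f)
        : v(_mm_set_ps(w, z, y, x)) {}

    explicit Vec4Simd(__m128 r) : v(r) {}

    // Operators work on __m128 directly, with no float round trip.
    Vec4Simd operator+(const Vec4Simd& o) const {
        return Vec4Simd(_mm_add_ps(v, o.v));
    }
    Vec4Simd operator*(float s) const {
        return Vec4Simd(_mm_mul_ps(v, _mm_set1_ps(s)));
    }

    // Unpack only when plain floats are really needed (e.g. at the engine boundary).
    void store(float out[4]) const { _mm_storeu_ps(out, v); }
};
```

A chain like `(a + b) * 2.0f` then stays in registers until `store` is called, which is where the extra gain would come from.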
There are several ways to deal with this problem:
A. Keep the current structure of utmath_rcx, so there is no need to change the engine.
B. Use unions [__m128] to hold the data and copy it back into the float types [vectors etc.], but this is slower than A because of the current structure of the engine.
C. Change the whole structure of the engine to remove the cycle-wasting load and store operations, like the Sony and Nebula math libraries do. This way we can get the real power of SSE.
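For reference, option B could look roughly like the sketch below. `Vec4Union` and `addUnion` are hypothetical names, and reading `f` after writing `m` relies on union type punning, which the common compilers (GCC, Clang, MSVC) support but the C++ standard does not guarantee, so treat that as an assumption.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical union giving both a register view and a float view of the same 16 bytes.
union Vec4Union {
    __m128 m;     // SIMD view, used for the math
    float  f[4];  // float view, used by code that expects plain floats
};

// The math runs on the __m128 member; callers can still read r.f[i] afterwards.
inline Vec4Union addUnion(const Vec4Union& a, const Vec4Union& b) {
    Vec4Union r;
    r.m = _mm_add_ps(a.m, b.m);
    return r;
}
```

As long as the rest of the engine keeps its own float-based vec3f, the values still have to be copied in and out of such a union, which is the cost mentioned for B.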
IMHO it's better to choose solution A if you don't want to change the whole structure of the engine, in which case the performance will stay around 1:1 [FPU|SSE]. But if you really want to enjoy the full power of your ~$200 Core 2 Duo and expensive quad cores, it's better to choose solution C.
I don't know what the right call is, but I would prefer solution C. Everything depends on you, the community and the main developers!