Hi, I've used a union to remove extra memory loads in the Vec3f class:
Code:
union
{
    __m128 m128;
};
Code:
// ------------
// Constructors
// ------------
Vec3f() : x( 0.0f ), y( 0.0f ), z( 0.0f )
{
    m128 = _mm_setzero_ps();
}

explicit Vec3f( const float x, const float y, const float z ) : x( x ), y( y ), z( z )
{
    m128 = _mm_setr_ps( x, y, z, 0.0f );
}
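For reference, here is a minimal self-contained sketch of the layout the union is meant to give. It assumes the union also holds an anonymous struct with `x`, `y`, `z` and a hypothetical padding member `w` (only `m128` is shown in the snippet), so the scalar fields and `m128` share the same 16 bytes. Because the storage is shared, these constructors write only `m128`. Initialising `x`, `y`, `z` and then assigning `m128` stores the same bytes twice. Anonymous structs and reading a union member other than the one last written are compiler extensions (GCC, Clang and MSVC accept them), not standard C++.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_setzero_ps, _mm_setr_ps

struct Vec3f
{
    union
    {
        // Assumed scalar view; w is hypothetical padding so the struct fills 16 bytes.
        struct { float x, y, z, w; };
        // SSE view of the same 16 bytes.
        __m128 m128;
    };

    // Write only the SSE view; x, y, z are read back through the union.
    Vec3f() : m128( _mm_setzero_ps() ) {}

    // _mm_setr_ps puts its first argument in the lowest lane (the first float in memory).
    explicit Vec3f( float px, float py, float pz )
        : m128( _mm_setr_ps( px, py, pz, 0.0f ) ) {}
};

// Both views occupy one 16-byte block, and __m128 forces 16-byte alignment.
static_assert( sizeof( Vec3f ) == 16, "scalar and SSE views should share 16 bytes" );
static_assert( alignof( Vec3f ) == 16, "__m128 member should force 16-byte alignment" );
```

For example, `Vec3f v( 1.0f, 2.0f, 3.0f );` leaves `v.x == 1.0f`, `v.y == 2.0f` and `v.z == 3.0f`, with the padding lane `v.w` at 0.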
After benchmarking this union-based Vec3f against the original utmath on adding two vectors, I get these results:
utmath_org: ~200 ms
utmath_sse: ~500 ms
So I decided to change operator+ as well. The first version below is the scalar one, the second uses SSE:
Code:
Vec3f operator+( const Vec3f &v ) const
{
    SSE_ALIGNED( float out1[4] );
    out1[0] = x + v.x;
    out1[1] = y + v.y;
    out1[2] = z + v.z;
    return Vec3f( out1[0], out1[1], out1[2] );
}
Code:
Vec3f operator+( const Vec3f &v ) const
{
    SSE_ALIGNED( float out1[4] );
    _mm_store_ps( out1, _mm_add_ps( m128, v.m128 ) );
    return Vec3f( out1[0], out1[1], out1[2] );
}
After benchmarking these two versions of operator+ against the original utmath on adding two vectors, I get these results:
first code: ~550 ms
second code: ~650 ms
Does anybody know how to eliminate the SSE_ALIGNED( float out1[4] ) temporary to make the code a bit faster?
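One possible direction, sketched under assumptions: give Vec3f a constructor that takes an `__m128` directly (hypothetical, not part of utmath). With it, `operator+` can return the `_mm_add_ps` result without storing it to an aligned `float[4]` and reloading it through the three-float constructor. The union layout again assumes an anonymous struct with `x`, `y`, `z` and a padding member `w`, which relies on the same compiler extensions as the original code. This has not been benchmarked here. Whether it beats the scalar version depends on the compiler and on how the vectors are loaded and stored around the call.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, _mm_setr_ps, _mm_setzero_ps

struct Vec3f
{
    union
    {
        struct { float x, y, z, w; };  // assumed scalar view; w is padding, kept at 0
        __m128 m128;                   // SSE view of the same 16 bytes
    };

    Vec3f() : m128( _mm_setzero_ps() ) {}

    explicit Vec3f( float px, float py, float pz )
        : m128( _mm_setr_ps( px, py, pz, 0.0f ) ) {}

    // Hypothetical: construct straight from an SSE value, so results never
    // have to pass through a float[4] temporary.
    explicit Vec3f( __m128 v ) : m128( v ) {}

    Vec3f operator+( const Vec3f &v ) const
    {
        // The padding lanes are both 0, so w stays 0 in the sum.
        return Vec3f( _mm_add_ps( m128, v.m128 ) );
    }
};
```

Keeping the padding lane at 0 in every constructor means `w` stays 0 after additions. This matters if other operations read all four lanes of the register.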