Horde3D :: View topic - utMath elite

float fastDeterminant() const
{
   /*
      factorization result :
         192 -> 20 ptr deref
         96 -> 28 fpmult
         24 -> 18 fpadd

         => solid 30% speed improvement
   */

   const float * const c0 = c[0];
   const float * const c1 = c[1];
   const float * const c2 = c[2];
   const float * const c3 = c[3];

   const float c00 = c0[0];
   const float c01 = c0[1];
   const float c02 = c0[2];
   const float c03 = c0[3];

   const float c10 = c1[0];
   const float c11 = c1[1];
   const float c12 = c1[2];
   const float c13 = c1[3];

   const float c0011 = c00*c11;
   const float c0012 = c00*c12;
   const float c0013 = c00*c13;

   const float c0110 = c01*c10;
   const float c0112 = c01*c12;
   const float c0113 = c01*c13;

   const float c0210 = c02*c10;
   const float c0211 = c02*c11;
   const float c0213 = c02*c13;

   const float c0310 = c03*c10;
   const float c0311 = c03*c11;
   const float c0312 = c03*c12;

   const float c03x12m02x13 = c0312 - c0213;
   const float c03x11m01x13 = c0311 - c0113;
   const float c02x11m01x12 = c0211 - c0112;
   const float c02x10m00x12 = c0210 - c0012;
   const float c03x10m00x13 = c0310 - c0013;
   const float c01x10m00x11 = c0110 - c0011;

   const float c20 = c2[0];
   const float c21 = c2[1];
   const float c22 = c2[2];
   const float c23 = c2[3];

   return
      c3[0] * ( c03x12m02x13*c21 - c03x11m01x13*c22 + c02x11m01x12*c23 )
      -
      c3[1] * ( c03x12m02x13*c20 - c03x10m00x13*c22 + c02x10m00x12*c23 )
      +
      c3[2] * ( c03x11m01x13*c20 - c03x10m00x13*c21 + c01x10m00x11*c23 )
      -
      c3[3] * ( c02x11m01x12*c20 - c02x10m00x12*c21 + c01x10m00x11*c22 );
}

Author:	swiftcoder [ 22.01.2009, 18:53 ]
Post subject:	Re: utMath elite
Siavash wrote: "Hi, I've performed some high-school level optimizations [not SIMD] on utMath."
I am pretty sure that isn't what you meant; "high-level" would be the term.

Author:	Siavash [ 23.01.2009, 03:50 ]
Post subject:	Re: utMath elite
swiftcoder wrote: "I am pretty sure that isn't what you meant; 'high-level' would be the term."
High-level? Is there any performance increase? [There are only some precalculated values, factoring of polynomials and ...]

Author:	Siavash [ 04.04.2009, 15:03 ]
Post subject:	Re: utMath elite
Is there any noticeable performance difference between the Horde3D beta2 and beta3 utMath libs?

Author:	marciano [ 05.04.2009, 12:31 ]
Post subject:	Re: utMath elite
Siavash wrote: "Is there any feel able performance diff between Horde3D beta2 & beta3 utMath libs?"
The most important thing in utMath Beta3 is an optimized float-to-int conversion.
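The post does not say how the conversion was optimized. For context: on x87 code paths a plain cast has to switch the FPU rounding mode to truncate, which is why math libraries of that era often shipped their own conversion helpers. The sketch below only contrasts the two behaviours using the standard library; the helper names `toIntTruncate` and `toIntRound` are hypothetical and are not utMath functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: truncates toward zero, exactly like a plain C++ cast.
inline int toIntTruncate(float v) { return static_cast<int>(v); }

// Hypothetical helper: rounds using the current floating-point rounding mode
// (round-to-nearest by default), so no rounding-mode switch is needed.
inline int toIntRound(float v) { return static_cast<int>(std::lrint(v)); }
```

Which of the two is appropriate depends on whether the caller expects truncation or rounding; they differ for every non-integral input.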

Author:	fullmetalcoder [ 11.04.2009, 12:18 ]
Post subject:	Re: utMath elite
Not about the patch above, but on the same topic. The determinant() function looked so... suboptimal that I couldn't restrain myself and factorized it. The resulting code is not that cute but the performance gain is there:

Code:
float fastDeterminant() const
{
   /*
      factorization result :
         192 -> 20 ptr deref
         96 -> 28 fpmult
         24 -> 18 fpadd

         => solid 30% speed improvement
   */

   const float * const c0 = c[0];
   const float * const c1 = c[1];
   const float * const c2 = c[2];
   const float * const c3 = c[3];

   const float c00 = c0[0];
   const float c01 = c0[1];
   const float c02 = c0[2];
   const float c03 = c0[3];

   const float c10 = c1[0];
   const float c11 = c1[1];
   const float c12 = c1[2];
   const float c13 = c1[3];

   const float c0011 = c00*c11;
   const float c0012 = c00*c12;
   const float c0013 = c00*c13;

   const float c0110 = c01*c10;
   const float c0112 = c01*c12;
   const float c0113 = c01*c13;

   const float c0210 = c02*c10;
   const float c0211 = c02*c11;
   const float c0213 = c02*c13;

   const float c0310 = c03*c10;
   const float c0311 = c03*c11;
   const float c0312 = c03*c12;

   const float c03x12m02x13 = c0312 - c0213;
   const float c03x11m01x13 = c0311 - c0113;
   const float c02x11m01x12 = c0211 - c0112;
   const float c02x10m00x12 = c0210 - c0012;
   const float c03x10m00x13 = c0310 - c0013;
   const float c01x10m00x11 = c0110 - c0011;

   const float c20 = c2[0];
   const float c21 = c2[1];
   const float c22 = c2[2];
   const float c23 = c2[3];

   return
      c3[0] * ( c03x12m02x13*c21 - c03x11m01x13*c22 + c02x11m01x12*c23 )
      -
      c3[1] * ( c03x12m02x13*c20 - c03x10m00x13*c22 + c02x10m00x12*c23 )
      +
      c3[2] * ( c03x11m01x13*c20 - c03x10m00x13*c21 + c01x10m00x11*c23 )
      -
      c3[3] * ( c02x11m01x12*c20 - c02x10m00x12*c21 + c01x10m00x11*c22 );
}

Now there is probably room for explicit SIMD vectorization here but I'm not familiar enough with this to do it myself atm. By the way, "implicit" vectorization (passing this to gcc: -msse -msse2 -msse3 -mfpmath=sse) does not change the benchmark results.
Also note that some profiling (valgrind, under 32-bit Linux, arch: Core 2, compiled with -O2) shows that the strategy used for the +=, *= and /= operators is quite suboptimal (I'm not posting "fixes" here as they are extremely simple).
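A factorization like this is easy to get subtly wrong (one swapped index or sign), so it can be worth cross-checking it against a plain cofactor expansion. The sketch below assumes only the `float c[4][4]` layout visible in the post; the `Mat4` struct, the short names `a` to `g` for the six shared 2x2 terms, and the helpers `minor3`, `detReference` and `determinantsAgree` are hypothetical and not part of utMath.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the matrix class; only the c[4][4] layout comes from the post.
struct Mat4 {
    float c[4][4];

    // Factorized determinant, condensed from the posted fastDeterminant().
    float fastDeterminant() const {
        const float *c0 = c[0], *c1 = c[1], *c2 = c[2], *c3 = c[3];
        // The six 2x2 terms built from c[0] and c[1], each computed once.
        const float a = c0[3]*c1[2] - c0[2]*c1[3];
        const float b = c0[3]*c1[1] - c0[1]*c1[3];
        const float d = c0[2]*c1[1] - c0[1]*c1[2];
        const float e = c0[2]*c1[0] - c0[0]*c1[2];
        const float f = c0[3]*c1[0] - c0[0]*c1[3];
        const float g = c0[1]*c1[0] - c0[0]*c1[1];
        return c3[0]*(a*c2[1] - b*c2[2] + d*c2[3])
             - c3[1]*(a*c2[0] - f*c2[2] + e*c2[3])
             + c3[2]*(b*c2[0] - f*c2[1] + g*c2[3])
             - c3[3]*(d*c2[0] - e*c2[1] + g*c2[2]);
    }
};

// 3x3 determinant of the sub-matrix that skips index sr in the first
// dimension and index sc in the second.
static float minor3(const float m[4][4], int sr, int sc) {
    float s[3][3];
    int r = 0;
    for (int i = 0; i < 4; ++i) {
        if (i == sr) continue;
        int k = 0;
        for (int j = 0; j < 4; ++j) {
            if (j == sc) continue;
            s[r][k++] = m[i][j];
        }
        ++r;
    }
    return s[0][0]*(s[1][1]*s[2][2] - s[1][2]*s[2][1])
         - s[0][1]*(s[1][0]*s[2][2] - s[1][2]*s[2][0])
         + s[0][2]*(s[1][0]*s[2][1] - s[1][1]*s[2][0]);
}

// Unoptimized reference: cofactor expansion along m[0].
static float detReference(const float m[4][4]) {
    float det = 0.0f;
    for (int j = 0; j < 4; ++j) {
        const float sign = (j % 2 == 0) ? 1.0f : -1.0f;
        det += sign * m[0][j] * minor3(m, 0, j);
    }
    return det;
}

// True when both implementations agree within a relative tolerance.
bool determinantsAgree(const Mat4& m, float tol = 1e-4f) {
    const float ref = detReference(m.c);
    return std::fabs(m.fastDeterminant() - ref) <= tol * (1.0f + std::fabs(ref));
}
```

Calling `determinantsAgree` on a handful of matrices (identity, diagonal, a few dense ones) before benchmarking guards against timing a fast but wrong function. A tolerance is used because the two versions sum their terms in a different order, so float results may differ in the last bits.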

Horde3D http://horde3d.org/forums/

utMath elite http://horde3d.org/forums/viewtopic.php?f=8&t=620	Page 1 of 1

Author:	Siavash [ 11.04.2009, 13:40 ]
Post subject:	Re: utMath elite
Thanks a lot, fullmetalcoder, for the patch. You prompted me to run some benchmarks using Intel VTune [MSVC2008 Express: Debug]:

Code:
//////// Determinant() Benchmark ///////////////
utMath beta3   : 33 ;
fullmetalcoder : 30 ;
utMath elite   : 28 ;
//////////////////////////////////////////////
//////// Inverted() Benchmark //////////////////
utMath beta3   : 148 ;
fullmetalcoder : 148 ;
utMath elite   : 132 ;
//////////////////////////////////////////////

Looks like utMath elite rocks.

Author:	fullmetalcoder [ 11.04.2009, 13:57 ]
Post subject:	Re: utMath elite
Interesting benchmark figures. Could it be that MSVC does a better job at optimizing complex lookup + math ops? Here is what I get with the code from SVN:

Quote:
det test : 100000000
normal : result=-0.480000, elapsed=123
fast : result=-0.480000, elapsed=88

Elapsed time in ms, 10^8 determinant computations. Code compiled with GCC (-march=i686 -mtune=generic -O2), running on my laptop (Core 2 T7700, 3GB DDR, Linux).

Edit: just tried replacing the regular determinant() with the one from the archive above. Here are the results:

Quote:
det test : 100000000
normal : result=-0.480000, elapsed=112
fast : result=-0.480000, elapsed=87

Edit 2: benchmarking in debug mode is generally not a good idea for such simple operations. It is very likely that det = o(debug overhead).

Author:	Siavash [ 11.04.2009, 14:24 ]
Post subject:	Re: utMath elite
fullmetalcoder wrote: "Interesting benchmark figures. Could it be that MSVC does a better job at optimizing complex lookup + math ops?"
It's very interesting, because the system I was using was an old Pentium III 750MHz with 256MB SDRAM on Windows XP SP3, without any optimizations.
fullmetalcoder wrote: "Benchmarking in debug mode is generally not a good idea for such simple operations. It is very likely that det = o(debug overhead)"
Yes, I know, but if you look at the disassembly of the code you will see that MSVC cheats in release mode [precalculated values and ...].

Author:	fullmetalcoder [ 11.04.2009, 14:27 ]
Post subject:	Re: utMath elite
Siavash wrote: "Yes I know, but if you have a look at disasm of code you will see that MSVC cheats in the release mode [precalculated values and ...]."
All compilers cheat. I was forced to add some extra code to the loop of determinant computations to make sure the whole loop was not optimized away (turned into a single call)...
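The extra code itself is not shown in the thread. One common way to keep an optimizer from collapsing such a loop is to vary the input on every iteration and to consume every result. The following is a minimal sketch of that idea, not the benchmark used above: `det2`, `BenchResult` and `runBench` are hypothetical names, and a 2x2 determinant stands in for the function under test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical 2x2 determinant standing in for the function being timed.
inline float det2(float a, float b, float c, float d) { return a*d - b*c; }

struct BenchResult {
    double sum;          // accumulated results; callers should print or check this
    long long elapsedMs; // wall-clock time of the loop in milliseconds
};

BenchResult runBench(long iterations) {
    using Clock = std::chrono::steady_clock;
    volatile float seed = 1.0f; // volatile read: the compiler cannot assume the start value
    float x = seed;
    double sum = 0.0;           // every iteration's result feeds into this
    const auto t0 = Clock::now();
    for (long i = 0; i < iterations; ++i) {
        sum += det2(x, 2.0f, 3.0f, 4.0f);
        x += 1.0f;              // vary the input so the call is not loop-invariant
    }
    const auto t1 = Clock::now();
    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
    return { sum, ms.count() };
}
```

Printing `sum` after the loop, as the "result=" column in the quoted output does, is what makes the work observable. Without fast-math flags the compiler is generally not allowed to reassociate the floating-point additions, so the loop normally survives, though inspecting the generated assembly is still the only way to be sure.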

Author:	Siavash [ 11.04.2009, 15:14 ]
Post subject:	Re: utMath elite
I've forced the compiler to print out the results outside of the loop [Release mode, /O2]:

Code:
/////// det - inv /////////
beta3 : 13 - 56
fast  : 9 - 49
elite : 9 - 49
////////////////////////////

fullmetalcoder's version and the elite version perform the same.

Author:	Siavash [ 12.04.2009, 18:22 ]
Post subject:	Re: utMath elite
It's a good idea to replace some parts of utMath beta3 with the elite version; there is a ~13% performance boost in the det & inv functions.

Page 1 of 1	All times are UTC + 1 hour