Horde3D

Next-Generation Graphics Engine

All times are UTC + 1 hour




Posted: 26.09.2010, 15:04
Joined: 21.08.2008, 11:44
Posts: 354
Here are the results of a few experiments I ran to see how much the /arch:SSE and /arch:SSE2 switches improve Horde3D's performance. As already suggested, software skinning should benefit from such optimizations, so let's enable SWSkinning in the Chicago sample:

Code:
// Chicago Sample - crowd.cpp

void CrowdSim::init()
{
   ...
   
   // Add characters
   for( unsigned int i = 0; i < 200; ++i )
   {
      Particle p;
      
      // Add character to scene and apply animation
      p.node = h3dAddNodes( H3DRootNode, characterRes );
      // Enable software skinning so the skinned geometry is updated on the CPU
      h3dSetNodeParamI( p.node, H3DModel::SWSkinningI, 1 );
      h3dSetupModelAnimStage( p.node, 0, characterWalkRes, 0, "", false );
      
      // Characters start in a circle formation
      p.px = sinf( (i / 100.0f) * 6.28f ) * 10.0f;
      p.pz = cosf( (i / 100.0f) * 6.28f ) * 10.0f;

      chooseDestination( p );

      h3dSetNodeTransform( p.node, p.px, 0.02f, p.pz, 0, 0, 0, 1, 1, 1 );

      _particles.push_back( p );
   }
}

Now it's time to compile and compare the time spent on geometry updates (Geo Updates):

Code:
Normal code : about 60ms
/arch:SSE   : about 60ms
/arch:SSE2  : about 130ms

Well, the results are quite interesting! There is no real difference between the normal and SSE builds, but why is the SSE2 build 2x slower? The profiler says that most of the time is spent in ModelNode::updateGeometry(), so I decided to compare the assembly generated for SSE and SSE2:
Here is the SSE disassembly and here is the SSE2 disassembly

So what? The first noticeable thing is that with /arch:SSE enabled the compiler failed to optimize the code and generated the normal code instead, which is why there is no difference between the SSE and non-SSE builds. The second is that with /arch:SSE2 enabled the compiler has done a horrible job. Why? If you look closely, the compiler converts all of the floats to doubles in order to perform SSE2 operations on them, and there are a lot more problems...

All of the tests were done with MSVC 2010.


Posted: 26.09.2010, 16:08
Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
What other compiler optimisation flags are set?

_________________
Tristam MacDonald - [swiftcoding]


Posted: 26.09.2010, 16:25
Joined: 21.08.2008, 11:44
Posts: 354
Default options of a fresh CMake-generated project plus /arch:SSE2:
Code:
/I"D:/Development/SDK/Horde3D/Horde3D SF.net/Horde3D/Source/Horde3DEngine/." /I"D:/Development/SDK/Horde3D/Horde3D SF.net/Horde3D/Source/Horde3DEngine/../Shared" /I"D:/Development/SDK/Horde3D/Horde3D SF.net/Horde3D/Source/Horde3DEngine/../../Bindings/C++" /I"D:/Development/SDK/Horde3D/Horde3D SF.net/Horde3D/Source/Horde3DEngine/../../.." /I"D:/Development/SDK/Horde3D/Horde3D SF.net/CM_BIN_x86" /Zi /nologo /W3 /WX- /O2 /Ob1 /Oy- /D "WIN32" /D "_WINDOWS" /D "NDEBUG" /D "CMAKE" /D "CMAKE_INTDIR=\"RelWithDebInfo\"" /D "Horde3D_EXPORTS" /D "_WINDLL" /D "_MBCS" /Gm- /EHsc /MD /GS /arch:SSE2 /fp:precise /Zc:wchar_t /Zc:forScope /GR /Fp"Horde3D.dir\RelWithDebInfo\Horde3D.pch" /Fa"RelWithDebInfo" /Fo"Horde3D.dir\RelWithDebInfo\" /Fd"D:/Development/SDK/Horde3D/Horde3D SF.net/Horde3D/Binaries/RelWithDebInfo/Horde3D.pdb" /Gd /TP /analyze- /errorReport:queue


Posted: 26.09.2010, 17:11
Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
I would try swapping /fp:precise for /fp:fast, as that should let the compiler generate significantly faster floating-point code. You generally only need /fp:precise if you need to make strict maximum accuracy guarantees.
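
To illustrate why the floating-point model matters, here is a minimal sketch (the function names are made up for this example). Float addition is not associative, so a compiler that has to preserve the exact result is not free to reorder a sum, while a relaxed model such as /fp:fast allows that kind of rearrangement:

```cpp
// Two mathematically equivalent ways of summing three floats.
// Every intermediate is stored in a float, so it is rounded to single precision.

// Computes (a + b) + c
float sumLeftFirst( float a, float b, float c )
{
   float ab = a + b;
   return ab + c;
}

// Computes a + (b + c)
float sumRightFirst( float a, float b, float c )
{
   float bc = b + c;
   return a + bc;
}
```

Assuming strict IEEE single-precision evaluation, calling both with a = 1.0e8f, b = -1.0e8f, c = 1.0f gives different answers: the small term survives in the first ordering and is rounded away in the second.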

_________________
Tristam MacDonald - [swiftcoding]


Posted: 26.09.2010, 17:24
Joined: 21.08.2008, 11:44
Posts: 354
Now it's a bit faster :
Code:
Normal code : about 60ms
SSE code    : about 53ms
SSE2 code   : about 53ms

EDIT: The SSE and SSE2 generated code is pretty similar. Most (maybe all?) of the operations are done on single values (scalar) instead of being SIMD, and the results are still very far from hand-tuned code.
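
For comparison, here is a rough sketch of what "hand-tuned" could look like with SSE intrinsics. This is not Horde3D's skinning code: transformPointSSE is a hypothetical helper, and it assumes a column-major 4x4 matrix stored as 16 contiguous floats.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper, not part of Horde3D: transforms the point (x, y, z, 1)
// by a column-major 4x4 matrix. 'm' points to 16 floats, 'v' to 3 floats,
// 'out' receives 4 floats (x, y, z, w).
inline void transformPointSSE( const float *m, const float *v, float *out )
{
   // Unaligned loads, so the caller does not need 16-byte aligned data
   __m128 col0 = _mm_loadu_ps( m + 0 );
   __m128 col1 = _mm_loadu_ps( m + 4 );
   __m128 col2 = _mm_loadu_ps( m + 8 );
   __m128 col3 = _mm_loadu_ps( m + 12 );

   // result = col0 * x + col1 * y + col2 * z + col3, four lanes at a time
   __m128 r = _mm_mul_ps( col0, _mm_set1_ps( v[0] ) );
   r = _mm_add_ps( r, _mm_mul_ps( col1, _mm_set1_ps( v[1] ) ) );
   r = _mm_add_ps( r, _mm_mul_ps( col2, _mm_set1_ps( v[2] ) ) );
   r = _mm_add_ps( r, col3 );

   _mm_storeu_ps( out, r );
}
```

Each multiply and add here works on four floats at once, which is the part the auto-generated scalar code does not do. Whether this would actually pay off in the skinning loop would need to be measured.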


Posted: 26.09.2010, 19:04
Joined: 21.08.2008, 11:44
Posts: 354
Round 2 of the experiments: 64-bit generated code
Code:
x64 code  : about 53ms
SSE code  : about 53ms
SSE2 code : about 53ms

Normal generated code is faster in 64-bit mode, but wait, why is there no difference between the x64, SSE and SSE2 builds? The answer is here: MSVC generates exactly the same code in all three cases. SSE optimizations are applied by default, and if you compare the x64 and x86 (SSE) code you will notice that it uses the movaps instruction to load 4 (aligned) floating-point values at once in x64 mode and movss to load 1 floating-point value in x86 mode. BTW, IMHO it won't make much difference.
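
The two kinds of load can be sketched with intrinsics. The function names are hypothetical, and which instructions the compiler really emits depends on the compiler and its settings: _mm_load_ps typically becomes movaps, and _mm_load_ss typically becomes movss.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Adds four floats with one packed load per operand.
// All three pointers must be 16-byte aligned.
inline void add4Packed( const float *a, const float *b, float *out )
{
   __m128 va = _mm_load_ps( a );
   __m128 vb = _mm_load_ps( b );
   _mm_store_ps( out, _mm_add_ps( va, vb ) );
}

// Adds the same four floats one value at a time. No alignment requirement.
inline void add4Scalar( const float *a, const float *b, float *out )
{
   for( int i = 0; i < 4; ++i )
   {
      __m128 va = _mm_load_ss( a + i );
      __m128 vb = _mm_load_ss( b + i );
      _mm_store_ss( out + i, _mm_add_ss( va, vb ) );
   }
}
```

Both versions produce the same values. The packed one just needs fewer load, add and store instructions for the same work.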


Posted: 27.09.2010, 07:26
Joined: 17.01.2010, 13:30
Posts: 7
Siavash, the FPU, SSE and SSE2 builds are the same under x64 because of the MSVC compiler. It has limitations when optimizing x64 code: no /arch:SSE switches and no inline ASM. These are available only when you are compiling x86 code.
Also, don't use Maximize Speed (/O2) or Full Optimization (/Ox); just use Custom. I mean, do not always trust the MSVC optimizer.
Another piece of advice is to use __forceinline instead of inline in the code (not everywhere, but algebra/math code is the best place for __forceinline).
My compiler options are: x86 code, /arch:SSE2, /fp:precise, Custom optimization, Favor Fast Code (/Ot).
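
A small sketch of the __forceinline idea for math code. The macro name and the Vec3 struct are made up for this example and are not Horde3D's own types; the macro is only there so the same code also builds on compilers other than MSVC.

```cpp
// Hypothetical macro: picks the strongest inlining hint each compiler offers.
#if defined( _MSC_VER )
   #define MATH_FORCEINLINE __forceinline
#elif defined( __GNUC__ )
   #define MATH_FORCEINLINE inline __attribute__( ( always_inline ) )
#else
   #define MATH_FORCEINLINE inline
#endif

// Hypothetical vector type used only for this example
struct Vec3
{
   float x, y, z;
};

// Small, hot math routine: a typical candidate for forced inlining
MATH_FORCEINLINE float dot( const Vec3 &a, const Vec3 &b )
{
   return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Forcing inlining everywhere can bloat the code, so it is probably best kept to small functions like this one that are called in tight loops.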

Yeah, another piece of advice: don't use the built-in memory allocator, use your own custom memory allocator instead (mine is the TBB Allocator, and I am experimenting with NedAlloc).

