Horde3D

Next-Generation Graphics Engine
It is currently 19.04.2024, 02:56

All times are UTC + 1 hour




Post new topic Reply to topic  [ 12 posts ] 
Author Message
PostPosted: 12.08.2008, 00:44 
Offline

Joined: 26.03.2008, 02:58
Posts: 160
I am going thru the code line by line, just to familiarize myself with the base code in preparation for development. In general i love it, it's very easy to read :) but i did find something that confused me a little, a double test...

Code:
   if( Modules::config().wireframeMode && !Modules::config().debugViewMode )
   {
      glDisable( GL_CULL_FACE );
      glPolygonMode( GL_FRONT_AND_BACK, GL_LINE );
   }
   
   map< int, NodeRegEntry >::const_iterator itr = Modules::sceneMan()._registry.begin();
   while( itr != Modules::sceneMan()._registry.end() )
   {
      if( itr->second.renderFunc != 0x0 )
         (*itr->second.renderFunc)( shaderContext, theClass, debugView, frust1, frust2, order, occSet );

      ++itr;
   }

   if( Modules::config().wireframeMode && !Modules::config().debugViewMode )
   {
      glEnable( GL_CULL_FACE );
      glPolygonMode( GL_FRONT_AND_BACK, GL_FILL );
   }

taking a second look, it is clear why it's there, it enables the debug mode, however... wouldn't it be easier to use something like this?

Code:
   map< int, NodeRegEntry >::const_iterator itr = Modules::sceneMan()._registry.begin();

   if( Modules::config().wireframeMode && !Modules::config().debugViewMode )
   {   //Render Scene in wireframe mode
      glDisable( GL_CULL_FACE );
      glPolygonMode( GL_FRONT_AND_BACK, GL_LINE );

      while( itr != Modules::sceneMan()._registry.end() )
      {
         if( itr->second.renderFunc != 0x0 )
            (*itr->second.renderFunc)( shaderContext, theClass, debugView, frust1, frust2, order, occSet );

         ++itr;
      }

      glEnable( GL_CULL_FACE );
      glPolygonMode( GL_FRONT_AND_BACK, GL_FILL );
   }
   else{
      //Render scene in regular mode.
      while( itr != Modules::sceneMan()._registry.end() )
      {
         if( itr->second.renderFunc != 0x0 )
            (*itr->second.renderFunc)( shaderContext, theClass, debugView, frust1, frust2, order, occSet );

         ++itr;
      }
   }

and avoid the double test? Well i guess that one less test is not very relevant (performance wise) since it's only going to be tested once. But witch version do you think is more readable?


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 01:05 
Offline

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
DDd wrote:
I am going thru the code line by line, just to familiarize myself with the base code in preparation for development. In general i love it, it's very easy to read :) but i did find something that confused me a little, a double test...
***code removed***
taking a second look, it is clear why it's there, it enables the debug mode, however... wouldn't it be easier to use something like this?
***code removed***
and avoid the double test? Well i guess that one less test is not very relevant (performance wise) since it's only going to be tested once. But witch version do you think is more readable?
The first is definitely more readable, andbut also easier to maintain - you don't have to keep 2 loops synchronised.

It also may perform better, as you reduce the code size (and thus the likelihood of an expensive cache miss), and only add a single branch (which remains constant over many frames, and is likely to be predicted correctly, thus having an effective cost of zero).

_________________
Tristam MacDonald - [swiftcoding]


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 01:47 
Offline

Joined: 26.03.2008, 02:58
Posts: 160
swiftcoder wrote:
DDd wrote:
I am going thru the code line by line, just to familiarize myself with the base code in preparation for development. In general i love it, it's very easy to read :) but i did find something that confused me a little, a double test...
***code removed***
taking a second look, it is clear why it's there, it enables the debug mode, however... wouldn't it be easier to use something like this?
***code removed***
and avoid the double test? Well i guess that one less test is not very relevant (performance wise) since it's only going to be tested once. But witch version do you think is more readable?
The first is definitely more readable, andbut also easier to maintain - you don't have to keep 2 loops synchronised.

It also may perform better, as you reduce the code size (and thus the likelihood of an expensive cache miss), and only add a single branch (which remains constant over many frames, and is likely to be predicted correctly, thus having an effective cost of zero).


Hum i did not remind that you can switch to debug mode at run time, in that case the first code would be better. good point. I was thinking on never changing the debug mode during runtime. That way i think the second code would be faster since only one of the branches would ever be used and would be equally constant in time, with the advantage that the second test would not be necessary.

Keep 2 loops synchronised ? -Since the iterator is the same, and it's scope is the function not the block would i need to do anything special to keep it synchronized?


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 02:06 
Offline

Joined: 08.11.2006, 03:10
Posts: 384
Location: Australia
With your modifications, then if you modify the loop, you've got to modify it in two places. It doesn't seem like a big deal, but unnecessary code duplication like that can lead to bugs.


Having said that, the original code also contains unnecessary code duplication - I would change those duplicated if statements to:
Code:
bool isWireframe = Modules::config().wireframeMode && !Modules::config().debugViewMode;
if( isWireframe )
...
loop
...
if( isWireframe )
...


This way if the condition even needs to change, you've only got to make the change in one places, not two (which could lead to bugs) :wink:


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 02:11 
Offline

Joined: 26.03.2008, 02:58
Posts: 160
Hum i like that idea! It makes the code more readable and concise, the only downside is the need for a local variable, so every time the function is exited it gets destroyed. I wonder witch version is faster (better optimized by the compiler). Hum...


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 02:30 
Offline

Joined: 08.11.2006, 03:10
Posts: 384
Location: Australia
DDd wrote:
Hum i like that idea! It makes the code more readable and concise, the only downside is the need for a local variable, so every time the function is exited it gets destroyed. I wonder witch version is faster (better optimized by the compiler). Hum...
The cost of creating/destroying a local variable on the stack is pretty much zero (when entering a function call, the stack pointer has to be incremented by a certain size. If you have local variables, then it is simply incremented by a larger number. Primitive types like bools don't have constructors/destructors, so there is no creation/destruction cost at all).

Also, with optimisations turned on, my code will probably compile down to the same thing as the original code anyway -- assuming that the "Modules::config()" function can be inlined, because it would be able to figure out that both if conditions are based on the same condition and then only compute the condition variable once (in effect, creating it's own "local variable" like my code).
If the compiler can't inline that function call, then my modified version will probably be slightly faster.

But the speed impact of either of these is probably so small that it doesn't matter anyway ;) I'd prefer to go with whichever one is the easiest to read/maintain than the absolute fastest one - leave it up to the compiler to generate fast code.


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 02:43 
Offline

Joined: 26.03.2008, 02:58
Posts: 160
Modules::config() is inlined. But i agree, the performance difference is probably null and your code is more readable IMHO :)

Still there is something that doesn't fell quite right... maybe we should use a bit field construct instead of separate bool variables for the EngineConfig ?

Bah... maybe i am overthinking it, i have to go over the rest of the code to see if something like that would actually be useful, this would probably be useful if we have to lookup these values many times, instead of doing all the calls and boolean operations, we would simply lookup the values.


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 13:44 
Offline

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
DDd wrote:
Hum i did not remind that you can switch to debug mode at run time, in that case the first code would be better.
It still performs almost exactly the same. All modern processors do branch prediction, and both if tests should be predicted the same, meaning that there should only be a cost if debug mode is enabled. In this case, the if test should be pretty much free, as long as it fails.

Quote:
Keep 2 loops synchronised ? -Since the iterator is the same, and it's scope is the function not the block would i need to do anything special to keep it synchronized?
If you change one copy of the loop, you also have to change the other. Code duplication is much more problematic than an extra test.

_________________
Tristam MacDonald - [swiftcoding]


Top
 Profile  
Reply with quote  
PostPosted: 12.08.2008, 19:09 
Offline

Joined: 26.03.2008, 02:58
Posts: 160
swiftcoder wrote:
It still performs almost exactly the same. All modern processors do branch prediction, and both if tests should be predicted the same, meaning that there should only be a cost if debug mode is enabled. In this case, the if test should be pretty much free, as long as it fails.


Agreed. I think the compiler will basically do what i did, create a jump point to the code and with branch prediction the if test does become free. Perhaps older compiler would need this sort of branch hint but not the ones we use. If i have some time i will decompile the example with IDA pro just to check out how the code is getting generated.

Quote:
If you change one copy of the loop, you also have to change the other. Code duplication is much more problematic than an extra test.


My initial issue was that there was duplicated code in the first place. I end up create even more lines in my code :P Since that loop is not used in any place else it does not make sense to refactor it, so i agree, the first version and in particular the revised local variable version is more concise.

I guess it's my old habit of trying to train compilers to do what i want them to do, even if i have to write a few more hundread lines, to get 5% increase.


Top
 Profile  
Reply with quote  
PostPosted: 13.08.2008, 14:15 
Offline

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
DDd wrote:
I guess it's my old habit of trying to train compilers to do what i want them to do, even if i have to write a few more hundread lines, to get 5% increase.
You wont get the 5% increase *ever* on a modern compiler and machine. You actually make the compiler's job of optimising harder, which will likely result in slightly larger code size. Larger code size in general causes more cache misses, and each cache miss costs somewhere around the same as 100 instructions.

_________________
Tristam MacDonald - [swiftcoding]


Top
 Profile  
Reply with quote  
PostPosted: 12.09.2008, 06:28 
Offline

Joined: 19.11.2007, 19:35
Posts: 218
Between 80 and 135 instructions, lower on single core, higher on dual core, and the early Core2's are the 135 for a cache miss. The increase as core's go up is due to bus contention, and that varies.

Swiftcoder is right, C-code very seldom gets faster. Touching on his point, often compiling with optimization for size instead of speed will get a 5% increase just because of the cache miss issue.


Top
 Profile  
Reply with quote  
PostPosted: 12.09.2008, 15:26 
Offline

Joined: 26.03.2008, 02:58
Posts: 160
I haven't run the numbers, but i doubt any good compiler and processor would incur in such an expensive context switch, i don't really see why it would miss, since the instruction is the same, it would simply jmp... the new intel processors have huge amounts of cache, and very smart prediction units, but i am not an expert on this topic, and lack the interest to pursue it further.

I agree that the compiler usually knows best, specially x86 compilers. I was thinking about a game project for ATMEL AVR wend i wrote that comment.


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 12 posts ] 

All times are UTC + 1 hour


Who is online

Users browsing this forum: No registered users and 23 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group