Horde3D

Next-Generation Graphics Engine

All times are UTC + 1 hour




 Post subject: Rendering more models
PostPosted: 11.03.2011, 18:12 

Joined: 30.09.2010, 03:06
Posts: 21
Hi there. I should say up front that I only come and go from graphics (I know very basic concepts and have no formal education in graphics). I would like to know how I can render more than, say, 200 models in the Chicago sample (with 200 models my graphics card gives exactly 24 FPS, which is what the eye needs).

Sorry for not saying what I have done so far. I haven't done much: in the sample I only reduced some of the OO calls in the loop, and so on.

If you can point me in the right direction, I will be glad to try and do what I can with your help.


PostPosted: 12.03.2011, 00:55

Joined: 15.02.2009, 02:13
Posts: 161
Location: Sydney Australia
Hi tyoc213, sorry, but many factors could affect performance here, and it would help to know what system you are running.
1. What are the specs of your computer? Would they be able to handle so many skinned models?
2. Are you using hardware shader skinning or CPU-side skinning? E.g. SWSkinEnabled = h3dGetNodeParamI( _nodeOfModel, H3DModel::SWSkinningI );
3. How many lights are in the scene? Are you using forward or deferred rendering? (With many lights, deferred would bring the draw calls, fill rate and vertex transforms way down compared to forward.)
4. Have you enabled LODs? E.g. h3dGetNodeParamF( _nodeOfModel, H3DModel::LodDist1F, 0 ); should be greater than 0.0f (there's probably a better way of working this one out, actually).
5. You'll need to profile the scene to really find out what is slowing you down.

Horde doesn't currently support hardware instancing, but if it did, I think skinned character instancing might need to be implemented as a new node type, due to other factors like needing to store all animations/bones in textures and do a vertex texture fetch lookup to get the bone matrices. This would allow an order of magnitude more characters on modern GPUs.

Failing that, another solution I was thinking of is to create a large number of root joints for the 'agents' and attach a quad to each one, so that the quads work like impostors with only one joint each. All of these would be one single H3DModel, so when you zoom out, ideally all agents on the screen would be one draw call. When the camera gets close, you hide a quad and unhide the actual agent, which is attached to the same root joint using the child/parent scene graph API. The quads would use a spritesheet (atlas) texture containing the possible animations and directions, along with their normals so that lighting works on them, and hopefully they would be small enough that you don't notice from a distance... It would take a bit of application-level code to manage, but it could be an option if you're after many agents on-screen...
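
To make the impostor idea a bit more concrete, here is a minimal C++ sketch of the application-level part only: deciding per agent whether the full skinned model or the impostor quad should be visible, based on distance to the camera. The `Agent` struct, `selectRepresentation` and the `switchDist` threshold are hypothetical names for illustration; nothing here is Horde3D API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical application-side record for one agent (not a Horde3D type).
struct Agent {
    float x, y, z;        // world position of the agent's root joint
    bool  showFullModel;  // true: skinned model visible, impostor quad hidden
};

// Flags every agent within `switchDist` of the camera to use the full model
// and every other agent to use its impostor quad.
// Returns the number of agents that end up as full models.
inline std::size_t selectRepresentation(std::vector<Agent>& agents,
                                        float camX, float camY, float camZ,
                                        float switchDist)
{
    assert(switchDist >= 0.0f);
    const float limitSq = switchDist * switchDist;  // compare squared distances, no sqrt
    std::size_t fullCount = 0;
    for (Agent& a : agents) {
        const float dx = a.x - camX;
        const float dy = a.y - camY;
        const float dz = a.z - camZ;
        a.showFullModel = (dx * dx + dy * dy + dz * dz) <= limitSq;
        if (a.showFullModel) ++fullCount;
    }
    return fullCount;
}
```

The flag would then be applied with whatever hide/unhide call the scene graph provides, ideally only for agents whose flag changed since the previous frame.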

_________________
-Alex

PostPosted: 12.03.2011, 03:23

Joined: 30.09.2010, 03:06
Posts: 21
You can consider me a noob at graphics programming. My first impression was something like: OK, I'll bump up the number of models and make a few small modifications... so let's see what I can learn on my own; I want to see where I end up.

I'm using the default Knight demo. I only made a small modification to switch from an array of structs to a struct of arrays (I'm also learning a little CUDA), plus a #define for the MAX number of models.

  • The specs for my system are NVIDIA GeForce GTS 450 1 GB, Phenom II X6 1055T with 8 GB RAM.
  • H3DModel::SWSkinningI is 0.
  • Forward rendering (it is displayed after hitting F6 twice). But there is only one light.
  • LOD is 10.0.
  • What tools do you use for profiling? I'm currently learning to use CodeAnalyst from AMD, but I'm not a 'pro' at profiling. What I see is that Horde3D accounts for about 12.76% of system usage while Sample_Chicago.exe accounts for 2.7%.

OK, I see that I will need to do and learn a lot. I'm willing to learn and, if necessary, buy books for self-study, then read and implement whatever I need :) (if people here have suggestions, I will look into them).

So to write the other node type, I guess I will need to look at and learn from, for example, the terrain plugin source?


PostPosted: 17.03.2011, 22:46
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
24 FPS is extremely low for your machine, I have 80 FPS for 200 models in Chicago on a much weaker machine. Are you sure you started the application in Release configuration?

I have a relatively new but low-end GPU and even there the application is CPU bound. This is good, on the GPU side there are not so many low hanging fruits in Horde any more but the CPU performance can easily be optimized by an order of magnitude (or two ;) ).


PostPosted: 17.03.2011, 23:49

Joined: 15.02.2009, 02:13
Posts: 161
Location: Sydney Australia
marciano wrote:
24 FPS is extremely low for your machine, I have 80 FPS for 200 models in Chicago on a much weaker machine. Are you sure you started the application in Release configuration?

I have a relatively new but low-end GPU and even there the application is CPU bound. This is good, on the GPU side there are not so many low hanging fruits in Horde any more but the CPU performance can easily be optimized by an order of magnitude (or two ;) ).

Hi marciano,

Just curious more than anything, what parts would you say could benefit from this 'order of magnitude' speed-ups the most? I'm guessing the joints/animation system and math lib?

_________________
-Alex

PostPosted: 18.03.2011, 12:52

Joined: 11.09.2010, 20:21
Posts: 44
Location: Germany
I think all those render queue updates are very under-optimized. With a more hierarchical queue system, you would not have to process all nodes when creating a shadow map or when rendering the lit geometry, because, for example, the context and class filtering and the camera frustum filtering would not have to be redone every time drawLightGeometry is called. That is a big problem in the forward pipeline, and also in a transparency pass, where most of the geometry is ignored because it has the wrong material class but is nevertheless processed, since the class filtering comes quite late, I think. Also, since the rendering methods for the different node types each traverse all nodes, this becomes problematic as more and more node types come into play.


PostPosted: 19.03.2011, 00:36
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
Yeah, the frequent render queue update is one of the most obvious things and luckily pretty easy to optimize.

A hierarchical culling structure is not necessarily faster and can even be a lot slower than an optimized flat list culling because of the random memory access pattern of trees/graphs. The current culling can quickly be optimized by making some of the cull data more cache friendly. Going further, vectorization with SIMD would save instructions, and a final step could be to execute the culling in parallel on several CPUs, for example by spawning some jobs from a simple job system.
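
As a rough illustration of what "flat list culling with cache-friendly data" could look like, here is a minimal C++ sketch that tests bounding spheres against frustum planes. It assumes bounding spheres and inward-pointing planes; the `CullData` layout, `Plane` and `cullSpheres` are made-up names and do not reflect how Horde3D stores its cull data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plane n.p + d = 0 with the normal pointing into the frustum.
struct Plane { float nx, ny, nz, d; };

// Bounding spheres stored as separate contiguous arrays, so the culling loop
// walks memory linearly instead of chasing node pointers (hypothetical layout).
struct CullData {
    std::vector<float> cx, cy, cz, radius;
    std::size_t size() const { return cx.size(); }
};

// Returns the indices of all spheres that are not completely behind any plane.
inline std::vector<std::uint32_t> cullSpheres(const CullData& data,
                                              const std::vector<Plane>& planes)
{
    assert(data.cy.size() == data.size() && data.cz.size() == data.size() &&
           data.radius.size() == data.size());
    std::vector<std::uint32_t> visible;
    visible.reserve(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        bool inside = true;
        for (const Plane& p : planes) {
            // Signed distance from the sphere centre to the plane.
            const float dist = p.nx * data.cx[i] + p.ny * data.cy[i] +
                               p.nz * data.cz[i] + p.d;
            if (dist < -data.radius[i]) { inside = false; break; }
        }
        if (inside) visible.push_back(static_cast<std::uint32_t>(i));
    }
    return visible;
}
```

Because each sphere is independent, the outer loop is also the natural place to add SIMD or to split the index range across jobs.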

At least the first and most important point should be addressed pretty soon.


PostPosted: 19.03.2011, 15:08

Joined: 11.09.2010, 20:21
Posts: 44
Location: Germany
I did not necessarily mean a hierarchical culling system in the sense of a hierarchical scene graph structure, but a hierarchical queue system. That just means you cull by material class and context into one queue, then from this queue into a queue culled by camera, and so on.
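
A minimal C++ sketch of such a staged queue, under simplifying assumptions: every item carries a single material class and context string, and visibility is a caller-supplied predicate. `Item`, `Queue`, `filterByClassAndContext` and `filterByVisibility` are hypothetical names, not engine code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a renderable node (not a Horde3D type).
struct Item {
    std::string materialClass;  // e.g. "Translucent"
    std::string context;        // e.g. "LIGHTING"
    float       x;              // 1D position, just enough for a visibility test
};

// Queues hold pointers into the item list, so the list must not be
// resized while a queue is in use.
using Queue = std::vector<const Item*>;

// Stage 1: filter by material class and context. This can be built once
// and reused for every light or camera that needs the same class/context.
inline Queue filterByClassAndContext(const std::vector<Item>& all,
                                     const std::string& materialClass,
                                     const std::string& context)
{
    Queue out;
    for (const Item& it : all)
        if (it.materialClass == materialClass && it.context == context)
            out.push_back(&it);
    return out;
}

// Stage 2: filter the stage 1 queue by a per-camera (or per-light) test.
// Only items that survived stage 1 are ever tested here.
inline Queue filterByVisibility(const Queue& in,
                                const std::function<bool(const Item&)>& isVisible)
{
    Queue out;
    for (const Item* it : in) {
        assert(it != nullptr);
        if (isVisible(*it)) out.push_back(it);
    }
    return out;
}
```

The point of the staging is that the expensive per-camera test never sees items that the cheap class/context test already rejected.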


PostPosted: 19.03.2011, 15:16

Joined: 11.09.2010, 20:21
Posts: 44
Location: Germany
I think SIMD only really works in heavily compute-intensive parts; otherwise, it is often memory limited. The software skinning is an obvious possibility, but I think it is already SSEd, isn't it? Perhaps the calcCropMatrix could also profit, as it does not use too much memory and all those mins, maxs and clamps (currently jumps) can be SSEd quite well.
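
For reference, this is roughly what a branch-free SSE clamp looks like, as a minimal C++ sketch. It is not the actual calcCropMatrix code; `clampArray` and the `SKETCH_HAS_SSE` macro are made-up names, and a scalar fallback is used when SSE is not available at compile time.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define SKETCH_HAS_SSE 1
#else
  #define SKETCH_HAS_SSE 0
#endif

// Clamps n floats to the range [lo, hi] in place.
inline void clampArray(float* v, std::size_t n, float lo, float hi)
{
    assert(lo <= hi);
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    const __m128 vlo = _mm_set1_ps(lo);
    const __m128 vhi = _mm_set1_ps(hi);
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(v + i);           // unaligned load, no alignment assumed
        x = _mm_min_ps(_mm_max_ps(x, vlo), vhi);  // clamp four lanes without any jumps
        _mm_storeu_ps(v + i, x);
    }
#endif
    // Scalar tail (or the whole array when SSE is not available).
    for (; i < n; ++i) {
        if (v[i] < lo) v[i] = lo;
        if (v[i] > hi) v[i] = hi;
    }
}
```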


PostPosted: 19.03.2011, 16:35

Joined: 21.08.2008, 11:44
Posts: 354
Rauy wrote:
I think SIMD only really works in heavily compute-intensive parts; otherwise, it is often memory limited. The software skinning is an obvious possibility, but I think it is already SSEd, isn't it?
It isn't vectorized yet, but I wrote an optimized version using SSE and SSE3 some time ago, and it was about 30% faster than the normal code on GCC [note that GCC automatically made the data 16-byte aligned there to make it faster, which is impossible in MSVC]. For me just 30% isn't satisfying; to gain much more speed, the engine needs to use aligned SOA data structures, and that requires a lot of work and motivation.

Another simple solution is to make the code parallel, which IMHO requires less effort. I ran a few experiments to make the software skinning parallel with Intel TBB, which was easier and took less time to code than the vectorized version. Again the results weren't satisfying: just a 30% speed-up on a quad core, and the cores weren't utilized very well. After playing with Intel Parallel Studio, I found a line somewhere in the code [can't remember where ATM] that was inserting a delay, so I commented it out and compiled again. This time CPU usage was at 100% and the frame rate was pretty smooth; the only drawback was that the overlays weren't there anymore :P

Rauy wrote:
Perhaps the calcCropMatrix could also profit, as it does not use too much memory and all those mins, maxs and clamps (currently jumps) can be SSEd quite well.
I doubt it would help with the current unaligned AOS data structures, because IMHO it isn't a heavy computing task and most of the CPU cycles would be wasted on loading and storing data between memory and CPU registers.

Multi-threaded Horde3D FTW!!!


PostPosted: 20.03.2011, 01:53

Joined: 15.02.2009, 02:13
Posts: 161
Location: Sydney Australia
Hi Siavash,

Have you investigated the vector math lib from Sony in the Bullet Extras directory? I was wondering if someone has the time, could they try to use that as a replacement to utMath.h and see if it has any performance gains from that. I assume you could easily test AOS and SOA configurations with just a #define switch, but I think the SSE default implementation defaulted to AOS...

What I'm really interested in is NEON acceleration for ARM platforms, the Oolong Engine has a bit of code in there but only for some situations and not all.

_________________
-Alex

PostPosted: 20.03.2011, 11:34

Joined: 15.06.2008, 11:21
Posts: 166
Location: Germany
Last time I looked, SSEd SOA wasn't implemented in the Sony vector math library at all. And if you really want to get a speed boost from SSE, SOA probably is the way to go.


PostPosted: 20.03.2011, 14:04

Joined: 21.08.2008, 11:44
Posts: 354
MistaED wrote:
Have you investigated the vector math lib from Sony in the Bullet Extras directory? I was wondering if someone has the time, could they try to use that as a replacement to utMath.h and see if it has any performance gains from that. I assume you could easily test AOS and SOA configurations with just a #define switch, but I think the SSE default implementation defaulted to AOS...
Hi MistaED, phoenix64 is right. The SOA headers are missing from the PC version and only AOS is available. Unfortunately, replacing "utMath.h" isn't an easy plug-and-play task; it requires a lot of changes in the engine. First you need to replace the standard <vector> container to solve the alignment issues with MSVC, then replace malloc/free with aligned versions everywhere, and in the end someone needs to review the whole engine to make the required changes.
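
One way to approach the container part without replacing std::vector wholesale is a custom allocator. The following is only a sketch under that assumption: `AlignedAllocator` and `isAligned` are hypothetical names, and the allocator over-allocates with plain malloc and rounds the pointer up, which works on any compiler but wastes a few bytes per allocation. It fixes the alignment of the vector's storage only; it does not address other places where MSVC of that era struggled with over-aligned types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator that hands out storage aligned to `Alignment` bytes.
template <typename T, std::size_t Alignment = 16>
struct AlignedAllocator {
    using value_type = T;
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
    static_assert(Alignment >= alignof(void*) && Alignment >= alignof(T),
                  "Alignment too small for T or for the stored raw pointer");

    // Needed explicitly because of the non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        // Over-allocate, round up to the next aligned address and remember
        // the raw malloc pointer in the slot just before the aligned block.
        void* raw = std::malloc(n * sizeof(T) + Alignment + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        const std::uintptr_t aligned =
            (start + Alignment - 1) & ~static_cast<std::uintptr_t>(Alignment - 1);
        assert(aligned % Alignment == 0);
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<T*>(aligned);
    }

    void deallocate(T* p, std::size_t) noexcept {
        if (p) std::free(reinterpret_cast<void**>(p)[-1]);
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Helper to check an address against an alignment.
inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

It would be used as `std::vector<float, AlignedAllocator<float, 16>>`, after which `data()` can be passed to aligned SSE loads.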

As mentioned, you can't gain much speed from an AOS layout: you need to pad the vectors to keep the alignment, which increases memory usage and cache misses. Every time the CPU has to load something from main memory it takes roughly 400 cycles, which cancels out all of the speed gained from SSE instructions.
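
To illustrate the padding point, here is a small C++ sketch comparing a padded AOS vector with an SOA layout. `Vec3Padded`, `Vec3SoA` and `dotAll` are made-up names for illustration and are not taken from utMath.h or the Sony library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AOS: one 16-byte-aligned record per vector. The fourth float is pure
// padding, so a quarter of every cache line fetched carries no data.
struct alignas(16) Vec3Padded { float x, y, z, pad; };

// SOA: one contiguous array per component. No padding, and consecutive
// x (or y, or z) values sit next to each other, ready for 4-wide loads.
struct Vec3SoA {
    std::vector<float> x, y, z;
};

// Dot product of every stored vector with the constant vector (ax, ay, az).
// The loop body has no cross-iteration dependency, so it maps directly onto
// SIMD lanes (by hand or via the compiler's auto-vectorizer).
inline void dotAll(const Vec3SoA& v, float ax, float ay, float az,
                   std::vector<float>& out)
{
    assert(v.y.size() == v.x.size() && v.z.size() == v.x.size());
    out.resize(v.x.size());
    for (std::size_t i = 0; i < v.x.size(); ++i)
        out[i] = v.x[i] * ax + v.y[i] * ay + v.z[i] * az;
}
```

Here the padded AOS record takes 16 bytes per vector, while the SOA layout needs 12.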

MistaED wrote:
What I'm really interested in is NEON acceleration for ARM platforms, the Oolong Engine has a bit of code in there but only for some situations and not all.
That's pretty impressive, I never thought they were such powerful beasts. Having a NEON version of the math library would make Horde3D fly on mobile devices, because every cycle counts there. I think we need a configurable math library like Sony's [but home-made] to support different versions of SSE, and AVX in the future.


PostPosted: 20.03.2011, 17:31

Joined: 30.09.2010, 03:06
Posts: 21
Sorry, I've been busy this week. Changing the build to Release with VS Express 2008 gives me errors like

Code:
1>------ Build started: Project: Extension Terrain, Configuration: Release Win32 ------
1>Compiling...
1>cl : Command line warning D9035 : option 'Wp64' has been deprecated and will be removed in a future release
1>extension.cpp
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(154) : error C2143: syntax error : missing ',' before '<'
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(154) : error C2059: syntax error : ')'
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(154) : error C2143: syntax error : missing ')' before ';'
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(156) : warning C4003: not enough actual parameters for macro 'strncpy_s'
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(156) : error C2065: '_Dest' : undeclared identifier
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(156) : error C2988: unrecognizable template declaration/definition
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(156) : error C2059: syntax error : '['
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(156) : error C2059: syntax error : ')'
1>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\string.h(156) : error C2470: '_Source' : looks like a function definition, but there is no parameter list; skipping apparent body


I have "traced" it by putting declarations like
Code:
test1;
#include "egPrerequisites.h"
test2;
#include <string>
test3;


working backwards through the include path, and I see that those errors are triggered by #include <string>.


PostPosted: 20.03.2011, 21:09
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
This was already fixed some time ago in the SVN head. It depended on the MSVC version you use.

