Horde3D

Next-Generation Graphics Engine




random task
PostPosted: 25.02.2008, 19:33
Joined: 25.02.2008, 18:32
Posts: 5
Hi everyone,
I'm currently evaluating rendering engines regarding their crowd rendering capabilities. I'm doing this for a master's thesis on crowd rendering.
I've had a brief look at the code but not really enough to gather some info on your crowd rendering techniques.
So, I'd like to play the numbers game :)
How does Horde3D scale when rendering many instances? What is the number of mesh instances you can render before you become vertex-limited? Is the scene graph overhead a problem? How does deferred lighting help in reducing the number of draw calls? How is performance compared to other engines (e.g. Ogre3D)? Any other measurements which are interesting when rendering crowd scenes? Are you exploiting concurrency?

Many questions for a first post, thanks in advance for any info you may have on your hands!


DarkAngel
PostPosted: 26.02.2008, 01:12
Joined: 08.11.2006, 03:10
Posts: 384
Location: Australia
random task wrote:
How does deferred lighting help...?

Crowd scenes usually contain a high amount of overdraw (where pixels on the screen are drawn/overwritten multiple times). Deferred lighting helps here by ensuring that the lighting calculations (pixel shaders) are executed only once per pixel, even if you've got lots of overdraw.
This allows you to have more complex lighting equations in highly complex scenes (such as a crowd) than with traditional (forward-rendered) lighting, as the amount of time taken is proportional to the size of the frame-buffer instead of the complexity of the scene itself.
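To make that concrete, here is a rough C++-style sketch of such a frame (the function and container names are illustrative, not Horde's actual API):

Code:
// Geometry pass: fill the G-buffer (depth, normals, albedo) with cheap shaders.
// Overdraw here only costs G-buffer writes, not full lighting math.
bindRenderTarget(gBuffer);
for (const Mesh &mesh : visibleMeshes)
    drawGeometry(mesh);

// Lighting pass: the expensive lighting shader runs once per screen pixel and
// per light, independent of how many meshes covered that pixel.
bindRenderTarget(backBuffer);
for (const Light &light : lights)
    drawFullscreenQuad(lightingShader, light, gBuffer);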

Quote:
Are you exploiting concurrency?

Horde is single-threaded ATM. Horde is only the renderer though - so the simulation that updates the crowd could easily be multi-threaded.


swiftcoder
PostPosted: 26.02.2008, 01:28
Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
random task wrote:
How does Horde3D scale when rendering many instances?

As well as any other OpenGL-based engine will. VBOs and hardware skinning are the best you can do in this case.
random task wrote:
What is the number of mesh instances you can render before you become vertex-limited? Is the scene graph overhead a problem?

I have some doubts that the first is even possible on modern hardware. With any meaningful shaders, you will certainly be fill-rate limited long before vertex throughput becomes a problem. At any rate, the answer depends heavily on the number of vertices per mesh. The answer to the second is the same, i.e. it won't be enough to matter.

random task wrote:
How is performance compared to other engines (e.g. Ogre3D)?
Not to disparage in any way the good work of the Ogre team, but performance should be at least as good, especially considering how much lighter-weight the Horde implementation is. That said, there may be particular aspects that one engine has optimised more than the other. In particular, Ogre supports Direct3D, and thus may be faster on cards which have better D3D drivers.

random task wrote:
Are you exploiting concurrency?
Nobody is exploiting concurrency in rendering games at this point in time, outside of a few research labs. However, you can make your crowd simulation as concurrent as you find beneficial.


random task
PostPosted: 26.02.2008, 14:44
Joined: 25.02.2008, 18:32
Posts: 5
DarkAngel wrote:
Horde is single-threaded ATM. Horde is only the renderer though - so the simulation that updates the crowd could easily be multi-threaded.


What about CPU/GPU concurrency? Many engines I have seen collect the renderable objects from the scene graph, cull and sort them and then send them to the graphics API in one loop. This is not the best way to utilize the GPU, as it sits idle while the scene is being processed. It is better to start drawing as soon as you have collected enough potentially visible geometry. Of course, this way you can't sort effectively for expensive state changes, and transparent objects have to be stored somewhere else to draw them at the end, but it should give you a performance increase when drawing many instances (crowds).


Last edited by random task on 26.02.2008, 15:04, edited 1 time in total.

random task
PostPosted: 26.02.2008, 15:03
Joined: 25.02.2008, 18:32
Posts: 5
swiftcoder wrote:
random task wrote:
How does Horde3D scale when rendering many instances?

As well as any other OpenGL-based engine will. VBOs and hardware skinning are the best you can do in this case.


There are some more techniques I have looked into: pseudo instancing, hardware instancing, sorting to minimize expensive state changes, and sorting front to back to reduce overdraw. Do you have any experience with those? I'm currently not quite sure what the best ways to improve scaling are...
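For the sorting part, a common trick is to pack material/state and depth into one sort key for the render queue; a minimal sketch (the key layout and names are made up, not from Horde):

Code:
#include <algorithm>
#include <cstdint>

// Hypothetical 64-bit sort key: the material/state ID in the high bits groups
// draws that share shaders and textures, the quantized view depth in the low
// bits gives a rough front-to-back order within each group.
uint64_t makeSortKey(uint32_t materialId, float viewDepth, float farPlane)
{
    float norm = std::min(std::max(viewDepth / farPlane, 0.0f), 1.0f);
    uint32_t depthBits = (uint32_t)(norm * 0x00FFFFFF);   // 24 bits of depth
    return ((uint64_t)materialId << 24) | depthBits;
}

// The render queue is then sorted by key before issuing draw calls, which
// minimizes state changes first and reduces overdraw second.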

swiftcoder wrote:
random task wrote:
How is performance compared to other engines (e.g. Ogre3D)?
Not to disparage in any way the good work of the Ogre team, but performance should be at least as good, especially considering how much lighter-weight the Horde implementation is. That said, there may be particular aspects that one engine has optimised more than the other. In particular, Ogre supports Direct3D, and thus may be faster on cards which have better D3D drivers.


I always thought of Ogre as rather bloated, and I like the approach you guys are taking with Horde3D. However, there is an Ogre plug-in which supports instancing using shaders, and I'm interested in the performance they are getting out of it. If only I had more time, I'd try it out myself...

swiftcoder wrote:
random task wrote:
Are you exploiting concurrency?
Nobody is exploiting concurrency in rendering games at this point in time, outside of a few research labs. However, you can make your crowd simulation as concurrent as you find beneficial.


Actually, there are many attempts to use the parallel nature of modern PCs and consoles in current rendering engines, e.g. Source, Capcom's MT Framework, and most other commercial "next-gen" engines. They are struggling to get performance out of the PS3's Cell, and that will be a big factor in the success of any 3D engine in the near future.
I'm currently looking for ways to utilize multi-core CPUs for massive crowd rendering, but besides culling, there is not much you can parallelize. Simulation, as you mention, has much more potential in that area. And there's also CPU/GPU concurrency, as I mentioned above and in my post on gamedev.net.


marciano (Engine Developer)
PostPosted: 26.02.2008, 15:34
Joined: 10.09.2006, 15:52
Posts: 1217
Horde does a few things to improve performance for crowd scenes. The most important one is the optimization of vertex throughput. Vertex arrays use 16-bit indices whenever possible, the vertex buffer size is a multiple of 32 bytes (to profit from memory alignment), and I have experimented with different layouts for the vertex data to find the most efficient one. Another very important trick which brings a clear performance improvement is storing the vertex data in a way that makes optimal use of the vertex cache. Currently this is achieved using NVTriStrip in a preprocessing step.
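As an illustration of the kind of layout being described (a generic 32-byte example, not Horde's actual vertex format):

Code:
#include <vector>

// A generic 32-byte vertex: keeping the stride at a multiple of 32 bytes keeps
// consecutive vertices aligned in the vertex buffer.
struct Vertex
{
    float px, py, pz;   // position  (12 bytes)
    float nx, ny, nz;   // normal    (12 bytes)
    float u, v;         // texcoord  ( 8 bytes)  -> sizeof(Vertex) == 32
};

// 16-bit indices are sufficient as long as a mesh has at most 65536 vertices.
std::vector<unsigned short> indices;
std::vector<Vertex>         vertices;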

Another thing that helps in crowd scenes is level of detail, although Horde uses nothing fancy here, just a simple distance-based scheme. Deferred shading can greatly improve performance since it reduces overdraw (as DarkAngel said), and if you have several lights you only need to render the geometry once into the G-buffer instead of n times for n lights. Of course this reduces the number of draw calls and the amount of vertex processing.

The scene graph brings some overhead of course, but it is acceptable and brings so many other advantages that the overhead wouldn't justify removing it. After all, Horde has developed from a pure crowd renderer into a fully featured graphics engine. I would say it is designed to render "hundreds" of characters. If you want an engine that renders "thousands" of characters (e.g. for a massive RTS game), other methods (e.g. simple vertex tweening instead of skeletal animation) are required that definitely limit flexibility and quality for more general cases.


marciano (Engine Developer)
PostPosted: 26.02.2008, 15:52
Joined: 10.09.2006, 15:52
Posts: 1217
random task wrote:
What about CPU/GPU concurrency? Many engines I have seen collect the renderable objects from the scene graph, cull and sort them and then send them to the graphics API in one loop. This is not the best way to utilize the GPU, as it sits idle while the scene is being processed.


Horde does it the same way. This doesn't necessarily mean that the GPU is idle since it can still be busy with the previous frame, while the engine already processes the next frame. I think I have read that modern GPUs are allowed to cache up to three frames.

And as you already said yourself, rendering things immediately would make it impossible to do any sorting (by depth or material), so I think that would not be practical for a real-world engine (only for a tech demo).


random task wrote:
There are some more techniques I have looked into: pseudo instancing, hardware instancing, sorting to minimize expensive state changes, and sorting front to back to reduce overdraw. Do you have any experience with those? I'm currently not quite sure what the best ways to improve scaling are...


True instancing would definitely help but is currently not widely supported in OpenGL. Another thing that you could do for really large crowds is implementing impostors, so that distant characters are automatically rendered as cheap quads. There are also some nice tricks like rendering normal maps for impostors that can improve quality.
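A minimal sketch of such a distance-based switch (the thresholds and draw functions are made up):

Code:
// Pick a representation per character based on its distance to the camera.
for (Character &c : crowd)
{
    float d = length(cameraPos - c.position);
    if      (d < 20.0f)  drawSkinnedMesh(c);     // full skeletal animation
    else if (d < 60.0f)  drawLowDetailMesh(c);   // reduced-LOD mesh
    else                 drawImpostor(c);        // camera-facing textured quad
}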


random task wrote:
I'm currently looking for ways to utilize multi-core CPUs for massive crowd rendering, but besides culling, there is not much you can parallelize. Simulation, as you mention, has much more potential in that area.


You can also parallelize the animation system. One idea I had for Horde, which should be relatively easy to realize, is to update the render queue, which contains the collected nodes from the scene graph, using several threads with OpenMP. In Horde, the update of model nodes applies the animations to the mesh and joint child nodes.
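Roughly what that could look like (illustrative names, not Horde's actual code):

Code:
#include <vector>

// Update the queued nodes (animation/skinning setup) on several threads,
// then submit draw calls from the main thread that owns the GL context.
void updateRenderQueue(std::vector<SceneNode *> &renderQueue)
{
    #pragma omp parallel for
    for (int i = 0; i < (int)renderQueue.size(); ++i)
        renderQueue[i]->update();   // applies animations to mesh/joint children

    for (SceneNode *node : renderQueue)
        node->render();             // OpenGL calls stay single-threaded
}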


random task
PostPosted: 26.02.2008, 16:23
Joined: 25.02.2008, 18:32
Posts: 5
marciano wrote:
Horde does it the same way. This doesn't necessarily mean that the GPU is idle since it can still be busy with the previous frame, while the engine already processes the next frame. I think I have read that modern GPUs are allowed to cache up to three frames.

Where did you get that info? I always thought that once SwapBuffers() is called, the driver basically does a glFinish(), flushing the GPU pipeline and then immediately presenting the rendered frame. Wouldn't the caching of frames beyond that fence lead to problems with input latency?

marciano wrote:
And as you already said yourself, rendering things immediately would make it impossible to do any sorting (by depth or material), so I think that would not be practical for a real-world engine (only for a tech demo).

Not really impossible, just less efficient. You can balance between immediate rendering and some buffering/sorting until a certain threshold is reached. It would be perfect if one could query the GPU for its rendering capacity, a quick test of whether it is still busy or idling around.
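Roughly what I mean, as a sketch (the threshold value and helper functions are made up):

Code:
#include <algorithm>
#include <vector>

const size_t kFlushThreshold = 256;    // made-up batch size
std::vector<DrawItem> pending;

void queueDraw(const DrawItem &item)
{
    pending.push_back(item);
    if (pending.size() >= kFlushThreshold)
    {
        // Sort only the buffered batch (state first, then depth) and submit it,
        // so the GPU gets work before the whole scene graph has been traversed.
        std::sort(pending.begin(), pending.end(), compareStateThenDepth);
        submitBatch(pending);
        pending.clear();
    }
}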

marciano wrote:
True instancing would definitely help but is currently not widely supported in OpenGL. Another thing that you could do for really large crowds is implementing impostors, so that distant characters are automatically rendered as cheap quads. There are also some nice tricks like rendering normal maps for impostors that can improve quality.

I'm not so sure if impostors are still worth the effort. IMHO, they only work if you have a limited set of models/characters to compose crowds from, each with a limited set of animations. Otherwise there are just too many impostor frames to build, consuming lots of texture memory. And I have yet to see an impostor demo that actually looks good. You can always spot the difference, and next-gen is all about detail at high resolutions, which is not what impostors are good at.

marciano wrote:
random task wrote:
I'm currently looking for ways to utilize multi-core CPUs for massive crowd rendering, but besides culling, there is not much you can parallelize. Simulation, as you mention, has much more potential in that area.


You can also parallelize the animation system. One idea I had for Horde, which should be relatively easy to realize, is to update the render queue, which contains the collected nodes from the scene graph, using several threads with OpenMP. In Horde, the update of model nodes applies the animations to the mesh and joint child nodes.


Nice. That's just where I want to go, too. But I haven't touched animation yet, still trying to figure out how to render static stuff as fast as possible.


swiftcoder
PostPosted: 26.02.2008, 18:00
Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
random task wrote:
Actually, there are many attempts to use the parallel nature of modern PCs and consoles in current rendering engines, e.g. Source, Capcom's MT Framework, and most other commercial "next-gen" engines. They are struggling to get performance out of the PS3's Cell, and that will be a big factor in the success of any 3D engine in the near future.

Hence the "few research labs". The PS3 is a very different (and very tricky) case, as the PowerPC core tanks on branch prediction and either the primary core or the SPEs suffer from starvation. And even there, I think you will find that by and large, most of the rendering is done on the GPU and fed by a single thread.

random task wrote:
Not really impossible, just less efficient. You can balance between immediate rendering and some buffering/sorting until a certain threshold is reached. It would be perfect if one could query the GPU for its rendering capacity, a quick test of whether it is still busy or idling around.

In general, your rendering is going to overshadow your scene-graph traversal by such a degree that the reduced rendering efficiency would cost more than you saved on traversal.

random task wrote:
I'm not so sure if impostors are still worth the effort. IMHO, they only work if you have a limited set of models/characters to compose crowds from, each with a limited set of animations. Otherwise there are just too many impostor frames to build, consuming lots of texture memory. And I have yet to see an impostor demo that actually looks good. You can always spot the difference, and next-gen is all about detail at high resolutions, which is not what impostors are good at.

If every character model is different, instancing will not help either, so we have to assume that many models in the scene are duplicates. At which point, impostors for distant objects can be worthwhile, especially if you can keep the viewer's attention mainly on the foreground.


random task
PostPosted: 26.02.2008, 18:31
Joined: 25.02.2008, 18:32
Posts: 5
swiftcoder wrote:
In general, your rendering is going to overshadow your scene-graph traversal by such a degree that the reduced rendering efficiency would cost more than you saved on traversal.

For the current size of scenes I agree, but I'm thinking of tens of thousands of objects. With that many instances it is important to consider scaling, and to use all the concurrency you can get.

swiftcoder wrote:
If every character model is different, instancing will not help either, so we have to assume that many models in the scene are duplicates. At which point, impostors for distant objects can be worthwhile, especially if you can keep the viewer's attention mainly on the foreground.

Let's say it is specific to the application whether impostors make sense. I just see so much work done on crowd rendering being focused on LOD and impostors, and it's just not the silver bullet many obviously believe it to be. There's a lot of the rendering power of modern GPUs going to waste in current applications, because they are starving while the CPU is busy with LOD management or similar tasks. And most of the time, it doesn't even get to use more than one of its cores.
The tough one is of course to write a general purpose engine, which can be configured and optimized for very diverse applications. That's why the game industry creates reusable engines only for specific genres like shooters or sports games. Developers who create more varied games end up rewriting all their rendering code from scratch. At least that's my impression.
Well, enough ranting for now, I will definitely look into some of the points you brought up, thanks again.


marciano (Engine Developer)
PostPosted: 26.02.2008, 18:50
Joined: 10.09.2006, 15:52
Posts: 1217
random task wrote:
marciano wrote:
Horde does it the same way. This doesn't necessarily mean that the GPU is idle since it can still be busy with the previous frame, while the engine already processes the next frame. I think I have read that modern GPUs are allowed to cache up to three frames.

Where did you get that info? I always thought that once SwapBuffers() is called, the driver basically does a glFinish(), flushing the GPU pipeline and then immediately presenting the rendered frame. Wouldn't the caching of frames beyond that fence lead to problems with input latency?


I'm very sure that the driver doesn't wait when SwapBuffers is called but instead returns control to the application immediately. Otherwise you would really lose much of the parallelism between GPU and CPU. Usually you also shouldn't call glFinish, since this definitely stalls program execution until the GPU work has finished. Off the top of my head I can't give you any links to papers that prove this, but as a small confirmation:

http://www.gamedev.net/community/forums/topic.asp?topic_id=316374&whichpage=1&#2027791


swiftcoder
PostPosted: 26.02.2008, 20:09
Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
random task wrote:
Let's say it is specific to the application whether impostors make sense. I just see so much work done on crowd rendering being focused on LOD and impostors, and it's just not the silver bullet many obviously believe it to be. There's a lot of the rendering power of modern GPUs going to waste in current applications, because they are starving while the CPU is busy with LOD management or similar tasks. And most of the time, it doesn't even get to use more than one of its cores.

I haven't done much in crowd simulation, but I have done some procedural planet rendering, where we have to generate terrain patches dynamically on the CPU as the camera moves and feed them to the GPU as fast as possible. Even when maintaining 10,000+ patches, two generation threads on a dual-core system easily kept up with the GPU for all but the simplest fixed-function rendering. In this case we get close to optimal GPU utilisation, because everything is rendered from VBOs/textures, and tiles are only added/removed at discrete intervals.
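The structure was essentially a producer/consumer setup; a simplified sketch (using modern C++ threading primitives for brevity, helper names made up):

Code:
#include <mutex>
#include <queue>

std::mutex        readyMutex;
std::queue<Patch> readyPatches;            // patches finished on the CPU

void generationWorker()                    // two of these on a dual-core machine
{
    while (running)
    {
        Patch p = generatePatch(nextPatchRequest());   // CPU-side mesh build
        std::lock_guard<std::mutex> lock(readyMutex);
        readyPatches.push(std::move(p));
    }
}

void renderThreadTick()                    // called once per frame
{
    std::lock_guard<std::mutex> lock(readyMutex);
    while (!readyPatches.empty())
    {
        uploadToVBO(readyPatches.front()); // GL upload stays on the render thread
        readyPatches.pop();
    }
}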

random task wrote:
The tough one is of course to write a general purpose engine, which can be configured and optimized for very diverse applications. That's why the game industry creates reusable engines only for specific genres like shooters or sports games. Developers who create more varied games end up rewriting all their rendering code from scratch. At least that's my impression.
Well, enough ranting for now, I will definitely look into some of the points you brought up, thanks again.

This is very true, and the motivation behind Ogre's pluggable scene managers - which work very well. Horde will no doubt need to develop such a system in the future, when someone starts with indoor rendering, or space-sims, etc.


PostPosted: 27.02.2008, 05:56
Joined: 19.11.2007, 19:35
Posts: 218
Quote:
What about CPU/GPU concurrency? Many engines I have seen collect the renderable objects from the scene graph, cull and sort them and then send them to the graphics API in one loop. This is not the best way to utilize the GPU, as it sits idle while the scene is being processed. It is better to start drawing as soon as you have collected enough potentially visible geometry. Of course, this way you can't sort effectively for expensive state changes, and transparent objects have to be stored somewhere else to draw them at the end, but it should give you a performance increase when drawing many instances (crowds).


Don't forget that the real issue is not speed, but consistency. If you've got to use all that processor power to get a decent rendering speed then your logic, physics, and AI are probably suffering for it.

If you wanted to, there's no reason why you couldn't modify the pipeline to let you skip in and out between stages to do some CPU work; you'd have to make sure you don't mess up continuity, though. I don't think you'd really gain anything noteworthy from it.

The best way to get good and reliable speed (as we've seen from Capcom, Crysis and Unreal's Gemini) is to exploit compositing. As far as impostors go, the distant terrain in Crysis is an impostor. Some effects such as depth of field can even help you reduce pixel processing if you use smaller buffers, and you even get a feature out of it! Your near and far planes don't always have to be constant either. There are enough ways to make the GPU do its job quickly that priority one should be using the processor for the stuff it's really good at.


marciano (Engine Developer)
PostPosted: 29.02.2008, 10:05
Joined: 10.09.2006, 15:52
Posts: 1217
swiftcoder wrote:
This is very true, and the motivation behind Ogre's pluggable scene managers - which work very well. Horde will no doubt need to develop such a system in the future, when someone starts with indoor rendering, or space-sims, etc.


The new extension mechanism is a clear step in that direction. But I doubt that Horde will ever be able to beat the flexibility of Ogre. That's because of the different philosophies. Ogre is a huge class library where you can derive your own implementations from the base classes; Horde, with its flat interface, is more of a component. So Ogre makes it easier to add new functionality to your rendering code, while Horde has a much simpler interface that's portable to virtually any programming language. On the other hand, Horde is so small that it is easy to customize if you have very specific needs (like a completely different shadow implementation). But pluggable scene managers should also work with the extension mechanism, since Horde has a central repository for modules that you can override.

