marciano wrote:
Horde does it the same way. This doesn't necessarily mean that the GPU is idle, since it can still be busy with the previous frame while the engine already processes the next one. I think I have read that modern GPUs are allowed to buffer up to three frames.
Where did you get that info? I always thought that once SwapBuffers() is called, the driver basically does a glFinish(), flushing the GPU pipeline and then immediately presenting the rendered frame. Wouldn't buffering frames beyond that fence lead to problems with input latency?
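For what it's worth, the usual pattern for keeping latency bounded when the driver is allowed to run ahead is to insert a fence per frame and wait on the oldest one before starting a new frame. Here is a minimal sketch of that idea; the actual GL sync object (glFenceSync/glClientWaitSync) is stubbed out as a plain frame id so the snippet stays self-contained, and the class name is made up:

```cpp
#include <cstddef>
#include <deque>

// Sketch: cap the number of frames the CPU may run ahead of the GPU.
// In a real renderer each "fence" would be a GL sync object inserted
// after SwapBuffers(); here it is stubbed as a frame id.
class FrameLimiter {
public:
    explicit FrameLimiter(std::size_t maxFramesInFlight)
        : maxInFlight(maxFramesInFlight) {}

    // Called right after SwapBuffers(): remember a fence for this frame.
    void frameSubmitted(long frameId) { inFlight.push_back(frameId); }

    // Called before building the next frame: if too many frames are
    // already queued, wait on the oldest fence (stubbed as popping it).
    // Returns true if we had to wait, i.e. the CPU was running ahead.
    bool throttle() {
        if (inFlight.size() < maxInFlight)
            return false;
        inFlight.pop_front();  // real code: glClientWaitSync(oldest)
        return true;
    }

    std::size_t framesInFlight() const { return inFlight.size(); }

private:
    std::size_t maxInFlight;
    std::deque<long> inFlight;
};
```

With maxInFlight = 1 this behaves like the glFinish() model you describe; with 2 or 3 you get the buffering marciano mentions, at the cost of up to that many frames of input latency.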
marciano wrote:
And as you already said yourself rendering things immediately would make it impossible to do any sorting (depth or material) so I think that would not be practical for a real world engine (only for a tech demo).
Not really impossible, just less efficient. You can balance between immediate rendering and some buffering/sorting until a certain threshold is reached. It would be perfect if one could query the GPU for its rendering capacity, a quick test whether it is still busy or idling around.
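The threshold idea could look something like this: buffer draw calls and sort the batch by material, but flush as soon as the buffer reaches a limit so the GPU doesn't starve. The struct, class, and callback names are just made up for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: a render queue that trades sorting quality
// against GPU idle time via a flush threshold.
struct DrawCall {
    int materialId;
    int meshId;
};

class RenderQueue {
public:
    RenderQueue(std::size_t flushThreshold,
                std::function<void(const std::vector<DrawCall>&)> submit)
        : threshold(flushThreshold), submitBatch(submit) {}

    void push(const DrawCall& dc) {
        queue.push_back(dc);
        if (queue.size() >= threshold)
            flush();  // don't buffer further; keep the GPU fed
    }

    void flush() {
        // Sort only what was buffered so far to cut state changes.
        std::sort(queue.begin(), queue.end(),
                  [](const DrawCall& a, const DrawCall& b) {
                      return a.materialId < b.materialId;
                  });
        submitBatch(queue);
        queue.clear();
    }

private:
    std::size_t threshold;
    std::function<void(const std::vector<DrawCall>&)> submitBatch;
    std::vector<DrawCall> queue;
};
```

A threshold of 1 degenerates to immediate rendering, a huge threshold to full-frame sorting, so the whole trade-off sits in a single tunable number.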
marciano wrote:
True instancing would definitely help but is currently not widely supported in OpenGL. Another thing that you could do for really large crowds is implementing impostors so that distant characters are automatically rendered as cheap quads. There are also some nice tricks like rendering normal maps for impostors that can improve quality.
I'm not so sure impostors are still worth the effort. IMHO, it only works if you have a limited set of models/characters to compose crowds of, each with a limited set of animations. Otherwise, there are just too many impostor frames to build, consuming lots of texture memory. And I have yet to see an impostor demo that actually looks good. You can always spot the difference, and next-gen is all about detail at high resolutions, which is not what impostors are good at.
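To put a rough number on the memory argument, here is a back-of-envelope helper; all the example figures below are illustrative assumptions, not measurements:

```cpp
#include <cstddef>

// Back-of-envelope cost of pre-rendered impostor frames:
// one small texture per (model, animation, frame, view angle).
std::size_t impostorMemoryBytes(std::size_t models,
                                std::size_t animations,
                                std::size_t framesPerAnim,
                                std::size_t viewAngles,
                                std::size_t resolution,    // square, e.g. 64
                                std::size_t bytesPerTexel) // 4 = RGBA8
{
    return models * animations * framesPerAnim * viewAngles
         * resolution * resolution * bytesPerTexel;
}
```

Even a modest crowd of 4 character models with 4 animations, 8 frames each, 8 view angles, at 64x64 RGBA8 already comes to 16 MiB, and adding the normal-map variant doubles that, which is why the technique only pays off for small, fixed character sets.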
marciano wrote:
random task wrote:
I'm currently looking for ways to utilize multi-core CPUs for massive crowd rendering, but besides culling, there is not much you can parallelize. Simulation, as you mention, has much more potential in that area.
You can also parallelize the animation system. One idea I had for Horde which should be relatively easy to realize is that the render queue, which contains the nodes collected from the scene graph, is updated by several threads with OpenMP. The update of model nodes applies the animations to the mesh and joint child nodes in Horde.
Nice. That's just where I want to go, too. But I haven't touched animation yet, still trying to figure out how to render static stuff as fast as possible.
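Since each queue entry updates its animation independently, marciano's OpenMP idea boils down to a single parallel loop. A minimal sketch, with ModelNode and its members as hypothetical stand-ins for Horde's scene-graph nodes; without OpenMP the pragma is simply ignored and the loop runs serially with identical results:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Horde model node in the render queue.
struct ModelNode {
    float animTime;
    float jointAngle;  // stands in for the full joint palette
};

// Each node's animation update is independent of the others,
// so the loop parallelizes trivially across cores.
void updateRenderQueue(std::vector<ModelNode>& queue, float dt) {
    // Signed 'int' index keeps the loop form OpenMP 2.x-compatible.
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(queue.size()); ++i) {
        queue[i].animTime += dt;
        // Apply the sampled animation to the node's joints.
        queue[i].jointAngle = queue[i].animTime * 90.0f;
    }
}
```

The only caveat is that the per-node update must not touch shared state (no writes into the scene graph itself), which in Horde's case should hold since each model node owns its mesh and joint children.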