Small update:
I have a new version running which looks like this:
- One camera for all characters, instead of one for each.
- For this camera I generate a pipeline on startup that contains two stages for each agent:
Code:
...
...
<Stage id="VertexDepth42">
    <SwitchTarget target="BUFFER"/>
    <ClearTarget depthBuf="true" colBuf0="true"/>
    <DrawGeometry context="DEPTH" class="synthetic"/>
</Stage>
<Stage id="compResults42">
    <SwitchTarget target="AGENTRESULTS"/>
    <BindBuffer sampler="view" sourceRT="BUFFER" bufIndex="0"/>
    <DrawQuad context="ONE_CAM_COMPUTATION" material="materials/syntheticComputation.material.xml"/>
</Stage>
...
...
With 200 agents this results in a 2,000-line pipeline.
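For illustration, the per-agent stages could be generated with a small script like the one below. This is only a sketch in Python, not the code actually used here. The names `STAGE_TEMPLATE`, `agent_stages` and `build_stages` are made up for the example, numbering agents from 0 is an assumption, and the stage contents are copied from the snippet above.

```python
# Hypothetical generator for the per-agent stages; the XML mirrors the snippet above.
STAGE_TEMPLATE = """\
<Stage id="VertexDepth{n}">
    <SwitchTarget target="BUFFER"/>
    <ClearTarget depthBuf="true" colBuf0="true"/>
    <DrawGeometry context="DEPTH" class="synthetic"/>
</Stage>
<Stage id="compResults{n}">
    <SwitchTarget target="AGENTRESULTS"/>
    <BindBuffer sampler="view" sourceRT="BUFFER" bufIndex="0"/>
    <DrawQuad context="ONE_CAM_COMPUTATION" material="materials/syntheticComputation.material.xml"/>
</Stage>"""


def agent_stages(agent_id: int) -> str:
    """Return the two stages (render view, compute results) for one agent."""
    return STAGE_TEMPLATE.format(n=agent_id)


def build_stages(agent_count: int) -> str:
    """Concatenate the stages of all agents (assumed to be numbered from 0)."""
    return "\n".join(agent_stages(i) for i in range(agent_count))


if __name__ == "__main__":
    stages = build_stages(200)
    # Ten lines per agent, so 200 agents give the 2,000 lines mentioned above.
    print(len(stages.splitlines()), "lines")
```

The generated text would still have to be placed inside the rest of the pipeline resource (setup, render target definitions), which is elided in the snippet above and therefore not shown here.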
Stage VertexDepth42 renders the view of agent 42 to the BUFFER render target.
Stage compResults42 is supposed to analyze this image and write its results to the AGENTRESULTS target at a specific pixel position reserved for this agent.
The next stage in the pipeline is VertexDepth43, which overwrites the BUFFER target with the view of agent 43. compResults43 then writes its results to the AGENTRESULTS target next to those of the previous agent.
AGENTRESULTS is downloaded to the application at the end of the frame using h3dGetRenderTargetData.
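To make the "one reserved pixel per agent" idea concrete, here is a rough sketch of how a result could be picked out of the downloaded data. It assumes things the post does not state: that the downloaded buffer is a flat, row-major array of floats, that each pixel has four components, and that agent i owns pixel i counted row by row. The helper names `agent_pixel` and `agent_result` and the target width of 16 are hypothetical.

```python
def agent_pixel(agent_id: int, width: int) -> tuple:
    """Map an agent index to its (x, y) pixel, assuming a row-major layout."""
    return agent_id % width, agent_id // width


def agent_result(data, agent_id: int, width: int, comp_count: int = 4) -> tuple:
    """Extract one agent's components from a flat buffer (assumed layout)."""
    x, y = agent_pixel(agent_id, width)
    start = (y * width + x) * comp_count
    return tuple(data[start:start + comp_count])


if __name__ == "__main__":
    width, height = 16, 13                         # hypothetical size, 208 pixels for 200 agents
    data = [0.0] * (width * height * 4)            # stand-in for the downloaded buffer
    data[42 * 4:42 * 4 + 4] = [1.0, 2.0, 3.0, 4.0] # pretend result of agent 42
    print(agent_pixel(42, width), agent_result(data, 42, width))
```

If the real target stores rows bottom-up or uses a different component count, only the index arithmetic in `agent_result` would need to change.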
With this setup I doubled my framerate (200 agents: 20 fps on AMD, 30 fps on NVIDIA).
It could be a lot higher, but I still have to use DrawGeometry for each agent, meaning I am still uploading my synthetic scene from the CPU to the GPU for each agent.
With the available pipeline commands I see no way to use the uploaded geometry for all my characters.
Does anyone have suggestions, or know whether there is a way to reuse the geometry already uploaded to the GPU multiple times, without any culling or clipping?
Could this be made possible using a geometry shader?
_________________________________________________________________
Regarding batching/billboarding: I have not yet implemented any batching of the cylinders, nor the billboarded quads that zoombapup suggested. I'm sure this would improve my framerate further, but the main bottleneck, DrawGeometry and DrawQuad and all the OpenGL calls they result in, will remain.
Another optimization opportunity could be the 10% redundant OpenGL state changes that gDEBugger currently reports, which supposedly can "reduce render performance" and "are not cheap". I'll have to look into this.
Any input would be appreciated. Cheers.