Horde3D

Next-Generation Graphics Engine
All times are UTC + 1 hour




Posted: 07.06.2011, 20:51

Joined: 11.09.2010, 20:21
Posts: 44
Location: Germany
In the Horde3D source code there is the following comment:
Quote:
OpenGL 2.1 supports mat4x3 but it is internally realized as mat4 on most hardware so it would require 4 instead of 3 uniform slots per joint

But why not use mat3x4 instead? It is stored as 3 columns and therefore doesn't waste any space (the GLSL 1.20 spec states this at least for attributes, but I'm sure it applies to uniforms, too).

You would treat this matrix as transposed in the shader code, using its columns as the rows of the joint matrix (so multiply vec * mat), but the current system already does this. It doesn't require any change to the library code: the skin matrix rows (treated as columns) are still transmitted by a single glUniform call. It would just be a more semantically correct way to handle the matrix (as a real matrix and not just a collection of column vectors), and it could eliminate some of the utilityLib helper functions that explicitly construct a matrix from the columns. I'm not sure the GLSL compiler is really smart enough to eliminate all these function calls and constructors.
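To illustrate what I mean, here is a minimal sketch for GLSL 1.20. The uniform name skinMats, the array size and the function name skinPos are made up for this post and are not taken from the Horde3D sources:

```glsl
#version 120

// Hypothetical uniform: one mat3x4 (3 columns of 4 components) per joint.
// The array size is only illustrative.
uniform mat3x4 skinMats[75];

// Transforms a position by one joint. Each column of the mat3x4 holds one
// row of the joint transform, so the matrix is used as its transpose.
vec3 skinPos( vec4 pos, int jointIndex )
{
    // vec4 * mat3x4 gives a vec3 whose i-th component is dot( pos, column i ),
    // which is exactly row i of the joint transform applied to pos.
    return pos * skinMats[jointIndex];
}
```

The upload side would stay as it is, since each joint is still three consecutive vec4 values.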

And as said earlier, requiring 2.1 instead of 2.0 doesn't really cause backward compatibility issues, as modern 2.0 cards (those supporting FBOs) should also support 2.1, and Horde3D sees itself as a next-gen engine anyway.


Posted: 08.06.2011, 01:49

Joined: 08.11.2006, 03:10
Posts: 384
Location: Australia
I was working on a similar shader at work recently (in HLSL, but I'll use Horde's GLSL to illustrate).

Our skinning shader worked the same as Horde's, in that it contained a uniform array of individual columns. When I took over maintaining the shader, I decided to convert it to store 3x4 matrices instead (in HLSL you've got the column_major and row_major keywords to control packing, so that both mat3x4 and mat4x3 can be stored in 3 registers).

So I started out with something like:
Code:
   return mat4( skinMatRows[jointIndex * 3],
             skinMatRows[jointIndex * 3 + 1],
             skinMatRows[jointIndex * 3 + 2],
             vec4( 0, 0, 0, 1 ) );
And ended up with something like:
Code:
   return mat4( skinMats[jointIndex],
             vec4( 0, 0, 0, 1 ) );

After compiling both versions and checking the assembly code, they were practically identical: the compiler had added the "* 3", "+ 1" and "+ 2" to the second version on its own, because it has to address the individual registers that store the parts of the matrix anyway.
Assuming that GLSL compilers are as good as the HLSL one, it's probably not a concern performance-wise.

In the end, I kept the first version of the code, but I stripped out the "* 3" part, and edited our model exporter to pre-multiply all joint indices with 3 when they're exported, which saves a few cycles per vertex :wink:
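Roughly, the premultiplied version looks like the sketch below, written in Horde-style GLSL. The function name getJointMat, the parameter name jointIndex3 and the array size are placeholders I picked for this post, not actual engine code:

```glsl
#version 120

// Hypothetical uniform array: three vec4 rows per joint (size is illustrative).
uniform vec4 skinMatRows[75 * 3];

// jointIndex3 is assumed to already be (joint index * 3), baked into the
// vertex data by the exporter, so no multiply is needed here.
mat4 getJointMat( int jointIndex3 )
{
    return mat4( skinMatRows[jointIndex3],
                 skinMatRows[jointIndex3 + 1],
                 skinMatRows[jointIndex3 + 2],
                 vec4( 0.0, 0.0, 0.0, 1.0 ) );
}
```

The catch is that the exported indices are then tied to the three-rows-per-joint layout, so the exporter and the shader have to be changed together.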


Posted: 08.06.2011, 11:20

Joined: 11.09.2010, 20:21
Posts: 44
Location: Germany
OK, and what about dropping the function and the constructor that create a mat4 entirely, and just working with mat3x4 everywhere? That is the real advantage.


Posted: 08.06.2011, 20:51
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
If you are interested, you can use AMD's GPU ShaderAnalyzer to check the assembly that is created for different Radeon models with different Catalyst versions.

Non-square matrices are not supported by ES, so it's probably better not to use them unless required. Functions always get inlined in shaders, so there is no call overhead. It is possible to save a few instructions, though. Premultiplying the index by 3, either offline as DarkAngel suggests or just once in the shader, can help. When doing the matrix multiplication manually, the fourth row can be ignored. Passing the joint indices as int rather than float can also reduce instructions, although a long time ago that made things slower on some hardware.

All the index computation could be avoided by using three separate uniform arrays (row0, row1, row2), but then the access is more scattered and constant waterfalling gets worse, which will probably cancel out any benefit.
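As a rough sketch of the manual multiplication with the fourth row dropped (the names skinPosManual and jointIndex3 and the array size are hypothetical, and the index is assumed to be premultiplied by 3 already):

```glsl
#version 120

// Hypothetical uniform array: three vec4 rows per joint (size is illustrative).
uniform vec4 skinMatRows[75 * 3];

// Transforms a position with three dot products, one per stored row.
// The implicit fourth row (0, 0, 0, 1) would only reproduce pos.w,
// so it is skipped and no mat4 is constructed.
vec3 skinPosManual( vec4 pos, int jointIndex3 )
{
    return vec3( dot( skinMatRows[jointIndex3],     pos ),
                 dot( skinMatRows[jointIndex3 + 1], pos ),
                 dot( skinMatRows[jointIndex3 + 2], pos ) );
}
```

Whether this actually saves instructions depends on the driver's compiler, so it is worth checking the generated assembly.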

