Horde3D

Next-Generation Graphics Engine

All times are UTC + 1 hour




 Post subject: NOS PACK
Posted: 24.09.2008, 05:48

Joined: 21.08.2008, 11:44
Posts: 354
I've created a handy wrapper that makes it very easy to use the SSE instruction set to optimize the Horde3D engine and games! I will add some SSE2, SSE3 and multithreading features to NOS PACK.

I decided to optimize "utMath.h" first, because it's used in most parts of the Horde3D code. Currently more than 90% of utMath has been optimized. I hope this repays a bit of my debt to the admirable Horde3D community!

Edit (Apr 17, 2013):
The attached files have been removed so that the bad coding practices used in them don't confuse or mislead visitors. For the highest efficiency, SSE optimizations should be done in higher-level functions that process (16-byte aligned) streams of data using intrinsics. They won't yield much performance gain when not processing streams of data, and the generated code might even be slower than plain FPU code.
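To illustrate the 2013 advice, here is a minimal sketch of SSE applied to a whole stream rather than to one vector at a time. The function name `saxpySSE` is hypothetical (it is not part of NOS PACK or Horde3D), and it assumes both arrays are 16-byte aligned, as `_mm_load_ps`/`_mm_store_ps` require:

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Computes y[i] = a * x[i] + y[i] over a whole stream of floats.
// Assumes x and y are 16-byte aligned (required by _mm_load_ps/_mm_store_ps).
void saxpySSE(float a, const float* x, float* y, std::size_t n)
{
    const __m128 va = _mm_set1_ps(a);   // broadcast a into all 4 lanes
    std::size_t i = 0;

    // Main loop: 4 floats per iteration, no copying into temporary arrays.
    // x + i stays 16-byte aligned because i is always a multiple of 4.
    for (; i + 4 <= n; i += 4)
    {
        __m128 vx = _mm_load_ps(x + i);
        __m128 vy = _mm_load_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_store_ps(y + i, vy);
    }

    // Scalar tail for the last n % 4 elements
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The setup cost (one broadcast) is paid once per stream instead of once per element, which is where SSE starts to pay off.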


Last edited by Siavash on 17.04.2013, 07:59, edited 24 times in total.

 Post subject: Re: NOS PACK
Posted: 24.09.2008, 06:15

Joined: 08.11.2006, 03:10
Posts: 384
Location: Australia
Very interesting! I'm going to have a much closer look at this when I get home from work.

At first glance (without testing) I fear that all of these for loops might cancel out the performance gains from using SSE:
Code:
    for (unsigned short int i=0; i<4; i++)
    {
        num1[i]=numarray1[i];
        num2[i]=numarray2[i];
    }
Maybe we could eliminate all this extra copying by having the vector/matrix class store SSE types internally? Something like this:
Code:
#include <xmmintrin.h>

class Vec3f
{
public:
#ifdef OPTIMISE_HORDE_MATH
    __m128 xyz;
#else
    float x, y, z;
#endif
    // ...
};


Anyway, I'll try this out later and let you know how it goes ;)
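A rough sketch of where that idea could lead: the operators work directly on the packed register, and individual components are only extracted when needed. The class name `Vec3fSSE` is hypothetical, and the unused fourth lane is assumed to be kept at 0:

```cpp
#include <xmmintrin.h>

// Hypothetical SSE-backed vector: the fourth lane is padding and kept at 0.
class Vec3fSSE
{
public:
    __m128 xyz;

    Vec3fSSE() : xyz(_mm_setzero_ps()) {}
    Vec3fSSE(float x, float y, float z) : xyz(_mm_setr_ps(x, y, z, 0.0f)) {}
    explicit Vec3fSSE(__m128 v) : xyz(v) {}

    // Operators work directly on the packed register: no per-call array copies
    Vec3fSSE operator+(const Vec3fSSE& v) const { return Vec3fSSE(_mm_add_ps(xyz, v.xyz)); }
    Vec3fSSE operator*(float f) const { return Vec3fSSE(_mm_mul_ps(xyz, _mm_set1_ps(f))); }

    // Components are only extracted when actually needed
    float x() const { return _mm_cvtss_f32(xyz); }
    float y() const { return _mm_cvtss_f32(_mm_shuffle_ps(xyz, xyz, _MM_SHUFFLE(1, 1, 1, 1))); }
    float z() const { return _mm_cvtss_f32(_mm_shuffle_ps(xyz, xyz, _MM_SHUFFLE(2, 2, 2, 2))); }
};
```

One catch with this design: a class holding an `__m128` needs 16-byte alignment, and before C++17 plain `new` did not guarantee that for heap-allocated objects.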


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 06:28

Joined: 21.08.2008, 11:44
Posts: 354
Good idea! I'll remove those for loops. Thanks for having a look at the code :wink:

Edit: I've removed the for loops in utMath beta2 :idea:
Edit: I've removed the for loops in NOSPACK beta2 :idea:

Sorry about my n00b programming style and for making the code too complex.


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 10:20

Joined: 08.11.2006, 03:10
Posts: 384
Location: Australia
I've just started looking at the code on my home PC, but I realized that the alignment syntax doesn't work on MSVC, so I've made a macro that should be portable:
Code:
#ifdef GCC_VERSION
# define NOS_ALIGNED( V, N )   V __attribute__((aligned( N )))
#else
# ifdef _MSC_VER
#  define NOS_ALIGNED( V, N )   __declspec(align( N )) V
# else
#  error "Unrecognised compiler"
# endif
#endif
So instead of writing this:
Code:
    float num1[4] __attribute__((aligned(16)));
    float num2[4] __attribute__((aligned(16)));
We can write this:
Code:
   NOS_ALIGNED( float num1[4], 16 );
   NOS_ALIGNED( float num2[4], 16 );
...and it will work on both GCC and MSVC ;)


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 12:11

Joined: 21.08.2008, 11:44
Posts: 354
Thank you very much, dear DarkAngel, for the compatibility tip.
I've installed MSVC 2005, and your patch solved the compatibility problems with MSVC. But I prefer to use Code::Blocks with the MinGW 3.4.5 compiler.

When I compile the project with MinGW I get the "Unrecognised compiler" error, even though GCC and MinGW are very similar compilers. I think changing your patch to this will solve the problem for MSVC and for other GCC-like compilers such as MinGW (in my case):

Code:
#if defined(GCC_VERSION) || defined(__MINGW32__)
# define NOS_ALIGNED( V, N )   V __attribute__((aligned( N )))
#elif defined(_MSC_VER)
# define NOS_ALIGNED( V, N )   __declspec(align( N )) V
#else
# error "Unrecognised compiler"
#endif


These changes are available in NOSPACK beta3 :idea:
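A possible explanation for the MinGW error: as far as I know, GCC does not predefine `GCC_VERSION` for user code, whereas `__GNUC__` is predefined by GCC and by GCC-compatible compilers such as MinGW and Clang. A sketch of the macro keyed on `__GNUC__` instead (the `isAligned` helper is hypothetical, just for checking the result):

```cpp
// Detect compilers by their own predefined macros.
// __GNUC__ is predefined by GCC and GCC-compatible compilers (MinGW, Clang).
#if defined(__GNUC__)
# define NOS_ALIGNED( V, N )   V __attribute__((aligned( N )))
#elif defined(_MSC_VER)
# define NOS_ALIGNED( V, N )   __declspec(align( N )) V
#else
# error "Unrecognised compiler"
#endif

#include <cstdint>

// Returns true when p lies on an n-byte boundary
inline bool isAligned(const void* p, std::uintptr_t n)
{
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}
```

With a C++11 compiler, the standard `alignas(16) float num1[4];` does the same job without any macro.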


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 15:40

Joined: 21.08.2008, 11:44
Posts: 354
Is anybody able to test the new utMath.h + NOSPACK? I'd like to be notified of any problems so I can solve them.

I can't run the Horde3D engine because I don't have the required hardware. I'm using a PIII 750 + Radeon 7000.


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 18:26
Tool Developer

Joined: 13.11.2007, 11:07
Posts: 1150
Location: Germany
Could you redesign it a bit? I guess using C functions instead of member functions would be better. You would have to rename the functions so that they don't clash with the system's internal functions. That would avoid constructing a NOSPACK object in every method. It's also currently a problem that the functions are not exported from the DLL, so Horde3DUtils and the applications using the Matrix class can no longer be compiled.
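A minimal sketch of that suggestion, assuming header-only `static inline` free functions (the `nos_` names are hypothetical). Because the definitions live in a header, every module that includes it (engine DLL, Horde3DUtils, applications) compiles its own copy, which sidesteps the DLL export problem, and no NOSPACK object is constructed:

```cpp
#include <xmmintrin.h>

// Hypothetical free functions replacing NOSPACK member functions.
// The "nos_" prefix avoids clashes with system/intrinsic names.
static inline __m128 nos_add(__m128 a, __m128 b) { return _mm_add_ps(a, b); }
static inline __m128 nos_mul(__m128 a, __m128 b) { return _mm_mul_ps(a, b); }

// Example caller: a * b + c, again without any NOSPACK object
static inline __m128 nos_madd(__m128 a, __m128 b, __m128 c)
{
    return nos_add(nos_mul(a, b), c);
}
```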


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 21:08
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
Hey Siavash, basically I like what you are doing and it looks promising. But I suspect that, the way it is currently implemented, it is probably slower than before. You need plenty of support code to get your SSE instructions started for simple operations like vector multiplication or addition. I had a quick look and saw a lot of data copying happening. I think in general you will only profit from SIMD instructions for a more "complex" problem like matrix multiplication.
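For reference, a 4x4 matrix multiplication is the kind of "complex" operation where SSE can pay off. The sketch below uses intrinsics rather than hand-written assembler and assumes column-major storage, `m[col * 4 + row]` (an assumption; check the layout of the actual matrix class before reusing it). Both function names are hypothetical:

```cpp
#include <xmmintrin.h>

// out = a * b for column-major 4x4 matrices (m[col * 4 + row]).
// Column j of the result is a linear combination of a's columns,
// weighted by the four entries of b's column j.
void mat4MulSSE(const float* a, const float* b, float* out)
{
    const __m128 a0 = _mm_loadu_ps(a + 0);   // column 0 of a
    const __m128 a1 = _mm_loadu_ps(a + 4);
    const __m128 a2 = _mm_loadu_ps(a + 8);
    const __m128 a3 = _mm_loadu_ps(a + 12);
    for (int j = 0; j < 4; ++j)
    {
        const float* bj = b + 4 * j;          // column j of b
        __m128 r = _mm_mul_ps(a0, _mm_set1_ps(bj[0]));
        r = _mm_add_ps(r, _mm_mul_ps(a1, _mm_set1_ps(bj[1])));
        r = _mm_add_ps(r, _mm_mul_ps(a2, _mm_set1_ps(bj[2])));
        r = _mm_add_ps(r, _mm_mul_ps(a3, _mm_set1_ps(bj[3])));
        _mm_storeu_ps(out + 4 * j, r);
    }
}

// Plain scalar version with the same layout, for checking results
void mat4MulRef(const float* a, const float* b, float* out)
{
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
        {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a[k * 4 + i] * b[j * 4 + k];
            out[j * 4 + i] = s;
        }
}
```

Unaligned loads are used so any float array works; if the matrices are known to be 16-byte aligned, `_mm_load_ps`/`_mm_store_ps` can be used instead.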
But my practical experience with SSE is quite limited; all I did was one hand-coded assembler SSE matrix multiplication function. Your code needs to be tested to see how it behaves in practice. Since you can't run the engine, I think it would make sense to write a small synthetic benchmark that does some vector math and measures the time with and without vectorization/optimization. I think this extra benchmark would make sense even if you could run the engine ;)
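A small synthetic benchmark along those lines could look like the sketch below. It times a plain scalar loop against an SSE loop doing the same element-wise multiply; all names are hypothetical and no timings are claimed here:

```cpp
#include <xmmintrin.h>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Scalar baseline: element-wise multiply of two float arrays
void mulScalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}

// SSE version: 4 floats per iteration (unaligned loads, so std::vector works)
void mulSSE(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; ++i)
        out[i] = a[i] * b[i];
}

// Runs fn `reps` times over the arrays and returns the elapsed milliseconds
template <typename Fn>
double timeMs(Fn fn, const std::vector<float>& a, const std::vector<float>& b,
              std::vector<float>& out, int reps)
{
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        fn(a.data(), b.data(), out.data(), a.size());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

void runBenchmark()
{
    const std::size_t n = 1 << 20;
    std::vector<float> a(n, 1.5f), b(n, 2.0f), out(n);
    std::printf("scalar: %.2f ms\n", timeMs(mulScalar, a, b, out, 100));
    std::printf("sse:    %.2f ms\n", timeMs(mulSSE, a, b, out, 100));
}
```

Keep in mind that at high optimization levels the compiler may auto-vectorize the scalar loop as well, so it is worth running the comparison with different flags.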


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 22:21

Joined: 18.05.2008, 17:47
Posts: 96
http://nebuladevice.svn.sourceforge.net ... c/mathlib/
http://softwarecommunity.intel.com/arti ... g/2494.htm
Dunno if they are related or helpful.


 Post subject: Re: NOS PACK
Posted: 24.09.2008, 23:26

Joined: 21.08.2008, 11:44
Posts: 354
I never said I'm going to leave the optimizations like this; these are beta versions, and I will profile the code after finding all of the problems. My big problem currently is that these two new files [utMath, NOS PACK] were written in three days of hard work, and I didn't test the code at all during those three days. I recently tested the code on a Sempron 3600+ with a GeForce 6200 and the results were very confusing: after waiting for ~3 minutes nothing appeared on the screen [sorry, team]. I don't know where the problem is: in my math calculations, or in function conflicts with the system?
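One way to find out whether the math is wrong is to check each SSE routine against a plain scalar reference on known inputs before plugging it into the engine. A sketch for a cross product built from shuffles (the names `crossSSE` and `checkCross` are hypothetical, and vectors are assumed to be packed as x, y, z, 0):

```cpp
#include <xmmintrin.h>
#include <cmath>

// SSE cross product of (x, y, z, 0) vectors:
// a x b = a.yzx * b.zxy - a.zxy * b.yzx
__m128 crossSSE(__m128 a, __m128 b)
{
    __m128 aYZX = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 bZXY = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2));
    __m128 aZXY = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2));
    __m128 bYZX = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));
    return _mm_sub_ps(_mm_mul_ps(aYZX, bZXY), _mm_mul_ps(aZXY, bYZX));
}

// Compares the SSE result against a plain scalar cross product
bool checkCross(const float a[3], const float b[3], float tol = 1e-5f)
{
    const float ref[3] = { a[1] * b[2] - a[2] * b[1],
                           a[2] * b[0] - a[0] * b[2],
                           a[0] * b[1] - a[1] * b[0] };
    float got[4];
    _mm_storeu_ps(got, crossSSE(_mm_setr_ps(a[0], a[1], a[2], 0.0f),
                                _mm_setr_ps(b[0], b[1], b[2], 0.0f)));
    for (int i = 0; i < 3; ++i)
        if (std::fabs(got[i] - ref[i]) > tol)
            return false;   // this component disagrees: the SSE math is wrong
    return true;
}
```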

I'm going to redesign NOS PACK and find a way to reduce the data copying between arrays, but any ideas, tips or learning resources are welcome :wink:

Thanks for your respect!


 Post subject: Re: NOS PACK
Posted: 25.09.2008, 12:33

Joined: 21.08.2008, 11:44
Posts: 354
I want to change the way NOS is used to something like this. IMO this avoids the array copying, makes the code safer, reduces runtime errors caused by mismanaged arrays, and makes the final code easier to read than before.

Code:
//using m128 types to remove the arrays
__m128 a,b,c;

float array1[3];

//loading float variables to the m128 variables
//and let the NOS PACK to manage other things
a=H3DNOS.load(x[0], y[1], z[3]);
b=H3DNOS.load(x[1], y[2], z[0]);

//performing mathematic operations on m128 packed floats
c=H3DNOS.mul(a, b);

//putting the results in the destination array
H3DNOS.store(c, array1);


Does anybody have any ideas?
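A sketch of how the proposed load/mul/store calls could be implemented as plain free functions (the names `nosLoad`, `nosMul` and `nosStore` are hypothetical). `_mm_setr_ps` packs the floats directly without a source array, and the store writes only three floats so a `float[3]` destination is safe:

```cpp
#include <xmmintrin.h>

// Packs three floats into one register; the unused fourth lane is set to 0.
inline __m128 nosLoad(float x, float y, float z)
{
    return _mm_setr_ps(x, y, z, 0.0f);   // lanes in memory order: x, y, z, 0
}

inline __m128 nosMul(__m128 a, __m128 b)
{
    return _mm_mul_ps(a, b);
}

// Writes only three floats. (_mm_store_ps would write 16 bytes, overflowing
// a float[3], and it also requires a 16-byte aligned destination.)
inline void nosStore(__m128 v, float* out3)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    out3[0] = tmp[0];
    out3[1] = tmp[1];
    out3[2] = tmp[2];
}
```

Usage mirrors the proposal: `nosStore(nosMul(nosLoad(x0, y1, z3), nosLoad(x1, y2, z0)), array1);`.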


 Post subject: Re: NOS PACK
Posted: 25.09.2008, 14:19

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
Perhaps using an anonymous union, like this?
Code:
#include <xmmintrin.h>

class Vec3f
{
public:
    union {
#ifdef OPTIMISE_HORDE_MATH
        __m128 xyz;
#endif
        struct {
            float x, y, z;
        };
    };
};

_________________
Tristam MacDonald - [swiftcoding]


 Post subject: Re: NOS PACK
Posted: 26.09.2008, 00:32

Joined: 08.11.2006, 03:10
Posts: 384
Location: Australia
swiftcoder wrote:
Perhaps using an anonymous union, like this?
Would that work? Inside the CPU, the m128 would be stored in a 128-bit SSE register while the floats would be stored in three 80-bit FPU registers.
I'm not sure if the compiler would be clever enough to retrieve the right values if you quickly accessed the floats after modifying the m128.
I guess there's one way to find out :wink:
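A quick experiment could look like the sketch below. At the language level a union describes a memory layout, and the compiler decides which registers hold each value, moving data between SSE registers, FPU registers and memory as needed (on x86-64, compilers normally do scalar float math in SSE registers anyway). Two caveats: anonymous structs inside unions are a compiler extension (accepted by GCC, Clang and MSVC), and reading a union member other than the last one written is not sanctioned by ISO C++, although GCC documents it as supported. The names and the padding member `w` are hypothetical:

```cpp
#include <xmmintrin.h>

// Union-based vector as proposed above, with w as explicit padding.
struct Vec3fUnion
{
    union
    {
        __m128 xyz;
        struct { float x, y, z, w; };
    };
};

// Writes through the __m128 member; the caller then reads the floats
Vec3fUnion addThenRead(float ax, float ay, float az, float bx, float by, float bz)
{
    Vec3fUnion a, b, r;
    a.xyz = _mm_setr_ps(ax, ay, az, 0.0f);
    b.xyz = _mm_setr_ps(bx, by, bz, 0.0f);
    r.xyz = _mm_add_ps(a.xyz, b.xyz);
    return r;   // r.x, r.y, r.z now hold the sums
}
```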


 Post subject: Re: NOS PACK
Posted: 26.09.2008, 04:00

Joined: 21.08.2008, 11:44
Posts: 354
I don't know how familiar you guys are with SSE programming, but we can't use the m128, m128d and m128i types like normal int or float types; you must load an array of floats, doubles, etc. into them. Their usage is something like this:
Code:
#include <iostream>
#include <xmmintrin.h>
//emmintrin.h for SSE2 and pmmintrin.h for SSE3

int main()
{
    //__m128 holds 4 floats; __m128d (2 doubles) and __m128i (integers)
    //were added with SSE2 and are also available in SSE3 and SSE4.x.
    //Arrays loaded with _mm_load_ps must be 16-byte aligned
    float num1[4] __attribute__((aligned(16)))={100, 2, 4.45f, 99.66f};
    float num2[4] __attribute__((aligned(16)))={0.11f, 88.31f, 7.88f, 19.33f};

    //Now you must load them into the __m128 types
    __m128 a,b,c;
    a=_mm_load_ps(num1);
    b=_mm_load_ps(num2);

    //Now you can perform operations on them
    c=_mm_div_ps(a,b);

    //At last you must store them into normal arrays
    //(_mm_store_ps also requires a 16-byte aligned destination)
    float out[4] __attribute__((aligned(16)));
    _mm_store_ps(out, c);

    for(int i=0; i<4; i++)
        std::cout<<out[i]<<std::endl;
    return 0;
}

BTW, the new releases will be available in the next 2 weeks or a bit later, after some redesigning and code profiling [Don't worry, I'll do my best to release them as soon as possible :wink:]
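When the data can't be guaranteed to be 16-byte aligned, SSE also has unaligned load/store intrinsics, `_mm_loadu_ps` and `_mm_storeu_ps`, which accept any address (on older CPUs they were noticeably slower than the aligned forms). A minimal sketch of the same division on unaligned data; the function name is hypothetical:

```cpp
#include <xmmintrin.h>

// Divides 4 floats element-wise; num1, num2 and out may have any alignment.
void divide4Unaligned(const float* num1, const float* num2, float* out)
{
    __m128 a = _mm_loadu_ps(num1);       // unaligned load: no alignment requirement
    __m128 b = _mm_loadu_ps(num2);
    _mm_storeu_ps(out, _mm_div_ps(a, b)); // unaligned store
}
```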


 Post subject: Re: NOS PACK
Posted: 26.09.2008, 13:40

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
DarkAngel wrote:
swiftcoder wrote:
Perhaps using an anonymous union, like this?
Would that work? Inside the CPU, the m128 would be stored in a 128 bit SSE register while the float's would be stored in 3 80-bit FPU registers. I'm not sure if the compiler would be clever enough to retrieve the right values if you quickly accessed the float's after modifying the m128.
It should be clever enough, but the union may force some odd register usage.

Once someone has a working SSE vector class, it would be interesting to benchmark it against the original vector class, with GCC 4 and 'auto-vectorise' enabled - I think that GCC should be able to fully vectorise the class, but I might be wrong.
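For that comparison, the non-SSE side could be an ordinary loop written to be friendly to GCC's auto-vectoriser: contiguous arrays, a simple trip count, and `__restrict` (a GCC/MSVC extension) promising that the arrays don't overlap. A sketch with a hypothetical stand-in struct `PlainVec3`; compiled with `g++ -O3` (which enables `-ftree-vectorize`), newer GCC versions can report vectorised loops with `-fopt-info-vec`. Whether this particular loop actually gets vectorised depends on the GCC version and flags:

```cpp
#include <cstddef>

// Hypothetical stand-in for the original, non-SSE vector class
struct PlainVec3 { float x, y, z; };

// out[i] = a[i] + b[i]; __restrict tells the compiler the arrays don't alias
void addVec3Arrays(const PlainVec3* __restrict a, const PlainVec3* __restrict b,
                   PlainVec3* __restrict out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        out[i].x = a[i].x + b[i].x;
        out[i].y = a[i].y + b[i].y;
        out[i].z = a[i].z + b[i].z;
    }
}
```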

_________________
Tristam MacDonald - [swiftcoding]

