Horde3D

Next-Generation Graphics Engine

All times are UTC + 1 hour




 Post subject: Re: NOS PACK
Posted: 22.10.2008, 12:28

Joined: 14.04.2008, 15:06
Posts: 183
Location: Germany
Siavash wrote:
I've upgraded my compiler from MinGW/GCC 3.4.5 to MinGW/GCC 4.3 alpha. Which flag enables the compiler's auto-vectorization feature? Currently I'm using the -fexpensive-optimizations flag.


man gcc / man g++:
-ftree-vectorize: enable vectorization
-ftree-vect-loop-version: for loops that could be vectorized but where it's unknown at compile time whether the data is aligned, generates both a vectorized and a non-vectorized version of the loop; this results in a runtime check and appropriate branching
-ftree-vectorizer-verbose=n: n=1 reports each vectorized loop, ...
Depending on your platform, this must be combined with -msse, -msse2 or -march=...
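For illustration, a minimal sketch of a loop that GCC's vectorizer can typically handle: a simple counted loop, unit-stride accesses, no calls. The function name saxpy and the build line are illustrative assumptions, not part of utMath or Horde3D. Recent GCC releases also turn on -ftree-vectorize at -O3, and newer versions report vectorized loops via -fopt-info-vec instead of -ftree-vectorizer-verbose.

```cpp
#include <cassert>
#include <cstddef>

// saxpy: y[i] = a * x[i] + y[i]. A counted loop with unit-stride accesses
// and no function calls or early exits is a shape GCC's tree vectorizer
// can typically vectorize.
// Illustrative build line (exact flags depend on GCC version and target):
//   g++ -O2 -ftree-vectorize -msse2 -ftree-vectorizer-verbose=1 -c saxpy.cpp
// __restrict__ (a GCC extension) promises that x and y do not overlap, so the
// compiler does not need a runtime overlap check before the vector loop.
void saxpy(float* __restrict__ y, const float* __restrict__ x, float a,
           std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```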


Siavash wrote:
Why is PIII-optimized code [-march=pentium3] slower than code built with -msse?

-march=pentium3 should already include -msse. Which options are you comparing? And how exactly did you benchmark them?
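One quick way to check which SIMD extensions a given -march/-m combination actually enables is to look at the compiler's predefined macros. A small sketch follows; the function name enabledSimd is hypothetical. GCC defines __SSE__, __SSE2__ and __SSE3__ when the corresponding instruction sets are enabled.

```cpp
#include <cassert>
#include <string>

// Reports which SIMD extensions the compiler was allowed to use for this
// translation unit. GCC defines __SSE__ / __SSE2__ / __SSE3__ when -msse,
// -msse2, -msse3 (or a -march that implies them) is in effect.
// All predefined macros can also be dumped with:
//   g++ -march=pentium3 -dM -E -x c++ /dev/null | grep SSE
std::string enabledSimd()
{
    std::string s;
#ifdef __SSE__
    s += "SSE ";
#endif
#ifdef __SSE2__
    s += "SSE2 ";
#endif
#ifdef __SSE3__
    s += "SSE3 ";
#endif
    return s.empty() ? std::string("none") : s;
}
```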


 Post subject: Re: NOS PACK
Posted: 22.10.2008, 12:44

Joined: 21.08.2008, 11:44
Posts: 354
My respects go out to dear marciano. After a quick look at the engine I found that Horde3D is really simple and easy to learn! So there is no need to change the whole structure of the engine to optimize it; just some small pieces of code need to be changed :wink:

Thanks a lot for the help, dear Codepoet. I've run the benchmarks again and the results were similar; perhaps I did something wrong before!


 Post subject: Re: NOS PACK
Posted: 22.10.2008, 13:55

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
Siavash wrote:
swiftcoder wrote:
I have a feeling that you won't get any better performance this way than by letting the compiler generate the same code automatically using unions.
Can you explain your opinion a bit more? It's very interesting :wink:
Given these two code samples:
Code:
#include <xmmintrin.h>   // __m128

class Vec3f {
public:
   union {
      __m128 m128;
      struct {float x, y, z;};   // anonymous struct: GCC/MSVC extension
   };
};

Vec3f v;
do_something(v.y);

Code:
class Vec3f {
private:
   __m128 m128;
public:
   float getY() {
      SSE_ALIGNED(float fv[4]);   // fv must be 16-byte aligned for _mm_store_ps
      _mm_store_ps(fv, m128);
      return fv[1];
   }
};

Vec3f v;
do_something(v.getY());
The compiler should be better able to optimise the union approach.
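A compilable sketch of the two samples side by side, assuming GCC or MSVC with <xmmintrin.h>. The type names, the extra w padding member, and the use of alignas in place of the SSE_ALIGNED macro are illustrative assumptions. Reading y after writing m128 relies on union type punning, which GCC documents as supported but standard C++ does not guarantee; the anonymous struct is likewise a compiler extension.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Approach 1: union. y is an ordinary float member, so the compiler is free
// to choose how to move it out of the vector register.
struct Vec3fUnion {
    union {
        __m128 m128;
        struct { float x, y, z, w; };  // w pads the struct to 16 bytes
    };
};

// Approach 2: private __m128 plus a getter that spills to aligned memory.
class Vec3fGetter {
public:
    explicit Vec3fGetter(__m128 v) : m128(v) {}

    float getY() const
    {
        alignas(16) float fv[4];  // stand-in for the SSE_ALIGNED macro
        _mm_store_ps(fv, m128);   // write all four lanes to memory
        return fv[1];             // lane 1 holds y
    }

private:
    __m128 m128;
};
```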

_________________
Tristam MacDonald - [swiftcoding]


 Post subject: Re: NOS PACK
Posted: 22.10.2008, 15:49

Joined: 21.08.2008, 11:44
Posts: 354
The first code looks better and a bit faster:
Code:
class Vec3f {
public:
   union {
      __m128 m128;
      struct {float x, y, z;};
   };
};

Vec3f v;
do_something(v.y);
But how do I keep x, y and z in sync with m128? Or does the compiler sync them? :| [I mean that their values must stay the same]


 Post subject: Re: NOS PACK
Posted: 22.10.2008, 16:05

Joined: 14.04.2008, 15:06
Posts: 183
Location: Germany
A union is a special struct which has (at least) the size of its largest member. All members share the same memory space, so when one writes to one member one can read back the data using another member. So it's always in sync.
But beware of some aliasing problems: see man gcc, -fstrict-aliasing.
In short: Don't use pointers to union members.
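To illustrate the shared storage and the aliasing caveat, a small sketch; the names Vec4Bits, laneViaUnion and laneViaMemcpy are hypothetical. The first function reads through the union itself, which GCC's -fstrict-aliasing documentation explicitly allows; the second copies bytes with std::memcpy, which is also safe in standard C++. The pattern the GCC manual warns about is taking a pointer to one member and reading through it after writing a different member.

```cpp
#include <cassert>
#include <cstring>
#include <xmmintrin.h>

// All members occupy the same 16 bytes, so writing one and reading the
// other sees the same data.
union Vec4Bits {
    __m128 m128;
    float f[4];
};
static_assert(sizeof(Vec4Bits) == 16, "members overlap in the same storage");

// Reads a lane through the union itself. GCC documents this form of type
// punning as supported even with -fstrict-aliasing.
float laneViaUnion(__m128 v, int i)
{
    assert(i >= 0 && i < 4);
    Vec4Bits u;
    u.m128 = v;      // write one member...
    return u.f[i];   // ...read back through another
}

// Aliasing-safe in standard C++ as well: copy the bytes instead of
// reinterpreting pointers between unrelated types.
float laneViaMemcpy(__m128 v, int i)
{
    assert(i >= 0 && i < 4);
    float f[4];
    std::memcpy(f, &v, sizeof f);
    return f[i];
}
```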


 Post subject: Re: NOS PACK
Posted: 22.10.2008, 17:04

Joined: 22.11.2007, 17:05
Posts: 707
Location: Boston, MA
Codepoet wrote:
A union is a special struct which has (at least) the size of its largest member. All members share the same memory space, so when one writes to one member one can read back the data using another member. So it's always in sync.
Whichever approach you take, there is going to be a lot of register shifting (SSE to FPU and back), but with the unions we let the compiler do it, and that lets it optimise better.
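As an illustration of that register shifting, a minimal sketch of extracting y without a round trip through memory: a shuffle moves lane 1 into lane 0, and _mm_cvtss_f32 reads lane 0 as a plain float. The function name getYShuffle is hypothetical, and whether a compiler emits exactly these instructions depends on the compiler and flags. _mm_cvtss_f32 is also not present in every compiler's headers (see the MSVC2005 errors later in this thread).

```cpp
#include <cassert>
#include <xmmintrin.h>

// Extract lane 1 (y) while staying in SSE registers.
// _MM_SHUFFLE(d, c, b, a) selects the source lane for result lanes 3, 2, 1, 0;
// (1, 1, 1, 1) copies lane 1 into every lane.
inline float getYShuffle(__m128 v)
{
    const __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(t);  // lane 0 as a scalar float
}
```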

_________________
Tristam MacDonald - [swiftcoding]


 Post subject: Re: NOS PACK
Posted: 22.10.2008, 17:49

Joined: 21.08.2008, 11:44
Posts: 354
Thanks a lot, dear Codepoet and swiftcoder. It seems there are a lot of things this n00b must learn from the experienced members of the community :wink:

IMHO, by using unions there is no need to change the other sections of the engine, but we could gain a little more performance by making some changes to them. What are your opinions?


 Post subject: Re: NOS PACK
Posted: 22.10.2008, 18:44

Joined: 14.04.2008, 15:06
Posts: 183
Location: Germany
First, avoid changing the engine structure, and implement a way to show that your changes really make Horde3D faster. If you can consistently improve FPS by, say, 10%, that's a good indicator; if it's only 1%, it will be lost in the noise. To show that it's really faster, you need a small timer that shows how much time is spent in the optimized section.
Maybe you can even provide timing details for the different tasks in Renderer::render.
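A minimal sketch of such a timer, using C++11 <chrono> (which postdates this thread); the class name ScopedTimer and the accumulator variables are hypothetical. Wrapping each stage of a function such as Renderer::render in its own scope and summing the milliseconds over many frames should give a steadier figure than FPS alone.

```cpp
#include <cassert>
#include <chrono>

// Measures the wall-clock time between construction and destruction and
// adds it (in milliseconds) to an accumulator owned by the caller, so the
// cost of one section can be summed over many frames.
class ScopedTimer {
public:
    explicit ScopedTimer(double& accumulatorMs)
        : acc_(accumulatorMs), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        acc_ += std::chrono::duration<double, std::milli>(end - start_).count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& acc_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage inside a render function might look like `{ ScopedTimer t(shadowMs); /* shadow pass */ }`, where shadowMs is a hypothetical per-stage accumulator that is printed and reset every few hundred frames.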


 Post subject: Re: NOS PACK
Posted: 25.10.2008, 04:01

Joined: 21.08.2008, 11:44
Posts: 354
Hi, I recently tested utMath_rc5 on a Pentium IV 3.4 + 1 GB memory + GeForce 7300 and there are some problems with the Knight sample. First it shows a fuzzy screen [I think these are particles], then it shows the knight for a short time. Does anybody know what the problem is? [And there isn't any noticeable FPS boost in Chicago.]


 Post subject: Re: NOS PACK
Posted: 25.10.2008, 14:21

Joined: 14.04.2008, 15:06
Posts: 183
Location: Germany
There's no predefined macro GCC_VERSION in gcc ;)
Use __GNUC__. See http://www.delorie.com/gnu/docs/gcc/cpp_22.html
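A small sketch of compiler detection based on predefined macros; the function compilerName and the macro HAVE_GCC_43_OR_NEWER are hypothetical names. GCC and MinGW define __GNUC__ and __GNUC_MINOR__, MSVC defines _MSC_VER (1400 for Visual C++ 2005), and Clang also defines __GNUC__, so it is tested first.

```cpp
#include <cassert>
#include <string>

// Identify the compiler from its predefined macros.
std::string compilerName()
{
#if defined(__clang__)
    return "clang";  // checked first: clang also defines __GNUC__
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__);
#elif defined(_MSC_VER)
    return "msvc " + std::to_string(_MSC_VER);  // 1400 = Visual C++ 2005
#else
    return "unknown";
#endif
}

// A version check built from __GNUC__ / __GNUC_MINOR__ instead of a
// non-existent GCC_VERSION macro.
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#  define HAVE_GCC_43_OR_NEWER 1
#else
#  define HAVE_GCC_43_OR_NEWER 0
#endif
```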

Probably some of the calculations are wrong. I'd check the translation and multiplication stuff first.

Do you have a test suite to make sure everything still works? Ideally it would compare the implementations using some well-chosen inputs and many random ones.
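A minimal sketch of such a comparison test, assuming a column-major 4x4 matrix and using matrix-vector multiplication as the example; mulScalar, mulSSE and maxDifference are hypothetical functions, not utMath's API. A scalar reference and an SSE version get the same inputs (a well-chosen identity case plus many random ones), and the largest absolute difference is reported. The tolerance has to allow for small rounding differences, for example if the compiler fuses multiply-adds or uses x87 precision for the scalar path.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <xmmintrin.h>

// Reference: column-major 4x4 matrix times 4-vector, plain scalar code.
void mulScalar(const float m[16], const float v[4], float out[4])
{
    for (int r = 0; r < 4; ++r)
        out[r] = m[0 + r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
}

// SSE version: scale each column by one vector component and accumulate.
void mulSSE(const float m[16], const float v[4], float out[4])
{
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(m + 0), _mm_set1_ps(v[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 4),  _mm_set1_ps(v[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 8),  _mm_set1_ps(v[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 12), _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, acc);
}

// Run both implementations on random inputs and return the worst
// absolute difference seen across all output components.
float maxDifference(int trials, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);
    float worst = 0.0f;
    for (int t = 0; t < trials; ++t) {
        float m[16], v[4], a[4], b[4];
        for (float& x : m) x = dist(rng);
        for (float& x : v) x = dist(rng);
        mulScalar(m, v, a);
        mulSSE(m, v, b);
        for (int i = 0; i < 4; ++i)
            worst = std::fmax(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```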


 Post subject: Re: NOS PACK
Posted: 25.10.2008, 15:54

Joined: 21.08.2008, 11:44
Posts: 354
Thanks a lot for the __GNUC__ tip, dear Codepoet; now there aren't any problems with MinGW32 either! I don't have a complete test suite yet, but I've checked the outputs of the Vec3f, Vec4f, Quaternion and Matrix4f classes with random inputs and the results are very similar.

utMath_rc6 is now available for download. It now uses unions and is ~1.5x faster than utMath_rc5, plus a bugfix in the Quaternion class (slerp()) and improved compiler compatibility.

Special thanks to the Codepoet and swiftcoder :wink:


 Post subject: Re: NOS PACK
Posted: 25.10.2008, 16:33

Joined: 14.04.2008, 15:06
Posts: 183
Location: Germany
Compiles, but produces very strange results in Chicago and Knight.


 Post subject: Re: NOS PACK
Posted: 25.10.2008, 16:38

Joined: 21.08.2008, 11:44
Posts: 354
Can you describe exactly what happens? I'll test utMath_rc6 again tomorrow.


 Post subject: Re: NOS PACK
Posted: 25.10.2008, 16:50

Joined: 21.08.2008, 11:44
Posts: 354
There are some problems with MSVC2005 too:
Code:
error C3861: '_mm_cvtss_f32': identifier not found
error C3861: '_mm_cvtss_f32': identifier not found
error C2719: 'v': formal parameter with __declspec(align('16')) won't be aligned
error C3861: '_mm_cvtss_f32': identifier not found
error C3861: '_mm_cvtss_f32': identifier not found
error C2719: 'axis': formal parameter with __declspec(align('16')) won't be aligned
error C3861: '_mm_cvtss_f32': identifier not found
I've tested the code with MinGW 4.3 alpha and there aren't any errors :?

Does anybody know how to fix them?
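A possible workaround sketch for both kinds of errors, assuming the code only needs lane 0 as a float and can take vectors by reference. _mm_store_ss is part of the original SSE intrinsic set and can stand in for _mm_cvtss_f32, and passing 16-byte-aligned types by const reference instead of by value avoids C2719, because no aligned copy has to be placed on the stack. The names lane0, Vec3 and dotXYZ are hypothetical stand-ins, not utMath's actual classes.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Replacement for _mm_cvtss_f32: store lane 0 into a float.
inline float lane0(__m128 v)
{
    float f;
    _mm_store_ss(&f, v);
    return f;
}

// Hypothetical stand-in for a vector class holding an __m128.
struct Vec3 { __m128 m; };

// Taking const Vec3& (not Vec3 by value) avoids MSVC's C2719 on 32-bit x86,
// where by-value parameters can't be guaranteed 16-byte alignment.
inline float dotXYZ(const Vec3& a, const Vec3& b)
{
    const __m128 p = _mm_mul_ps(a.m, b.m);
    // Add lanes 0..2 (x*x', y*y', z*z'); lane 3 (w) is ignored.
    const __m128 py = _mm_shuffle_ps(p, p, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 pz = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 2, 2, 2));
    return lane0(_mm_add_ss(_mm_add_ss(p, py), pz));
}
```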


 Post subject: Re: NOS PACK
Posted: 25.10.2008, 16:53

Joined: 14.04.2008, 15:06
Posts: 183
Location: Germany
http://server2.noeding.net/tmp/chicago.png
http://server2.noeding.net/tmp/knight.png

Some of the math is still wrong.


Powered by phpBB® Forum Software © phpBB Group