Horde3D

Next-Generation Graphics Engine

All times are UTC + 1 hour




 Post subject: Re: NOS PACK
Posted: 26.09.2008, 14:43 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
After some discussion with Volker about utMath, we decided to use C functions to optimize utMath, to avoid the runtime problems and make the code safer. [Final optimizations and profiling are on the way.]

Note: NOSPACK no longer needs to be included in the Horde3D engine; they are now two separate things.

I will release the new version of NOSPACK in a few days [or hours] to add features that utMath lacks, such as min/max finding, sqrt, ... By using NOSPACK for in-game optimizations you can make your life easier with SSE [SSE2, SSE3, SSE4.x and multithreading are on the way].
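As a rough illustration of the kind of SSE helpers mentioned above (min/max finding and sqrt), here is a minimal sketch built on the SSE1 intrinsics _mm_min_ps, _mm_max_ps and _mm_sqrt_ps from xmmintrin.h. The function names min4, max4, hmin4 and sqrt4 are hypothetical and are not NOSPACK's actual API.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics
#include <cassert>

// Hypothetical NOSPACK-style helpers (illustrative names, not the real API).
// Each one works on four floats at once.

// Component-wise minimum: out[i] = min(a[i], b[i]).
inline void min4(const float *a, const float *b, float *out)
{
    // Unaligned loads/stores (loadu/storeu) impose no alignment requirement.
    _mm_storeu_ps(out, _mm_min_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

// Component-wise maximum: out[i] = max(a[i], b[i]).
inline void max4(const float *a, const float *b, float *out)
{
    _mm_storeu_ps(out, _mm_max_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

// Smallest of four floats ("min finding") via two shuffle + min steps.
inline float hmin4(const float *in)
{
    __m128 v = _mm_loadu_ps(in);
    // Swap neighbouring lanes to (v1, v0, v3, v2) and take the minimum.
    __m128 t = _mm_min_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
    // Swap the two halves and take the minimum again: every lane holds the result.
    t = _mm_min_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(t);  // read lane 0
}

// Component-wise square root: out[i] = sqrt(in[i]).
inline void sqrt4(const float *in, float *out)
{
    _mm_storeu_ps(out, _mm_sqrt_ps(_mm_loadu_ps(in)));
}
```

For example, min4(a, b, r) with a = {1, 5, 3, 8} and b = {2, 4, 6, 7} leaves r = {1, 4, 3, 7}, and hmin4(a) returns 1.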

Please don't forget to report performance boosts and bugz to me :wink:


Last edited by Siavash on 01.10.2008, 15:10, edited 1 time in total.

 Post subject: Re: NOS PACK
Posted: 27.09.2008, 07:32 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
utMath_rc2 contains some bug fixes, and these two functions have been optimized too: rayAABBIntersection and nearestDistToAABB.
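For readers wondering what an SSE version of a point-to-box distance might look like, here is a minimal sketch of the idea behind nearestDistToAABB: per axis, the distance outside the box is max(mins - p, p - maxs, 0), and the result is the length of that vector. This is not the utMath_rc2 code; the name nearestDistToAABB_sse and the plain float[3] parameters are assumptions made to keep the example self-contained.

```cpp
#include <xmmintrin.h>
#include <cmath>
#include <cassert>

// Hypothetical SSE sketch of a point-to-AABB distance, in the spirit of
// nearestDistToAABB (not the utMath_rc2 source). Returns 0 when pos is
// inside the box, otherwise the Euclidean distance to the nearest box point.
inline float nearestDistToAABB_sse(const float pos[3], const float mins[3], const float maxs[3])
{
    // _mm_setr_ps fills lanes in memory order: (x, y, z, 0).
    __m128 p  = _mm_setr_ps(pos[0],  pos[1],  pos[2],  0.0f);
    __m128 lo = _mm_setr_ps(mins[0], mins[1], mins[2], 0.0f);
    __m128 hi = _mm_setr_ps(maxs[0], maxs[1], maxs[2], 0.0f);

    // Per axis: how far pos lies outside [mins, maxs], i.e. max(mins - p, p - maxs, 0).
    __m128 d = _mm_max_ps(_mm_sub_ps(lo, p), _mm_sub_ps(p, hi));
    d = _mm_max_ps(d, _mm_setzero_ps());

    // Squared length; the padding lane is 0, so summing all four lanes is safe.
    float sq[4];
    _mm_storeu_ps(sq, _mm_mul_ps(d, d));
    return std::sqrt(sq[0] + sq[1] + sq[2] + sq[3]);
}
```

For example, with the unit box mins = (0, 0, 0), maxs = (1, 1, 1) and pos = (4, 5, 0.5), the per-axis distances are (3, 4, 0), so the function returns 5.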


 Post subject: Re: NOS PACK
Posted: 28.09.2008, 08:26 by Volker
Tool Developer

Joined: 13.11.2007, 11:07
Posts: 1150
Location: Germany
The current version crashes because the variables used to store the outputs are not SSE_ALIGNED. After I added the macro to all outputs, the application no longer crashes, but it does not work correctly either. I guess there is something wrong somewhere in the calculations. I don't have the time to check it in detail.
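The crash described above is typical of aligned SSE memory access: _mm_load_ps and _mm_store_ps typically compile to aligned moves (movaps), which fault if the address is not 16-byte aligned. The sketch below assumes SSE_ALIGNED expands to a 16-byte alignment specifier; NOSPACK's real macro may be defined differently (2008-era code usually used __declspec(align(16)) on MSVC or __attribute__((aligned(16))) on GCC, whereas this sketch uses C++11 alignas). The names add4 and add4_unaligned are hypothetical.

```cpp
#include <xmmintrin.h>
#include <cstdint>
#include <cassert>

// Assumed definition; the real NOSPACK macro may differ.
#define SSE_ALIGNED alignas(16)

// Aligned version: out MUST be 16-byte aligned, otherwise the aligned
// store faults at run time (the kind of crash reported above).
inline void add4(const float *a, const float *b, float *out)
{
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0 && "out must be SSE_ALIGNED");
    _mm_store_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

// Unaligned version: works for any address, but was noticeably slower
// on older CPUs such as the Pentium III.
inline void add4_unaligned(const float *a, const float *b, float *out)
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

Declaring the output as SSE_ALIGNED float out[4]; makes add4 safe; when alignment cannot be guaranteed (e.g. a pointer into the middle of a float array), the unaligned variant is the safe choice.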


 Post subject: Re: NOS PACK
Posted: 28.09.2008, 14:10 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
Volker wrote:
The current version crashes because the variables used to store the outputs are not SSE_ALIGNED. After I added the macro to all outputs, the application no longer crashes, but it does not work correctly either. I guess there is something wrong somewhere in the calculations. I don't have the time to check it in detail.

I'll take a look at the calculations to find the problems [perhaps the problem is bad array management].

Thanks a lot, dear Volker, for the programming tips and for helping me find bugs and finish the job as soon as possible :wink:


 Post subject: Re: NOS PACK
Posted: 01.10.2008, 10:07 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
Hi, most of the bugs and compatibility problems have been fixed. Check out the new utMath_rc3!

Please don't forget to report performance boosts and bugs to me :cry:

Special thanks to Volker and DarkAngel :wink:


 Post subject: Re: NOS PACK
Posted: 04.10.2008, 10:20 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
Today I performed some minor optimizations on the Matrix4f class: the determinant and inverted functions. So I decided to measure the speed of the inverted function:

MSVC 2005 compiled code:
utMath: 10s
utMath_rc3: 19s
optimized utMath_rc3: 26s

MinGW 3.4.5 compiled code:
utMath: 6s
utMath_rc3: 16s
optimized utMath_rc3: 23s

[Pentium III 750 MHz + 256 MB memory]

The results are very confusing: utMath_rc3 is roughly 2x slower than the original utMath :?
This is the result of the extra memory copying. I must completely change the code structure; does anybody have any ideas?

BTW, the MinGW 3.4.5 compiled code is faster than the MSVC 2005 code :lol:
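To make the memory-copying problem concrete, here is a hypothetical reconstruction of the pattern being discussed (not the actual utMath_rc3 source): the components are copied into aligned temporary arrays (num1, num2), loaded into SSE registers, added, stored to another temporary and copied back. For a 3-component add, that is a round trip through memory around a single SIMD addition, while the scalar version is just three additions; unless the compiler manages to optimize the temporaries away, the copies can easily cost more than the SIMD add saves. Vec3f here is a minimal stand-in for utMath's vector class.

```cpp
#include <xmmintrin.h>
#include <cassert>

// Minimal stand-in for utMath's Vec3f (assumed layout: three plain floats).
struct Vec3f { float x, y, z; };

// Hypothetical reconstruction of the copy-heavy pattern (not the rc3 source):
// members -> aligned temp arrays -> SSE registers -> aligned temp -> members.
inline Vec3f addWithCopies(const Vec3f &a, const Vec3f &b)
{
    alignas(16) float num1[4] = { a.x, a.y, a.z, 0.0f };  // copy in
    alignas(16) float num2[4] = { b.x, b.y, b.z, 0.0f };  // copy in
    alignas(16) float res[4];
    _mm_store_ps(res, _mm_add_ps(_mm_load_ps(num1), _mm_load_ps(num2)));  // one SIMD add
    return Vec3f{ res[0], res[1], res[2] };                // copy out
}

// Plain scalar version: three additions, no temporary arrays.
inline Vec3f addScalar(const Vec3f &a, const Vec3f &b)
{
    return Vec3f{ a.x + b.x, a.y + b.y, a.z + b.z };
}
```

Both functions return the same result; timing them in a tight loop is a simple way to see how much the copies cost on a given compiler.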


Last edited by Siavash on 05.10.2008, 17:40, edited 1 time in total.

 Post subject: Re: NOS PACK
Posted: 05.10.2008, 13:05 by Volker
Tool Developer

Joined: 13.11.2007, 11:07
Posts: 1150
Location: Germany
I don't have any ideas, but this confirms the short tests I ran with your versions. Using the Chicago demo with software skinning, the framerate on my notebook dropped from ~7 fps with the original utMath to ~4.5 fps with your version.


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 14:02 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
Volker wrote:
I don't have any ideas, but this confirms the short tests I ran with your versions. Using the Chicago demo with software skinning, the framerate on my notebook dropped from ~7 fps with the original utMath to ~4.5 fps with your version.

Thanks for testing the code, dear Volker. As marciano said, the code will be slower than before because of the extra memory copying.

I/we must fix this problem, but currently I don't have any ideas. I must do some research on it :wink:


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 14:29 by kal

Joined: 18.05.2008, 17:47
Posts: 96
http://www.cs.technion.ac.il/~zdevir/main1.html
It uses unions.
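A minimal sketch of the union idea, assuming nothing about the linked library's actual types: an __m128 and a float[4] share the same 16 bytes, so scalar code can read components directly and SSE code can work on the same storage, with no copying through temporary arrays. Vec4u and addVec4u are hypothetical names. Reading a union member other than the one last written (type punning) is not guaranteed by ISO C++, although GCC documents it as supported and MSVC's own __m128 type is declared as a union with float members.

```cpp
#include <xmmintrin.h>
#include <cassert>

// Hypothetical union-based 4-float vector: both members alias the same 16 bytes.
union Vec4u
{
    __m128 m;     // SIMD view; also gives the union 16-byte alignment
    float  f[4];  // scalar view: f[0] = x, f[1] = y, f[2] = z, f[3] = w
};

// Adds two vectors directly on the stored data; no temporary arrays involved.
inline Vec4u addVec4u(const Vec4u &a, const Vec4u &b)
{
    Vec4u r;
    r.m = _mm_add_ps(a.m, b.m);
    return r;
}
```

After c = addVec4u(a, b), the components can be read as c.f[0] .. c.f[3] without an explicit store.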


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 15:27 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
kal wrote:
http://www.cs.technion.ac.il/~zdevir/main1.html
It uses unions.

Thanks, dear kal. I'll have a l00k @ kode 8)


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 16:46 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
There is a composite function among the Intel C++ intrinsics: _mm_set_ps(z,y,x,w)
This removes the num1 and num2 arrays, so there is no need to load them into __m128 variables with _mm_load_ps(num1) and _mm_load_ps(num2). IMHO this reduces the extra memory copying and makes the code a bit faster without completely changing the utMath structure.

I must take a deeper lOOk at Intel's SSE manuals and run some benchmarks again.
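A minimal sketch of the _mm_set_ps approach, using a stand-in Vec3f struct (an assumption, not utMath's class) and a hypothetical addSet function. One detail worth keeping in mind: _mm_set_ps takes its arguments from the highest lane down to lane 0, so the last argument ends up in lane 0, while _mm_setr_ps takes them in the opposite (memory) order. The compiler may still build each vector with several move/shuffle instructions, so it is worth benchmarking how much this actually saves.

```cpp
#include <xmmintrin.h>
#include <cassert>

// Minimal stand-in for utMath's Vec3f (assumed layout: three plain floats).
struct Vec3f { float x, y, z; };

// Hypothetical _mm_set_ps variant of a Vec3f addition: the components go
// straight into SSE registers, without the num1/num2 temporary arrays.
inline Vec3f addSet(const Vec3f &a, const Vec3f &b)
{
    // Arguments run from lane 3 down to lane 0, so x ends up in lane 0.
    __m128 va = _mm_set_ps(0.0f, a.z, a.y, a.x);
    __m128 vb = _mm_set_ps(0.0f, b.z, b.y, b.x);
    __m128 r  = _mm_add_ps(va, vb);

    // Read lane 0 directly; broadcast lanes 1 and 2 into lane 0 to read them.
    Vec3f out;
    out.x = _mm_cvtss_f32(r);
    out.y = _mm_cvtss_f32(_mm_shuffle_ps(r, r, _MM_SHUFFLE(1, 1, 1, 1)));
    out.z = _mm_cvtss_f32(_mm_shuffle_ps(r, r, _MM_SHUFFLE(2, 2, 2, 2)));
    return out;
}
```

For instance, _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f) holds 1.0f in lane 0, which is what _mm_cvtss_f32 returns.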


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 17:42 by marciano
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
Also a great article, although it still uses inline asm and not intrinsics:

http://www.cortstratton.org/articles/OptimizingForSSE.php


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 17:48 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
Yes, this is it! I've benchmarked the original utMath, utMath_rc3 and the optimized utMath_rc3 [using _mm_set_ps] on the sum of two 4x4 matrices (Matrix4x4):

utMath: 1s
utMath_rc3: 4s
optimized utMath_rc3: 2s [I've just optimized the Matrix4f + operator]

I'm going to remove those extra array copies :D
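Matrix addition is a good fit for SSE because it is purely element-wise: a 4x4 float matrix is four 16-byte rows (or columns), so a + operator needs only four _mm_add_ps calls. The sketch assumes a hypothetical Mat4 struct holding 16 contiguous floats with 16-byte alignment, so the aligned _mm_load_ps/_mm_store_ps calls are safe and no temporary arrays are needed; it is not the actual utMath Matrix4f code.

```cpp
#include <xmmintrin.h>
#include <cassert>

// Hypothetical matrix type: 16 contiguous floats, 16-byte aligned.
// Storage order (row- or column-major) does not matter for addition.
struct alignas(16) Mat4
{
    float x[16];
};

inline Mat4 operator+(const Mat4 &a, const Mat4 &b)
{
    Mat4 r;
    // Process four floats (one row or column) per iteration.
    for (int i = 0; i < 16; i += 4)
    {
        __m128 sum = _mm_add_ps(_mm_load_ps(a.x + i), _mm_load_ps(b.x + i));
        _mm_store_ps(r.x + i, sum);  // aligned: Mat4 is alignas(16) and i is a multiple of 4
    }
    return r;
}
```

Because the data already lives in aligned storage, the loads and stores operate on the matrix members directly, which is exactly the kind of copy the rc3 structure was paying for.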


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 17:58 by Siavash

Joined: 21.08.2008, 11:44
Posts: 354
Thank you very much, dear marciano, for that great article!

I know that inline asm code is a bit faster than intrinsics, but IMHO it's better to use C++ intrinsics: this way the compiler knows what you are going to do and can perform the related optimizations on the code, so the compiled code can end up faster than the inline asm version [I'm not sure, but I think there are also some problems with inline asm on non-MSVC compilers].
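To illustrate the intrinsics route, here is a sketch of a matrix-vector multiply (the core operation in software skinning) written only with SSE intrinsics. The compiler allocates the registers and schedules the instructions, and the same source builds with both MSVC and GCC/MinGW; for what it's worth, MSVC does not support inline asm at all when targeting x64. The column-major layout and the name transformVec4 are assumptions for this example.

```cpp
#include <xmmintrin.h>
#include <cassert>

// Assumed column-major 4x4 matrix: column j lives in m[4*j .. 4*j+3].
// Computes out = M * v using only SSE intrinsics (no inline asm).
inline void transformVec4(const float m[16], const float v[4], float out[4])
{
    __m128 col0 = _mm_loadu_ps(m + 0);
    __m128 col1 = _mm_loadu_ps(m + 4);
    __m128 col2 = _mm_loadu_ps(m + 8);
    __m128 col3 = _mm_loadu_ps(m + 12);

    // Broadcast each vector component and accumulate column * component.
    __m128 r = _mm_mul_ps(col0, _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(col1, _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(col2, _mm_set1_ps(v[2])));
    r = _mm_add_ps(r, _mm_mul_ps(col3, _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, r);
}
```

For example, a translation by (5, 6, 7) stored column-major as {1,0,0,0, 0,1,0,0, 0,0,1,0, 5,6,7,1} maps the point (1, 2, 3, 1) to (6, 8, 10, 1).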


 Post subject: Re: NOS PACK
Posted: 05.10.2008, 18:34 by marciano
Engine Developer

Joined: 10.09.2006, 15:52
Posts: 1217
For learning purposes, it could be interesting for you to look at Nebula and Bullet.

