Volker wrote:
I don't have a PIII available, but I can tell you that in a short test with the Chicago Demo and software skinning your current utMath_rc4 runs with about 4.5 fps, when enabling the /arch:SSE2 parameter in MSVC it even drops down to a maximum of 4 fps. With the original utMath and disabled /arch:SSE2 parameter it runs with ~7 fps and with enabled /arch:SSE2 and original utMath it drops to ~5.5 fps. All tests were run on a Asus A8JS with a Core2 Duo 7200 with 2 Ghz and a Geforce7700.
I don't want to compare SSE with SSE2. For example, the PIII only supports MMX and SSE, whereas the newer Pentium 4 series supports MMX, SSE, SSE2 [and SSE3, I think] and HT technology. What I want to compare is the performance of the P3's SSE unit against the SSE units of the newer P4 and Core 2 Duo series [SSE only, not SSE2 or SSE3].
I want to know whether the CPU manufacturers have made any improvements to the SSE units on newer CPUs. There are a few benchmarks in the Intel manuals showing that HT-enabled SIMD code runs 44% faster than normal SIMD code, and so on.
Btw, I want to know whether my code runs faster than the FPU code on these CPUs, or whether it stays at ~1:1 as it does on my old P3.
swiftcoder wrote:
Individual SSE instructions tend to maintain fairly steady relative performance between chip releases. However, you do have access to a much larger set of SSE instructions on newer chips, some of which might prove useful.
The engine gets no benefit from SSE2, because most of its variables are floats [not doubles] and SSE2 mainly adds ~140 new instructions that operate on double-precision and integer data [__m128d and __m128i]. SSE3, however, adds 13 new instructions, some of which are useful for the engine too, for example the horizontal ADD and SUB and the ADD-SUB operations on packed floats [__m128].
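To illustrate the horizontal operations mentioned above, here is a minimal sketch of a 4-float dot product written three ways: scalar, SSE only (shuffles), and SSE3 (HADDPS). The function names dot4Scalar, dot4SSE and dot4SSE3 are made up for this post and are not taken from utMath. The SKETCH_TARGET_SSE3 macro is only there so that GCC/Clang accept the SSE3 intrinsic without a global -msse3 flag.

```cpp
#include <cassert>      // only needed for sanity checks when trying the sketch
#include <xmmintrin.h>  // SSE intrinsics
#include <pmmintrin.h>  // SSE3 intrinsics (_mm_hadd_ps)

// Hypothetical helper macro: lets GCC/Clang compile the SSE3 function
// without passing -msse3 for the whole translation unit.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_TARGET_SSE3 __attribute__((target("sse3")))
#else
#define SKETCH_TARGET_SSE3
#endif

// Plain scalar reference version.
inline float dot4Scalar(const float* a, const float* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// SSE only: multiply, then reduce the four products with shuffles.
inline float dot4SSE(const float* a, const float* b)
{
    __m128 m    = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));   // [m0 m1 m2 m3]
    __m128 shuf = _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1));  // [m1 m0 m3 m2]
    __m128 sums = _mm_add_ps(m, shuf);                            // [m0+m1 m0+m1 m2+m3 m2+m3]
    shuf = _mm_movehl_ps(shuf, sums);                             // lane 0 = m2+m3
    sums = _mm_add_ss(sums, shuf);                                // lane 0 = total
    return _mm_cvtss_f32(sums);
}

// SSE3: two horizontal adds replace the shuffle sequence.
SKETCH_TARGET_SSE3 inline float dot4SSE3(const float* a, const float* b)
{
    __m128 m = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));      // [m0 m1 m2 m3]
    m = _mm_hadd_ps(m, m);                                        // [m0+m1 m2+m3 m0+m1 m2+m3]
    m = _mm_hadd_ps(m, m);                                        // every lane = total
    return _mm_cvtss_f32(m);
}
```

The SSE3 version is shorter to write, but whether it is actually faster than the shuffle version depends on the CPU, so it would have to be measured rather than assumed.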
I'm not sure, but I think that because of the engine's hardware requirements [PCI-Express mainboards], the target CPUs all support SSE3 and HTT. Are there any hardware exceptions?
As for SSE4.x, I have to say that it is only supported on the new Core 2 Duo series [not the old LGA Pentium 4 CPUs], and IMHO requiring it would only add to the engine's limitations.
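Instead of guessing which CPUs support SSE3 or SSE4.x, the engine could also ask the CPU at startup. The following is only a rough sketch of such a check using CPUID leaf 1. The SimdSupport struct and querySimdSupport function are hypothetical names, not part of the engine. It uses __cpuid from <intrin.h> on MSVC and __get_cpuid from <cpuid.h> on GCC/Clang.

```cpp
#include <cassert>  // only needed for sanity checks when trying the sketch
#if defined(_MSC_VER)
#include <intrin.h>  // __cpuid
#else
#include <cpuid.h>   // __get_cpuid (GCC/Clang)
#endif

// Hypothetical result struct for this sketch.
struct SimdSupport
{
    bool sse;
    bool sse2;
    bool sse3;
    bool sse41;
    bool htt;  // CPUID HTT flag; it does not prove that HT is actually enabled
};

inline SimdSupport querySimdSupport()
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    SimdSupport s = { false, false, false, false, false };

#if defined(_MSC_VER)
    int regs[4] = { 0, 0, 0, 0 };
    __cpuid(regs, 1);  // leaf 1: feature flags in ECX/EDX
    ecx = static_cast<unsigned int>(regs[2]);
    edx = static_cast<unsigned int>(regs[3]);
#else
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return s;      // leaf 1 not available: report nothing
#endif
    (void)eax; (void)ebx;  // unused in this sketch

    s.sse   = ((edx >> 25) & 1u) != 0;  // EDX bit 25: SSE
    s.sse2  = ((edx >> 26) & 1u) != 0;  // EDX bit 26: SSE2
    s.htt   = ((edx >> 28) & 1u) != 0;  // EDX bit 28: HTT
    s.sse3  = ((ecx >>  0) & 1u) != 0;  // ECX bit 0:  SSE3
    s.sse41 = ((ecx >> 19) & 1u) != 0;  // ECX bit 19: SSE4.1
    return s;
}
```

With something like this, the math library could pick an SSE3 or SSE4.1 code path when the flag is set and fall back to the plain SSE path otherwise, so newer instruction sets would not have to become a hard requirement.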
I'm performing some optimizations on utMath_rc4. It is at least 1:1 with the FPU code, and in most functions the new utMath will compete very well with the compiler's auto-vectorized FPU code. The job is nearly finished and I'll post it on the forum in the next few days.
