Performance Optimization
Results of a performance optimization study on PowerPC and Core Duo machines. Each of the two test functions was run 100 times, and the best time from each set of runs was recorded as changes were made to the code and compiler flags.
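The best-of-100 measurement could be taken with a small helper along these lines. This is only a sketch: the name `bestTimeMs` and the use of `std::chrono::steady_clock` are assumptions, not the harness actually used for the numbers below.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper (not from the original study): call `fn` `runs` times
// and return the fastest single run in milliseconds.
template <typename Fn>
double bestTimeMs(Fn&& fn, int runs = 100) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        fn();
        const auto stop = Clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) best = ms;  // keep the minimum, not the average
    }
    return best;
}
```

Taking the minimum rather than the mean filters out runs that were disturbed by other processes or cold caches, which is presumably why the best time was recorded.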
The “Sum” test sums 10,000 vectors (c = a + b).
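A scalar baseline for the Sum test might look roughly like the following. The four-float `Vector` layout and the function name `sumVectors` are assumptions made for illustration; the original source is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: four floats, so one Vector fits a 128-bit SIMD register.
struct Vector {
    float x, y, z, w;
};

inline Vector operator+(const Vector& a, const Vector& b) {
    return Vector{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Hypothetical Sum test body: c[i] = a[i] + b[i] for every element.
void sumVectors(const std::vector<Vector>& a,
                const std::vector<Vector>& b,
                std::vector<Vector>& c) {
    assert(a.size() == b.size());
    c.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
}
```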
The “Diffuse” test runs a fluid diffusion pass on a 2D array of vectors.
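As a rough sketch of what a diffusion pass of this kind can look like: `getNeighborSum()` is named in the results tables, but its body here, the `Grid` type, the diffusion constant `k`, and the Jacobi-style update formula are all assumptions rather than the code that was measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vector { float x, y, z, w; };  // assumed four-float layout

inline Vector operator+(const Vector& a, const Vector& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
inline Vector operator*(const Vector& a, float s) {
    return {a.x * s, a.y * s, a.z * s, a.w * s};
}

// Hypothetical row-major 2D array of vectors, zero-initialized.
struct Grid {
    std::size_t width, height;
    std::vector<Vector> cells;
    Grid(std::size_t w, std::size_t h)
        : width(w), height(h), cells(w * h, Vector{0.0f, 0.0f, 0.0f, 0.0f}) {}
    Vector& at(std::size_t x, std::size_t y) { return cells[y * width + x]; }
    const Vector& at(std::size_t x, std::size_t y) const { return cells[y * width + x]; }
};

// Sum of the four direct neighbors; only valid for interior cells.
inline Vector getNeighborSum(const Grid& g, std::size_t x, std::size_t y) {
    return g.at(x - 1, y) + g.at(x + 1, y) + g.at(x, y - 1) + g.at(x, y + 1);
}

// One relaxation pass over the interior cells: out = (in + k * neighbors) / (1 + 4k).
// Border cells of `out` are left untouched.
void diffuse(const Grid& in, Grid& out, float k) {
    assert(in.width == out.width && in.height == out.height);
    const float scale = 1.0f / (1.0f + 4.0f * k);
    for (std::size_t y = 1; y + 1 < in.height; ++y) {
        for (std::size_t x = 1; x + 1 < in.width; ++x) {
            out.at(x, y) = (in.at(x, y) + getNeighborSum(in, x, y) * k) * scale;
        }
    }
}
```

The inner expression is a multiply followed by an add per component, which is the pattern a fused multiply-add such as AltiVec's `vec_madd` is designed for.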
PowerPC (G5 1.8 GHz)
Change | Sum | Diffuse |
Baseline | 28ms | 48ms |
Switch to vFloat type | 68ms | 116ms |
'inline' Vector ctor | 69ms | 128ms |
AltiVec Vector functions | 27ms | 62ms |
'inline' AltiVec functions | 25ms | 58ms |
'inline' getNeighborSum() | 25ms | 38ms |
Hand tune diffuse with vec_madd | n/a | 23ms |
-mtune=G5 | 24ms | 22ms |
-ffast-math | 24ms | 22ms |
-falign-loops=16 | 24ms | 22ms |
Intel (Core Duo 2 GHz)
Change | Sum | Diffuse |
Baseline | 43ms | 81ms |
Inline SSE | 18ms | 29ms |
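For reference, an inlined SSE version of the Sum loop could be sketched as below. The function name `sumVectorsSSE`, the flat `float` array layout, and the use of unaligned loads are assumptions; the scalar fallback is there only so the sketch still compiles on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Hypothetical SSE Sum: adds n four-float vectors, c = a + b.
// Each array holds 4 * n floats. Unaligned loads/stores are used so the
// caller does not have to guarantee 16-byte alignment.
inline void sumVectorsSSE(const float* a, const float* b, float* c, std::size_t n) {
    assert(a && b && c);
    for (std::size_t i = 0; i < n; ++i) {
#if HAVE_SSE
        const __m128 va = _mm_loadu_ps(a + 4 * i);
        const __m128 vb = _mm_loadu_ps(b + 4 * i);
        _mm_storeu_ps(c + 4 * i, _mm_add_ps(va, vb));  // four adds in one instruction
#else
        for (int j = 0; j < 4; ++j) {
            c[4 * i + j] = a[4 * i + j] + b[4 * i + j];
        }
#endif
    }
}
```

If the data is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` can replace the unaligned variants.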