Warren Nagourney wrote:
I'm not surprised that it is this fast with single-precision arithmetic, even as scalar code. Do you have any data for double precision?

I found that the xlc compiler produces code that runs about 15%-20% faster than gcc's. This is on a program with a moderate amount of single-precision arithmetic. Interestingly, the same (scalar) code runs about 30% faster on the SPU than on the PPU (with the same 15% advantage to xlc). Provided the code and data fit in the SPU's local store, it seems better to use the SPU for scalar code! Compared to a G4, the PPU is roughly equivalent to a 1.2 GHz processor, and the SPU to a 1.7-2 GHz G4.

For single-precision arithmetic, the SPU can start a new value every clock tick (it's a 6-cycle calculation, but fully pipelined). For double precision, not only does the calculation take 13 cycles, but it completely stalls the processor for the first 6 cycles! It's almost as bad as a branch!

Jon
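The latency and stall figures quoted above can be turned into a rough cycle count. The sketch below is a hypothetical, simplified model, not IBM's timing model: it assumes n independent operations issued back to back on one pipeline, treats a "6-cycle stall" as meaning the next double-precision operation can issue 7 cycles after the previous one, and ignores dependencies between instructions. All names are illustrative.

```python
# Hypothetical, simplified issue model built only from the figures in the
# message: n independent operations issued back to back on one SPU pipeline.

SP_LATENCY = 6   # single precision: 6-cycle latency, fully pipelined
DP_LATENCY = 13  # double precision: 13-cycle latency
DP_STALL = 6     # assumption: issue is blocked for 6 cycles after each DP op


def finish_cycle(n: int, issue_interval: int, latency: int) -> int:
    """Cycle (counting from 0) at which the last of n independent ops completes."""
    if n <= 0:
        return 0
    return (n - 1) * issue_interval + latency


def single_precision_cycles(n: int) -> int:
    # A new single-precision operation can issue every cycle.
    return finish_cycle(n, 1, SP_LATENCY)


def double_precision_cycles(n: int) -> int:
    # The next double-precision operation waits out the stall,
    # so one issues every 1 + 6 = 7 cycles.
    return finish_cycle(n, 1 + DP_STALL, DP_LATENCY)


if __name__ == "__main__":
    n = 1000
    sp = single_precision_cycles(n)
    dp = double_precision_cycles(n)
    print(f"single precision: {sp} cycles")
    print(f"double precision: {dp} cycles")
    print(f"ratio: {dp / sp:.2f}")
```

For n = 1000 this model gives 1005 cycles for single precision and 7006 for double precision, a ratio of about 6.97. As n grows the ratio approaches 7, which suggests that under these assumptions the stall, rather than the 13-cycle latency, is what dominates double-precision throughput.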
_______________________________________________
yellowdog-general mailing list
[email protected]
http://lists.terrasoftsolutions.com/mailman/listinfo/yellowdog-general
HINT: to Google archives, try '<keywords> site:terrasoftsolutions.com'

