On Wednesday, 16 May 2012 13:34:42, Pavel Vasin wrote:
> on x86:
> benchmarked magic: 14.048889885s
> benchmarked div: 5.426952392s
> benchmarked mul: 4.034106976s
>
> on x86-64:
> benchmarked magic: 2.467789582s
> benchmarked div: 9.748067755s
> benchmarked mul: 8.665307997s

Those are interesting numbers. The magic I understand being different, since on
x86 it's using two 32-bit registers and needs to do some magic on the magic to
support the 64-bit operation. The Intel optimisation manual says that the ADC
(add with carry) instruction isn't the fastest.
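
To illustrate what "two 32-bit registers" implies, here is a minimal sketch of a 64-bit addition done on 32-bit halves. The struct and function names are made up for this example and are not from the patch; the carry step is the part a compiler would typically emit as ADC on 32-bit x86.

```c
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value kept as two 32-bit halves,
 * the way a 32-bit x86 target has to hold it in registers. */
struct u64_pair {
    uint32_t lo;
    uint32_t hi;
};

/* Add two 64-bit values using only 32-bit arithmetic. */
static struct u64_pair add64_halves(struct u64_pair a, struct u64_pair b)
{
    struct u64_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;          /* low halves, wraps modulo 2^32 */
    carry = (r.lo < a.lo);       /* 1 if the low addition overflowed */
    r.hi = a.hi + b.hi + carry;  /* high halves plus carry (the ADC step) */
    return r;
}
```

Every 64-bit add, multiply, or shift in the magic path needs this kind of two-step handling on x86, which would be consistent with the slower x86 number.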
However, the div and mul are much more interesting. Those ought to be the
same, so I am actually wondering how it is possible that the div and mul on
x86-64 can be so slow.
Did you compile your 32-bit code with -mfpmath=sse? If not, could you try and
post the results again? I'd be quite surprised if it turned out that the x87
operations are faster than the SSE ones, but that's what your numbers show.
Either way, the x86 div and mul are still slower than the x86-64 magic, but
faster than the x86 magic, so it looks like your patch is correct, given your
benchmarks. It might be that other 64-bit platforms have similar benefits,
though, in which case the #if condition should be
defined(__x86_64__) || defined(__LP64__).
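
As a rough sketch of what such a guard could look like: the function name, the divisor 1000, and the plain-division fallback below are assumptions made for illustration, since the actual operation in the patch is not quoted in this message. The magic branch relies on unsigned __int128, a GCC/Clang extension available on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical example: divide a 64-bit value by 1000.
 * The divisor and the function name are illustrative only. */
static uint64_t div1000(uint64_t x)
{
#if defined(__x86_64__) || defined(__LP64__)
    /* Magic-number path: x / 1000 == (x >> 3) / 125, and
     * 0x20c49ba5e353f7cf is ceil(2^68 / 125), so a 64x64->128
     * multiply followed by a shift of 68 gives the exact quotient. */
    unsigned __int128 p = (unsigned __int128)(x >> 3) * 0x20c49ba5e353f7cfULL;
    return (uint64_t)(p >> 68);
#else
    /* Fallback for other targets: let the compiler pick the division. */
    return x / 1000;
#endif
}
```

Both branches return the same result; only the generated code differs, so the guard can be widened or narrowed based on benchmarks without changing behaviour.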
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
_______________________________________________
wayland-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/wayland-devel
