From: Nemanja Lukic nemanja.lu...@rt-rk.com
Performance numbers before/after on MIPS-74kc @ 1GHz
Reference (before):
lowlevel-blt-bench:
over_n_8_ = L1: 10.40 L2: 9.79 M: 8.47 ( 33.62%) HT: 7.64 VT: 7.59 R: 7.48 RT: 5.30 ( 40Kops/s)
over_n_8_0565 = L1: 7.40 L2:
I started porting my src__0565 MMX function to SSE2, and in the
process started thinking about using SSE3 and later. The useful
instructions added after SSE2 that I see are:
SSE3: lddqu - for unaligned loads across cache lines
SSSE3: palignr - for unaligned loads (but requires software