@Yuvaraj, Jason means that all of m1 is uninitialized before the instruction executes, while you said it is initialized afterwards. See the Intel documentation for palignr: m1 is both a source and the destination register. Reading it is a logic problem, but it saves a mov instruction and works fine.

At 2014-01-17 14:42:31,"Yuvaraj Venkatesh" <yuva...@multicorewareinc.com> wrote:
Only the last pixel depends on m1 (uninitialized), and that particular pixel is not used anywhere in the code. Moreover, psrldq has a higher latency than palignr and would also need an additional mov instruction.

On Fri, Jan 17, 2014 at 12:08 PM, chen <chenm...@163.com> wrote:

At 2014-01-17 14:00:55,"Jason Garrett-Glaser" <ja...@x264.com> wrote:
>+ movu m0, [r2 + 1] ; [16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1]
>+ palignr m1, m0, 1 ; [x 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2]
>
>Shouldn't this be psrldq or similar? The dependency on uninitialized
>registers is a bit weird too...

I suggested this algorithm. psrldq can't copy from one register to another (it shifts its single operand in place), so we would have to waste an instruction on a mov. Of course, we must not use the uninitialized value in any other instruction.

_______________________________________________
x265-devel mailing list
x265-devel@videolan.org
https://mailman.videolan.org/listinfo/x265-devel