Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Ganesh Ajjanagadde
On Thu, Jan 14, 2016 at 6:54 PM, Henrik Gramner wrote:
> On Thu, Jan 14, 2016 at 11:47 PM, Ganesh Ajjanagadde wrote:
>> BTW, this is why I personally don't like the macro:
>> so I was moving along, replacing one after the other, till I came to this
>> line
>> vfmadd213pd ymm1, ymm5, COVAR(iq

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Henrik Gramner
On Thu, Jan 14, 2016 at 11:47 PM, Ganesh Ajjanagadde wrote:
> BTW, this is why I personally don't like the macro:
> so I was moving along, replacing one after the other, till I came to this line
> vfmadd213pd ymm1, ymm5, COVAR(iq ,1)
> I naturally replace by
> fmaddpd ymm1, ymm1, ymm5, CO

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Ganesh Ajjanagadde
On Thu, Jan 14, 2016 at 11:48 AM, James Almer wrote:
> On 1/14/2016 1:26 PM, Ganesh Ajjanagadde wrote:
>> On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote:
>>> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
>>>> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
>>>>> Use the x86inc syn

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Ganesh Ajjanagadde
On Thu, Jan 14, 2016 at 11:48 AM, James Almer wrote: > On 1/14/2016 1:26 PM, Ganesh Ajjanagadde wrote: >> On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote: >>> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote: On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote: [...] There is no need

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread James Almer
On 1/14/2016 1:26 PM, Ganesh Ajjanagadde wrote:
> On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote:
>> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
>>> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
>>>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that ge

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Henrik Gramner
On Thu, Jan 14, 2016 at 5:26 PM, Ganesh Ajjanagadde wrote:
> readability still no.

" dst, mult1, mult2, add" is significantly more readable than " src1, src2, src3" where you need to mentally parse which source operand corresponds to which mathematical operator depending on the order of the digit
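
As a rough illustration of the operand-order point above, the sketch below models the three FMA3 forms next to the four-operand FMA4-style form in plain Python. The function names are hypothetical helpers made up for this example, not x86inc or FFmpeg identifiers, and scalar integers stand in for vector registers and memory operands, so rounding behavior is not modeled.

```python
# Illustrative model only: hypothetical helper names, scalars instead of
# ymm registers, and no modeling of FMA's single rounding step.

def vfmadd132(op1, op2, op3):
    """FMA3 '132' form: op1 = op1 * op3 + op2."""
    return op1 * op3 + op2

def vfmadd213(op1, op2, op3):
    """FMA3 '213' form: op1 = op2 * op1 + op3."""
    return op2 * op1 + op3

def vfmadd231(op1, op2, op3):
    """FMA3 '231' form: op1 = op2 * op3 + op1."""
    return op2 * op3 + op1

def fmadd4(mult1, mult2, add):
    """FMA4-style form: dst = mult1 * mult2 + add (dst is the return value)."""
    return mult1 * mult2 + add

# Stand-ins for the register and memory operands discussed in the thread.
ymm1, ymm5, covar = 3, 4, 10

# In this model, the '213' form with (ymm1, ymm5, covar) and the
# FMA4-style form with mult1=ymm1, mult2=ymm5, add=covar agree.
print(vfmadd213(ymm1, ymm5, covar), fmadd4(ymm1, ymm5, covar))
```

In the FMA3 mnemonics the first two digits name the operands that are multiplied and the last digit names the addend, while the first operand is always the destination. The FMA4-style spelling makes the roles explicit in the operand list itself, which is the readability argument being made here.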

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Ganesh Ajjanagadde
On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote:
> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
>> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
>>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>>> gets assembled as FMA3) since normal FMA3 opcodes are ho

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread James Almer
On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>> gets assembled as FMA3) since normal FMA3 opcodes are horrible to
>> read, nobody ever remembers the ordering of op

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Ganesh Ajjanagadde
On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
> gets assembled as FMA3) since normal FMA3 opcodes are horrible to
> read, nobody ever remembers the ordering of operands.

1. It is very easy to remember: take fmadd231

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-14 Thread Henrik Gramner
Use the x86inc syntax for FMA instructions (basically FMA4 syntax that gets assembled as FMA3) since normal FMA3 opcodes are horrible to read, nobody ever remembers the ordering of operands.

Re: [FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-13 Thread Ganesh Ajjanagadde
On Wed, Jan 13, 2016 at 6:59 PM, Ganesh Ajjanagadde wrote:
> This improves accuracy (very slightly) and speed for processors having
> fma3.
>
> Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
> old:
> 5993610 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
> 5951528 decicycles i

[FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls

2016-01-13 Thread Ganesh Ajjanagadde
This improves accuracy (very slightly) and speed for processors having fma3.

Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
old:
5993610 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
5951528 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips
new:
5252410 decicycles
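
The accuracy remark in the patch description is consistent with the general property that a fused multiply-add rounds once, after the addition, whereas a separate multiply and add round twice. The sketch below is a minimal Python illustration of that general property, not code from the patch: `fma_exact` and `mul_then_add` are hypothetical helpers, the fused result is emulated with exact rational arithmetic, and the input values are chosen by hand to make the difference visible. Whether this is exactly the effect the patch author measured is an assumption.

```python
from fractions import Fraction

def fma_exact(a, b, c):
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def mul_then_add(a, b, c):
    """Unfused version: the product is rounded to double before the addition."""
    return a * b + c

# Hand-picked inputs: the exact product a*a has a low-order term that does
# not fit in a double, so it is lost when the product is rounded on its own.
a = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

print(mul_then_add(a, a, c))  # low-order term of the product is rounded away
print(fma_exact(a, a, c))     # low-order term survives the single rounding
```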