On Thu, Jan 14, 2016 at 6:54 PM, Henrik Gramner wrote:
> On Thu, Jan 14, 2016 at 11:47 PM, Ganesh Ajjanagadde wrote:
>> BTW, this is why I personally don't like the macro:
>> so I was moving along, replacing one after the other, till I came to this
>> line
>> vfmadd213pd ymm1, ymm5, COVAR(iq
On Thu, Jan 14, 2016 at 11:47 PM, Ganesh Ajjanagadde wrote:
> BTW, this is why I personally don't like the macro:
> so I was moving along, replacing one after the other, till I came to this line
> vfmadd213pd ymm1, ymm5, COVAR(iq ,1)
> I naturally replace by
> fmaddpd ymm1, ymm1, ymm5, CO
On Thu, Jan 14, 2016 at 11:48 AM, James Almer wrote:
> On 1/14/2016 1:26 PM, Ganesh Ajjanagadde wrote:
>> On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote:
>>> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
> Use the x86inc syn
On Thu, Jan 14, 2016 at 11:48 AM, James Almer wrote:
> On 1/14/2016 1:26 PM, Ganesh Ajjanagadde wrote:
>> On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote:
>>> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
[...]
There is no need
On 1/14/2016 1:26 PM, Ganesh Ajjanagadde wrote:
> On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote:
>> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
>>> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
ge
On Thu, Jan 14, 2016 at 5:26 PM, Ganesh Ajjanagadde wrote:
> readability still no.
"<op> dst, mult1, mult2, add" is significantly more readable
than "<op> src1, src2, src3" where you need
to mentally parse which source operand corresponds to which
mathematical operator depending on the order of the digit
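
To make the operand-order point concrete, here is a minimal sketch that models the two syntaxes with plain integers standing in for vector registers. The function names fma4_style and fma3 are hypothetical and exist only for this illustration; they are not part of x86inc or any assembler. The model assumes the usual FMA3 convention that the three digits in the mnemonic name, in order, the two multiplicands and the addend, with operand 1 doubling as the destination. It ignores the single rounding that a real fused multiply-add performs.

```python
# Illustrative model only: integers stand in for vector registers, and the
# single-rounding behaviour of a real fused multiply-add is not modelled.

def fma4_style(mult1, mult2, add):
    """Four-operand form: the returned dst is mult1 * mult2 + add."""
    return mult1 * mult2 + add


def fma3(order, op1, op2, op3):
    """FMA3 form such as vfmadd132 / vfmadd213 / vfmadd231.

    `order` is the three-digit suffix. Its digits select, in turn, the first
    multiplicand, the second multiplicand and the addend from (op1, op2, op3).
    The result would be written back to op1 in the real instruction.
    """
    operands = {"1": op1, "2": op2, "3": op3}
    first, second, addend = (operands[digit] for digit in order)
    return first * second + addend


if __name__ == "__main__":
    op1, op2, op3 = 2, 3, 5
    for order in ("132", "213", "231"):
        print(f"vfmadd{order}: {fma3(order, op1, op2, op3)}")
    # The 213 form computes op2 * op1 + op3, which matches the four-operand
    # form with mult1 = op1, mult2 = op2, add = op3.
    print("four-operand:", fma4_style(op1, op2, op3))
```

With the same three inputs, each digit suffix produces a different result, which is the mental parsing step that the four-operand spelling removes.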
On Thu, Jan 14, 2016 at 11:16 AM, James Almer wrote:
> On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
>> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
>>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>>> gets assembled as FMA3) since normal FMA3 opcodes are ho
On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>> gets assembled as FMA3) since normal FMA3 opcodes are horrible to
>> read, nobody ever remembers the ordering of op
On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner wrote:
> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
> gets assembled as FMA3) since normal FMA3 opcodes are horrible to
> read, nobody ever remembers the ordering of operands.
1. It is very easy to remember: take fmadd231
Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
gets assembled as FMA3) since normal FMA3 opcodes are horrible to
read, nobody ever remembers the ordering of operands.
On Wed, Jan 13, 2016 at 6:59 PM, Ganesh Ajjanagadde wrote:
> This improves accuracy (very slightly) and speed for processors having
> fma3.
>
> Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
> old:
> 5993610 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
> 5951528 decicycles i
This improves accuracy (very slightly) and speed for processors having
fma3.
Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
old:
5993610 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
5951528 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips
new:
5252410 decicycles