Hello Rodrigo and all,
I'm returning to my old problem concerning the alignment of vector variables.
I noticed that on x86, vector locals are aligned on an 8-byte boundary instead
of a 16-byte one, which forces the use of 'movups' instead of the much more
efficient 'movaps'.
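
To make the alignment requirement concrete, here is a minimal C sketch of the arithmetic involved. The helper names align_up and is_aligned_16 are hypothetical and are not taken from Mono's sources; they only illustrate how a stack offset could be rounded up to a 16-byte boundary and how one could check whether an address qualifies for an aligned access such as 'movaps'.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a stack offset up to the next multiple of
 * `align`, which must be a power of two. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: nonzero when `addr` sits on a 16-byte boundary,
 * i.e. when an aligned 16-byte load/store (such as movaps) is legal.
 * Otherwise an unaligned form (such as movups) has to be used. */
static int is_aligned_16(uintptr_t addr)
{
    return (addr & 15u) == 0;
}
```

With these helpers, an offset of 8 rounds up to 16, so a local that would otherwise land on an 8-byte boundary is moved to a slot suitable for aligned access.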
On PowerPC there is no such "bug", so I tri
The way to handle those situations is to have an arch decomposition pass that
converts MULPS into a VZERO + MULADD.
For bonus points, you can add to the arch peephole code to fuse MULPS +
ADDPS.
For an example of that, take a look at mini-x86.c /
mono_arch_decompose_opts.
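
As a rough illustration of the decomposition idea, the following C sketch rewrites a multiply into a zeroing instruction followed by a fused multiply-add. It is not Mono's code: the struct ins type, the opcode enum, and decompose_mulps are hypothetical stand-ins for the JIT's real instruction representation, and the sketch assumes virtual registers can be allocated by bumping a counter.

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration; not Mono's actual definitions. */
enum opcode { OP_MULPS, OP_VZERO, OP_MULADD, OP_OTHER };

/* Hypothetical instruction record: dreg <= op(sreg1, sreg2, sreg3). */
struct ins {
    enum opcode op;
    int dreg, sreg1, sreg2, sreg3;
};

/* Copy `n` instructions from `in` to `out`, replacing each MULPS with
 *   VZERO  tmp
 *   MULADD dreg <= sreg1, sreg2, tmp
 * `out` must have room for 2 * n entries. Returns the count written. */
static size_t decompose_mulps(const struct ins *in, size_t n,
                              struct ins *out, int *next_vreg)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].op == OP_MULPS) {
            int zero = (*next_vreg)++;  /* fresh temporary for the addend */
            out[k++] = (struct ins){ OP_VZERO, zero, -1, -1, -1 };
            out[k++] = (struct ins){ OP_MULADD, in[i].dreg,
                                     in[i].sreg1, in[i].sreg2, zero };
        } else {
            out[k++] = in[i];           /* leave other opcodes untouched */
        }
    }
    return k;
}
```

The sketch zeroes a fresh temporary rather than the destination register, so a source operand is not clobbered when the destination happens to be the same register as one of the sources.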
Rodrigo
On Tue, Feb 9, 2
Hi,
Now I'm stuck with another problem on PPC. For multiplication of floats,
AltiVec has only a fused multiply-add instruction, which computes a*b+c. So
in order to implement OP_MULPS I need to ensure c==0. The only solution that
comes to mind is:
XZERO D
MULADD D <= S1, S2, D
Where MULADD is the instruction and
Hi Sergei,
On Tue, Feb 2, 2010 at 6:59 AM, Sergei Dyshel wrote:
> Hello all,
>
> I'm currently working on a PowerPC port of Mono which utilizes AltiVec SIMD
> instructions. During the development I've encountered an alignment problem:
>
> As far as I understood from running Mono's JIT, stack-alloca
Hello all,
I'm currently working on a PowerPC port of Mono which utilizes AltiVec SIMD
instructions. During the development I've encountered an alignment problem:
As far as I understood from running Mono's JIT, stack-allocated
Mono.Simd.Vector* types are always aligned on a 16-byte boundary, but global