Even if inline assembly does not support AVX-512, you can still use NASM
and link the object file externally. Anyway, there are only six
instructions you need to use:
VMOVUPS zmm1,[memory location]  ; loads 16 floats from memory location
                                ; into zmm1
VMULPS zmm1,zmm2,zmm3  ; multiplies the 16 floats in zmm2 by the 16 floats
                       ; in zmm3 and stores the 16 results in zmm1
VADDPS zmm1,zmm2,zmm3  ; adds the 16 floats in zmm2 to the 16 floats in
                       ; zmm3 and stores the 16 results in zmm1
VSUBPS zmm1,zmm2,zmm3  ; subtracts the 16 floats in zmm3 from the 16 floats
                       ; in zmm2 and stores the 16 results in zmm1
VDIVPS zmm1,zmm2,zmm3  ; divides the 16 floats in zmm2 by the 16 floats in
                       ; zmm3 and stores the 16 results in zmm1
VMOVUPS [memory location],zmm1  ; stores the 16 floats from zmm1 to memory
                                ; location
A zmm register is 512 bits wide, so it holds 16 single-precision floats or
8 doubles. For double precision, use the PD forms (VMOVUPD, VMULPD, VADDPD,
VSUBPD, VDIVPD), which work on 8 doubles at a time.
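To check the output of such a NASM routine, a plain C reference can help.
This is only a minimal sketch: the function name muladd_ref, the expression
a*b+c and the lanes parameter are made up for illustration. It walks the
arrays in register-sized chunks (16 floats per zmm register, or 8 for
doubles) and reports how many chunks a SIMD version would need, including a
partial last one.

```c
#include <stddef.h>

/* Hypothetical scalar reference: out[i] = a[i] * b[i] + c[i].
   "lanes" is the number of values per register (16 floats per zmm).
   Returns the number of register-sized steps a SIMD loop would take. */
size_t muladd_ref(float *out, const float *a, const float *b,
                  const float *c, size_t n, size_t lanes)
{
    size_t steps = 0;
    for (size_t base = 0; base < n; base += lanes) {
        /* The last chunk may be shorter than a full register. */
        size_t end = (base + lanes < n) ? base + lanes : n;
        for (size_t i = base; i < end; i++)
            out[i] = a[i] * b[i] + c[i];  /* VMULPS, then VADDPS */
        steps++;
    }
    return steps;
}
```

For 100 values this returns 7 with lanes = 16 and 13 with lanes = 8. The
last step is partial in both cases, so the assembly version needs either
padded buffers or a masked load/store for the tail.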
VMOVAPS can be a bit faster than VMOVUPS, but the data must be at addresses
divisible by 64; otherwise the instruction faults.
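One way to get 64-byte aligned buffers on the C side is aligned_alloc from
C11. A minimal sketch, assuming your C library provides it; the helper names
alloc_floats_64 and is_aligned_64 are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n floats on a 64-byte boundary.
   aligned_alloc expects the size to be a multiple of the alignment, so
   the byte count is rounded up. Release the buffer with free(). */
float *alloc_floats_64(size_t n)
{
    size_t bytes = (n * sizeof(float) + 63) / 64 * 64;
    return aligned_alloc(64, bytes);
}

/* Returns 1 if p is divisible by 64, as VMOVAPS on a zmm register needs. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}
```

Rounding the size up also leaves the buffer padded to a whole number of
registers, which is convenient when the element count is not a multiple of
16.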
Check if your PC supports AVX-512. Many Xeon processors support it (for
example the Xeon Scalable family), Pentium and Celeron usually do not, and
Core processors may or may not, depending on the model.
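The check can also be done at run time. A minimal sketch, assuming the
helper is compiled with GCC or Clang on x86, since it relies on their
__builtin_cpu_supports builtin. I am not assuming tcc provides that builtin,
so the sketch returns -1 there. The name has_avx512f is made up for
illustration.

```c
/* Returns 1 if the builtin reports AVX-512 Foundation support, 0 if it
   does not, and -1 if this compiler or target cannot answer. */
int has_avx512f(void)
{
#if defined(__GNUC__) && !defined(__TINYC__) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* harmless if already initialised */
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return -1;             /* assumption: no such builtin available here */
#endif
}
```

Calling this once at start-up lets the program fall back to a scalar loop
when the result is not 1, instead of crashing on an illegal instruction.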



On Sun, Feb 6, 2022 at 9:02 AM Yair Lenga <yair.le...@gmail.com> wrote:

> Thank you for the feedback. I understand what the limits of tcc are. In my
> specific problem, I am trying to speed up a user-provided expression in a
> simulation of 100 paths. Can I use the avx512 built-ins - e.g. work on 8
> double precision values with one operation - practically reducing the 100
> evaluations to 13 (100/8)?
>
> User expressions are all in a form that can be handled by AVX SIMD
> instructions: add, multiply, …
>
> Thanks, yair.
>
> Sent from my iPad
> _______________________________________________
> Tinycc-devel mailing list
> Tinycc-devel@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/tinycc-devel
>