Even if inline assembly does not support AVX-512, you can still use NASM and link the object file externally.

Anyway, there are only six instructions you need to use:

  VMOVUPS zmm1,[memory location]  ; loads 16 floats from memory location into zmm1
  VMULPS  zmm1,zmm2,zmm3          ; multiplies the 16 floats in zmm2 by the 16 floats in zmm3 and stores the results in zmm1
  VADDPS  zmm1,zmm2,zmm3          ; adds the 16 floats in zmm2 to the 16 floats in zmm3 and stores the results in zmm1
  VSUBPS  zmm1,zmm2,zmm3          ; subtracts the 16 floats in zmm3 from the 16 floats in zmm2 and stores the results in zmm1
  VDIVPS  zmm1,zmm2,zmm3          ; divides the 16 floats in zmm2 by the 16 floats in zmm3 and stores the results in zmm1
  VMOVUPS [memory location],zmm1  ; stores 16 floats from zmm1 to memory location

A 512-bit zmm register holds 16 single-precision floats or 8 doubles. The PS forms above work on floats; for the 8 double-precision values you mention, use the PD forms instead (VMOVUPD, VMULPD, VADDPD, VSUBPD, VDIVPD).

VMOVAPS can be a bit faster than VMOVUPS, but the data must be at addresses divisible by 64.

Check whether your PC supports AVX-512. Recent Xeon server processors generally support it, Pentium and Celeron usually do not, and Core processors may or may not.
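For what it's worth, here is a minimal sketch in C of how the load / multiply / add / store sequence and the support check could fit together for a 100-path case with doubles. It is only an illustration under some assumptions: this one file is built with GCC or Clang on x86 (not with tcc) and linked in afterwards, much like the NASM route; the expression a*b+c merely stands in for a user expression; and the names muladd, muladd_avx512, muladd_scalar and have_avx512f are made up for this example. If AVX-512 is not available at run time, it falls back to a plain loop.

```c
#include <stddef.h>

/* Assumption: GCC or Clang on x86. Elsewhere only the scalar path is built. */
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Plain C reference: out[i] = a[i] * b[i] + c[i]. */
static void muladd_scalar(const double *a, const double *b, const double *c,
                          double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}

#if HAVE_X86_DISPATCH
/* The target attribute lets this one function use AVX-512F even when the
 * rest of the file is compiled without -mavx512f. */
__attribute__((target("avx512f")))
static void muladd_avx512(const double *a, const double *b, const double *c,
                          double *out, size_t n)
{
    size_t i = 0;

    /* Full blocks of 8 doubles (one zmm register each). */
    for (; i + 8 <= n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);        /* unaligned load, VMOVUPD */
        __m512d vb = _mm512_loadu_pd(b + i);
        __m512d vc = _mm512_loadu_pd(c + i);
        /* Multiply then add; the compiler may fuse these into one FMA. */
        __m512d vr = _mm512_add_pd(_mm512_mul_pd(va, vb), vc);
        _mm512_storeu_pd(out + i, vr);              /* unaligned store, VMOVUPD */
    }

    /* Leftover elements (fewer than 8) are handled one at a time. */
    for (; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
#endif

/* Returns nonzero if the running CPU reports AVX-512F support. */
int have_avx512f(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return 0;
#endif
}

/* Picks the AVX-512 path when available, otherwise the scalar loop. */
void muladd(const double *a, const double *b, const double *c,
            double *out, size_t n)
{
#if HAVE_X86_DISPATCH
    if (have_avx512f()) {
        muladd_avx512(a, b, c, out, n);
        return;
    }
#endif
    muladd_scalar(a, b, c, out, n);
}
```

Calling muladd(a, b, c, out, 100) on a machine with AVX-512 runs 12 full 8-wide iterations (96 values) and then handles the remaining 4 values in the scalar tail. A masked load and store could replace that tail with a 13th vector step, at the cost of a little more code.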
On Sun, Feb 6, 2022 at 9:02 AM Yair Lenga <yair.le...@gmail.com> wrote:

> Thank you for feedback. I understand what are the limits of tcc. In my
> specific problem, I am trying to speed up user-provided expression in a
> simulation of 100 paths. Can I use the avx512 build-in - e.g. work on 8
> double precision values with one operation - practically reducing the 100
> evaluations to 13 (100/8) ?
>
> User expressions are all in the form that can be handle by AVX SIMD
> instructions: add, multiple, …
>
> Thanks, yair.
>
> Sent from my iPad
> _______________________________________________
> Tinycc-devel mailing list
> Tinycc-devel@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/tinycc-devel
>