Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Yair Lenga
Thank you for the feedback. I understand the limits of tcc. In my specific problem, I am trying to speed up a user-provided expression in a simulation of 100 paths. Can I use the AVX-512 built-ins, e.g. work on 8 double-precision values with one operation, practically reducing the 100 evaluat
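To make the lane arithmetic in the question concrete, here is a minimal sketch in plain C, not AVX-512 intrinsics. It walks 100 paths in chunks of 8 doubles, which is how many fit in one 512-bit register, so the outer loop runs 13 times (12 full chunks plus a remainder of 4). The expression `a*x + b` and the names `eval_batch`, `eval_paths`, and `LANES` are hypothetical placeholders for the user-provided expression. tcc does not auto-vectorize, so with tcc the inner loop stays scalar; the sketch only illustrates the data layout that a hand-written SIMD kernel would consume.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* assumption: one 512-bit register holds 8 doubles */

/* Hypothetical user expression y = a*x + b, applied to one chunk.
 * A hand-written SIMD kernel would replace this scalar loop. */
static void eval_batch(const double *x, double *y, size_t count,
                       double a, double b)
{
    assert(count <= LANES);
    for (size_t i = 0; i < count; i++)
        y[i] = a * x[i] + b;
}

/* Evaluate n paths in chunks of LANES.
 * Returns the number of chunk evaluations performed. */
size_t eval_paths(const double *x, double *y, size_t n, double a, double b)
{
    assert(x != NULL && y != NULL);
    size_t calls = 0;
    for (size_t i = 0; i < n; i += LANES) {
        /* The last chunk may be shorter than LANES. */
        size_t count = (n - i < LANES) ? n - i : LANES;
        eval_batch(x + i, y + i, count, a, b);
        calls++;
    }
    return calls;
}
```

Calling `eval_paths` with `n = 100` performs 13 chunk evaluations instead of 100 scalar ones; whether that becomes a real speedup depends on the chunk body actually using SIMD instructions.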

Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Elijah Stone
On Sat, 5 Feb 2022, Samir Ribić via Tinycc-devel wrote: However, it is possible at library level. Simply write matrix manipulation functions in assembly language. Inside the functions, load data into AVX registers, do the calculations in registers, and at exit write the result into memory. I think

Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Samir Ribić via Tinycc-devel
Single-pass compilers can still have peephole optimization. I once demonstrated this with the case where tcc generates MOV EAX,const followed by MOV [location],EAX, and caught that case to generate MOV DWORD [location],const instead. Back to the topic: AVX-512 is difficult to support at the language level. In the case of
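The peephole rewrite described above can be sketched on a toy instruction list. This is not tcc's code generator: the `Insn` struct, the `Op` enum, and the `peephole` function are hypothetical names made up for illustration. The sketch also assumes the register is not read again after the pair; a real pass would have to confirm that before dropping the register load.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds (hypothetical, not tcc internals). */
typedef enum { MOV_REG_IMM, MOV_MEM_REG, MOV_MEM_IMM, OTHER } Op;

typedef struct {
    Op  op;
    int reg;   /* register number, used by MOV_REG_IMM and MOV_MEM_REG */
    int addr;  /* memory location, used by MOV_MEM_REG and MOV_MEM_IMM */
    int imm;   /* constant, used by MOV_REG_IMM and MOV_MEM_IMM */
} Insn;

/* Fuse "MOV reg,const ; MOV [addr],reg" into "MOV [addr],const".
 * Assumption: reg is dead after the pair (not checked here).
 * Rewrites code in place and returns the new instruction count. */
size_t peephole(Insn *code, size_t n)
{
    assert(code != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n &&
            code[i].op == MOV_REG_IMM &&
            code[i + 1].op == MOV_MEM_REG &&
            code[i].reg == code[i + 1].reg) {
            Insn fused = { MOV_MEM_IMM, -1, code[i + 1].addr, code[i].imm };
            code[out++] = fused;
            i++;  /* skip the second instruction of the matched pair */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

Because the pass looks only at a two-instruction window, it fits a single-pass compiler: it needs no knowledge of the rest of the function beyond the liveness assumption stated above.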

Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Christian Jullien
An optimizing compiler needs several passes to operate: constant folding, register allocation, peephole optimization, branch prediction, ... When it knows the target, it can reorganize code to keep as much data as possible in the L1 cache and have the longest series of instructions that can be execut

Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread rempas via Tinycc-devel
5 Feb 2022, 11:01, from eli...@orange.fr: > The price to pay for its really fast compilation is that the generated code is > poor compared to gcc, clang or vc++ (among others). Depending on your > program, consider it is roughly 2 to 4x slower. > I would say that this is not always the case. And corre

Re: [Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Christian Jullien
Hi, I speak only for myself. I'm not sure tcc is the right target for you. We would all love to have tcc generate fast code, but the two main goals of tcc are C compliance and FAST compilation. The price to pay for its really fast compilation is that the generated code is poor compared to gcc, clang

[Tinycc-devel] Optimizing for avx512

2022-02-05 Thread Yair Lenga
I have a project which runs a user simulation: time-consuming, user-defined code. I hope that performance can be improved using SIMD instructions. What optimization level is supported by libtcc? Can it generate optimized code for AVX512? See 4.x. The documentation indicates optimization