Fused multiply-add instruction

2024-03-04 Thread mratsim
In general, for high-performance computing, you might as well use SIMD directly: see `fmadd` ukernel_generator( x86_AVX

2024-02-23 Thread arnetheduck
- FMA will be used if there is a suitable instruction for it; `-march=haswell`, for example, has one. Regarding rounding, IEEE 754 allows the use of FMA even if the rounding differs:

2024-02-22 Thread dlesnoff
Thanks for the additional information about CPUs and for your opinion on the question. I do not think that `-march=native` is mandatory for FMAs. The JS backend is problematic: we probably have to either emulate the FMA or replace it with an axpy (a separately rounded multiply and add). I would go the first route. I have opened an
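
For the emulation route, a software FMA usually starts from an error-free transformation of the product. The sketch below is Dekker's TwoProduct with Veltkamp splitting, written in plain Nim so it compiles for the JS backend too (JavaScript numbers are IEEE 754 doubles). It assumes round-to-nearest, no overflow or underflow, and, on the C backend, no automatic FMA contraction by the C compiler. The names `split` and `twoProduct` are hypothetical, and this is only a building block, not a complete `fma`.

```nim
# Sketch of an error-free product (Dekker's TwoProduct with Veltkamp splitting)
# in plain Nim, so it compiles for both the C and JS backends.
# Assumptions: round-to-nearest float64, no overflow or underflow, and (on the
# C backend) no automatic FMA contraction, e.g. GCC/Clang with -ffp-contract=off.
# `split` and `twoProduct` are hypothetical names, not a stdlib API.

const splitter = 134217729.0  # 2^27 + 1: splits a 53-bit significand in two

proc split(a: float64): tuple[hi, lo: float64] =
  ## Veltkamp split: a == hi + lo exactly, each part at most 26 significant bits.
  let c = splitter * a
  let hi = c - (c - a)
  result = (hi: hi, lo: a - hi)

proc twoProduct(a, b: float64): tuple[p, e: float64] =
  ## p is the rounded product fl(a*b); e is its rounding error, so a*b == p + e.
  let p = a * b
  let (ah, al) = split(a)
  let (bh, bl) = split(b)
  # The partial products of the halves are exact; summing them in this order
  # recovers exactly what the rounding of `p` discarded.
  let e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
  result = (p: p, e: e)

when isMainModule:
  let eps = 1.0 / 4503599627370496.0  # 2^-52, exact
  let (p, e) = twoProduct(1.0 + eps, 1.0 - eps)
  echo p  # 1.0: the exact product 1 - 2^-104 rounds to 1
  echo e  # -2^-104: the tail lost by that rounding
```

Combining `p`, `e` and the addend into a single correctly rounded result takes further steps (an error-free sum and care with double rounding), which is why emulation is more work than the axpy route.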

2024-02-22 Thread awr1
There are many (recent!) x86 and ARM processors that do not support true single-rounding FMAs, e.g. Celerons. Implicitly setting `-march=native` would also inherently make many Nim programs non-portable by design, which is an assumption we really do not want to make. The libc implementation does implicitl

2024-02-22 Thread dlesnoff
I wonder how one would use the fused multiply-add operation defined by the IEEE 754-2008 specification in Nim. Right now, the only approach I have found is this snippet (slightly modified) from a nine-year-old GitHub gist: `{.passC: "-march=native".} proc fma(x,y,z: float32): float32 {.impor`
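
One possible approach on the C or C++ backend, sketched under the assumption of a C99 libm: bind `fma`/`fmaf` from `<math.h>` with `importc` instead of forcing `-march=native`. C99 specifies `fma` as a single rounding, and common implementations such as glibc fall back to a software path when the CPU has no FMA instruction, so the binding stays portable. `cFma`, `cFmaf` and `mulThenAdd` are hypothetical names; the demo compares the fused result with a separately rounded multiply and add.

```nim
# Sketch: bind C99's fma()/fmaf() from <math.h> on Nim's C/C++ backends,
# without -march=native. `cFma`, `cFmaf` and `mulThenAdd` are hypothetical
# names (chosen so they cannot clash with an `fma` in your Nim's std/math).

when defined(js):
  {.error: "importc from <math.h> needs the C or C++ backend".}

when defined(posix):
  {.passL: "-lm".}  # fma() lives in libm on most POSIX systems

# x*y + z with a single rounding (IEEE 754-2008 fusedMultiplyAdd).
proc cFma(x, y, z: float64): float64 {.importc: "fma", header: "<math.h>".}
proc cFmaf(x, y, z: float32): float32 {.importc: "fmaf", header: "<math.h>".}

proc mulThenAdd(x, y, z: float64): float64 =
  ## Two roundings, for comparison. The volatile temporary stops the C
  ## compiler from contracting the multiply and add into an FMA on its own.
  var p {.volatile.} = x * y
  result = p + z

when isMainModule:
  let eps = 1.0 / 4503599627370496.0  # 2^-52, exact
  let x = 1.0 + eps
  let y = 1.0 - eps
  # The exact product is 1 - 2^-104; rounding it first loses the 2^-104 tail.
  echo mulThenAdd(x, y, -1.0)  # 0.0
  echo cFma(x, y, -1.0)        # -2^-104 (about -4.93e-32)
```

Without hardware FMA the call goes through libm's software path and is slower than a plain multiply and add; passing an explicit flag such as `--passC:-mfma` lets GCC or Clang emit the instruction instead, with the portability trade-off discussed in the thread.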