Re: [C++] Replacing xsimd with compiler autovectorization

2022-04-04 Thread Antoine Pitrou
On 03/04/2022 at 21:38, Sasha Krassovsky wrote: There is concrete proof that autovectorization produces very flimsy results (even on the same compiler, simply by varying the datatypes). As I’ve shown, the Vector-Vector Add kernel example is consistently vectorized well across compilers

Re: [C++] Replacing xsimd with compiler autovectorization

2022-04-03 Thread Sasha Krassovsky
> It would be a very significant contributor, as the inconsistency can manifest > under the form of up to 8-fold differences in performance (or perhaps more). This is on a micro benchmark. For a user workload, the kernel will account for maybe 20% of the runtime, so even if the kernel gets 10x f

Re: [C++] Replacing xsimd with compiler autovectorization

2022-04-03 Thread Antoine Pitrou
On 01/04/2022 at 08:43, Sasha Krassovsky wrote: I agree that a potential inconsistent experience is a problem, but I disagree that SIMD would be the root of the problem, or even be a significant contributor to it. It would be a very significant contributor, as the inconsistency can manifes

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-31 Thread Sasha Krassovsky
I agree that a potential inconsistent experience is a problem, but I disagree that SIMD would be the root of the problem, or even be a significant contributor to it. The problem is essentially: "How can we be sure that all compilers will generate good code on all platforms?" As you said, we have a

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-31 Thread Antoine Pitrou
On 31/03/2022 at 09:19, Sasha Krassovsky wrote: As I showed, those auto-vectorized kernels may be vectorized only in some situations, depending on the compiler version, the input datatypes... I would more than anything interpret the fact that that code was vectorized at all as an amazing

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-31 Thread Sasha Krassovsky
> As I showed, those auto-vectorized kernels may be vectorized only in some > situations, depending on the compiler version, the input datatypes... I would more than anything interpret the fact that that code was vectorized at all as an amazing win for compiler technology, as it’s a very abstrac

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Sasha Krassovsky
Yep, a suffix _simd would make perfect sense too. Sasha > On March 30, 2022, at 22:58, Micah Kornfield wrote: > > >> >> >> As for a naming convention, we could use something like the prefix `simd_`? > > Bikeshedding moment: we use suffixes today for instruction sets; would it > make se

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Antoine Pitrou
On 31/03/2022 at 01:24, Weston Pace wrote: Apologies if this is an oversimplified opinion, but does it have to be one or the other? If people contribute and maintain XSIMD kernels then great. If people contribute and maintain auto-vectorizable kernels then great. Then it just comes down to

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Micah Kornfield
> > As for a naming convention, we could use something like the prefix `simd_`? Bikeshedding moment: we use suffixes today for instruction sets; would it make sense to continue that for consistency? `scalar_arithmetic_simd.cc`? On Wed, Mar 30, 2022 at 4:58 PM Sasha Krassovsky wrote: > Yep, tha

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Sasha Krassovsky
Yep, that's basically what I'm suggesting. If someone contributes an xsimd kernel that's faster than the autovectorized kernel, then it'll be seamless to switch. The xsimd and autovectorized kernels would share the same source file, so anyone contributing an xsimd kernel would just have to change t

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Weston Pace
Apologies if this is an oversimplified opinion, but does it have to be one or the other? If people contribute and maintain XSIMD kernels then great. If people contribute and maintain auto-vectorizable kernels then great. Then it just comes down to having consistent dispatch rules. Something lik

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Micah Kornfield
> > We were not going to use xsimd's dynamic dispatch, and instead roll our own There is already a dynamic dispatch facility; see arrow/util/dispatch.h. Since we're rolling our own dynamic dispatch, we'd still have to compile > the same source file several times with different, so my proposal does

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Sasha Krassovsky
> Looking at the disassembly, the int16 and int32 versions are unrolled and vectorized by clang 12.0, the int8 and int64 are not... I think a big part of this is how fragile the current kernel system implementation is. It seems to rely on templating lots of different parts of kernels and hoping com
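
For readers who want to repeat this kind of disassembly comparison, here is a minimal sketch of a templated add kernel instantiated for the four integer widths mentioned. `AddKernel` is a hypothetical name, not the kernel from the Arrow source, and the sketch makes no claim about which instantiations a given compiler will vectorize.

```cpp
#include <cstddef>
#include <cstdint>

// Element-wise add over any integer width. For int8_t and int16_t the
// operands are promoted to int before the addition, so the result is
// narrowed explicitly on store.
template <typename T>
void AddKernel(const T* a, const T* b, T* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = static_cast<T>(a[i] + b[i]);
  }
}

// Explicit instantiations, so each width is emitted as its own symbol
// and can be inspected separately in the disassembly.
template void AddKernel<std::int8_t>(const std::int8_t*, const std::int8_t*,
                                     std::int8_t*, std::size_t);
template void AddKernel<std::int16_t>(const std::int16_t*, const std::int16_t*,
                                      std::int16_t*, std::size_t);
template void AddKernel<std::int32_t>(const std::int32_t*, const std::int32_t*,
                                      std::int32_t*, std::size_t);
template void AddKernel<std::int64_t>(const std::int64_t*, const std::int64_t*,
                                      std::int64_t*, std::size_t);
```

Compiling this file with optimizations enabled and dumping the assembly for each instantiation is one way to check, per compiler and per datatype, whether the loop was unrolled and vectorized.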

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Johan Mabille
Hi all, xsimd core developer here writing on behalf of the xsimd core team ;) I just wanted to add some elements to this thread: - xsimd is more than a library that wraps simple intrinsics. It provides vectorized (and accurate) implementations of the traditional mathematical functions (exp, sin,

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-30 Thread Antoine Pitrou
Hi Sasha, On 30/03/2022 at 00:14, Sasha Krassovsky wrote: I've noticed that we include xsimd as an abstraction over all of the simd architectures. I'd like to propose a different solution which would result in fewer lines of code, while being more readable. My thinking is that anything simp

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-29 Thread Sasha Krassovsky
20% is acceptable IMO). And we do > > > support runtime dispatch kernels based on target machine arch. > > > > > > Then what is left to talk is how to deal with codes that are not > > > auto-vectorizable but can be manually optimized with simd instructions. > > > Look

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-29 Thread Micah Kornfield
do want to > > manually tune the code, I believe a simd library is the best way. > > > > To me there's no "replacing" between xsimd and auto-vectorization, they > > just do their own jobs. > > > > Yibo > > > > -Original Messag

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-29 Thread Sasha Krassovsky
ry is the best way. > > To me there's no "replacing" between xsimd and auto-vectorization, they > just do their own jobs. > > Yibo > > -Original Message- > From: Sasha Krassovsky > Sent: Wednesday, March 30, 2022 6:58 AM > To: dev@arrow.apache.org; emkornfi.

RE: [C++] Replacing xsimd with compiler autovectorization

2022-03-29 Thread Yibo Cai
al Message- From: Sasha Krassovsky Sent: Wednesday, March 30, 2022 6:58 AM To: dev@arrow.apache.org; emkornfi...@gmail.com Subject: Re: [C++] Replacing xsimd with compiler autovectorization xsimd has three problems I can think of right now: 1) xsimd code looks like normal simd code: you have to

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-29 Thread Sasha Krassovsky
xsimd has three problems I can think of right now: 1) xsimd code looks like normal simd code: you have to explicitly do loads and stores, you have to explicitly unroll and stride through your loop, and you have to explicitly process the tail of the loop. This makes writing a large number of kernels

Re: [C++] Replacing xsimd with compiler autovectorization

2022-03-29 Thread Micah Kornfield
Hi Sasha, Could you elaborate on the problems of the XSIMD dependency? What you describe sounds a lot like what XSIMD provides in a prepackaged form and without the extra CMake magic. I have to occasionally build Arrow with an external build system and it sounds like this type of logic could add

[C++] Replacing xsimd with compiler autovectorization

2022-03-29 Thread Sasha Krassovsky
Hi everyone, I've noticed that we include xsimd as an abstraction over all of the simd architectures. I'd like to propose a different solution which would result in fewer lines of code, while being more readable. My thinking is that anything simple enough to abstract with xsimd can be autovectoriz