On 03/04/2022 at 21:38, Sasha Krassovsky wrote:
There is concrete proof that autovectorization produces very flimsy results
(even on the same compiler, simply by varying the datatypes).
As I’ve shown, the Vector-Vector Add kernel example is consistently vectorized
well across compilers
> It would be a very significant contributor, as the inconsistency can manifest
> in the form of up to 8-fold differences in performance (or perhaps more).
This is on a microbenchmark. For a user workload, the kernel will account for
maybe 20% of the runtime, so even if the kernel gets 10x faster
On 01/04/2022 at 08:43, Sasha Krassovsky wrote:
> I agree that a potential inconsistent experience is a problem, but I
> disagree that SIMD would be the root of the problem, or even be a
> significant contributor to it.
It would be a very significant contributor, as the inconsistency can
manifest in the form of up to 8-fold differences in performance (or
perhaps more).
I agree that a potential inconsistent experience is a problem, but I
disagree that SIMD would be the root of the problem, or even be a
significant contributor to it.
The problem is essentially: "How can we be sure that all compilers will
generate good code on all platforms?" As you said, we have a
On 31/03/2022 at 09:19, Sasha Krassovsky wrote:
> As I showed, those auto-vectorized kernels may be vectorized only in some
> situations, depending on the compiler version, the input datatypes...
I would more than anything interpret the fact that that code was vectorized at
all as an amazing win for compiler technology, as it's a very abstract
Yep, a suffix `_simd` would make perfect sense too.
Sasha
> On March 30, 2022, at 22:58, Micah Kornfield wrote:
>> As for a naming convention, we could use something like the prefix `simd_`?
>
> Bikeshedding moment: we use suffixes today for instruction sets; would it
> make sense to continue that for consistency?
On 31/03/2022 at 01:24, Weston Pace wrote:
> Apologies if this is an over-simplified opinion, but does it have to be one
> or the other?
> If people contribute and maintain XSIMD kernels then great.
> If people contribute and maintain auto-vectorizable kernels then great.
> Then it just comes down to having consistent dispatch rules.
>
> As for a naming convention, we could use something like the prefix `simd_`?
Bikeshedding moment: we use suffixes today for instruction sets; would it
make sense to continue that for consistency?
`scalar_arithmetic_simd.cc`?
On Wed, Mar 30, 2022 at 4:58 PM Sasha Krassovsky wrote:
> Yep, that's basically what I'm suggesting.
Yep, that's basically what I'm suggesting. If someone contributes an xsimd
kernel that's faster than the autovectorized kernel, then it'll be seamless
to switch.
The xsimd and autovectorized kernels would share the same source file, so
anyone contributing an xsimd kernel would just have to change t
Apologies if this is an over simplified opinion but does it have to be one
or the other?
If people contribute and maintain XSIMD kernels then great.
If people contribute and maintain auto-vectorizable kernels then great.
Then it just comes down to having consistent dispatch rules. Something
like
>
> We were not going to use xsimd's dynamic dispatch, and instead roll our own
There is already a dynamic dispatch facility; see arrow/util/dispatch.h.
Since we're rolling our own dynamic dispatch, we'd still have to compile
the same source file several times with different, so my proposal does
> Looking at the disassembly, the int16 and int32 versions are unrolled and
> vectorized by clang 12.0, the int8 and int64 are not...
I think a big part of this is how fragile the current kernel system
implementation is. It seems to rely on templating lots of different parts
of kernels and hoping com
Hi all,
xsimd core developer here writing on behalf of the xsimd core team ;)
I just wanted to add some elements to this thread:
- xsimd is more than a library that wraps simple intrinsics. It provides
vectorized (and accurate) implementations of the traditional mathematical
functions (exp, sin,
Hi Sasha,
On 30/03/2022 at 00:14, Sasha Krassovsky wrote:
> I've noticed that we include xsimd as an abstraction over all of the simd
> architectures. I'd like to propose a different solution which would result
> in fewer lines of code, while being more readable.
> My thinking is that anything simple enough to abstract with xsimd can be
> autovectorized
20% is acceptable IMO). And we do
> > > support runtime dispatch kernels based on target machine arch.
> > >
> > > Then what is left to talk is how to deal with codes that are not
> > > auto-vectorizable but can be manually optimized with simd instructions.
> > > Look
do want to
> > manually tune the code, I believe a simd library is the best way.
> >
> > To me there's no "replacing" between xsimd and auto-vectorization, they
> > just do their own jobs.
> >
> > Yibo
> >
> > -Original Message-
library is the best way.
>
> To me there's no "replacing" between xsimd and auto-vectorization, they
> just do their own jobs.
>
> Yibo
>
> -Original Message-
> From: Sasha Krassovsky
> Sent: Wednesday, March 30, 2022 6:58 AM
> To: dev@arrow.apache.org; emkornfi.
-Original Message-
From: Sasha Krassovsky
Sent: Wednesday, March 30, 2022 6:58 AM
To: dev@arrow.apache.org; emkornfi...@gmail.com
Subject: Re: [C++] Replacing xsimd with compiler autovectorization
xsimd has three problems I can think of right now:
1) xsimd code looks like normal simd code: you have to explicitly do loads
and stores, you have to explicitly unroll and stride through your loop, and
you have to explicitly process the tail of the loop. This makes writing a
large number of kernels
Hi Sasha,
Could you elaborate on the problems of the XSIMD dependency? What you
describe sounds a lot like what XSIMD provides in a prepackaged form and
without the extra CMake magic.
I have to occasionally build Arrow with an external build system and it
sounds like this type of logic could add
Hi everyone,
I've noticed that we include xsimd as an abstraction over all of the simd
architectures. I'd like to propose a different solution which would result
in fewer lines of code, while being more readable.
My thinking is that anything simple enough to abstract with xsimd can be
autovectorized