You are welcome to provide any data showing that the current
implementation (intrinsics, AVX512) is not the most efficient, and you
are free to open a Pull Request suggesting a better one.
The op/avx component has pretty much nothing to do with scalability:
only one node is req
Gilles Gouaillardet via users writes:
> One motivation is packaging: a single Open MPI implementation has to be
> built, that can run on older x86 processors (supporting only SSE) and the
> latest ones (supporting AVX512).
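
For illustration, here is a minimal sketch of what a single build that
adapts to the processor at run time can look like. It is not the op/avx
code: the names sum_fn, sum_scalar, sum_avx2 and select_sum are
hypothetical, it assumes GCC or Clang (for __builtin_cpu_supports and the
target attribute), and it only distinguishes AVX2 from a scalar fallback
rather than the full SSE-to-AVX512 range.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Hypothetical signature for an integer sum reduction: inout[i] += in[i]. */
typedef void (*sum_fn)(const int32_t *in, int32_t *inout, size_t n);

/* Portable fallback that runs on any processor. */
static void sum_scalar(const int32_t *in, int32_t *inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}

#ifdef HAVE_X86_DISPATCH
/* AVX2 is enabled for this function only, so the rest of the binary
 * contains no AVX2 instructions and still runs on older processors. */
__attribute__((target("avx2")))
static void sum_avx2(const int32_t *in, int32_t *inout, size_t n)
{
    size_t i = 0;
    /* Process eight 32-bit integers per iteration. */
    for (; i + 8 <= n; i += 8) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(in + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(inout + i));
        _mm256_storeu_si256((__m256i *)(inout + i), _mm256_add_epi32(a, b));
    }
    /* Handle the remaining elements one at a time. */
    for (; i < n; i++)
        inout[i] += in[i];
}
#endif

/* Pick the best kernel the running CPU supports (assumption: checked
 * once at start-up, then the returned pointer is reused). */
static sum_fn select_sum(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_scalar;
}
```

A caller would do something like "sum_fn f = select_sum();" once and then
call f(in, inout, n) for every reduction. The same binary then uses the
vector kernel where the CPU offers AVX2 and the scalar loop elsewhere.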
I take dispatch on micro-architecture for granted, but it doesn't
require