date:20210720

Re: [OMPI users] vectorized reductions

2021-07-20 Thread Gilles Gouaillardet via users

You are welcome to provide any data that evidences the current implementation (intrinsics, AVX512) is not the most efficient, and you are free to issue a Pull Request in order to suggest a better one. The op/avx component has pretty much nothing to do with scalability: only one node is req

Re: [OMPI users] vectorized reductions

2021-07-20 Thread Dave Love via users

Gilles Gouaillardet via users writes:
> One motivation is packaging: a single Open MPI implementation has to be
> built, that can run on older x86 processors (supporting only SSE) and the
> latest ones (supporting AVX512).
I take dispatch on micro-architecture for granted, but it doesn't require
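
For readers unfamiliar with the term, "dispatch on micro-architecture" as discussed in this thread can be sketched roughly as below. This is only an illustration, not the code of Open MPI's op/avx component: the names `sum_int32`, `sum_int32_generic`, `sum_int32_avx2` and `select_sum_int32` are hypothetical, and the sketch assumes GCC or Clang on x86, relying on the compiler's `target` function attribute and `__builtin_cpu_supports` rather than hand-written intrinsics. On other compilers or architectures it falls back to the generic loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every variant: inout[i] += in[i] (hypothetical names). */
typedef void (*sum_fn)(const int32_t *in, int32_t *inout, size_t n);

/* Baseline variant, compiled for the default target of the build. */
static void sum_int32_generic(const int32_t *in, int32_t *inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler is allowed to emit AVX2 code for this
 * function only, so the rest of the binary still runs on older CPUs. */
__attribute__((target("avx2")))
static void sum_int32_avx2(const int32_t *in, int32_t *inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}

/* Query the CPU once at run time and pick the best available variant. */
static sum_fn select_sum_int32(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? sum_int32_avx2
                                          : sum_int32_generic;
}
#else
/* Assumption: no runtime feature query available, use the baseline. */
static sum_fn select_sum_int32(void)
{
    return sum_int32_generic;
}
#endif

/* Public entry point: resolves the variant lazily on first use.
 * The lazy initialisation here is not thread-safe; a real library
 * would resolve it during start-up instead. */
void sum_int32(const int32_t *in, int32_t *inout, size_t n)
{
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = select_sum_int32();
    impl(in, inout, n);
}
```

A caller only ever uses `sum_int32(in, inout, n)`; the same binary then takes the AVX2 path on CPUs that report support for it and the generic path everywhere else, which is the packaging property the quoted message describes.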