On Thu, Feb 11, 2021 at 04:46:44PM +0000, Jeff Squyres (jsquyres) wrote:
> We did have some issues with 4.1.0 and AVX, but we have fixed everything that
> we were aware of.
>
> I'd be curious to know if you still have build failures in the latest 4.1.x
> nightly snapshot.

Not sure about the latest, but I built v4.1.x-202102090356-380ac96
without errors, then used that to successfully build and test
GROMACS parallel mdrun.  While I do not like using non-release
versions, this is promising; will there be a bugfix release any
time soon?

> If you do, can you send the following:
>
> - stdout/stderr from running configure
> - config.log
> - stdout/stderr from running "make V=1" (for brevity, you can "make", get to
> the failure, and then "make V=1" to get just the details of the compile
> failure, vs. the details of the entire make with oodles of lengthy successful
> compiles)

(I find it is better to set MAKEFLAGS="VERBOSE=1" to be sure
any submakes get the news - I always build that way.)

>
>
> > On Feb 11, 2021, at 9:15 AM, Max R. Dechantsreiter
> > <m...@performancejones.com> wrote:
> >
> > ...The error that prompted me to start this thread occurred
> > during "make all" with 4.1.0:
> >
> > .
> > .
> > .
> > Making all in mca/op/avx
> > gmake[2]: Entering directory
> > `/home/maxd/XXXXXXXXXXXXXXXXXX/Build/openmpi-4.1.0_gcc-10.2.0/ompi/mca/op/avx'
> >   CC       op_avx_component.lo
> >   CC       liblocal_ops_avx_la-op_avx_functions.lo
> >   CCLD     liblocal_ops_avx.la
> >   CC       liblocal_ops_avx512_la-op_avx_functions.lo
> > op_avx_functions.c: In function 'ompi_op_avx_2buff_bxor_uint64_t_avx512':
> > op_avx_functions.c:208:21: warning: AVX512F vector return without AVX512F
> > enabled changes the ABI [-Wpsabi]
> >   208 |             __m512i vecA = _mm512_loadu_si512((__m512i*)in);
> > \
> >       |                     ^~~~
> > op_avx_functions.c:263:5: note: in expansion of macro
> > 'OP_AVX_AVX512_BIT_FUNC'
> >   263 |     OP_AVX_AVX512_BIT_FUNC(name, type_size, type, op);
> > \
> >       |     ^~~~~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:573:5: note: in expansion of macro 'OP_AVX_BIT_FUNC'
> >   573 |     OP_AVX_BIT_FUNC(bxor, 64, uint64_t, xor)
> >       |     ^~~~~~~~~~~~~~
> > In file included from
> > /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
> >                  from op_avx_functions.c:26:
> > op_avx_functions.c: In function 'ompi_op_avx_2buff_max_int8_t_avx512':
> > /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512fintrin.h:6429:1:
> > error: inlining failed in call to 'always_inline' '_mm512_storeu_si512':
> > target specific option mismatch
> >  6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
> >       | ^~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:73:13: note: called from here
> >    73 |             _mm512_storeu_si512((__m512*)out, res);
> > \
> >       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
> >   124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);
> > \
> >       |     ^~~~~~~~~~~~~~~~~
> > op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
> >   454 |     OP_AVX_FUNC(max, i, 8, int8_t, max)
> >       |     ^~~~~~~~~~
> > In file included from
> > /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:65,
> >                  from op_avx_functions.c:26:
> > /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512bwintrin.h:1984:1:
> > error: inlining failed in call to 'always_inline' '_mm512_max_epi8':
> > target specific option mismatch
> >  1984 | _mm512_max_epi8 (__m512i __A, __m512i __B)
> >       | ^~~~~~~~~~~~~~
> > op_avx_functions.c:72:27: note: called from here
> >    72 |             __m512i res =
> > _mm512_##op##_ep##type_sign##type_size(vecA, vecB); \
> >       |
> > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
> >   124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);
> > \
> >       |     ^~~~~~~~~~~~~~~~~
> > op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
> >   454 |     OP_AVX_FUNC(max, i, 8, int8_t, max)
> >       |     ^~~~~~~~~~
> > In file included from
> > /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
> >                  from op_avx_functions.c:26:
> > .
> > .
> > .
> >
> > End result: the build failed.
> >
> > My build of v4.1.x-202102090356-380ac96 threw no errors.
> > I will continue with an attempt to build GROMACS using
> > that 4.1.x snapshot.
> >
> >
> > On Thu, Feb 11, 2021 at 01:10:42PM +0000, Max R. Dechantsreiter wrote:
> >> I ran into a problem with 4.1.0 several weeks ago,
> >> and no longer recall precisely how; I am now rebuilding
> >> both 4.1.0 and a recent 4.1.x, then will use them to
> >> build GROMACS, probably the application I was attempting
> >> back then.
> >>
> >> But I do have this from my notes (for 4.1.0):
> >>
> >>     mpicc -fopenmp hybrid_hello.c
> >>     export OMP_NUM_THREADS=2
> >>     mpirun -np 2 ./a.out
> >> # [server.clearlight.us:18349] mca_base_component_repository_open: unable
> >> to open mca_op_avx:
> >> /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so:
> >> undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> >> # [server.clearlight.us:18348] mca_base_component_repository_open: unable
> >> to open mca_op_avx:
> >> /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so:
> >> undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> >> # Hello from thread 0 out of 2 from process 0 out of 2 on
> >> server.clearlight.us
> >> # Hello from thread 1 out of 2 from process 0 out of 2 on
> >> server.clearlight.us
> >> # Hello from thread 0 out of 2 from process 1 out of 2 on
> >> server.clearlight.us
> >> # Hello from thread 1 out of 2 from process 1 out of 2 on
> >> server.clearlight.us
> >>
> >> (where I X-ed out confidential details).  Not an error,
> >> but surely indicative of something amiss.
> >>
> >> More to come!
> >>
> >>
> >> On Thu, Feb 11, 2021 at 02:02:48AM +0000, Jeff Squyres (jsquyres) via
> >> users wrote:
> >>> I think Max did try the latest 4.1 nightly build (from an off-list
> >>> email), and his problem still persisted.
> >>>
> >>> Max: can you describe exactly how Open MPI failed?  All you said was:
> >>>
> >>>>> Consequently AVX512 intrinsic functions were erroneously
> >>>>> deployed, resulting in OpenMPI failure.
> >>>
> >>> Can you provide more details?
> >>>
> >>>
> >>>> On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users
> >>>> <users@lists.open-mpi.org> wrote:
> >>>>
> >>>> Max,
> >>>>
> >>>> at configure time, Open MPI detects the *compiler* capabilities.
> >>>> In your case, your compiler can emit AVX512 code.
> >>>> (and fwiw, the tests are only compiled and never executed)
> >>>>
> >>>> Then at *runtime*, Open MPI detects the *CPU* capabilities.
> >>>> In your case, it should not invoke the functions containing AVX512 code.
> >>>>
> >>>> That being said, several changes were made to the op/avx component,
> >>>> so if you are experiencing some crashes, I do invite you to give a try
> >>>> to the
> >>>> latest nightly snapshot for the v4.1.x branch.
> >>>>
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Gilles
> >>>>
> >>>> On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
> >>>> <users@lists.open-mpi.org> wrote:
> >>>>>
> >>>>> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
> >>>>> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
> >>>>> that supports AVX2 but not AVX512, resulted in
> >>>>>
> >>>>>     checking for AVX512 support (no additional flags)... no
> >>>>>     checking for AVX512 support (with -march=skylake-avx512)... yes
> >>>>>
> >>>>> in "configure" output, and in config.log
> >>>>>
> >>>>>     MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
> >>>>>     MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
> >>>>>
> >>>>> Consequently AVX512 intrinsic functions were erroneously
> >>>>> deployed, resulting in OpenMPI failure.
> >>>>>
> >>>>> The relevant test code was in essence
> >>>>>
> >>>>>     cat > conftest.c << EOF
> >>>>>     #include <immintrin.h>
> >>>>>
> >>>>>     int main()
> >>>>>     {
> >>>>>         __m512 vA, vB;
> >>>>>
> >>>>>         _mm512_add_ps(vA, vB);
> >>>>>
> >>>>>         return 0;
> >>>>>     }
> >>>>>     EOF
> >>>>>
> >>>>> The problem with this is that the result of the function
> >>>>> is never used, so at optimization level higher than O0
> >>>>> the compiler eliminates the function as "dead code" (DCE).
> >>>>> To wit,
> >>>>>
> >>>>>     gcc -O3 -march=skylake-avx512 -S conftest.c
> >>>>>
> >>>>> yields
> >>>>>
> >>>>>         .file   "conftest.c"
> >>>>>         .text
> >>>>>         .section        .text.startup,"ax",@progbits
> >>>>>         .p2align 4
> >>>>>         .globl  main
> >>>>>         .type   main, @function
> >>>>> main:
> >>>>> .LFB5345:
> >>>>>         .cfi_startproc
> >>>>>         xorl    %eax, %eax
> >>>>>         ret
> >>>>>         .cfi_endproc
> >>>>> .LFE5345:
> >>>>>         .size   main, .-main
> >>>>>         .ident  "GCC: (GNU) 10.2.0"
> >>>>>         .section        .note.GNU-stack,"",@progbits
> >>>>>
> >>>>> Compare this with the result of
> >>>>>
> >>>>>     gcc -O0 -march=skylake-avx512 -S conftest.c
> >>>>>
> >>>>> in which the function IS called:
> >>>>>
> >>>>>         .file   "conftest.c"
> >>>>>         .text
> >>>>>         .globl  main
> >>>>>         .type   main, @function
> >>>>> main:
> >>>>> .LFB4092:
> >>>>>         .cfi_startproc
> >>>>>         pushq   %rbp
> >>>>>         .cfi_def_cfa_offset 16
> >>>>>         .cfi_offset 6, -16
> >>>>>         movq    %rsp, %rbp
> >>>>>         .cfi_def_cfa_register 6
> >>>>>         andq    $-64, %rsp
> >>>>>         subq    $136, %rsp
> >>>>>         vmovaps 72(%rsp), %zmm0
> >>>>>         vmovaps %zmm0, -56(%rsp)
> >>>>>         vmovaps 8(%rsp), %zmm0
> >>>>>         vmovaps %zmm0, -120(%rsp)
> >>>>>         movl    $0, %eax
> >>>>>         leave
> >>>>>         .cfi_def_cfa 7, 8
> >>>>>         ret
> >>>>>         .cfi_endproc
> >>>>> .LFE4092:
> >>>>>         .size   main, .-main
> >>>>>         .ident  "GCC: (GNU) 10.2.0"
> >>>>>         .section        .note.GNU-stack,"",@progbits
> >>>>>
> >>>>> Note the use of a 512-bit ZMM register - ZMM registers
> >>>>> are used only by AVX512 instructions.  Hence at O3 the
> >>>>> test program does not detect the lack of AVX512 support
> >>>>> by the host processor.
> >>>>>
> >>>>> An easy remedy would be to declare the operands as
> >>>>> "volatile" and thereby force the compiler to invoke the
> >>>>> function:
> >>>>>
> >>>>>     cat > conftest.c << EOF
> >>>>>     #include <immintrin.h>
> >>>>>
> >>>>>     int main()
> >>>>>     {
> >>>>>         volatile __m512 vA, vB;
> >>>>>
> >>>>>         _mm512_add_ps(vA, vB);
> >>>>>
> >>>>>         return 0;
> >>>>>     }
> >>>>>     EOF
> >>>>>
> >>>>> Compiled at O3, the resulting executable dumps core as it
> >>>>> should when run on my Haswell processor, returning nonzero
> >>>>> exit status ($?), which would inform "configure" that the
> >>>>> processor does not have AVX512 capability.
> >>>>>
> >>>>> Finally please note that this error could affect the
> >>>>> detection of support for other instruction sets on other
> >>>>> families of processors: compiler optimization must be
> >>>>> inhibited for such tests to be reliable!
> >>>>>
> >>>>> Max
> >>>>> ---
> >>>>> Max R. Dechantsreiter
> >>>>> President
> >>>>> Performance Jones L.L.C.
> >>>>> m...@performancejones.com
> >>>>> Skype: PerformanceJones (UTC+01:00)
> >>>>> +1 414 446-3100 (telephone/voicemail)
> >>>>> http://www.linkedin.com/in/benchmarking
> >>>
> >>>
> >>> --
> >>> Jeff Squyres
> >>> jsquy...@cisco.com
> >>>
> >
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
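
For illustration, the split Gilles describes earlier in the thread
(configure checks what the *compiler* can emit, the running program
checks what the *CPU* can execute) can be sketched in C roughly as
follows.  This is only a sketch under stated assumptions, not the
op/avx component's actual code: the names cpu_has_avx512f,
cpu_has_avx2, select_op_kernel and the op_kernel enum are
hypothetical, and the CPU query relies on the GCC/Clang builtins
__builtin_cpu_init and __builtin_cpu_supports, which exist only on
x86 targets.

```c
/* Hypothetical labels for the code paths a library might ship. */
enum op_kernel {
    OP_KERNEL_GENERIC = 0,  /* plain C, runs on any CPU            */
    OP_KERNEL_AVX2,         /* built with AVX2 flags               */
    OP_KERNEL_AVX512        /* built with AVX512 flags             */
};

/* Returns 1 if the CPU we are running on reports AVX512F, else 0.
 * Assumption: GCC or Clang on x86; elsewhere we report 0. */
int cpu_has_avx512f(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return 0;
#endif
}

/* Same query for AVX2. */
int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Pick the widest kernel the CPU supports.  The AVX512 kernel may
 * well be present in the binary (the compiler could build it), but
 * it is selected only when the runtime check says it is safe. */
enum op_kernel select_op_kernel(int has_avx512f, int has_avx2)
{
    if (has_avx512f)
        return OP_KERNEL_AVX512;
    if (has_avx2)
        return OP_KERNEL_AVX2;
    return OP_KERNEL_GENERIC;
}
```

A caller would evaluate select_op_kernel(cpu_has_avx512f(),
cpu_has_avx2()) once at startup and dispatch on the result.  On a
Haswell part that supports AVX2 but not AVX512, like the E5-2620 v3
in this thread, that would be expected to select the AVX2 path even
though configure reported AVX512 support for the compiler.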