On Thu, Feb 11, 2021 at 04:46:44PM +0000, Jeff Squyres (jsquyres) wrote:
> We did have some issues with 4.1.0 and AVX, but we have fixed everything that 
> we were aware of.
> 
> I'd be curious to know if you still have build failures in the latest 4.1.x 
> nightly snapshot.

Not sure about the latest, but I built v4.1.x-202102090356-380ac96
without errors, then used that to successfully build and test
GROMACS parallel mdrun.

While I do not like using non-release versions, this is promising;
will there be a bugfix release any time soon?

> If you do, can you send the following:
> 
> - stdout/stderr from running configure
> - config.log
> - stdout/stderr from running "make V=1" (for brevity, you can "make", get to 
> the failure, and then "make V=1" to get just the details of the compile 
> failure, vs. the details of the entire make with oodles of lengthy successful 
> compiles)

(I find it is better to set MAKEFLAGS="VERBOSE=1" to be sure any
submakes get the news - I always build that way.)
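
For anyone collecting the items requested above, here is a minimal sketch of one way to save stdout and stderr together while keeping the command's exit status. The helper name run_logged and the log file names are hypothetical, chosen for illustration; nothing here is provided by Open MPI.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, stores its stdout and stderr together
# in LOGFILE, prints the log once the command has finished, and returns
# the command's own exit status.
run_logged() {
    log=$1
    shift
    "$@" >"$log" 2>&1
    rc=$?
    cat "$log"
    return "$rc"
}

# Possible use from the top of a build tree (file names are arbitrary):
#   run_logged configure.out ./configure
#   run_logged make.out make V=1
```

Redirecting to a file instead of piping through tee is a deliberate simplification: a plain POSIX shell then reports the build's exit status rather than that of tee.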

> 
> 
> > On Feb 11, 2021, at 9:15 AM, Max R. Dechantsreiter 
> > <m...@performancejones.com> wrote:
> > 
> > ...The error that prompted me to start this thread occurred
> > during "make all" with 4.1.0:
> > 
> > .
> > .
> > .
> > Making all in mca/op/avx
> > gmake[2]: Entering directory `/home/maxd/XXXXXXXXXXXXXXXXXX/Build/openmpi-4.1.0_gcc-10.2.0/ompi/mca/op/avx'
> >  CC       op_avx_component.lo
> >  CC       liblocal_ops_avx_la-op_avx_functions.lo
> >  CCLD     liblocal_ops_avx.la
> >  CC       liblocal_ops_avx512_la-op_avx_functions.lo
> > op_avx_functions.c: In function 'ompi_op_avx_2buff_bxor_uint64_t_avx512':
> > op_avx_functions.c:208:21: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
> >  208 |             __m512i vecA =  _mm512_loadu_si512((__m512i*)in);           \
> >      |                     ^~~~
> > op_avx_functions.c:263:5: note: in expansion of macro 'OP_AVX_AVX512_BIT_FUNC'
> >  263 |     OP_AVX_AVX512_BIT_FUNC(name, type_size, type, op);                  \
> >      |     ^~~~~~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:573:5: note: in expansion of macro 'OP_AVX_BIT_FUNC'
> >  573 |     OP_AVX_BIT_FUNC(bxor, 64, uint64_t, xor)
> >      |     ^~~~~~~~~~~~~~~
> > In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
> >                 from op_avx_functions.c:26:
> > op_avx_functions.c: In function 'ompi_op_avx_2buff_max_int8_t_avx512':
> > /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512fintrin.h:6429:1: error: inlining failed in call to 'always_inline' '_mm512_storeu_si512': target specific option mismatch
> > 6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
> >      | ^~~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:73:13: note: called from here
> >   73 |             _mm512_storeu_si512((__m512*)out, res);                  \
> >      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
> >  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);        \
> >      |     ^~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
> >  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
> >      |     ^~~~~~~~~~~
> > In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:65,
> >                 from op_avx_functions.c:26:
> > /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512bwintrin.h:1984:1: error: inlining failed in call to 'always_inline' '_mm512_max_epi8': target specific option mismatch
> > 1984 | _mm512_max_epi8 (__m512i __A, __m512i __B)
> >      | ^~~~~~~~~~~~~~~
> > op_avx_functions.c:72:27: note: called from here
> >   72 |             __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB);  \
> >      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
> >  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);        \
> >      |     ^~~~~~~~~~~~~~~~~~
> > op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
> >  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
> >      |     ^~~~~~~~~~~
> > In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
> >                 from op_avx_functions.c:26:
> > .
> > .
> > .
> > 
> > End result: the build failed.
> > 
> > My build of v4.1.x-202102090356-380ac96 threw no errors.
> > I will continue with an attempt to build GROMACS using
> > that 4.1.x snapshot.
> > 
> > 
> > On Thu, Feb 11, 2021 at 01:10:42PM +0000, Max R. Dechantsreiter wrote:
> >> I ran into a problem with 4.1.0 several weeks ago,
> >> and no longer recall precisely how; I am now rebuilding
> >> both 4.1.0 and a recent 4.1.x, then will use them to
> >> build GROMACS, probably the application I was attempting
> >> back then.
> >> 
> >> But I do have this from my notes (for 4.1.0):
> >> 
> >> mpicc -fopenmp hybrid_hello.c
> >> export OMP_NUM_THREADS=2
> >> mpirun -np 2 ./a.out
> >> # [server.clearlight.us:18349] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> >> # [server.clearlight.us:18348] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> >> # Hello from thread 0 out of 2 from process 0 out of 2 on server.clearlight.us
> >> # Hello from thread 1 out of 2 from process 0 out of 2 on server.clearlight.us
> >> # Hello from thread 0 out of 2 from process 1 out of 2 on server.clearlight.us
> >> # Hello from thread 1 out of 2 from process 1 out of 2 on server.clearlight.us
> >> 
> >> (where I X-ed out confidential details).  Not an error,
> >> but surely indicative of something amiss.
> >> 
> >> More to come!
> >> 
> >> 
> >> On Thu, Feb 11, 2021 at 02:02:48AM +0000, Jeff Squyres (jsquyres) via 
> >> users wrote:
> >>> I think Max did try the latest 4.1 nightly build (from an off-list 
> >>> email), and his problem still persisted.
> >>> 
> >>> Max: can you describe exactly how Open MPI failed?  All you said was:
> >>> 
> >>>>> Consequently AVX512 intrinsic functions were erroneously
> >>>>> deployed, resulting in OpenMPI failure.
> >>> 
> >>> Can you provide more details?
> >>> 
> >>> 
> >>>> On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users 
> >>>> <users@lists.open-mpi.org> wrote:
> >>>> 
> >>>> Max,
> >>>> 
> >>>> at configure time, Open MPI detects the *compiler* capabilities.
> >>>> In your case, your compiler can emit AVX512 code.
> >>>> (and fwiw, the tests are only compiled and never executed)
> >>>> 
> >>>> Then at *runtime*, Open MPI detects the *CPU* capabilities.
> >>>> In your case, it should not invoke the functions containing AVX512 code.
> >>>> 
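
The distinction above between what the compiler can emit and what the CPU can execute can be checked by hand. A minimal sketch, assuming x86 Linux, where /proc/cpuinfo carries a "flags" line: the function name cpu_has_flag and its optional second argument are hypothetical, added only so the check can also be run against a saved copy of the file. This is not how Open MPI performs its own runtime detection.

```shell
#!/bin/sh
# cpu_has_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds if FLAG appears as a whole word on a
# "flags" line of CPUINFO_FILE (default: /proc/cpuinfo, x86 Linux only).
cpu_has_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    awk -v want="$flag" '
        /^flags[ \t]*:/ {
            # Field 1 is the "flags" label; compare the remaining fields.
            for (i = 2; i <= NF; i++)
                if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Report what the running machine says about AVX512F.
if cpu_has_flag avx512f; then
    echo "CPU reports avx512f"
else
    echo "CPU does not report avx512f"
fi
```

On a machine that reports avx2 but not avx512f, a compiler may still accept -march=skylake-avx512; the two results answer different questions.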
> >>>> That being said, several changes were made to the op/avx component,
> >>>> so if you are experiencing crashes, I invite you to try the
> >>>> latest nightly snapshot for the v4.1.x branch.
> >>>> 
> >>>> 
> >>>> Cheers,
> >>>> 
> >>>> Gilles
> >>>> 
> >>>> On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
> >>>> <users@lists.open-mpi.org> wrote:
> >>>>> 
> >>>>> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
> >>>>> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
> >>>>> that supports AVX2 but not AVX512, resulted in
> >>>>> 
> >>>>> checking for AVX512 support (no additional flags)... no
> >>>>> checking for AVX512 support (with -march=skylake-avx512)... yes
> >>>>> 
> >>>>> in "configure" output, and in config.log
> >>>>> 
> >>>>> MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
> >>>>> MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
> >>>>> 
> >>>>> Consequently AVX512 intrinsic functions were erroneously
> >>>>> deployed, resulting in OpenMPI failure.
> >>>>> 
> >>>>> The relevant test code was in essence
> >>>>> 
> >>>>> cat > conftest.c << EOF
> >>>>> #include <immintrin.h>
> >>>>> 
> >>>>> int main()
> >>>>> {
> >>>>>       __m512 vA, vB;
> >>>>> 
> >>>>>       _mm512_add_ps(vA, vB);
> >>>>> 
> >>>>>       return 0;
> >>>>> }
> >>>>> EOF
> >>>>> 
> >>>>> The problem with this is that the result of the function
> >>>>> is never used, so at optimization level higher than O0
> >>>>> the compiler eliminates the function as "dead code" (DCE).
> >>>>> To wit,
> >>>>> 
> >>>>> gcc -O3 -march=skylake-avx512 -S conftest.c
> >>>>> 
> >>>>> yields
> >>>>> 
> >>>>>       .file   "conftest.c"
> >>>>>       .text
> >>>>>       .section        .text.startup,"ax",@progbits
> >>>>>       .p2align 4
> >>>>>       .globl  main
> >>>>>       .type   main, @function
> >>>>> main:
> >>>>> .LFB5345:
> >>>>>       .cfi_startproc
> >>>>>       xorl    %eax, %eax
> >>>>>       ret
> >>>>>       .cfi_endproc
> >>>>> .LFE5345:
> >>>>>       .size   main, .-main
> >>>>>       .ident  "GCC: (GNU) 10.2.0"
> >>>>>       .section        .note.GNU-stack,"",@progbits
> >>>>> 
> >>>>> Compare this with the result of
> >>>>> 
> >>>>> gcc -O0 -march=skylake-avx512 -S conftest.c
> >>>>> 
> >>>>> in which the function IS called:
> >>>>> 
> >>>>>       .file   "conftest.c"
> >>>>>       .text
> >>>>>       .globl  main
> >>>>>       .type   main, @function
> >>>>> main:
> >>>>> .LFB4092:
> >>>>>       .cfi_startproc
> >>>>>       pushq   %rbp
> >>>>>       .cfi_def_cfa_offset 16
> >>>>>       .cfi_offset 6, -16
> >>>>>       movq    %rsp, %rbp
> >>>>>       .cfi_def_cfa_register 6
> >>>>>       andq    $-64, %rsp
> >>>>>       subq    $136, %rsp
> >>>>>       vmovaps 72(%rsp), %zmm0
> >>>>>       vmovaps %zmm0, -56(%rsp)
> >>>>>       vmovaps 8(%rsp), %zmm0
> >>>>>       vmovaps %zmm0, -120(%rsp)
> >>>>>       movl    $0, %eax
> >>>>>       leave
> >>>>>       .cfi_def_cfa 7, 8
> >>>>>       ret
> >>>>>       .cfi_endproc
> >>>>> .LFE4092:
> >>>>>       .size   main, .-main
> >>>>>       .ident  "GCC: (GNU) 10.2.0"
> >>>>>       .section        .note.GNU-stack,"",@progbits
> >>>>> 
> >>>>> Note the use of a 512-bit ZMM register - ZMM registers
> >>>>> are used only by AVX512 instructions.  Hence at O3 the
> >>>>> test program does not detect the lack of AVX512 support
> >>>>> by the host processor.
> >>>>> 
> >>>>> An easy remedy would be to declare the operands as
> >>>>> "volatile" and thereby force the compiler to invoke the
> >>>>> function:
> >>>>> 
> >>>>> cat > conftest.c << EOF
> >>>>> #include <immintrin.h>
> >>>>> 
> >>>>> int main()
> >>>>> {
> >>>>>       volatile __m512 vA, vB;
> >>>>> 
> >>>>>       _mm512_add_ps(vA, vB);
> >>>>> 
> >>>>>       return 0;
> >>>>> }
> >>>>> EOF
> >>>>> 
> >>>>> Compiled at O3, the resulting executable dumps core as it
> >>>>> should when run on my Haswell processor, returning nonzero
> >>>>> exit status ($?), which would inform "configure" that the
> >>>>> processor does not have AVX512 capability.
> >>>>> 
> >>>>> Finally please note that this error could affect the
> >>>>> detection of support for other instruction sets on other
> >>>>> families of processors: compiler optimization must be
> >>>>> inhibited for such tests to be reliable!
> >>>>> 
> >>>>> Max
> >>>>> ---
> >>>>> Max R. Dechantsreiter
> >>>>> President
> >>>>> Performance Jones L.L.C.
> >>>>> m...@performancejones.com
> >>>>> Skype: PerformanceJones (UTC+01:00)
> >>>>> +1 414 446-3100 (telephone/voicemail)
> >>>>> http://www.linkedin.com/in/benchmarking
> >>> 
> >>> 
> >>> -- 
> >>> Jeff Squyres
> >>> jsquy...@cisco.com
> >>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
