Dear Rolf,

Your suggestion works!

$ mpirun -np 4 --map-by ppr:1:socket -bind-to core  --mca coll ^ml osu_alltoall
# OSU MPI All-to-All Personalized Exchange Latency Test v4.2
# Size       Avg Latency(us)
1                       8.02
2                       2.96
4                       2.91
8                       2.91
16                      2.96
32                      3.07
64                      3.25
128                     3.74
256                     3.85
512                     4.11
1024                    4.79
2048                    5.91
4096                   15.84
8192                   24.88
16384                  35.35
32768                  56.20
65536                  66.88
131072                114.89
262144                209.36
524288                396.12
1048576               765.65
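
For reference, the same exclusion can probably be applied without changing the mpirun command line. The sketch below assumes Open MPI's usual convention of reading MCA parameters from environment variables named OMPI_MCA_<parameter>. The guard around mpirun is only there so the snippet does nothing on a machine without Open MPI or the OSU benchmarks.

```shell
# Assumption: Open MPI picks up MCA parameters from OMPI_MCA_<name> variables.
# Exclude the "ml" collective component for every later mpirun in this shell.
export OMPI_MCA_coll="^ml"

# Same benchmark command as above, minus the explicit "--mca coll ^ml".
# Guarded so nothing runs where mpirun or osu_alltoall is not installed.
if command -v mpirun >/dev/null 2>&1 && command -v osu_alltoall >/dev/null 2>&1; then
    mpirun -np 4 --map-by ppr:1:socket -bind-to core osu_alltoall
fi
```

This is equivalent in intent to passing the option on the command line; it only moves the setting into the environment so that job scripts do not need to be edited.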


Can you clarify exactly where the problem comes from?

Regards,
Filippo


On Mar 4, 2014, at 12:17 AM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
> Can you try running with --mca coll ^ml and see if things work? 
> 
> Rolf
> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Filippo Spiga
>> Sent: Monday, March 03, 2014 7:14 PM
>> To: Open MPI Users
>> Subject: [OMPI users] 1.7.5rc1, error "COLL-ML ml_discover_hierarchy exited
>> with error."
>> 
>> Dear Open MPI developers,
>> 
>> I hit an unexpected error running the OSU osu_alltoall benchmark using Open MPI
>> 1.7.5rc1. Here is the error:
>> 
>> $ mpirun -np 4 --map-by ppr:1:socket -bind-to core osu_alltoall
>> In bcol_comm_query hmca_bcol_basesmuma_allocate_sm_ctl_memory failed
>> In bcol_comm_query hmca_bcol_basesmuma_allocate_sm_ctl_memory failed
>> [tesla50][[6927,1],1][../../../../../ompi/mca/coll/ml/coll_ml_module.c:2996:mca_coll_ml_comm_query] COLL-ML ml_discover_hierarchy exited with error.
>> 
>> [tesla50:42200] In base_bcol_masesmuma_setup_library_buffers and mpool was not successfully setup!
>> [tesla50][[6927,1],0][../../../../../ompi/mca/coll/ml/coll_ml_module.c:2996:mca_coll_ml_comm_query] COLL-ML ml_discover_hierarchy exited with error.
>> 
>> [tesla50:42201] In base_bcol_masesmuma_setup_library_buffers and mpool was not successfully setup!
>> # OSU MPI All-to-All Personalized Exchange Latency Test v4.2
>> # Size       Avg Latency(us)
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 3 with PID 4508 on node tesla51 exited on
>> signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> 2 total processes killed (some possibly by mpirun during cleanup)
>> 
>> Any idea where this comes from?
>> 
>> I compiled Open MPI using Intel 12.1, the latest Mellanox stack and CUDA 6.0RC.
>> Attached are the outputs captured from configure, make and run. The configure
>> invocation was:
>> 
>> export MXM_DIR=/opt/mellanox/mxm
>> export KNEM_DIR=$(find /opt -maxdepth 1 -type d -name "knem*" -print0)
>> export FCA_DIR=/opt/mellanox/fca
>> export HCOLL_DIR=/opt/mellanox/hcoll
>> 
>> ../configure CC=icc CXX=icpc F77=ifort FC=ifort \
>>   FFLAGS="-xSSE4.2 -axAVX -ip -O3 -fno-fnalias" \
>>   FCFLAGS="-xSSE4.2 -axAVX -ip -O3 -fno-fnalias" \
>>   --prefix=<...> \
>>   --enable-mpirun-prefix-by-default --with-fca=$FCA_DIR \
>>   --with-mxm=$MXM_DIR --with-knem=$KNEM_DIR \
>>   --with-cuda=$CUDA_INSTALL_PATH --enable-mpi-thread-multiple \
>>   --with-hwloc=internal --with-verbs 2>&1 | tee config.out
>> 
>> 
>> Thanks in advance,
>> Regards
>> 
>> Filippo
>> 
>> --
>> Mr. Filippo SPIGA, M.Sc.
>> http://www.linkedin.com/in/filippospiga ~ skype: filippo.spiga
>> 
>> <Nobody will drive us out of Cantor's paradise.> ~ David Hilbert
>> 
>> *****
>> Disclaimer: "Please note this message and any attachments are
>> CONFIDENTIAL and may be privileged or otherwise protected from disclosure.
>> The contents are not to be disclosed to anyone other than the addressee.
>> Unauthorized recipients are requested to preserve this confidentiality and to
>> advise the sender immediately of any error in transmission."
> 
> -----------------------------------------------------------------------------------
> This email message is for the sole use of the intended recipient(s) and may contain
> confidential information.  Any unauthorized review, use, disclosure or distribution
> is prohibited.  If you are not the intended recipient, please contact the sender by
> reply email and destroy all copies of the original message.
> -----------------------------------------------------------------------------------
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Mr. Filippo SPIGA, M.Sc.
http://www.linkedin.com/in/filippospiga ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert


