Hi Reza

It is hard to guess with so little information.
Here are some other things you could check:

1) Are you allowed to increase the stack size, or has the
system administrator capped it (e.g. in /etc/security/limits.conf)?
If you use a job queue system,
does it limit the stack size somehow?
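
For example, a minimal check from a bash shell (the limits.conf entries
below are only illustrative; your site may manage limits differently):

# Current soft and hard stack limits for this shell (in kB):
ulimit -S -s
ulimit -H -s

# If the hard limit is small, the admin can raise it in
# /etc/security/limits.conf, e.g. (illustrative entries):
#   *  soft  stack  unlimited
#   *  hard  stack  unlimited

# Under a batch system, try "ulimit -s unlimited" at the top of the
# job script and re-check the limit from inside the job.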

2) If you can compile and
run the Open MPI examples (hello_c.c, ring_c.c, connectivity_c.c),
then the problem is unlikely to be in Open MPI itself.
This is a good first line of defense for diagnosing this type
of problem and for checking the health of your Open MPI installation.

Your error message says "Connection reset by peer", so
I wonder if there is a firewall or some other network roadblock
or configuration issue.  It is worth testing Open MPI
with simpler MPI programs,
and even (for the network setup) with plain shell commands like "hostname".
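
For example (assuming the examples/ directory from the Open MPI source
tarball and the same ./myhosts file you already use):

# Build and run the Open MPI test programs:
mpicc hello_c.c -o hello_c
mpicc ring_c.c -o ring_c
mpirun -np 2 --hostfile ./myhosts ./hello_c
mpirun -np 2 --hostfile ./myhosts ./ring_c

# Exercise the network setup with no MPI code involved:
mpirun -np 2 --hostfile ./myhosts hostname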

3) Make sure there is no mix-up of MPI implementations (e.g. MPICH
and Open MPI) or versions, for both mpicc and mpiexec.
Also make sure that LD_LIBRARY_PATH points to the right Open MPI
lib location (and to the right BLAS/LAPACK location, for that matter).
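
A quick way to check (assuming your HPCC binary is ./hpcc):

# Do mpicc and mpiexec come from the same Open MPI installation?
which mpicc mpiexec mpirun
mpicc --showme      # Open MPI wrapper option: prints the underlying compile line
mpirun --version

# What will the binary pick up at run time?
echo $LD_LIBRARY_PATH
ldd ./hpcc | grep -i -e mpi -e atlas -e cblas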

4) Make sure there is no mix-up of architectures either (32-bit vs. 64-bit).
I wonder why your Open MPI library is installed in
/usr/lib/openmpi rather than /usr/lib64,
while your HPL ARCH = intel64 and everything else seems to be x86_64.
If you installed an Open MPI package with apt-get, check whether it is
the i386 or the x86_64 (amd64) build.
(It may be simpler to download the Open MPI tarball and install it
in /usr/local or in your home directory.)
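
For instance (the libmpi.so path below is the MPlib value from your make file):

# Are the binary and the MPI library both 64-bit?
uname -m
file ./hpcc
file $(which mpirun)
file /usr/lib/openmpi/lib/libmpi.so

# On Ubuntu, which Open MPI packages are installed, and for which architecture?
dpkg -l | grep -i openmpi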

5) Check whether you are using a threaded or OpenMP-enabled
BLAS/LAPACK library, or running with more than one thread.
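
If in doubt, force a single thread for the first runs.
(OMP_NUM_THREADS only matters if your BLAS was built with OpenMP; the
ATLAS libraries in your make file look like the serial ones, but it is
a cheap test.)

# -x exports the variable to the remote ranks (Open MPI mpirun option):
mpirun -x OMP_NUM_THREADS=1 -np 2 --hostfile ./myhosts hpcc

# Does the linked BLAS pull in a threading runtime?
ldd ./hpcc | grep -i -e pthread -e gomp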

6) Is the problem size (N) in your HPL.dat parameter file
consistent with the physical memory available?
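
As a rough rule of thumb, the HPL matrix takes about 8*N^2 bytes in
total, so aim for roughly 80% of the combined physical memory.
A back-of-the-envelope example, assuming (hypothetically) two instances
with 4 GB each:

# N <= sqrt(0.80 * total_memory_in_bytes / 8)
awk 'BEGIN { mem = 2 * 4 * 1024^3; printf "N <= %d\n", sqrt(0.80 * mem / 8) }'
# prints roughly N <= 29000; round down to a multiple of NB in HPL.dat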

I hope this helps,
Gus Correa

On 04/03/2013 02:32 PM, Ralph Castain wrote:
I agree with Gus - check your stack size. This isn't occurring in OMPI
itself, so I suspect it is in the system setup.


On Apr 3, 2013, at 10:17 AM, Reza Bakhshayeshi <reza.b2...@gmail.com> wrote:

Thanks for your answers.

@Ralph Castain:
Do you mean the error I receive?
This is the output when I run the program:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x1b7f000
[ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6a84b524a0]
[ 1] hpcc(HPCC_Power2NodesMPIRandomAccessCheck+0xa04) [0x423834]
[ 2] hpcc(HPCC_MPIRandomAccess+0x87a) [0x41e43a]
[ 3] hpcc(main+0xfbf) [0x40a1bf]
[ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f6a84b3d76d]
[ 5] hpcc() [0x40aafd]
*** End of error message ***
[][[53938,1],0][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 4164 on node 192.168.100.6
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

@Gus Correa:
I did it both on the server and on the instances, but it didn't solve the problem.


On 3 April 2013 19:14, Gus Correa <g...@ldeo.columbia.edu> wrote:

    Hi Reza

    Check the system stack size first ('limit stacksize' in tcsh or 'ulimit -s' in bash).
    If it is small, you can try to increase it
    before you run the program.
    Say (tcsh):

    limit stacksize unlimited

    or (bash):

    ulimit -s unlimited

    I hope this helps,
    Gus Correa


    On 04/03/2013 10:29 AM, Ralph Castain wrote:

        Could you perhaps share the stacktrace from the segfault? It's
        impossible to advise you on the problem without seeing it.


        On Apr 3, 2013, at 5:28 AM, Reza Bakhshayeshi
        <reza.b2...@gmail.com> wrote:

            Hi,
            I have installed the HPCC benchmark suite and Open MPI on
            private cloud instances.
            Unfortunately, I mostly get a segmentation fault error when I
            try to run it simultaneously on two or more instances with:
            mpirun -np 2 --hostfile ./myhosts hpcc

            Everything is on Ubuntu Server 12.04 (updated),
            and this is my make.intel64 file:

            # ----------------------------------------------------------------------
            # - shell --------------------------------------------------------------
            # ----------------------------------------------------------------------
            #
            SHELL = /bin/sh
            #
            CD = cd
            CP = cp
            LN_S = ln -s
            MKDIR = mkdir
            RM = /bin/rm -f
            TOUCH = touch
            #
            # ----------------------------------------------------------------------
            # - Platform identifier --------------------------------------------------
            # ----------------------------------------------------------------------
            #
            ARCH = intel64
            #
            # ----------------------------------------------------------------------
            # - HPL Directory Structure / HPL library --------------------------------
            # ----------------------------------------------------------------------
            #
            TOPdir = ../../..
            INCdir = $(TOPdir)/include
            BINdir = $(TOPdir)/bin/$(ARCH)
            LIBdir = $(TOPdir)/lib/$(ARCH)
            #
            HPLlib = $(LIBdir)/libhpl.a
            #
            # ----------------------------------------------------------------------
            # - Message Passing library (MPI) -----------------------------------------
            # ----------------------------------------------------------------------
            # MPinc tells the C compiler where to find the Message Passing library
            # header files, MPlib is defined to be the name of the library to be
            # used. The variable MPdir is only used for defining MPinc and MPlib.
            #
            MPdir = /usr/lib/openmpi
            MPinc = -I$(MPdir)/include
            MPlib = $(MPdir)/lib/libmpi.so
            #
            # ----------------------------------------------------------------------
            # - Linear Algebra library (BLAS or VSIPL) --------------------------------
            # ----------------------------------------------------------------------
            # LAinc tells the C compiler where to find the Linear Algebra library
            # header files, LAlib is defined to be the name of the library to be
            # used. The variable LAdir is only used for defining LAinc and LAlib.
            #
            LAdir = /usr/local/ATLAS/obj64
            LAinc = -I$(LAdir)/include
            LAlib = $(LAdir)/lib/libcblas.a $(LAdir)/lib/libatlas.a
            #
            # ----------------------------------------------------------------------
            # - F77 / C interface ------------------------------------------------------
            # ----------------------------------------------------------------------
            # You can skip this section if and only if you are not planning to use
            # a BLAS library featuring a Fortran 77 interface. Otherwise, it is
            # necessary to fill out the F2CDEFS variable with the appropriate
            # options. **One and only one** option should be chosen in **each** of
            # the 3 following categories:
            #
            # 1) name space (How C calls a Fortran 77 routine)
            #
            # -DAdd_     : all lower case and a suffixed underscore (Suns, Intel, ...), [default]
            # -DNoChange : all lower case (IBM RS6000),
            # -DUpCase   : all upper case (Cray),
            # -DAdd__    : the FORTRAN compiler in use is f2c.
            #
            # 2) C and Fortran 77 integer mapping
            #
            # -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int, [default]
            # -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
            # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
            #
            # 3) Fortran 77 string handling
            #
            # -DStringSunStyle : The string address is passed at the string location
            #   on the stack, and the string length is then passed as an F77_INTEGER
            #   after all explicit stack arguments, [default]
            # -DStringStructPtr : The address of a structure is passed by a
            #   Fortran 77 string, and the structure is of the form:
            #   struct {char *cp; F77_INTEGER len;},
            # -DStringStructVal : A structure is passed by value for each Fortran
            #   77 string, and the structure is of the form:
            #   struct {char *cp; F77_INTEGER len;},
            # -DStringCrayStyle : Special option for Cray machines, which uses
            #   Cray fcd (fortran character descriptor) for interoperation.
            #
            F2CDEFS =
            #
            # ----------------------------------------------------------------------
            # - HPL includes / libraries / specifics ----------------------------------
            # ----------------------------------------------------------------------
            #
            HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
            HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -lm
            #
            # - Compile time options --------------------------------------------------
            #
            # -DHPL_COPY_L          force the copy of the panel L before bcast;
            # -DHPL_CALL_CBLAS      call the cblas interface;
            # -DHPL_CALL_VSIPL      call the vsip library;
            # -DHPL_DETAILED_TIMING enable detailed timers;
            #
            # By default HPL will:
            #    *) not copy L before broadcast,
            #    *) call the BLAS Fortran 77 interface,
            #    *) not display detailed timing information.
            #
            HPL_OPTS = -DHPL_CALL_CBLAS
            #
            # ----------------------------------------------------------------------
            #
            HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
            #
            # ----------------------------------------------------------------------
            # - Compilers / linkers - Optimization flags ------------------------------
            # ----------------------------------------------------------------------
            #
            CC = /usr/bin/mpicc
            CCNOOPT = $(HPL_DEFS)
            CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
            #CCFLAGS = $(HPL_DEFS)
            #
            # On some platforms, it is necessary to use the Fortran linker to find
            # the Fortran internals used in the BLAS library.
            #
            LINKER = /usr/bin/mpif90
            LINKFLAGS = $(CCFLAGS)
            #
            ARCHIVER = ar
            ARFLAGS = r
            RANLIB = echo
            #
            # ----------------------------------------------------------------------

            Would you please help me figure this problem out?

            Regards,
            Reza











_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
