Best guess is that you are seeing a race condition. If a proc fails immediately,
we respond by aborting the launch of any other local processes, since we are
going to kill the entire job anyway. So if several of them get started before
the first one aborts, the remaining ones will never be spawned, and you won't
see an error for every proc you requested.
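
To make that concrete, here is a minimal sketch (mine, not taken from John's
program) in which every rank deliberately crashes right away; launched with a
large enough -n, the launch-abort race described above typically means you see
fewer "exited on signal 11" reports than ranks requested:

program crash_now
  use mpi
  implicit none
  integer, pointer :: p => null()
  integer :: ierr
  call mpi_init(ierr)
  p = 1   ! deliberate write through a disassociated pointer: undefined
          ! behaviour, normally an immediate segfault on every rank
  call mpi_finalize(ierr)
end program crash_now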

HTH
Ralph


On Tue, Nov 18, 2014 at 2:16 AM, <michael.rach...@dlr.de> wrote:

>  Tip: Intel Fortran compiler problems can be reported to Intel here:
>
>
>
>
> https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x
>
>
>
> Greetings
>
> Michael Rachner
>
>
>
> *From:* users [mailto:users-boun...@open-mpi.org] *On behalf of* John
> Bray
> *Sent:* Tuesday, 18 November 2014 11:03
>
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with
> Intel-15 does nothing silently
>
>
>
> The original problem used a separate file and not a module. It's clearly a
> bizarre Intel bug; I am only continuing to pursue it here because I'm curious
> as to why the segfault messages disappear at higher process counts.
>
> John
>
>
>
> On 18 November 2014 09:58, <michael.rach...@dlr.de> wrote:
>
> It may possibly be a bug in Intel-15.0.
>
> I suspect it has to do with the contains-block and with the fact that you
> call an intrinsic subroutine in that contains-block.
>
> Normally this should work. You may try to separate the influence of the two:
>
>
>
> What happens with these 3 variants of your code?
>
>
>
> variant a):   using a subroutine of your own instead of the intrinsic one
>
>
>
> program fred
> use mpi
> integer :: ierr
> call mpi_init(ierr)
> print *,"hello"
> call mpi_finalize(ierr)
> contains
>   subroutine sub
>      real :: a(10)
>      call mydummy_random_number(a)
>    end subroutine sub
>
>    subroutine mydummy_random_number(a)
>      real :: a(10)
>      print *,'---I am in sbr mydummy_random_number'
>    end subroutine mydummy_random_number
>
> end program fred
>
>
>
>
>
> variant b):   removing the  contains-block
>
>
>
> program fred
> use mpi
> integer :: ierr
> call mpi_init(ierr)
> print *,"hello"
> call mpi_finalize(ierr)
>
> end program fred
>
> !
>
> subroutine sub
>     real :: a(10)
>     call random_number(a)
> end subroutine sub
>
>
>
> variant c):     moving the contains-block into a module
>
>
>
> module MYMODULE
>
> contains
>
>   subroutine sub
>     real :: a(10)
>     call random_number(a)
>    end subroutine sub
>
> end module MYMODULE
>
> !
>
> program fred
>
> use MYMODULE
> use mpi
> integer :: ierr
> call mpi_init(ierr)
> print *,"hello"
> call mpi_finalize(ierr)
> end program fred
>
>
>
>
>
> Greetings
>
> Michael Rachner
>
>
>
>
>
>
>
> *From:* users [mailto:users-boun...@open-mpi.org] *On behalf of* John
> Bray
> *Sent:* Tuesday, 18 November 2014 10:10
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with
> Intel-15 does nothing silently
>
>
>
> A delightful bug, this: you get a segfault if your code contains a
> random_number call and is compiled with -fopenmp, EVEN IF YOU NEVER CALL
> IT!
>
> program fred
> use mpi
> integer :: ierr
> call mpi_init(ierr)
> print *,"hello"
> call mpi_finalize(ierr)
> contains
>   subroutine sub
>     real :: a(10)
>     call random_number(a)
>    end subroutine sub
> end program fred
>
> The segfault has nothing to do with OpenMPI, but there remains a mystery as
> to why I only get the segfault error messages at lower process counts.
>
> mpif90 -O0 -fopenmp ./fred.f90
>
> mpiexec -n 6 ./a.out
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 4 with PID 28402 on node mic2 exited on
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> jbray@mic2:intel-15_openmpi-1.8.3% mpiexec -n 12 ./a.out
>
> <nothing>
>
> It was the silence that made me raise the issue here. I am running on a
> 12-physical-core hyperthreaded Xeon Phi. Is there something in OpenMPI that
> is suppressing the messages? I am getting 4/5 core files each time.
>
> John
>
>
>
> On 18 November 2014 04:24, Ralph Castain <r...@open-mpi.org> wrote:
>
> Just checked the head of the 1.8 branch (soon to be released as 1.8.4),
> and confirmed the same results. I know the thread-multiple option is still
> broken there, but will test that once we get the final fix committed.
>
>
>
>
>
> On Mon, Nov 17, 2014 at 7:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> FWIW: I don't have access to a Linux box right now, but I built the OMPI
> devel master on my Mac using Intel 2015 compilers and was able to build/run
> all of the Fortran examples in our "examples" directory.
>
>
>
> I suspect the problem here is your use of the --enable-mpi-thread-multiple 
> option.
> The 1.8 series had an issue with that option - we are in the process of
> fixing it (I'm waiting for an updated patch), and you might be hitting it.
>
>
>
> If you remove that configure option, do things then work?
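>
> For example, taking the configure line from your original mail and simply
> dropping that one option (everything else unchanged) would be something like:
>
> ./configure --prefix=/usr/local/mpi/$(basename $PWD) --with-threads=posix \
>     --disable-vt --with-scif=no CC=icc CXX=icpc F77=ifort FC=ifort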
>
> Ralph
>
>
>
>
>
> On Mon, Nov 17, 2014 at 5:56 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>   Hi John,
>
> do you call MPI_Init() or MPI_Init_thread(MPI_THREAD_MULTIPLE)?
>
> does your program call MPI anywhere from an OpenMP region?
> does it call MPI only within an !$OMP MASTER section?
> or does it not invoke MPI at all from any OpenMP region?
> (a minimal sketch illustrating these cases is below)
>
> can you reproduce this issue with a simple Fortran program? or can you
> publish all your files?
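>
> To be explicit about what I am asking, here is a minimal sketch (mine, not
> taken from your code) that requests MPI_THREAD_MULTIPLE and then calls MPI
> only from within an !$OMP MASTER section:
>
> program thread_check
>   use mpi
>   implicit none
>   integer :: ierr, provided, rank
>   ! request full thread support instead of plain MPI_Init
>   call mpi_init_thread(MPI_THREAD_MULTIPLE, provided, ierr)
>   if (provided < MPI_THREAD_MULTIPLE) print *,"thread-multiple not provided"
> !$omp parallel
> !$omp master
>   ! MPI is used only by the master thread of the OpenMP region
>   call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
>   print *,"hello from rank", rank
> !$omp end master
> !$omp end parallel
>   call mpi_finalize(ierr)
> end program thread_check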
>
> Cheers,
>
> Gilles
>
>
>
> On 2014/11/18 1:41, John Bray wrote:
>
>  I have successfully been using OpenMPI 1.8.3 compiled with Intel-14, using
>
>
>
> ./configure --prefix=/usr/local/mpi/$(basename $PWD) --with-threads=posix
>
> --enable-mpi-thread-multiple --disable-vt --with-scif=no
>
>
>
> I have now switched to Intel 15.0.1, and configuring with the same options,
>
> I get minor changes in config.log about warnings being spotted, but it makes all
>
> the binaries, and I can compile my own fortran code with mpif90/mpicc
>
>
>
> but a command 'mpiexec --verbose -n 12 ./fortran_binary' does nothing
>
>
>
> I checked the FAQ and started using
>
>
>
> ./configure --prefix=/usr/local/mpi/$(basename $PWD) --with-threads=posix
>
> --enable-mpi-thread-multiple --disable-vt --with-scif=no CC=icc CXX=icpc
>
> F77=ifort FC=ifort
>
>
>
> but that makes no difference.
>
>
>
> Only with -d do I get any more information
>
>
>
> mpirun -d --verbose -n 12
>
> /home/jbray/5.0/mic2/one/intel-15_openmpi-1.8.3/one_f_debug.exe
>
> [mic2:21851] procdir: /tmp/openmpi-sessions-jbray@mic2_0/27642/0/0
>
> [mic2:21851] jobdir: /tmp/openmpi-sessions-jbray@mic2_0/27642/0
>
> [mic2:21851] top: openmpi-sessions-jbray@mic2_0
>
> [mic2:21851] tmp: /tmp
>
> [mic2:21851] sess_dir_cleanup: job session dir does not exist
>
> [mic2:21851] procdir: /tmp/openmpi-sessions-jbray@mic2_0/27642/0/0
>
> [mic2:21851] jobdir: /tmp/openmpi-sessions-jbray@mic2_0/27642/0
>
> [mic2:21851] top: openmpi-sessions-jbray@mic2_0
>
> [mic2:21851] tmp: /tmp
>
> [mic2:21851] sess_dir_finalize: proc session dir does not exist
>
> <12 times>
>
>
>
>
>
> [mic2:21851] sess_dir_cleanup: job session dir does not exist
>
> exiting with status 139
>
>
>
> My C codes do not have this problem
>
>
>
> Compiler options are
>
>
>
> mpicxx -g -O0 -fno-inline-functions -openmp -o one_c_debug.exe async.c
>
> collective.c compute.c memory.c one.c openmp.c p2p.c variables.c
>
> auditmpi.c   control.c inout.c perfio.c ring.c wave.c io.c   leak.c mpiio.c
>
> pthreads.c -openmp -lpthread
>
>
>
> mpif90 -g -O0  -fno-inline-functions -openmp -o one_f_debug.exe control.o
>
> io.f90 leak.f90 memory.f90 one.f90 ring.f90 slow.f90 swapbounds.f90
>
> variables.f90 wave.f90 *.F90 -openmp
>
>
>
> Any suggestions as to what is upsetting Fortran with Intel-15?
>
>
>
> John
>
>
>