Segfaults in FORTRAN generally mean either that an array is out of bounds or
that you can't get the memory you are requesting. Check your array sizes
(particularly the ones in subroutines). You can compile with -C, but
that only tells you if you exceed an array declaration, not the actual
size. It is po
Which FORTRAN compiler are you using? I believe that most of them allow
you to compile with -g and optimization and then force a stack dump on
crash. I have found this to work on code that seems to vanish on random
processors. Also, you might look at the FORTRAN options and see if it
lets you a
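With gfortran, for instance, something along these lines (the file name is
just a placeholder):

   mpif90 -g -O2 -fbacktrace model.f90 -o model

keeps the optimization but still gives a readable backtrace when it dies;
ifort's equivalent is -g -traceback.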
Your problem may not be related to bandwidth. It may be latency, or how the
problem is divided. We found significant improvements running WRF
and other atmospheric (CFD) codes over InfiniBand. The problem was not so much
the amount of data communicated, but how long it takes to send it. Also,
is your model
You should not have to recompile Open MPI, but you do have to use the
correct type. You can check the size of integers in your Fortran and use
MPI_INTEGER4 or MPI_INTEGER8 depending on what you get.
In gfortran you can use something like:

   integer i
   if (sizeof(i) .eq. 8) then      ! sizeof is a gfortran extension
      mpi_int_type = MPI_INTEGER8
   else
      mpi_int_type = MPI_INTEGER4
   end if
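If it helps, a minimal self-contained sketch of the idea (the program and
variable names are made up; assumes gfortran and Open MPI's 'mpi' module):

   program pick_int_type
     use mpi
     implicit none
     integer :: ierr, mpi_int_type, val, i

     call MPI_Init(ierr)

     ! sizeof() is a gfortran extension; pick the MPI type that matches
     ! the default INTEGER kind this code was built with
     if (sizeof(i) .eq. 8) then
       mpi_int_type = MPI_INTEGER8
     else
       mpi_int_type = MPI_INTEGER4
     end if

     val = 42
     call MPI_Bcast(val, 1, mpi_int_type, 0, MPI_COMM_WORLD, ierr)

     call MPI_Finalize(ierr)
   end program pick_int_type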
Actually, sub-array passing is part of the F90 standard (at least
according to every document I can find), and not an Intel extension. So
if it doesn't work, you should complain to the compiler vendor. One of
the reasons for using it is that the compiler should be optimized for
whatever method
What FORTRAN compiler are you using? This should not really be an issue
with the MPI implementation, but with the FORTRAN compiler. This is legitimate
usage in FORTRAN 90 and the compiler should deal with it. I do similar
things using ifort; it creates temporary arrays when necessary and it
all works.
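A small example of the kind of thing I mean (names are made up); with ifort
or gfortran the strided section is passed through a compiler-generated
temporary, with copy-in/copy-out, because the callee expects a contiguous
explicit-shape array:

   program section_demo
     implicit none
     real :: a(10)
     integer :: i
     a = (/ (real(i), i = 1, 10) /)
     call doubler(a(1:9:2), 5)   ! strided sub-array: compiler makes a temporary
     print *, a
   end program section_demo

   subroutine doubler(x, n)
     implicit none
     integer, intent(in) :: n
     real, intent(inout) :: x(n)  ! explicit-shape dummy, assumed contiguous
     x = 2.0 * x
   end subroutine doubler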
I don't know if this is it, but if you use the name localhost, won't
processes on both machines try to talk to 127.0.0.1? I believe you need
to use the real hostname in your host file. I think that your two tests
work because there is no interprocess communication, just stdout.
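For example, something like this in the host file (the hostnames and slot
counts here are made up for illustration):

   node01 slots=4
   node02 slots=4

rather than listing localhost.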
That error is from one of the processes that was working when another
one died. It is not an indication that MPI had problems, but that you
had one of the wrf processes (#45) crash. You need to look at what
happened to process 45. What do the rsl.out and rsl.error files for #45
say?
/native performance of the network between the devices reflects the same
dichotomy.
(e.g., ibv_rc_pingpong)
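If it helps, the usual way to run that one is (hostname made up):

   server$ ibv_rc_pingpong
   client$ ibv_rc_pingpong server-host

once between two QLogic nodes, once between two Mellanox nodes, and once
across the two types.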
On Jul 15, 2011, at 7:58 PM, David Warren wrote:
All OFED 1.4 and 2.6.32 (that's what I can get to today)
qib to qib:
# OSU MPI Latency Test v3.3
# Size          Latency (us)
have done combined QLogic + Mellanox runs, so
this probably isn't a well-explored space.
Can you run some microbenchmarks to see what kind of latency / bandwidth you're
getting between nodes of the same type and nodes of different types?
On Jul 14, 2011, at 8:21 PM, David Warren wrote:
some longer tests as well before I went to ofed 1.6.
On 07/14/11 05:55, Jeff Squyres wrote:
On Jul 13, 2011, at 7:46 PM, David Warren wrote:
I finally got access to the systems again (the original ones are part of our
real time system). I thought I would try one other test I had set up
attach a
debugger to one of the still-live processes after the error message is printed. Can you
send the stack trace? It would be interesting to know what is going on here -- I can't
think of a reason that would happen offhand.
On Jun 30, 2011, at 5:03 PM, David Warren wrote:
I
I have a cluster with mostly Mellanox ConnectX hardware and a few nodes with
QLogic QLE7340s. After looking through the web, FAQs, etc., I built
openmpi-1.5.3 with psm and openib support. If I run within the same hardware it
is fast and works fine. If I run between the two without specifying an MTL (e.g.
mpirun -np 2
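For what it's worth, the kind of command lines I mean by "specifying an MTL"
(host names here are made up; the options are the usual Open MPI 1.5.x MCA
parameters):

   # QLogic side: PSM MTL through the cm PML
   mpirun -np 2 --host qnode1,qnode2 --mca pml cm --mca mtl psm ./osu_latency

   # Mellanox side: openib BTL through the ob1 PML
   mpirun -np 2 --host mnode1,mnode2 --mca pml ob1 --mca btl openib,sm,self ./osu_latency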