Re: [OMPI users] Mac Ifort and gfortran together

2010-12-15 Thread Doug Reeder
Hello, You may be bumping into conflicts between the Apple-supplied Open MPI and your MPI. I use modules to force my MPI to the front of the PATH and DYLD_LIBRARY_PATH variables. Doug Reeder On Dec 15, 2010, at 5:22 PM, Jeff Squyres wrote: > Sorry for the ginormous delay in replying here; I blame

Re: [OMPI users] Mac Ifort and gfortran together

2010-12-15 Thread Tim Prince
On 12/15/2010 8:22 PM, Jeff Squyres wrote: Sorry for the ginormous delay in replying here; I blame SC'10, Thanksgiving, and the MPI Forum meeting last week... On Nov 29, 2010, at 2:12 PM, David Robertson wrote: I'm noticing a strange problem with Open MPI 1.4.2 on Mac OS X 10.6. We use both

Re: [OMPI users] Trouble with IPM & OpenMPI on SGI Altix

2010-12-15 Thread Jeff Squyres
I'm afraid that we're not conversant with the IPM application here -- have you tried pinging the IPM authors to see if they support what you're trying to do? (sorry to give such a weasel answer, but we really have no idea :-( ) On Dec 8, 2010, at 10:59 AM, Gilbert Grosdidier wrote: > Hello,

Re: [OMPI users] Using MPI_Put/Get correctly?

2010-12-15 Thread Jeff Squyres
Is there a reason to convert your code from send/receive to put/get? The performance may not be that significantly different, and as you have noted, the MPI-2 put/get semantics are a total nightmare to understand (I personally advise people not to use them -- MPI-3 is cleaning up the put/get sem
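
For reference, a minimal sketch of the fence-synchronized put pattern under discussion, assuming two ranks; buffer names and sizes are illustrative, not from the thread:

    /* Each rank exposes recvbuf as an RMA window; rank 0 then writes
     * into rank 1 inside a fence epoch. Run with at least 2 ranks. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        enum { N = 1024 };
        int rank;
        double sendbuf[N], recvbuf[N];
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < N; i++)
            sendbuf[i] = rank;                /* something to transfer */

        /* disp_unit = sizeof(double): displacements count doubles. */
        MPI_Win_create(recvbuf, N * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);                /* open access epoch */
        if (rank == 0)
            MPI_Put(sendbuf, N, MPI_DOUBLE, 1, 0, N, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);                /* close epoch: data visible on rank 1 */

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }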

Re: [OMPI users] Calling MPI_Test() too many times results in a time spike

2010-12-15 Thread Jeff Squyres
On Dec 15, 2010, at 9:10 PM, Ioannis Papadopoulos wrote: > I agree that MPI_Test() has to do some progress, but as you can see I am only > sending one message and I busy-wait on it - since there is nothing else to do > and no other incoming traffic, I would expect no difference among MPI_Test()
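
As a point of reference, a minimal sketch of the busy-wait pattern being measured, assuming one message from rank 0 to rank 1; the counter and timing are illustrative, not the poster's actual benchmark:

    /* Rank 1 spins on MPI_Test for a single message from rank 0 and
     * reports the average cost per call. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, flag = 0, payload = 42;
        long calls = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Irecv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            double t0 = MPI_Wtime();
            while (!flag) {
                /* Each MPI_Test also drives the progress engine. */
                MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
                calls++;
            }
            printf("%ld MPI_Test() calls, %.3f us each on average\n",
                   calls, 1e6 * (MPI_Wtime() - t0) / calls);
        }
        MPI_Finalize();
        return 0;
    }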

Re: [OMPI users] Calling MPI_Test() too many times results in a time spike

2010-12-15 Thread Ioannis Papadopoulos
On 12/15/2010 07:39 PM, Jeff Squyres wrote: On Nov 30, 2010, at 4:09 PM, Ioannis Papadopoulos wrote: The overall time may be the same; however, it is alarming (at least to me) that if you call MPI_Test() too many times, the average time per MPI_Test() call increases. After all, that is what I

Re: [OMPI users] Calling MPI_Test() too many times results in a time spike

2010-12-15 Thread Jeff Squyres
On Nov 30, 2010, at 4:09 PM, Ioannis Papadopoulos wrote: > The overall time may be the same; however, it is alarming (at least to me) > that if you call MPI_Test() too many times, the average time per MPI_Test() > call increases. After all, that is what I am trying to measure, how much it > cost

Re: [OMPI users] Mac Ifort and gfortran together

2010-12-15 Thread Jeff Squyres
Sorry for the ginormous delay in replying here; I blame SC'10, Thanksgiving, and the MPI Forum meeting last week... On Nov 29, 2010, at 2:12 PM, David Robertson wrote: > I'm noticing a strange problem with Open MPI 1.4.2 on Mac OS X 10.6. We use > both Intel Ifort 11.1 and gfortran 4.3 on the

Re: [OMPI users] questions about the openib component

2010-12-15 Thread Jeff Squyres
On Dec 8, 2010, at 10:59 AM, 侯杰 wrote: > Now I am studying the openib component, and I find it is really complicated. > Here I have one question; it is as follows: > > In the initialization of the openib component, the function named setup_qps() is > used. In this function, the following co
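
For readers unfamiliar with the verbs layer, a generic libibverbs sketch of what creating a reliable-connected queue pair involves; this is not OMPI's setup_qps() itself, and all queue depths are illustrative (link with -libverbs):

    /* Create one RC queue pair on the first HCA found.
     * Error handling is minimal for brevity. */
    #include <infiniband/verbs.h>
    #include <stdio.h>

    int main(void)
    {
        int num;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) {
            fprintf(stderr, "no IB devices found\n");
            return 1;
        }
        struct ibv_context *ctx = ibv_open_device(devs[0]);
        struct ibv_pd *pd = ibv_alloc_pd(ctx);
        struct ibv_cq *cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);

        struct ibv_qp_init_attr attr = {
            .send_cq = cq,
            .recv_cq = cq,
            .cap     = { .max_send_wr = 128, .max_recv_wr = 128,
                         .max_send_sge = 1,  .max_recv_sge = 1 },
            .qp_type = IBV_QPT_RC,    /* reliable connected */
        };
        struct ibv_qp *qp = ibv_create_qp(pd, &attr);
        printf("created QP number %u\n", qp->qp_num);

        ibv_destroy_qp(qp);
        ibv_destroy_cq(cq);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }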

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Jeff Squyres
On Dec 15, 2010, at 3:24 PM, Ralph Castain wrote: >> I am not using the TCP BTL, only the OPENIB one. Does this change the number of >> sockets in use per node, please? > > I believe the openib btl opens sockets for connection purposes, so the count > is likely the same. An IB person can confirm t

Re: [OMPI users] segmentation fault

2010-12-15 Thread Gus Correa
Maybe CFD jargon? Perhaps the number (not size) of cells in a mesh/grid being handled by each core/cpu? Ralph Castain wrote: I have no idea what you mean by "cell sizes per core". Certainly not any terminology within OMPI... On Dec 15, 2010, at 3:47 PM, Vaz, Guilherme wrote: Dear all, I

Re: [OMPI users] segmentation fault

2010-12-15 Thread Gus Correa
Vaz, Guilherme wrote: Dear all, I have a problem with openmpi1.3, ifort+mkl v11.1 in Ubuntu10.04 systems (32 or 64bit). My code worked in Ubuntu8.04 and works in RedHat-based systems, with slightly different versions of mkl and ifort. There were no changes in the source code. The probl

Re: [OMPI users] segmentation fault

2010-12-15 Thread Ralph Castain
I have no idea what you mean by "cell sizes per core". Certainly not any terminology within OMPI... On Dec 15, 2010, at 3:47 PM, Vaz, Guilherme wrote: > > Dear all, > > I have a problem with openmpi1.3, ifort+mkl v11.1 in Ubuntu10.04 systems (32 > or 64bit). My code worked in Ubuntu8.04 and

[OMPI users] segmentation fault

2010-12-15 Thread Vaz, Guilherme
Dear all, I have a problem with openmpi1.3, ifort+mkl v11.1 in Ubuntu10.04 systems (32 or 64bit). My code worked in Ubuntu8.04 and works in RedHat-based systems, with slightly different versions of mkl and ifort. There were no changes in the source code. The problem is that the applicati

[OMPI users] MPI-IO problem

2010-12-15 Thread Tom Rosmond
I want to implement an MPI-IO solution for some of the IO in a large atmospheric data assimilation system. Years ago I got some small demonstration Fortran programs (I think from Bill Gropp) that seem to be good candidate prototypes for what I need. Two of them are attached as part of simple she
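
The attached programs are not reproduced here; for flavor, a minimal MPI-IO sketch in C (rather than the original Fortran) in which every rank writes its own block of a shared file at a rank-dependent offset; the filename is illustrative:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        enum { N = 1000 };
        int rank;
        double buf[N];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < N; i++)
            buf[i] = rank;                     /* fill with rank id */

        MPI_File_open(MPI_COMM_WORLD, "state.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        /* Collective write: each rank lands at its own byte offset. */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * N * sizeof(double),
                              buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }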

[OMPI users] Valgrind suppression not so suppressed

2010-12-15 Thread David Mathog
OMPI 1.4.3 Valgrind 3.5.0 Trying to use valgrind on a program and getting a ton of MPI-related noise, totally swamping the memory problems in the program itself. Looked at the FAQ and used the suppression file referred to there: mpirun -np 2 -host newsaf.cluster \ valgrind \ --leak-check=fu

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
On Dec 15, 2010, at 1:11 PM, Gilbert Grosdidier wrote: > Ralph, > > I am not using the TCP BTL, only the OPENIB one. Does this change the number of > sockets in use per node, please? I believe the openib btl opens sockets for connection purposes, so the count is likely the same. An IB person can

[OMPI users] Using MPI_Put/Get correctly?

2010-12-15 Thread Grismer, Matthew J Civ USAF AFMC AFRL/RBAT
I am trying to modify the communication routines in our code to use MPI_Put instead of sends and receives. This worked fine for Puts of several variables, but now I have one that is causing seg faults. Reading through the MPI documentation it is not clear to me whether what I am doing is permissible or
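
Two MPI-2 RMA rules whose violation commonly produces seg faults with MPI_Put: the target region must lie entirely inside the target's window, and window memory should not be accessed locally while a fence epoch is open. A sketch under those assumptions (not the poster's code):

    #include <mpi.h>

    enum { N = 100 };

    int main(int argc, char **argv)
    {
        int rank;
        double local[N] = {0}, win_mem[N];
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* disp_unit is sizeof(double): displacements count doubles. */
        MPI_Win_create(win_mem, N * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        if (rank == 0) {
            /* OK: 10 doubles at displacement 90 ends exactly at the
             * window boundary; displacement 95 would overrun it and
             * can crash or corrupt memory on the target. */
            MPI_Put(local, 10, MPI_DOUBLE, 1, 90, 10, MPI_DOUBLE, win);
        }
        /* win_mem should not be read or written here, before the
         * closing fence, even by the rank that owns it. */
        MPI_Win_fence(0, win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }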

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
Ralph, I am not using the TCP BTL, only the OPENIB one. Does this change the number of sockets in use per node, please? But I suspect the ORTE daemons are communicating only through TCP anyway, right? Also, is there anybody in the OpenMPI team using an SGI Altix cluster with a high number

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote: > Good evening Ralph, > > On 12/15/2010 6:45 PM, Ralph Castain wrote: >> It looks like all the messages are flowing within a single job (all three >> processes mentioned in the error have the same identifier). The only possibility >> I can think

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
Good evening Ralph, On 12/15/2010 6:45 PM, Ralph Castain wrote: It looks like all the messages are flowing within a single job (all three processes mentioned in the error have the same identifier). The only possibility I can think of is that somehow you are reusing ports - is it possible your system d

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
On Dec 15, 2010, at 10:14 AM, Gilbert Grosdidier wrote: > Hello Ralph, > > Thanks for taking the time to help me. > > On Dec 15, 2010, at 4:27 PM, Ralph Castain wrote: > >> It would appear that there is something trying to talk to a socket opened by >> one of your daemons. At a guess, I would bet

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
Hello Ralph, Thanks for taking the time to help me. On Dec 15, 2010, at 4:27 PM, Ralph Castain wrote: It would appear that there is something trying to talk to a socket opened by one of your daemons. At a guess, I would bet the problem is that a prior job left a daemon alive that is talking on

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
It would appear that there is something trying to talk to a socket opened by one of your daemons. At a guess, I would bet the problem is that a prior job left a daemon alive that is talking on the same socket. Are you by chance using static ports for the job? Did you run another job just before

[OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
Hello, Running with OpenMPI 1.4.3 on an SGI Altix cluster with 4096 cores, I got this error message right at startup: mca_oob_tcp_peer_recv_connect_ack: received unexpected process identifier [[13816,0],209] and the whole job then spins for an undefined period, without crashing/ab

[OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-15 Thread Gilbert Grosdidier
Bonjour, Running with OpenMPI 1.4.3 on an SGI Altix cluster with 2048 cores, I got this error message on all cores, right at startup : btl_openib.c:211:adjust_cq] cannot resize completion queue, error: 12 What could be the culprit please ? Is there a workaround ? What parameter is to be tuned