Re: [OMPI users] "ssh: connect to host XXX.XXX.XXX.XX port 22: connection timed out" errors during mpirun

2013-06-07 Thread vacate
Hello Castain, Thank you for your reply! Actually, the thing I'm doing now is that I want to see how many OpenMPI jobs can be handled in one computer at the same ... So I'm trying to figure out the cause of the problem I'm running into now. It seems that it's not the limit of OpenMPI, and the real

Re: [OMPI users] OMPI Coll Framework and RDMA

2013-06-07 Thread Jingcha Joba
Hi Pavel, Does that mean that, if there is an AllGatherV and every process belongs to the default communicator, there will be n-1 Queue Pairs between the collecting process and the other processes? n = total number of MPI processes. -- Joba On Thu, Jun 6, 2013 at 3:37 PM, Shamis, Pavel wrote: > Default

Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Jeff Squyres (jsquyres)
+1 Depending on how much you care, you might also want to use some performance analysis tools to see what is happening under the covers. The Intel VTune suite is the gold standard -- it shows all the counters and statistics from the CPUs themselves (be aware that there's a bit of

Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Blosch, Edwin L
My bad. Just a dumb mistake. Load-balance, as Ralph suggested. I had decomposed into 16 equally sized parts which didn't map well to 15 cores. Regarding VTune, we have a code that doesn't scale well so that's a good tip. I have access to VTune, I've used it. But I only remember looking at Open
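
The load-balance problem described above can be checked with a little arithmetic. The sketch below is illustrative only and is not from the thread: it assumes each part runs whole on a single core, that all parts take the same time, and that communication cost is ignored. The function name `parallel_efficiency` is hypothetical.

```python
import math


def parallel_efficiency(num_parts: int, num_cores: int) -> float:
    """Efficiency when equally sized parts are assigned whole to cores.

    Assumptions (not stated in the thread): every part costs the same,
    a part cannot be split across cores, and communication is free.
    """
    # The busiest core holds ceil(parts / cores) parts and sets the wall time.
    max_parts_per_core = math.ceil(num_parts / num_cores)
    # Perfect balance would take parts / cores time units per core,
    # so efficiency is that ideal time divided by the actual wall time.
    return num_parts / (num_cores * max_parts_per_core)


eff_15 = parallel_efficiency(16, 15)
eff_16 = parallel_efficiency(16, 16)
print(f"16 parts on 15 cores: {eff_15:.1%}")
print(f"16 parts on 16 cores: {eff_16:.1%}")
```

Under these assumptions one of the 15 cores has to process two parts while the other 14 process one each, so the run takes as long as two parts and roughly half of the available core time is idle.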

Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Jeff Squyres (jsquyres)
On Jun 7, 2013, at 5:28 AM, "Blosch, Edwin L" wrote: > Regarding VTune, we have a code that doesn't scale well so that's a good tip. > I have access to VTune, I've used it. But I only remember looking at > OpenMP, I didn't know it could handle MPI runs. That would be great. You might have

Re: [OMPI users] "ssh: connect to host XXX.XXX.XXX.XX port 22: connection timed out" errors during mpirun

2013-06-07 Thread Ralph Castain
On Jun 6, 2013, at 10:33 PM, vacate wrote: > Hello Castain, > > Thank you for your reply! > > Actually, the thing I'm doing now is that I want to see how many OpenMPI jobs > can be handled in one computer at the same ... That is a simple question to answer. If you don't care about performanc

Re: [OMPI users] OMPI Coll Framework and RDMA

2013-06-07 Thread Shamis, Pavel
Does that mean that, if there is an AllGatherV and every process belongs to the default communicator, there will be n-1 Queue Pairs between the collecting process and the other processes? n = total number of MPI processes. The answer depends on multiple parameters, such as the number of processes, message size,

Re: [OMPI users] OMPI Coll Framework and RDMA

2013-06-07 Thread Jingcha Joba
Interesting. I would like to understand more about how QPs are implemented in Open MPI, for example, the heuristics behind creating multiple QPs between two MPI processes. Is there any whitepaper / reference / manual that I can refer to for that? Or can you point me to the source code region for this? Th

Re: [OMPI users] OMPI Coll Framework and RDMA

2013-06-07 Thread Jeff Squyres (jsquyres)
http://www.open-mpi.org/papers/euro-pvmmpi-2007-ib/ On Jun 7, 2013, at 11:11 AM, Jingcha Joba wrote: > Interesting. > > I would like to understand more about how QPs are implemented in Open MPI, for > example, the heuristics behind creating multiple QPs between two MPI processes. > > Is there any whi

[OMPI users] Ompi-BLCR

2013-06-07 Thread Gary Lu
Hi guys, I'm having trouble with Open MPI and BLCR. The Open MPI version is 1.5.4 and the BLCR version is 0.8.5. I am running the process with mpirun -am ft-enable-cr. When I try to checkpoint the mpirun process with ompi-checkpoint PID_of_mpirun, the process dies. A global snapshot directory is cr