Hello,
Since I'm very suspicious about the condition of the IB network on
my cluster,
I'm trying to use the csum pml feature of OMPI (1.4.3).
But I have a question: what happens if the checksum is different on
both ends?
Is there a warning printed, a flag set by the MPI_(I)recv or
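For context, the csum PML is selected at run time with an MCA parameter; an
illustrative invocation (the process count and executable are placeholders):

$ mpirun --mca pml csum -np 2 ./a.out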
Hello,
Is there any way to spawn processes with the ompi-server option? I need the
child processes to open and publish ports for which I require this option.
Is there an alternative?
Thanks,
Suraj Prabhakaran
Not sure I fully understand the question. If you provide the --ompi-server
option to mpirun, that info will be passed along to all processes,
including those launched via comm_spawn, so they can subsequently connect to
the server.
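To make that concrete, a minimal sketch of what a spawned child could do once
--ompi-server is in effect; the service name "spawn_service" and the rest of
the scaffolding are illustrative, not taken from this thread:

/* child.c - launched via MPI_Comm_spawn; opens and publishes a port
 * through the name service that --ompi-server makes reachable. */
#include <mpi.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Init(&argc, &argv);

    /* Open a port and publish it under an illustrative service name. */
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("spawn_service", MPI_INFO_NULL, port);

    /* Wait for one client to connect, then clean everything up. */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    MPI_Comm_disconnect(&client);
    MPI_Unpublish_name("spawn_service", MPI_INFO_NULL, port);
    MPI_Close_port(port);

    MPI_Finalize();
    return 0;
}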
On Dec 14, 2010, at 6:50 AM, Suraj Prabhakaran wrote:
Hi James:
I can reproduce the problem on a single node with Open MPI 1.5 and the
trunk. I have submitted a ticket with
the information.
https://svn.open-mpi.org/trac/ompi/ticket/2656
Rolf
On 12/13/10 18:44, James Dinan wrote:
Hi,
I'm getting strange behavior using datatypes in a one-sided
About 9 months ago we had a new installation of a system with 1800 cores, and
at the time we found that jobs with more than 1028 cores would not start. A
colleague found that setting
OMPI_MCA_plm_rsh_num_concurrent=256
helped with the problem.
We have now increased our processor
David Mathog wrote:
Is there a tool in Open MPI that will reveal how much "spin time" the
processes are using?
I don't know what sort of answer is helpful for you, but I'll describe
one option.
With Oracle Message Passing Toolkit (formerly Sun ClusterTools, anyhow,
an OMPI distribution
So the 2/2 consensus is to use the collective. That is straightforward
for the send part of this, since all workers are sent the same data.
For the receive I do not see how to use a collective. Each worker sends
back a data structure, and the structures are of varying size. This
is almost
David Mathog wrote:
For the receive I do not see how to use a collective. Each worker sends
back a data structure, and the structures are of varying size. This
is almost always the case in bioinformatics, where what is usually
coming back from each worker is a count M of the number of
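The usual collective for this receive pattern is MPI_Gatherv, preceded by a
plain MPI_Gather of the per-worker sizes. A minimal sketch, assuming each
worker has already packed its result into a byte buffer (the framing is
illustrative, not from the thread):

/* Root gathers variable-length byte buffers from all ranks:
 * first the sizes with MPI_Gather, then the data with MPI_Gatherv. */
#include <mpi.h>
#include <stdlib.h>

void gather_varsize(char *mybuf, int mylen, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int *lens = NULL, *displs = NULL;
    char *all = NULL;
    if (rank == 0) {
        lens   = malloc(nprocs * sizeof(int));
        displs = malloc(nprocs * sizeof(int));
    }

    /* Step 1: every rank reports its payload size to the root. */
    MPI_Gather(&mylen, 1, MPI_INT, lens, 1, MPI_INT, 0, comm);

    /* Step 2: root computes displacements and receives each buffer
     * at its own offset. */
    if (rank == 0) {
        int total = 0;
        for (int i = 0; i < nprocs; i++) {
            displs[i] = total;
            total += lens[i];
        }
        all = malloc(total);
    }
    MPI_Gatherv(mybuf, mylen, MPI_CHAR,
                all, lens, displs, MPI_CHAR, 0, comm);

    if (rank == 0) { free(all); free(displs); free(lens); }
}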
Hi Rolf,
Thanks for your help. I also noticed trouble with subarray data types.
I attached the same test again, but with subarray rather than indexed
types. It works correctly with MVAPICH on IB, but fails with Open MPI
1.5 with the following message:
$ mpiexec -n 2 ./a.out
MPI RMA
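For readers following the thread, a generic illustration of the pattern being
exercised here: a subarray datatype used as the target type of an MPI_Put.
This is a hypothetical sketch, not the attached test case:

/* Put a 2x2 block of doubles into a 4x4 array exposed in a window,
 * using MPI_Type_create_subarray to describe the target layout.
 * The target's window must cover at least 16 doubles. */
#include <mpi.h>

void put_subarray(MPI_Win win, int target)
{
    int sizes[2]    = {4, 4};   /* full target array   */
    int subsizes[2] = {2, 2};   /* block being written */
    int starts[2]   = {1, 1};   /* offset of the block */
    double block[4] = {1.0, 2.0, 3.0, 4.0};
    MPI_Datatype subtype;

    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &subtype);
    MPI_Type_commit(&subtype);

    MPI_Win_fence(0, win);
    /* 4 contiguous doubles at the origin, subarray layout at the target. */
    MPI_Put(block, 4, MPI_DOUBLE, target, 0, 1, subtype, win);
    MPI_Win_fence(0, win);

    MPI_Type_free(&subtype);
}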
I have experimented a bit more and found that if I set
OMPI_MCA_plm_rsh_num_concurrent=1024
a job with more than 2,500 processes will start and run.
However, when I searched the open-mpi web site for the variable I could not
find any documentation for it.
Best wishes,
Lydia Heck
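For reference, the same limit can also be given on the mpirun command line
instead of through the environment; the process count and hostfile here are
illustrative:

$ export OMPI_MCA_plm_rsh_num_concurrent=1024
$ mpirun -np 2500 --hostfile myhosts ./a.out

$ mpirun --mca plm_rsh_num_concurrent 1024 -np 2500 --hostfile myhosts ./a.out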
On 14 December 2010 17:32, Lydia Heck wrote:
>
> I have experimented a bit more and found that if I set
>
> OMPI_MCA_plm_rsh_num_concurrent=1024
>
> a job with more than 2,500 processes will start and run.
>
> However, when I searched the open-mpi web site for the
If the checksums on the two peers don't match, your MPI call will return with
an error. This is in addition to Open MPI printing a warning message on the
output (which can be silenced with the right MCA parameter).
So you're supposed to check the return values, and abort if something fishy is
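A minimal sketch of checking those return values; the default error handler
aborts the job, so it must first be switched to MPI_ERRORS_RETURN (the buffer
arguments here are illustrative):

/* Receive with an explicit return-code check instead of the default
 * errors-are-fatal behavior. */
#include <mpi.h>
#include <stdio.h>

void recv_checked(void *buf, int count, int src, int tag)
{
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    MPI_Status status;
    int rc = MPI_Recv(buf, count, MPI_BYTE, src, tag,
                      MPI_COMM_WORLD, &status);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_Recv failed: %s\n", msg);
        MPI_Abort(MPI_COMM_WORLD, rc);
    }
}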
Hello Ralph,
I wonder: does this plm_rsh_num_concurrent parameter apply ONLY to
rsh use,
or to either ssh or rsh, depending on plm_rsh_agent?
Thanks, Best, G.
On 14/12/2010 18:30, Ralph Castain wrote:
That's a big cluster to be starting with rsh! :-)
When you say it won't
It applies to both. In the rsh/ssh launcher, there is a limit on how many
concurrent ssh/rsh sessions we have open at any one time. This is required due
to OS limitations. As each daemon completes its launch, it "daemonizes" and
closes the ssh/rsh session, thus enabling another daemon to be
On Dec 10, 2010, at 11:00 AM, Prentice Bisbal wrote:
>> Would it make sense to implement this as an MPI extension, and then
>> perhaps propose something to the Forum for this purpose?
>
> I think that makes sense. As core and socket counts go up, I imagine the need
> for this information will
On Dec 6, 2010, at 9:26 AM, Hicham Mouline wrote:
> Thanks, it is now clarified that a call to MPI_INIT has the same effect as a
> call to MPI_INIT_THREAD with
> required = MPI_THREAD_SINGLE. Perhaps it should be added here:
> http://www.open-mpi.org/doc/v1.4/man3/MPI_Init_thread.3.php
> as
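A short illustration of the equivalence being discussed; MPI_Init_thread
additionally reports the thread level actually provided, which plain MPI_Init
does not:

/* Request a thread level and check what the library actually granted. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        printf("requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
    MPI_Finalize();
    return 0;
}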