Hello.
Thanks for your answers.
It's as you said, Josh: I'm trying to do something uncoordinated and on
demand. What I'm doing now is putting some code in btl_tcp_endpoint.c and
other files that allows me to change the communication attempts on the
sockets when a failure occurs. At the mom
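As a rough illustration of the idea of retrying a socket operation after a failure, here is a minimal sketch in C. It is not code from btl_tcp_endpoint.c or from Open MPI; the names retry_attempts, attempt_fn and fail_n_times are hypothetical, and the real connect/send call is assumed to be wrapped in the callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: performs one communication attempt
 * (for example a connect or send) and returns 0 on success. */
typedef int (*attempt_fn)(void *ctx);

/* Call attempt() up to max_attempts times.
 * Returns the 1-based number of the attempt that succeeded,
 * or -1 if every attempt failed. */
int retry_attempts(attempt_fn attempt, void *ctx, int max_attempts)
{
    assert(attempt != NULL);
    for (int i = 1; i <= max_attempts; i++) {
        if (attempt(ctx) == 0) {
            return i;
        }
    }
    return -1;
}

/* Example callback for experimentation (hypothetical): fails while the
 * counter pointed to by ctx is positive, then succeeds. */
int fail_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

With a counter of 2, the third attempt succeeds; with more failures than allowed attempts, the caller gets -1 and can decide whether to look up a new address for the peer.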
I don't think this will help much, but I can tell you how we handled
this for the coordinated C/R functionality.
When we added automatic recovery and process migration using
coordinated checkpoints to the Open MPI trunk (spring/summer 2010) we
were able to take advantage of the coordinated nature
On Jun 4, 2011, at 5:21 AM, Hugo Meyer wrote:
> Thanks for your replies.
>
> >After doing that, the MPI_Init procedure calls grpcomm.modex to distribute
> >the data across all procs in the job. Unfortunately, being a collective, all
> >procs must participate. In your case, you'll have to find
Thanks for your replies.
>After doing that, the MPI_Init procedure calls grpcomm.modex to distribute
the data across all procs in the job. Unfortunately, being a collective, all
procs must participate. In your case, you'll have to find a different way to
do it. Upon receipt, each proc updates its
On Jun 3, 2011, at 10:12 AM, Ralph Castain wrote:
> When an MPI proc calls MPI_Init, each btl pushes its contact info into the
> modex database - one example is the btl.tcp.1.7 info you found there. That
> entry is for the TCP btl, which is probably what you are looking for. There
> is no way f
On Jun 3, 2011, at 8:03 AM, Hugo Meyer wrote:
> Hello Ralph.
>
> Are you talking about an MPI communication? If so, then you need to update
> every proc's modex info for the proc that moved - this is something stored
> in each MPI proc's memory, so it isn't something that you can just get fro
Hello Ralph.
Are you talking about an MPI communication? If so, then you need to update
every proc's modex info for the proc that moved - this is something stored
in each MPI proc's memory, so it isn't something that you can just get from
the daemon on-demand. You'll have to provide the update to
Are you talking about an MPI communication? If so, then you need to update
every proc's modex info for the proc that moved - this is something stored in
each MPI proc's memory, so it isn't something that you can just get from the
daemon on-demand. You'll have to provide the update to every sing
Hello again.
My current problem is that I don't know where the struct is that holds the
information used to send messages to the procs.
Something like:
Rank URI
0 21222:tcp:192.168.1.1:1250
1 21223:tcp:192.168.1.2:1250
. .
Because what I need
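A rank-to-URI table like the one described above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: contact_table_t and its functions are hypothetical names, not Open MPI's actual modex structures, and the fixed sizes MAX_PROCS and URI_LEN are arbitrary.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROCS 64   /* arbitrary limit for this sketch */
#define URI_LEN   64   /* arbitrary maximum URI length, including the NUL */

/* Hypothetical per-process view of where each rank can be reached,
 * e.g. rank 0 -> "21222:tcp:192.168.1.1:1250". */
typedef struct {
    char uri[MAX_PROCS][URI_LEN];
    int  nprocs;
} contact_table_t;

/* Initialize an empty table for nprocs ranks. Returns 0 on success. */
int contact_table_init(contact_table_t *t, int nprocs)
{
    assert(t != NULL);
    if (nprocs < 0 || nprocs > MAX_PROCS) {
        return -1;
    }
    memset(t, 0, sizeof(*t));
    t->nprocs = nprocs;
    return 0;
}

/* Record or overwrite the URI for one rank, for example after that
 * rank has been restored on a different node. Returns 0 on success. */
int contact_table_update(contact_table_t *t, int rank, const char *uri)
{
    assert(t != NULL && uri != NULL);
    if (rank < 0 || rank >= t->nprocs) {
        return -1;
    }
    if (strlen(uri) >= URI_LEN) {
        return -1;
    }
    strcpy(t->uri[rank], uri);
    return 0;
}

/* Return the current URI for a rank, or NULL if unknown or out of range. */
const char *contact_table_lookup(const contact_table_t *t, int rank)
{
    assert(t != NULL);
    if (rank < 0 || rank >= t->nprocs) {
        return NULL;
    }
    return t->uri[rank][0] != '\0' ? t->uri[rank] : NULL;
}
```

In this sketch every process keeps its own copy of the table, so when a rank moves, each process that may send to it has to receive the new URI and call contact_table_update for that rank.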
Hello all.
I need some help restarting communication with a process that I
restored on a different node. My situation is as follows:
The process fails and is successfully restored on another node from a
previous checkpoint that I sent there. Now, when a process tries to send a
message to t