True - and I'm all for simple!
Unless someone objects, let's just leave it that way for now.
I'll put it on my list to look at this later - maybe count how many publishes
we do vs unpublishes, and if there is a residual at finalize, then send the
"unpublish all" message. Still leaves a race conditio
We can always fail the MPI_Comm_connect. There is a specific error
for this, MPI_ERR_PORT. We can detect that the port is not available
anymore (whatever the reason is) by simply using the TCP timeout on
the connection. It's the best we can do, and this will
give us a simpl
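
A rough illustration of the timeout idea above, written in plain Python rather than MPI and not taken from Open MPI: any socket-level failure, including a TCP timeout, is mapped to a single "port unavailable" error, analogous to returning MPI_ERR_PORT from the connect call. The names `PortUnavailable` and `try_connect`, and the injectable `connect` parameter, are assumptions made up for this sketch.

```python
import socket


class PortUnavailable(Exception):
    """Hypothetical stand-in for an MPI_ERR_PORT-style failure."""


def try_connect(host, port, timeout=2.0, connect=socket.create_connection):
    """Connect to (host, port), or raise PortUnavailable.

    `connect` is injectable so the failure path can be exercised
    without a real network; by default it is socket.create_connection.
    """
    try:
        return connect((host, port), timeout=timeout)
    except OSError as exc:
        # OSError covers refused connections as well as timeouts,
        # so the caller sees one error whatever the reason was.
        raise PortUnavailable(f"{host}:{port} not reachable: {exc}") from exc
```

The caller does not need to know why the port went away; it only learns that the connection could not be made within the timeout.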
On 4/25/08 5:38 PM, "Aurélien Bouteiller" wrote:
> To bounce on George's last remark, currently when a job dies without
> unsubscribing a port with Unpublish (due to poor user programming,
> failure or abort), ompi-server keeps the reference forever and a new
> application can therefore not publi
That sounds fine with me, George.
Just to clarify: My comment about differing interpretations didn't pertain
to this specific question, but was more an observation of some discussions
we have had about such issues in other areas. I didn't talk to anyone about
this particular question, just noted t
To bounce on George's last remark, currently when a job dies without
unsubscribing a port with Unpublish (due to poor user programming,
failure or abort), ompi-server keeps the reference forever and a new
application can therefore not publish under the same name again. So I
guess this is a goo
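
The stale-name problem described above can be pictured with a toy model in plain Python. This is not ompi-server code: the class `NameServer`, its methods, and the job ids are all hypothetical. It only shows why a name left behind by a dead job blocks a later publish, and how an "unpublish all" request for that job would clear it.

```python
class NameServer:
    """Toy name server: service name -> (port string, owning job id)."""

    def __init__(self):
        self.names = {}

    def publish(self, name, port, job):
        # A name that is still registered cannot be published again.
        if name in self.names:
            raise KeyError(f"{name!r} is already published")
        self.names[name] = (port, job)

    def unpublish(self, name):
        del self.names[name]

    def unpublish_all(self, job):
        """Drop every name owned by `job`; return the names removed."""
        stale = [n for n, (_, owner) in self.names.items() if owner == job]
        for n in stale:
            del self.names[n]
        return stale
```

In this model, if job 1 publishes `"svc"` and dies without calling `unpublish`, a `publish("svc", ...)` from job 2 raises until `unpublish_all(1)` has run.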
Ralph,
Thanks for your concern regarding the level of compliance of our
implementation of the MPI standard. I don't know who the MPI gurus
you talked with about this issue were, but I can tell you that for once
the MPI standard is pretty clear about this.
As stated by Aurelien in his last ema
As I said, it makes no difference to me. I just want to ensure that everyone
agrees on the interpretation of the MPI standard. We have had these
discussions in the past, with differing views. My guess here is that the port
was left open mostly because the person who wrote the C-binding forgot to
clo
Actually, the port was still left open forever before the change. The
bug damaged the port string, and it was not usable anymore, not only
in subsequent Comm_accept, but also in Close_port or Unpublish_name.
To answer your open port concern more specifically, if the user
does not want to
Hmmm...just to clarify, this wasn't a "bug". It was my understanding per the
MPI folks that a separate, unique port had to be created for every
invocation of Comm_accept. They didn't want a port hanging around open, and
their plan was to close the port immediately after the connection was
establish