Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Ralph Castain
True, and I'm all for simple! Unless someone objects, let's just leave it that way for now. I'll put it on my list to look at later: maybe count how many publishes we do vs. unpublishes and, if there is a residual at finalize, send the "unpublish all" message. Still leaves a race
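The bookkeeping Ralph describes above could look roughly like the following minimal sketch. It is written in Python purely for illustration. `PublishTracker` and the `send_unpublish_all` callback are hypothetical names, not Open MPI internals. The counter is per-process, so it does nothing about the race he mentions.

```python
class PublishTracker:
    """Hypothetical per-process bookkeeping for published port names.

    Not Open MPI code: a sketch of counting publishes vs. unpublishes and
    sending one "unpublish all" message at finalize if any remain.
    """

    def __init__(self, send_unpublish_all):
        self.outstanding = 0
        # Callback standing in for the real message to the name server.
        self._send_unpublish_all = send_unpublish_all

    def published(self):
        self.outstanding += 1

    def unpublished(self):
        if self.outstanding > 0:
            self.outstanding -= 1

    def finalize(self):
        """Send "unpublish all" only if a residual is left; report whether it was sent."""
        if self.outstanding > 0:
            self._send_unpublish_all()
            self.outstanding = 0
            return True
        return False


# Usage: two names published, only one unpublished before finalize.
sent = []
tracker = PublishTracker(lambda: sent.append("unpublish all"))
tracker.published()
tracker.published()
tracker.unpublished()
print(tracker.finalize())  # True: the residual of 1 triggers the message
```

Counting keeps the common case (everything unpublished cleanly) free of any extra message at finalize.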

2008-04-25 Thread George Bosilca
We always have the option of failing MPI_Comm_connect. There is a specific error for this: MPI_ERR_PORT. We can detect that the port is no longer available (whatever the reason) simply by using the TCP timeout on the connection. It's the best we can do, and this will give us a
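As a rough illustration of George's point, here is a minimal sketch of turning an unreachable port into an error instead of a hang. It assumes an explicit timeout passed to Python's `socket.create_connection` rather than the operating system's default TCP connect timeout. `ERR_PORT` and `SUCCESS` are hypothetical stand-ins for the MPI error classes, not values from an MPI library.

```python
import socket

# Hypothetical stand-ins; real MPI code would return MPI error classes.
ERR_PORT = "MPI_ERR_PORT"
SUCCESS = "MPI_SUCCESS"


def try_connect(host, port, timeout_s=5.0):
    """Attempt a TCP connection and report a port error if it cannot be made.

    A refused connection fails immediately. A silent or unreachable peer
    fails once timeout_s expires. Either way the caller gets ERR_PORT
    instead of blocking indefinitely.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return SUCCESS
    except OSError:  # includes ConnectionRefusedError and socket.timeout
        return ERR_PORT


# Usage: a listening socket on an ephemeral loopback port accepts the connection.
with socket.socket() as server:
    server.bind(("127.0.0.1", 0))
    server.listen()
    print(try_connect("127.0.0.1", server.getsockname()[1]))
```

The timeout bounds how long a connect can stall on a port whose owner has vanished. Picking its value is a policy decision this sketch does not settle.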

2008-04-25 Thread Ralph Castain
On 4/25/08 5:38 PM, "Aurélien Bouteiller" wrote:
> To bounce on last George remark, currently when a job dies without
> unsubscribing a port with Unpublish (due to poor user programming,
> failure or abort), ompi-server keeps the reference forever and a new
> application

2008-04-25 Thread Ralph Castain
That sounds fine to me, George. Just to clarify: my comment about differing interpretations didn't pertain to this specific question; it was more an observation about discussions we have had on such issues in other areas. I didn't talk to anyone about this particular question, just noted

2008-04-25 Thread Aurélien Bouteiller
To follow up on George's last remark: currently, when a job dies without unpublishing its port with Unpublish (due to poor user programming, a failure, or an abort), ompi-server keeps the reference forever, so a new application cannot publish under the same name again. So I guess this is a
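To make this failure mode concrete, the toy model below shows why a stale entry blocks re-publication. It also shows how a per-job purge, similar in spirit to an "unpublish all" message, would free the name. It is a hypothetical sketch, not ompi-server's actual data structures, and `NameRegistry` and `purge_job` are illustrative names.

```python
class NameRegistry:
    """Toy model of a publish/lookup name server (not ompi-server code).

    Names map to (port, owning job). Publishing an existing name fails, so a
    job that dies without unpublishing blocks the name until the server
    purges entries by owner.
    """

    def __init__(self):
        self._names = {}  # name -> (port, job_id)

    def publish(self, name, port, job_id):
        if name in self._names:
            return False  # name already taken, possibly by a dead job
        self._names[name] = (port, job_id)
        return True

    def unpublish(self, name, job_id):
        entry = self._names.get(name)
        if entry is not None and entry[1] == job_id:
            del self._names[name]
            return True
        return False

    def purge_job(self, job_id):
        """Hypothetical cleanup: drop every name owned by job_id."""
        stale = [n for n, (_, owner) in self._names.items() if owner == job_id]
        for n in stale:
            del self._names[n]
        return len(stale)


# Usage: job 1 publishes "solver" and then dies without unpublishing.
reg = NameRegistry()
reg.publish("solver", "port-A", job_id=1)
print(reg.publish("solver", "port-B", job_id=2))  # False: stale reference
reg.purge_job(1)
print(reg.publish("solver", "port-B", job_id=2))  # True
```

Tying each name to its owning job is what makes cleanup possible at all. Without an owner, the server has no safe way to tell a stale entry from a live one.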

2008-04-25 Thread Ralph Castain
As I said, it makes no difference to me. I just want to ensure that everyone agrees on the interpretation of the MPI standard. We have had these discussions in the past, with differing views. My guess here is that the port was left open mostly because the person who wrote the C binding forgot to