Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Ralph Castain
True - and I'm all for simple! Unless someone objects, let's just leave it that way for now. I'll put it on my list to look at later - maybe count how many publishes we do vs. unpublishes, and if there is a residual at finalize, send the "unpublish all" message. Still leaves a race conditio

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread George Bosilca
We can always fail the MPI_Comm_connect. There is a specific error for this: MPI_ERR_PORT. We can detect that the port is no longer available (whatever the reason) simply by using the TCP timeout on the connection. It's the best we can do, and this will give us a simpl

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Ralph Castain
On 4/25/08 5:38 PM, "Aurélien Bouteiller" wrote:
> To bounce on last George remark, currently when a job dies without
> unsubscribing a port with Unpublish(due to poor user programming,
> failure or abort), ompi-server keeps the reference forever and a new
> application can therefore not publi

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Ralph Castain
That sounds fine with me, George. Just to clarify: My comment about differing interpretations didn't pertain to this specific question, but was more an observation of some discussions we have had about such issues in other areas. I didn't talk to anyone about this particular question, just noted t

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Aurélien Bouteiller
To follow up on George's last remark: currently, when a job dies without unsubscribing a port with Unpublish (due to poor user programming, failure or abort), ompi-server keeps the reference forever, and a new application therefore cannot publish under the same name again. So I guess this is a goo

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread George Bosilca
Ralph, Thanks for your concern regarding the level of compliance of our implementation of the MPI standard. I don't know who the MPI gurus you talked with about this issue were, but I can tell you that for once the MPI standard is pretty clear about this. As stated by Aurelien in his last ema

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Ralph Castain
As I said, it makes no difference to me. I just want to ensure that everyone agrees on the interpretation of the MPI standard. We have had these discussions in the past, with differing views. My guess here is that the port was left open mostly because the person who wrote the C-binding forgot to clo

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Aurélien Bouteiller
Actually, the port was still left open forever before the change. The bug damaged the port string, so it was no longer usable, not only in subsequent Comm_accept, but also in Close_port or Unpublish_name. To answer your open-port concern more specifically, if the user does not want to

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18303

2008-04-25 Thread Ralph Castain
Hmmm...just to clarify, this wasn't a "bug". It was my understanding per the MPI folks that a separate, unique port had to be created for every invocation of Comm_accept. They didn't want a port hanging around open, and their plan was to close the port immediately after the connection was establish

Re: [OMPI devel] Fix for XLC + libtool issue

2008-04-25 Thread Sérgio Durigan Júnior
Hi Jeff,
On Fri, 2008-04-25 at 06:35 -0400, Jeff Squyres wrote:
> Good to hear that upgrading fixes this problem.
>
> We actually already have an outstanding ticket to upgrade to 2.2.2
> (https://svn.open-mpi.org/trac/ompi/ticket/1265
> ). We were following the Libtool development process clos

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18252

2008-04-25 Thread Tim Prins
This commit causes mpirun to segfault when running the IBM spawn tests on our slurm platforms (it may affect others as well). The failures only happen when mpirun is run in a batch script. The backtrace I get is:
Program terminated with signal 11, Segmentation fault.
#0 0x002a969b9dbe in d

Re: [OMPI devel] Loadbalancing

2008-04-25 Thread Jeff Squyres
Kewl! I added ticket 1277 so that we are sure to document this for v1.3.
On Apr 23, 2008, at 11:09 AM, Ralph H Castain wrote:
> I added a new "loadbalance" feature to OMPI today in r18252. Brief summary: adding --loadbalance to the mpirun cmd line will cause the round-robin mapper to balance

Re: [OMPI devel] Fix for XLC + libtool issue

2008-04-25 Thread Ralf Wildenhues
* Jeff Squyres wrote on Fri, Apr 25, 2008 at 01:54:39PM CEST: > On Apr 25, 2008, at 7:40 AM, Ralf Wildenhues wrote: > > > > Wow -- those timings are impressive! Quoting that URL (OMPI is [1]): > > - > For example[1], in a la

Re: [OMPI devel] Fix for XLC + libtool issue

2008-04-25 Thread Jeff Squyres
On Apr 25, 2008, at 7:40 AM, Ralf Wildenhues wrote: We actually already have an outstanding ticket to upgrade to 2.2.2 (https://svn.open-mpi.org/trac/ompi/ticket/1265). We were following the Libtool development process closely and waiting for at least 2.2.2 (get past 2.2.0). 2.2.2 is out sin

Re: [OMPI devel] Fix for XLC + libtool issue

2008-04-25 Thread Ralf Wildenhues
Hi Jeff, all,
* Jeff Squyres wrote on Fri, Apr 25, 2008 at 12:35:12PM CEST:
> Good to hear that upgrading fixes this problem.
>
> We actually already have an outstanding ticket to upgrade to 2.2.2
> (https://svn.open-mpi.org/trac/ompi/ticket/1265 ). We were following
> the Libtool development pr

Re: [OMPI devel] Fix for XLC + libtool issue

2008-04-25 Thread Jeff Squyres
Good to hear that upgrading fixes this problem. We actually already have an outstanding ticket to upgrade to 2.2.2 (https://svn.open-mpi.org/trac/ompi/ticket/1265). We were following the Libtool development process closely and waiting for at least 2.2.2 (get past 2.2.0). This will definite