[OMPI devel] Which tests for larger cluster testing

2007-09-11 Thread Rolf . Vandevaart
I am curious which tests are being used when running tests on larger clusters. And by larger clusters, I mean anything with np > 128. (Although I realize that is not very large, it is bigger than most of the clusters I assume tests are being run on.) I ask this because I planned on using some…

Re: [OMPI devel] [devel-core] [RFC] Exit without finalize

2007-09-11 Thread Aurelien Bouteiller
Sounds great to me. Aurelien On Sep 11, 2007, at 13:03, Jeff Squyres wrote: If you genericize the concept, I think it's compatible with FT: 1. during MPI_INIT, one of the MPI processes can request a "notify" exit pattern for the job: a process must notify the RTE before it actually exits (i.e.…
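A minimal sketch of the proposed flow, in C. Neither request_notify_exit() nor rte_notify_exit() is a real Open MPI/ORTE call; both are hypothetical stand-ins for whatever hook the RFC would add:

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for the RFC's hooks -- NOT real Open MPI/ORTE APIs. */
    static void request_notify_exit(void) { /* would register the job's "notify" exit pattern with the RTE */ }
    static void rte_notify_exit(int status) { (void)status; /* would tell the RTE this exit is deliberate */ }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        request_notify_exit();   /* step 1: opt in during/after MPI_INIT */

        /* ... application work; this code path never calls MPI_Finalize ... */

        rte_notify_exit(0);      /* step 2: notify the RTE before actually exiting,
                                    so it can tell a clean exit from a failure */
        exit(0);
    }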

Re: [OMPI devel] [devel-core] [RFC] Exit without finalize

2007-09-11 Thread Jeff Squyres
On Sep 8, 2007, at 2:33 PM, Aurelien Bouteiller wrote: I agree (b) is not a good idea. However, I am not very pleased by (a) either. It totally prevents any process fault-tolerance mechanism if we go that way. If we plan to add some failure detection mechanism to the RTE and failure management (to avoid…

Re: [OMPI devel] UD BTL alltoall hangs

2007-09-11 Thread Andrew Friedley
First off, I've managed to reproduce this with nbcbench using only 16 procs (two per node) and setting btl_ofud_sd_num to 12 -- this eases debugging, with fewer procs to look at. ompi_coll_tuned_alltoall_intra_basic_linear is the alltoall routine that is being called. What I'm seeing from totalview…
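For context, a minimal sketch (not the actual Open MPI source) of what a "basic linear" alltoall does: post nonblocking receives and sends to every peer at once, then wait on everything. With 16 procs that is roughly 32 outstanding requests per process, which is the kind of pressure a small btl_ofud_sd_num (send-descriptor count) has to absorb:

    #include <mpi.h>
    #include <stdlib.h>

    int basic_linear_alltoall(const char *sbuf, char *rbuf,
                              int bytes_per_peer, MPI_Comm comm)
    {
        int size, i, n = 0;
        MPI_Request *reqs;

        MPI_Comm_size(comm, &size);
        reqs = malloc(2 * size * sizeof(MPI_Request));
        if (NULL == reqs) return MPI_ERR_NO_MEM;

        for (i = 0; i < size; i++)   /* post all receives first */
            MPI_Irecv(rbuf + i * bytes_per_peer, bytes_per_peer, MPI_BYTE,
                      i, 0, comm, &reqs[n++]);
        for (i = 0; i < size; i++)   /* then all sends */
            MPI_Isend((void *)(sbuf + i * bytes_per_peer), bytes_per_peer,
                      MPI_BYTE, i, 0, comm, &reqs[n++]);

        /* every request completes (or hangs) here */
        MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
        return MPI_SUCCESS;
    }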

Re: [OMPI devel] Adding a new component

2007-09-11 Thread Sajjad Tabib
Hi Aurelien, Thank you for the pointers. I was able to plug in a component to an existing framework. Thanks again, Sajjad

[OMPI devel] Coverity

2007-09-11 Thread Jeff Squyres
David fixed a problem this morning where Coverity wasn't quite running right because the directory where OMPI lived was changing every night. So a few of the old runs were pruned. -- Jeff Squyres Cisco Systems

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Edgar Gabriel
Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote: Gleb, in the scenario which you describe in the comment to the patch, what should happen is that the communicator with the cid which already started the allreduce will basically 'hang' until the other processes…

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote: > > On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote: > >> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: >>> We don't want to prevent two threads from entering the code at the same time. >>> The algorithm you cited supports…

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread George Bosilca
On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote: On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: We don't want to prevent two threads from entering the code at the same time. The algorithm you cited supports this case. There is only one moment that is Are you sure it supports this…

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
On Tue, Sep 11, 2007 at 10:00:07AM -0500, Edgar Gabriel wrote: > Gleb, > > in the scenario which you describe in the comment to the patch, what > should happen is that the communicator with the cid which already started > the allreduce will basically 'hang' until the other processes > 'allow'…

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: > We don't want to prevent two threads from entering the code at the same time. > The algorithm you cited supports this case. There is only one moment that is Are you sure it supports this case? There is a global var mask_in_use that prevents…
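A toy illustration of the kind of gate being debated (simplified; the real code uses OPAL/OMPI locks and progresses the event loop rather than yielding): a single mask_in_use flag means only one cid negotiation can own the shared mask at a time, serializing entry rather than letting two threads in concurrently:

    #include <pthread.h>
    #include <sched.h>
    #include <stdbool.h>

    static pthread_mutex_t cid_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool mask_in_use = false;   /* global: one negotiation owns the mask */

    void acquire_mask(void)
    {
        pthread_mutex_lock(&cid_lock);
        while (mask_in_use) {          /* another thread owns the mask: back off */
            pthread_mutex_unlock(&cid_lock);
            sched_yield();             /* real code would progress MPI instead */
            pthread_mutex_lock(&cid_lock);
        }
        mask_in_use = true;            /* we own the mask now */
        pthread_mutex_unlock(&cid_lock);
    }

    void release_mask(void)
    {
        pthread_mutex_lock(&cid_lock);
        mask_in_use = false;
        pthread_mutex_unlock(&cid_lock);
    }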

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Edgar Gabriel
Gleb, in the scenario which you describe in the comment to the patch, what should happen is that the communicator with the cid which already started the allreduce will basically 'hang' until the other processes 'allow' the lower cids to continue. It should basically be blocked in the allreduce…

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread George Bosilca
We don't want to prevent two threads from entering the code at the same time. The algorithm you cited supports this case. There is only one moment that is critical: the local selection of the next available cid. And this is what we try to protect there. If, after the first run, the collective call…
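A simplified, single-threaded sketch of that agreement step (hypothetical toy code, not ompi_comm_nextcid itself): every process proposes its lowest locally free cid, the communicator agrees on the MAX, and the loop retries until the agreed cid is free everywhere:

    #include <mpi.h>

    #define MAX_CID 1024
    static int cid_in_use[MAX_CID];    /* toy per-process reservation table */

    static int lowest_free_cid(int floor)
    {
        int i;
        for (i = floor; i < MAX_CID; i++)
            if (!cid_in_use[i]) return i;
        return MAX_CID;                /* table exhausted */
    }

    /* All processes in comm must call this collectively. */
    int next_cid_sketch(MPI_Comm comm, int *cid_out)
    {
        int floor = 0, proposed, agreed, ok, all_ok;

        for (;;) {
            proposed = lowest_free_cid(floor);
            /* agree on the highest proposal so the cid can satisfy everyone */
            MPI_Allreduce(&proposed, &agreed, 1, MPI_INT, MPI_MAX, comm);

            ok = (agreed < MAX_CID) && !cid_in_use[agreed];
            /* the cid is only good if it is free on every process */
            MPI_Allreduce(&ok, &all_ok, 1, MPI_INT, MPI_MIN, comm);
            if (all_ok) {
                cid_in_use[agreed] = 1;
                *cid_out = agreed;
                return MPI_SUCCESS;
            }
            if (agreed >= MAX_CID) return MPI_ERR_INTERN;  /* no cid left */
            floor = agreed + 1;        /* skip the contested cid and retry */
        }
    }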

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: > Gleb, > > This patch is not correct. The code preventing the registration of the same > communicator twice is later in the code (same file, in the function > ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid…

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread George Bosilca
Gleb, This patch is not correct. The code preventing the registration of the same communicator twice is later in the code (same file, in the function ompi_comm_register_cid, line 326). Once the function ompi_comm_register_cid is called, we know that each communicator only handles one "commun…