Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Ralph H Castain
Not exactly. The problem is that rank=0 initially falls behind because it is doing more work - i.e., it has to receive all the buffers and do something with them. As a result, it doesn’t get to post the next allreduce before the messages from the other participants arrive - which means that rank

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Ralph H Castain
There is a coll/sync component that will automatically inject those barriers for you so you don’t have to add them to your code. Controlled by MCA param:
coll_sync_barrier_before: Do a synchronization before each Nth collective
coll_sync_barrier_after: Do a synchronization after each Nth collect
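
The idea behind those two parameters can be sketched as a counter that forces a synchronization around every Nth collective. The Python below is only a toy model: the class `SyncInjector`, its methods, and the `barrier` callback are hypothetical names rather than Open MPI code, and it assumes that a value of 0 disables the corresponding barrier.

```python
from typing import Callable, List


class SyncInjector:
    """Toy model: run a barrier before and/or after every Nth collective."""

    def __init__(self, barrier: Callable[[], None],
                 barrier_before: int = 0, barrier_after: int = 0) -> None:
        self.barrier = barrier
        self.barrier_before = barrier_before  # 0 means "never" (assumption)
        self.barrier_after = barrier_after    # 0 means "never" (assumption)
        self.before_count = 0
        self.after_count = 0

    def run(self, collective: Callable[[], None]) -> None:
        """Run one collective, injecting a barrier when a counter reaches N."""
        if self.barrier_before > 0:
            self.before_count += 1
            if self.before_count == self.barrier_before:
                self.before_count = 0
                self.barrier()
        collective()
        if self.barrier_after > 0:
            self.after_count += 1
            if self.after_count == self.barrier_after:
                self.after_count = 0
                self.barrier()


def demo(num_collectives: int, every: int) -> List[str]:
    """Return the event sequence for a run with barrier_before=every."""
    events: List[str] = []
    injector = SyncInjector(lambda: events.append("barrier"),
                            barrier_before=every)
    for _ in range(num_collectives):
        injector.run(lambda: events.append("reduce"))
    return events
```

With `demo(4, 2)` a barrier is recorded before the second and fourth reductions, which is the point at which a slow root would get a chance to catch up.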

Re: [OMPI devel] Gentle reminder: sign up for the face to face

2019-02-26 Thread Ralph H Castain
Done! > On Feb 26, 2019, at 8:33 AM, Brice Goglin wrote: > > Hello Jeff > > Looks like I am not allowed to modify the page but I'll be at the meeting ;) > > Brice > > > > On 26/02/2019 at 17:13, Jeff Squyres (jsquyres) via devel wrote: >> Gentle reminder to please sign up for the face-to-

Re: [OMPI devel] rml/ofi component broken in v4.0.x and v3.1.x

2019-02-14 Thread Ralph H Castain
I would recommend just removing it - frankly, I’m surprised it is in there as the code was deemed non-production-ready. > On Feb 14, 2019, at 5:11 PM, Gilles Gouaillardet wrote: > > Folks, > > > The rml/ofi component has been removed from master. > > Then common/ofi was later removed from m

Re: [OMPI devel] [OMPI users] open-mpi.org is DOWN

2018-12-23 Thread Ralph H Castain
The security scanner has apologized for a false positive and fixed their system - the site has been restored. Ralph > On Dec 22, 2018, at 12:12 PM, Ralph H Castain wrote: > > Hello all > > Apologies to everyone, but I received an alert this morning that malware has > be

[OMPI devel] open-mpi.org is DOWN

2018-12-22 Thread Ralph H Castain
Hello all Apologies to everyone, but I received an alert this morning that malware has been detected on the www.open-mpi.org site. I have tried to contact the hosting agency and the security scanners, but nobody is around on this pre-holiday weekend. Accordingly, I have taken the site OFFLINE f

[OMPI devel] PMIx v3.0 Standard released

2018-12-20 Thread Ralph H Castain
The PMIx community, representing a consortium of research, academic, and industry partners, is pleased to announce the release of the PMIx v3.0 Standard document. The document can be obtained from: * the PMIx website at https://pmix.org/wp-content/uploads/2018/12/pmix-standard-3.0.pdf * the PM

Re: [OMPI devel] OMPI and PRRTE separated

2018-12-17 Thread Ralph H Castain
FYI: I have deleted all the old OMPI tags from PRRTE, so we have a clean repo to work with now. > On Dec 17, 2018, at 5:58 PM, Ralph H Castain wrote: > > Hello all > > For those of you working with ORTE and/or PRRTE, GitHub has severed the > parent/child relationship be

[OMPI devel] OMPI and PRRTE separated

2018-12-17 Thread Ralph H Castain
Hello all For those of you working with ORTE and/or PRRTE, GitHub has severed the parent/child relationship between the OMPI and PRRTE repositories. Thus, we will no longer be able to directly “pull” changes made to ORTE downstream into PRRTE. This marks the end of direct support for ORTE exce

[OMPI devel] PMIx v2.1 Standard released

2018-12-06 Thread Ralph H Castain
The PMIx community, representing a consortium of research, academic, and industry partners, is pleased to announce the release of the PMIx v2.1 Standard document. The document can be obtained from: * the PMIx website at https://pmix.org/wp-content/uploads/2018/12/pmix-standard-2.1.pdf

[OMPI devel] PRRTE v3.0.0rc1 available for testing

2018-11-28 Thread Ralph H Castain
Hi folks Given a growing use of PRRTE plus OMPI’s announced plans to phase out ORTE in favor of PRRTE, it seems the time has come to begin generating formal releases of PRRTE. Accordingly, I have created a v3.0.0 release candidate for folks to (hopefully) test: https://github.com/pmix/prrte/re

Re: [OMPI devel] Hints for using an own pmix server

2018-10-18 Thread Ralph H Castain
> On Oct 17, 2018, at 3:32 AM, Stephan Krempel wrote: > > > Hi Ralph. > One point that remains open and is interesting for me is if I can achieve the same with the 3.1.2 release of OpenMPI. Is it somehow possible to configure it as there were the "--with-ompi-pmix-rte" swi

[OMPI devel] SC'18 PMIx BoF meeting

2018-10-15 Thread Ralph H Castain
Hello all [I’m sharing this on the OMPI mailing lists (as well as the PMIx one) as PMIx has become tightly integrated into the OMPI code since v2.0 was released] The PMIx Community will once again be hosting a Birds-of-a-Feather meeting at SuperComputing. This year, however, will be a little diff

Re: [OMPI devel] Hints for using an own pmix server

2018-10-14 Thread Ralph H Castain
> On Oct 12, 2018, at 6:15 AM, Ralph H Castain wrote: > >> One point that remains open and is interesting for me is if I can >> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow >> possible to configure it as there were the "--with-ompi-pmix-rt

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
point it out! Ralph > On Oct 12, 2018, at 6:15 AM, Ralph H Castain wrote: > > Hi Stephan > > >> On Oct 12, 2018, at 2:25 AM, Stephan Krempel wrote: >> >> Hello Ralph, >> >>> I assume this (--with-

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
h >>> ORTE_SCHIZO_DETECTION=ORTE >>> OMPI_COMMAND=./hello_env >>> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08- >>> d92c0e73869e1cfa >>> OMPI_MCA_orte_launch=1 >>> OMPI_APP_CTX_NUM_PROCS=1 >>> OMPI_MCA_pmix=^s1,s2,cray,isolated >>> OMP

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
ea what I need to change? Do I have to set an MCA > parameter to tell OpenMPI not to start orted, or does it need another > hint in the client environment beside the stuff coming from the PMIx > server helper library? > > > Stephan > > > On Tuesday, Oct 10 2018,

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
Hi Stephan Thanks for the clarification - that helps a great deal. You are correct that OMPI’s orted daemons do more than just host the PMIx server library. However, they are only active if you launch the OMPI processes using mpirun. This is probably the source of the trouble you are seeing. S

Re: [OMPI devel] Hints for using an own pmix server

2018-10-08 Thread Ralph H Castain
Even PRRTE won’t allow you to stop the orted from initializing its PMIx server. I’m not sure I really understand your objective. Remember, PMIx is just a library - the orted opens it and uses it to interface to its client application procs. It makes no sense to have some other process perform th

Re: [OMPI devel] Removing ORTE code

2018-10-02 Thread Ralph H Castain
Based on silence plus today’s telecon, the stale code has been removed: https://github.com/open-mpi/ompi/pull/5827 > On Sep 26, 2018, at 7:00 AM, Ralph H Castain wrote: > > We are considering a “purge” of stale ORTE code and want to know if anyone is > using it before proceedi

Re: [OMPI devel] btl/vader: race condition in finalize on OS X

2018-10-02 Thread Ralph H Castain
We already have the register_cleanup option in master - are you using an older version of PMIx that doesn’t support it? > On Oct 2, 2018, at 4:05 AM, Jeff Squyres (jsquyres) via devel > wrote: > > FYI: https://github.com/open-mpi/ompi/issues/5798 brought up what may be the > same issue. > >

[OMPI devel] Error in TCP BTL??

2018-10-01 Thread Ralph H Castain
I’m getting this error when trying to run a simple ring program on my Mac: [Ralphs-iMac-2.local][[21423,14],0][btl_tcp_endpoint.c:742:mca_btl_tcp_endpoint_start_connect] bind() failed: Invalid argument (22) Anyone recognize the problem? It causes the job to immediately abort. This is with curre

Re: [OMPI devel] Mac OS X 10.4.x users?

2018-09-28 Thread Ralph H Castain
Good lord - break away!! > On Sep 28, 2018, at 11:11 AM, Barrett, Brian via devel > wrote: > > All - > > In trying to clean up some warnings, I noticed one (around pack/unpack in > net/if.h) that is due to a workaround of a bug in MacOS X 10.4.x and earlier. > The simple way to remove the w

[OMPI devel] Removing ORTE code

2018-09-26 Thread Ralph H Castain
We are considering a “purge” of stale ORTE code and want to know if anyone is using it before proceeding. With the advent of PMIx, several ORTE features are no longer required by OMPI itself. However, we acknowledge that it is possible that someone out there (e.g., a researcher) is using them. T

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
ather add an .ompi_ignore and > give an opportunity to power users to continue playing with it. > > George. > > >> On Thu, Sep 20, 2018 at 8:04 PM Ralph H Castain wrote: >> I already suggested the configure option, but it doesn’t solve the problem. >> I wou

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
til next major release to get this in. > > > Arm > > > On Thu, Sep 20, 2018, 7:18 PM Ralph H Castain wrote: > I suspect it is a question of what you tested and in which scenarios. Problem > is that it can bite someone and there isn’

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
MTL and able to use both > of them interchangeably with no problem. I don't know what changed. libpsm2? > > > Arm > > > On Thu, Sep 20, 2018, 7:06 PM Ralph H Castain wrote: > We have too many discussion threads overlapping on

[OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
We have too many discussion threads overlapping on the same email chain - so let’s break the discussion on the OFI problem into its own chain. We have been investigating this locally and found there are a number of conflicts between the MTLs and the OFI/BTL stepping on each other. The correct s

Re: [OMPI devel] MTT Perl client

2018-09-18 Thread Ralph H Castain
Are we good to go with this changeover? If so, I’ll delete the Perl client from the main MTT repo. > On Sep 14, 2018, at 10:06 AM, Jeff Squyres (jsquyres) via devel > wrote: > > On Sep 14, 2018, at 12:37 PM, Gilles Gouaillardet > wrote: >> >> IIRC mtt-relay is not only a proxy (squid can do

Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Ralph H Castain
that code). > > > >> On Sep 14, 2018, at 11:23 AM, Ralph H Castain wrote: >> >> Afraid I’m not familiar with that script - what does it do? >> >> >>> On Sep 14, 2018, at 7:46 AM, Christoph Niethammer >>> wrote: >>> >>

Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Ralph H Castain
acement? > > Best > Christoph Niethammer > > - Original Message - > From: "Open MPI Developers" > To: "Open MPI Developers" > CC: "Jeff Squyres" > Sent: Tuesday, September 11, 2018 20:37:40 > Subject: Re: [OMPI devel]

Re: [OMPI devel] Will info keys ever be fixed?

2018-09-11 Thread Ralph H Castain
--with-gxx-include-dir=/usr/include/c++/4.2.1 > Apple LLVM version 9.1.0 (clang-902.0.39.2) > Target: x86_64-apple-darwin17.7.0 > Thread model: posix > InstalledDir: > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin > > > > > >

Re: [OMPI devel] mpirun error when not using span

2018-09-11 Thread Ralph H Castain
when binding. I’ll try to poke at it a bit. > On Sep 11, 2018, at 9:17 AM, Shrader, David Lee wrote: > > Here's the xml output from lstopo. Thank you for taking a look! > David > > From: devel on behalf of Ralph H Castain > > Sent: Monday, September 10, 201

[OMPI devel] MTT Perl client

2018-09-11 Thread Ralph H Castain
Hi folks Per today’s telecon, I have moved the Perl MTT client into its own repository: https://github.com/open-mpi/mtt-legacy. All the Python client code has been removed from that repo. The original MTT repo remains at https://github.com/open-mpi/mtt. I have a PR to remove all the Perl clien

Re: [OMPI devel] mpirun error when not using span

2018-09-10 Thread Ralph H Castain
Could you please send the output from “lstopo --of xml foo.xml” (the file foo.xml) so I can try to replicate here? > On Sep 4, 2018, at 12:35 PM, Shrader, David Lee wrote: > > Hello, > > I have run this issue by Howard, and he asked me to forward it on to the Open > MPI devel mailing list. I

[OMPI devel] Will info keys ever be fixed?

2018-09-10 Thread Ralph H Castain
Still seeing this in today’s head of master: info_subscriber.c: In function 'opal_infosubscribe_change_info': ../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated writing up to 36 bytes into a region of size 27 [-Wformat-truncation=] #define OPAL_INFO_SAVE_PREFIX "_OMPI

Re: [OMPI devel] Open MPI website borked up?

2018-09-01 Thread Ralph H Castain
I suspect this is a stale message - I’m not seeing any problem with the website > On Aug 29, 2018, at 12:55 PM, Howard Pritchard wrote: > > Hi Folks, > > Something seems to be borked up about the OMPI website. Go to the website and > you'll > get some odd parsing error appearing. > > Howard >

[OMPI devel] Continued warnings?

2018-07-31 Thread Ralph H Castain
Just curious - will this ever be fixed? From today’s head of master: In file included from info.c:46:0: info.c: In function 'opal_info_dup_mode': ../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated writing up to 36 bytes into a region of size 27 [-Wformat-truncation=]

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-04 Thread Ralph H. Castain
Looks okay to me Brian - I went ahead and filed the CMR and sent it on to Brad for approval. Ralph > On Tue, 3 Mar 2009, Brian W. Barrett wrote: > >> On Tue, 3 Mar 2009, Jeff Squyres wrote: >> >>> 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only >>> difference between rc3 a

Re: [OMPI devel] IBCM error

2008-07-14 Thread Ralph H. Castain
I've been quietly following this discussion, but now feel a need to jump in here. I really must disagree with the idea of building either IBCM or RDMACM support by default. Neither of these has been proven to reliably work, or to be advantageous. Our own experiences in testing them have been slight

[OMPI devel] User request: add envar?

2008-07-11 Thread Ralph H Castain
Yo folks For those not following the user list, this request was generated today: Absolutely, these are useful time and time again so should be part of the API and hence stable. Care to mention what they are and I'll add it to my note as something to change when upgrading to 1.3 (

Re: [OMPI devel] PLM consistency: priority

2008-07-11 Thread Ralph H Castain
ameter, and use only the pml=ob1,cm syntax from the user's > point of view. > > Aurelien > > On 11 Jul 08 at 10:56, Ralph H Castain wrote: > >> Okay, another fun one. Some of the PLM modules use MCA params to >> adjust >> their relative selection priorit

[OMPI devel] PLM consistency: priority

2008-07-11 Thread Ralph H Castain
Okay, another fun one. Some of the PLM modules use MCA params to adjust their relative selection priority. This can lead to very unexpected behavior as which module gets selected will depend on the priorities of the other selectable modules - which changes from release to release as people independ

Re: [OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Ralph H Castain
e_launch_agent? >> >> >> On Jul 11, 2008, at 10:17 AM, Ralph H Castain wrote: >> >>> Since the question of backward compatibility of params came up... ;-) >>> >>> I've been perusing the various PLM modules to check consistency. One >>

[OMPI devel] PLM consistency: launch agent param

2008-07-11 Thread Ralph H Castain
Since the question of backward compatibility of params came up... ;-) I've been perusing the various PLM modules to check consistency. One thing I noted right away is that -every- PLM module registers an MCA param to let the user specify an orted cmd. I believe this specifically was done so people

Re: [OMPI devel] v1.3 RM: need a ruling

2008-07-11 Thread Ralph H Castain
On 7/11/08 7:48 AM, "Terry Dontje" wrote: > Jeff Squyres wrote: >> Check that -- Ralph and I talked more about #1383 and have come up >> with a decent/better solution that a) is not wonky and b) does not >> involve MCA parameter synonyms. We're working on it in an hg and will >> put it back w

Re: [OMPI devel] Ticket #1267 - thread locks in ompi_proc_t code

2008-07-07 Thread Ralph H Castain
s are welcomed Ralph On 7/7/08 8:22 AM, "Ralph H Castain" wrote: > I am seeking input before making a change to the ompi/proc/proc.c code to > resolve the referenced ticket. The change could potentially impact how the > ompi_proc_t struct is used in the rest of the MPI code bas

[OMPI devel] Ticket #1267 - thread locks in ompi_proc_t code

2008-07-07 Thread Ralph H Castain
I am seeking input before making a change to the ompi/proc/proc.c code to resolve the referenced ticket. The change could potentially impact how the ompi_proc_t struct is used in the rest of the MPI code base. If this doesn't impact you, please ignore the remainder of this note. I was asked last

[OMPI devel] Trunk broken with linear, direct routing

2008-07-01 Thread Ralph H Castain
Since this appears to have gone unnoticed, it may not be a big deal. However, I have found that multi-node operations are broken if you invoke the linear or direct routed modules. Things work fine with the default binomial routed module. I will be working to fix this - just a heads up. Ralph

[OMPI devel] Framework selection

2008-07-01 Thread Ralph H Castain
I ran into something unexpected today relative to the selection of frameworks. It was totally unplanned, and may be an error on my part - or I may be expecting the incorrect behavior. However, since others may encounter it unexpectedly as well, I am sending this to the list. What I had done was:

Re: [OMPI devel] mtt IBM SPAWN error

2008-06-30 Thread Ralph H Castain
Well, that error indicates that it was unable to launch the daemon on witch3 for some reason. If you look at the error reported by bash, you will see that the "orted" binary wasn't found! Sounds like a path error - you might check to see if witch3 has the binaries installed, and if they are where

Re: [OMPI devel] mtt IBM SPAWN error

2008-06-30 Thread Ralph H Castain
That's correct - and is precisely the behavior it should exhibit. The reasons: 1. when you specify -host, we assume max_slots is infinite since you cannot provide any info to the contrary. We therefore allow you to oversubscribe the node to your heart's desire. However, note one problem: if your o

Re: [OMPI devel] PML selection logic

2008-06-26 Thread Ralph H Castain
>>> fine, but if you/the sysadmin is smart, you can get performance >>> improvements. >>> >>> >>> On Jun 23, 2008, at 4:18 PM, Shipman, Galen M. wrote: >>> >>>> I concur >>>> - galen >>>> >>>> On Jun

Re: [OMPI devel] PML selection logic

2008-06-24 Thread Ralph H Castain
the sysadmin is smart, you can get >> performance improvements. >> >> >> On Jun 23, 2008, at 4:18 PM, Shipman, Galen M. wrote: >> >>> I concur >>> - galen >>> >>> On Jun 23, 2008, at 3:44 PM, Brian W. Barrett wrote: >>>

Re: [OMPI devel] PML selection logic

2008-06-23 Thread Ralph H Castain
huge performance > hit for using OB1. So we run into a situation where user installs Open > MPI, starts running, gets horrible performance, bad mouths Open MPI, and > now we're in that game again. Yeah, the sys admin should know what to do, > but it doesn't always work that way.

Re: [OMPI devel] PML selection logic

2008-06-23 Thread Ralph H Castain
wn is generally used in environments where such network > mismatches are most likely to occur. > > Brian > > > On Mon, 23 Jun 2008, Ralph H Castain wrote: > >> Since my goal is to eliminate the modex completely for managed >> installations, could you give me a brief

Re: [OMPI devel] PML selection logic

2008-06-23 Thread Ralph H Castain
rocess. When using the complete PML selection, BTL > would be initialized several times, leading to a variety of bugs. > Eventually the PML selection should return to its old self, when the > BTL bug gets fixed. > > Aurelien > > On 23 Jun 08 at 12:36, Ralph H Castain wrote:

[OMPI devel] PML selection logic

2008-06-23 Thread Ralph H Castain
Yo all I've been doing further research into the modex and came across something I don't fully understand. It seems we have each process insert into the modex the name of the PML module that it selected. Once the modex has exchanged that info, it then loops across all procs in the job to check the

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
s the default for that reason. Check to ensure you have the orte/mca/grpcomm/bad directory, and that it is getting built. My guess is that you have a corrupted checkout or build and that the component is either missing or not getting built. On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" wr

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
Ha! I found it - you left out one very important detail. You are specifying the use of the grpcomm basic module instead of the default "bad" one. I just checked and that module is indeed showing a problem. I'll see what I can do. For now, though, just use the default grpcomm and it will work fine

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
I can't find anything wrong so far. I'm waiting in a queue on Odin to try there since Jeff indicated you are using rsh as a launcher, and that's the only access I have to such an environment. Guess Odin is being pounded because the queue isn't going anywhere. Meantime, I'm building on RoadRunner a

Re: [OMPI devel] RML Send

2008-06-19 Thread Ralph H Castain
's existed! Should have a fix in later today. Ralph On 6/19/08 8:43 AM, "Ralph H Castain" wrote: > WOW! Somebody really screwed up the DSS by adding some new API's I'd never > heard of before, but really can cause the system to break! > > I'm going to

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
You'll have to tell us something more than that, Pasha. What kind of environment, what rev level were you at, etc. So far as I know, the trunk is fine. On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" wrote: > I tried to run trunk on my machines and I got the following error: > > [sw214:04367] [[16563,1]

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18677

2008-06-19 Thread Ralph H Castain
I would argue that this behavior is in fact consistent - the returned state is that all required connections have been opened and is independent of the selected routed module. How that is done is irrelevant to the caller. Each routed module knows precisely what connections are used for its operati

Re: [OMPI devel] RML Send

2008-06-19 Thread Ralph H Castain
unpack(*buffer, &bo, &n, OPAL_BYTE_OBJECT); >> >> You can then transfer the data into whatever storage you like. All this does >> is pass the #bytes and the bytes as a collected unit - you could, of course, >> simply pass the #bytes and bytes with independent packs

Re: [OMPI devel] RML Send

2008-06-17 Thread Ralph H Castain
I'm not sure exactly how you are trying to do this, but the usual procedure would be: 1. call opal_dss.pack(*buffer, *data, #data, data_type) for each thing you want to put in the buffer. So you might call this to pack a string: opal_dss.pack(*buffer, &string, 1, OPAL_STRING); 2. once you have e
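
The pack-in-order, unpack-in-the-same-order pattern described here can be illustrated without ORTE at all. The sketch below uses Python's `struct` module and a length-prefixed string encoding; `pack_string` and `unpack_string` are hypothetical helpers, and neither the function names nor the byte layout should be read as the `opal_dss` API or its wire format.

```python
import struct
from typing import Tuple


def pack_string(buffer: bytearray, text: str) -> None:
    """Append a 4-byte big-endian length followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    buffer.extend(struct.pack("!I", len(data)))
    buffer.extend(data)


def unpack_string(buffer: bytes, offset: int) -> Tuple[str, int]:
    """Read one length-prefixed string; return it and the next offset."""
    (length,) = struct.unpack_from("!I", buffer, offset)
    start = offset + 4
    end = start + length
    return bytes(buffer[start:end]).decode("utf-8"), end
```

The receiver has to call `unpack_string` in the same order the sender called `pack_string`, carrying the returned offset forward each time, which mirrors the ordering requirement of a pack/unpack buffer.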

[OMPI devel] Vampirtrace warnings

2008-06-11 Thread Ralph H Castain
I'm not entirely sure who the Vampirtrace folks are, or if they are aware of the Coverity tool that is periodically reviewing the OMPI code base and providing warnings of potential code path errors. In the most recent review, dated 5/28, the Coverity tool found over 50 issues in the Vampirtrace co

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18625

2008-06-09 Thread Ralph H Castain
Okay, it's fixed now in r18629 On 6/9/08 3:23 PM, "Ralph H Castain" wrote: > Visibility issue - fix coming in a minute... > > > On 6/9/08 3:10 PM, "Ralph H Castain" wrote: > >> Interesting - it compiles for me under three different environ

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18625

2008-06-09 Thread Ralph H Castain
Visibility issue - fix coming in a minute... On 6/9/08 3:10 PM, "Ralph H Castain" wrote: > Interesting - it compiles for me under three different environments. > > Let me check - perhaps something isn't getting committed properly > > > On 6/9/08 3:07

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18625

2008-06-09 Thread Ralph H Castain
Interesting - it compiles for me under three different environments. Let me check - perhaps something isn't getting committed properly On 6/9/08 3:07 PM, "Aurélien Bouteiller" wrote: > This commit looks like it does not compile. > orterun.o: In function `orterun': > ../../../../trunk/orte/tool

Re: [OMPI devel] Communication between entities

2008-05-29 Thread Ralph H Castain
s to launch then > when one node dies. > > In this approach ORTE daemons are treated like application "protectors", > and the applications are the "protected". > > Thanks, > Leonardo > > > Ralph H Castain wrote: >> There is no way to send

Re: [OMPI devel] Communication between entities

2008-05-29 Thread Ralph H Castain
There is no way to send a message to a daemon located on another node without relaying it through the local daemon. The application procs have no knowledge of the contact info for any daemon other than their own, so even using the direct routed module would not work. Can you provide some reason wh

Re: [OMPI devel] Open MPI session directory location

2008-05-28 Thread Ralph H Castain
> Oops, sorry. > > We were having problems with the memory allocator when ompi_info > called orte_init(). I think it might be best to call the ORTE MCA > registration function directly... > > > On May 27, 2008, at 10:40 AM, Ralph H Castain wrote: > >> I see the pro

Re: [OMPI devel] mpirun hangs

2008-05-28 Thread Ralph H Castain
It could be - I believe the Mac issue has been around for a while. If you like, you could use that same platform file and give it a try. I think there are a few frameworks mentioned in there that aren't in 1.2, but that should be easy to edit out. On 5/28/08 7:11 AM, "Greg Watson" wrote: > That

Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Ralph H Castain
8:32 AM, "Ralph H Castain" wrote: > It "should" be visible now... not sure why it isn't. It conforms to the > naming rules and -used- to be reported by ompi_info... > > > > On 5/27/08 8:31 AM, "Shipman, Galen M." wrote: > >> Make

Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Ralph H Castain
nfo. > I thought this was done at some point, perhaps it got overwritten? > > Thanks, > > Galen > > On May 27, 2008, at 10:27 AM, Ralph H Castain wrote: > >> -mca orte_tmpdir_base foo >> >> >> >> On 5/27/08 8:24 AM, "Gleb Natapov"

Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Ralph H Castain
-mca orte_tmpdir_base foo On 5/27/08 8:24 AM, "Gleb Natapov" wrote: > Hi, > > Is there a way to change where Open MPI creates session directory. I > can't find mca parameter that specifies this. > > -- > Gleb. > ___ > devel mailing list > de...@o

Re: [OMPI devel] [RFC] mca_base_select()

2008-05-06 Thread Ralph H Castain
Hmmm... well, I hit a problem (of course!). I have mca-no-build on the filem framework on my Mac. If I just mpirun -n 3 ./hello, I get the following error: -- It looks like orte_init failed for some reason; your parallel proce

[OMPI devel] Loadbalancing

2008-04-23 Thread Ralph H Castain
I added a new "loadbalance" feature to OMPI today in r18252. Brief summary: adding --loadbalance to the mpirun cmd line will cause the round-robin mapper to balance your specified #procs across the available nodes. More detail: Several users had noted that mapping byslot always caused us to prefe
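
One plausible reading of "balance your specified #procs across the available nodes" is an even split, with any remainder spread over the first nodes. The sketch below assumes exactly that; `balance_procs` is a hypothetical helper that ignores slot counts and expects unique node names, so it should not be taken as the algorithm committed in r18252.

```python
from typing import Dict, List


def balance_procs(num_procs: int, nodes: List[str]) -> Dict[str, int]:
    """Split num_procs as evenly as possible across the given nodes."""
    if not nodes:
        raise ValueError("at least one node is required")
    # Every node gets `base` procs; the first `extra` nodes get one more.
    base, extra = divmod(num_procs, len(nodes))
    return {node: base + (1 if index < extra else 0)
            for index, node in enumerate(nodes)}
```

For example, seven processes on three nodes come out as 3, 2 and 2 under this assumption, rather than filling the first node's slots before touching the next one.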

Re: [OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain
Thanks Brian - I had been told precisely the opposite priority rule just a few weeks ago by someone else, hence my confusion. On 4/21/08 8:48 AM, "Brian W. Barrett" wrote: > On Mon, 21 Apr 2008, Ralph H Castain wrote: > >> So it appears to be a combination of memche

Re: [OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain
of a param set by a platform file not working. I can send you some stuff off-list in a little bit, if you still need it. > > Can you send your full configure output and config.log? > > > On Apr 21, 2008, at 9:51 AM, Ralph H Castain wrote: > >> As an FYI for anyon

[OMPI devel] Vprotocol build problem

2008-04-21 Thread Ralph H Castain
I am now simply trying some of our vaunted configure system's options to see what actually works, and what doesn't. Here is one that does NOT work: enable_mca_no_build=pml-v Generates the following build error: configure: error: conditional "OMPI_BUILD_vprotocol_pessimist_DSO" was never defined

Re: [OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain
build unless you have the valgrind headers installed on your machine. Ralph On 4/21/08 7:28 AM, "Ralph H Castain" wrote: > I am finding that the memchecker code is again breaking the trunk, > specifically on any machine that does not have valgrind installed. > Apparently, memc

[OMPI devel] Memchecker: breaks trunk again

2008-04-21 Thread Ralph H Castain
I am finding that the memchecker code is again breaking the trunk, specifically on any machine that does not have valgrind installed. Apparently, memchecker now forces a requirement for valgrind? Here is what I get: --- MCA component memchecker:valgrind (m4 configuration macro) checking for MCA c

[OMPI devel] Using do-not-launch, display-map, and do-not-resolve to test mappings

2008-04-17 Thread Ralph H Castain
Brief summary: In r18190, I have restored the --do-not-launch capability, and added a --do-not-resolve flag. This note describes how you can use those to build and test application mappings without first getting an allocation and/or launching it. Longer description: Users and developers have both

[OMPI devel] New mapper module

2008-04-17 Thread Ralph H Castain
I have implemented and committed (r18190) a new RMAPS module that sequentially maps ranks to the hosts listed in a hostfile. You must set -mca rmaps seq in order to access this module - it will -not- be selected any other way. The basic method of operation respects the hostfile descriptions on the
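
Sequential mapping as described here amounts to "rank i goes to the i-th host listed". The sketch below is a minimal model of that rule, not the RMAPS seq module itself: `map_sequentially` is a hypothetical name, the hostfile syntax it accepts (one host per line, optional trailing fields, `#` comments) is an assumption, and raising an error when there are more ranks than lines is a choice made for the example.

```python
from typing import Dict, List


def map_sequentially(hostfile_text: str, num_ranks: int) -> Dict[int, str]:
    """Map rank i to the host named on the i-th non-empty hostfile line."""
    hosts: List[str] = []
    for line in hostfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            hosts.append(line.split()[0])     # first token is the host name
    if num_ranks > len(hosts):
        raise ValueError("hostfile lists fewer hosts than ranks")
    return {rank: hosts[rank] for rank in range(num_ranks)}
```

A host that appears twice in the file simply receives two ranks, which is what makes this style of mapping useful for hand-placed layouts.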

[OMPI devel] ORTE Scaling results: updated

2008-04-08 Thread Ralph H Castain
Hello all The wiki page has been updated with the latest test results from a new branch that implemented inbound collectives on the modex and barrier operations. As you will see from the graphs, ORTE/OMPI now exhibits a negative 2nd-derivative on the launch time curve for mpi_no_op (i.e., MPI_Init

Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain
t other things going on at the moment on the distributed machines), and all works as expected. However, that doesn't mean there isn't a problem in general. Will investigate when I have time shortly. > >> >> So, I think that what was intended to happen is the correct thing

Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain
> > So, I think that what was intended to happen is the correct thing, but for > some reason it is not happening. > > Rich > > > On 4/8/08 1:47 PM, "Ralph H Castain" wrote: > >> I found what Pak said a little confusing as the wait_daemon functio

Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain
I found what Pak said a little confusing as the wait_daemon function doesn't actually receive a signal itself - it only detects that a proc has exited and checks to see if that happened due to a signal. If so, it flags that situation and will order the job aborted. So if the proc continues alive,
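
The check described here, noticing that a process has exited and then asking whether a signal caused it, has a direct analogue in Python's `subprocess` module on POSIX systems, where a return code of -N means the child was terminated by signal N. The sketch below only illustrates that distinction; `classify_exit` and `run_and_kill` are hypothetical names, and none of this is ORTE's wait_daemon code.

```python
import signal
import subprocess
import sys
from typing import Tuple


def classify_exit(returncode: int) -> Tuple[str, int]:
    """Distinguish a normal exit from death by signal (POSIX convention)."""
    if returncode < 0:
        return ("signal", -returncode)
    return ("exit", returncode)


def run_and_kill() -> Tuple[str, int]:
    """Start a sleeping child, terminate it, and report how it ended."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    child.send_signal(signal.SIGTERM)
    return classify_exit(child.wait())
```

A launcher built on this would abort the job only in the `"signal"` case, which matches the behavior described above: a process that stays alive never triggers the check.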

Re: [OMPI devel] mpirun return code problems

2008-04-08 Thread Ralph H Castain
I'm aware - as we discussed on a recent telecon, I put it on my list of things to resolve. Solution is known - just busy with other things at the moment. On 4/8/08 6:06 AM, "Tim Prins" wrote: > Hi all, > > I reported this before, but it seems that the report got lost. I have > found some situa

Re: [OMPI devel] Memchecker errors on trunk

2008-04-07 Thread Ralph H Castain
Thanks George! On 4/7/08 8:48 AM, "George Bosilca" wrote: > That's gcc being really mean !!! There was a double ; at the end of > the line, and apparently the second one is interpreted as code ... > Commit r18090 should fix the problem. > >george. > >

[OMPI devel] Memchecker errors on trunk

2008-04-07 Thread Ralph H Castain
Hello We have a problem this morning on the trunk - recent commits r18084-7 involving the ompi/include/ompi/memchecker.h file contain arithmetic involving a void* pointer and other problems: ../../../../ompi/include/ompi/memchecker.h: In function 'memchecker_convertor_call': ../../../../ompi/incl

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-07 Thread Ralph H Castain
On 4/7/08 7:45 AM, "Gleb Natapov" wrote: > On Mon, Apr 07, 2008 at 07:28:07AM -0600, Ralph H Castain wrote: >>> Also can you explain how >>> allgather is implemented in orte (sorry if you already explained this once >>> and I missed it). >> >

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-07 Thread Ralph H Castain
On 4/7/08 7:15 AM, "Gleb Natapov" wrote: > On Mon, Apr 07, 2008 at 07:07:38AM -0600, Ralph H Castain wrote: >> >> >> >> On 4/7/08 7:04 AM, "Gleb Natapov" wrote: >> >>> On Fri, Apr 04, 2008 at 10:52:38AM -0600, Ralph H Castain

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-07 Thread Ralph H Castain
On 4/7/08 7:04 AM, "Gleb Natapov" wrote: > On Fri, Apr 04, 2008 at 10:52:38AM -0600, Ralph H Castain wrote: >> With compression "on", you will get output telling you the original size of >> the message and its compressed size so you can see wha

Re: [OMPI devel] MPI_Comm_connect/Accept

2008-04-04 Thread Ralph H Castain
Ralph On 4/4/08 12:55 PM, "Ralph H Castain" wrote: > Well, something got borked in here - will have to fix it, so this will > probably not get done until next week. > > > On 4/4/08 12:26 PM, "Ralph H Castain" wrote: > >> Yeah, you didn't sp

Re: [OMPI devel] MPI_Comm_connect/Accept

2008-04-04 Thread Ralph H Castain
Well, something got borked in here - will have to fix it, so this will probably not get done until next week. On 4/4/08 12:26 PM, "Ralph H Castain" wrote: > Yeah, you didn't specify the file correctly...plus I found a bug in the code > when I looked (out-of-date a little i

Re: [OMPI devel] MPI_Comm_connect/Accept

2008-04-04 Thread Ralph H Castain
Yeah, you didn't specify the file correctly...plus I found a bug in the code when I looked (out-of-date a little in orterun). I am updating orterun (commit soon) and will include a better help message about the proper format of the orterun cmd-line option. The syntax is: -ompi-server uri or -omp

[OMPI devel] Affect of compression on modex and launch messages

2008-04-04 Thread Ralph H Castain
Hello all Based on some discussion on this list, I integrated a zlib-based compression ability into ORTE. Since the launch message sent to the orteds and the modex between the application procs are the only places where messages of any size are sent, I only implemented compression for those two ex
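
As a rough illustration of what a zlib-based option looks like, the sketch below compresses a payload and returns both sizes so they can be reported. It uses Python's standard `zlib` module; `compress_message` and `decompress_message` are hypothetical names, the `enabled` switch is an assumption standing in for whatever control ORTE exposes, and the repetitive test payload is only a stand-in for real launch or modex data.

```python
import zlib
from typing import Tuple


def compress_message(payload: bytes,
                     enabled: bool = True) -> Tuple[bytes, int, int]:
    """Return (wire_bytes, original_size, wire_size)."""
    wire = zlib.compress(payload) if enabled else payload
    return wire, len(payload), len(wire)


def decompress_message(wire: bytes, enabled: bool = True) -> bytes:
    """Undo compress_message; both sides must agree on `enabled`."""
    return zlib.decompress(wire) if enabled else wire
```

Highly repetitive data, such as the same kind of record repeated once per process, tends to shrink a lot, while small or already dense messages may not benefit, so reporting both sizes is a sensible way to judge the trade-off.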
