Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-04 Thread Ralph H. Castain
Looks okay to me Brian - I went ahead and filed the CMR and sent it on to Brad for approval. Ralph > On Tue, 3 Mar 2009, Brian W. Barrett wrote: > >> On Tue, 3 Mar 2009, Jeff Squyres wrote: >> >>> 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only >>> difference between rc3 a

[OMPI devel] Continued warnings?

2018-07-31 Thread Ralph H Castain
Just curious - will this ever be fixed? From today’s head of master: In file included from info.c:46:0: info.c: In function 'opal_info_dup_mode': ../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated writing up to 36 bytes into a region of size 27 [-Wformat-truncation=]
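Editor's note: the warning quoted above is GCC reporting that a bounded formatted write may not fit its destination. The following is a minimal sketch of one common remedy, checking the return value of `snprintf`. It is not the code in `opal/util/info.h`: the prefix string, buffer sizes, and the name `demo_build_saved_key` are hypothetical, chosen only for illustration. As far as I understand GCC's documentation, `-Wformat-truncation=1` targets calls whose return value is unused, so handling the result is usually enough to make the intent explicit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix, NOT the real OPAL_INFO_SAVE_PREFIX value. */
#define DEMO_PREFIX "_DEMO_SAVE_"

/* Build "<prefix><key>" into dst.
 * Returns 0 on success, -1 if the result would not fit in dst_len bytes
 * (snprintf still NUL-terminates dst in that case). */
static int demo_build_saved_key(char *dst, size_t dst_len, const char *key)
{
    int needed;

    assert(dst != NULL && key != NULL && dst_len > 0);

    /* snprintf returns the length the full string would have had,
     * excluding the terminating NUL. */
    needed = snprintf(dst, dst_len, "%s%s", DEMO_PREFIX, key);
    if (needed < 0 || (size_t)needed >= dst_len) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

A caller would treat `-1` as "key too long" and either reject the key or allocate a larger buffer, rather than silently using a truncated name.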

Re: [OMPI devel] Open MPI website borked up?

2018-09-01 Thread Ralph H Castain
I suspect this is a stale message - I’m not seeing any problem with the website > On Aug 29, 2018, at 12:55 PM, Howard Pritchard wrote: > > Hi Folks, > > Something seems to be borked up about the OMPI website. Go to website and > you'll > get some odd parsing error appearing. > > Howard >

[OMPI devel] Will info keys ever be fixed?

2018-09-10 Thread Ralph H Castain
Still seeing this in today’s head of master: info_subscriber.c: In function 'opal_infosubscribe_change_info': ../../opal/util/info.h:112:31: warning: '%s' directive output may be truncated writing up to 36 bytes into a region of size 27 [-Wformat-truncation=] #define OPAL_INFO_SAVE_PREFIX "_OMPI

Re: [OMPI devel] mpirun error when not using span

2018-09-10 Thread Ralph H Castain
Could you please send the output from “lstopo --of xml foo.xml” (the file foo.xml) so I can try to replicate here? > On Sep 4, 2018, at 12:35 PM, Shrader, David Lee wrote: > > Hello, > > I have run this issue by Howard, and he asked me to forward it on to the Open > MPI devel mailing list. I

[OMPI devel] MTT Perl client

2018-09-11 Thread Ralph H Castain
Hi folks Per today’s telecon, I have moved the Perl MTT client into its own repository: https://github.com/open-mpi/mtt-legacy. All the Python client code has been removed from that repo. The original MTT repo remains at https://github.com/open-mpi/mtt. I have a PR to remove all the Perl clien

Re: [OMPI devel] mpirun error when not using span

2018-09-11 Thread Ralph H Castain
when binding. I’ll try to poke at it a bit. > On Sep 11, 2018, at 9:17 AM, Shrader, David Lee wrote: > > Here's the xml output from lstopo. Thank you for taking a look! > David > > From: devel on behalf of Ralph H Castain > > Sent: Monday, September 10, 201

Re: [OMPI devel] Will info keys ever be fixed?

2018-09-11 Thread Ralph H Castain
--with-gxx-include-dir=/usr/include/c++/4.2.1 > Apple LLVM version 9.1.0 (clang-902.0.39.2) > Target: x86_64-apple-darwin17.7.0 > Thread model: posix > InstalledDir: > /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin > > > > > >

Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Ralph H Castain
acement? > > Best > Christoph Niethammer > > - Original Message - > From: "Open MPI Developers" > To: "Open MPI Developers" > CC: "Jeff Squyres" > Sent: Tuesday, September 11, 2018 20:37:40 > Subject: Re: [OMPI devel]

Re: [OMPI devel] MTT Perl client

2018-09-14 Thread Ralph H Castain
that code). > > > >> On Sep 14, 2018, at 11:23 AM, Ralph H Castain wrote: >> >> Afraid I’m not familiar with that script - what does it do? >> >> >>> On Sep 14, 2018, at 7:46 AM, Christoph Niethammer >>> wrote: >>> >

Re: [OMPI devel] MTT Perl client

2018-09-18 Thread Ralph H Castain
Are we good to go with this changeover? If so, I’ll delete the Perl client from the main MTT repo. > On Sep 14, 2018, at 10:06 AM, Jeff Squyres (jsquyres) via devel > wrote: > > On Sep 14, 2018, at 12:37 PM, Gilles Gouaillardet > wrote: >> >> IIRC mtt-relay is not only a proxy (squid can do

[OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
We have too many discussion threads overlapping on the same email chain - so let’s break the discussion on the OFI problem into its own chain. We have been investigating this locally and found there are a number of conflicts between the MTLs and the OFI/BTL stepping on each other. The correct s

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
MTL and able to use both > of them interchangeably with no problem. I don't know what changed. libpsm2? > > > Arm > > > On Thu, Sep 20, 2018, 7:06 PM Ralph H Castain wrote: > We have too many discussion threads overlapping on

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
til next major release to get this in. > > > Arm > > > On Thu, Sep 20, 2018, 7:18 PM Ralph H Castain wrote: > I suspect it is a question of what you tested and in which scenarios. Problem > is that it can bite someone and there isn’

Re: [OMPI devel] OFI issues on Open MPI v4.0.0rc1

2018-09-20 Thread Ralph H Castain
ather add an .ompi_ignore and > give an opportunity to power users to continue playing with it. > > George. > > >> On Thu, Sep 20, 2018 at 8:04 PM Ralph H Castain wrote: >> I already suggested the configure option, but it doesn’t solve the problem. >> I wou

[OMPI devel] Removing ORTE code

2018-09-26 Thread Ralph H Castain
We are considering a “purge” of stale ORTE code and want to know if anyone is using it before proceeding. With the advent of PMIx, several ORTE features are no longer required by OMPI itself. However, we acknowledge that it is possible that someone out there (e.g., a researcher) is using them. T

Re: [OMPI devel] Mac OS X 10.4.x users?

2018-09-28 Thread Ralph H Castain
Good lord - break away!! > On Sep 28, 2018, at 11:11 AM, Barrett, Brian via devel > wrote: > > All - > > In trying to clean up some warnings, I noticed one (around pack/unpack in > net/if.h) that is due to a workaround of a bug in MacOS X 10.4.x and earlier. > The simple way to remove the w

[OMPI devel] Error in TCP BTL??

2018-10-01 Thread Ralph H Castain
I’m getting this error when trying to run a simple ring program on my Mac: [Ralphs-iMac-2.local][[21423,14],0][btl_tcp_endpoint.c:742:mca_btl_tcp_endpoint_start_connect] bind() failed: Invalid argument (22) Anyone recognize the problem? It causes the job to immediately abort. This is with curre

Re: [OMPI devel] btl/vader: race condition in finalize on OS X

2018-10-02 Thread Ralph H Castain
We already have the register_cleanup option in master - are you using an older version of PMIx that doesn’t support it? > On Oct 2, 2018, at 4:05 AM, Jeff Squyres (jsquyres) via devel > wrote: > > FYI: https://github.com/open-mpi/ompi/issues/5798 brought up what may be the > same issue. > >

Re: [OMPI devel] Removing ORTE code

2018-10-02 Thread Ralph H Castain
Based on silence plus today’s telecon, the stale code has been removed: https://github.com/open-mpi/ompi/pull/5827 > On Sep 26, 2018, at 7:00 AM, Ralph H Castain wrote: > > We are considering a “purge” of stale ORTE code and want to know if anyone is > using it before proceedi

Re: [OMPI devel] Hints for using an own pmix server

2018-10-08 Thread Ralph H Castain
Even PRRTE won’t allow you to stop the orted from initializing its PMIx server. I’m not sure I really understand your objective. Remember, PMIx is just a library - the orted opens it and uses it to interface to its client application procs. It makes no sense to have some other process perform th

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
Hi Stephan Thanks for the clarification - that helps a great deal. You are correct that OMPI’s orted daemons do more than just host the PMIx server library. However, they are only active if you launch the OMPI processes using mpirun. This is probably the source of the trouble you are seeing. S

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
ea what I need to change? Do I have to set an MCA > parameter to tell OpenMPI not to start orted, or does it need another > hint in the client environment beside the stuff coming from the PMIx > server helper library? > > > Stephan > > > On Tuesday, Oct 10 2018,

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
h >>> ORTE_SCHIZO_DETECTION=ORTE >>> OMPI_COMMAND=./hello_env >>> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08- >>> d92c0e73869e1cfa >>> OMPI_MCA_orte_launch=1 >>> OMPI_APP_CTX_NUM_PROCS=1 >>> OMPI_MCA_pmix=^s1,s2,cray,isolated >>> OMP

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
point it out! Ralph > On Oct 12, 2018, at 6:15 AM, Ralph H Castain wrote: > > Hi Stephan > > >> On Oct 12, 2018, at 2:25 AM, Stephan Krempel wrote: >> >> Hello Ralph, >> >>> I assume this (--with-

Re: [OMPI devel] Hints for using an own pmix server

2018-10-14 Thread Ralph H Castain
> On Oct 12, 2018, at 6:15 AM, Ralph H Castain wrote: > >> One point that remains open and is interesting for me is if I can >> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow >> possible to configure it as there were the "--with-ompi-pmix-rt

[OMPI devel] SC'18 PMIx BoF meeting

2018-10-15 Thread Ralph H Castain
Hello all [I’m sharing this on the OMPI mailing lists (as well as the PMIx one) as PMIx has become tightly integrated to the OMPI code since v2.0 was released] The PMIx Community will once again be hosting a Birds-of-a-Feather meeting at SuperComputing. This year, however, will be a little diff

Re: [OMPI devel] Hints for using an own pmix server

2018-10-18 Thread Ralph H Castain
> On Oct 17, 2018, at 3:32 AM, Stephan Krempel wrote: > > > Hi Ralph. > One point that remains open and is interesting for me is if I can achieve the same with the 3.1.2 release of OpenMPI. Is it somehow possible to configure it as there were the "--with-ompi-pmix-rte" swi

[OMPI devel] PRRTE v3.0.0rc1 available for testing

2018-11-28 Thread Ralph H Castain
Hi folks Given a growing use of PRRTE plus OMPI’s announced plans to phase out ORTE in favor of PRRTE, it seems the time has come to begin generating formal releases of PRRTE. Accordingly, I have created a v3.0.0 release candidate for folks to (hopefully) test: https://github.com/pmix/prrte/re

[OMPI devel] PMIx v2.1 Standard released

2018-12-06 Thread Ralph H Castain
The PMIx community, representing a consortium of research, academic, and industry partners, is pleased to announce the release of the PMIx v2.1 Standard document. The document can be obtained from: * the PMIx website at https://pmix.org/wp-content/uploads/2018/12/pmix-standard-2.1.pdf

[OMPI devel] OMPI and PRRTE separated

2018-12-17 Thread Ralph H Castain
Hello all For those of you working with ORTE and/or PRRTE, GitHub has severed the parent/child relationship between the OMPI and PRRTE repositories. Thus, we will no longer be able to directly “pull” changes made to ORTE downstream into PRRTE. This marks the end of direct support for ORTE exce

Re: [OMPI devel] OMPI and PRRTE separated

2018-12-17 Thread Ralph H Castain
FYI: I have deleted all the old OMPI tags from PRRTE, so we have a clean repo to work with now. > On Dec 17, 2018, at 5:58 PM, Ralph H Castain wrote: > > Hello all > > For those of you working with ORTE and/or PRRTE, GitHub has severed the > parent/child relationship be

[OMPI devel] PMIx v3.0 Standard released

2018-12-20 Thread Ralph H Castain
The PMIx community, representing a consortium of research, academic, and industry partners, is pleased to announce the release of the PMIx v3.0 Standard document. The document can be obtained from: * the PMIx website at https://pmix.org/wp-content/uploads/2018/12/pmix-standard-3.0.pdf * the PM

[OMPI devel] open-mpi.org is DOWN

2018-12-22 Thread Ralph H Castain
Hello all Apologies to everyone, but I received an alert this morning that malware has been detected on the www.open-mpi.org site. I have tried to contact the hosting agency and the security scanners, but nobody is around on this pre-holiday weekend. Accordingly, I have taken the site OFFLINE f

Re: [OMPI devel] [OMPI users] open-mpi.org is DOWN

2018-12-23 Thread Ralph H Castain
The security scanner has apologized for a false positive and fixed their system - the site has been restored. Ralph > On Dec 22, 2018, at 12:12 PM, Ralph H Castain wrote: > > Hello all > > Apologies to everyone, but I received an alert this morning that malware has > be

Re: [OMPI devel] rml/ofi component broken in v4.0.x and v3.1.x

2019-02-14 Thread Ralph H Castain
I would recommend just removing it - frankly, I’m surprised it is in there as the code was deemed non-production-ready. > On Feb 14, 2019, at 5:11 PM, Gilles Gouaillardet wrote: > > Folks, > > > The rml/ofi component has been removed from master. > > Then common/ofi was later removed from m

Re: [OMPI devel] Gentle reminder: sign up for the face to face

2019-02-26 Thread Ralph H Castain
Done! > On Feb 26, 2019, at 8:33 AM, Brice Goglin wrote: > > Hello Jeff > > Looks like I am not allowed to modify the page but I'll be at the meeting ;) > > Brice > > > > On 26/02/2019 at 17:13, Jeff Squyres (jsquyres) via devel wrote: >> Gentle reminder to please sign up for the face-to-

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Ralph H Castain
There is a coll/sync component that will automatically inject those barriers for you so you don’t have to add them to your code. Controlled by MCA param: coll_sync_barrier_before: Do a synchronization before each Nth collective coll_sync_barrier_after: Do a synchronization after each Nth collect

Re: [OMPI devel] MPI Reduce Without a Barrier

2019-04-15 Thread Ralph H Castain
Not exactly. The problem is that rank=0 initially falls behind because it is doing more work - i.e., it has to receive all the buffers and do something with them. As a result, it doesn’t get to post the next allreduce before the messages from the other participants arrive - which means that rank

Re: [OMPI devel] help - urgent

2006-06-30 Thread Ralph H Castain
Hi Amrita I'm not entirely sure I understand your questions, but will try to answer them below. If you can share what you are doing, we'd be happy to provide advice. Ralph On 6/30/06 5:45 AM, "amrita mathuria" wrote: > hi... > > I am working with open mpi source code > > I want

Re: [OMPI devel] orted problem

2006-07-05 Thread Ralph H Castain
This has been around for a very long time (at least a year, if memory serves correctly). The problem is that the system "hangs" while trying to flush the io buffers through the RML because it loses connection to the head node process (for 1.x, that's basically mpirun) - but the "flush" procedure do

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Hi Nathan Could you tell us which version of the code you are using, and print out the rc value that was returned by the "get" call? I see nothing obviously wrong with the code, but much depends on what happened prior to this call too. BTW: you might want to release the memory stored in the retur

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
lamos National Laboratory > Parallel Tools Team > High Performance Computing Environments > phone: 505-667-3428 > email: ndeb...@lanl.gov > - > > > > Ralph H Castain wrote: >> Hi Nathan >> >> Could you t

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 1:14 AM, "Ralf Wildenhues" wrote: > >> Perhaps we should use int64_t instead. > > No, that would not help: int64_t is C99, so it should not be declared > either in C89 mode. Also, the int64_t is required to have 64 bits, and > could thus theoretically be smaller than 'long long'
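Editor's note: the point quoted above is that `int64_t` comes from C99's `<stdint.h>` and so is not available to strictly conforming C90 code. A minimal sketch of guarding on the language version follows. The names `demo_wide_t`, `DEMO_HAVE_C99`, and `demo_wide_bits` are hypothetical, and this is not how Open MPI's configure machinery selects its types; it only illustrates the `__STDC_VERSION__` check.

```c
#include <assert.h>
#include <limits.h>

/* __STDC_VERSION__ is 199901L or greater for C99 and later; strict C90
 * compilers leave it undefined (or define the 1994 amendment value). */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
typedef int64_t demo_wide_t; /* exactly 64 bits where provided */
#define DEMO_HAVE_C99 1
#else
typedef long demo_wide_t;    /* widest signed integer type C90 guarantees */
#define DEMO_HAVE_C99 0
#endif

/* Width in bits of the type chosen above. */
static int demo_wide_bits(void)
{
    int bits = (int)(sizeof(demo_wide_t) * CHAR_BIT);

    assert(bits >= 32); /* long is at least 32 bits in every C standard */
    return bits;
}
```

Code that truly needs 64 bits under C90 has to rely on a compiler extension such as `long long`, which is the trade-off the thread is debating.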

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 6:58 AM, "Ralf Wildenhues" wrote: > * Ralph H Castain wrote on Mon, Aug 21, 2006 at 02:39:51PM CEST: >> >> It sounds, therefore, like we are now C99 compliant and no longer C90 >> compliant at all? > > Well, a compiler supporting C90 plus 

[OMPI devel] Upcoming: Major ORTE changes

2006-08-23 Thread Ralph H Castain
Yo all There has been a bit of discussion about this on the core developers list and on telecons, but I felt that perhaps I should provide a more detailed warning to the broader developer community. In the next few weeks, there will be some major revisions submitted to the Open MPI trunk on the O

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
Actually, I was a part of that thread - see my comments beginning with http://www.open-mpi.org/community/lists/devel/2006/03/0797.php. Perhaps I communicated poorly here. The issue in the prior thread was that few systems nowadays don't offer at least some level of IPv6 compatibility, even if noth

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
On 9/6/06 9:44 AM, "Christian Kauhaus" wrote: > Bogdan Costescu : >> I don't know why you think that this (talking to different nodes via >> different channels) is unusual - I think that it's quite probable, >> especially in a heterogeneous environment. > > I think the first goal should be to

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-07 Thread Ralph H Castain
Jeff and I talked about this for awhile this morning, and we both agree (yes, I did change my mind after we discussed all the ramifications). It appears that we should be able to consolidate the code into a single component with the right configuration system "magic" - and that would definitely be

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-07 Thread Ralph H Castain
> I even volunteer for that. Next week I will be away, so I will come > back with a design for the phone conference on ... well beginning of > October. > >george. > > > On Sep 7, 2006, at 12:22 PM, Ralph H Castain wrote: > >> Jeff and I talked about

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-08 Thread Ralph H Castain
It occurred to me last night that this solves the homogeneous case, but still leaves us with the problem of hetero systems. What we really need to know is not only "what do I support", but "what does the recipient support". Then it hit me that we may already have the solution for that problem in t

[OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
Yo folks I need to do a little planning and it would help a bunch to have a preliminary head count. Could you please let me know (a) if you plan to participate in the tutorial, and (b) indicate if in-person or remote? For an agenda, my thought is that we will start at 7am Mountain time (that's 9a

Re: [OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
0.30 pm on wednesday, and by the time I pick up > the rental car and drive to White Rocks, it can become quite late) > Could we maybe start a little later that day, e.g. 8am or 9am? > > Thanks > Edgar > > Ralph H Castain wrote: >> Yo folks >> >> I ne

[OMPI devel] Tentative OpenRTE tutorial agenda

2006-09-25 Thread Ralph H Castain
Hello all I have attached a tentative agenda for this week's tutorial, based on inputs received so far from planned participants. I have adjusted things to try and accommodate the needs of a geographically distributed audience, and the fact that - as sole speaker - I cannot possibly talk for hours

[OMPI devel] ORTE Tutorial Materials

2006-09-27 Thread Ralph H Castain
Hello all The materials for Thursday's session of the ORTE tutorial are now complete and stable. I have posted them on the OpenRTE web site at: http://www.open-rte.org/papers/tutorial-sept-2006/index.php Both Powerpoint and PDF (printed two slides/page) formats are available. I should have the

[OMPI devel] ORTE Timing

2006-09-29 Thread Ralph H Castain
Hello all There was some discussion at yesterday's tutorial about ORTE scalability and where bottlenecks might be occurring. I spent some time last night identifying key information required to answer those questions. I'll be presenting a slide today showing the key timing points that we would nee

Re: [OMPI devel] socket usage

2006-10-25 Thread Ralph H Castain
I can't speak to the MPI layer, but for OpenRTE, each process holds one socket open to the HNP. Each process *has* all the socket connection info for all of the processes in its job, but I don't believe we actually open those sockets until we attempt to communicate with that process (needs to be ve

Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Ralph H Castain
I don't see any new component, Adrian. There have been a few updates to the existing component, some of which might cause conflicts with the merge, but those shouldn't be too hard to resolve. As far as I know, the oob/tcp component is relatively stable. Brian is doing some work on it to enable us

Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Ralph H Castain
There are a number of things in the trunk that haven't been moved over to 1.2 branch yet. They are coming shortly, though...once the merge is done, you might get a few more conflicts, but it shouldn't be too bad. On 10/25/06 7:06 AM, "Adrian Knoth" wrote: > On Wed, Oct 25, 2006 at 02:48:33PM +0

[OMPI devel] Connect to default universe restored

2006-11-06 Thread Ralph H Castain
For those who are interested, I have restored the ability to "connect" to a default universe again (trunk as of r12438). Ralph

Re: [OMPI devel] Getting process PID

2006-11-09 Thread Ralph H Castain
Hi Greg All of the schema keys are listed in orte/mca/schema/schema_types.h. The key you are looking for is the ORTE_PROC_LOCAL_PID_KEY. You will also see a ORTE_PROC_PID_KEY. This one refers to the pid assigned by the launcher - the other refers to the pid reported by the process from its remote

Re: [OMPI devel] Getting process PID

2006-11-09 Thread Ralph H Castain
Hmmm...let me check it out - will get back to you later today. Sorry for the problem Ralph On 11/9/06 3:07 PM, "Greg Watson" wrote: > I tried ORTE_PROC_LOCAL_PID_KEY but it just returns 0 on MacOSX. > > Greg > > On Nov 9, 2006, at 1:31 PM, Ralph H Castain wrote: >

Re: [OMPI devel] Build system changes

2006-11-30 Thread Ralph H Castain
Thanks Ralf! Much appreciated. On 11/30/06 8:33 AM, "Ralf Wildenhues" wrote: > * Ralph Castain wrote on Thu, Nov 30, 2006 at 04:12:16PM CET: >> That could be the problem. I had to update automake, and unfortunately >> Darwin Ports hasn't reached that level yet. So I had to build and install >>

Re: [OMPI devel] Major revision to the RML/OOB

2006-12-06 Thread Ralph H Castain
We aren't ignoring your situation, Adrian - Jeff and I are talking about how best to deal with the situation and your offer to help. This revision will indeed see some significant change in the oob/tcp component, mostly in the init and connect procedures. The concern is that we want to leave open

Re: [OMPI devel] Major revision to the RML/OOB

2006-12-06 Thread Ralph H Castain
The changes we are planning to do will in no way preclude the use of multicast for the xcast procedure. The changes in the OOB subsystem deal specifically with how those connections are initialized, which is something we would need to do for multicast anyway. The routing method for the xcast is al

[OMPI devel] OpenRTE telecon?

2007-01-04 Thread Ralph H Castain
Hi everyone Several of us were on a telecon yesterday and the topic of better coordinating the activities on OpenRTE came up. While things have percolated along reasonably well, the general feeling was that better, wider knowledge of current OpenRTE development activities and directions would help

Re: [OMPI devel] OpenRTE telecon?

2007-01-11 Thread Ralph H Castain
I'm working on adding functionality to the code - I will note those on the site as I am fixing them. Again, I would like to note that people are always welcome to drop me a note or call me on the phone if they have a question about what I'm doing or planning to do. Thanks Ralph On 1

Re: [OMPI devel] 1.2b3 fails on bluesteel

2007-01-22 Thread Ralph H Castain
1.2 supports bproc just fine. There is an issue that I am currently working on that is causing a problem on both 1.2 and the trunk. For the moment, you need to disable shared memory: "--mca btl ^sm" Other than that, it seems to be working fine on flash, acme, and coyote. On 1/22/07 8:02 AM, "Gre

Re: [OMPI devel] 1.2b3 fails on bluesteel

2007-01-22 Thread Ralph H Castain
Oh yeah - Galen noted that you also have to do a preconnect, so what you need to add to your command line is: -mca btl ^sm -mca mpi_preconnect_all 1 Ralph On 1/22/07 8:02 AM, "Greg Watson" wrote: > > On Jan 19, 2007, at 4:39 PM, Li-Ta Lo wrote: > >> On Fri, 2007-01-19 at 14:42 -0700, Greg

Re: [OMPI devel] 1.2b3 fails on bluesteel

2007-01-22 Thread Ralph H Castain
alked about the more general bproc allocator question and I can commit a change later today to fix the situation for bluesteel. Until then, I fear that we may not run on that system, though you could give it a try anyway. On 1/22/07 8:14 AM, "Ralph H Castain" wrote: > Oh yeah - Ga

Re: [OMPI devel] 1.2b3 fails on bluesteel

2007-01-22 Thread Ralph H Castain
"bug" that caused all kinds of problems in production environments - that's been fixed for quite some time. So, yes - you do have to get an official "allocation" of some kind. Even the changes I mentioned wouldn't remove that requirement in the way you describe. >

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/27/07 9:37 AM, "Greg Watson" wrote: > There are two more interfaces that have changed: > > 1. orte_rds.query() now takes a job id, whereas in 1.2b1 it didn't > take any arguments. I seem to remember that I call this to kick orted > into action, but I'm not sure of the implications of not

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/29/07 10:20 AM, "Greg Watson" wrote: > > On Jan 29, 2007, at 6:47 AM, Ralph H Castain wrote: > >> >> >> >> On 1/27/07 9:37 AM, "Greg Watson" wrote: >> >>> There are two more interfaces that have changed: >>

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-30 Thread Ralph H Castain
on >> anything >> other than a hostfile, we really don't have a way to do that right >> now. The >> ORTE 2.0 design allows for it, but we haven't implemented that yet - >> probably a few months away. >> >> Hope that helps >> Ralph >>

Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level

2007-04-03 Thread Ralph H Castain
On 4/3/07 9:32 AM, "Li-Ta Lo" wrote: > On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote: > >> >> 2. I'm not sure what you mean by mapping MPI processes to "physical" >> processes, but I assume you mean how do we assign MPI ranks to processes on >> specific nodes. You will find that don

[OMPI devel] ORTE scalability issues

2007-04-16 Thread Ralph H Castain
Hello all I understand that several people are interested in the OpenRTE scalability issues - this is great! However, it appears we haven't done a very good job of circulating information about the identified causes of the current issues. In the hope of helping people to be productive in their con

[OMPI devel] OpenRTE and "malloc"

2007-04-16 Thread Ralph H Castain
Hello all There has been some recent activity aimed at reducing memory "leaks" from within the Open MPI code base, including OpenRTE. These are most welcome and long overdue. It has, though, caused a couple of questions to me about why we used malloc so extensively within OpenRTE. Rather than answ

Re: [OMPI devel] ORTE scalability issues

2007-04-17 Thread Ralph H Castain
Thanks Christian. Actually, I was aware of that and should have clarified that these tests did *not* involve the IPv6 code. Ralph On 4/17/07 1:31 AM, "Christian Kauhaus" wrote: > Ralph H Castain : >> even though the HNP isn't actually part of the MPI job itself,

[OMPI devel] Change to default xcast mode [RFC]

2007-05-18 Thread Ralph H Castain
For the last several months, we have supported three modes of sending the xcast messages used to release MPI processes from their various stage gates: 1. Direct - message sent directly to each process in a serial fashion 2. Linear - message sent serially to the daemon on each node, which then "fa

Re: [OMPI devel] [devel-core] Change to default xcast mode [RFC]

2007-05-18 Thread Ralph H Castain
orted independently (instead of via a binomial tree method). > > Andrew > > Ralph H Castain wrote: >> For the last several months, we have supported three modes of sending the >> xcast messages used to release MPI processes from their various stage gates: >>

Re: [OMPI devel] ORTE local rank

2007-05-21 Thread Ralph H Castain
ricom and QLogic) > > How do we get access to these values; are they in global variables > somewhere, or do we make a function call to get them? > > > On May 21, 2007, at 7:58 AM, Ralph H Castain wrote: > >> Well, it took awhile longer than I had thought to get around

[OMPI devel] Dumping process status etc.

2007-05-22 Thread Ralph H Castain
This came up in today's telecon and I promised to send this to George - however, it occurred to me that others may also want to know. If you want to dump info for debugging purposes, and if you can get into orterun/mpirun (e.g., via gdb), you can dump info on anything with the following (NOTE: Gdb

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
Just a quick glance (running out door) - it looks like Josh commented out a critical piece of code in the rds hostfile component at line 442. It loads the cell info into the name service so it can correctly respond to the query you cite below. You might try restoring that code - if you do, check t

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
ven't looked at this at all, but that line changed in r6813 which > was Aug. 2005 so I would guess the problem is elsewhere. However with > the recent ORTE changes maybe this is a side effect. > > -- Josh > > > On May 23, 2007, at 11:11 AM, Ralph H Castain wrote: >

Re: [OMPI devel] Strange schema error

2007-05-23 Thread Ralph H Castain
Okay, this is now fixed as of r14732. Thanks (and apologies) to George for spotting it. Ralph On 5/23/07 9:57 AM, "Ralph H Castain" wrote: > Actually, I think that is true (got back earlier than expected). The problem > really is that we had multiple compensating errors

Re: [OMPI devel] ORTE registry patch

2007-05-24 Thread Ralph H Castain
Thanks - I'll take a look at this (and the prior ones!) in the next couple of weeks when time permits and get back to you. Ralph On 5/23/07 1:11 PM, "George Bosilca" wrote: > Attached is another patch to the ORTE layer, more specifically the > replica. The idea is to decrease the number of str

[OMPI devel] Why the HNP gets so big...

2007-05-31 Thread Ralph H Castain
Scaling tests over the last few months have all shown a behavior that has elicited significant comment: namely, that the HNP is observed to grow to multiple gigabytes in size for runs involving several thousand processes. This represents a peak size that declines to a much smaller footprint once th

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
>>>>>> see >>>>>> which ones make sense in the latter case). This will ensure that we >>>>>> have at >>>>>> least some degree of coverage. >>>>>> >>>>>> Thanks >>>>>> Ra

Re: [OMPI devel] ORTE registry patch

2007-06-06 Thread Ralph H Castain
remember to also > remove test/class/orte_bitmap.c > > Thanks, > > Tim > > > Ralph H Castain wrote: >> Sigh...is it really so much to ask that we at least run the tests in >> orte/test/system and orte/test/mpi using both mpirun and singleton (where >> approp

Re: [OMPI devel] threaded builds

2007-06-11 Thread Ralph H Castain
I think that 1.2 is a lost cause in this regard - I thought we were just looking forward on the trunk. On 6/11/07 8:17 AM, "Brian Barrett" wrote: > Yes, this is a known issue. I don't know -- are we trying to make > threads work on the 1.2 branch, or just the trunk? I had thought > just the t

[OMPI devel] Major commit to trunk

2007-06-12 Thread Ralph H Castain
Yo all I made a major commit to the trunk this morning (r15007) that merits general notification and some explanation. *** IMPORTANT NOTE *** One major impact of the commit you *may* notice is that support for several environments will be broken. This commit is known to break s

Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
Actually, I was talking specifically about configuration at build time. I realize there are trade-offs here, and suspect we can find a common ground. The problem with using the options Jeff described is that they require knowledge on the part of the builder as to what environments have had their in

Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
As I understood our original discussions, this would move responsibility for mapping rank to processor back into the orted - is that still true? Reason I ask is to again clarify for people if we are doing so as it (a) impacts those systems that don't use our orteds (e.g., will affinity still work

[OMPI devel] Bproc support

2007-07-10 Thread Ralph H Castain
Yo all I have upgraded the support for Bproc on the Open MPI trunk as of r15328. We now support Bproc environments that do not utilize resource managers - in these cases, we will allow the user to launch on all nodes upon which they have execution authorities. Please note that, if you login to yo

Re: [OMPI devel] ticket 1023

2007-07-10 Thread Ralph H Castain
Currently this component is the ODLS. Most of my > work is in the ODLS component so if you decide to eliminate the orteds > you must, somehow, preserve the ODLS functionality. > > Sharon. > > > > -----Original Message----- > From: devel-boun...@open-mpi.org [mailto:devel

Re: [OMPI devel] Multi-environment builds

2007-07-10 Thread Ralph H Castain
works (e.g., RAS and PLS). E.g., "orte_base_launcher=tm", or > somesuch. > > > On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote: > >> Actually, I was talking specifically about configuration at build >> time. I >> realize there are trade-offs here,

Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Ralph H Castain
Interesting point - no reason why we couldn't use that functionality for this purpose. Good idea! On 7/11/07 5:38 AM, "Jeff Squyres" wrote: > On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote: > >>> 2. It may be useful to have some high-level parameters to s

[OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
Yo all I have a fairly significant change coming to the orte part of the code base that will require an autogen (sorry). I'll check it in late this afternoon (can't do it at night as it is on my office desktop). The commit will fix the singleton operations, including singleton comm_spawn. It also

Re: [OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
g the update. Thanks Ralph On 7/12/07 7:53 AM, "Ralph H Castain" wrote: > Yo all > > I have a fairly significant change coming to the orte part of the code base > that will require an autogen (sorry). I'll check it in late this afternoon > (can't do i

Re: [OMPI devel] Orte update

2007-07-12 Thread Ralph H Castain
). Please let me know of any problems. Ralph On 7/12/07 1:45 PM, "Ralph H Castain" wrote: > Yo folks > > Several of us are stuck waiting for this commit to hit. Rather than wasting > the next several hours, I'm going to make the commit now. > > So please be adv
