[OMPI devel] ORTE scalability issues

2007-04-16 Thread Ralph H Castain
Hello all I understand that several people are interested in the OpenRTE scalability issues - this is great! However, it appears we haven't done a very good job of circulating information about the identified causes of the current issues. In the hope of helping people to be productive in their

Re: [OMPI devel] Is it possible to get BTL transport work directly with MPI level

2007-04-03 Thread Ralph H Castain
On 4/3/07 9:32 AM, "Li-Ta Lo" wrote: > On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote: > >> >> 2. I'm not sure what you mean by mapping MPI processes to "physical" >> processes, but I assume you mean how do we assign MPI ranks to processes on >> specific nodes. You

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-30 Thread Ralph H Castain
anything >> other than a hostfile, we really don't have a way to do that right >> now. The >> ORTE 2.0 design allows for it, but we haven't implemented that yet - >> probably a few months away. >> >> Hope that helps >> Ralph >> >> >> O

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/29/07 10:20 AM, "Greg Watson" <gwat...@lanl.gov> wrote: > > On Jan 29, 2007, at 6:47 AM, Ralph H Castain wrote: > >> >> >> >> On 1/27/07 9:37 AM, "Greg Watson" <gwat...@lanl.gov> wrote: >> >>> There a

Re: [OMPI devel] Urgent: ORTE_RML_NAME_SEED removed from 1.2b3!

2007-01-29 Thread Ralph H Castain
On 1/27/07 9:37 AM, "Greg Watson" wrote: > There are two more interfaces that have changed: > > 1. orte_rds.query() now takes a job id, whereas in 1.2b1 it didn't > take any arguments. I seem to remember that I call this to kick orted > into action, but I'm not sure of the

[OMPI devel] OpenRTE telecon?

2007-01-04 Thread Ralph H Castain
Hi everyone Several of us were on a telecon yesterday and the topic of better coordinating the activities on OpenRTE came up. While things have percolated along reasonably well, the general feeling was that better, wider knowledge of current OpenRTE development activities and directions would

Re: [OMPI devel] Build system changes

2006-11-30 Thread Ralph H Castain
Thanks Ralf! Much appreciated. On 11/30/06 8:33 AM, "Ralf Wildenhues" wrote: > * Ralph Castain wrote on Thu, Nov 30, 2006 at 04:12:16PM CET: >> That could be the problem. I had to update automake, and unfortunately >> Darwin Ports hasn't reached that level yet. So I had

Re: [OMPI devel] New oob/tcp?

2006-10-25 Thread Ralph H Castain
I don't see any new component, Adrian. There have been a few updates to the existing component, some of which might cause conflicts with the merge, but those shouldn't be too hard to resolve. As far as I know, the oob/tcp component is relatively stable. Brian is doing some work on it to enable us

Re: [OMPI devel] socket usage

2006-10-25 Thread Ralph H Castain
I can't speak to the MPI layer, but for OpenRTE, each process holds one socket open to the HNP. Each process *has* all the socket connection info for all of the processes in its job, but I don't believe we actually open those sockets until we attempt to communicate with that process (needs to be

[OMPI devel] ORTE Timing

2006-09-29 Thread Ralph H Castain
Hello all There was some discussion at yesterday's tutorial about ORTE scalability and where bottlenecks might be occurring. I spent some time last night identifying key information required to answer those questions. I'll be presenting a slide today showing the key timing points that we would

[OMPI devel] ORTE Tutorial Materials

2006-09-27 Thread Ralph H Castain
Hello all The materials for Thursday's session of the ORTE tutorial are now complete and stable. I have posted them on the OpenRTE web site at: http://www.open-rte.org/papers/tutorial-sept-2006/index.php Both Powerpoint and PDF (printed two slides/page) formats are available. I should have the

Re: [OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
round 10.30 pm on wednesday, and by the time I pick up > the rental car and drive to White Rocks, it can become quite late) > Could we maybe start a little later that day, e.g. 8am or 9am? > > Thanks > Edgar > > Ralph H Castain wrote: >> Yo folks >> >> I

[OMPI devel] ORTE tutorial

2006-09-25 Thread Ralph H Castain
Yo folks I need to do a little planning and it would help a bunch to have a preliminary head count. Could you please let me know (a) if you plan to participate in the tutorial, and (b) indicate if in-person or remote? For an agenda, my thought is that we will start at 7am Mountain time (that's

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-07 Thread Ralph H Castain
. > > I even volunteer for that. Next week I will be away, so I will come > back with a design for the phone conference on ... well beginning of > october. > >george. > > > On Sep 7, 2006, at 12:22 PM, Ralph H Castain wrote: > >> Jeff and I talked about t

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
On 9/6/06 9:44 AM, "Christian Kauhaus" wrote: > Bogdan Costescu : >> I don't know why you think that this (talking to different nodes via >> different channels) is unusual - I think that it's quite probable, >> especially in a

Re: [OMPI devel] [IPv6] new component oob/tcp6

2006-09-06 Thread Ralph H Castain
Actually, I was a part of that thread - see my comments beginning with http://www.open-mpi.org/community/lists/devel/2006/03/0797.php. Perhaps I communicated poorly here. The issue in the prior thread was that few systems nowadays don't offer at least some level of IPv6 compatibility, even if

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 6:58 AM, "Ralf Wildenhues" <ralf.wildenh...@gmx.de> wrote: > * Ralph H Castain wrote on Mon, Aug 21, 2006 at 02:39:51PM CEST: >> >> It sounds, therefore, like we are now C99 compliant and no longer C90 >> compliant at all? > > Well

Re: [OMPI devel] OpenMPI not conforming with the C90 spec?

2006-08-21 Thread Ralph H Castain
On 8/21/06 1:14 AM, "Ralf Wildenhues" wrote: > >> Perhaps we should use int64_t instead. > > No, that would not help: int64_t is C99, so it should not be declared > either in C89 mode. Also, the int64_t is required to have 64 bits, and > could thus theoretically be

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
h Performance Computing Environments > phone: 505-667-3428 > email: ndeb...@lanl.gov > ----- > > > > Ralph H Castain wrote: >> Hi Nathan >> >> Could you tell us which version of the code you a

Re: [OMPI devel] Getting the number of nodes

2006-07-05 Thread Ralph H Castain
Hi Nathan Could you tell us which version of the code you are using, and print out the rc value that was returned by the "get" call? I see nothing obviously wrong with the code, but much depends on what happened prior to this call too. BTW: you might want to release the memory stored in the

Re: [OMPI devel] orted problem

2006-07-05 Thread Ralph H Castain
This has been around for a very long time (at least a year, if memory serves correctly). The problem is that the system "hangs" while trying to flush the io buffers through the RML because it loses connection to the head node process (for 1.x, that's basically mpirun) - but the "flush" procedure

Re: [OMPI devel] help - urgent

2006-06-30 Thread Ralph H Castain
Hi Amrita I'm not entirely sure I understand your questions, but will try to answer them below. If you can share what you are doing, we'd be happy to provide advice. Ralph On 6/30/06 5:45 AM, "amrita mathuria" wrote: > hi... > > I am working with open mpi

Re: [OMPI devel] [O-MPI devel] Alpha 4 and job state transitions

2006-02-13 Thread Ralph H. Castain
>> Nathan DeBardeleben, Ph.D. >> Los Alamos National Laboratory >> Parallel Tools Team >> High Performance Computing Environments >> phone: 505-667-3428 >> email: ndeb...@lanl.gov >>

Re: [O-MPI devel] Modification to triggers

2006-02-09 Thread Ralph H. Castain
Hmmmyuck! I'll take a look - will set it back to what it was before in the interim. Thanks Ralph At 07:05 AM 2/9/2006, you wrote: On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote: > In addition, I took advantage of the change to fix something Brian > had flagged in the orte/mc

Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-08 Thread Ralph H. Castain
Nathan This should now be fixed on the trunk. Once it is checked out more thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you might want to check out the trunk and verify it meets your needs. Ralph At 03:05 PM 2/1/2006, you wrote: This was happening on Alpha 1 as well but

[O-MPI devel] New data support subsystem for ORTE

2006-02-06 Thread Ralph H. Castain
Hello all After several months of development, I have merged the new data support subsystem for ORTE into the trunk. I must provide one caveat of warning: I have made every effort to test the revised system, but cannot guarantee its operation in every condition and under every system. For

Re: [O-MPI devel] Alpha 4 and job state transitions

2006-02-02 Thread Ralph H. Castain
I've just finished some stuff - will check it into the system (hopefully) tomorrow. I'll be able to take a look at this next week. My guess is that the launcher isn't setting that proc state at this time since it isn't being used by the system internally and we didn't know anyone else was

Re: [O-MPI devel] rsh and fork pls components

2005-12-13 Thread Ralph H. Castain
No problem with me - seems straightforward and resolves some confusion. On the orted check for the fork pls, you will find that there is a flag in the process info structure that indicates "I am a daemon". You may just need to check that flag - gets set very early and so should be available

[O-MPI devel] Process for modifying APIs

2005-11-21 Thread Ralph H. Castain
Yo all As you may have seen from earlier emails, I encountered some difficulty in modifying existing APIs within the streamlined build system. After some effort, I think I have defined a method for modifying the API-level of a subsystem that gets around some of the problems. I thought I

Re: [O-MPI devel] New build methodology

2005-11-21 Thread Ralph H. Castain
Hi Ralf Appreciate the offer, but I think at this stage it isn't worth the hassle. We either implement a long-term fix, or just pay the price. Thanks though Ralph At 01:37 AM 11/21/2005, you wrote: Hi Ralph, * Ralph H. Castain wrote on Mon, Nov 21, 2005 at 04:04:34AM CET: > Just as an

Re: [O-MPI devel] New build methodology

2005-11-21 Thread Ralph H. Castain
un "make" in a framework directory, it just builds the stuff in base without recursing. Of course, you can't run make in the base/ directory, but since running make in the framework directory is essentially equivalent, it doesn't exactly matter. Brian On Nov 20, 2005, at 10:04 PM, Ralph H. Casta

Re: [O-MPI devel] New build methodology

2005-11-20 Thread Ralph H. Castain
ut we have definitely made it harder to develop a subsystem. Is that really a good trade? I wonder. Ralph At 08:08 AM 11/15/2005, you wrote: * Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:45:26PM CET: > At 07:33 AM 11/15/2005, you wrote: > > > >Would it help if onl

Re: [O-MPI devel] New build methodology

2005-11-15 Thread Ralph H. Castain
Your proposed change would help a great deal - thanks! Can you steer me through the change? At 07:33 AM 11/15/2005, you wrote: Hi Ralph, * Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:12:38PM CET: > > While I generally find the new build methodology (i.e., reducing the

[O-MPI devel] New build methodology

2005-11-15 Thread Ralph H. Castain
Yo folks While I generally find the new build methodology (i.e., reducing the number of makefiles) has little impact on me, I have now encountered one problem that causes a significant difficulty. In trying to work on a revised data packing system for the orte part of the branch, I now find

[O-MPI devel] Startup/shutdown performance

2005-09-13 Thread Ralph H. Castain
Yo folks Josh ran some tests for me on Odin earlier today - the results show a major improvement in our startup/shutdown performance. As you may recall, our times grew roughly exponentially before - as the attached graph shows, they now grow roughly linearly. The data also shows that the

Re: [O-MPI devel] RHC development plans

2005-09-01 Thread Ralph H. Castain
Yo folks I have now completed the first three of these items. I believe this brings ORTE to a stage that is - at the least - very close to release quality. There are a few memory leaks left (oob and iof subsystems), but I'm not as familiar with those and have asked for help. Barring any

[O-MPI devel] RHC development plans

2005-09-01 Thread Ralph H. Castain
Yo folks Several people have asked lately what I am planning to do next on ORTE. Just to help maintain coordination, here is my current list of planned activities (in priority order). Any requests/suggestions are welcomed - this isn't in concrete by any means. 1. Add George's architecture

Re: [O-MPI devel] compile error

2005-08-08 Thread Ralph H. Castain
Very interesting - it built fine for me (building static). However, the ns_base_nds.c file is "stale", so I just committed a "delete" of that file. It shouldn't have been building anyway as it isn't in the Makefile. My guess, therefore, is that you are building dynamically and are encountering

[O-MPI devel] New simplified registry API's

2005-08-02 Thread Ralph H. Castain
Yo all Per last week's discussions, I have created a set of new simplified API's for the registry. These include: 1. orte_gpr.put_1 and orte_gpr.put_N: these allow you to put data on the registry without having to define your own value structures. They take a segment name, a NULL-terminated

Re: [O-MPI devel] New Bproc Components

2005-07-28 Thread Ralph H. Castain
Very interesting! Appreciate the info. My numbers are slightly better - as I've indicated, there is a NxN message exchange currently in the system that needs to be removed. With that commented out, the system scales roughly linearly with number of processes. At 04:31 PM 7/28/2005, you wrote:
