Hello all
I understand that several people are interested in the OpenRTE scalability
issues - this is great! However, it appears we haven't done a very good job
of circulating information about the identified causes of the current
issues. In the hope of helping people to be productive in their
On 4/3/07 9:32 AM, "Li-Ta Lo" wrote:
> On Sun, 2007-04-01 at 13:12 -0600, Ralph Castain wrote:
>
>>
>> 2. I'm not sure what you mean by mapping MPI processes to "physical"
>> processes, but I assume you mean how do we assign MPI ranks to processes on
>> specific nodes. You
anything
>> other than a hostfile, we really don't have a way to do that right
>> now. The
>> ORTE 2.0 design allows for it, but we haven't implemented that yet -
>> probably a few months away.
>>
>> Hope that helps
>> Ralph
>>
>>
>> O
On 1/29/07 10:20 AM, "Greg Watson" <gwat...@lanl.gov> wrote:
>
> On Jan 29, 2007, at 6:47 AM, Ralph H Castain wrote:
>
>>
>>
>>
>> On 1/27/07 9:37 AM, "Greg Watson" <gwat...@lanl.gov> wrote:
>>
>>> There a
On 1/27/07 9:37 AM, "Greg Watson" wrote:
> There are two more interfaces that have changed:
>
> 1. orte_rds.query() now takes a job id, whereas in 1.2b1 it didn't
> take any arguments. I seem to remember that I call this to kick orted
> into action, but I'm not sure of the
Hi everyone
Several of us were on a telecon yesterday and the topic of better
coordinating the activities on OpenRTE came up. While things have percolated
along reasonably well, the general feeling was that better, wider knowledge
of current OpenRTE development activities and directions would
Thanks Ralf! Much appreciated.
On 11/30/06 8:33 AM, "Ralf Wildenhues" wrote:
> * Ralph Castain wrote on Thu, Nov 30, 2006 at 04:12:16PM CET:
>> That could be the problem. I had to update automake, and unfortunately
>> Darwin Ports hasn't reached that level yet. So I had
I don't see any new component, Adrian. There have been a few updates to the
existing component, some of which might cause conflicts with the merge, but
those shouldn't be too hard to resolve.
As far as I know, the oob/tcp component is relatively stable. Brian is doing
some work on it to enable us
I can't speak to the MPI layer, but for OpenRTE, each process holds one
socket open to the HNP. Each process *has* all the socket connection info
for all of the processes in its job, but I don't believe we actually open
those sockets until we attempt to communicate with that process (needs to be
Hello all
There was some discussion at yesterday's tutorial about ORTE scalability and
where bottlenecks might be occurring. I spent some time last night
identifying key information required to answer those questions. I'll be
presenting a slide today showing the key timing points that we would
Hello all
The materials for Thursday's session of the ORTE tutorial are now complete
and stable. I have posted them on the OpenRTE web site at:
http://www.open-rte.org/papers/tutorial-sept-2006/index.php
Both Powerpoint and PDF (printed two slides/page) formats are available.
I should have the
round 10.30 pm on wednesday, and by the time I pick up
> the rental car and drive to White Rocks, it can become quite late)
> Could we maybe start a little later that day, e.g. 8am or 9am?
>
> Thanks
> Edgar
>
> Ralph H Castain wrote:
>> Yo folks
>>
>> I
Yo folks
I need to do a little planning and it would help a bunch to have a
preliminary head count. Could you please let me know (a) if you plan to
participate in the tutorial, and (b) indicate if in-person or remote?
For an agenda, my thought is that we will start at 7am Mountain time (that's
.
>
> I even volunteer for that. Next week I will be away, so I will come
> back with a design for the phone conference on ... well beginning of
> october.
>
>george.
>
>
> On Sep 7, 2006, at 12:22 PM, Ralph H Castain wrote:
>
>> Jeff and I talked about t
On 9/6/06 9:44 AM, "Christian Kauhaus" wrote:
> Bogdan Costescu :
>> I don't know why you think that this (talking to different nodes via
>> different channels) is unusual - I think that it's quite probable,
>> especially in a
Actually, I was a part of that thread - see my comments beginning with
http://www.open-mpi.org/community/lists/devel/2006/03/0797.php.
Perhaps I communicated poorly here. The issue in the prior thread was that
few systems nowadays don't offer at least some level of IPv6 compatibility,
even if
On 8/21/06 6:58 AM, "Ralf Wildenhues" <ralf.wildenh...@gmx.de> wrote:
> * Ralph H Castain wrote on Mon, Aug 21, 2006 at 02:39:51PM CEST:
>>
>> It sounds, therefore, like we are now C99 compliant and no longer C90
>> compliant at all?
>
> Well
On 8/21/06 1:14 AM, "Ralf Wildenhues" wrote:
>
>> Perhaps we should use int64_t instead.
>
> No, that would not help: int64_t is C99, so it should not be declared
> either in C89 mode. Also, the int64_t is required to have 64 bits, and
> could thus theoretically be
h Performance Computing Environments
> phone: 505-667-3428
> email: ndeb...@lanl.gov
> -----
>
>
>
> Ralph H Castain wrote:
>> Hi Nathan
>>
>> Could you tell us which version of the code you a
Hi Nathan
Could you tell us which version of the code you are using, and print out the
rc value that was returned by the "get" call? I see nothing obviously wrong
with the code, but much depends on what happened prior to this call too.
BTW: you might want to release the memory stored in the
This has been around for a very long time (at least a year, if memory serves
correctly). The problem is that the system "hangs" while trying to flush the
io buffers through the RML because it loses connection to the head node
process (for 1.x, that's basically mpirun) - but the "flush" procedure
Hi Amrita
I'm not entirely sure I understand your questions, but will try to answer
them below. If you can share what you are doing, we'd be happy to provide
advice.
Ralph
On 6/30/06 5:45 AM, "amrita mathuria" wrote:
> hi...
>
> I am working with open mpi
>> Nathan DeBardeleben, Ph.D.
>> Los Alamos National Laboratory
>> Parallel Tools Team
>> High Performance Computing Environments
>> phone: 505-667-3428
>> email: ndeb...@lanl.gov
>>
Hmmmyuck! I'll take a look - will set it back to what it was
before in the interim.
Thanks
Ralph
At 07:05 AM 2/9/2006, you wrote:
On Feb 8, 2006, at 12:46 PM, Ralph H. Castain wrote:
> In addition, I took advantage of the change to fix something Brian
> had flagged in the orte/mc
Nathan
This should now be fixed on the trunk. Once it is checked out more
thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you
might want to check out the trunk and verify it meets your needs.
Ralph
At 03:05 PM 2/1/2006, you wrote:
This was happening on Alpha 1 as well but
Hello all
After several months of development, I have merged the new data
support subsystem for ORTE into the trunk. I must offer one word
of warning: I have made every effort to test the revised system, but
cannot guarantee its operation in every condition and under every
system. For
I've just finished some stuff - will check it into the system
(hopefully) tomorrow. I'll be able to take a look at this next week.
My guess is that the launcher isn't setting that proc state at this
time since it isn't being used by the system internally and we didn't
know anyone else was
No problem with me - seems straightforward and resolves some confusion.
On the orted check for the fork pls, you will find that there is a
flag in the process info structure that indicates "I am a daemon".
You may just need to check that flag - gets set very early and so
should be available
Yo all
As you may have seen from earlier emails, I encountered some
difficulty in modifying existing APIs within the streamlined build
system. After some effort, I think I have defined a method for
modifying the API-level of a subsystem that gets around some of the
problems. I thought I
Hi Ralf
Appreciate the offer, but I think at this stage it isn't worth the
hassle. We either implement a long-term fix, or just pay the price.
Thanks though
Ralph
At 01:37 AM 11/21/2005, you wrote:
Hi Ralph,
* Ralph H. Castain wrote on Mon, Nov 21, 2005 at 04:04:34AM CET:
> Just as an
un "make" in a framework directory, it
just builds the stuff in base without recursing. Of course, you can't
run make in the base/ directory, but since running make in the
framework directory is essentially equivalent, it doesn't exactly
matter.
Brian
On Nov 20, 2005, at 10:04 PM, Ralph H. Casta
ut we have
definitely made it harder to develop a subsystem. Is that really a
good trade? I wonder.
Ralph
At 08:08 AM 11/15/2005, you wrote:
* Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:45:26PM CET:
> At 07:33 AM 11/15/2005, you wrote:
> >
> >Would it help if onl
Your proposed change would help a great deal - thanks! Can you steer
me through the change?
At 07:33 AM 11/15/2005, you wrote:
Hi Ralph,
* Ralph H. Castain wrote on Tue, Nov 15, 2005 at 03:12:38PM CET:
>
> While I generally find the new build methodology (i.e., reducing the
>
Yo folks
While I generally find the new build methodology (i.e., reducing the
number of makefiles) has little impact on me, I have now encountered
one problem that causes a significant difficulty. In trying to work
on a revised data packing system for the orte part of the branch, I
now find
Yo folks
Josh ran some tests for me on Odin earlier today - the results show a
major improvement in our startup/shutdown performance. As you may
recall, our times grew roughly exponentially before - as the attached
graph shows, they now grow roughly linearly. The data also shows that
the
Yo folks
I have now completed the first three of these items. I believe this
brings ORTE to a stage that is - at the least - very close to release
quality. There are a few memory leaks left (oob and iof subsystems),
but I'm not as familiar with those and have asked for help.
Barring any
Yo folks
Several people have asked lately what I am planning to do next on
ORTE. Just to help maintain coordination, here is my current list of
planned activities (in priority order). Any requests/suggestions are
welcomed - this isn't in concrete by any means.
1. Add George's architecture
Very interesting - it built fine for me (building static). However,
the ns_base_nds.c file is "stale", so I just committed a "delete" of
that file. It shouldn't have been building anyway as it isn't in the
Makefile. My guess, therefore, is that you are building dynamically
and are encountering
Yo all
Per last week's discussions, I have created a set of new simplified
APIs for the registry. These include:
1. orte_gpr.put_1 and orte_gpr.put_N: these allow you to put data on
the registry without having to define your own value structures. They
take a segment name, a NULL-terminated
Very interesting! Appreciate the info. My numbers are slightly better
- as I've indicated, there is an NxN message exchange currently in the
system that needs to be removed. With that commented out, the system
scales roughly linearly with number of processes.
At 04:31 PM 7/28/2005, you wrote: