Re: [OMPI devel] RTE Framework

2013-01-24 Thread Richard Graham
On Jan 22, 2013, at 11:31 PM, Richard Graham wrote: > Brian, > First - thanks. I am very happy this is proceeding. > General question here - do you have any idea how much global state sits > behind the current implementation ? What I am trying to gauge at what level > of granularity...

Re: [OMPI devel] [EXTERNAL] Re: RTE Framework

2013-01-24 Thread Richard Graham
>ompi/mca/rte framework is required to map those function calls to their >own implementation. The function calls themselves are just a rename of >the current ORTE calls, so the implementations must provide the same >functionality - they are simply free to do so however they choose.

Re: [OMPI devel] RTE Framework

2013-01-23 Thread Richard Graham
Brian, First - thanks. I am very happy this is proceeding. General question here - do you have any idea how much global state sits behind the current implementation? What I am trying to gauge is at what level of granularity one can bring in additional capabilities. I have not looked in detail...

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
removing a non-matched request has no impact on the sequence number. george. On Jul 26, 2012, at 16:31 , Richard Graham wrote: > I do not see any resetting of sequence numbers. It has been a long time > since I have looked at the matching code, so don't know if the out-of-order

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread Richard Graham
I do not see any resetting of sequence numbers. It has been a long time since I have looked at the matching code, so I don't know whether the out-of-order handling has been taken out. If not, the sequence number has to be dealt with in some manner, or else there will be a gap in the arriving sequence...
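To illustrate why a sequence number that is consumed but never delivered leaves a gap, here is a minimal sketch of per-peer ordered matching. It is not the ob1 matching code: `peer_state_t`, `frag_arrived()`, and the fixed reorder window are hypothetical simplifications, and the sketch assumes reordering is bounded by that window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bound on how far ahead of the expected sequence number
 * a fragment may arrive (must divide 65536 so the modulo wraps cleanly). */
#define WINDOW 8

typedef struct {
    uint16_t expected_seq;  /* next sequence number allowed to match */
    bool     held[WINDOW];  /* early (out-of-order) fragments being held */
    int      matched;       /* fragments handed to matching so far */
} peer_state_t;

/* Release held fragments that have become in-order. */
static void drain(peer_state_t *p) {
    while (p->held[p->expected_seq % WINDOW]) {
        p->held[p->expected_seq % WINDOW] = false;
        p->matched++;
        p->expected_seq++;
    }
}

/* Called when a fragment carrying sequence number `seq` arrives. */
static void frag_arrived(peer_state_t *p, uint16_t seq) {
    /* Distance ahead of the expected number; unsigned arithmetic handles wrap. */
    uint16_t ahead = (uint16_t)(seq - p->expected_seq);
    assert(ahead < WINDOW);  /* sketch does not handle stale or far-ahead frags */
    if (ahead == 0) {
        p->matched++;        /* in order: match immediately */
        p->expected_seq++;
        drain(p);            /* anything waiting behind it can now go */
    } else {
        p->held[seq % WINDOW] = true;  /* early: hold until the gap fills */
    }
}
```

Feeding it fragments 0, 2 and 3 leaves `matched` at 1; fragments 2 and 3 are only released once fragment 1 arrives. If a sequence number is consumed but the corresponding fragment never reaches the receiver, everything behind it from that peer is held indefinitely.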

Re: [OMPI devel] non-blocking barrier

2012-07-06 Thread Richard Graham
Forget what I just posted - I looked at George's words, and not the code - wait() is the synchronization point, so George's response is correct. Rich -Original Message- From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca Sent: Friday, July 06

Re: [OMPI devel] non-blocking barrier

2012-07-06 Thread Richard Graham
Don't agree here - the only synchronization point is the completion. Ibarrier can't be completed until all have entered the barrier, but each process can leave the ibarrier() call as soon as they want to. Rich -Original Message- From: devel-boun...@open-mpi.org [mailto:devel-boun...@o
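The point that completion is the only synchronization point can be modeled without MPI. A minimal single-process sketch: `ibarrier_t`, `ibarrier_enter()`, and `ibarrier_test()` are hypothetical stand-ins, not Open MPI code. Entering never waits; the request reports completion only after every participant has entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of non-blocking barrier semantics. */
typedef struct {
    int nprocs;   /* number of participants */
    int arrived;  /* how many have entered so far */
} ibarrier_t;

/* Analogue of entering the ibarrier() call: record arrival, return at once. */
static void ibarrier_enter(ibarrier_t *b) {
    assert(b->arrived < b->nprocs);
    b->arrived++;
}

/* Analogue of testing the barrier request: true only once all have entered. */
static bool ibarrier_test(const ibarrier_t *b) {
    return b->arrived == b->nprocs;
}
```

With real MPI the pattern is `MPI_Ibarrier(comm, &req)`, then unrelated work, then `MPI_Wait(&req, MPI_STATUS_IGNORE)`; only the wait can block.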

Re: [OMPI devel] Modex

2012-06-13 Thread Richard Graham
holds the interface. Besides, "pineapple" hit a roadblock during the call and is a totally separate discussion. On Jun 13, 2012, at 7:03 AM, Richard Graham wrote: > I would suggest exposing modex at the pineapple level, and not tie it to a > particular instance of run-tim

Re: [OMPI devel] Modex

2012-06-13 Thread Richard Graham
I would suggest exposing modex at the pineapple level, rather than tying it to a particular instance of run-time instantiation. This decouples the instantiation from the details of the run-time, and also gives the freedom to provide different instantiations for different job scenarios. Rich ...

Re: [OMPI devel] Meta Question -- Open MPI: Is it a dessert topping or is it a floor wax?

2009-03-12 Thread Richard Graham
I am assuming that by distributed OS you are referring to the changes that we (not just ORNL) are trying to do. If this is the case, this is a mischaracterization of our intentions. We have two goals - To be able to use a different run-time than ORTE to drive Open MPI - To use the com...

Re: [OMPI devel] RFC: move BTLs out of ompi into separate layer

2009-03-11 Thread Richard Graham
Brian, Going back over the e-mail trail it seems like you have raised two concerns: - BTL performance after the change, which I would take to be - btl latency - btl bandwidth - Code maintainability - repeated code changes that impact a large number of files - A demonstration t

Re: [OMPI devel] mca_btl_sm_sendi question

2009-02-25 Thread Richard Graham
It really does not matter what one does with the sm sends that can't be posted to the FIFO, as long as they are posted at some later time. The current implementation does not rely on the ordering that memory provides, but generates a sequence number and uses this in the matching, just like an...

Re: [OMPI devel] add_procs

2009-02-05 Thread Richard Graham
I would leave the code alone. The intent was for (A), but it is not used for that. It is not in the performance-critical region, works correctly as we use it today, and putting it back later on would be an unnecessary hassle. Rich On 2/5/09 2:41 PM, "Eugene Loh" wrote: > BTLs have "add_procs"

Re: [OMPI devel] "unknown" in-coming fragment in sm BTL

2009-02-05 Thread Richard Graham
In the pt-2-pt code, the default case should never be hit - it would be a bug in the code. Don't know about other uses of the sm btl. Rich On 2/5/09 12:30 PM, "Eugene Loh" wrote: > In btl_sm_component.c, mca_btl_sm_component_progress() polls on FIFOs. > If it gets something, it has a "switch"

Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-02-01 Thread Richard Graham
> >>> >>> So two were created. Then the orte_bitmap_t was blown away at a >>>> >>> later time when we removed the GPR as George felt it wasn't >>>> >>> necessary (which was true). It was later reborn when we needed it >>>>

Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-01-30 Thread Richard Graham
> I think it primarily is a question for the Fortran folks to address - > can they deal with Fortran limits in some other manner without making > the code unmanageable and/or taking a performance hit? > > Ralph > > > On Jan 30, 2009, at 2:40 PM, Richard Graham wrote: >

Re: [OMPI devel] RFC: Move of ompi_bitmap_t

2009-01-30 Thread Richard Graham
This should really be viewed as a code maintenance RFC. The reason this came up in the first place is because we are investigating the btl move, but these are really two very distinct issues. There are two bits of code that have virtually the same functionality - they do have the same interface I

Re: [OMPI devel] RFC: sm Latency

2009-01-22 Thread Richard Graham
rules. Rich On 1/22/09 12:51 PM, "Eugene Loh" wrote: > Richard Graham wrote: >> Re: [OMPI devel] RFC: sm Latency In the recvi function, do you first try to >> match off the unexpected list before you try and match data in the fifo's? >> > Within the proposed

Re: [OMPI devel] RFC: sm Latency

2009-01-22 Thread Richard Graham
BTW, in the recvi function, do you first try to match off the unexpected list before you try and match data in the fifo's? Rich On 1/21/09 8:00 PM, "Eugene Loh" wrote: > Ron Brightwell wrote: >> >>> >>> If you poll only the queue that correspond to a posted receive, you only >>> optim

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Richard Graham
On 1/20/09 8:53 PM, "Jeff Squyres" wrote: > This all sounds really great to me. I agree with most of what has > been said -- e.g., benchmarks *are* important. Improving them can > even sometimes have the side effect of improving real applications. ;-) > > My one big concern is the moving o

Re: [OMPI devel] RFC: sm Latency

2009-01-20 Thread Richard Graham
On 1/20/09 2:08 PM, "Eugene Loh" wrote: > Richard Graham wrote: >> Re: [OMPI devel] RFC: sm Latency First, the performance improvements look >> really nice. >> A few questions: >> - How much of an abstraction violation does this introduce? > D

Re: [OMPI devel] RFC: sm Latency

2009-01-17 Thread Richard Graham
First, the performance improvements look really nice. A few questions: - How much of an abstraction violation does this introduce? This looks like the btl needs to start “knowing” about MPI-level semantics. Currently, the btl is purposefully ULP-agnostic. I ask for 2 reasons - you ment

Re: [OMPI devel] sm BTL "extra procs"

2008-12-23 Thread Richard Graham
Not needed now. Since we did not want to deal with trying to grow the shared memory file after its allocation, with all the required synchronization, we allocated extra memory up front - for dynamic process control. Since this has never been enabled, we really don't need this extra memory. Rich

Re: [OMPI devel] RFC: make predefined handles extern to pointers

2008-12-17 Thread Richard Graham
Terry, Is there any way you can quantify the cost? This seems reasonable, but it would be nice to get an idea of what the performance cost is (and not within a tight loop where everything stays in cache). Rich On 12/16/08 10:41 AM, "Terry D. Dontje" wrote: > WHAT: To make predefined handles ext

Re: [OMPI devel] shared-memory allocations

2008-12-13 Thread Richard Graham
>> > > > On 12/12/08 8:21 PM, "Eugene Loh" wrote: > > Richard Graham wrote: > Re: [OMPI devel] shared-memory allocations The memory allocation is intended to take into account that two separate procs may be touching the same memory, so the intent is to red

Re: [OMPI devel] shared-memory allocations

2008-12-12 Thread Richard Graham
It has been a long time since I wrote the original code, and things have changed a fair amount since that time, so bear this in mind. The memory allocation is intended to take into account that two separate procs may be touching the same memory, so the intent is to reduce cache conflicts (false sharing...
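The idea of reducing cache conflicts between processes that touch the same shared segment can be sketched by padding each process's state to a full cache line. This is a minimal sketch, not the sm BTL allocator: the 64-byte line size, `proc_slot_t`, `round_up()`, and `slot_offset()` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; real code would query or configure it. */
#define CACHE_LINE 64

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical per-process slot in a shared segment. Padding each slot to
 * a whole line means a write by one process does not invalidate the line
 * holding another process's slot (no false sharing). */
typedef struct {
    volatile uint32_t head;
    volatile uint32_t tail;
    char pad[CACHE_LINE - 2 * sizeof(uint32_t)];
} proc_slot_t;

_Static_assert(sizeof(proc_slot_t) == CACHE_LINE, "slot must fill one line");

/* Offset of process `rank`'s slot, starting at the first line boundary
 * at or after base_offset within a line-aligned segment. */
static size_t slot_offset(size_t base_offset, int rank) {
    return round_up(base_offset, CACHE_LINE) + (size_t)rank * sizeof(proc_slot_t);
}
```

The trade-off is memory: each slot occupies a full line even though it holds only 8 bytes of state.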

Re: [OMPI devel] BTL move - the notion

2008-12-05 Thread Richard Graham
...STCI community and the OMPI community are not two non-overlapping groups, and the run-time support we want to bring into OMPI is there to support new functionality. The main point is that this is not STCI vs. OMPI at all. Rich > > My $0.0002 - hope it helps > Ralph > > > On Dec 4, 2008, at

Re: [OMPI devel] BTL move - the notion

2008-12-05 Thread Richard Graham
> > On 12/5/08 6:49 AM, "Terry D. Dontje" wrote: > > Richard Graham wrote: > > Let me start the e-mail conversation, and see how far we get. > > > > Goal: The goal several of us have is to be able to use the btl’s > > outside of the MPI layer

[OMPI devel] BTL move - the notion

2008-12-04 Thread Richard Graham
Let me start the e-mail conversation, and see how far we get. Goal: The goal several of us have is to be able to use the btl's outside of the MPI layer in Open MPI. The layer itself is generic, w/o specific knowledge of Upper Level Protocols, so it is well suited for this sort of use. Technical App

Re: [OMPI devel] Jan ORTE meeting

2008-12-04 Thread Richard Graham
How about if we start on this over e-mail and phone ? A face-to-face meeting is good, but I am already booked Jan 5-9, maybe 12-13, Jan 16-Feb 6th, and Feb 8-11. I would prefer not to tack on something at the end of the MPI Forum meeting, as I will have been gone from home for most of the month b

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
xt. The >> > proposed approach contains a number of impacts that may be avoided >> > with an alternative approach. >> > >> > Without such a meeting, I fear we are going to rapidly dissolve into >> > email hell again. >> > >> > Ralph >

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
ckly, while having the error reporting > mechanism right where the error occurs represents the minimal impact and > maximum flexibility. > >>> >> more flexibility is obtained if the data is passed up the call stack, and >>> handled by the layer that wants to. >

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
s what transport) is an unlikely thing >> > to happen. >> > >> > Besides, one of the primary reasons for needing to call notifier is a >> > failure in the btl - so relying on the btl to send the message is >> > self-defeating. >> > >

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
>> > a strong reason for not renaming them -- most could probably be >> > renamed to OPAL_* -- we just didn't do it then. Perhaps they can be >> > changed during the BTL extraction process (I noted this on the wiki). >> > >> > >> > >> > On Dec 3,

Re: [OMPI devel] Preparations for moving the btl's

2008-12-04 Thread Richard Graham
nfigure >> > (i.e., opal/include/opal_config.h) and were not renamed back when we >> > split the code base into OPAL, ORTE, and OMPI. I don't think we had >> > a strong reason for not renaming them -- most could probably be >> > renamed to OPAL_* -- we just

Re: [OMPI devel] Preparations for moving the btl's

2008-12-03 Thread Richard Graham
BTW, I was guessing FTB is Fault Tolerant Backbone, but if not, can someone tell me what it is? If it is not the latter, what I just wrote about it makes no sense. Rich On 12/3/08 9:34 PM, "Richard Graham" wrote: > The goal is to use the btl's outside of the context of MPI, w

Re: [OMPI devel] Preparations for moving the btl's

2008-12-03 Thread Richard Graham
>> >> The BTLs might have added calls to the notifier framework in their >>> >> error paths. >>> >> The notifier framework is currently in the ORTE layer... not sure >>> >> if we could >>> >> move it down to OPAL. Ralph, any

Re: [OMPI devel] Preparations for moving the btl's

2008-12-03 Thread Richard Graham
een working on some Fastpath code changes that we >> should make sure neither project obliterates the other. >> >> --td >> >> Richard Graham wrote: >>> Now that 1.3 will be released, we would like to go ahead with the >>> plan to move the btl's out of th

Re: [OMPI devel] Preparations for moving the btl's

2008-12-03 Thread Richard Graham
of the BTLs to call opal routines? > > --td > > Richard Graham wrote: >> > Now that 1.3 will be released, we would like to go ahead with the plan >> > to move the btl's out of the MPI layer. Greg Koenig who is doing most >> > of the work has started a wiki pa

Re: [OMPI devel] Preparations for moving the btl's

2008-12-03 Thread Richard Graham
> > On Dec 3, 2008, at 7:46 AM, Richard Graham wrote: > >> Now that 1.3 will be released, we would like to go ahead with the plan to >> move the btl's out of the MPI layer. Greg Koenig who is doing most of the >> work has started a wiki page with details on the pl

[OMPI devel] Preparations for moving the btl's

2008-12-03 Thread Richard Graham
Now that 1.3 will be released, we would like to go ahead with the plan to move the btl's out of the MPI layer. Greg Koenig, who is doing most of the work, has started a wiki page with details on the plans. Right now details are sketchy, as Greg is digging through the code, and has only hand writte

Re: [OMPI devel] SM backing file size

2008-11-14 Thread Richard Graham
Agreed. On 11/14/08 9:56 AM, "Ralph Castain" wrote: > > On Nov 14, 2008, at 7:41 AM, Richard Graham wrote: > >> Just a few comments: >>- not sure what sort of alternative memory approach is being considered. >> The current approach was selected fo

Re: [OMPI devel] SM backing file size

2008-11-14 Thread Richard Graham
Just a few comments: - not sure what sort of alternative memory approach is being considered. The current approach was selected for two reasons: - If something like anonymous memory is being used, one can only inherit access to the shared files, so one process needs to set up the shared me
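The inheritance constraint on anonymous memory can be seen with plain POSIX calls. A minimal sketch, not the sm BTL code (`anon_shared_roundtrip()` is a hypothetical helper): a `MAP_SHARED | MAP_ANONYMOUS` mapping created before `fork()` is shared with the child, but there is no name an unrelated process could use to attach to it.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create an anonymous shared mapping, fork, let the child write `value`
 * into it, and return what the parent reads afterwards (-1 on error). */
static int anon_shared_roundtrip(int value) {
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        /* Child: the mapping was inherited across fork(). */
        *shared = value;
        _exit(0);
    }
    /* Parent: wait for the child, then observe its write. */
    int status = 0;
    waitpid(pid, &status, 0);
    int seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

Because only descendants of the creating process see such a mapping, one process has to set it up before the others exist; processes started independently need a named object, such as a backing file, to attach to.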

Re: [OMPI devel] Dec ORTE design meeting

2008-11-04 Thread Richard Graham
I plan to attend. Rich On 10/31/08 12:03 PM, "Ralph Castain" wrote: > Hello all > > Those of us who participated in the July ORTE meeting had so much fun, > we decided it was worth doing again! :-) > > Seriously, there are design issues that would benefit by a face-to- > face meeting with a

Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-23 Thread Richard Graham
cluster node, you are >>>>> >>>> going to get mynameN, or something similar. If you do a >>>>> >>>> gethostname() on an ALPS node, you are going to get nidN; there is >>>>> >>>> no differentiation between clu

Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Richard Graham
ant code consequences when we look at abnormal > terminations, comm_spawn, etc. > > Thanks > Ralph > > On Sep 22, 2008, at 11:26 AM, Richard Graham wrote: > >> This check in was in error - I had not realized that the checkout >> was from >> the 1.3 branch, so we w

Re: [OMPI devel] [OMPI svn] svn:open-mpi r19600

2008-09-22 Thread Richard Graham
This check-in was in error - I had not realized that the checkout was from the 1.3 branch, so we will fix this, and put these into the trunk (1.4). We are going to bring in some limited multi-cluster support - limited is the operative word. Rich On 9/22/08 12:50 PM, "Jeff Squyres" wrote: > I

Re: [OMPI devel] v1.3: libnbc and sm2 coll components

2008-07-21 Thread Richard Graham
No need for SM2 right now. We need a change to the communicator before we can bring this over, and have not yet gotten around to addressing this. Rich On 7/21/08 3:40 PM, "Jeff Squyres" wrote: > Should these 2 components be in v1.3? > > -- > Jeff Squyres > Cisco Systems > > _

Re: [OMPI devel] Trunk check-in policy until the branch for 1.3

2008-05-21 Thread Richard Graham
Thanks, Rich On 5/20/08 10:37 PM, "Brad Benton" wrote: > > > 2008/5/20 Richard Graham : >> Brad, >> Do you want these for bug fixes too ? > > I think that it's okay to check in small bug fixes without a ticket. I know > this is a somewhat nebu

Re: [OMPI devel] Trunk check-in policy until the branch for 1.3

2008-05-20 Thread Richard Graham
Brad, Do you want these for bug fixes too ? Rich On 5/20/08 5:53 PM, "Brad Benton" wrote: > All: > > In order to better track changes on the trunk until we branch for 1.3, we (the > release managers) would like to ask that all trunk checkins have corresponding > tickets associated with them

[OMPI devel] Process "layout"

2008-05-07 Thread Richard Graham
Is there a way to trick ompi/orte into thinking that a single node is actually a collection of several smp nodes interconnected with tcp ? If so, can someone give me a hint how to set this up ? I want to create a hierarchy on my laptop for testing purposes. Thanks, Rich

Re: [OMPI devel] SIGUSR2 response

2008-04-17 Thread Richard Graham
Ralph, Thanks for looking into this. I do not think that the behaviour needs to change - it is correct. However, for some reason this is not how things were running for me - I wonder what the difference is. I worked around this by getting the pids of the MPI processes, and delivered the sign

Re: [OMPI devel] Signals

2008-04-08 Thread Richard Graham
On 4/8/08 2:19 PM, "Ralph H Castain" wrote: > > > > On 4/8/08 12:10 PM, "Pak Lui" wrote: > >> Richard Graham wrote: >>> What happens if I deliver sigusr2 to mpirun ? What I observe (for both >>> ssh/rsh and torque) tha

Re: [OMPI devel] Signals

2008-04-08 Thread Richard Graham
the user processes, and let them decide what to do with the signals >> themselves. >> >> SGE needed this so the job kill or job suspension notification to work >> properly since they would send a SIGUSR1/2 to mpirun. I believe this is >> probably what you need in the

[OMPI devel] Signals

2008-04-08 Thread Richard Graham
I am running into a situation where I am trying to deliver a signal to the mpi procs (sigusr2). I deliver this to mpirun, which propagates it to the mpi procs, but then proceeds to kill the children. Is there an easy way that I can get around this ? I am using this mechanism in a situation where
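The question here is whether mpirun forwards SIGUSR2 to the MPI processes or tears them down. Assuming the signal does reach the process, catching it on the application side is ordinary POSIX. A minimal sketch; `got_usr2`, `on_usr2()`, and `install_usr2_handler()` are hypothetical names. The handler only sets a flag, and the application acts on it later from normal code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler, checked later from normal code. */
static volatile sig_atomic_t got_usr2 = 0;

static void on_usr2(int signo) {
    (void)signo;
    got_usr2 = 1;  /* keep handler work async-signal-safe */
}

/* Install the SIGUSR2 handler; returns 0 on success, -1 on failure. */
static int install_usr2_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR2, &sa, NULL);
}
```

An application would install the handler early in `main()` and poll `got_usr2` from its main loop (for example, to checkpoint or shut down cleanly). This does not change what mpirun itself does with the signal.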

[OMPI devel] Latency optimizations

2008-02-28 Thread Richard Graham
FYI, About six months ago several of us spent some time coming up with a plan to deal with the latency problems in Open MPI. George went ahead and has been implementing the send side changes of this optimization over the last several months, but has not had time to get to the receive side. Galen i

Re: [OMPI devel] Orte collectives

2008-01-29 Thread Richard Graham
o be delivered > - no harm done, it just gets ignored. > > Hope that helps. Let me know if you need some variant as I am adding (not > reducing or changing) grpcomm capabilities on the tmp branch. > > Ralph > > > > On 1/29/08 12:19 PM, "Richard Graham"

[OMPI devel] Orte collectives

2008-01-29 Thread Richard Graham
Are the group operations in ORTE (I assume this is what the grpcomm component does) available to subsets of a job, or do all procs in the orte_jobid_t need to invoke this ? Thanks, Rich

Re: [OMPI devel] matching code rewrite in OB1

2007-12-17 Thread Richard Graham
04:21PM -0500, Richard Graham wrote: >> > Yes, should be a bit more clear. Need an independent way to verify that >> > data is matched >> > in the correct order - sending this information as payload is one way to >> do >> > this. So, >> > sendi

Re: [OMPI devel] matching code rewrite in OB1

2007-12-14 Thread Richard Graham
rder situations. Rich On 12/14/07 2:20 AM, "Gleb Natapov" wrote: > On Thu, Dec 13, 2007 at 06:16:49PM -0500, Richard Graham wrote: >> The situation that needs to be triggered, just as George has mentioned, is >> where we have a lot of unexpected messages, to make sure th

Re: [OMPI devel] matching code rewrite in OB1

2007-12-13 Thread Richard Graham
>> > On Wed, 12 Dec 2007, Gleb Natapov wrote: >> > >>> >> On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote: >>>> >>> This is better than nothing, but really not very helpful for >>>> >>> looking at the >>

Re: [OMPI devel] matching code rewrite in OB1

2007-12-13 Thread Richard Graham
>> >>> On Wed, 12 Dec 2007, Gleb Natapov wrote: >>> >>>> On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote: >>>>> This is better than nothing, but really not very helpful for >>>>> looking at the >>>>> specific iss

Re: [OMPI devel] matching code rewrite in OB1

2007-12-12 Thread Richard Graham
, but I can send you the patch. > I can send you a tarball too, but I prefer to not abuse email. > >> >> >> On Dec 11, 2007, at 4:14 PM, Richard Graham wrote: >> >>> I will re-iterate my concern. The code that is there now is mostly >>> nine >>

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Richard Graham
gt;>> Good Idea I'll try this. BTW I thing the reason for such a high rate of >>> reordering in UD is that it polls for MCA_BTL_UD_NUM_WC completions >>> (500) and process them one by one and if progress function is called >>> recursively next 500 completion wi

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Richard Graham
Gleb, I would suggest that before this is checked in this be tested on a system that has N-way network parallelism, where N is as large as you can find. This is a key bit of code for MPI correctness, and out-of-order operations will break it, so you want to maximize the chance for such operations

Re: [OMPI devel] IB pow wow notes

2007-12-05 Thread Richard Graham
One question - there is mention of a new pml that is essentially CM+matching. Why is this not just another instance of CM? Rich On 11/26/07 7:54 PM, "Jeff Squyres" wrote: > OMPI OF Pow Wow Notes > 26 Nov 2007 > > --- > >

Re: [OMPI devel] THREAD_MULTIPLE

2007-11-28 Thread Richard Graham
Are there not users that are using THREAD_MULTIPLE successfully, or is this on the trunk ? We know that the code is not where it needs to be, but if this takes away current functionality, I would prefer to display a warning. Rich On 11/28/07 11:27 AM, "Jeff Squyres" wrote: > We've had a few u

Re: [OMPI devel] ORTE process name and nodeid

2007-11-18 Thread Richard Graham
What is the exact purpose of the process name ? Rich On 11/17/07 5:27 PM, "Shipman, Galen M." wrote: > > > I am doing some work on Cray's CNL to support shared memory. To support > shared memory I need to know if processes are local or remote. For other > systems we simply use the modex in o

[OMPI devel]

2007-11-18 Thread Richard Graham
Any suggestions on which ISVs should be notified about the MPI 2.1+ effort that is being started? Any vendors that may have been missed? Rich

[OMPI devel] FW: [mpi-21] Follow up on the MPI Forum meeting

2007-11-18 Thread Richard Graham
-- Forwarded Message From: Richard Graham Reply-To: List-Post: devel@lists.open-mpi.org Date: Fri, 16 Nov 2007 23:21:16 -0500 To: Conversation: Follow up on the MPI Forum meeting Subject: [mpi-21] Follow up on the MPI Forum meeting Here is a brief summary of the meeting held at SC07 in

[OMPI devel] FW: [mpi-21] Follow up on the MPI Forum meeting

2007-11-17 Thread Richard Graham
-- Forwarded Message From: Richard Graham Reply-To: List-Post: devel@lists.open-mpi.org Date: Fri, 16 Nov 2007 23:21:16 -0500 To: Conversation: Follow up on the MPI Forum meeting Subject: [mpi-21] Follow up on the MPI Forum meeting Here is a brief summary of the meeting held at SC07 in

Re: [OMPI devel] collective problems

2007-11-08 Thread Richard Graham
On 11/8/07 4:03 AM, "Gleb Natapov" wrote: > On Wed, Nov 07, 2007 at 11:25:43PM -0500, Patrick Geoffray wrote: >> Richard Graham wrote: >>> The real problem, as you and others have pointed out is the lack of >>> predictable time slices for the progres

Re: [OMPI devel] collective problems

2007-11-07 Thread Richard Graham
On 11/8/07 12:25 AM, "Patrick Geoffray" wrote: > Richard Graham wrote: >> The real problem, as you and others have pointed out is the lack of >> predictable time slices for the progress engine to do its work, when relying >> on the ULP to make calls into the

Re: [OMPI devel] collective problems

2007-11-07 Thread Richard Graham
n't like that idea, so > we came up with a way for back pressure from the BTL to say "it's not > on the wire yet". This is more complicated than just not marking MPI > completion early, but why would we do something that helps real apps > at the expense of benchmarks

Re: [OMPI devel] collective problems

2007-11-07 Thread Richard Graham
Does this mean that we don't have a queue to store btl level descriptors that are only partially complete ? Do we do an all or nothing with respect to btl level requests at this stage ? Seems to me like we want to mark things complete at the MPI level ASAP, and that this proposal is not to do

Re: [OMPI devel] openib currently broken

2007-11-04 Thread Richard Graham
Much higher (23), but in way over-subscribed mode ... On 11/2/07 8:07 PM, "Jeff Squyres (jsquyres)" wrote: > Did you run with a higher number of procs? > > -jms > Sent from my PDA > > -Original Message----- > From: Richard Graham [mailto:rlgra...@ornl.

Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham
ntrivial omb to check? > > -jms > Sent from my PDA > > -Original Message- > From: Richard Graham [mailto:rlgra...@ornl.gov] > Sent: Friday, November 02, 2007 02:07 PM Eastern Standard Time > To: Open MPI Developers > Subject:Re: [OMPI devel] openi

Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham
h last nights put back but I have not > looked into it yet. > > -DON > > Richard Graham wrote: > >> R16641 should have fixed the regression. Anyone using >> ompi_free_list_t_ex() and providing >> a memory allocator would have been bitten by this, since I did no

Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham
defined. From looking through the btls, this seems to be only the openib btl. Rich On 11/2/07 12:31 PM, "Richard Graham" wrote: > > > > On 11/2/07 12:21 PM, "Jeff Squyres" wrote: > >> The freelist changes from yesterday appear to have broken the o

Re: [OMPI devel] openib currently broken

2007-11-02 Thread Richard Graham
On 11/2/07 12:21 PM, "Jeff Squyres" wrote: > The freelist changes from yesterday appear to have broken the openib > btl. We didn't get lots of test failures in MTT last night only > because there was a separate (unrelated) typo in the ofud BTL that > prevented the nightly tarball from buildin

[OMPI devel] Changes to ompi_free_list initialization

2007-11-01 Thread Richard Graham
I have just gone through and re-implemented the changes to ompi_free_list_t in the trunk, and have changed all instances of ompi_free_list_init() to ompi_free_list_init_new() (keeping the old version around for a while). I have tested this with ob1 and dr (the system I use for cm is not available), w

[OMPI devel] FW: [devel-core] [RFC] Proposed changes to ompi_free_list

2007-10-15 Thread Richard Graham
-- Forwarded Message From: Richard Graham List-Post: devel@lists.open-mpi.org Date: Wed, 12 Sep 2007 19:53:19 -0400 Conversation: [RFC] Proposed changes to ompi_free_list Subject: [devel-core] [RFC] Proposed changes to ompi_free_list Proposed changes to the ompi_free_list: Please comment by: 9/19/2007 The

[OMPI devel] FW: [mpi-21] SC'07 MPI Forum organization meeting

2007-10-12 Thread Richard Graham
-- Forwarded Message From: Richard Graham Reply-To: "mpi...@mpi-forum.org" List-Post: devel@lists.open-mpi.org Date: Fri, 12 Oct 2007 12:48:04 -0400 To: "mpi...@mpi-forum.org" Conversation: SC'07 MPI Forum organization meeting Subject: [mpi-21] SC'07 MPI

Re: [OMPI devel] [RFC] change wrapper compilers from binaries to shellscripts

2007-10-12 Thread Richard Graham
That's the plan. Rich On 10/12/07 8:21 AM, "Terry Dontje" wrote: > Will these new scripts be using the same wrapper config files as the > binaries were? > > --td > > Richard Graham wrote: >> > What: Change the mpicc/mpicxx/mpif77/mpif90 from being

[OMPI devel] [RFC] change wrapper compilers from binaries to shell scripts

2007-10-11 Thread Richard Graham
What: Change the mpicc/mpicxx/mpif77/mpif90 from being binaries to being shell scripts Why: Our build environment assumes that wrapper compilers will use the same binary format that the Open MPI libraries do. In a cross-compile environment, the MPI wrapper compilers will run on the front-end and n

Re: [OMPI devel] Module Design Concept

2007-10-09 Thread Richard Graham
One of the assumptions about the MTLs is that only a given MTL can handle the message matching for communications. This is done to accommodate MPI-like network stacks that also handle the MPI message matching, which often do not expose their internal data structures used for matching. Open MPI

[OMPI devel] FW: Meeting at SC'07

2007-10-05 Thread Richard Graham
FYI. Rich -- Forwarded Message From: Richard Graham Reply-To: List-Post: devel@lists.open-mpi.org Date: Fri, 5 Oct 2007 03:55:27 -0400 To: Conversation: Meeting at SC'07 Subject: Re: Meeting at SC'07 I will have an agenda out in a week or so, so if people have specific items

Re: [OMPI devel] Use of the ompi free list

2007-09-26 Thread Richard Graham
eaving directory > > `/opt/mtt/64/non-threaded/free-list-branch-testing/installs/3m5g/src/free_list > /ompi' > make: *** [all-recursive] Error 1 > > > > > Richard Graham wrote: > >> >We are looking at making some changes to the ompi free list in

[OMPI devel] Use of the ompi free list

2007-09-26 Thread Richard Graham
We are looking at making some changes to the ompi free list in ompi/class/ompi_free_list.[c,h], and are trying to decide whether to go ahead with an interface change that will allow separate control over alignment of the frag and payload data structures. We are aware of several implementations of btl
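What separate alignment control over the frag and the payload could mean for element layout can be sketched as follows. This is not the ompi_free_list interface; `elem_layout_t`, `align_up()`, and `layout_elem()` are hypothetical, and the sketch assumes each element holds a fragment descriptor followed by its payload, with the list base aligned to the larger of the two alignments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of one free-list element: descriptor, then payload. */
typedef struct {
    size_t frag_offset;     /* descriptor offset within the element */
    size_t payload_offset;  /* payload offset within the element */
    size_t elem_size;       /* stride between consecutive elements */
} elem_layout_t;

/* Round n up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t n, size_t a) {
    assert(a != 0 && (a & (a - 1)) == 0);
    return (n + a - 1) & ~(a - 1);
}

static elem_layout_t layout_elem(size_t frag_size, size_t frag_align,
                                 size_t payload_size, size_t payload_align) {
    elem_layout_t l;
    l.frag_offset = 0;  /* element base assumed aligned to the larger alignment */
    /* Payload starts at the first payload_align boundary after the frag. */
    l.payload_offset = align_up(frag_size, payload_align);
    /* A stride that is a multiple of the larger alignment keeps both parts
     * aligned in every element, not just the first one. */
    size_t stride_align = frag_align > payload_align ? frag_align : payload_align;
    l.elem_size = align_up(l.payload_offset + payload_size, stride_align);
    return l;
}
```

For example, `layout_elem(72, 8, 1000, 64)` places the payload at offset 128 and gives a 1152-byte stride; with a single shared alignment, either the small descriptor would be over-aligned or the payload under-aligned.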

Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Richard Graham
If you are going to look at it, I will not bother with this. Rich On 8/29/07 10:47 AM, "Gleb Natapov" wrote: > On Wed, Aug 29, 2007 at 10:46:06AM -0400, Richard Graham wrote: >> Gleb, >> Are you looking at this ? > Not today. And I need the code to reprodu

Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Richard Graham
Gleb, Are you looking at this ? Rich On 8/29/07 9:56 AM, "Gleb Natapov" wrote: > On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote: >> Is this trunk or 1.2? > Oops. I should read more carefully :) This is trunk. > >> >> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje w

Re: [OMPI devel] Maximum Shared Memory Segment - OK to increase?

2007-08-27 Thread Richard Graham
Rolf, Would it be better to put this parameter in the system configuration file, rather than change the compile time option ? Rich On 8/27/07 3:10 PM, "Rolf vandeVaart" wrote: > We are running into a problem when running on one of our larger SMPs > using the latest Open MPI v1.2 branch. We

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Richard Graham
On 8/13/07 3:52 PM, "Gleb Natapov" wrote: > On Mon, Aug 13, 2007 at 09:12:33AM -0600, Galen Shipman wrote: > Here are the > items we have identified: > All those things sounds very promising. Is there > tmp branch where you are going to work on this? > > tmp/latency Some changes have alr

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Richard Graham
On 8/13/07 12:34 PM, "Galen Shipman" wrote: > Ok here is the numbers on my machines: 0 bytes mvapich with header caching: 1.56 mvapich without header caching: 1.79 ompi 1.2: 1.59 So on zero bytes ompi not so bad. Also we can see that header cachin

Re: [OMPI devel] [RFC] Sparse group implementation

2007-07-25 Thread Richard Graham
This is good work, so I am happy to see it come over. My initial understanding was that there would be compile time protection for this. In the absence of this, I think we need to see performance data on a variety of communication substrates. It seems like a latency measurement is, perhaps, t

Re: [OMPI devel] threaded builds

2007-06-12 Thread Richard Graham
We should not pretend that threads work in the 1.2 code branch. Thread safety has been designed in, but we are just kicking off an effort to complete and verify the thread safety. Rich On 6/11/07 2:49 PM, "Paul H. Hargrove" wrote: > If Jeff has the resources to run threaded tests against 1.