Re: [OMPI devel] RFC: changes to modex

2008-04-15 Thread Tim Prins
Hate to bring this up again, but I was thinking that an easy way to reduce the size of the modex would be to reduce the length of the names describing each piece of data. More concretely, for a simple run I get the following names, each of which is sent over the wire for every proc (note that

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Jeff Squyres
On Apr 3, 2008, at 11:16 AM, Jeff Squyres wrote:
The size of the openib modex is explained in btl_openib_component.c in the branch. It's a packed message now; we don't just blindly copy an entire struct. Here's the comment:
/* The message is packed into multiple parts:
 * 1. a uint8_t

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Jeff Squyres
On Apr 3, 2008, at 8:52 AM, Gleb Natapov wrote: It'll increase it compared to the optimization that we're about to make. But it will certainly be a large decrease compared to what we're doing today. Maybe I don't understand something in what you propose then. Currently when I run two procs

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Jeff Squyres
On Apr 3, 2008, at 9:18 AM, Gleb Natapov wrote:
I am talking about openib part of the modex. The "garbage" I am referring to is this:
FWIW, on the openib-cpc2 branch, the base data that is sent in the modex is this:
uint64_t subnet_id;
/** LID of this port */
uint16_t lid;
/

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Gleb Natapov
On Thu, Apr 03, 2008 at 07:05:28AM -0600, Ralph H Castain wrote:
> H...since I have no control nor involvement in what gets sent, perhaps I
> can be a disinterested third party. ;-)
>
> Could you perhaps explain this comment:
>
> > BTW I looked at how we do modex now on the trunk. For OOB cas

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Ralph H Castain
H...since I have no control nor involvement in what gets sent, perhaps I can be a disinterested third party. ;-)
Could you perhaps explain this comment:
> BTW I looked at how we do modex now on the trunk. For OOB case more
> than half the data we send for each proc is garbage.
What "garbage

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 08:41:14PM -0400, Jeff Squyres wrote:
> >> that it's the same for all procs on all hosts. I guess there's a few
> >> cases:
> >>
> >> 1. homogeneous include/exclude, no carto: send all in node info; no
> >> proc info
> >> 2. homogeneous include/exclude, carto is used: send

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 4:12 PM, Gleb Natapov wrote: I can specify different openib_if_include values for different procs on the same host. I know you *can*, but it is certainly uncommon. The common case is Uncommon - yes, but do you want to make it unsupported? No, there's no need for that. t

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 03:45:20PM -0400, Jeff Squyres wrote:
> On Apr 2, 2008, at 1:58 PM, Gleb Natapov wrote:
> >> No, I think it would be fine to only send the output after
> >> btl_openib_if_in|exclude is applied. Perhaps we need an MCA param to
> >> say "always send everything" in the case th

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 1:58 PM, Gleb Natapov wrote: No, I think it would be fine to only send the output after btl_openib_if_in|exclude is applied. Perhaps we need an MCA param to say "always send everything" in the case that someone applies a non-homogeneous if_in|exclude set of values...? When i

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 12:08:47PM -0400, Jeff Squyres wrote:
> On Apr 2, 2008, at 11:13 AM, Gleb Natapov wrote:
> > On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote:
> >> If we use carto to limit hcas/ports are used on a given host on a
> >> per-proc basis, then we can include

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 11:13 AM, Gleb Natapov wrote: On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote: If we use carto to limit hcas/ports are used on a given host on a per-proc basis, then we can include some proc_send data to say "this proc only uses indexes X,Y,Z from the node data

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 11:10 AM, Tim Prins wrote: Is there a reason to rename ompi_modex_{send,recv} to ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and less work) to leave the names alone and add ompi_modex_node_{send,recv}. If the arguments don't change, I don't have a

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Ralph H Castain
On 4/2/08 8:52 AM, "Terry Dontje" wrote:
> Jeff Squyres wrote:
>> WHAT: Changes to MPI layer modex API
>>
>> WHY: To be mo' betta scalable
>>
>> WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that
>> calls ompi_modex_send() and/or ompi_modex_recv()
>>
>> TIMEOUT: COB Fri 4 Ap

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote:
> If we use carto to limit hcas/ports are used on a given host on a
> per-proc basis, then we can include some proc_send data to say "this proc
> only uses indexes X,Y,Z from the node data". The indexes can be
> either uint8_ts, o

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Tim Prins
Is there a reason to rename ompi_modex_{send,recv} to ompi_modex_proc_{send,recv}? It seems simpler (and no more confusing and less work) to leave the names alone and add ompi_modex_node_{send,recv}. Another question: Does the receiving process care that the information received applies to a w

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Terry Dontje
Jeff Squyres wrote:
WHAT: Changes to MPI layer modex API
WHY: To be mo' betta scalable
WHERE: ompi/mpi/runtime/ompi_module_exchange.* and everywhere that calls ompi_modex_send() and/or ompi_modex_recv()
TIMEOUT: COB Fri 4 Apr 2008
DESCRIPTION: [...snip...]
* int ompi_modex_node_send

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Jeff Squyres
On Apr 2, 2008, at 10:27 AM, Gleb Natapov wrote: In the case of openib BTL what part of modex are you going to send using proc_send() and what part using node_send()? In the /tmp-public/openib-cpc2 branch, almost all of it will go to the node_send(). The CPC's will likely now get 2 buffer

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 10:21:12AM -0400, Jeff Squyres wrote:
> * int ompi_modex_proc_send(...): send modex data that is specific to
> this process. It is just about exactly the same as the current API
> call (ompi_modex_send).
> [skip]
>
> * int ompi_modex_node_send(...): send modex dat