Re: [OMPI devel] The" Missing Symbol" issue and OpenMPI on NetBSD

2010-05-18 Thread Kevin Buckley
> I added several FAQ items -- how do they look? > > http://www.open-mpi.org/faq/?category=troubleshooting#erroneous-file-not-found-message > http://www.open-mpi.org/faq/?category=troubleshooting#missing-symbols > http://www.open-mpi.org/faq/?category=building#install-overwrite > "This is due to

Re: [OMPI devel] The" Missing Symbol" issue and OpenMPI on NetBSD

2010-05-18 Thread Jeff Squyres
I added several FAQ items -- how do they look? http://www.open-mpi.org/faq/?category=troubleshooting#erroneous-file-not-found-message http://www.open-mpi.org/faq/?category=troubleshooting#missing-symbols http://www.open-mpi.org/faq/?category=building#install-overwrite On May 17, 2010, at 9:15 AM

Re: [OMPI devel] Bug in opal sos changes

2010-05-18 Thread Ralph Castain
Hmmm...well, the way that function -used- to work was it returned an error code, and had the index as a *int param in the function call. Tim P changed it awhile back (don't remember exactly why, but it was when he moved the pointer_array code from orte to opal), and I'm not sure the fixes it requir

Re: [OMPI devel] Bug in opal sos changes

2010-05-18 Thread Abhishek Kulkarni
On Tue, 18 May 2010, Rolf vandeVaart wrote: I think we are almost saying the same thing. But to be sure, I will restate. The call to opal_pointer_array_add() can return either an index (which I assume is a positive integer, maybe also 0?) or OPAL_ERR_OUT_OF_RESOURCE (which is a -2) if it can

Re: [OMPI devel] Bug in opal sos changes

2010-05-18 Thread Rolf vandeVaart
I think we are almost saying the same thing. But to be sure, I will restate. The call to opal_pointer_array_add() can return either an index (which I assume is a positive integer, maybe also 0?) or OPAL_ERR_OUT_OF_RESOURCE (which is a -2) if it cannot malloc anymore space in the table. So, I g

Re: [OMPI devel] Bug in opal sos changes

2010-05-18 Thread Jeff Squyres
Looks like the comparison to OMPI_ERROR worked by accident -- just because it happened to have a value of -1. The *_f_to_c_index values are the return value from a call to opal_pointer_array_add(). This value will either be non-negative or -1. -1 indicates a failure. It's not an *_ERR_* code

[OMPI devel] Bug in opal sos changes

2010-05-18 Thread Rolf vandeVaart
I am getting SEGVs while running the IMB-MPI1 tests. I believe the problem has to do with changes made to the group_init.c file last night. The error occurs in the call to MPI_Comm_split. There were 4 changes in the file that look like this: OLD: if (OMPI_ERROR == new_group->grp_f_to_c_index
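
A minimal sketch of the check under discussion, for readers following the thread: opal_pointer_array_add() hands back a non-negative index on success and a negative value on failure, so the caller should test the sign of the result rather than compare it against one particular error constant (the old OMPI_ERROR comparison only worked because OMPI_ERROR happened to be -1). The surrounding function and its error handling below are invented for illustration and are not the actual group_init.c code:

    /* Illustrative only -- not the real OMPI source. */
    #include "opal/class/opal_pointer_array.h"

    static int example_set_f_to_c_index(opal_pointer_array_t *groups,
                                        void *new_group, int *f_to_c_index)
    {
        /* opal_pointer_array_add() returns a non-negative index on
         * success and a negative value (e.g. OPAL_ERR_OUT_OF_RESOURCE)
         * on failure, so test the sign of the result instead of
         * comparing it against a single constant. */
        int idx = opal_pointer_array_add(groups, new_group);
        if (0 > idx) {
            return idx;          /* propagate the negative error code */
        }
        *f_to_c_index = idx;     /* valid Fortran handle index */
        return 0;                /* i.e. OPAL_SUCCESS */
    }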

Re: [OMPI devel] /dev/shm usage

2010-05-18 Thread Jeff Squyres
On May 18, 2010, at 10:58 AM, Paul H. Hargrove wrote: > I agree that /dev/shm introduces extra complications and should not be the default. The FAQ text I provided was intended to suggest /dev/shm as a session dir (or session root) ONLY for people who had diskless nodes and thus no obvious

Re: [OMPI devel] /dev/shm usage

2010-05-18 Thread Paul H. Hargrove
Jeff Squyres wrote: [snip] Ralph and I talked about this on the phone a bit this morning. There's several complicating factors in using /dev/shm (aren't there always? :-) ). [snip] --> This seems to imply that using /dev/shm should not be default behavior. [snip] I agree that /dev/sh

Re: [OMPI devel] /dev/shm usage (was: Very poor performance with btl sm...)

2010-05-18 Thread Jeff Squyres
I was reminded this morning (by 2 people :-) ) that the sysv shmem stuff was initiated a long time ago as a workaround for many of these same issues (including the potential performance issues). Sam's work is nearly complete; I think that -- at least on Linux -- the mmap performance issues can
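
For readers following along, a standalone sketch (not OMPI code) of the System V route mentioned above -- the segment is allocated by the kernel with no backing file, which is why it sidesteps the question of where an mmap()ed file should live:

    /* Illustrative System V shared-memory usage; sizes and error
     * handling are simplified, and this is not how the OMPI sm code
     * is actually structured. */
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t size = 4 * 1024 * 1024;                    /* example size */

        int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return EXIT_FAILURE; }

        void *base = shmat(id, NULL, 0);                  /* map segment */
        if (base == (void *) -1) { perror("shmat"); return EXIT_FAILURE; }

        /* Mark for removal now, so the segment disappears once every
         * attached process detaches -- even if the job later aborts. */
        shmctl(id, IPC_RMID, NULL);

        /* ... use the memory at 'base' ... */

        shmdt(base);
        return EXIT_SUCCESS;
    }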

Re: [OMPI devel] RFC 2/2: merge the OPAL SOS development branchinto trunk

2010-05-18 Thread Jeff Squyres
Indeed. Nice job yesterday, Abhishek. You did it better than my hwloc merge into the trunk! :-) On May 18, 2010, at 9:20 AM, Josh Hursey wrote: > Abhishek and Jeff, > > Awesome! Thanks for all your hard work maintaining and shepherding > this branch into the trunk. > > -- Josh > > On May

Re: [OMPI devel] RFC 2/2: merge the OPAL SOS development branch into trunk

2010-05-18 Thread Josh Hursey
Abhishek and Jeff, Awesome! Thanks for all your hard work maintaining and shepherding this branch into the trunk. -- Josh On May 17, 2010, at 9:20 PM, Abhishek Kulkarni wrote: On May 14, 2010, at 12:24 PM, Josh Hursey wrote: On May 12, 2010, at 1:07 PM, Abhishek Kulkarni wrote: Updat

Re: [OMPI devel] RFC: Remove all other paffinity components

2010-05-18 Thread Jeff Squyres
On May 18, 2010, at 8:31 AM, Terry Dontje wrote: > The above sounds like you are replacing the whole paffinity framework with > hwloc. Is that true? Or is the hwloc accessors you are talking about > non-paffinity related? Good point; these have all gotten muddled in the email chain. Let me r

[OMPI devel] /dev/shm usage (was: Very poor performance with btl sm...)

2010-05-18 Thread Jeff Squyres
Ralph and I talked about this on the phone a bit this morning. There's several complicating factors in using /dev/shm (aren't there always? :-) ). 0. Note that anything in /dev/shm will need to have session-directory-like semantics: there needs to be per-user and per-job characteristics (e.g.,
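
To make point 0 concrete, a purely illustrative sketch of what "session-directory-like semantics" under /dev/shm could look like -- the path layout, names, and sizes below are invented for this example and are not Open MPI's actual scheme:

    /* Illustrative only: per-user/per-job directory under /dev/shm with
     * an mmap()ed backing file.  On tmpfs the pages stay in RAM, so
     * there is no disk I/O behind the mapping. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        char dir[256], file[512];
        size_t size = 1024 * 1024;
        const char *jobid = getenv("EXAMPLE_JOBID");      /* invented knob */

        /* Per-user and per-job path components, as point 0 requires. */
        snprintf(dir, sizeof(dir), "/dev/shm/example-sessions-uid%d-job%s",
                 (int) getuid(), jobid ? jobid : "0");
        if (mkdir(dir, 0700) != 0 && errno != EEXIST) { perror("mkdir"); return 1; }

        snprintf(file, sizeof(file), "%s/shared_mem_pool", dir);
        int fd = open(file, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, (off_t) size) != 0) { perror("ftruncate"); return 1; }

        void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* ... other processes of the same job would open and map the
         * same file ... */

        munmap(base, size);
        close(fd);
        unlink(file);   /* stale files here consume RAM, another concern above */
        rmdir(dir);
        return 0;
    }

In practice the session-directory root would more likely be redirected with an MCA parameter rather than hard-coded (orte_tmpdir_base in the trees of this era, if memory serves), which is presumably how any /dev/shm experiment would actually be wired up.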

Re: [OMPI devel] RFC: Remove all other paffinity components

2010-05-18 Thread Terry Dontje
Jeff Squyres wrote: Just chatted with Ralph about this on the phone and he came up with a slightly better compromise... He points out that we really don't need *all* of the hwloc API (there's a bajillion tiny little accessor functions). We could provide a steady, OPAL/ORTE/OMPI-specific API

Re: [OMPI devel] RFC: Remove all other paffinity components

2010-05-18 Thread Jeff Squyres
Just chatted with Ralph about this on the phone and he came up with a slightly better compromise... He points out that we really don't need *all* of the hwloc API (there's a bajillion tiny little accessor functions). We could provide a steady, OPAL/ORTE/OMPI-specific API (probably down in opal
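
A hypothetical sketch of that "thin wrapper" idea -- a handful of project-specific entry points layered over hwloc so the rest of the code base never touches hwloc's full accessor surface. The opal_example_* names are invented for illustration; the hwloc calls themselves are the standard hwloc 1.x API:

    #include <hwloc.h>

    static hwloc_topology_t opal_example_topo;           /* invented name */

    int opal_example_topo_init(void)
    {
        if (0 != hwloc_topology_init(&opal_example_topo)) return -1;
        if (0 != hwloc_topology_load(opal_example_topo))  return -1;
        return 0;
    }

    /* One of the "tiny accessor" questions, answered through the wrapper. */
    int opal_example_get_num_cores(void)
    {
        return hwloc_get_nbobjs_by_type(opal_example_topo, HWLOC_OBJ_CORE);
    }

    /* Bind the calling process to a single core, by logical index. */
    int opal_example_bind_to_core(int core)
    {
        hwloc_obj_t obj =
            hwloc_get_obj_by_type(opal_example_topo, HWLOC_OBJ_CORE, core);
        if (NULL == obj) return -1;
        return hwloc_set_cpubind(opal_example_topo, obj->cpuset,
                                 HWLOC_CPUBIND_PROCESS);
    }

Callers inside the project would then only ever see opal_example_get_num_cores() and friends, so hwloc could be upgraded (or swapped out) behind that boundary without churning the rest of the tree -- which is the compromise being proposed above.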

Re: [OMPI devel] Very poor performance with btl sm on twin nehalem servers with Mellanox ConnectX installed

2010-05-18 Thread Sylvain Jeaugey
I would go further on this: when available, putting the session directory in a tmpfs filesystem (e.g. /dev/shm) should give you the maximum performance. Again, when using /dev/shm instead of the local /tmp filesystem, I get a consistent 1-5us latency improvement on a barrier at 32 cores (on a