Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-19 Thread Ralph Castain
It is related, but it means that coll/ml has a higher degree of sensitivity to the binding pattern than what you reported (which was that coll/ml doesn't work with unbound processes). What we are now seeing is that coll/ml also doesn't work when processes are bound across sockets. Which means t

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-19 Thread Gilles Gouaillardet
Ralph and Tetsuya, is this related to the hang I reported at http://www.open-mpi.org/community/lists/devel/2014/06/14975.php ? Nathan already replied that he is working on a fix. Cheers, Gilles On 2014/06/20 11:54, Ralph Castain wrote: > My guess is that the coll/ml component may have problems wit

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-19 Thread Ralph Castain
My guess is that the coll/ml component may have problems with binding a single process across multiple cores like that - it might be that we'll have to have it check for that condition and disqualify itself. It is a particularly bad binding pattern, though, as shared memory gets completely messe
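Ralph's suggested fix is for coll/ml to detect a bad binding and disqualify itself. Per this thread, that covers unbound processes and processes whose binding crosses a socket boundary. The Python sketch below only illustrates that predicate. It is not Open MPI code (coll/ml is written in C and would take its topology from hwloc), and `uniform_topology`, `should_disqualify`, and the socket-major core numbering are hypothetical. The node05 shape (2 sockets with 4 cores each) comes from the original report.

```python
from typing import Dict, Optional, Sequence, Set

# Hypothetical topology model: logical core index -> socket index.
CoreToSocket = Dict[int, int]


def uniform_topology(num_sockets: int, cores_per_socket: int) -> CoreToSocket:
    """Number cores socket-major: cores 0..n-1 on socket 0, n..2n-1 on socket 1, etc."""
    return {
        s * cores_per_socket + c: s
        for s in range(num_sockets)
        for c in range(cores_per_socket)
    }


def sockets_spanned(bound_cores: Sequence[int], topo: CoreToSocket) -> Set[int]:
    """Return the set of sockets touched by a process's core binding."""
    return {topo[core] for core in bound_cores}


def should_disqualify(bound_cores: Optional[Sequence[int]], topo: CoreToSocket) -> bool:
    """Hypothetical self-check for a collective component.

    Disqualify when the process is unbound (None or an empty binding)
    or when its binding spans more than one socket.
    """
    if not bound_cores:
        return True
    return len(sockets_spanned(bound_cores, topo)) > 1


if __name__ == "__main__":
    node05 = uniform_topology(num_sockets=2, cores_per_socket=4)
    print(should_disqualify([0, 1], node05))  # False: bound within socket 0
    print(should_disqualify([3, 4], node05))  # True: straddles sockets 0 and 1
    print(should_disqualify(None, node05))    # True: unbound
```

Treating "unbound" and "multi-socket" as the same outcome keeps the check conservative: the component simply steps aside and another coll component is selected.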

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Artem Polyakov
Hello, I would like to participate in PMI and modex discussions remotely. 2014-06-19 22:44 GMT+07:00 Jeff Squyres (jsquyres) : > We have a bunch of topics listed on the wiki, but no real set agenda: > > https://svn.open-mpi.org/trac/ompi/wiki/Jun14Meeting > > We had remote-attendance reques

[OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-19 Thread tmishima
Hi folks, Recently I have been seeing a hang with trunk when I specify a particular binding by use of a rankfile or "-map-by slot". This can be reproduced with a rankfile that allocates a process across a socket boundary. For example, on node05, which has 2 sockets with 4 cores each, rank 1 is allo

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Edgar Gabriel
Sorry, let me be more precise: on Wednesday, I have time before 12pm. Thanks Edgar On 6/19/2014 2:52 PM, Edgar Gabriel wrote: > the best time for me would be either Wednesday morning (basically any > time), or Thursday morning before 11am central. > > Thanks > Edgar > > On 6/19/201

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Edgar Gabriel
the best time for me would be either Wednesday morning (basically any time), or Thursday morning before 11am central. Thanks Edgar On 6/19/2014 1:42 PM, Ralph Castain wrote: > I found it on the agenda under the 1.9 branch subject - let us know when you > are available, Edgar > > On Jun 19, 2014

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Ralph Castain
I found it on the agenda under the 1.9 branch subject - let us know when you are available, Edgar On Jun 19, 2014, at 11:29 AM, Jeff Squyres (jsquyres) wrote: > ...and pick a time that would work for you for a webex. :-) > > On Jun 19, 2014, at 1:48 PM, Ralph Castain wrote: > >> I don't se

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Jeff Squyres (jsquyres)
...and pick a time that would work for you for a webex. :-) On Jun 19, 2014, at 1:48 PM, Ralph Castain wrote: > I don't see that on the agenda, Edgar - can you please add it to ensure it > gets covered? > > > On Jun 19, 2014, at 10:36 AM, Edgar Gabriel wrote: > >> If possible, I would like

Re: [OMPI devel] Fortran busted on trunk

2014-06-19 Thread Jeff Squyres (jsquyres)
r32048 should have fixed the problem. On Jun 19, 2014, at 11:41 AM, Jeff Squyres wrote: > If you svn up right now, you should: > > 1. Disable building the Fortran bindings. I missed a case in all my testing; > there's brokenness with older versions of gfortran. > > 2. You'll also get SVN co

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Ralph Castain
I don't see that on the agenda, Edgar - can you please add it to ensure it gets covered? On Jun 19, 2014, at 10:36 AM, Edgar Gabriel wrote: > If possible, I would like to attend remotely the discussion about OMPIO > as well. > > Thanks > Edgar > > On 6/19/2014 10:44 AM, Jeff Squyres (jsquyre

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Edgar Gabriel
If possible, I would like to attend remotely the discussion about OMPIO as well. Thanks Edgar On 6/19/2014 10:44 AM, Jeff Squyres (jsquyres) wrote: > We have a bunch of topics listed on the wiki, but no real set agenda: > > https://svn.open-mpi.org/trac/ompi/wiki/Jun14Meeting > > We had rem

Re: [OMPI devel] r31916 question

2014-06-19 Thread Ralph Castain
No, neither ORTE nor OMPI makes such an assumption. That's up to the scheduler. We will launch a separate orted for each job, though, to avoid cross-contamination. On Jun 19, 2014, at 8:00 AM, Pritchard, Howard P wrote: > Hi Ralph, > > Thanks for the explanation. Does ORTE/OMPI always assume that f

[OMPI devel] Agenda for next week

2014-06-19 Thread Jeff Squyres (jsquyres)
We have a bunch of topics listed on the wiki, but no real set agenda: https://svn.open-mpi.org/trac/ompi/wiki/Jun14Meeting We had remote-attendance requests for 2 topics, however, so I took the liberty of setting up some fixed-time webexes for them (see the wiki for the webex links): - Tue

Re: [OMPI devel] r31916 question

2014-06-19 Thread Ralph Castain
Yeah, I had slowly been working on repairing it, but will put that on hold until you commit. On Jun 19, 2014, at 8:40 AM, Adrian Reber wrote: > The fault tolerance code also needs additional changes because of this > commit. I have the changes prepared but not committed. > > On Wed, Jun 18, 201

[OMPI devel] Fortran busted on trunk

2014-06-19 Thread Jeff Squyres (jsquyres)
If you svn up right now, you should: 1. Disable building the Fortran bindings. I missed a case in all my testing; there's brokenness with older versions of gfortran. 2. You'll also get SVN conflicts in the ompi/mpi/fortran tree. This is because some .f90 files that used to be generated are no

Re: [OMPI devel] r31916 question

2014-06-19 Thread Adrian Reber
The fault tolerance code also needs additional changes because of this commit. I have the changes prepared but not committed. On Wed, Jun 18, 2014 at 03:45:11PM -0700, Ralph Castain wrote: > Huh - thought I got that. Sorry I missed it. Let me take a look and ensure > that the alps ras module is s

Re: [OMPI devel] r31916 question

2014-06-19 Thread Pritchard, Howard P
Hi Ralph, Thanks for the explanation. Does ORTE/OMPI always assume that for multi-node jobs, there will only be one user's job per node? At my previous employer we had to make some changes to runtime components in order to support slurm, for which the customers' default settings were to pr

Re: [OMPI devel] Compile OpenMPI with MXM support

2014-06-19 Thread Mike Dubman
Could you please try HPC-X for OMPI and MXM? What MOFED version do you have? What is your configure command line? On Thu, Jun 19, 2014 at 3:45 PM, Kiryanov, Denis < denis.kirya...@t-platforms.ru> wrote: > > > From: devel [devel-boun...@open-mpi.org] On Behalf

Re: [OMPI devel] Compile OpenMPI with MXM support

2014-06-19 Thread Kiryanov, Denis
From: devel [devel-boun...@open-mpi.org] On Behalf Of Mike Dubman [mi...@dev.mellanox.co.il] Sent: Thursday, June 19, 2014 4:20 PM To: Open MPI Developers Subject: Re: [OMPI devel] Compile OpenMPI with MXM support Hi, it seems that you extracted mxm.bin.r

Re: [OMPI devel] Compile OpenMPI with MXM support

2014-06-19 Thread Mike Dubman
Hi, it seems that you extracted mxm.bin.rpm into $HOME and are using it in configure. It may have issues during the "configure" phase, as libmxm.so may contain an rpath to /opt/mellanox/mxm; you probably need to set LD_LIBRARY_PATH to the real mxm location to make it work, and also adjust the .la files in mxm d
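The workaround Mike describes for an MXM RPM unpacked into $HOME (instead of installed under /opt/mellanox/mxm) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the libtool .la files still carry the RPM's /opt/mellanox/mxm prefix, the shared libraries sit in a lib/ subdirectory of the extracted tree, and a plain string substitution is good enough. The helper names are hypothetical. It does not touch the rpath baked into libmxm.so itself, which is why LD_LIBRARY_PATH still has to point at the extracted lib directory.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional

# Prefix the MXM RPM is assumed to have been built for (from the thread).
RPM_PREFIX = "/opt/mellanox/mxm"


def relocate_la_files(mxm_dir: str, old_prefix: str = RPM_PREFIX) -> List[Path]:
    """Rewrite libtool .la files under mxm_dir so paths beginning with
    old_prefix point at the extracted location instead.

    Returns the list of files that were modified.
    """
    root = Path(mxm_dir).resolve()
    changed = []
    for la_file in sorted(root.rglob("*.la")):
        text = la_file.read_text()
        # Simple textual substitution of the old install prefix.
        new_text = text.replace(old_prefix, str(root))
        if new_text != text:
            la_file.write_text(new_text)
            changed.append(la_file)
    return changed


def env_with_mxm_libpath(mxm_dir: str, env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of env with <mxm_dir>/lib prepended to LD_LIBRARY_PATH
    (assumes the shared libraries live in a lib/ subdirectory)."""
    result = dict(os.environ if env is None else env)
    lib_dir = str(Path(mxm_dir).resolve() / "lib")
    current = result.get("LD_LIBRARY_PATH")
    result["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    return result
```

The environment returned by `env_with_mxm_libpath` could then be passed to whatever runs configure, so the rpath mismatch does not break the link tests.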

[OMPI devel] Compile OpenMPI with MXM support

2014-06-19 Thread Kiryanov, Denis
Hi, I'm trying to compile openmpi-1.8.1 with mxm library support but got the following error: --- MCA component mtl:mxm (m4 configuration macro) checking for MCA component mtl:mxm compile mode... dso checking --with-mxm value... sanity check ok (/home/users/tvoronov/kda/mxm-3.0.2822-1.x86_6