Re: [OMPI devel] Issue/PR tagging

2017-07-28 Thread Artem Polyakov
Brian, have you had a chance to put this on the wiki? If so, can you send the link? I can't find it. 2017-07-19 16:47 GMT-07:00 Barrett, Brian via devel < devel@lists.open-mpi.org>: > I’ll update the wiki (and figure out where on our wiki to put more general > information), but the basics are:

Re: [OMPI devel] Yoda SPML and master/v3.0.0

2017-07-13 Thread Artem Polyakov
t couple of days from Mellanox? > > Thanks, > > Brian > ___ > devel mailing list > devel@lists.open-mpi.org > https://rfd.newmexicoconsortium.org/mailman/listinfo/devel -- - Best regards, Artem Polyakov (Mobile mail) __

Re: [OMPI devel] v3.0.x / v3.x branch mixup

2017-07-07 Thread Artem Polyakov
> > > -- - Best regards, Artem Polyakov (Mobile mail)

Re: [OMPI devel] [3.0.0rc1] PMIX ERROR: UNPACK-INADEQUATE-SPACE

2017-07-04 Thread Artem Polyakov
ent Tel: +1-510-495-2352 > Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 > -- - Best r

Re: [OMPI devel] Mellanox Jenkins

2017-06-21 Thread Artem Polyakov
Brian, I'm going to push the fix tonight. If it doesn't work, we will do as you advised. 2017-06-21 17:23 GMT-07:00 Barrett, Brian via devel < devel@lists.open-mpi.org>: > In the mean time, is it possible to disable the jobs that listen for pull > requests on Open MPI’s repos? I’m trying to get

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread Artem Polyakov
With regard to time zones: we have developers in nearby time zones, so I don't think this is a reasonable argument. 2016-12-01 16:49 GMT-08:00 Artem Polyakov : > +1 to Paul. > > I have had to git-bisect OMPI only a few times, but it was always a > non-trivial task. PRs

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread Artem Polyakov
+1 to Paul. I have had to git-bisect OMPI only a few times, but it was always a non-trivial task. PRs group commits logically and are good for bookkeeping. Also, you never know what a "trivial fix" will turn into, and under what circumstances/configurations. IMO all changes need to go thro

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread Artem Polyakov
en-mpi/ompi/pull/2488 >> >> So please don’t jump to conclusions >> >> On Dec 1, 2016, at 3:49 PM, Artem Polyakov wrote: >> >> But I guess we can verify that things are not broken using other >> PRs. >> Looks like all is good: https://gith

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread Artem Polyakov
But I guess we can verify that things are not broken using other PRs. Looks like all is good: https://github.com/open-mpi/ompi/pull/2493 2016-12-01 15:38 GMT-08:00 Artem Polyakov : > All systems are different, and it is hard to compete in coverage with our > set of Jenkins instances :).

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread Artem Polyakov
All systems are different, and it is hard to compete in coverage with our set of Jenkins instances :). 2016-12-01 14:51 GMT-08:00 r...@open-mpi.org : > FWIW: I verified it myself, and it was fine on my systems > > On Dec 1, 2016, at 2:46 PM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com> wrote: > >

Re: [OMPI devel] heads up about OMPI/master

2016-12-01 Thread Artem Polyakov
Howard, can you link to the commits you are referring to? Do you mean this one, for example: https://github.com/open-mpi/ompi/commit/15098161a331168c66b29a696522fe52c8b2d8f5 ? 2016-12-01 15:28 GMT-08:00 Howard Pritchard : > Hi Gilles > > I didn't see a merge commit for all these commits, > hence my conc

Re: [OMPI devel] Performance analysis proposal

2016-08-26 Thread Artem Polyakov
Sufficient. Probably I missed it. No need to do anything. 2016-08-26 21:31 GMT+07:00 Jeff Squyres (jsquyres) : > Just curious: is https://github.com/open-mpi/2016-summer-perf-testing not > sufficient? > > > > > On Aug 26, 2016, at 10:28 AM, Artem Polyakov wrote: > &

Re: [OMPI devel] Performance analysis proposal

2016-08-26 Thread Artem Polyakov
> Let me know if you want one. > > > > On Aug 26, 2016, at 8:46 AM, Artem Polyakov wrote: > > > > I've marked the first week. > > > > 2016-08-26 19:26 GMT+07:00 George Bosilca : > > Let's go regular for a period and then adapt. > > > > Fo

Re: [OMPI devel] Performance analysis proposal

2016-08-26 Thread Artem Polyakov
ts such as single threaded bandwidth. It might be worth having a regular >>> phone call (in addition to the Tuesday morning) to make progress. >>> >>> George. >>> >>> >>> On Thu, Aug 25, 2016 at 9:37 PM, Artem Polyakov >>> wrote: >

Re: [OMPI devel] Performance analysis proposal

2016-08-26 Thread Artem Polyakov
pers meeting few > weeks ago, but we barely define what we think will be necessary for trivial > tests such as single threaded bandwidth. It might be worth having a regular > phone call (in addition to the Tuesday morning) to make progress. > > George. > > > On Thu, Aug 25,

Re: [OMPI devel] Performance analysis proposal

2016-08-25 Thread Artem Polyakov
ca wrote: > Arm repo is a good location until we converge to a well-defined set of > tests. > > George. > > > On Thu, Aug 25, 2016 at 1:44 PM, Artem Polyakov > wrote: > >> That's a good question. I have results myself and I don't know where to >

Re: [OMPI devel] Performance analysis proposal

2016-08-25 Thread Artem Polyakov
l do the 2.0.1rc in the next days as well. > > Is it possible to add me to the results repository at github or should I > fork and request you to pull? > > Best > Christoph > > > - Original Message - > From: "Artem Polyakov" > > To: "Open M

Re: [OMPI devel] Performance analysis proposal

2016-08-23 Thread Artem Polyakov
up probably next week. I have to access > UTK machine for that. > * I did some test and yes, I have seen some openib hang in > multithreaded case. > Thank you, > Arm > > From: devel < devel-boun...@lists.open-mpi.org > on behalf of Artem > Polyakov < art

Re: [OMPI devel] Performance analysis proposal

2016-07-28 Thread Artem Polyakov
P.S. For future reference, we also need to keep the launch scripts that were used, so that the runs can be carefully reproduced. Jeff mentioned that on the wiki page, IIRC. 2016-07-29 12:42 GMT+07:00 Artem Polyakov : > Thank you, Arm! > > Good to have vader results (I haven't tried it my

Re: [OMPI devel] Performance analysis proposal

2016-07-28 Thread Artem Polyakov
tyle > commenting/referencing. > > > Arm > > > > > On 7/28/16, 3:02 PM, "devel on behalf of Jeff Squyres (jsquyres)" < > devel-boun...@lists.open-mpi.org on behalf of jsquy...@cisco.com> wrote: > > >On Jul 28, 2016, at 6:28 AM, Artem Polyakov wr

Re: [OMPI devel] Performance analysis proposal

2016-07-28 Thread Artem Polyakov
Jeff and others, 1. The benchmark was updated to support the shared-memory case. 2. The wiki was updated with the benchmark description: https://github.com/open-mpi/ompi/wiki/Request-refactoring-test#benchmark-prototype Let me know if we want to move this prototype to some general place. I think it ma

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
some such as I > recall). I’m checking the builds now - suspect it has to do with the new > PMIx_Get retrieval rules > >> > >> > >>> On Jul 21, 2016, at 8:25 AM, Artem Polyakov > wrote: > >>> > >>> correction: 3 out of 5 chec

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
Correction: 3 out of 5 checks passed. 2016-07-21 21:24 GMT+06:00 Artem Polyakov : > Yes, I thought so as well. I see that only 2 checks passed when your PR > was merged, so it might be. > > 2016-07-21 21:23 GMT+06:00 Ralph Castain : > >> I’m checking this - could be som

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
Yes, I thought so as well. I see that only 2 checks passed when your PR was merged, so it might be. 2016-07-21 21:23 GMT+06:00 Ralph Castain : > I’m checking this - could be something to do with the recent PMIx update > > On Jul 21, 2016, at 8:21 AM, Artem Polyakov wrote: > >

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
I see the same error with `sm,self` and `vader,self` in the PR https://github.com/open-mpi/ompi/pull/1883. `openib` and `tcp` work fine. Seems like a regression. 2016-07-21 20:11 GMT+06:00 Jeff Squyres (jsquyres) : > On Jul 21, 2016, at 3:53 AM, Gilles Gouaillardet > wrote: > > > > Folks, > > >

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
ausing this error, then please disregard it until I > update it tomorrow. > > note this log suggests a workspace shared by all pr, so I guess this is > obsolete now > > Cheers, > > Gilles > > > > On Thursday, July 21, 2016, Artem Polyakov wrote: > >> We s

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
error, then please disregard it until I > update it tomorrow. > > note this log suggests a workspace shared by all pr, so I guess this is > obsolete now > > Cheers, > > Gilles > > > > On Thursday, July 21, 2016, Artem Polyakov wrote: > >> We see th

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
We see the following error: *14:26:55* + taskset -c 2,3 timeout -s SIGSEGV 15m /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/bin/mpirun -np 8 -bind-to none -mca pml ob1 -mca btl self,tcp taskset -c 2,3 /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
> different build dir, different install dir. > > > > > > On Jul 21, 2016, at 3:56 AM, Artem Polyakov > wrote: > > > > Thank you for the input by the way. It sounds very useful! > > > > 2016-07-21 13:54 GMT+06:00 Artem Polyakov >: > > Gilles,

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
Thank you for the input by the way. It sounds very useful! 2016-07-21 13:54 GMT+06:00 Artem Polyakov : > Gilles, we are aware and working on this. > > 2016-07-21 13:53 GMT+06:00 Gilles Gouaillardet : > >> Folks, >> >> >> Mellanox Jenkins marks recent PR

Re: [OMPI devel] about Mellanox Jenkins

2016-07-21 Thread Artem Polyakov
Gilles, we are aware and working on this. 2016-07-21 13:53 GMT+06:00 Gilles Gouaillardet : > Folks, > > > Mellanox Jenkins marks recent PR's as failed for very surprising reasons. > > > mpirun --mca btl sm,self ... > > > failed because processes could not contact each other. i was able to > repro

Re: [OMPI devel] OSHMEM out-of-date?

2016-07-19 Thread Artem Polyakov
We have the fix. Will submit a PR shortly. On Monday, July 18, 2016, Ralph Castain wrote: > Sorry - this is on today’s master > > On Jul 17, 2016, at 8:31 PM, Artem Polyakov > wrote: > > What is it? What repository? > > On Monday, July 18, 2016,

Re: [OMPI devel] OSHMEM out-of-date?

2016-07-18 Thread Artem Polyakov
OK, thank you. We will take a look. On Monday, July 18, 2016, Ralph Castain wrote: > Sorry - this is on today’s master > > On Jul 17, 2016, at 8:31 PM, Artem Polyakov > wrote: > > What is it? What repository? > > On Monday, July 18, 2016,

Re: [OMPI devel] OSHMEM out-of-date?

2016-07-17 Thread Artem Polyakov
^ > pshmem_put_f.c:36:5: note: in expansion of macro ‘MCA_SPML_CALL’ > MCA_SPML_CALL(put(FPTR_2_VOID_PTR(target), > > > > -- - Best regards, Artem Polyakov (Mobile mail)

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Artem Polyakov
2015-11-09 22:42 GMT+06:00 Artem Polyakov : > This is a very good point, Nysal! > > This is definitely a problem, and I can say even more: on average, 3 out of every 10 > tasks were affected by this bug. Once the PR ( > https://github.com/pmix/master/pull/8) was applied, I was able to ru

Re: [OMPI devel] PMIX deadlock

2015-11-09 Thread Artem Polyakov
then >>>> send to it, but the OS hasn’t yet set it up. In those cases, you can hang >>>> the socket. However, I’ve tried adding some artificial delay, and while it >>>> helped, it didn’t completely solve the problem. >>>> >>>> I have an idea

Re: [OMPI devel] PMIX deadlock

2015-11-07 Thread Artem Polyakov
Hello, is there any progress on this topic? This affects our PMIx measurements. 2015-10-30 21:21 GMT+06:00 Ralph Castain : > I’ve verified that the orte/util/listener thread is not being started, so > I don’t think it should be involved in this problem. > > HTH > Ralph > > On Oct 30, 2015, at 8:0

Re: [OMPI devel] Info about ORTE structure

2015-03-26 Thread Artem Polyakov
P.S. Also check ESS (orte/mca/ess) for the environment setup. 2015-03-26 18:06 GMT+06:00 Artem Polyakov : > > 2015-03-26 17:58 GMT+06:00 Gianmario Pozzi : > >> Hi everyone, >> I'm an italian M.Sc. student in Computer Engineering at Politecnico di >> Milano. &

Re: [OMPI devel] Info about ORTE structure

2015-03-26 Thread Artem Polyakov
2015-03-26 17:58 GMT+06:00 Gianmario Pozzi : > Hi everyone, > I'm an italian M.Sc. student in Computer Engineering at Politecnico di > Milano. > > My team and I are trying to integrate OpenMPI with a real time resource > manager written by a group of students named BBQ ( > http://bosp.dei.polimi.i

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-04 Thread Artem Polyakov
2014-12-04 17:29 GMT+06:00 Jeff Squyres (jsquyres) : > On Dec 3, 2014, at 11:35 PM, Artem Polyakov wrote: > > > Jeff, I must admit that I don't completely understand how your fix works. > Can you explain to me why this variant was failing: > > > >

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Artem Polyakov
: > Thanks! > > On Dec 3, 2014, at 7:03 AM, Artem Polyakov wrote: > > > > > > > On Wednesday, December 3, 2014, Jeff Squyres (jsquyres) wrote: > > They were equivalent until yesterday. :-) > > I see. Got that! > > > > I was going to file

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Artem Polyakov
Sure, will do that asap. > > > On Dec 3, 2014, at 5:56 AM, Artem Polyakov > wrote: > > > I finally found the clear reason for this strange situation! > > > > In ompi, opal_setup_libltdl.m4 has the following content: > > CPPFLAGS="-I$srcdir -I$srcd

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-03 Thread Artem Polyakov
the unified solution. 2014-12-03 10:23 GMT+06:00 Ralph Castain : > It is working for me, but I’m not sure if that is because of these changes > or if it always worked for me. I haven’t tested the slurm integration in > awhile. > > > On Dec 2, 2014, at 7:59 PM, Artem Polyakov wro

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
Howard, does the current master fix your problems? On Wednesday, December 3, 2014, Artem Polyakov wrote: > > 2014-12-03 8:30 GMT+06:00 Jeff Squyres (jsquyres) >: > >> On Dec 2, 2014, at 8:43 PM, Artem Polyakov > > wrote: >> >> > Jeff, your fix br

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
2014-12-03 8:30 GMT+06:00 Jeff Squyres (jsquyres) : > On Dec 2, 2014, at 8:43 PM, Artem Polyakov wrote: > > > Jeff, your fix breaks my system again. Actually, you just reverted my > changes. > > No, I didn't just revert them -- I made changes. I did forget about the

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
let me add the config.log file, since it is > too large, I can forward the output to you directly as well (as I did to > Jeff). > >> > >> I honestly have not looked into the configure logic, I can just tell > that OPAL_HAVE_LTDL_ADVISE is not set on my linux system for m

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
;> *If* you add that workaround (which is a whole separate discussion), I >> would suggest adding a configure.m4 test to see if adding the additional >> -llibs are necessary. Perhaps AC_LINK_IFELSE looking for a symbol, and >> then if that fails, AC_LINK_IFELSE again with the additio

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
> Thanks > > > On Dec 2, 2014, at 3:17 AM, Artem Polyakov wrote: > > > > 2014-12-02 17:13 GMT+06:00 Ralph Castain : > >> Hmmm…if that is true, then it didn’t fix this problem as it is being >> reported in the master. >> > > I had this problem on my la

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
was also included in the 1.8 branch. I am not sure this is the same issue, but they look similar. > > > On Dec 1, 2014, at 9:40 PM, Artem Polyakov wrote: > > I think this might be related to the configuration problem I was fixing > with Jeff a few months ago. See here: > https:/

Re: [OMPI devel] RTLD_GLOBAL question

2014-12-02 Thread Artem Polyakov
I think this might be related to the configuration problem I was fixing with Jeff a few months ago. See here: https://github.com/open-mpi/ompi/pull/240 2014-12-02 10:15 GMT+06:00 Ralph Castain : > If it isn’t too much trouble, it would be good to confirm that it remains > broken. I strongly suspe

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
ave the same problem. > but mine is with bcol, not coll framework. And as you can see, the modules themselves don't break the program; only some of their combinations do. Also, I am curious why the basesmuma module is listed twice. > Best regards, > Elena > > On Fri, Oct 17, 2014 at 7:01 PM, Artem

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
th a tentative > fix. > > Could you please give it a try and reports if it solves your problem ? > > Cheers > > Gilles > > > Artem Polyakov wrote: > Hello, I have trouble with the latest trunk when I use PMI1. > > For example, if I use 2 nodes, the application hangs. See b

[OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Artem Polyakov
Hello, I have trouble with the latest trunk when I use PMI1. For example, if I use 2 nodes, the application hangs. See the backtraces from both nodes below. From them I can see that the second (non-launching) node hangs in bcol component selection. Here is the default setting of the bcol_base_string parameter: bcol

Re: [OMPI devel] OPAL timing framework

2014-10-08 Thread Artem Polyakov
to use it. > > - There's "TODO" comments in opal/util/timings.c; should those be fixed? > > - opal_config.h should be the first include in opal/util/timings.c. > > - If timing support is not to be compiled in, then opal/util/timings.c > should not be be compiled via

Re: [OMPI devel] OPAL timing framework

2014-09-18 Thread Artem Polyakov
think it isn't that hard for them to > configure it. > > > On Sep 18, 2014, at 7:16 AM, Artem Polyakov > wrote: > > Jeff, thank you for the feedback! All of the mentioned issues are clear and I > will fix them shortly. > > One important thing that needs additional

Re: [OMPI devel] OPAL timing framework

2014-09-18 Thread Artem Polyakov
? > > - opal_config.h should be the first include in opal/util/timings.c. > > - If timing support is not to be compiled in, then opal/util/timings.c > should not be be compiled via the Makefile.am (rather than entirely #if'ed > out). > > It looks like this work is about 9

[OMPI devel] OPAL timing framework

2014-09-16 Thread Artem Polyakov
Hello, I would like to introduce the OMPI timing framework that was added to the trunk yesterday (r32738). The code is new, so if you hit any bugs, just let me know. The framework consists of a set of macros and routines for internal OMPI use, plus the standalone tool mpisync and a few additional
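
The snippet does not describe how mpisync aligns clocks across nodes. For illustration only, a common way to estimate the offset between two clocks is a Cristian-style ping-pong, assuming the one-way delay is half the round trip. This is a generic technique, not necessarily what mpisync implements, and the type and function names below are hypothetical.

```c
/* Hypothetical sketch of Cristian-style clock-offset estimation.
 * Not mpisync's actual algorithm; all times are in microseconds. */

typedef struct {
    double t_send;   /* local clock when the request was sent */
    double t_remote; /* remote clock when it handled the request */
    double t_recv;   /* local clock when the reply arrived */
} pingpong_sample_t;

typedef struct {
    double offset_us; /* estimated (remote - local) clock offset */
    double rtt_us;    /* round-trip time of the sample used */
} clock_estimate_t;

/* Assume the remote reading was taken halfway through the round trip. */
static clock_estimate_t estimate_offset(const pingpong_sample_t *s)
{
    clock_estimate_t e;
    e.rtt_us = s->t_recv - s->t_send;
    e.offset_us = s->t_remote - (s->t_send + e.rtt_us / 2.0);
    return e;
}

/* Keep the sample with the smallest round trip: it leaves the least room
 * for asymmetric network delay to bias the estimate. Requires n >= 1. */
static clock_estimate_t best_estimate(const pingpong_sample_t *samples, int n)
{
    clock_estimate_t best = estimate_offset(&samples[0]);
    for (int i = 1; i < n; i++) {
        clock_estimate_t e = estimate_offset(&samples[i]);
        if (e.rtt_us < best.rtt_us)
            best = e;
    }
    return best;
}
```

Repeating the exchange several times and keeping the lowest-RTT sample, as best_estimate() does, is the usual refinement of the single-exchange estimate.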

Re: [OMPI devel] Agenda for next week

2014-06-23 Thread Artem Polyakov
ime. > > > On Jun 19, 2014, at 9:26 PM, Artem Polyakov > wrote: > > > Hello, > > > > I would like to participate in PMI and modex discussions remotely. > > > > > > 2014-06-19 22:44 GMT+07:00 Jeff Squyres (jsquyres) >: > > We ha

Re: [OMPI devel] Agenda for next week

2014-06-19 Thread Artem Polyakov
Hello, I would like to participate in PMI and modex discussions remotely. 2014-06-19 22:44 GMT+07:00 Jeff Squyres (jsquyres) : > We have a bunch of topics listed on the wiki, but no real set agenda: > > https://svn.open-mpi.org/trac/ompi/wiki/Jun14Meeting > > We had remote-attendance reques

[OMPI devel] OMPI timing fix

2014-06-04 Thread Artem Polyakov
Here is a quick fix for the OMPI timing facility. Currently the first measurement is bogus because OMPI_PROC_MY_NAME is not initialized at the time of the first ompistart setup: *time from start to completion of rte_init 1348381643658244 usec* time from completion of rte_init to modex 17585 usec time to execute
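
According to the message, the first value is bogus because state needed for the start point was not yet initialized when it was set up. A minimal, hypothetical sketch of a guard against that class of bug: the timer carries a validity flag, and an interval whose start was never recorded is reported as unmeasured rather than as a huge number. The names are illustrative and do not correspond to the OMPI timing code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical phase timer: 'valid' is false until a start time is recorded. */
typedef struct {
    uint64_t start_us;
    bool valid;
} phase_timer_t;

/* Current wall-clock time in microseconds. */
static uint64_t now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000u + (uint64_t)tv.tv_usec;
}

static void timer_start_at(phase_timer_t *t, uint64_t start_us)
{
    t->start_us = start_us;
    t->valid = true;
}

/* Elapsed time in usec, or -1 if the start was never recorded (or lies
 * after 'end_us'). Without the flag, an unset start of 0 would turn
 * (end - start) into the raw timestamp 'end'. */
static int64_t timer_elapsed_us(const phase_timer_t *t, uint64_t end_us)
{
    if (!t->valid || end_us < t->start_us)
        return -1;
    return (int64_t)(end_us - t->start_us);
}

static void report_phase(const char *label, const phase_timer_t *t, uint64_t end_us)
{
    int64_t dt = timer_elapsed_us(t, end_us);
    if (dt < 0)
        printf("%s: not measured (start point was not set)\n", label);
    else
        printf("%s %lld usec\n", label, (long long)dt);
}
```

Declaring the timer as `phase_timer_t t = { 0, false };` and calling timer_start_at() only once the start point is really available keeps the first report from printing a meaningless value.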

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Artem Polyakov
I did check this for SLURM 2.6.5 2014-06-01 20:31 GMT+07:00 Ralph Castain : > That really wasn't necessary - I had tested it under PMI-1 and it was > fine. Artem: did you test it, or just assume it wasn't right? > > > On May 31, 2014, at 11:47 PM, Artem Polyakov w

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
2014-06-01 14:24 GMT+07:00 Gilles Gouaillardet < gilles.gouaillar...@gmail.com>: > export OMPI_MCA_btl_openib_use_eager_rdma=0 Gilles, I tested your approach. Both: a) export OMPI_MCA_btl_openib_use_eager_rdma=0 b) applying your patch and running without "export OMPI_MCA_btl_openib_use_eager_rdma=0"

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
he > openib BTL) > > > On Jun 1, 2014, at 2:57 AM, Artem Polyakov wrote: > > > Hello, while testing the new PMI implementation I ran into a problem with > OpenIB and/or usNIC support. > > The cluster I use is built on Mellanox QDR. We don't use Cisco hardware, >

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
> Gilles > > > > On Sun, Jun 1, 2014 at 3:57 PM, Artem Polyakov > wrote: > >> >> 2. With OpenIB support explicitly selected (adding export OMPI_MCA_btl="openib,self" to the >> attached batch script), I get the following error: >> hellompi: >> /home/research/artpol
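
Open MPI reads MCA parameters from environment variables of the form OMPI_MCA_<param>, which is how the batch-script settings in these messages select BTLs (OMPI_MCA_btl) or disable eager RDMA (OMPI_MCA_btl_openib_use_eager_rdma). When debugging such runs it can help to check, from inside a test program, what the job actually saw. The helper below is a hypothetical sketch, not Open MPI's own parser; in particular it ignores the "^" exclusion prefix that Open MPI component lists accept.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Open MPI's parser): return 1 if 'name' is one of
 * the comma-separated entries in environment variable 'var', else 0.
 * Open MPI's "^" exclusion prefix is NOT handled here. */
static int env_list_contains(const char *var, const char *name)
{
    const char *val = getenv(var);
    if (val == NULL)
        return 0;                         /* variable not set at all */
    size_t n = strlen(name);
    for (const char *p = val; ; ) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len == n && strncmp(p, name, n) == 0)
            return 1;                     /* exact match of one entry */
        if (comma == NULL)
            return 0;                     /* last entry checked */
        p = comma + 1;
    }
}
```

With OMPI_MCA_btl="openib,self" exported, env_list_contains("OMPI_MCA_btl", "openib") returns 1, while a prefix such as "open" does not match because whole entries are compared.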

Re: [OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
P.S. 1. Just to make sure, I tried the same program with the old ompi-1.6.5 installed on our cluster, and it ran without any problem. 2. My test program just sends data around a ring. 2014-06-01 13:57 GMT+07:00 Artem Polyakov : > Hello, while testing the new PMI implementation I ran into a problem w

[OMPI devel] OpenIB/usNIC errors

2014-06-01 Thread Artem Polyakov
Hello, while testing the new PMI implementation I ran into a problem with OpenIB and/or usNIC support. The cluster I use is built on Mellanox QDR. We don't use Cisco hardware, and thus have no Cisco Virtual Interface Card. To rule out any influence of the new PMI code, I used mpirun to launch the job. Slurm job

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Artem Polyakov
Thank you, Mike! 2014-06-01 13:43 GMT+07:00 Mike Dubman : > applied here: https://svn.open-mpi.org/trac/ompi/changeset/31909 > > > On Sun, Jun 1, 2014 at 9:15 AM, Artem Polyakov wrote: > >> Hi, all. >> >> Ralph committed the code that was developed for this

Re: [OMPI devel] RFC: refactor PMI support

2014-06-01 Thread Artem Polyakov
Hi, all. Ralph committed the code that was developed for this RFC (r31908). This commit breaks PMI1 support. If you are in a hurry, apply the attached patch. Ralph will apply it once he is online; I don't have commit rights yet. 2014-05-19 21:18 GMT+07:00 Ralph Castain : > WHAT:Refactor the PM

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
2014-05-08 9:54 GMT+07:00 Ralph Castain : > > On May 7, 2014, at 6:15 PM, Christopher Samuel > wrote: > > > -BEGIN PGP SIGNED MESSAGE- > > Hash: SHA1 > > > > Hi all, > > > > Apologies for having dropped out of the thread, night intervened here. > ;-) > > > > On 08/05/14 00:45, Ralph Casta

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
That is interesting. I think I will reproduce your experiments on my system when I test the PMI selection logic; judging by your resource counts, I can do that. I will post my results to the list. 2014-05-08 8:51 GMT+07:00 Christopher Samuel : > -BEGIN PGP SIGNED MESSAGE-

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
Hi Chris. The current design is to provide a runtime parameter for PMI version selection. It would be even more flexible than configuration-time selection and (as I currently understand it) not very hard to achieve. 2014-05-08 8:15 GMT+07:00 Christopher Samuel : > -BEGIN PGP SIGNED MESSAGE--

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
cted at runtime > > * moving some additional functions into that code area and out of the > individual components > OK, that is pretty clear now. I will do exactly #2. Thank you. > > > On May 7, 2014, at 5:08 PM, Artem Polyakov wrote: > > I like #2 too. > B

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
ming the codes are mostly common > in the individual frameworks. > > > On May 7, 2014, at 4:51 PM, Artem Polyakov wrote: > > I just reread your suggestions from our off-list discussion and found that > I had misunderstood them. So no parallel PMI! Move all possible code into > opal/mca/com

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
to implement. 2. or to have 2 separate common modules, one for PMI1 and one for PMI2; and does this fit the opal/mca/common/ philosophy at all? 2014-05-08 6:44 GMT+07:00 Artem Polyakov : > > 2014-05-08 5:54 GMT+07:00 Ralph Castain : > > Ummm…no, I don't think that's right. I

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
are legal. If not, we'll do that sequentially. > In other places we'll just use a flag saying which PMI version to use. > Does that sound reasonable? > > 2014-05-07 23:10 GMT+07:00 Artem Polyakov : > >> That's a good point. There are actually a bunch of module

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
ng which PMI version to use. Does that sound reasonable? 2014-05-07 23:10 GMT+07:00 Artem Polyakov : > That's a good point. There are actually a bunch of modules in ompi, opal > and orte that would have to be duplicated. > > On Wednesday, May 7, 2014, Joshua Ladd wrote: > >&

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
. There are several places in > OMPI where the distinction between PMI1 and PMI2 is made, not only in > grpcomm. DB and ESS frameworks off the top of my head. > > Josh > > > On Wed, May 7, 2014 at 11:48 AM, Artem Polyakov wrote: > >> Good idea :)! >> >> On Wed

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
Good idea :)! On Wednesday, May 7, 2014, Ralph Castain wrote: > Jeff actually had a useful suggestion (gasp!). He proposed that we separate > the PMI-1 and PMI-2 codes into separate components so you could select them > at runtime. Thus, we would build both (assuming both PMI-1 and 2 libs
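
The proposal is to build PMI-1 and PMI-2 support as separate components and choose between them at runtime. A hypothetical sketch of that selection pattern in plain C (the real OPAL/ORTE MCA component structures are different): both components are compiled into a table, and a runtime string picks one, defaulting to PMI-1 as the RFC subject line proposes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical component table; the real MCA structures differ. */
typedef struct {
    const char *name;
    int version;
    int (*init)(void);
} pmi_component_t;

static int pmi1_init(void) { return 1; }   /* stand-in for PMI-1 setup */
static int pmi2_init(void) { return 2; }   /* stand-in for PMI-2 setup */

/* Both components are always built; which one runs is decided at runtime. */
static const pmi_component_t pmi_components[] = {
    { "pmi1", 1, pmi1_init },
    { "pmi2", 2, pmi2_init },
};

/* Return the component named by 'requested'. An empty or missing request
 * falls back to PMI-1; an unknown name yields NULL so the caller can
 * report an error instead of silently picking something. */
static const pmi_component_t *pmi_select(const char *requested)
{
    if (requested == NULL || requested[0] == '\0')
        requested = "pmi1";
    for (size_t i = 0; i < sizeof pmi_components / sizeof pmi_components[0]; i++) {
        if (strcmp(pmi_components[i].name, requested) == 0)
            return &pmi_components[i];
    }
    return NULL;
}
```

In a real build the request string would come from a runtime parameter rather than a function argument; keeping the table static means adding a third variant is one more row.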

Re: [OMPI devel] SLURM affinity accounting in Open MPI

2014-03-07 Thread Artem Polyakov
de1, node2 with 12 cpus 2) node3 with 7 cpus. Then it uses a separate srun for each group. The weakness of this patch is that we need to deal with several srun instances, and I am not sure that cleanup will work correctly. I plan to test this case additionally. 2014-02-12 17:42 GMT+07:00 Artem
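
The patch groups nodes that have the same CPU count and launches one srun per group. A hypothetical sketch of just the grouping step; how the actual patch builds the srun command lines is not visible in the snippet, so that part is left out.

```c
/* Hypothetical node record; real counts would come from the allocation. */
typedef struct {
    const char *name;
    int cpus;
} node_t;

/* Put nodes with identical CPU counts into the same group, in order of first
 * appearance. On return group_of[i] is the group of nodes[i] and
 * group_cpus[g] is the CPU count of group g; both arrays need room for n
 * entries. Returns the number of groups (one srun per group). */
static int group_by_cpus(const node_t *nodes, int n, int *group_of, int *group_cpus)
{
    int ngroups = 0;
    for (int i = 0; i < n; i++) {
        int g = 0;
        while (g < ngroups && group_cpus[g] != nodes[i].cpus)
            g++;                                   /* look for an existing group */
        if (g == ngroups)
            group_cpus[ngroups++] = nodes[i].cpus; /* start a new group */
        group_of[i] = g;
    }
    return ngroups;
}
```

For an allocation of two 12-CPU nodes and one 7-CPU node, this yields two groups, matching the two launches described above.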

[OMPI devel] SLURM affinity accounting in Open MPI

2014-02-12 Thread Artem Polyakov
Hello, I found that SLURM installations that use the cgroup plugin and have TaskAffinity=yes in cgroup.conf have problems with OpenMPI: all processes on a non-launch node are assigned to one core. This leads to quite poor performance. The problem can be seen only when using mpirun to start parallel applica
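
One way to confirm this symptom from inside the job is to have every rank report which CPUs it is allowed to run on. A Linux-only sketch, assuming /proc/self/status is available: it reads the Cpus_allowed_list field and counts the CPUs in it. Under the problem described, every rank on the non-launch node would be expected to report the same single core. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count the CPUs in a Linux CPU-list string such as "0-3,8-11" or "2".
 * Returns -1 on malformed input. */
static int cpulist_count(const char *s)
{
    int count = 0;
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s)
            return -1;
        long hi = lo;
        if (*end == '-') {                  /* range "lo-hi" */
            const char *p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
        }
        count += (int)(hi - lo + 1);
        s = end;
        if (*s == ',')
            s++;
    }
    return count;
}

/* Copy this process's Cpus_allowed_list from /proc/self/status into buf.
 * Linux only; returns 0 on success, -1 if the field cannot be read. */
static int read_allowed_cpus(char *buf, size_t len)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    char line[512];
    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Cpus_allowed_list:", 18) == 0) {
            const char *v = line + 18;
            while (*v == ' ' || *v == '\t')
                v++;
            snprintf(buf, len, "%s", v);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Printing the list and its count next to the hostname from each rank makes it easy to see whether ranks on one node share a single core or are spread out.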