Brian,
Have you had a chance to put this on the wiki? If so, can you send the
link? I can't find it.
2017-07-19 16:47 GMT-07:00 Barrett, Brian via devel <
devel@lists.open-mpi.org>:
> I’ll update the wiki (and figure out where on our wiki to put more general
> information), but the basics are:
t couple of days from Mellanox?
>
> Thanks,
>
> Brian
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
--
- Best regards, Artem Polyakov (Mobile mail)
Brian, I'm going to push for the fix tonight. If it won't work - we will do
as you advised.
2017-06-21 17:23 GMT-07:00 Barrett, Brian via devel <
devel@lists.open-mpi.org>:
> In the mean time, is it possible to disable the jobs that listen for pull
> requests on Open MPI’s repos? I’m trying to get
With regard to the timezone - we have developers in nearby timezones, so I
don't think this is a reasonable argument.
2016-12-01 16:49 GMT-08:00 Artem Polyakov :
> +1 to Paul.
>
> I have had to git-bisect OMPI only a few times, but it was always a
> non-trivial task. PR's
+1 to Paul.
I have had to git-bisect OMPI only a few times, but it was always a
non-trivial task. PRs group commits logically and are good for
bookkeeping.
Also, you never know what a "trivial fix" will turn into, and in what
circumstances/configurations.
IMO all changes need to go thro
en-mpi/ompi/pull/2488
>>
>> So please don’t jump to conclusions
>>
>> On Dec 1, 2016, at 3:49 PM, Artem Polyakov wrote:
>>
>> But I guess that we can verify that things are not broken using other
>> PR's.
>> Looks like all is good: https://gith
But I guess that we can verify that things are not broken using other PR's.
Looks like all is good: https://github.com/open-mpi/ompi/pull/2493
2016-12-01 15:38 GMT-08:00 Artem Polyakov :
> All systems are different and it is hard to compete in coverage with our
> set of Jenkins'
All systems are different, and it is hard to compete in coverage with our
set of Jenkins servers :).
2016-12-01 14:51 GMT-08:00 r...@open-mpi.org :
> FWIW: I verified it myself, and it was fine on my systems
>
> On Dec 1, 2016, at 2:46 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>
Howard,
can you link to the commits you are referring to?
Do you mean this one for example:
https://github.com/open-mpi/ompi/commit/15098161a331168c66b29a696522fe52c8b2d8f5
?
2016-12-01 15:28 GMT-08:00 Howard Pritchard :
> Hi Gilles
>
> I didn't see a merge commit for all these commits,
> hence my conc
It is sufficient. I probably missed it. No need to do anything.
2016-08-26 21:31 GMT+07:00 Jeff Squyres (jsquyres) :
> Just curious: is https://github.com/open-mpi/2016-summer-perf-testing not
> sufficient?
>
>
>
> > On Aug 26, 2016, at 10:28 AM, Artem Polyakov wrote:
> &
> Let me know if you want one.
>
>
> > On Aug 26, 2016, at 8:46 AM, Artem Polyakov wrote:
> >
> > I've marked the first week.
> >
> > 2016-08-26 19:26 GMT+07:00 George Bosilca :
> > Let's go regular for a period and then adapt.
> >
> > Fo
ts such as single threaded bandwidth. It might be worth having a regular
>>> phone call (in addition to the Tuesday morning) to make progress.
>>>
>>> George.
>>>
>>>
>>> On Thu, Aug 25, 2016 at 9:37 PM, Artem Polyakov
>>> wrote:
>
pers meeting a few
> weeks ago, but we barely defined what we think will be necessary for trivial
> tests such as single threaded bandwidth. It might be worth having a regular
> phone call (in addition to the Tuesday morning) to make progress.
>
> George.
>
>
> On Thu, Aug 25,
ca wrote:
> Arm repo is a good location until we converge to a well-defined set of
> tests.
>
> George.
>
>
> On Thu, Aug 25, 2016 at 1:44 PM, Artem Polyakov > wrote:
>
>> That's a good question. I have results myself and I don't know where to
>
l do the 2.0.1rc in the next days as well.
>
> Is it possible to add me to the results repository at github or should I
> fork and request you to pull?
>
> Best
> Christoph
>
>
> - Original Message -
> From: "Artem Polyakov" >
> To: "Open M
up probably next week. I have to access
> the UTK machine for that.
> * I did some tests and yes, I have seen some openib hangs in the
> multithreaded case.
> Thank you,
> Arm
>
> From: devel < devel-boun...@lists.open-mpi.org > on behalf of Artem
> Polyakov < art
P.S. For future reference, we also need to keep the launch scripts that
were used, to be able to carefully reproduce the results. Jeff mentioned
that on the wiki page IIRC.
2016-07-29 12:42 GMT+07:00 Artem Polyakov :
> Thank you, Arm!
>
> Good to have vader results (I haven't tried it my
tyle
> commenting/referencing.
>
>
> Arm
>
>
>
>
> On 7/28/16, 3:02 PM, "devel on behalf of Jeff Squyres (jsquyres)" <
> devel-boun...@lists.open-mpi.org on behalf of jsquy...@cisco.com> wrote:
>
> >On Jul 28, 2016, at 6:28 AM, Artem Polyakov wr
Jeff and others,
1. The benchmark was updated to support the shared-memory case.
2. The wiki was updated with the benchmark description:
https://github.com/open-mpi/ompi/wiki/Request-refactoring-test#benchmark-prototype
Let me know if we want to put this prototype in some general place. I think
it ma
some such as I
> recall). I’m checking the builds now - suspect it has to do with the new
> PMIx_Get retrieval rules
> >>
> >>
> >>> On Jul 21, 2016, at 8:25 AM, Artem Polyakov
> wrote:
> >>>
> >>> correction: 3 out of 5 chec
correction: 3 out of 5 checks passed.
2016-07-21 21:24 GMT+06:00 Artem Polyakov :
> Yes, I thought so as well. I see that only 2 checks had passed when your
> PR was merged, so it might be.
>
> 2016-07-21 21:23 GMT+06:00 Ralph Castain :
>
>> I’m checking this - could be som
Yes, I thought so as well. I see that only 2 checks had passed when your PR
was merged, so it might be.
2016-07-21 21:23 GMT+06:00 Ralph Castain :
> I’m checking this - could be something to do with the recent PMIx update
>
> On Jul 21, 2016, at 8:21 AM, Artem Polyakov wrote:
>
>
I see the same error with `sm,self` and `vader,self` in the PR
https://github.com/open-mpi/ompi/pull/1883.
`openib` and `tcp` work fine. Seems like a regression.
2016-07-21 20:11 GMT+06:00 Jeff Squyres (jsquyres) :
> On Jul 21, 2016, at 3:53 AM, Gilles Gouaillardet
> wrote:
> >
> > Folks,
> >
>
error, then please disregard it until I
> update it tomorrow.
>
> note this log suggests a workspace shared by all pr, so I guess this is
> obsolete now
>
> Cheers,
>
> Gilles
>
>
>
> On Thursday, July 21, 2016, Artem Polyakov wrote:
>
>> We see th
We see the following error:
*14:26:55* + taskset -c 2,3 timeout -s SIGSEGV 15m
/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/bin/mpirun
-np 8 -bind-to none -mca pml ob1 -mca btl self,tcp taskset -c 2,3
/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/examples/hello
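For context, the Jenkins wrapper above pins everything to cores 2-3 and sends SIGSEGV after a 15-minute timeout, so a hang produces a core dump with backtraces instead of a silent job kill. A simplified local equivalent (the binary path and core list here are assumptions, not taken from this thread) might look like:

```shell
# Sketch of the Jenkins wrapper (paths are hypothetical):
# - the outer taskset pins mpirun itself to cores 2,3
# - timeout -s SIGSEGV aborts after 15 minutes so a hang leaves a core dump
# - the inner taskset pins each launched rank to the same cores
taskset -c 2,3 timeout -s SIGSEGV 15m \
    mpirun -np 8 --bind-to none --mca pml ob1 --mca btl self,tcp \
    taskset -c 2,3 ./examples/hello_c
```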
> different build dir, different install dir.
>
>
>
>
> > On Jul 21, 2016, at 3:56 AM, Artem Polyakov > wrote:
> >
> > Thank you for the input by the way. It sounds very useful!
> >
> > 2016-07-21 13:54 GMT+06:00 Artem Polyakov >:
> > Gilles,
Thank you for the input by the way. It sounds very useful!
2016-07-21 13:54 GMT+06:00 Artem Polyakov :
> Gilles, we are aware and working on this.
>
> 2016-07-21 13:53 GMT+06:00 Gilles Gouaillardet :
>
>> Folks,
>>
>>
>> Mellanox Jenkins marks recent PR
Gilles, we are aware and working on this.
2016-07-21 13:53 GMT+06:00 Gilles Gouaillardet :
> Folks,
>
>
> Mellanox Jenkins marks recent PR's as failed for very surprising reasons.
>
>
> mpirun --mca btl sm,self ...
>
>
> failed because processes could not contact each other. I was able to
> repro
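The failures above were specific to the shared-memory transports. When triaging such a report, one hedged approach is to sweep the BTL matrix one pair at a time; the sketch below only prints the commands to try (a dry run), since the example binary and the exact BTL set are assumptions:

```shell
#!/bin/sh
# Dry run: print one mpirun invocation per BTL pair, to bisect which
# transport triggers the connectivity failure (binary path is hypothetical).
for btl in "self,sm" "self,vader" "self,tcp" "self,openib"; do
    echo "mpirun -np 2 --mca pml ob1 --mca btl ${btl} ./examples/hello_c"
done
```

Running each printed command in turn narrows the failure to a single transport, which is what points at `sm`/`vader` in the report above.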
We have the fix. Will PR shortly.
On Monday, July 18, 2016, Ralph Castain wrote:
> Sorry - this is on today’s master
>
> On Jul 17, 2016, at 8:31 PM, Artem Polyakov > wrote:
>
> What is it? What repository?
>
> On Monday, July 18, 2016, Ralph Castain wrote:
Ok, thank you. We will take a look
On Monday, July 18, 2016, Ralph Castain wrote:
> Sorry - this is on today’s master
>
> On Jul 17, 2016, at 8:31 PM, Artem Polyakov > wrote:
>
> What is it? What repository?
>
> On Monday, July 18, 2016, Ralph Castain wrote:
^
> pshmem_put_f.c:36:5: note: in expansion of macro ‘MCA_SPML_CALL’
> MCA_SPML_CALL(put(FPTR_2_VOID_PTR(target),
>
>
>
>
2015-11-09 22:42 GMT+06:00 Artem Polyakov :
> That is a very good point, Nysal!
>
> This is definitely a problem, and I can say even more: on average 3 out
> of every 10 tasks were affected by this bug. Once the PR (
> https://github.com/pmix/master/pull/8) was applied I was able to ru
then
>>>> send to it, but the OS hasn’t yet set it up. In those cases, you can hang
>>>> the socket. However, I’ve tried adding some artificial delay, and while it
>>>> helped, it didn’t completely solve the problem.
>>>>
>>>> I have an idea
Hello, is there any progress on this topic? This affects our PMIx
measurements.
2015-10-30 21:21 GMT+06:00 Ralph Castain :
> I’ve verified that the orte/util/listener thread is not being started, so
> I don’t think it should be involved in this problem.
>
> HTH
> Ralph
>
> On Oct 30, 2015, at 8:0
P.S. also check ESS (orte/mca/ess) for environment setup.
2015-03-26 18:06 GMT+06:00 Artem Polyakov :
>
> 2015-03-26 17:58 GMT+06:00 Gianmario Pozzi :
>
>> Hi everyone,
>> I'm an Italian M.Sc. student in Computer Engineering at Politecnico di
>> Milano.
2015-03-26 17:58 GMT+06:00 Gianmario Pozzi :
> Hi everyone,
> I'm an Italian M.Sc. student in Computer Engineering at Politecnico di
> Milano.
>
> My team and I are trying to integrate Open MPI with a real-time resource
> manager written by a group of students named BBQ (
> http://bosp.dei.polimi.i
2014-12-04 17:29 GMT+06:00 Jeff Squyres (jsquyres) :
> On Dec 3, 2014, at 11:35 PM, Artem Polyakov wrote:
>
> > Jeff, I must admit that I don't completely understand how your fix works.
> Can you explain why this variant was failing:
> >
> >
:
> Thanks!
>
> On Dec 3, 2014, at 7:03 AM, Artem Polyakov wrote:
>
> >
> >
> > On Wednesday, December 3, 2014, Jeff Squyres (jsquyres) wrote:
> > They were equivalent until yesterday. :-)
> > I see. Got that!
> >
> > I was going to file
Sure, will do that asap.
>
>
> On Dec 3, 2014, at 5:56 AM, Artem Polyakov > wrote:
>
> > I finally found the clear reason for this strange situation!
> >
> > In OMPI, opal_setup_libltdl.m4 has the following content:
> > CPPFLAGS="-I$srcdir -I$srcd
the unified solution.
2014-12-03 10:23 GMT+06:00 Ralph Castain :
> It is working for me, but I’m not sure if that is because of these changes
> or if it always worked for me. I haven’t tested the slurm integration in
> awhile.
>
>
> On Dec 2, 2014, at 7:59 PM, Artem Polyakov wro
Howard, does the current master fix your problems?
On Wednesday, December 3, 2014, Artem Polyakov wrote:
>
> 2014-12-03 8:30 GMT+06:00 Jeff Squyres (jsquyres) >:
>
>> On Dec 2, 2014, at 8:43 PM, Artem Polyakov > > wrote:
>>
>> > Jeff, your fix br
2014-12-03 8:30 GMT+06:00 Jeff Squyres (jsquyres) :
> On Dec 2, 2014, at 8:43 PM, Artem Polyakov wrote:
>
> > Jeff, your fix breaks my system again. Actually you just reverted my
> changes.
>
> No, I didn't just revert them -- I made changes. I did forget about the
let me add the config.log file, since it is
> too large, I can forward the output to you directly as well (as I did to
> Jeff).
> >>
> >> I honestly have not looked into the configure logic, I can just tell
> that OPAL_HAVE_LTDL_ADVISE is not set on my linux system for m
;> *If* you add that workaround (which is a whole separate discussion), I
>> would suggest adding a configure.m4 test to see if adding the additional
>> -llibs are necessary. Perhaps AC_LINK_IFELSE looking for a symbol, and
>> then if that fails, AC_LINK_IFELSE again with the additio
> Thanks
>
>
> On Dec 2, 2014, at 3:17 AM, Artem Polyakov wrote:
>
>
>
> 2014-12-02 17:13 GMT+06:00 Ralph Castain :
>
>> Hmmm…if that is true, then it didn’t fix this problem as it is being
>> reported in the master.
>>
>
> I had this problem on my la
was also
included into the 1.8 branch. I am not sure that this is the same issue,
but they look similar.
>
>
> On Dec 1, 2014, at 9:40 PM, Artem Polyakov wrote:
>
> I think this might be related to the configuration problem I was fixing
> with Jeff a few months ago. Refer here:
> https:/
I think this might be related to the configuration problem I was fixing
with Jeff a few months ago. Refer here:
https://github.com/open-mpi/ompi/pull/240
2014-12-02 10:15 GMT+06:00 Ralph Castain :
> If it isn’t too much trouble, it would be good to confirm that it remains
> broken. I strongly suspe
ave the same problem.
>
but mine is with bcol, not the coll framework. And as you can see, the
modules themselves don't break the program - only some of their
combinations. Also I am curious why the basesmuma module is listed twice.
> Best regards,
> Elena
>
> On Fri, Oct 17, 2014 at 7:01 PM, Artem
th a tentative
> fix.
>
> Could you please give it a try and report whether it solves your problem?
>
> Cheers
>
> Gilles
>
>
> Artem Polyakov wrote:
> Hello, I am having trouble with the latest trunk if I use PMI1.
>
> For example, if I use 2 nodes the application hangs. See b
Hello, I am having trouble with the latest trunk if I use PMI1.
For example, if I use 2 nodes, the application hangs. See backtraces from
both nodes below. From them I can see that the second (non-launching) node
hangs in bcol component selection. Here is the default setting of the
bcol_base_string parameter:
bcol
to use it.
>
> - There's "TODO" comments in opal/util/timings.c; should those be fixed?
>
> - opal_config.h should be the first include in opal/util/timings.c.
>
> - If timing support is not to be compiled in, then opal/util/timings.c
> should not be be compiled via
think it isn't that hard for them to
> configure it.
>
>
> On Sep 18, 2014, at 7:16 AM, Artem Polyakov > wrote:
>
> Jeff, thank you for the feedback! All of the mentioned issues are clear,
> and I will fix them shortly.
>
> One important thing that needs additional
?
>
> - opal_config.h should be the first include in opal/util/timings.c.
>
> - If timing support is not to be compiled in, then opal/util/timings.c
> should not be be compiled via the Makefile.am (rather than entirely #if'ed
> out).
>
> It looks like this work is about 9
Hello,
I would like to introduce the OMPI timing framework that was included into
the trunk yesterday (r32738). The code is new, so if you hit any bugs - just
let me know.
The framework consists of a set of macros and routines for internal OMPI
usage, plus a standalone tool, mpisync, and a few additional
ime.
>
>
> On Jun 19, 2014, at 9:26 PM, Artem Polyakov > wrote:
>
> > Hello,
> >
> > I would like to participate in PMI and modex discussions remotely.
> >
> >
> > 2014-06-19 22:44 GMT+07:00 Jeff Squyres (jsquyres) >:
> > We ha
Hello,
I would like to participate in PMI and modex discussions remotely.
2014-06-19 22:44 GMT+07:00 Jeff Squyres (jsquyres) :
> We have a bunch of topics listed on the wiki, but no real set agenda:
>
> https://svn.open-mpi.org/trac/ompi/wiki/Jun14Meeting
>
> We had remote-attendance reques
Here is a quick fix for the OMPI timing facility. Currently the first
measurement is bogus because OMPI_PROC_MY_NAME is not initialized at the
time of the first ompistart setup:
*time from start to completion of rte_init 1348381643658244 usec*
time from completion of rte_init to modex 17585 usec
time to execute
I checked this with SLURM 2.6.5.
2014-06-01 20:31 GMT+07:00 Ralph Castain :
> That really wasn't necessary - I had tested it under PMI-1 and it was
> fine. Artem: did you test it, or just assume it wasn't right?
>
>
> On May 31, 2014, at 11:47 PM, Artem Polyakov w
2014-06-01 14:24 GMT+07:00 Gilles Gouaillardet <
gilles.gouaillar...@gmail.com>:
> export OMPI_MCA_btl_openib_use_eager_rdma=0
Gilles,
I tested your approach. Both:
a) export OMPI_MCA_btl_openib_use_eager_rdma=0
b) applying your patch and running without "export
OMPI_MCA_btl_openib_use_eager_rdma=0"
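For the record, variants (a) and (b) above exercise the same MCA parameter through two interchangeable mechanisms: any `OMPI_MCA_<param>` environment variable is equivalent to passing `--mca <param>` on the mpirun command line. A minimal sketch (the ring binary name is an assumption):

```shell
# (a) environment form: OMPI_MCA_<param> variables become MCA settings
export OMPI_MCA_btl_openib_use_eager_rdma=0
mpirun -np 2 ./ring

# (b) command-line form, equivalent to (a)
mpirun -np 2 --mca btl_openib_use_eager_rdma 0 ./ring
```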
he
> openib BTL)
>
>
> On Jun 1, 2014, at 2:57 AM, Artem Polyakov wrote:
>
> > Hello, while testing the new PMI implementation I faced a problem with
> OpenIB and/or usNIC support.
> > The cluster I use is built on Mellanox QDR. We don't use Cisco hardware,
>
> Gilles
>
>
>
> On Sun, Jun 1, 2014 at 3:57 PM, Artem Polyakov > wrote:
>
>>
>> 2. With fixed OpenIB support (adding export OMPI_MCA_btl="openib,self" to
>> the attached batch script) I get the following error:
>> hellompi:
>> /home/research/artpol
P.S.
1. Just to make sure, I tried the same program with the old ompi-1.6.5 that
is installed on our cluster, without any problem.
2. My testing program just sends data through the ring.
2014-06-01 13:57 GMT+07:00 Artem Polyakov :
> Hello, while testing new PMI implementation I faced a problem w
Hello, while testing the new PMI implementation I faced a problem with
OpenIB and/or usNIC support.
The cluster I use is built on Mellanox QDR. We don't use Cisco hardware,
thus no Cisco Virtual Interface Card. To exclude the possibility that the
new PMI code had an influence, I used mpirun to launch the job. Slurm job
Thank you, Mike!
2014-06-01 13:43 GMT+07:00 Mike Dubman :
> applied here: https://svn.open-mpi.org/trac/ompi/changeset/31909
>
>
> On Sun, Jun 1, 2014 at 9:15 AM, Artem Polyakov wrote:
>
>> Hi, all.
>>
>> Ralph commited the code that was developed for this
Hi, all.
Ralph committed the code that was developed for this RFC (r31908). This
commit will break PMI1 support. If you are in a hurry - apply the attached
patch. Ralph will apply it once he is online. I have no commit rights yet.
2014-05-19 21:18 GMT+07:00 Ralph Castain :
> WHAT: Refactor the PM
2014-05-08 9:54 GMT+07:00 Ralph Castain :
>
> On May 7, 2014, at 6:15 PM, Christopher Samuel
> wrote:
>
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > Hi all,
> >
> > Apologies for having dropped out of the thread, night intervened here.
> ;-)
> >
> > On 08/05/14 00:45, Ralph Casta
That is interesting. I think I will reconstruct your experiments on my
system when I test the PMI selection logic. According to your resource
count numbers, I can do that. I will publish my results on the list.
2014-05-08 8:51 GMT+07:00 Christopher Samuel :
> -BEGIN PGP SIGNED MESSAGE-
Hi Chris.
The current design is to provide a runtime parameter for PMI version
selection. It would be even more flexible than configure-time selection
and (to my current understanding) not very hard to achieve.
2014-05-08 8:15 GMT+07:00 Christopher Samuel :
> -BEGIN PGP SIGNED MESSAGE--
cted at runtime
>
> * moving some additional functions into that code area and out of the
> individual components
>
Ok, that is pretty clear now. And I will do exactly #2.
Thank you.
>
>
> On May 7, 2014, at 5:08 PM, Artem Polyakov wrote:
>
> I like #2 too.
> B
ming the codes are mostly common
> in the individual frameworks.
>
>
> On May 7, 2014, at 4:51 PM, Artem Polyakov wrote:
>
> Just reread your suggestions in our out-of-list discussion and found that
> I misunderstood them. So no parallel PMI! Take all possible code into
> opal/mca/com
to implement.
2. or to have 2 separate common modules, one for PMI1 and one for PMI2, and
does this fit the opal/mca/common/ ideology at all?
2014-05-08 6:44 GMT+07:00 Artem Polyakov :
>
> 2014-05-08 5:54 GMT+07:00 Ralph Castain :
>
> Ummmno, I don't think that's right. I
are legal. If not - we'll do that sequentially.
> In other places we'll just use a flag saying which PMI version to use.
> Does that sound reasonable?
>
> 2014-05-07 23:10 GMT+07:00 Artem Polyakov :
>
>> That's a good point. There is actually a bunch of module
ng what PMI version to use.
Does that sound reasonable?
2014-05-07 23:10 GMT+07:00 Artem Polyakov :
> That's a good point. There is actually a bunch of modules in ompi, opal
> and orte that have to be duplicated.
>
> On Wednesday, May 7, 2014, Joshua Ladd wrote:
>
>&
. There are several places in
> OMPI where the distinction between PMI1 and PMI2 is made, not only in
> grpcomm. DB and ESS frameworks off the top of my head.
>
> Josh
>
>
> On Wed, May 7, 2014 at 11:48 AM, Artem Polyakov wrote:
>
>> Good idea :)!
>>
>> On Wednesday,
Good idea :)!
On Wednesday, May 7, 2014, Ralph Castain wrote:
> Jeff actually had a useful suggestion (gasp!). He proposed that we separate
> the PMI-1 and PMI-2 codes into separate components so you could select them
> at runtime. Thus, we would build both (assuming both PMI-1 and 2 libs
de1, node2 with 12 cpus
2) node3 with 7 cpus.
Then it uses a separate srun for each group.
The weakness of this patch is that we need to deal with several srun's, and
I am not sure that cleanup will be performed correctly. I plan to test this
case additionally.
2014-02-12 17:42 GMT+07:00 Artem
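The grouping described above can be sketched as bucketing nodes by cpu count and emitting one srun target list per bucket. The node names and the awk pipeline here are illustrative assumptions, not the actual patch:

```shell
# Group nodes by cpu count; one srun node list per distinct count.
# Input format: "<node> <cpus>" per line (made-up example data).
printf 'node1 12\nnode2 12\nnode3 7\n' |
awk '{ g[$2] = (g[$2] ? g[$2] "," : "") $1 }
     END { for (c in g) printf "srun --nodelist=%s  # cpus=%s\n", g[c], c }' |
sort
```

With the sample input this yields one srun line for node1,node2 (12 cpus) and one for node3 (7 cpus), which is why cleanup across several concurrent srun's becomes the concern mentioned above.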
Hello
I found that SLURM installations that use the cgroup plugin and
have TaskAffinity=yes in cgroup.conf have problems with Open MPI: all
processes on non-launch nodes are assigned to one core. This leads to quite
poor performance.
The problem can be seen only when using mpirun to start a parallel applica
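A quick way to confirm this kind of affinity problem (assuming a standard SLURM layout; the cgroup.conf path varies by site) is to check the task-plugin configuration and print the affinity mask every task actually received:

```shell
# 1) Is the cgroup task plugin constraining CPU affinity?
grep -Ei 'TaskAffinity|ConstrainCores' /etc/slurm/cgroup.conf

# 2) Print each task's affinity mask; if all tasks on a non-launch node
#    report the same single core, the problem described above is present.
srun -N 2 -n 4 sh -c 'echo "$(hostname): $(taskset -pc $$)"'
```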