Re: [OMPI devel] binding output error

2015-04-23 Thread Elena Elkina
Thanks guys, you're right. Here is the lstopo output on our system, which confirms that logical CPU numbering is used in --report-bindings: lstopo -l Machine (256GB) NUMANode L#0 (P#0 128GB) + Socket L#0 + L3 L#0 (35MB) L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0) L2 L#1 (256
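To make the logical (L#) versus physical (P#) numbering in the lstopo output easier to check, here is a minimal Python sketch that extracts the PU mapping from lstopo-style text. The function name logical_to_physical and the sample text are illustrative assumptions, not part of Open MPI or hwloc; the last sample line is a made-up case in which the two indices differ.

```python
import re

# Matches the "PU L#<logical> (P#<physical>)" notation printed by lstopo.
PU_PATTERN = re.compile(r"PU L#(\d+) \(P#(\d+)\)")


def logical_to_physical(lstopo_text):
    """Return {logical PU index: physical (OS) PU index} found in the text."""
    return {int(l): int(p) for l, p in PU_PATTERN.findall(lstopo_text)}


# Lines in the format quoted in this thread. The third line is hypothetical
# and only shows a case where logical and physical indices differ.
sample = """
L2 L#22 (256KB) + L1 L#22 (32KB) + Core L#22 + PU L#22 (P#22)
L2 L#23 (256KB) + L1 L#23 (32KB) + Core L#23 + PU L#23 (P#23)
L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14 + PU L#14 (P#28)
"""

mapping = logical_to_physical(sample)
for logical, physical in sorted(mapping.items()):
    print(f"logical PU {logical} -> physical PU {physical}")
```

If a binding report uses logical indices, looking a reported index up in such a mapping shows which physical PU it refers to.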

Re: [OMPI devel] binding output error

2015-04-21 Thread Elena Elkina
22 (32KB) + Core L#22 + PU L#22 (P#22) > >> > >> L2 L#23 (256KB) + L1 L#23 (32KB) + Core L#23 + PU L#23 (P#23) > >> > >> L2 L#24 (256KB) + L1 L#24 (32KB) + Core L#24 + PU L#24 (P#24) > >> > >> L2 L#25 (256KB) + L1 L#25 (32KB) + Core L#25 +

[OMPI devel] binding output error

2015-04-20 Thread Elena Elkina
Hi guys, I ran into an issue on our cluster related to mapping & binding policies on 1.8.5. The problem is that the --report-bindings output doesn't correspond to the locale. It looks like the mistake is in the output itself, because it just prints the serial core number while that core can be on anot

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch revert-520-valgrind_cleanness created. dev-1504-g7a8a4a0

2015-04-15 Thread Elena Elkina
15 at 6:36 PM, Ralph Castain wrote: > S….are you going to restore the rest of it? Or are we asking Nathan to > refile it without that one piece? > > > On Apr 15, 2015, at 7:26 AM, Elena Elkina wrote: > > Hi Ralph. > > We don't need to revert the whole c

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch revert-520-valgrind_cleanness created. dev-1504-g7a8a4a0

2015-04-15 Thread Elena Elkina
Hi Ralph. We don't need to revert the whole commit, just to fix this small part. I proposed a quick fix for that in the PR, but we probably need to fix it more intelligently. Best regards, Elena On Wed, Apr 15, 2015 at 6:08 PM, Ralph Castain wrote: > I’m really puzzled - I saw where you fixed t

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-612-g05af80b

2014-12-24 Thread Elena Elkina
Hi Ralph, As I remember, the idea of this code was to create a reply once (and set the flag stored to true) but send this reply multiple times (to each process in the list of requests). The flag stored is set to false earlier in the code. It means that once (for the first request in the loop pmix_server_

Re: [OMPI devel] simple_spawn test fails using different set of btls.

2014-11-06 Thread Elena Elkina
pectation (and request when we committed it) that people would >> review and modify them as appropriate. I recall setting the openib scope as >> “remote” only because I wasn’t aware of anyone using it for local comm. >> Since Mellanox obviously is testing for that case, a scope of

[OMPI devel] simple_spawn test fails using different set of btls.

2014-11-05 Thread Elena Elkina
Hi, It looks like there is a problem in trunk which reproduces with the simple_spawn test (orte/test/mpi/simple_spawn.c). It seems to be an issue with pmix. It doesn't reproduce with the default set of btls, but it does reproduce with several btls specified. For example, salloc -N5 $OMPI_HOME/install/bin/mp

Re: [OMPI devel] OMPI BCOL hang with PMI1

2014-10-17 Thread Elena Elkina
Hi Artem, Actually some time ago there was a known issue with coll ml. I used to run my command lines with -mca coll ^ml to avoid these problems, so I don't know if it was fixed or not. It looks like you have the same problem. Best regards, Elena On Fri, Oct 17, 2014 at 7:01 PM, Artem Polyakov

Re: [OMPI devel] regression with derived datatypes

2014-05-30 Thread Elena Elkina
Hi, It looks like this fix resolved our problems as well. Thanks, Elena On Fri, May 30, 2014 at 4:58 PM, Rolf vandeVaart wrote: > This fixed all of my issues. Thanks. I will add that comment to ticket > also. > > >-Original Message- > >From: devel [mailto:devel-boun...@open-mpi.org]

Re: [OMPI devel] regression with derived datatypes

2014-05-08 Thread Elena Elkina
Hi, My reproducer failed even with one port enabled (-mca btl_openib_if_include mlx4_0:1 ). I tried with trunk as well - the same issue. Best, Elena On Thu, May 8, 2014 at 11:49 AM, Gilles Gouaillardet < gilles.gouaillar...@iferc.org> wrote: > Nathan and George, > > here are the output files o

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Elena Elkina
Yes, this commit is also in the trunk. Best, Elena On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres) wrote: > Is this also happening on the trunk? > > > Sent from my phone. No type good. > > On May 7, 2014, at 9:44 AM, "Elena Elkina" > wrote: > >

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Elena Elkina
regards, Elena On Wed, May 7, 2014 at 5:43 PM, Jeff Squyres (jsquyres) wrote: > Can you cite the branch and SVN r number? > > Sent from my phone. No type good. > > > On May 7, 2014, at 9:24 AM, "Elena Elkina" > wrote: > > > >

[OMPI devel] regression with derived datatypes

2014-05-07 Thread Elena Elkina
Hi, I've found that commit b531973419a056696e6f88d813769aa4f1f1aee6 doesn't work Author: Jeff Squyres Date: Tue Apr 22 19:48:56 2014 + caused new failures with derived datatypes. Collectives return incorrect results. But it doesn't reproduce on a regular

[OMPI devel] -mca coll "ml" cause segv or hangs with different command lines.

2014-03-04 Thread Elena Elkina
Hi, Recently I have often been hitting hangs and seg faults with different command lines, and there are "ml" functions in the stack trace. When I just turn "ml" off with -mca coll ^ml, the problems disappear. For example, oshrun -np 4 --map-by node --display-map ./ring_oshmem fails with a seg fault while oshrun -np