Re: [OMPI devel] 1.5.x plans

2010-10-30 Thread Shamis, Pavel
IMHO "B" will require a lot of attention from all developers/vendors, and it may be quite a time-consuming task (btw, I think there are a couple of openib btl changes that aren't on the list). So it would probably be good to ask all btl (or other modules/features) maintainers directly. Personally I

Re: [OMPI devel] openib btl_openib_async_thread poll question

2010-12-21 Thread Shamis, Pavel
According to the man pages, only POLLIN or errors may be returned in this specific case: The bits returned in revents can include any of those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the reven
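As an illustration of the revents behaviour quoted above, here is a minimal sketch in C. It is not the openib async thread code; the helper names classify_revents and wait_for_input are hypothetical and exist only for this example. It assumes a caller that asks for POLLIN only and treats any of the three error bits as a failure.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Classify the revents value that poll() filled in for one descriptor.
 * Returns -1 if POLLERR, POLLHUP or POLLNVAL is set, 1 if POLLIN is set,
 * and 0 otherwise. Error bits take priority in this sketch. */
static int classify_revents(short revents)
{
    if (revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;
    if (revents & POLLIN)
        return 1;
    return 0;
}

/* Poll one descriptor for input (hypothetical helper).
 * Returns -2 if poll() itself fails, 0 on timeout, otherwise the
 * result of classify_revents(). */
static int wait_for_input(int fd, int timeout_ms)
{
    /* poll() silently ignores negative descriptors, so reject them here. */
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -2;
    if (rc == 0)
        return 0;
    return classify_revents(pfd.revents);
}
```

Only POLLIN is requested in events, yet the error bits can still show up in revents, which is why the check for them does not depend on what was requested.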

Re: [OMPI devel] async thread in openib BTL

2010-12-23 Thread Shamis, Pavel
The async thread is used to handle asynchronous error/notification events, like port up/down, HCA errors, etc. So most of the time the thread sleeps, and in a healthy network you are not supposed to see any events. Regards, Pasha On Dec 23, 2010, at 12:49 AM, Eugene Loh wrote: > I'm starting to loo

Re: [OMPI devel] IBV_EVENT_QP_ACCESS_ERR

2011-01-03 Thread Shamis, Pavel
It looks like we are touching some QP that was released. Before closing the QP we make sure to complete all outstanding messages on the endpoint. Once all QPs (and other resources) are closed, we signal the async thread to remove this HCA from the monitoring list. To me it looks like somehow we clos

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-12 Thread Shamis, Pavel
RDMACM or OOB cannot affect the performance of this benchmark, since they are not involved in communication. So I'm not sure that the performance changes that you see are related to connection manager changes. About oob - I'm not aware of any hang issues there; the code is very, very old, we did n

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-13 Thread Shamis, Pavel
's a > bug in the oob cpc; it's been around for a long, long time; it should be > pretty stable. > > Do we create QP's differently between oob and rdmacm, such that perhaps they > are "better" (maybe better routed, or using a different SL, or ...) when > cr

Re: [OMPI devel] OMPI 1.4.3 hangs in gather

2011-01-16 Thread Shamis, Pavel
doesn't hang (maybe timing is different?). > I'm still trying to understand what are the differences in those areas > between 1.4.3 and 1.5 > > > BTW, > Choosing RDMACM fixes hangs and performance issues in all collective > operations. > > Thanks, > Doron

Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
Unfortunately verbose error reports are not so friendly... anyway, I can think of 2 issues: 1. You are trying to open too many QPs. By default IB devices support a fairly large number of QPs and it is quite hard to push it to this corner. But if your job is really huge it may be the case. Or

Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
, and the nodes are fairly "fat" - it's quad socket, quad > core and they are running 16 MPI ranks for each node. > > Brian > > On Jan 27, 2011, at 6:17 PM, Shamis, Pavel wrote: > >> Unfortunately verbose error reports are not so friendly...anyway , I may

Re: [OMPI devel] OFED question

2011-01-27 Thread Shamis, Pavel
er of >> QPs available per node? The app likely does talk to a large number of peers >> from each process, and the nodes are fairly "fat" - it's quad socket, quad >> core and they are running 16 MPI ranks for each node. >> >> Brian >> >>

Re: [OMPI devel] OFED question

2011-01-28 Thread Shamis, Pavel
ing magical > incantation? > > On 1/27/2011 5:34 PM, Shamis, Pavel wrote: >> --mca btl_openib_receive_queues >> X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 > > -- > Paul H. Hargrove phhargr...@lbl.gov > Future

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Shamis, Pavel
Hello Rolf, CUDA support is always welcome. Please see my comments below +#if OMPI_CUDA_SUPPORT +fl->fl_frag_block_alignment = 0; +fl->fl_flags = 0; +#endif [pasha] It seems that the "fl_flags" is a hack that allows you to do the second (cuda) registration in mpool_rdma: +#if OMPI_CUD

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Shamis, Pavel
> >> By default, the code is disable and has to be configured into the library. >> --with-cuda(=DIR) Build cuda support, optionally adding DIR/include, >> DIR/lib, and DIR/lib64 >> --with-cuda-libdir=DIR Search for cuda libraries in DIR > > My

Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory directly

2011-04-14 Thread Shamis, Pavel
> >> Actually I'm not sure that it is good idea to enable CUDA by default, since >> it disables zero-copy protocol, that is critical for good performance. > > That can easily be a run-time check during startup. It could be fixed. My point was that in the existing code, it's compile time decisi

[OMPI devel] Open MPI + HWLOC + Static build issue

2011-07-25 Thread Shamis, Pavel
Hello, I have been trying to compile a static version of Open MPI (trunk) with hwloc; the latter is enabled by default in trunk. The build platform is an AMD machine that has only the dynamic version of libnuma. Problem: Open MPI fails to link orted, because it can't find a static version of libnuma. Workaround:

Re: [OMPI devel] Open MPI + HWLOC + Static build issue

2011-07-26 Thread Shamis, Pavel
, Brice Goglin wrote: > Hello Pavel, > Do you have libnuma headers and dynamic lib installed without static lib > installed ? Which distro is this? > Brice > > > > On 25/07/2011 23:56, Shamis, Pavel wrote: >> Hello, >> >> I have been trying to compile

Re: [OMPI devel] Open MPI + HWLOC + Static build issue

2011-08-03 Thread Shamis, Pavel
Please see my comments below. > -Original Message- > From: Brice Goglin [mailto:brice.gog...@inria.fr] > Sent: Wednesday, August 03, 2011 10:29 AM > To: Shamis, Pavel > Cc: Open MPI Developers > Subject: Re: [OMPI devel] Open MPI + HWLOC + Static build issue > > I

Re: [OMPI devel] Open MPI + HWLOC + Static build issue

2011-08-03 Thread Shamis, Pavel
> > Err.. I don't quite understand. How exactly are you configuring? If I do > this: > > ./configure --prefix=/home/jsquyres/bogus --disable-mpi-f77 --disable-vt -- > disable-io-romio --disable-mpi-cxx --disable-shared --enable-static --enable- > mpirun-prefix-by-default LDFLAGS=-static > > I

Re: [OMPI devel] Open MPI + HWLOC + Static build issue

2011-08-03 Thread Shamis, Pavel
> > I get static binaries on SLES11 with > ./configure --enable-static --disable-shared LDFLAGS=-static > and > make LDFLAGS=-all-static LIBS=-lpthread > > $ ldd utils/lstopo > not a dynamic executable > $ utils/lstopo > Machine (24GB) > [...] > > No problem with libnuma here, it was

Re: [OMPI devel] Open MPI + HWLOC + Static build issue

2011-08-03 Thread Shamis, Pavel
hat works, then it rules that we have libnuma support. > > Can you send more details on exactly what is failing, and how you make that > happen? > > > On Jul 25, 2011, at 5:56 PM, Shamis, Pavel wrote: > > > Hello, > > > > I have been trying to compile Open M

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25093

2011-08-30 Thread Shamis, Pavel
Hi all, I'm not sure if it is relevant to this specific commit, but it is relevant to some of the epoch changes. I was not able to compile the latest trunk version on our Cray system; the failure was in the ess/alps component, and to me it looks like a simple typo. I did not have a chance to check my fix on our

Re: [OMPI devel] Launcher in trunk is broken?

2011-10-10 Thread Shamis, Pavel
+ 1 , I see the same issue. > -Original Message- > From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] > On Behalf Of Yevgeny Kliteynik > Sent: Monday, October 10, 2011 10:24 AM > To: OpenMPI Devel > Subject: [OMPI devel] Launcher in trunk is broken? > > It looks like the

Re: [OMPI devel] [EXTERNAL] Re: Rename "vader" BTL to "xpmem"

2011-11-17 Thread Shamis, Pavel
+1, If Nathan doesn't mind changing the name, then it's ok. Pavel (Pasha) Shamis --- Application Performance Tools Group Computer Science and Math Division Oak Ridge National Laboratory On Nov 17, 2011, at 11:34 AM, Barrett, Brian W wrote: > On 11/17/11 6:29 AM, "Ralph Castain" wrote: >

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106

2012-03-09 Thread Shamis, Pavel
>> Depending on the timing, this might go to 1.6 (1.5.5 has waited for too >> long, and this is not a regression). Keep in mind that the problem has been >> around for *a long, long time*, which is why I approved the diag message >> (i.e., because a real solution is still nowhere in sight). Th

Re: [OMPI devel] RFC: ob1: fallback on put/send on rget failure

2012-03-15 Thread Shamis, Pavel
Nathan, I did not get any patch. Regards, Pavel (Pasha) Shamis --- Application Performance Tools Group Computer Science and Math Division Oak Ridge National Laboratory On Mar 15, 2012, at 5:07 PM, Nathan Hjelm wrote: > > > What: Update ob1 to do the following: >- fallback on sen

Re: [OMPI devel] RFC: ob1: fallback on put/send on rget failure

2012-03-19 Thread Shamis, Pavel
/03/12 08:14, Shamis, Pavel wrote: > >> I did not get any patch. > > It arrived OK here, you can get it from the archive: > > http://www.open-mpi.org/community/lists/devel/2012/03/10717.php > > - -- >Christopher Samuel - Senior Systems Administrator > VLSCI - V

Re: [OMPI devel] RFC: change default for tuned alltoallv to pairwise

2012-03-22 Thread Shamis, Pavel
> >> What: Change coll tuned default to pairwise exchange >> >> Why: The linear algorithm does not scale to any reasonable number of PEs >> >> When: Timeout in 2 days (Fri) >> >> Is there any reason the default should not be changed? > > Nathan, > > I can see why people think the linear algor

Re: [OMPI devel] barrier problem

2012-03-23 Thread Shamis, Pavel
Pavel, Mvapich implements multicore-optimized collectives, which perform substantially better than the default algorithms. FYI, the ORNL team is working on a new high-performance collectives framework for OMPI. The framework provides a significant boost in collectives performance. Regards, Pavel (Pasha) Shami

Re: [OMPI devel] Remove Portals support?

2012-03-27 Thread Shamis, Pavel
The ORNL-UT Kraken system probably still uses it. I would not be so eager to remove it. Pavel (Pasha) Shamis --- Application Performance Tools Group Computer Science and Math Division Oak Ridge National Laboratory On Mar 23, 2012, at 9:56 AM, Barrett, Brian W wrote: > Hi all - > > This is not

Re: [OMPI devel] Developers Meeting

2012-04-03 Thread Shamis, Pavel
I would like to propose Oak Ridge as a potential location for the meeting. Pavel (Pasha) Shamis --- Application Performance Tools Group Computer Science and Math Division Oak Ridge National Laboratory On Apr 3, 2012, at 11:44 AM, Barrett, Brian W wrote: > Hi all - > > There is discussion o

Re: [OMPI devel] mca_btl_tcp_alloc

2012-04-04 Thread Shamis, Pavel
> In mca_btl_tcp_alloc (openmpi-trunk/ompi/mca/btl/tcp/btl_tcp.c:188) the > first segment is initialized to point to "frag + 1". > I don't get it... how/when is this location allocated? Isn't it just > after the mca_btl_tcp_frag_t structure ends? Alex, The frag allocation macros take the fragmen
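The "frag + 1" idiom asked about above can be sketched in C as follows. This is not the real mca_btl_tcp_frag_t or the Open MPI free-list code: demo_frag_t and demo_frag_alloc are hypothetical names, and the sketch assumes that whoever allocates the fragment reserves extra bytes right after the header structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fragment header; NOT the real mca_btl_tcp_frag_t. */
typedef struct demo_frag {
    size_t payload_len;  /* number of bytes reserved after the header */
    void  *payload;      /* points at the first byte after the header */
} demo_frag_t;

/* Allocate header and payload as one contiguous block.
 * Returns NULL on overflow or allocation failure. */
static demo_frag_t *demo_frag_alloc(size_t payload_len)
{
    if (payload_len > SIZE_MAX - sizeof(demo_frag_t))
        return NULL;

    demo_frag_t *frag = malloc(sizeof(*frag) + payload_len);
    if (frag == NULL)
        return NULL;

    frag->payload_len = payload_len;
    /* Pointer arithmetic is in units of demo_frag_t, so frag + 1 is the
     * address immediately after the header structure. */
    frag->payload = frag + 1;
    assert((char *)frag->payload == (char *)frag + sizeof(*frag));
    return frag;
}
```

A single free(frag) releases both the header and the payload, since they come from one allocation.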

Re: [OMPI devel] How to debug segv

2012-04-25 Thread Shamis, Pavel
Alex, +1 vote for core. It is a good starting point. * If you can't (for some reason) generate the core file, you may drop a while (1) somewhere in the init code and attach gdb later. * If you are looking for a more user-friendly experience, you may try Allinea DDT (they have a 30-day trial version)
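A possible shape for the "spin in init and attach gdb later" trick is sketched below in C. It is a variation on the literal while (1): the loop waits on a flag so the process can be released from the debugger. The names debug_release and wait_for_debugger are hypothetical and are not part of Open MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Flip this from the debugger to let the process continue,
 * e.g. in gdb: set var debug_release = 1 */
static volatile int debug_release = 0;

/* Hypothetical helper: if enabled, print the pid and spin until
 * debug_release becomes non-zero. Returns 1 if it waited, 0 if not. */
static int wait_for_debugger(int enabled, unsigned int poll_seconds)
{
    assert(poll_seconds > 0);
    if (!enabled)
        return 0;

    fprintf(stderr, "pid %ld waiting for debugger\n", (long)getpid());
    while (!debug_release)
        sleep(poll_seconds);  /* sleep instead of burning a core */
    return 1;
}
```

One way to use it would be to call wait_for_debugger() early in the init path, with enabled driven by an environment variable of your choice, then attach with gdb -p using the printed pid and set the flag to resume.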

Re: [OMPI devel] Time to unify OpenFabrics configury?

2012-04-27 Thread Shamis, Pavel
It is a good idea to unify the OFED configure scripts. BUT, I would prefer to do this rework after merge with the new collectives component, since we are going to bring totally new IB components based on extended verbs interface and it obviously adds new configure logic. Pavel (Pasha) Shamis --

Re: [OMPI devel] Time to unify OpenFabrics configury?

2012-04-27 Thread Shamis, Pavel
> On Apr 27, 2012, at 10:31 AM, Shamis, Pavel wrote: > >> It is a good idea to unify the OFED configure scripts. BUT, I would prefer >> to do this rework after merge with the new collectives component, since we >> are going to bring totally new IB components

Re: [OMPI devel] Modex

2012-06-13 Thread Shamis, Pavel
> > We currently block on exchange of contact information for the BTL's when we > perform an all-to-all operation we term the "modex". Do we have to do all-to-all or allgather ? allgather should be enough ... > At the end of that operation, each process constructs a list of information > for a

Re: [OMPI devel] OpenIB compile error

2012-06-20 Thread Shamis, Pavel
I hate it ... As far as I understand, that is not a reason to rename it. The OFED-lovin components should look at $with_openib. Pavel (Pasha) Shamis --- Application Performance Tools Group Computer Science and Math Division Oak Ridge National Laboratory On Jun 20, 2012, at 4:09 PM, Jeff Squyre

Re: [OMPI devel] OpenIB compile error

2012-06-21 Thread Shamis, Pavel
> On Jun 20, 2012, at 4:25 PM, Shamis, Pavel wrote: > >> I hate it ... >> >> As far as I understand it is not reason to rename it. The OFED-lovin >> components should look at $with_openib. > > Ah, sorry -- I didn't think this would be controversial.

Re: [OMPI devel] OpenIB compile error

2012-06-21 Thread Shamis, Pavel
BTW, if people want to rename openib btl to something else and then change the configure scripts - I'm ok. About naming - I would agree with Terry, it makes sense to name it after network API used for this btl - "verbs" (it is not ibverbs). Bottom line, I would recommend to keep configure option

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26707 - in trunk/ompi: config mca/btl/ofud mca/btl/openib mca/common/ofacm mca/common/ofautils mca/dpm

2012-07-02 Thread Shamis, Pavel
> Keep in mind that this is currently not used for the openib BTL. It is only > used in the upcoming OpenFabrics-based collectives component. > > The iWARP-required connector-must-send-first logic is not yet included in > this code, as I understand it. That must be added before it can be used

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26707 - in trunk/ompi: config mca/btl/ofud mca/btl/openib mca/common/ofacm mca/common/ofautils mca/dpm

2012-07-02 Thread Shamis, Pavel
So is ofacm another replacement for ibcm and rdmacm? Essentially it is an extraction of the OpenIB BTL connection manager functionality (minus rdmacm) from the OpenIB BTL. The idea is to allow access to this functionality for other communication components, like collectives and btls. OFACM supports

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26707 - in trunk/ompi: config mca/btl/ofud mca/btl/openib mca/common/ofacm mca/common/ofautils mca/dpm

2012-07-02 Thread Shamis, Pavel
Yeah, it is going to 1.7. Do you want to move your UD connection manager code there? :) Pavel (Pasha) Shamis --- Application Performance Tools Group Computer Science and Math Division Oak Ridge National Laboratory On Jul 2, 2012, at 11:20 AM, Nathan Hjelm wrote: > Nice! Are we moving this to

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26707 - in trunk/ompi: config mca/btl/ofud mca/btl/openib mca/common/ofacm mca/common/ofautils mca/dpm

2012-07-02 Thread Shamis, Pavel
WARP vendor left around > (and iWARP *requires* RDMACM)... > > > On Jul 2, 2012, at 2:05 PM, Shamis, Pavel wrote: > >> >> So is ofacm another replacement for ibcm and rdmacm? >> >> Essentially it extraction of the OpenIB BTL connection manager functionality

Re: [OMPI devel] openib max_cqe

2012-07-05 Thread Shamis, Pavel
> So if I do a run of -np 2 across two separate node than the use of the > max_cqe of my ib device (4194303) is ok. Once I go beyond 1 process on the > node I start getting the memlocked limits message. So how much memory does a > cqe take? Is it 1k by any chance? I ask this because the mach

Re: [OMPI devel] openib max_cqe

2012-07-05 Thread Shamis, Pavel
>> I mentioned on the call that for Mellanox devices (+OFA verbs) this resource >> is really cheap. Do you run mellanox hca + OFA verbs ? > > (I'll reply because I know Terry is offline for the rest of the day) > > Yes, he does. I asked because SUN used to have own verbs driver. > > The heart

Re: [OMPI devel] r27078 and OMPI build

2012-08-21 Thread Shamis, Pavel
Evgeny, I don't have access to a Solaris system, but please let me know if there is a way to help you. Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Aug 21, 2012, at 11:36 AM, Eugene Loh wrote: r27078 (ML collective

Re: [OMPI devel] r27078 and OMPI build

2012-08-21 Thread Shamis, Pavel
On 8/21/2012 9:31 AM, Ralph Castain wrote: Looks to me like you just need to add a couple of includes and correct a typo - yes? Right. This part is under control. I hope r27100 resolves the issue #1. The library issue sounds like something is

Re: [OMPI devel] r27078 and OMPI build

2012-08-23 Thread Shamis, Pavel
Eugene, Did you have a chance to make progress on issue #2? I'm wondering how we want to proceed from here. Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Aug 21, 2012, at 2:19 PM, Eugene Loh wrote: On 8/21/2

Re: [OMPI devel] r27078 and OMPI build

2012-08-23 Thread Shamis, Pavel
n Aug 23, 2012, at 12:59 PM, Eugene Loh wrote: On 8/23/2012 8:58 AM, Shamis, Pavel wrote: Did you have chance to make progress on the issue #2 ? I'm wondering how we want to proceed from here. First of all, thanks for putting back the fixes for issue #1. That build is now successful. Iss

Re: [OMPI devel] r27078 and OMPI build

2012-08-24 Thread Shamis, Pavel
em to help. The build still fails on the same problem. On 8/23/2012 2:14 PM, Shamis, Pavel wrote: Evgeny, I'm wondering if the issue is some how related to the fact that these functions are inline. Can you please, try the attached patch and see what happens ? On Aug 23, 2012, at 12:59 PM,

Re: [OMPI devel] r27078 and OMPI build

2012-08-25 Thread Shamis, Pavel
trange about how it is setup - perhaps the version of Solaris, or it is configuring --enable-static, or... Just trying to assess how general a problem this might be, and thus if this should be a blocker or not. On Aug 24, 2012, at 8:00 AM, Eugene Loh <eugene@oracle.com> wrote:

Re: [OMPI devel] r27078 and OMPI build

2012-08-29 Thread Shamis, Pavel
The issue #2 was fixed in r27178. Paul - Thanks for help !!! Regards, Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Aug 21, 2012, at 11:36 AM, Eugene Loh wrote: r27078 (ML collective component) broke some Solari

Re: [OMPI devel] r27078 and OMPI build

2012-08-29 Thread Shamis, Pavel
Eugene, Can you please confirm that the issue is resolved on your setup ? Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Aug 29, 2012, at 10:14 AM, Shamis, Pavel wrote: The issue #2 was fixed in r27178. Paul

Re: [OMPI devel] Warnings in OMPI trunk and 1.7

2012-09-12 Thread Shamis, Pavel
Ralph, Please see our comment inline. > common_allgather.c: In function 'comm_allgather_pml': > common_allgather.c:45: warning: 'recv_iov[1].iov_len' may be used > uninitialized in this function > common_allgather.c:45: warning: 'send_iov[1].iov_len' may be used > uninitialized in this function

Re: [OMPI devel] MPI-RMA on uGNI?

2012-10-22 Thread Shamis, Pavel
Paul, Did you look at mca_btl_ugni_put / mca_btl_ugni_get functions in the ugni btl ? -Pasha I am trying to resolve an odd issue I am seeing with my one uGNI-based code, and was hoping to use OMPI's uGNI support as an example of correct usage. My particular interest is in RDMA, but as far

Re: [OMPI devel] 1.7 rc4 compilation error

2012-10-26 Thread Shamis, Pavel
There is a bug in the makefile. The file exists in svn, but it is not listed in the Makefile.am. As a result, it wasn't pulled into the tarball. Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Oct 26, 2012, at 2:33 PM,

Re: [OMPI devel] Trunk warnings in collectives

2012-11-12 Thread Shamis, Pavel
We are looking at this issue... Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Nov 11, 2012, at 8:49 PM, Ralph Castain wrote: Seeing the following warnings in the trunk: bcol_ptpcoll_bcast.c: In function 'bcol_pt

[OMPI devel] Is trunk broken ?

2012-11-12 Thread Shamis, Pavel
I get the following error on the trunk: base/memchecker_base_close.c: In function 'opal_memchecker_base_close': base/memchecker_base_close.c:28: error: implicit declaration of function 'opal_output_close' I may add #include "opal/util/output.h" to the file, but then it fails in other components

Re: [OMPI devel] Is trunk broken ?

2012-11-12 Thread Shamis, Pavel
Debug build works. --with-platform=optimized is broken Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Nov 12, 2012, at 3:44 PM, Shamis, Pavel wrote: I get the following error on the trunk: base

Re: [OMPI devel] bcol basesmuma maintainer?

2013-01-02 Thread Shamis, Pavel
Brian, I will take a look. Thanks for the patch ! Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Jan 2, 2013, at 4:37 PM, Barrett, Brian W wrote: Hi all - Who's maintaining the bcol basesmuma component? I'd lik

Re: [OMPI devel] bcol basesmuma maintainer?

2013-01-03 Thread Shamis, Pavel
Brian, The patch looks good. Please go ahead and push it. Thanks ! Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Jan 2, 2013, at 4:37 PM, Barrett, Brian W wrote: Hi all - Who's maintaining the bcol basesmuma com

Re: [OMPI devel] Compiling OpenMPI 1.7 with LLVM clang or llvm-gcc

2013-01-08 Thread Shamis, Pavel
Ken, I have no problem compiling OMPI trunk with llvm-gcc-4.2 (OS X 10.8) Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Jan 7, 2013, at 3:49 PM, Kenneth A. Lloyd wrote: > Has anyone experienced any problems c

Re: [OMPI devel] Trunk: Link Failure -- multiple definition of ib_address_t_class

2013-04-04 Thread Shamis, Pavel
I pushed a bugfix to trunk (r28289). I don't have access to a platform with an XRC (MOFED) installation, so this is a "blind" bugfix. If you have a system with XRC, please test this revision. Hopefully this resolves the problem. Regards, - Pavel (Pasha) Shamis On Apr 4, 2013, at 3:28 PM, Ralph

Re: [OMPI devel] Trunk: Link Failure -- multiple definition of ib_address_t_class

2013-04-04 Thread Shamis, Pavel
system where I can reproduce the problem, but don't have up-to-date autotools. So, I can only test from a tarball. If somebody can roll me a tarball of r28289 I can test ASAP. Otherwise I'll try to remember to test from tonight's trunk nightly once it appears. -Paul On Thu, Apr 4, 20

Re: [OMPI devel] Trunk: Link Failure -- multiple definition of ib_address_t_class

2013-04-04 Thread Shamis, Pavel
r.bz2 PASSES $ make all $ make install $ make check -Paul On Thu, Apr 4, 2013 at 3:12 PM, Ralph Castain <r...@open-mpi.org> wrote: Available on the web site now: http://www.open-mpi.org/nightly/trunk/ On Apr 4, 2013, at 2:13 PM, "Shamis, Pavel" mailto:sham...@ornl.gov&

Re: [OMPI devel] Trunk: Link Failure -- multiple definition of ib_address_t_class

2013-04-05 Thread Shamis, Pavel
Paul (K.), I fixed the problem in trunk r28289. Can you please test the revision with your build environment. Regards, Pavel (Pasha) Shamis On Apr 5, 2013, at 4:26 AM, Paul Kapinos wrote: > Hello, > > On 04/05/13 03:16, Paul Hargrove wrote: >> I found that not only did I need XRC, but I

[OMPI devel] OMPI 1.7 - libevent warning

2013-07-01 Thread Shamis, Pavel
Open MPI version: 1.7.2 on an IB system. Test: everybody sends to everybody - Irecv, Isend, Wait. 1024 processes in total. Warning: "[warn] opal_libevent2019 each event_base at once. [warn] opal_libevent2019_event_base_loop: reentrant invocation. Only one event_base_loop can run on each event_base a

Re: [OMPI devel] RFC: Dead code removal

2013-07-05 Thread Shamis, Pavel
> - coll ml This one is used. > > - sbgp basemsocket This one is used as well -P.

Re: [OMPI devel] Annual OMPI membership review: SVN accounts

2013-07-08 Thread Shamis, Pavel
All ORNL's accounts should stay active as well. Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Jul 8, 2013, at 6:32 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: According to https://svn.open-mpi

Re: [OMPI devel] RFC: Dead code removal

2013-07-11 Thread Shamis, Pavel
Jeff, I reviewed the changes in the collectives code (ml, bcol, sbgp) - everything looks fine. Thanks for the cleanup. -P. On Jul 5, 2013, at 9:56 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: They are assigned but not used. On Jul 5, 2013, at 8:47 AM, "Sh

Re: [OMPI devel] OpenSHMEM round 2

2013-08-06 Thread Shamis, Pavel
Josh, I get 404 error. Probably you have to unlock it. Best, -P From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd Sent: Tuesday, August 06, 2013 12:30 PM To: Open MPI Developers (de...@open-mpi.org) Subject: [OMPI devel] OpenSHMEM round 2 Dear OMPI Co

Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2

2013-08-14 Thread Shamis, Pavel
Ralph, There is an OpenSHMEM test suite: http://bongo.cs.uh.edu/site/sites/default/site_files/openshmem-test-suite-release-1.0d.tar.bz2 The test suite exercises most of the API. Best, Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Labor

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
When I looked at the code last time - no. (The connection state machine is very different) Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Nov 14, 2013, at 11:51 AM, Jeff Squyres (jsquyres) mailto:jsquy...@cisco.co

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
There is some confusion in the thread. UDCM is just another CPC, like XOOB, OOB, and RDMACM (I think IBCM is officially dead). XOOB and OOB don't use UDCM, they rely on ORTE out-of-band communication. OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM. OFACM supports (at least last time when we ch

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
see if the ofacm code > even works any more?? > > Only oob and xoob components appear to be present - so unless someone fixed > those since they were originally copied from openib, I doubt ofacm works. > > > On Nov 14, 2013, at 11:08 AM, Shamis, Pavel wrote: > >> Th

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
Comments inline. > > 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm in > that move. Never changed openib to use ofacm/common. Pasha: This is not entirely true. I changed the openib btl ~3 years ago, before my departure from Mellanox. (I sent a link to the code earlier). W

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
> > 1. Ralph made the OOB asynchronous. > Ralph, I'm not familiar with details of the change. If out-of-band communication is supported, it should not be that huge change for XOOB/OOB.

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
> So far as I can tell, the issue is one of blocking. The OOB handshake is now > async - i.e., you post a non-blocking recv at the beginning of time, and then > do a non-blocking send to the other side when you want to create a > connection. The question is: how do you know when that connection

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
> The only change is that the receive callback is now occurring in the ORTE > event thread, and so perhaps someone needs to look at a way to pass that back > into the OMPI event base (which I guess is the OPAL event base)? Just > glancing at the code, it looks like that could be the issue - but

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
callback. How does it work for other parts of OMPI (sm, communicator) ? I guess they don't do anything in the callbacks ? Best, Pasha On Nov 14, 2013, at 6:35 PM, Ralph Castain wrote: > > On Nov 14, 2013, at 3:33 PM, Shamis, Pavel wrote: > >> >>> The on

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-14 Thread Shamis, Pavel
, -Pasha On Nov 14, 2013, at 7:25 PM, Ralph Castain wrote: > > On Nov 14, 2013, at 4:22 PM, Shamis, Pavel wrote: > >> Well, this is major change in a behavior. >> >> Since openib calls communication calls from the callback >> it pretty much requires to enable

Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib ompi/mca/btl/openib/connect

2013-11-21 Thread Shamis, Pavel
>>> 3. Pasha moved the openib/connect to common/ofacm but excluded the rdmacm >>> in that move. Never changed openib to use ofacm/common. >> Pasha: This is not entirely true. I changed openib btl ~3 year ago before >> my departure from Mellanox. (I sent link to the code earlier). >> We (commun

Re: [OMPI devel] RFC: OB1 optimizations

2014-01-07 Thread Shamis, Pavel
Overall it looks good. It would be helpful to validate performance numbers for other interconnects as well. -Pasha > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan > Hjelm > Sent: Tuesday, January 07, 2014 6:45 PM > To: Open MPI Developers List >

Re: [OMPI devel] Still getting 100% trunk failure on 32 bit platform: coll ml

2014-01-30 Thread Shamis, Pavel
Let me know if you need my help. Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Jan 30, 2014, at 10:27 AM, Nathan Hjelm <hje...@lanl.gov> wrote: Ok. Looks like I need to fix one more. Will take a look now.

Re: [OMPI devel] Bcol/mcol violations

2014-02-07 Thread Shamis, Pavel
Can you please give the attached hot-fix a try? It unrolls most of the spaghetti, except the iboffload component (which is disabled anyway). Sorry for the mess. Best, Pasha On Feb 7, 2014, at 10:52 AM, Nathan Hjelm <hje...@lanl.gov> wrote: On Fri, Feb 07, 2014 at 07:46:03AM -0800, Ra

Re: [OMPI devel] Bcol/mcol violations

2014-02-07 Thread Shamis, Pavel
Exchange is evil…. Attached. Best, P p4.patch.gz Description: p4.patch.gz On Feb 7, 2014, at 12:41 PM, Nathan Hjelm <hje...@lanl.gov> wrote: Can you gzip the patch. The local exchange server has a habit of converting LF to CRLF. -Nathan On Fri, Feb 07, 2014 at 12:14:02PM -0500, Shamis,

Re: [OMPI devel] 1.7.5 end-of-week status report

2014-03-17 Thread Shamis, Pavel
> > I thought ORNL had addressed the cross-linkage as well. I am sure they > will get a fix for that in the next couple of days. This was an unused .h file. I fixed it. -Pasha

Re: [OMPI devel] 答复: 答复: doubt on latency result with OpenMPI library

2014-03-28 Thread Shamis, Pavel
> On Mar 27, 2014, at 11:45 PM, "Wang,Yanfei(SYS)" > wrote: > >> 1. In RoCE, we cannot use OOB (via TCP socket) for the RDMA connection. > > More specifically, RoCE QPs can only be made using the RDMA connection > manager. Technically, you can set up a RoCE connection without RDMA CM. The vers

Re: [OMPI devel] [OMPI svn] svn:open-mpi r31302 - in trunk: opal/mca/base orte/tools/orterun

2014-04-03 Thread Shamis, Pavel
> The mca param file treats any key=val as an mca parameter only. > Adding parser support for something that is not an mca param would > require changing the file syntax, and it would look bad, e.g.: > > mca btl = sm,self,openib > env DISPLAY = console:0 > > I think the current implementation is less in
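
The key=val behaviour described in the message above can be illustrated with a minimal sketch. It assumes a simplified parameter-file format with one `key = value` pair per line and `#` comments; the function name `parse_param_file` and the sample contents are hypothetical, and this is not Open MPI's actual parser.

```python
def parse_param_file(text):
    """Parse a simplified "key = value" parameter file.

    Hypothetical illustration only, not Open MPI's parser. Every key is
    treated as a parameter name, so a line such as
    "env DISPLAY = console:0" is indistinguishable from a parameter
    whose name happens to be "env DISPLAY".
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)      # split on the first "=" only
        params[key.strip()] = value.strip()
    return params


# Hypothetical file contents, for illustration.
sample = """
# example parameter file
btl = sm,self,openib
env DISPLAY = console:0
"""
parsed = parse_param_file(sample)
```

With a parser of this shape, supporting environment variables would mean adding a prefix such as `mca` or `env` to every line, which is the syntax change the message argues against.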

Re: [OMPI devel] [devel-core] OMPI MCA components - track external libs versions

2014-04-14 Thread Shamis, Pavel
+1. This is very helpful info to have. Best, Pavel (Pasha) Shamis On Apr 14, 2014, at 2:57 PM, Mike Dubman <mi...@dev.mellanox.co.il> wrote: sure, lets discuss it on the next telecon in 1w (Mellanox IL is OOO for holidays and Josh is on vacation). I think it is very good feature from e

Re: [OMPI devel] SHMEM symmetric objects in shared libraries

2014-07-29 Thread Shamis, Pavel
> > On 05/10/2014 02:46 PM, Bert Wesarg wrote: >> Hi, >> >> Btw, I'm pretty confident, that this Open SHMEM implementation does not >> recognize global or static variables in shared libraries as symmetric >> objects. It is probably wise to note this somewhere to the users. > > I've never got an

Re: [OMPI devel] SHMEM symmetric objects in shared libraries

2014-07-29 Thread Shamis, Pavel
then in your main example below do a shmem_long_fadd on my_dso_val. It won’t work unless you’ve put smarts in the shmem library to go through the segments of loaded shared libraries and register them with the same mechanism used for the data segment of the executable. In this case the "smart" pa

Re: [OMPI devel] SHMEM symmetric objects in shared libraries

2014-07-29 Thread Shamis, Pavel
Btw, I'm pretty confident, that this Open SHMEM implementation does not recognize global or static variables in shared libraries as symmetric objects. It is probably wise to note this somewhere to the users. >>> >>> I've never got a reply to this query. Any comments on it? >>

Re: [OMPI devel] SHMEM symmetric objects in shared libraries

2014-07-29 Thread Shamis, Pavel
Is v1.1 posted somewhere? I don't see it up on the LBNL site. www.openshmem.org "Get documentation" -> "Specification" (for some reason I cannot get a direct link) Pasha Josh On Tue, Jul 29, 2014 at 2:05 PM, Shamis, Pave

Re: [OMPI devel] v1.5: sigsegv in case of extremely low settings in the SRQs

2010-06-23 Thread Shamis, Pavel
Good catch. The patch looks OK to me. Regards --- Pavel Shamis (Pasha) sham...@ornl.gov On Jun 18, 2010, at 11:10 AM, nadia.derbey wrote: > Hi, > > Reference is the v1.5 branch > > If an SRQ has the following settings: S,,4,2,1 > > 1) setup_qps() sets the following: > mca_btl_openib_compone

Re: [OMPI devel] autogen.sh improvements

2010-08-31 Thread Shamis, Pavel
Jeff, Are the autogen changes publicly available? I would like to see the code. Thanks. On Aug 16, 2010, at 10:55 AM, Jeff Squyres wrote: > I just wanted to give the community a heads up that Ralph, Brian, and I are > revamping autogen in a Mercurial branch. I don't know the exact timeline

Re: [OMPI devel] openib btl - fatal errors don't abort the job

2010-09-07 Thread Shamis, Pavel
On Sep 3, 2010, at 8:14 AM, Jeff Squyres wrote: > On Sep 1, 2010, at 4:47 PM, Steve Wise wrote: > >> I was wondering what the logic is behind allowing an MPI job to continue in >> the presence of a fatal qp error? > > It's a feature...? The idea was that in some near future we will be able to

Re: [OMPI devel] coll/ml without hwloc (?)

2014-08-26 Thread Shamis, Pavel
Theoretically, we could make it functional (with good performance) even without hwloc. As it is today, I would suggest disabling ML if hwloc is disabled. Best, Pasha > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles > Gouaillardet > Sent: Tuesday,

Re: [OMPI devel] segfault in openib component on trunk

2014-08-29 Thread Shamis, Pavel
I was under the impression that mca_tl_openib_tune_endpoint is supposed to handle the mismatch between the tunings of different devices. A few years ago we did some "extreme" interoperability testing, and ompi handled all cases really well. I'm not sure I understand correctly what the "core" issue is.

Re: [OMPI devel] Need to know your Github ID

2014-09-10 Thread Shamis, Pavel
Jeff, pasha -> shamisp > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff > Squyres (jsquyres) > Sent: Wednesday, September 10, 2014 6:46 AM > To: Open MPI Developers List > Subject: [OMPI devel] Need to know your Github ID > > As the next step of the

Re: [OMPI devel] [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-25 Thread Shamis, Pavel
As I read this thread, this issue is not related to the ML bootstrap itself but to the naming conflict between public functions in HCOLL and ML. Did I get it right? If this is the case, we can work with the Mellanox folks to resolve this conflict. Best, Pavel (Pasha) Shamis --- Computer Science Rese
