Re: [OMPI users] (no subject)

2016-02-22 Thread Mike Dubman
Hi, it seems that your ompi was compiled with ofed ver X but running on ofed ver Y. X and Y are incompatible. On Mon, Feb 22, 2016 at 8:18 PM, Mark Potter wrote: > I am usually able to find the answer to my problems by searching the > archive but I've run up against one that I can't suss out. >

Re: [OMPI users] hcoll API in 1.10.1

2015-12-24 Thread Mike Dubman
> > > ../../../../../../../../ompi/mca/coll/hcoll/coll_hcoll_module.c:263: > warning: implicit declaration of function > 'hcoll_check_mem_release_cb_needed' > > > > Cheers, > > Ben > > > > *From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of *

Re: [OMPI users] hcoll API in 1.10.1

2015-12-23 Thread Mike Dubman
Hi, hcoll is part of MOFED or comes from HPCx. what version of hcoll do you have on your system? Thx On Wed, Dec 23, 2015 at 4:58 AM, Ben Menadue wrote: > Hi, > > It's probably in plain sight somewhere and I missed it, but is there a > minimum version of hcoll needed to build 1.10.1? > > We hav

Re: [OMPI users] hcoll dependency on mxm configure error

2015-10-21 Thread Mike Dubman
re configure got it to work, which I didn't > expect. Thanks for the tip! I didn't realize that loading in a shared > library of a library that is being linked in on the active compile line > fell under the runtime portion of linking, and could be affected by using > LD_LIB

Re: [OMPI users] hcoll dependency on mxm configure error

2015-10-21 Thread Mike Dubman
Hi David, what linux distro do you use? (and mofed version)? Do you have /etc/ld.conf.d/mxm.conf file? Can you please try add LD_LIBRARY_PATH=/opt/mellanox/mxm/lib ./configure ? Thanks On Wed, Oct 21, 2015 at 6:40 PM, David Shrader wrote: > I should probably point out that libhcoll.so does

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-10-06 Thread Mike Dubman
these flags available in master and v1.10 branches and make sure that ranks to core allocation is done starting from cpu socket closer to the HCA. Of course you can have same effect with taskset. On Mon, Oct 5, 2015 at 8:46 PM, Dave Love wrote: > Mike Dubman writes: > > > what is

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-10-01 Thread Mike Dubman
well. (there is a reason that any MPI have hundreds of knobs) On Thu, Oct 1, 2015 at 1:50 PM, Dave Love wrote: > Mike Dubman writes: > > > we did not get to the bottom for "why". > > Tried different mpi packages (mvapich,intel mpi) and the observation hold > >

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-10-01 Thread Mike Dubman
ance implications. Please set the heap size to the default value > (10240) > > Should say stack not heap. > > -Nathan > > On Wed, Sep 30, 2015 at 06:52:46PM +0300, Mike Dubman wrote: > >mxm comes with mxm_dump_config utility which provides and explains all > >

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-30 Thread Mike Dubman
mxm comes with mxm_dump_config utility which provides and explains all tunables. Please check HPCX/README file for details. On Wed, Sep 30, 2015 at 1:21 PM, Dave Love wrote: > Mike Dubman writes: > > > unfortunately, there is no one size fits all here. > > > > mxm provi

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-30 Thread Mike Dubman
we did not get to the bottom of "why". Tried different mpi packages (mvapich, intel mpi) and the observation holds true. it could be many factors affected by huge heap size (cpu cache misses? swappiness?). On Wed, Sep 30, 2015 at 1:12 PM, Dave Love wrote: > Mike Dubman writes

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-29 Thread Mike Dubman
what is your command line and setup? (ofed version, distro) This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD + mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_device mlx5_3:1 -x MXM_TLS=self,sh

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-29 Thread Mike Dubman
th the value and see > >what works for your applications. Most applications should be using > >malloc or similar functions to allocate large memory regions in the heap > >and not on the stack. > > > >-Nathan > > > >On Mon, Sep 28, 2015 at 08:01:09PM +0300

Re: [OMPI users] Using OpenMPI (1.8, 1.10) with Mellanox MXM, ulimits ?

2015-09-28 Thread Mike Dubman
Hello Grigory, We observed ~10% performance degradation with heap size set to unlimited for CFD applications. You can measure your application performance with default and unlimited "limits" and select the best setting. Kind Regards. M On Mon, Sep 28, 2015 at 7:36 PM, Grigory Shamov wrote: >

Re: [OMPI users] No suitable active ports warning and -mca btl_openib_if_include option

2015-06-17 Thread Mike Dubman
Hi, the message in question belongs to MXM and it is a warning (silenced in later releases of MXM). To select a specific device in MXM, please pass: mpirun -x MXM_IB_PORTS=mlx4_0:2 ... M On Wed, Jun 17, 2015 at 9:38 PM, Na Zhang wrote: > Hi all, > > I am trying to launch MPI jobs (with version o

Re: [OMPI users] MXM problem

2015-05-28 Thread Mike Dubman
? > "-x LD_PRELOAD=$HPCX_MXM_DIR/debug/lib/libmxm.so -x MXM_LOG_LEVEL=data" > > Also, could you please attach the entire output of > "$HPCX_MPI_DIR/bin/ompi_info -a" > > Thank you, > Alina. > > On Tue, May 26, 2015 at 3:39 PM, Mike Dubman <https

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-27 Thread Mike Dubman
e is empty, and > you just end up appending "-L" instead of "-L/something". So why not just > check to ensure that the variable is not empty? > > > > > On May 26, 2015, at 3:27 PM, Mike Dubman > wrote: > > > > in that case, O

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
libdir will be empty. > > Right? > > > > On May 26, 2015, at 1:28 PM, Mike Dubman > wrote: > > > > Thanks Jeff! > > > > but in this line: > > > > > https://github.com/open-mpi/ompi/blob/master/config/ompi_check_mxm.m4#L36 > > > >

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
ly). Thus, ompi_check_mxm_libdir never gets assigned which > results in just "-L" getting used on line 41. The same behavior could be > found by using '--with-mxm=yes'. > > Thanks, > David > > > On 05/26/2015 11:28 AM, Mike Dubman wrote: >
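The failure mode discussed in this thread (an unset libdir variable turning into a bare "-L") can be modelled in a few lines. What follows is only a Python sketch of the guard being proposed, not the m4 code in config/ompi_check_mxm.m4; the function name mxm_ldflags and the "-lmxm" flag are assumptions made for illustration.

```python
def mxm_ldflags(libdir):
    """Build linker flags for MXM, skipping -L when no libdir is known.

    Hypothetical helper: models the "check that the variable is not
    empty" fix suggested in the thread, not the real configure logic.
    """
    libdir = (libdir or "").strip()
    flags = []
    if libdir:
        # Only emit -L when there is an actual directory to append.
        flags.append("-L" + libdir)
    # Assumed library name, based on libmxm.so mentioned elsewhere.
    flags.append("-lmxm")
    return flags
```

With an empty or missing libdir the sketch emits no "-L" at all, which is the behaviour the thread asks configure to adopt.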

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
/blob/master/config/ompi_check_mxm.m4#L41 > > doesn't check to see if $ompi_check_mxm_libdir is empty. > > > > On May 26, 2015, at 11:50 AM, Mike Dubman > wrote: > > > > David, > > Could you please send me your config.log file? > > > > Looking i

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
David, Could you please send me your config.log file? Looking into config/ompi_check_mxm.m4 macro I don`t understand how it could happen. Thanks a lot. On Tue, May 26, 2015 at 6:41 PM, Mike Dubman wrote: > Hello David, > Thanks for info and patch - will fix ompi configure logic wit

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-26 Thread Mike Dubman
ed from the linking commands and make > completed fine. > > So, it looks like there are two solutions: move the install location of > mxm to not be in system-space or modify configure. Which one would be the > better one for me to pursue? > > Thanks, > David > > >

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Mike Dubman
btw, what is the rationale for running in a chroot env? is it a docker-like env? does "ibv_devinfo -v" work for you from the chroot env? On Tue, May 26, 2015 at 7:08 AM, Rahul Yadav wrote: > Yes Ralph, MXM cards are on the node. Command runs fine if I run it out of > the chroot environment. > > Thanks > R

Re: [OMPI users] MXM problem

2015-05-25 Thread Mike Dubman
e_mtu: 4096 (5) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > > Best regards, > Timur. > > > Monday, 25 May 2015, 19:39 +03:00 from Mike Dubman

Re: [OMPI users] MXM problem

2015-05-25 Thread Mike Dubman
Hi Timur, seems that yalla component was not found in your OMPI tree. can it be that your mpirun is not from hpcx? Can you please check LD_LIBRARY_PATH,PATH, LD_PRELOAD and OPAL_PREFIX that it is pointing to the right mpirun? Also, could you please check that yalla is present in the ompi_info -l 9
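The environment check requested above can be gathered in one place. This is a minimal Python sketch, assuming only the four variables named in the message; the helper name launch_environment is hypothetical and it does not query ompi_info.

```python
import os
import shutil

# Variables named in the message as worth checking.
CHECKED_VARS = ("PATH", "LD_LIBRARY_PATH", "LD_PRELOAD", "OPAL_PREFIX")


def launch_environment(env=None):
    """Collect the variables that decide which mpirun and libraries are used.

    Hypothetical helper for illustration; pass a dict to inspect a saved
    environment instead of the current process environment.
    """
    env = os.environ if env is None else env
    report = {name: env.get(name, "") for name in CHECKED_VARS}
    # Resolve mpirun the same way the shell would, using the given PATH.
    report["mpirun"] = shutil.which("mpirun", path=env.get("PATH")) or ""
    return report
```

Printing the returned dictionary on each node makes it easy to spot an mpirun that does not come from the intended installation.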

Re: [OMPI users] 1.8.5, mxm, and a spurious '-L' flag

2015-05-23 Thread Mike Dubman
Hi, How mxm was installed? by copying? The rpm based installation places mxm into /opt/mellanox/mxm and not into /usr/lib64/libmxm.so. Do you use HPCx (pack of OMPI and MXM and FCA)? You can download HPCX, extract it anywhere and compile OMPI pointing to mxm location under HPCX. Also, HPCx cont

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-28 Thread Mike Dubman
eck that I am > indeed using #2 ? > > Subhra > > On Fri, Apr 24, 2015 at 12:55 AM, Mike Dubman > wrote: > >> yes >> >> #1 - ob1 as pml, openib openib as btl (default: rc) >> #2 - yalla as pml, mxm as IB library (default: ud, use "-x >> MXM_TLS

Re: [OMPI users] MPI_Finalize not behaving correctly, orphaned processes

2015-04-26 Thread Mike Dubman
verbs (which was admittedly a long > time ago), the sample I pasted would segv... > > > > On Apr 24, 2015, at 9:40 AM, Mike Dubman > wrote: > > > > ibv_fork_init() will set special flag for madvise() > (IBV_DONTFORK/DOFORK) to inherit (and not cow) registered/locked

Re: [OMPI users] MPI_Finalize not behaving correctly, orphaned processes

2015-04-24 Thread Mike Dubman
// in the child > *buffer = 3; > // ... > } > ---- > > > > > On Apr 24, 2015, at 2:54 AM, Mike Dubman > wrote: > > > > btw, ompi master now calls ibv_fork_init() before initializing > btl/mtl/oob frameworks and all fork fears should be addressed. > >

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-24 Thread Mike Dubman
iband but in different ways? > > Thanks, > Subhra. > > > > On Thu, Apr 23, 2015 at 11:57 PM, Mike Dubman > wrote: > >> HPCX package uses pml "yalla" by default (part of ompi master branch, not >> in v1.8). >> So, "-mca mtl mxm" has no effec

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-24 Thread Mike Dubman
mtl ^mxm -n 1 /root/backend > localhost : -x LD_PRELOAD=/root/libci.so -n 1 /root/app2 > > Seems like it doesn't matter if I use mxm, not use mxm or use it with > reliable connection (RC). How can I be sure I am indeed using mxm over > infiniband? > > Thanks, > Subhra.

Re: [OMPI users] MPI_Finalize not behaving correctly, orphaned processes

2015-04-24 Thread Mike Dubman
btw, ompi master now calls ibv_fork_init() before initializing btl/mtl/oob frameworks and all fork fears should be addressed. On Fri, Apr 24, 2015 at 4:37 AM, Jeff Squyres (jsquyres) wrote: > Disable the memory manager / don't use leave pinned. Then you can > fork/exec without fear (because on

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-23 Thread Mike Dubman
hra. > > On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman > wrote: > >> cool, progress! >> >> >>1429676565.124664] sys.c:719 MXM WARN Conflicting CPU >> frequencies detected, using: 2601.00 >> >> means that cpu governor on your machine is

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-22 Thread Mike Dubman
> ibv_query_device() returned 38: Function not implemented > -- > Initialization of MXM library failed. > > Error: Input/output error > >

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-18 Thread Mike Dubman
== > > -- > > mpirun noticed that process rank 1 with PID 450 on node JARVICE exited on > signal 11 (Segmentation fault). > > -- >

Re: [OMPI users] Select a card in a multi card system

2015-04-15 Thread Mike Dubman
Hi, With MXM, you can specify a list of devices to use for communication: -x MXM_IB_PORTS="mlx5_1:1,mlx4_1:1" also select specific or all transports: -x MXM_TLS=shm,self,ud To change port rate one can use *ibportstate* *http://www.hpcadvisorycouncil.com/events/2011/switzerland_workshop/pdf/Pres
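For scripted launches, the two settings shown above can be assembled programmatically. This is a small Python sketch that only formats the MXM_IB_PORTS and MXM_TLS values quoted in the message; the helper name mxm_env_args is an assumption and nothing here validates device names.

```python
def mxm_env_args(ports, transports=None):
    """Return mpirun "-x" arguments selecting MXM devices and transports.

    ports:      iterable of "device:port" strings, e.g. "mlx5_1:1"
    transports: optional iterable of transport names, e.g. "ud"
    Hypothetical helper; it formats the variables, it does not check them.
    """
    args = ["-x", "MXM_IB_PORTS=" + ",".join(ports)]
    if transports:
        args += ["-x", "MXM_TLS=" + ",".join(transports)]
    return args
```

The returned list can be spliced into an mpirun argument vector ahead of the application name.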

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-14 Thread Mike Dubman
egmentation fault). > -- > [JARVICE:00562] 1 more process has sent help message help-mca-base.txt / > find-available:not-valid > [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0 to see >

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-13 Thread Mike Dubman
> -- > mpirun noticed that process rank 0 with PID 8398 on node JARVICE exited on > signal 11 (Segmentation fault). > ---

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-04-10 Thread Mike Dubman
ult? > > Thanks, > Subhra. > > > On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman > wrote: > >> Hi, >> mxm uses IB rdma/roce technologies. One can select UD/RC/DC transports >> to be used in mxm. >> >> By selecting mxm, all MPI p2p routines will be

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-03-31 Thread Mike Dubman
band rdma? Also from programming perspective, > do I need to use anything else other than MPI_Send/MPI_Recv? > > Thanks, > Subhra. > > > On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman > wrote: > >> Hi, >> openib btl does not support this thread model. >> You

Re: [OMPI users] MPI_THREAD_MULTIPLE and openib btl

2015-03-30 Thread Mike Dubman
Hi, openib btl does not support this thread model. You can use OMPI w/ mxm (-mca mtl mxm) and multiple thread mode in the 1.8.x series or (-mca pml yalla) in the master branch. M On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar wrote: > Hi, > > Can MPI_THREAD_MULTIPLE and openib btl work together

Re: [OMPI users] Determine IB transport type of OpenMPI job

2015-01-11 Thread Mike Dubman
Hi, also - you can use mxm library (which support RC,UD,DC and mixes) and comes as part of Mellanox OFED. The version for community OFED is also available from http://mellanox.com/products/hpcx On Fri, Jan 9, 2015 at 4:03 PM, Sasso, John (GE Power & Water, Non-GE) < john1.sa...@ge.com> wrote: >

Re: [OMPI users] ERROR: C_FUNLOC function

2014-12-18 Thread Mike Dubman
Hi Siegmar, Could you please check the /etc/mtab file for real FS type for the following mount points: get_mounts: dirs[16]:/misc fs:autofs nfs:No get_mounts: dirs[17]:/net fs:autofs nfs:No get_mounts: dirs[18]:/home fs:autofs nfs:No could you please check if mntent.h and paths.h were detected by

Re: [OMPI users] shmalloc error with >=512 mb

2014-11-17 Thread Mike Dubman
Hi, the default memheap size is 256MB, you can override it with oshrun -x SHMEM_SYMMETRIC_HEAP_SIZE=512M ... On Mon, Nov 17, 2014 at 3:38 PM, Timur Ismagilov wrote: > Hello! > Why does shmalloc return NULL when I try to allocate 512MB? > When I try to allocate 256mb - all fine. > I use Open MPI
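The reasoning in this exchange (a request larger than the symmetric heap fails, so the heap must be raised) can be sketched as follows. This is a Python illustration under stated assumptions: the 256 MB default comes from the message, the helper name heap_override_args is hypothetical, and a real run may need headroom beyond the raw request size.

```python
DEFAULT_HEAP_MB = 256  # default memheap size stated in the message


def heap_override_args(request_mb, heap_mb=DEFAULT_HEAP_MB):
    """Return oshrun "-x" arguments when the request exceeds the heap.

    Hypothetical helper: returns an empty list when the request fits,
    otherwise sets SHMEM_SYMMETRIC_HEAP_SIZE to the request size in MB.
    """
    if request_mb <= 0 or heap_mb <= 0:
        raise ValueError("sizes must be positive")
    if request_mb <= heap_mb:
        return []
    return ["-x", f"SHMEM_SYMMETRIC_HEAP_SIZE={request_mb}M"]
```

For the 512 MB case in the thread this yields the same override that the reply suggests.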

Re: [OMPI users] Building on a host with a shoddy OpenFabrics installation

2014-10-11 Thread Mike Dubman
Hi, yep - you can compile OFED/MOFED in the $HOME/ofed dir and point OMPI configure to it with "--with-verbs=/path/to/ofed/install". You can download and compile "libibverbs","libibumad","libibmad","librdmacm","opensm","infiniband-diags" packages only with custom prefix. M On Fri, Oct 10, 2014

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-28 Thread Mike Dubman
btw, you may want to use latest mxm v3.1 which is part of hpcx package http://www.mellanox.com/products/hpcx On Thu, Aug 28, 2014 at 4:10 AM, Brock Palen wrote: > Brice, et al. > > Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 > with knem and mxm 3.0, > > If we have qu

Re: [OMPI users] long initialization

2014-08-22 Thread Mike Dubman
2 AM, Timur Ismagilov > wrote: > > Have I any opportunity to run mpi jobs? > > > Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain >: > > yes, i know - it is cmr'd > > On Aug 20, 2014, at 10:26 AM, Mike Dubman > wrote: > > btw, we get same error

Re: [OMPI users] Clarification about OpenMPI, slurm and PMI interface

2014-08-21 Thread Mike Dubman
Hi FIlippo, I think you can use SLURM_LOCALID var (at least with slurm v14.03.4-2) $srun -N2 --ntasks-per-node 3 env |grep SLURM_LOCALID SLURM_LOCALID=1 SLURM_LOCALID=2 SLURM_LOCALID=0 SLURM_LOCALID=0 SLURM_LOCALID=1 SLURM_LOCALID=2 $ Kind Regards, M On Thu, Aug 21, 2014 at 9:27 PM, Ralph Cas
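Inside a launched task, the variable shown above can be read directly. This is a minimal Python sketch, assuming the task was started by srun so that SLURM_LOCALID is set; the helper name slurm_local_rank is hypothetical.

```python
import os


def slurm_local_rank(env=None):
    """Return the node-local task index from SLURM_LOCALID, or None.

    Hypothetical helper; None means the variable is absent, for example
    when the process was not started through srun.
    """
    env = os.environ if env is None else env
    value = env.get("SLURM_LOCALID")
    return int(value) if value is not None else None
```

A task could use the returned index, for instance, to pick a device or core on its node.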

Re: [OMPI users] ORTE daemon has unexpectedly failed after launch

2014-08-20 Thread Mike Dubman
btw, we get same error in v1.8 branch as well. On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain wrote: > It was not yet fixed - but should be now. > > On Aug 20, 2014, at 6:39 AM, Timur Ismagilov wrote: > > Hello! > > As i can see, the bug is fixed, but in Open MPI v1.9a1r32516 i still have > t

Re: [OMPI users] No log_num_mtt in Ubuntu 14.04

2014-08-19 Thread Mike Dubman
so, it seems you have old ofed w/o this parameter. Can you install latest Mellanox ofed? or check which community ofed has it? On Tue, Aug 19, 2014 at 9:34 AM, Rio Yokota wrote: > Here is what "modinfo mlx4_core" gives > > filename: > > /lib/modules/3.13.0-34-generic/kernel/drivers/net/ether

Re: [OMPI users] No log_num_mtt in Ubuntu 14.04

2014-08-18 Thread Mike Dubman
most likely you installing old ofed which does not have this parameter: try: #modinfo mlx4_core and see if it is there. I would suggest install latest OFED or Mellanox OFED. On Mon, Aug 18, 2014 at 9:53 PM, Rio Yokota wrote: > I get "ofed_info: command not found". Note that I don't install t
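The modinfo check suggested above can be automated by scanning its output for the parameter. This is a Python sketch that assumes modinfo lists module parameters on lines of the form "parm: name:description"; the helper name module_has_parm is hypothetical.

```python
def module_has_parm(modinfo_text, name):
    """Report whether modinfo output lists a module parameter called name.

    Assumes each parameter appears on a line shaped like
    "parm:   <name>:<description>"; other fields are ignored.
    """
    for line in modinfo_text.splitlines():
        field, _, rest = line.partition(":")
        if field.strip() != "parm":
            continue
        # The parameter name is everything before the next colon.
        if rest.strip().split(":", 1)[0] == name:
            return True
    return False
```

The text to scan can be obtained with subprocess.run(["modinfo", "mlx4_core"], capture_output=True, text=True).stdout and the name to look for is log_num_mtt.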

Re: [OMPI users] No log_num_mtt in Ubuntu 14.04

2014-08-18 Thread Mike Dubman
Hi, what ofed version do you use? (ofed_info -s) On Sun, Aug 17, 2014 at 7:16 PM, Rio Yokota wrote: > I have recently upgraded from Ubuntu 12.04 to 14.04 and OpenMPI gives the > following warning upon execution, which did not appear before the upgrade. > > WARNING: It appears that your OpenFabr

Re: [OMPI users] mpi+openshmem hybrid

2014-08-14 Thread Mike Dubman
You can use hybrid mode. following code works for me with ompi 1.8.2 #include #include #include "shmem.h" #include "mpi.h" int main(int argc, char *argv[]) { MPI_Init(&argc, &argv); start_pes(0); { int version = 0; int subversion = 0; int num_proc = 0;

Re: [OMPI users] openib component not available

2014-07-24 Thread Mike Dubman
Hi, The openib btl is not compatible with "thread multiple" paradigm. You need to use mxm (lib on top of verbs) for ompi and threads. mxm is part of MOFED or you can download HPCX package (tarball of ompi + mxm) from http://mellanox.com/products/hpcx M On Thu, Jul 24, 2014 at 1:06 PM, madhurima

Re: [OMPI users] Salloc and mpirun problem

2014-07-16 Thread Mike Dubman
please add following flags to mpirun "--mca plm_base_verbose 10 --debug-daemons" and attach output. Thx On Wed, Jul 16, 2014 at 11:12 AM, Timur Ismagilov wrote: > Hello! > I have Open MPI v1.9a1r32142 and slurm 2.5.6. > > I can not use mpirun after salloc: > > $salloc -N2 --exclusive -p test -J

Re: [OMPI users] poor performance using the openib btl

2014-06-25 Thread Mike Dubman
Hi what ofed/mofed are you using? what HCA, distro and command line? M On Wed, Jun 25, 2014 at 1:40 AM, Maxime Boissonneault < maxime.boissonnea...@calculquebec.ca> wrote: > What are your threading options for OpenMPI (when it was built) ? > > I have seen OpenIB BTL completely lock when some le

Re: [OMPI users] [warn] Epoll ADD(1) on fd 0 failed

2014-06-10 Thread Mike Dubman
btw, the output comes from ompi`s libevent and not from slurm itself (sorry about confusion and thanks to Yossi for catching this) opal/mca/event/libevent2021/libevent/epoll.c: event_warn("Epoll %s(%d) on fd %d failed. Old events were %d; read change was %d (%s); write change was %d (%s)", opal/

Re: [OMPI users] OPENIB unknown transport errors

2014-06-07 Thread Mike Dubman
could you please attach output of "ibv_devinfo -v" and "ofed_info -s" Thx On Sat, Jun 7, 2014 at 12:53 AM, Tim Miller wrote: > Hi Josh, > > I asked one of our more advanced users to add the "-mca btl_openib_if_include > mlx4_0:1" argument to his job script. Unfortunately, the same error > occur

Re: [OMPI users] spml_ikrit_np random values

2014-06-06 Thread Mike Dubman
fixed here: https://svn.open-mpi.org/trac/ompi/changeset/31962 Thanks for report. On Thu, Jun 5, 2014 at 7:45 PM, Mike Dubman wrote: > seems oshmem_info uses uninitialized value. > we will check it, thanks for report. > > > On Thu, Jun 5, 2014 at 6:56 PM, Timur Ismagilov >

Re: [OMPI users] Problem with yoda component in oshmem.

2014-06-06 Thread Mike Dubman
could you please provide command line ? On Fri, Jun 6, 2014 at 10:56 AM, Timur Ismagilov wrote: > Hello! > > I am using Open MPI v1.8.1 in > example program hello_oshmem.cpp. > > When I put spml_ikrit_np = 1000 (more than 4) and run task on 4 (2,1) > nodes, I get an: > in out file: > No availa

Re: [OMPI users] spml_ikrit_np random values

2014-06-05 Thread Mike Dubman
seems oshmem_info uses an uninitialized value. we will check it, thanks for the report. On Thu, Jun 5, 2014 at 6:56 PM, Timur Ismagilov wrote: > Hello! > > I am using Open MPI v1.8.1. > > $oshmem_info -a --parsable | grep spml_ikrit_np > > mca:spml:ikrit:param:spml_ikrit_np:value:1620524368 (always n

Re: [OMPI users] Deadly warning "Epoll ADD(4) on fd 2 failed." ?

2014-05-28 Thread Mike Dubman
I think it comes from PMI API used by OMPI/SLURM. SLURM`s libpmi is trying to control stdout/stdin which is already controlled by OMPI. On Tue, May 27, 2014 at 8:31 PM, Ralph Castain wrote: > I'm unaware of any OMPI error message like that - might be caused by > something in libevent as that co

Re: [OMPI users] no ikrit component of in oshmem

2014-04-23 Thread Mike Dubman
Hi Timur, What "configure" line you used? ikrit could be compile-it if no "--with-mxm=/opt/mellanox/mxm" was provided. Can you please attach your config.log? Thanks On Wed, Apr 23, 2014 at 3:10 PM, Тимур Исмагилов wrote: > Hi! > I am trying to build openmpi 1.8 with Open SHMEM and Mellanox M

Re: [OMPI users] probable bug in 1.9a1r31409

2014-04-16 Thread Mike Dubman
Hi, I committed your patch to the trunk. thanks M On Wed, Apr 16, 2014 at 6:49 PM, Mike Dubman wrote: > +1 > looks good. > > > On Wed, Apr 16, 2014 at 4:35 PM, Åke Sandgren > wrote: > >> On 04/16/2014 02:25 PM, Åke Sandgren wrote: >> >>> Hi! >>>

Re: [OMPI users] probable bug in 1.9a1r31409

2014-04-16 Thread Mike Dubman
+1 looks good. On Wed, Apr 16, 2014 at 4:35 PM, Åke Sandgren wrote: > On 04/16/2014 02:25 PM, Åke Sandgren wrote: > >> Hi! >> >> Found this problem when building r31409 with Pathscale 5.0 >> >> pshmem_barrier.c:81:6: error: redeclaration of 'pshmem_barrier_all' must >> have the 'overloadable' at

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-14 Thread Mike Dubman
Thanks for the prompt help. Could you please resend the patch as an attachment which can be applied with the "patch" command? My mail client messes up long lines. On Fri, Feb 14, 2014 at 7:40 AM, wrote: > > > Thanks. I'm not familiar with mindist mapper. But obviously > checking for ORTE_MAPPING_BYDIST is mi

Re: [OMPI users] one more finding in openmpi-1.7.5a1

2014-02-14 Thread Mike Dubman
Hi, after this patch we get this in jenkins: *07:03:15* [vegas12.mtr.labs.mlnx:01646] [[26922,0],0] ORTE_ERROR_LOG: Not implemented in file rmaps_mindist_module.c at line 391*07:03:15* [vegas12.mtr.labs.mlnx:01646] [[26922,0],0] ORTE_ERROR_LOG: Not implemented in file base/rmaps_base_map_job.c at

Re: [OMPI users] Can't build openmpi-1.6.5 with latest FCA 2.5 release.

2014-01-31 Thread Mike Dubman
Hi, Can it be that libibmad/libibumad installed on your system belong to a previous mofed installation? Thanks M. On Jan 31, 2014 2:02 AM, "Brock Palen" wrote: > I grabbed the latest FCA release from Mellanox's website. We have been > building against FCA 2.5 for a while, but it never worked righ

Re: [OMPI users] Get your Open MPI schwag!

2013-10-25 Thread Mike Dubman
While I enjoy the > enthusiasm, I actually suspect we would get into trouble using Chuck > Norris' name without first obtaining his permission. > > On Oct 25, 2013, at 2:28 AM, Mike Dubman wrote: > > ok, so - here is a final proposal: > > front: > small OMPI logo,

Re: [OMPI users] Get your Open MPI schwag!

2013-10-25 Thread Mike Dubman
ice. > > :-) > > Damien > > > On 23/10/2013 4:26 PM, Shamis, Pavel wrote: > >> +1 for Chuck Norris >> >> Pavel (Pasha) Shamis >> --- >> Computer Science Research Group >> Computer Science and Math Division >> Oak Ridge National Labora

Re: [OMPI users] Get your Open MPI schwag!

2013-10-23 Thread Mike Dubman
maybe to add some nice/funny slogan on the front under the logo, and cool picture on the back. some of community members are still in early twenties (and counting) . :) shall we open a contest for good slogan to put? and mid-size pict to put on the back side? - living the parallel world - iO

Re: [OMPI users] Big job, InfiniBand, MPI_Alltoallv and ibv_create_qp failed

2013-07-31 Thread Mike Dubman
Hi, What OFED vendor and version do you use? Regards M On Tue, Jul 30, 2013 at 8:42 PM, Paul Kapinos wrote: > Dear Open MPI experts, > > An user at our cluster has a problem running a kinda of big job: > (- the job using 3024 processes (12 per node, 252 nodes) runs fine) > - the job using 4032 p

Re: [OMPI users] max. message size

2013-07-17 Thread Mike Dubman
do you use IB as a transport? max message size in IB/RDMA is limited to 2G, but OMPI 1.7 splits large buffers during RDMA into 2G chunks. On Wed, Jul 17, 2013 at 11:51 AM, mohammad assadsolimani < m.assadsolim...@jesus.ch> wrote: > > Dear all, > > I do my PhD in physics and use a program, whi
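The splitting behaviour described above can be illustrated with a short chunking routine. This is a Python sketch, not Open MPI's implementation: it assumes "2G" means 2 GiB, and the names CHUNK_LIMIT and split_message are made up for the example.

```python
CHUNK_LIMIT = 2 * 1024**3  # assumed meaning of the "2G" limit, in bytes


def split_message(total_bytes, limit=CHUNK_LIMIT):
    """Split a buffer of total_bytes into (offset, length) chunks.

    Every chunk is at most `limit` bytes; only the last one may be
    shorter. Illustrative only, unrelated to the real RDMA code path.
    """
    if total_bytes < 0 or limit <= 0:
        raise ValueError("total_bytes must be >= 0 and limit > 0")
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(limit, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks
```

Each (offset, length) pair describes one transfer that stays within the per-message limit.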

Re: [OMPI users] using the xrc queues

2013-07-09 Thread Mike Dubman
Hi, I would suggest use MXM (part of mofed, can be downloaded as standalone rpm from http://mellanox.com/products/mxm for ofed) It uses UD (constant memory footprint) and should provide good performance. The next MXM v2.0 will support RC and DC (reliable UD) as well. Once mxm is installed from rp

Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem

2013-06-12 Thread Mike Dubman
Also, what ofed version (ofed_info -s) and mxm version (rpm -qi mxm) do you use? On Wed, Jun 12, 2013 at 3:30 AM, Ralph Castain wrote: > Great! Would you mind showing the revised table? I'm curious as to the > relative performance. > > > On Jun 11, 2013, at 4:53 PM, eblo...@1scom.net wrote: > >

Re: [OMPI users] Using Service Levels (SLs) with OpenMPI 1.6.4 + MLNX_OFED 2.0

2013-06-11 Thread Mike Dubman
--mca btl_openib_ib_path_record_service_level 1 flag controls openib btl, you need to remove --mca mtl mxm from command line. Have you compiled OpenMPI with rhel6.4 inbox ofed driver? AFAIK, the MOFED 2.x does not have XRC and you mentioned "--enable-openib-connectx-xrc" flag in configure. O

Re: [OMPI users] OMPI 1.6.3, InfiniBand and MTL MXM; unable to make it work!

2013-01-19 Thread Mike Dubman
Also, what MOFED/OFED version do you have? MXM is compiled per OFED/MOFED version, is there match between active ofed and mxm.rpm selected? On Thu, Jan 17, 2013 at 4:09 PM, Francesco Simula < francesco.sim...@roma1.infn.it> wrote: > I tried building from OMPI 1.6.3 tarball with the following ./co

Re: [OMPI users] OMPI 1.6.3, InfiniBand and MTL MXM; unable to make it work!

2013-01-19 Thread Mike Dubman
Hi Francesco, Can you please provide complete output from ibv_devinfo -v command? Also, it seems that you have Centos 5.8 with mxm/centos5.7 installed, will check if there is a distro version incompatibilities which may cause it and update you. Alina/Josh - please follow. Regards M On Thu, Jan 1

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-03 Thread Mike Dubman
Please download http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar, it contains mxm.rpm for mofed 1.5.4.1 On Mon, Dec 3, 2012 at 8:18 AM, Mike Dubman wrote: > ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0 > will provide you a link to mxm package compiled with this

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-03 Thread Mike Dubman
ohh.. you have MOFED 1.5.4.1, thought it was 1.5.3-3.1.0 will provide you a link to mxm package compiled with this MOFED version (thanks to no ABI in OFED). On Sun, Dec 2, 2012 at 10:04 PM, Joseph Farran wrote: > 1.5.4.1

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
please redownload from http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar it contains binaries compiled with mofed 1.5.3-3.1.0 M On Sun, Dec 2, 2012 at 12:13 PM, Mike Dubman wrote: > > It seems that your active mofed is 1.5.3-3.1.0, while installed mxm was > compiled with 1.

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
\ >> --enable-openib-connectx-xrc\ >> --enable-mpi-thread-multiple\ >> --with-threads \ >> --with-hwloc\ >> --enable-heterogeneous \ >> --wi

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
a=/opt/mellanox/fca\ > --with-mxm-libdir=/opt/mellanox/mxm/lib \ > --with-mxm=/opt/mellanox/mxm\ > --prefix=/data/openmpi-1-6.3 > > Please advise, > Joseph > > > > > > > On 12/1/2012 11:39 PM, Mike Dubman wrote: > > Hi Jos

Re: [OMPI users] OpenMPI-1.6.3 & MXM

2012-12-02 Thread Mike Dubman
Hi, The mxm which is part of MOFED 1.5.3 supports OMPI 1.6.0. The mxm upgrade is needed to work with OMPI 1.6.3+ Please remove mxm from your cluster nodes (rpm -e mxm) Install latest from http://mellanox.com/products/mxm/ Compile ompi 1.6.3, add following to its configure line: ./configure --wi

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-12-02 Thread Mike Dubman
Hi Joseph, I guess you install MOFED under /usr, is that right? Could you please specify "--with-openib=/usr" parameter during ompi "configure" stage? 10x M On Fri, Nov 30, 2012 at 1:11 AM, Joseph Farran wrote: > Hi YK: > > Yes, I have those installed but they are newer versions: > > # rpm -qa |

Re: [OMPI users] CentOS 6.3 & OpenMPI 1.6.3

2012-11-28 Thread Mike Dubman
You need mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm On Wed, Nov 28, 2012 at 7:44 PM, Joseph Farran wrote: > mxm-1.1.3a5e745-1.x86_64-rhel6u3.rpm >

Re: [OMPI users] application with mxm hangs on startup

2012-08-24 Thread Mike Dubman
Hi, Could you please download latest mxm from http://www.mellanox.com/products/mxm/ and retry? The mxm version which comes with OFED 1.5.3 was tested with OMPI 1.6.0. Regards M On Wed, Aug 22, 2012 at 2:22 PM, Pavel Mezentsev wrote: > I've tried to launch the application on nodes with QDR Infini

Re: [OMPI users] ompi mca mxm version

2012-05-11 Thread Mike Dubman
17820.58 > 524288 4604.16 8781.74 > 1048576 4635.51 4420.77 > 2097152 3575.17 1704.78 > 4194304 2828.19674.29 > > Thanks! > > -[dg] > > Derek Gerstmann, PhD Student &

Re: [OMPI users] ompi mca mxm version

2012-05-09 Thread Mike Dubman
you need latest OMPI 1.6.x and latest MXM ( ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar) On Wed, May 9, 2012 at 6:02 AM, Derek Gerstmann wrote: > What versions of OpenMPI and the Mellanox MXM libraries have been tested > and verified to work? > > We are currently trying to build Open

Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-26 Thread Mike Dubman
so far did not happen yet - will report if it does. On Tue, Jan 24, 2012 at 5:10 PM, Jeff Squyres wrote: > Ralph's fix has now been committed to the v1.5 trunk (yesterday). > > Did that fix it? > > > On Jan 22, 2012, at 3:40 PM, Mike Dubman wrote: > > > it

Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-22 Thread Mike Dubman
it was compiled with the same ompi. We see it occasionally on different clusters with different ompi folders. (all v1.5) On Thu, Jan 19, 2012 at 5:44 PM, Ralph Castain wrote: > I didn't commit anything to the v1.5 branch yesterday - just the trunk. > > As I told Mike off-list, I think it may hav

Re: [OMPI users] Possible bug in finalize, OpenMPI v1.5, head revision

2012-01-17 Thread Mike Dubman
It happens for us on RHEL 6.0 On Tue, Jan 17, 2012 at 3:46 AM, Ralph Castain wrote: > Well, I'm afraid I can't replicate your report. It runs fine for me. > > Sent from my iPad > > On Jan 16, 2012, at 4:25 PM, Ralph Castain wrote: > > > Hprobably a bug. I haven't tested that branch yet.

Re: [OMPI users] MPI_Send doesn't work if the data >= 2GB

2010-12-06 Thread Mike Dubman
Hi, What interconnect and command line do you use? For InfiniBand openib component there is a known issue with large transfers (2GB) https://svn.open-mpi.org/trac/ompi/ticket/2623 try disabling memory pinning: http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned regards

Re: [OMPI users] openib issues

2010-08-10 Thread Mike Dubman
Hey Eloi, What HCA card do you have ? Can you post code/instructions howto reproduce it? 10x Mike On Mon, Aug 9, 2010 at 5:22 PM, Eloi Gaudry wrote: > Hi, > > Could someone have a look on these two different error messages ? I'd like > to know the reason(s) why they were displayed and their act

[OMPI users] Error: system limit exceeded on number of pipes that can be open

2009-08-11 Thread Mike Dubman
Hello guys, When executing following command with mtt and ompi 1.3.3: mpirun --host witch15,witch15,witch15,witch15,witch16,witch16,witch16,witch16,witch17,witch17,witch17,witch17,witch18,witch18,witch18,witch18,witch19,witch19,witch19,witch19 -np 20 --mca btl_openib_use_srq 1 --mca btl self

Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??

2009-07-16 Thread Mike Dubman
Hello Ralph, It seems that Option2 is preferred, because it is more intuitive for end-user to create rankfile for mpi job, which is described by -app cmd line. All hosts definitions used inside -app , will be treated like a single global hostlist combined from all hosts appearing inside "-app fil

Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-22 Thread Mike Dubman
using 2 HCAs on the same PCI-Exp bus (as well as 2 ports from the same HCA) will not improve performance, PCI-Exp is the bottleneck. On Mon, Oct 20, 2008 at 2:28 AM, Mostyn Lewis wrote: > Well, here's what I see with the IMB PingPong test using two ConnectX DDR > cards > in each of 2 machines.
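The bottleneck argument above amounts to taking a minimum. This is a deliberately simple Python model under stated assumptions: all rates are in the same unit, the shared bus has one fixed ceiling, and the function name effective_bandwidth is hypothetical. It ignores protocol overhead and is not a measurement.

```python
def effective_bandwidth(hca_rates, bus_limit):
    """Estimate aggregate throughput of HCAs sharing one host bus.

    hca_rates: per-HCA (or per-port) peak rates, same unit as bus_limit
    bus_limit: peak rate of the shared PCI-Express path
    Simplified model: the total can never exceed the shared bus.
    """
    if bus_limit < 0 or any(r < 0 for r in hca_rates):
        raise ValueError("rates must be non-negative")
    return min(sum(hca_rates), bus_limit)
```

In this model, adding a second HCA behind a bus that one HCA already saturates leaves the result unchanged.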