Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-12 Thread Nate Chambers
I appreciate you trying to help! I put the Java and its compiled .class file on Dropbox. The directory contains the .java and .class files, as well as a data/ directory: http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0 You can run it with and without MPI: > java
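
For reference, a minimal sketch of the two invocations, assuming a hypothetical main class MPITestBroke and omitting any classpath flags the program may need:

    java MPITestBroke                 # plain JVM run, no MPI (class name is a placeholder)
    mpirun -np 2 java MPITestBroke    # same class launched through Open MPI's Java bindings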

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
Do you have "--disable-dlopen" in your configure options? This might force coll_ml to be loaded first even with -mca coll ^ml. The next HPCX release is expected by the end of August. -Devendar On Wed, Aug 12, 2015 at 3:30 PM, David Shrader wrote: > I remember seeing those, but forgot
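
A quick way to check whether a build was configured with --disable-dlopen is to look at the configure line recorded in the installation; a minimal sketch, assuming that installation's ompi_info is on the PATH:

    # print the configure command line the build recorded
    ompi_info --all | grep -i "configure command line"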

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
I remember seeing those, but forgot about them. I am curious, though, why using '-mca coll ^ml' wouldn't work for me. We'll watch for the next HPCX release. Is there an ETA on when that release may happen? Thank you for the help! David On 08/12/2015 04:04 PM, Deva wrote: David, This is

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
David, This is because hcoll's symbols conflict with the ml coll module inside OMPI. HCOLL is derived from the ml module. This issue is fixed in the hcoll library and will be available in the next HPCX release. Some earlier discussion on this issue:

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-12 Thread Howard Pritchard
Hi Nate, Sorry for the delay in getting back to you. We're somewhat stuck on how to help you, but here are two suggestions. Could you add the following to your launch command line: --mca odls_base_verbose 100, so we can see exactly what arguments are being fed to java when launching your app.
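
A sketch of such a launch line, with a hypothetical main class standing in for the real one:

    mpirun -np 2 --mca odls_base_verbose 100 java YourMainClass   # class name is a placeholder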

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
The admin who rolled the hcoll RPM that we're using (and got it into system space) said that she got it from hpcx-v1.3.336-gcc-OFED-1.5.4.1-redhat6.6-x86_64.tar. Thanks, David On 08/12/2015 10:51 AM, Deva wrote: From where did you grab this HCOLL lib? MOFED or HPCX? what version? On Wed,

Re: [OMPI users] Problem in using openmpi-1.8.7

2015-08-12 Thread Jeff Squyres (jsquyres)
This is likely because you installed Open MPI 1.8.7 into the same directory as a prior Open MPI installation. You probably want to uninstall the old version first (e.g., run "make uninstall" from the old version's build tree), or just install 1.8.7 into a new tree. > On Aug 11, 2015, at
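
A sketch of both options, with hypothetical paths:

    # option 1: remove the old installation from its original build tree
    cd /path/to/old-openmpi-build && make uninstall

    # option 2: install 1.8.7 into a fresh, separate prefix instead
    ./configure --prefix=/opt/openmpi-1.8.7 && make && make install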

Re: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's

2015-08-12 Thread Rolf vandeVaart
Hi Geoff: Our original implementation used cuMemcpy for copying GPU memory into and out of host memory. However, what we learned is that cuMemcpy causes a synchronization of all work on the GPU. This means that one could not effectively overlap running a kernel with doing communication.

Re: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's

2015-08-12 Thread Geoffrey Paulsen
I'm confused why this application needs an asynchronous cuMemcpyAsync() in a blocking MPI call. Rolf, could you please explain? And how is a call to cuMemcpyAsync() followed by a synchronization any different from a cuMemcpy() in this use case? I would still expect that if the MPI_Send / Recv

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
From where did you grab this HCOLL lib? MOFED or HPCX? What version? On Wed, Aug 12, 2015 at 9:47 AM, David Shrader wrote: > Hey Devendar, > > It looks like I still get the error: > > [dshrader@zo-fe1 tests]$ mpirun -n 2 -mca coll ^ml ./a.out > App launch reported: 1 (out

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
Hey Devendar, It looks like I still get the error: [dshrader@zo-fe1 tests]$ mpirun -n 2 -mca coll ^ml ./a.out App launch reported: 1 (out of 1) daemons - 2 (out of 2) procs [1439397957.351764] [zo-fe1:14678:0] shm.c:65 MXM WARN Could not open the KNEM device file at

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
Hi David, This issue is from the hcoll library. This could be because of a symbol conflict with the ml module. This was fixed recently in HCOLL. Can you try "-mca coll ^ml" and see if this workaround works in your setup? -Devendar On Wed, Aug 12, 2015 at 9:30 AM, David Shrader

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
Hello Gilles, Thank you very much for the patch! It is much more complete than mine. Using that patch and re-running autogen.pl, I am able to build 1.8.8 with './configure --with-hcoll' without errors. I do have issues when it comes to running 1.8.8 with hcoll built in, however. In my quick
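
The rebuild sequence described above, sketched from a developer (git) tree with a hypothetical patch file name:

    patch -p1 < hcoll_config.patch   # hypothetical file name
    ./autogen.pl
    ./configure --with-hcoll
    make && make install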

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-12 Thread Gilles Gouaillardet
Basically, without --hetero-nodes, OMPI assumes all nodes have the same topology (fast startup). With --hetero-nodes, OMPI does not assume anything and requests each node's topology (slower startup). I am not sure if this is still 100% true on all versions. IIRC, at least on master, a hwloc signature is
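
A sketch of requesting per-node topology discovery on a mixed cluster:

    mpirun --hetero-nodes -np 16 ./a.out   # ./a.out stands in for the real program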

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-12 Thread Dave Love
"Lane, William" writes: > I can successfully run my OpenMPI 1.8.7 jobs outside of Son-of-Gridengine but > not via qrsh. We're > using CentOS 6.3 and a heterogeneous cluster of hyperthreaded and > non-hyperthreaded blades > and x3550 chassis. OpenMPI 1.8.7 has been built

Re: [OMPI users] What Red Hat Enterprise/CentOS NUMA libraries are recommended/required for OpenMPI?

2015-08-12 Thread Dave Love
"Jeff Squyres (jsquyres)" writes: > I think Dave's point is that numactl-devel (and numactl) is only needed for > *building* Open MPI. Users only need numactl to *run* Open MPI. Yes. However, I guess the basic problem is that the component fails to load for want of

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Gilles Gouaillardet
Thanks David, I made a PR for the v1.8 branch at https://github.com/open-mpi/ompi-release/pull/492; the patch is attached (it required some back-porting). Cheers, Gilles On 8/12/2015 4:01 AM, David Shrader wrote: I have cloned Gilles' topic/hcoll_config branch and, after running autogen.pl,