Re: [OMPI users] Issues with different IB adapters and openmpi 2.0.2

2017-02-27 Thread Howard Pritchard
Hi Orion, Does the problem occur if you only use font2 and font3? Do you have MXM installed on the font1 node? The 2.x series is using PMIx, and it could be that this is impacting the PML sanity check. Howard. Orion Poplawski wrote on Mon., Feb. 27, 2017 at 14:50: > We have a

[OMPI users] Issues with different IB adapters and openmpi 2.0.2

2017-02-27 Thread Orion Poplawski
We have a couple nodes with different IB adapters in them:
font1/var/log/lspci:03:00.0 InfiniBand [0c06]: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] (rev 20)
font2/var/log/lspci:03:00.0 InfiniBand [0c06]: QLogic Corp. IBA7220 InfiniBand HCA [1077:7220] (rev 02)

Re: [OMPI users] Fwd: [OMPI USERS] Fault Tolerance and migration

2017-02-27 Thread George Bosilca
Alberto, in master there is no such support (we had support for migration a while back, but we have since stripped it out). However, at UTK we developed a fork of Open MPI, called ULFM, which provides fault management capabilities. This fork provides support to detect failures, and support for

[OMPI users] Does MPI_Iallreduce work with CUDA-Aware in OpenMPI-2.0.2?

2017-02-27 Thread Junjie Qian
Hi list, I would like to know if MPI_Iallreduce is supported with CUDA-awareness in OpenMPI-2.0.2. The page https://www.open-mpi.org/faq/?category=runcuda, updated on 06/2016, says it was not supported until openmpi-1.8.5. Any updates on this? Thank you, Junjie Qian

Re: [OMPI users] fatal error with openmpi-2.1.0rc1 on Linux with Sun C

2017-02-27 Thread Josh Hursey
Drat! Thanks for letting us know. That fix was missed when we swept through to create the PMIx v1.2.1 release, which triggered the OMPI v2.1.0rc1. Sorry about that :( Jeff filed an issue to track this here: https://github.com/open-mpi/ompi/issues/3048 I've filed a PR against PMIx to bring it into the

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 9:39 AM, Reuti wrote: > > >> On Feb. 27, 2017, at 18:24, Angel de Vicente wrote: >> >> […] >> >> For a small group of users if the DVM can run with my user and there is >> no restriction on who can use it or if I somehow can

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi, Reuti writes: > At first I thought you want to run a queuing system inside a queuing > system, but this looks like you want to replace the resource manager. yes, if this could work reasonably well, we could do without the resource manager. > Under which user

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Reuti
> On Feb. 27, 2017, at 18:24, Angel de Vicente wrote: > > […] > > For a small group of users if the DVM can run with my user and there is > no restriction on who can use it or if I somehow can authorize others to > use it (via an authority file or similar) that should be enough.

[OMPI users] Fwd: [OMPI USERS] Fault Tolerance and migration

2017-02-27 Thread Alberto Ortiz
Hi, I am interested in using OpenMPI to manage the distribution on a MicroZed cluster. These MicroZed boards come with a Zynq device, which has a dual-core ARM Cortex-A9. One of the objectives of the project I am working on is resilience, so I am truly interested in the fault tolerance provided by

[OMPI users] fatal error with openmpi-2.1.0rc1 on Linux with Sun C

2017-02-27 Thread Siegmar Gross
Hi, I tried to install openmpi-2.1.0rc1 on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.14. Unfortunately, "make" breaks with the following error. I had reported the same problem for openmpi-master-201702150209-404fe32. Gilles was able to solve the problem

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Reuti
Hi, > On Feb. 27, 2017, at 14:33, Angel de Vicente wrote: > > Hi, > > "r...@open-mpi.org" writes: >>> With the DVM, is it possible to keep these jobs in some sort of queue, >>> so that they will be executed when the cores get free? >> >> It wouldn’t be hard to

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi, "r...@open-mpi.org" writes: >> With the DVM, is it possible to keep these jobs in some sort of queue, >> so that they will be executed when the cores get free? > > It wouldn’t be hard to do so - as long as it was just a simple FIFO > scheduler. I wouldn’t want it to get
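The "simple FIFO scheduler" floated in this exchange can be sketched roughly as follows. This is only an illustration of the idea, not ORTE code; the class and method names are invented for the example, and a real DVM scheduler would track actual job handles rather than strings:

```python
from collections import deque

class FifoScheduler:
    """Minimal FIFO job queue over a fixed number of slots (cores)."""

    def __init__(self, slots):
        self.free = slots          # currently free slots
        self.pending = deque()     # waiting jobs: (name, slots_needed)
        self.running = []          # running jobs: (name, slots_needed)

    def submit(self, name, slots_needed):
        self.pending.append((name, slots_needed))
        self._dispatch()

    def job_finished(self, name):
        for job in self.running:
            if job[0] == name:
                self.running.remove(job)
                self.free += job[1]    # return the job's slots to the pool
                break
        self._dispatch()

    def _dispatch(self):
        # Strict FIFO: stop at the first queued job that does not fit yet,
        # even if a later (smaller) job would fit.
        while self.pending and self.pending[0][1] <= self.free:
            name, n = self.pending.popleft()
            self.free -= n
            self.running.append((name, n))

sched = FifoScheduler(slots=4)
sched.submit("a", 3)
sched.submit("b", 2)                      # must wait: only 1 slot free
print([j[0] for j in sched.running])      # ['a']
sched.job_finished("a")
print([j[0] for j in sched.running])      # ['b']
```

The strict head-of-queue check is what keeps the scheduler "just a simple FIFO": jobs never overtake one another, which avoids the backfilling and priority logic of a full resource manager.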

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org
> On Feb 27, 2017, at 4:58 AM, Angel de Vicente wrote: > > Hi, > > "r...@open-mpi.org" writes: >> You might want to try using the DVM (distributed virtual machine) >> mode in ORTE. You can start it on an allocation using the “orte-dvm” >> cmd, and then submit

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread Angel de Vicente
Hi, "r...@open-mpi.org" writes: > You might want to try using the DVM (distributed virtual machine) > mode in ORTE. You can start it on an allocation using the “orte-dvm” > cmd, and then submit jobs to it with “mpirun --hnp ”, where foo > is either the contact info printed out
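Put together, the DVM workflow described in this thread looks roughly like the sketch below. This is a usage sketch only: the URI file name and job binaries are placeholders, and exact flags may differ between ORTE versions, so check `orte-dvm --help` on your installation:

```shell
# Start the distributed virtual machine once on the allocation, writing
# its contact info to a file (the file name here is just an example):
orte-dvm --report-uri dvm_uri.txt &

# Submit jobs to the already-running DVM instead of launching a new
# ORTE instance for each run:
mpirun --hnp file:dvm_uri.txt -np 4 ./my_job_1
mpirun --hnp file:dvm_uri.txt -np 4 ./my_job_2
```

The point of the DVM is that the daemons stay up between jobs, so per-job launch overhead drops to roughly the cost of wiring up the new processes.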