Re: [OMPI users] fatal error: openmpi-v2.x-dev-415-g5c9b192 and openmpi-dev-2696-gd579a07

2015-10-14 Thread Gilles Gouaillardet
ming you are using a bash shell, you can simply do CPPFLAGS="" configure ... instead of configure ... CPPFLAGS="" Cheers, Gilles On 10/7/2015 4:42 PM, Siegmar Gross wrote: Hi, I tried to build openmpi-v2.x-dev-415-g5c9b192 and openmpi-dev-2696-gd579a07 on my machines (Solaris 1

[OMPI users] python, mpi and shell subprocess: orte_error_log

2015-10-08 Thread Gilles Gouaillardet
such as clustershell. it is written in python, and an api is likely available. Cheers, Gilles On Thursday, October 8, 2015, simona bellavista > wrote: > > > 2015-10-07 14:59 GMT+02:00 Lisandro Dalcin : > >> On 7 October 2015 at 14:54, simona bellavista wrote: >> >

Re: [OMPI users] OpenMPI 1.8.8: Segfault when using non-blocking reduce operations with a user-defined operator

2015-10-07 Thread Gilles Gouaillardet
Georg, there won't be a 1.8.9 Cheers, Gilles On Wednesday, October 7, 2015, Georg Geiser wrote: > Nathan, > > thanks for your rapid response. Do you consider to release 1.8.9? > Actually, there is a bug tracking category for that version number. If so, > please backpo

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-07 Thread Gilles Gouaillardet
Jeff, there are quite a lot of changes, I did not update master yet (need extra pairs of eyes to review this...) so unless you want to make rc2 today and rc3 a week later, it is imho way safer to wait for v1.10.2 Ralph, any thoughts ? Cheers, Gilles On Wednesday, October 7, 2015, Jeff Squyres

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-07 Thread Gilles Gouaillardet
Marcin, here is a patch for the master, hopefully it fixes all the issues we discussed i will make sure it applies fine vs latest 1.10 tarball from tomorrow Cheers, Gilles On 10/6/2015 7:22 PM, marcin.krotkiewski wrote: Gilles, Yes, it seemed that all was fine with binding in the patched

Re: [OMPI users] MPI_CART_CREATE no matching specific subroutine

2015-10-06 Thread Gilles Gouaillardet
Hector, numprocs and .false. are scalars and MPI_Cart_create expects one dimension array. can you fix this and try again ? Cheers, Gilles On Wednesday, October 7, 2015, Hector E Barrios Molano wrote: > Hi Open MPI Experts! > > I'm using OpenMPI v1.10.0 and get this er
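
The thread above is in Fortran; here is a minimal C sketch of the same fix, with dims and periods passed as one-element arrays rather than scalars (the 1-D decomposition and variable names are only illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int numprocs;
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        /* dims and periods must be arrays with one entry per dimension,
         * not scalars -- here a 1-D, non-periodic decomposition */
        int dims[1]    = { numprocs };
        int periods[1] = { 0 };

        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);

        int rank;
        MPI_Comm_rank(cart, &rank);
        printf("rank %d in the cartesian communicator\n", rank);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }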

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-06 Thread Gilles Gouaillardet
the required mapping policy I will finalize them tomorrow hopefully Cheers, Gilles On Tuesday, October 6, 2015, marcin.krotkiewski < marcin.krotkiew...@gmail.com> wrote: > Hi, Gilles > > you mentioned you had one failure with 1.10.1rc1 and -bind-to core > could you please sen

Re: [OMPI users] Hybrid OpenMPI+OpenMP tasks using SLURM

2015-10-06 Thread Gilles Gouaillardet
--ntasks=2 --cpus-per-task=4 --cpu_bind=core,verbose -l grep Cpus_allowed_list /proc/self/status Cheers, Gilles On 10/6/2015 4:38 AM, marcin.krotkiewski wrote: Yet another question about cpu binding under SLURM environment.. Short version: will OpenMPI support SLURM_CPUS_PER_TASK for the

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Gilles Gouaillardet
, Gilles On Monday, October 5, 2015, marcin.krotkiewski wrote: > > I have applied the patch to both 1.10.0 and 1.10.1rc1. For 1.10.0 it did > not help - I am not sure how much (if) you want pursue this. > > For 1.10.1rc1 I was so far unable to reproduce any binding problems with

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Gilles Gouaillardet
Ralph and Marcin, here is a proof of concept for a fix (assert should be replaced with proper error handling) for v1.10 branch. if you have any chance to test it, please let me know the results Cheers, Gilles On 10/5/2015 1:08 PM, Gilles Gouaillardet wrote: OK, i'll see what i c

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-05 Thread Gilles Gouaillardet
OK, i'll see what i can do :-) On 10/5/2015 12:39 PM, Ralph Castain wrote: I would consider that a bug, myself - if there is some resource available, we should use it On Oct 4, 2015, at 5:42 PM, Gilles Gouaillardet <mailto:gil...@rist.or.jp>> wrote: Marcin, i ran a si

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-04 Thread Gilles Gouaillardet
2 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..] Hello, world, I am 1 of 3, (Open MPI v1.10.1rc1, package: Open MPI gilles@rapid Distribution, ident: 1.10.1rc1, repo rev: v1.10.0-84-g15ae63f, Oct 03, 2015, 128) Hello, world, I am 2 of 3, (Open MPI v1.10.1rc

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-04 Thread Gilles Gouaillardet
investigate this from tomorrow Cheers, Gilles On Sunday, October 4, 2015, Ralph Castain wrote: > Thanks - please go ahead and release that allocation as I’m not going to > get to this immediately. I’ve got several hot irons in the fire right now, > and I’m not sure when I’ll get a chance

Re: [OMPI users] Process binding with SLURM and 'heterogeneous' nodes

2015-10-03 Thread Gilles Gouaillardet
--oversubscribe --hetero-nodes, mpirun should not fail, and if it still fails with v1.10.1rc1, I will ask some more details in order to fix ompi Cheers, Gilles On Saturday, October 3, 2015, Ralph Castain wrote: > Thanks Marcin. Looking at this, I’m guessing that Slurm may be treating >

Re: [OMPI users] send_request error with allocate

2015-09-30 Thread Gilles Gouaillardet
and irecv with the second element, and then waitall the array of size 2 note this is not equivalent to doing two MPI_Wait in a row, since that would be prone to deadlock Cheers, Gilles On Wednesday, September 30, 2015, Diego Avesani wrote: > Dear all, > thank for the explanation, but som
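
A minimal C sketch of the pattern being described (the original thread is in Fortran): post the isend into the first element of a request array, the irecv into the second, then complete both with a single MPI_Waitall. The buffer names and ring exchange are illustrative:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        double sendbuf = (double)rank, recvbuf = -1.0;
        MPI_Request requests[2];

        /* post both operations, then complete them together */
        MPI_Isend(&sendbuf, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &requests[0]);
        MPI_Irecv(&recvbuf, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &requests[1]);
        MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);

        printf("rank %d received %g from rank %d\n", rank, recvbuf, left);

        MPI_Finalize();
        return 0;
    }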

[OMPI users] send_request error with allocate

2015-09-30 Thread Gilles Gouaillardet
MPI_Comm_rank, that makes no sense to me, and your indexes should likely be hard coded from 1 to 3 Cheers, Gilles On Wednesday, September 30, 2015, Diego Avesani > wrote: > Dear Gilles, Dear All, > > What do you mean that the array of requests has to be initialized via > MPI_Isend or MPI_I

Re: [OMPI users] send_request error with allocate

2015-09-29 Thread Gilles Gouaillardet
actually used, then you have to initialize them with MPI_REQUEST_NULL (it might be zero on ompi, but you cannot take this for granted) Cheers, Gilles On Tuesday, September 29, 2015, Diego Avesani wrote: > dear Jeff, dear all, > I have notice that if I initialize the variables, I do not ha
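
A short C sketch of that advice, assuming a request array larger than the number of requests actually posted: initialize every slot to MPI_REQUEST_NULL so MPI_Waitall safely skips the unused ones (the array size and message are illustrative):

    #include <mpi.h>

    #define MAX_REQS 4

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* initialize every slot: MPI_Waitall ignores entries that are
         * MPI_REQUEST_NULL, so unused slots are harmless */
        MPI_Request reqs[MAX_REQS];
        for (int i = 0; i < MAX_REQS; i++)
            reqs[i] = MPI_REQUEST_NULL;

        int token = rank;
        if (size > 1) {
            if (rank == 0)
                MPI_Isend(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &reqs[0]);
            else if (rank == 1)
                MPI_Irecv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[1]);
        }

        /* only the requests that were actually posted get completed */
        MPI_Waitall(MAX_REQS, reqs, MPI_STATUSES_IGNORE);

        MPI_Finalize();
        return 0;
    }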

Re: [OMPI users] Missing pointer in MPI_Request / MPI_Ibarrier in documentation for 1.10.0

2015-09-28 Thread Gilles Gouaillardet
Harald, thanks for the clarification, i clearly missed that ! i will fix it now Cheers, Gilles On 9/28/2015 4:49 PM, Harald Servat wrote: Hello Gilles, the webpages I pointed to in the original mail, which are the official open-mpi.org ones, miss the * in the declaration of MPI_Ibarrier
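
For reference, a tiny C example of the MPI-3 signature in question, where the request argument is indeed a pointer:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* MPI_Ibarrier takes a pointer to an MPI_Request -- the '*' that
         * was missing from the man page declaration */
        MPI_Request req;
        MPI_Ibarrier(MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        printf("passed the non-blocking barrier\n");
        MPI_Finalize();
        return 0;
    }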

Re: [OMPI users] Missing pointer in MPI_Request / MPI_Ibarrier in documentation for 1.10.0

2015-09-27 Thread Gilles Gouaillardet
Harald, could you be more specific ? btw, do you check the www.open-mpi.org main site or a mirror ? the man pages look good to me, and the issue you described was fixed one month ago. Cheers, Gilles On 9/25/2015 8:07 PM, Harald Servat wrote: Dear all, I'd like to note you tha

Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0

2015-09-24 Thread Gilles Gouaillardet
Jeff, I am not sure whether you made a typo or not ... the issue only occurs with the f90 bindings (aka use mpi); the f08 bindings (aka use mpi_f08) work fine Cheers, Gilles On Thursday, September 24, 2015, Jeff Squyres (jsquyres) wrote: > I looked into the MPI_BCAST problem -- I think we (Open

Re: [OMPI users] Problem with implementation of Foxa algorithm

2015-09-23 Thread Gilles Gouaillardet
int root, MPI_Comm comm) so you have recvbuf = 0 (!) recvcount = tmpVar[i*matrixSize] i guess you meant to have recvcount = blockSize that being said, tmpVar[i*matrixSize] is an int and it should likely be a double * Cheers, Gilles On 9/24/2015 8:13 AM, Surivinta Surivinta wrote:
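
Assuming the collective in question is MPI_Scatter, a hedged C sketch of a well-formed call: recvbuf is a real buffer (not 0) and recvcount is the per-rank block size. blockSize and the buffer names are illustrative:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int blockSize = 4;                                       /* illustrative */
        double *recvbuf = malloc(blockSize * sizeof(double));    /* a real buffer */
        double *sendbuf = NULL;

        if (rank == 0)
            sendbuf = calloc((size_t)size * blockSize, sizeof(double));

        /* recvcount is the number of elements each rank receives (blockSize),
         * not an element pulled out of the matrix */
        MPI_Scatter(sendbuf, blockSize, MPI_DOUBLE,
                    recvbuf, blockSize, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);

        free(recvbuf);
        if (rank == 0) free(sendbuf);
        MPI_Finalize();
        return 0;
    }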

Re: [OMPI users] Problem using Open MPI 1.10.0 built with Intel compilers 16.0.0

2015-09-23 Thread Gilles Gouaillardet
program compiles and runs just fine if you use mpi_f08 module (!) Cheers, Gilles On 9/24/2015 1:00 AM, Fabrice Roy wrote: program testmpi use mpi implicit none integer :: pid integer :: ierr integer :: tok call mpi_init(ierr) call mpi_comm_rank(mpi_comm_world

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-21 Thread Gilles Gouaillardet
remote start orted) instead of this silent behaviour Cheers, Gilles On Mon, Sep 21, 2015 at 11:43 PM, Patrick Begou wrote: > Hi Gilles, > > I've done a big mistake! Compiling the patched version of openMPI and > creating a new module, I've forgotten to add the path to oar

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-18 Thread Gilles Gouaillardet
r, there is a reachable framework. Could/should the tcp btl simply use it ? Cheers, Gilles On Saturday, September 19, 2015, Jeff Squyres (jsquyres) wrote: > Open MPI uses different heuristics depending on whether IP addresses are > public or private. > > All your IP addresses are

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Gilles Gouaillardet
you use a machine file (frog.txt) instead of using $OAR_NODEFILE directly ? /* not to mention I am surprised a French supercomputer is called "frog" ;-) */ Cheers, Gilles On Friday, September 18, 2015, Patrick Begou < patrick.be...@legi.grenoble-inp.fr> wrote: > Gilles

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Gilles Gouaillardet
of ssh) my concern is the remote orted might not run within the cpuset that was created by OAR for this job, so you might end up using all the cores on the remote nodes. please let us know how that works for you Cheers, Gilles On 9/18/2015 5:02 PM, Gilles Gouaillardet wrote: Patrick, i

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-18 Thread Gilles Gouaillardet
Patrick, i just filed PR 586 https://github.com/open-mpi/ompi-release/pull/586 for the v1.10 series this is only a three line patch. could you please give it a try ? Cheers, Gilles On 9/18/2015 4:54 PM, Patrick Begou wrote: Ralph Castain wrote: As I said, if you don’t provide an explicit

Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-17 Thread Gilles Gouaillardet
same network for example, on node 1, you can run route add -host 23.0.0.2 gw 12.0.0.2 route add -host 23.0.0.3 gw 13.0.0.3 Cheers, Gilles On 9/18/2015 1:31 AM, Shang Li wrote: Hi all, I wanted to setup a 3-node ring network, each connects to the other 2 using 2 Ethernet ports directly without a

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-17 Thread Gilles Gouaillardet
invocation is correct ? another way to fix this could be to always set opal_hwloc_base_cpu_set Cheers, Gilles On 9/16/2015 11:47 PM, Ralph Castain wrote: As I said, if you don’t provide an explicit slot count in your hostfile, we default to allowing oversubscription. We don’t have OAR integration

Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-15 Thread Gilles Gouaillardet
pi or hwloc, or ompi should ask the correct info to hwloc if it is available in hwloc. makes sense ? Brice, can you please comment on hwloc and cpuset ? Cheers, Gilles On Wednesday, September 16, 2015, Ralph Castain wrote: > Not precisely correct. It depends on the environment. >

Re: [OMPI users] OpenMPI 1.10.0 and old SUSE SLES 11 SP3

2015-09-14 Thread Gilles Gouaillardet
--disable-shared that being said, i am not sure you do need --enable-static. are mxm/fca/hcoll in your $LD_LIBRARY_PATH or ld.so.conf ? if not, i recommend you set your LD_LIBRARY_PATH *before* you configure and make openmpi Cheers, Gilles On Tue, Sep 15, 2015 at 8:14 AM, Filippo Spiga wrote

Re: [OMPI users] Package mpi does not exist

2015-09-14 Thread Gilles Gouaillardet
please use mpijavac instead of javac this will automagically set the classpath with the ompi java libraries. if there is no javac, it is likely you did not configure ompi with --enable-mpi-java On Monday, September 14, 2015, Ibrahim Ikhlawi wrote: > > Hi, > > I am beginner in OpenMPI. I want to

Re: [OMPI users] Help with Specific Binding

2015-09-13 Thread Gilles Gouaillardet
on linux, you can look at /proc/self/status and search Cpus_allowed_list, or you can use the sched_getaffinity system call. note that in some (hopefully rare) cases, this will return different results than hwloc On Sunday, September 13, 2015, Saliya Ekanayake wrote: > Thank you, I'll try this. Als
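
A small self-contained C example of the sched_getaffinity route: it prints the cores the current process is allowed to run on. This is Linux-specific and needs _GNU_SOURCE:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);

        /* query the affinity mask of the calling process (pid 0 == self) */
        if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_getaffinity");
            return 1;
        }

        printf("pid %d is allowed on cores:", (int)getpid());
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
            if (CPU_ISSET(cpu, &mask))
                printf(" %d", cpu);
        printf("\n");
        return 0;
    }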

Re: [OMPI users] openmpi-v1.10.0-5-ge0b85ea: problem with Java

2015-09-08 Thread Gilles Gouaillardet
Thanks Siegmar, at first glance, I suspect String.valueOf(buffer) buffer is 256 chars, but the message you really want to print is only the first num chars. I will double check tomorrow, in the mean time, feel free to revamp the test and see if it works better Cheers, Gilles On Tuesday

Re: [OMPI users] openmpi-v1.10.0-5-ge0b85ea: problem with Java

2015-09-08 Thread Gilles Gouaillardet
Siegmar, can you post your test program ? did you try to run the very same test with ompi configure'd without --enable-heterogeneous ? did this help ? can you reproduce the crash with the v2.x series ? Cheers, Gilles On Tuesday, September 8, 2015, Siegmar Gross < siegmar.gr...@infor

Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)

2015-09-03 Thread Gilles Gouaillardet
Diego, did you update your code to check all MPI calls are successful ? (e.g. test ierr is MPI_SUCCESS after each MPI call) can you write a short program that reproduces the same issue ? if not, is your program and input data publicly available ? Cheers, Gilles On Thursday, September 3, 2015
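
The thread is in Fortran (checking ierr after each call); the equivalent idea in C is to switch the communicator's error handler to MPI_ERRORS_RETURN and then test every return code, as in this sketch:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* by default MPI aborts on error; switch to returning error codes
         * so they can actually be checked */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int rank;
        int rc = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "MPI_Comm_rank failed: %s\n", msg);
            MPI_Abort(MPI_COMM_WORLD, rc);
        }

        MPI_Finalize();
        return 0;
    }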

Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)

2015-09-03 Thread Gilles Gouaillardet
x intelmpi and openmpi, or use openmpi with a lib built with intelmpi. Cheers, Gilles On Thursday, September 3, 2015, Diego Avesani wrote: > Dear Jeff, Dear all, > I normally use "USE MPI" > > This is the answer from the intel HPC forum: > > *If you are switching between intel an

Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)

2015-09-02 Thread Gilles Gouaillardet
recommend you first do this, so you can catch the error as soon it happens, and hopefully understand why it occurs. Cheers, Gilles On Wednesday, September 2, 2015, Diego Avesani wrote: > Dear all, > > I have notice small difference between OPEN-MPI and intel MPI. > For example in M

Re: [OMPI users] OpenMPI optimizations for intra-node process communication

2015-09-01 Thread Gilles Gouaillardet
are used in production. Cheers, Gilles On 9/1/2015 1:57 PM, Saliya Ekanayake wrote: One more question. I found this blog from Jeff [1] on vader and I got the impression that it's used only for peer-to-peer communications and not for collectives. Is this true or did I misunderstand? [1]

Re: [OMPI users] OpenMPI optimizations for intra-node process communication

2015-09-01 Thread Gilles Gouaillardet
understand what the vader btl can do and how Cheers, Gilles On 9/1/2015 1:28 PM, Saliya Ekanayake wrote: Thank you Gilles. Is there some documentation on vader btl and how I can check which (sm or vader) is being used? On Tue, Sep 1, 2015 at 12:18 AM, Gilles Gouaillardet mailto:gil

Re: [OMPI users] OpenMPI optimizations for intra-node process communication

2015-09-01 Thread Gilles Gouaillardet
more optimized configurations. Cheers, Gilles On 9/1/2015 5:59 AM, Saliya Ekanayake wrote: Hi, Just trying to see if there are any optimizations (or options) in OpenMPI to improve communication between intra node processes. For example do they use something like shared memory? Thank you

Re: [OMPI users] MPI_LB in a recursive type

2015-08-28 Thread Gilles Gouaillardet
Roy, you can create your type without MPI_UB nor MPI_LB, and then use MPI_Type_create_resized to set lower bound and extent (note this sets extent and not upper bound) Cheers, Gilles On Saturday, August 29, 2015, Roy Stogner wrote: > > From: George Bosilca >> >> First an
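
A minimal C sketch of the suggested replacement for MPI_UB/MPI_LB: build the type, then set its lower bound and extent with MPI_Type_create_resized (the block and stride sizes are illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* a contiguous block of 3 doubles ... */
        MPI_Datatype block, padded;
        MPI_Type_contiguous(3, MPI_DOUBLE, &block);

        /* ... resized so consecutive elements are 5 doubles apart:
         * lower bound 0, extent 5*sizeof(double).  The extent (not an
         * upper bound) is what MPI_Type_create_resized sets. */
        MPI_Type_create_resized(block, 0, 5 * sizeof(double), &padded);
        MPI_Type_commit(&padded);

        MPI_Aint lb, extent;
        MPI_Type_get_extent(padded, &lb, &extent);
        printf("lb = %ld, extent = %ld\n", (long)lb, (long)extent);

        MPI_Type_free(&padded);
        MPI_Type_free(&block);
        MPI_Finalize();
        return 0;
    }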

Re: [OMPI users] ssh: Could not resolve hostname xxxx: Name or service not known (v1.8+)

2015-08-26 Thread Gilles Gouaillardet
tweaking with hostnames, so ssh node0198 ... really do ssh node0198.mako0 ... under the hood Cheers, Gilles On 8/27/2015 8:08 AM, Yong Qin wrote: Yes all cross-node ssh works perfectly and this is our production system which have been running for years. I've done all of these testing an

Re: [OMPI users] ssh: Could not resolve hostname xxxx: Name or service not known (v1.8+)

2015-08-26 Thread Gilles Gouaillardet
both work ? per your log, mpirun might remove the domain name from the ssh command under the hood. e.g. ssh n0189.mako0 ssh n0198 echo ok or ssh n0198 ssh n0198.mako0 echo ok if that is the case, then I have no idea why we are doing this ... Cheers, Gilles On Thursday, August 27, 2015, Yong

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-14 Thread Gilles Gouaillardet
here. do you configure with --disable-dlopen on hopper ? I wonder whether --mca mtl ^psm is effective if dlopen is disabled Cheers, Gilles On Saturday, August 15, 2015, Howard Pritchard wrote: > Hi Jeff, > > I don't know why Gilles keeps picking on the persistent request proble

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-14 Thread Gilles Gouaillardet
data/ please run mpirun --mca mtl ^psm -np 1 java MPITestBroke data/ that solved the issue for me Cheers, Gilles On 8/13/2015 9:19 AM, Nate Chambers wrote: *I appreciate you trying to help! I put the Java and its compiled .class file on Dropbox. The directory contains the .java and .class files

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread Gilles Gouaillardet
David, i guess you do not want to use the ml coll module at all in openmpi 1.8.8 you can simply do touch ompi/mca/coll/ml/.ompi_ignore ./autogen.pl ./configure ... make && make install so the ml component is not even built Cheers, Gilles On 8/13/2015 7:30 AM, David Shrader

Re: [OMPI users] open mpi upgrade

2015-08-13 Thread Gilles Gouaillardet
irectory ? Cheers, Gilles On 8/13/2015 2:29 PM, Ehsan Moradi wrote: hi, my dear friends i tried to upgrade my openmpi version from 1.2.8 to 1.8.8 but after installing it on different directory "/opt/openmpi-1.8.8/" when i enter mpirun its version is 1.2.8 and after installing t

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-12 Thread Gilles Gouaillardet
checked and ompi transparently falls back to --hetero-nodes if needed. bottom line, on a heterogeneous cluster, it is required or safer to use the --hetero-nodes option Cheers, Gilles On Wednesday, August 12, 2015, Dave Love wrote: > "Lane, William" > writes: > > >

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Gilles Gouaillardet
Thanks David, i made a PR for the v1.8 branch at https://github.com/open-mpi/ompi-release/pull/492 the patch is attached (it required some back-porting) Cheers, Gilles On 8/12/2015 4:01 AM, David Shrader wrote: I have cloned Gilles' topic/hcoll_config branch and, after running autog

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-11 Thread Gilles Gouaillardet
nvoke autogen.pl) Cheers, Gilles On 8/11/2015 2:39 PM, Åke Sandgren wrote: Please fix the hcoll test (and code) to be correct. Any configure test that adds /usr/lib and/or /usr/include to any compile flags is broken. And if hcoll include files are under $HCOLL_HOME/include/hcoll (and hcol

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-10 Thread Gilles Gouaillardet
evamp the hcoll detection (e.g. configure --with-hcoll) but you might need to manually set CPPFLAGS='-I/usr/include/hcoll -I/usr/include/hcoll/api' if not, i guess i will simply update the configure help message ... Cheers, Gilles On 8/11/2015 7:39 AM, David Shrader wrote: H

Re: [OMPI users] Mumps Parallel version hanging with OpenMPI 1.8.1

2015-08-08 Thread Gilles Gouaillardet
on your report, we might recommend some tuning for the tuned module (as you can guess, the basic coll module is not optimized) Cheers, Gilles On Saturday, August 8, 2015, Ralph Castain wrote: > My first suggestion would be to try using 1.8.8 instead to get all the bug > fixes since 1.8

Re: [OMPI users] bad XRC API

2015-08-07 Thread Gilles Gouaillardet
, since i am not 100% convinced the test is bulletproof : if ofed version is different between compilation and runtime, let openmpi crash since this is a user-side problem. Cheers, Gilles On 8/6/2015 8:43 AM, Ralph Castain wrote: Yeah, I recall your earlier email on the subject. Sadly, I need

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-03 Thread Gilles Gouaillardet
Nate, a similar issue has already been reported at https://github.com/open-mpi/ompi/issues/369, but we have not yet been able to figure out what is going wrong. right after MPI_Init(), can you add Thread.sleep(5000); and see if it helps ? Cheers, Gilles On 8/4/2015 8:36 AM, Nate Chambers

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
That is a good sign, it means orted was started on both nodes. strictly speaking, you should confirm both nodes appear 16 times each in the output, so you can draw any firm conclusion Cheers, Gilles On Monday, August 3, 2015, abhisek Mondal wrote: > I wrote 2 new node names(which I had

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
simply replace nwchem with hostname both hosts should be part of the output... Cheers, Gilles On Sunday, August 2, 2015, abhisek Mondal wrote: > Jeff, Gilles > > Here's my scenario again when I tried something different: > I've interactively booked 2 nodes(cx1015 and

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
application /.../bin/mpirun hostname and then nwchem if it still does not work, then run with a verbose plm and post the output Cheers, Gilles On Sunday, August 2, 2015, abhisek Mondal wrote: > I'm on a HPC cluster. So, the openmpi-1.6.4 here installed as a module. > In .pbs scr

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
wrong last but not least, can you try to use full path for both mpirun and nwchem ? Cheers, Gilles On Sunday, August 2, 2015, abhisek Mondal wrote: > Yes, I have tried this and got following error: > > *mpirun was unable to launch the specified application as it could not > find a

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
Can you try running invoking mpirun with its full path instead ? e.g. /usr/local/bin/mpirun instead of mpirun Cheers, Gilles On Sunday, August 2, 2015, abhisek Mondal wrote: > Here is the other details, > > a. The Openmpi version is 1.6.4 > > b. The error as being generated

Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 1.8.7

2015-07-29 Thread Gilles Gouaillardet
Thomas, can you please elaborate ? I checked the code of opal_os_dirpath_create and could not find where such a thing can happen Thanks, Gilles On Wednesday, July 29, 2015, Thomas Jahns wrote: > Hello, > > On 07/28/15 17:34, Schlottke-Lakemper, Michael wrote: > >> That’

Re: [OMPI users] Building OpenMPI 1.8.7 on XC30

2015-07-28 Thread Gilles Gouaillardet
Erik, the OS X warning (which should not be OS X specific) is fixed in https://github.com/open-mpi/ompi-release/pull/430 it will land into the v2.x series once reviewed in the mean time, feel free to manually apply the patch on the tarball Cheers, Gilles On 7/29/2015 10:35 AM, Erik

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-28 Thread Gilles Gouaillardet
You are right and I misread your comment. Michael is using ROMIO, which is independent of ompio. Cheers, Gilles On Wednesday, July 29, 2015, Dave Love wrote: > Gilles Gouaillardet > writes: > > > Dave, > > > > On 7/24/2015 1:53 AM, Dave Love wrote: >

Re: [OMPI users] strange behavior of MPI_wait() method

2015-07-28 Thread Gilles Gouaillardet
thanks for clarifying there is only one container per host. do you always run 16 tasks per host/container ? or do you always run 16 hosts/containers ? also, do lxc sets iptables when you start a container ? Cheers, Gilles On Tuesday, July 28, 2015, Cristian RUIZ wrote: > Thank you

Re: [OMPI users] strange behavior of MPI_wait() method

2015-07-28 Thread Gilles Gouaillardet
Cristian, one more thing... make sure tasks run on the same physical node with and without containers. for example, if in native mode, tasks 0 to 15 run on node 0, then in container mode, tasks 0 to 15 should run on 16 containers hosted by node 0 Cheers, Gilles On Tuesday, July 28, 2015

Re: [OMPI users] strange behavior of MPI_wait() method

2015-07-28 Thread Gilles Gouaillardet
viour. on the network, did you measure zero or few errors ? few errors take some extra time to be fixed, and if your application is communication intensive, these delays get propagated and you can end up with huge performance hit. Cheers, Gilles On Tuesday, July 28, 2015, Cristian RUIZ

Re: [OMPI users] Fatal Error: Cannot read module file 'mpi.mod' opened at (1), because it was created by a different version of GNU Fortran

2015-07-28 Thread Gilles Gouaillardet
d is mpifort ... so you can run mpifort -showme ... to see how gfortran is invoked. it is likely mpifort simply runs gfortran, and your PATH does not point to gfortran 4.9.2 Cheers, Gilles On 7/28/2015 1:47 PM, Syed Ahsan Ali wrote: I am getting this error during installation of an applic

Re: [OMPI users] Building OpenMPI 1.8.7 on XC30

2015-07-27 Thread Gilles Gouaillardet
have an OS X cluster ...) Cheers, Gilles On Sunday, July 26, 2015, Erik Schnetter wrote: > Mark > > No, it doesn't need to be 1.8.7. > > I just tried v2.x-dev-96-g918650a. This leads to run-time warnings on OS > X; I see messages such as > > [warn] select: Bad file

Re: [OMPI users] shared memory performance

2015-07-24 Thread Gilles Gouaillardet
. Cheers, Gilles On Friday, July 24, 2015, Harald Servat wrote: > Dear Cristian, > > according to your configuration: > > a) - 8 Linux containers on the same machine configured with 2 cores > b) - 8 physical machines > c) - 1 physical machine > > a) and c) hav

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Gilles Gouaillardet
pport.) on my system : $ grep FILE_SYSTEM ./ompi/mca/io/romio/romio/config.status S["FILE_SYSTEM"]="testfs ufs nfs" unless i am misunderstanding, nfs is there Cheers, Gilles

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Gilles Gouaillardet
Michael, ROMIO is the default in the 1.8 series you can run ompi_info --all | grep io | grep priority ROMIO priority should be 20 and ompio priority should be 10. Cheers, Gilles On Thursday, July 23, 2015, Schlottke-Lakemper, Michael < m.schlottke-lakem...@aia.rwth-aachen.de> wrote:

Re: [OMPI users] File coherence issues with OpenMPI/torque/NFS (?)

2015-07-23 Thread Gilles Gouaillardet
Michael, are you running 1.8.7 or master ? if not default, which io module are you running ? (default is ROMIO with 1.8 but ompio with master) by any chance, could you post a simple program that evidences this issue ? Cheers, Gilles On Thursday, July 23, 2015, Schlottke-Lakemper, Michael

Re: [OMPI users] shared memory performance

2015-07-22 Thread Gilles Gouaillardet
threads per tasks Cheers, Gilles On 7/22/2015 4:42 PM, Crisitan RUIZ wrote: Sorry, I've just discovered that I was using the wrong command to run on 8 machines. I have to get rid of the "-np 8" So, I corrected the command and I used: mpirun --machinefile machine_mpi_bug.txt --mca

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-16 Thread Gilles Gouaillardet
Bill, good news : 1.8.7 has been released yesterday Cheers, Gilles On 7/16/2015 1:53 PM, Lane, William wrote: Ralph, I'd rather wait for the stable release of 1.8.7, but I'm willing to give it a try if my supervisor is

Re: [OMPI users] What collective implementation is used when?

2015-07-09 Thread Gilles Gouaillardet
collective modules (hierarch, ml, ?) implement hierarchical collectives, which means they should be optimized for multi node / multi tasks per node. that being said, ml is not production ready, and i am not sure whether hierarch is actively maintained) i hope this helps Gilles On 7/9/2015 5:37

Re: [OMPI users] Problems Compiling ULFM

2015-07-06 Thread Gilles Gouaillardet
George, I got confused since from bitbucket, ulfm was imported from svn r25756 and this svn id comes from the v1.7 branch Also, orte_job_map_t has a policy field in v1.6 but no more in v1.7 Anyway, you obviously know ulfm better than me, so I was wrong Cheers, Gilles On Monday, July 6, 2015

Re: [OMPI users] Problems Compiling ULFM

2015-07-06 Thread Gilles Gouaillardet
Rafael, At first glance, ulfm is based on openmpi 1.7 But the poe plm was written for openmpi 1.6 You'd better ask the ulfm folks about this issue Cheers, Gilles On Monday, July 6, 2015, Rafael Lago wrote: > Hello there! > I'm trying to work in a fault-tolerance project, a

Re: [OMPI users] my_sense in ompi_osc_sm_module_t not always protected by OPAL_HAVE_POSIX_THREADS

2015-06-29 Thread Gilles Gouaillardet
Nathan, Shall I remove the --with-threads configure option ? or make it dummy ? Cheers, Gilles On Tuesday, June 30, 2015, Nathan Hjelm wrote: > > Ah, that would explain why I am not seeing it in master. Can you PR the > changes to v1.10? > > -Nathan > > On Tue, Jun 3

Re: [OMPI users] my_sense in ompi_osc_sm_module_t not always protected by OPAL_HAVE_POSIX_THREADS

2015-06-29 Thread Gilles Gouaillardet
Nathan, I removed all of this (including the --with-threads configure option) on master a while ago. because this is a change in the configure command line, I never made a PR for v1.8 Cheers, Gilles On Tuesday, June 30, 2015, Nathan Hjelm wrote: > > Open MPI has required posix threa

Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-24 Thread Gilles Gouaillardet
file to blacklist the coll_ml module to ensure this is working. Mike and Mellanox folks, could you please comment on that ? Cheers, Gilles On 6/24/2015 5:23 PM, Daniel Letai wrote: Gilles, Attached the two output logs. Thanks, Daniel On 06/22/2015 08:08 AM, Gilles Gouaillardet wrote

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-24 Thread Gilles Gouaillardet
do not see how I can tackle the root cause without being able to reproduce the issue :-( can you try to reproduce the issue with the smallest hostfile, and then run lstopo on all the nodes ? btw, you are not mixing 32 bits and 64 bits OS, are you ? Cheers, Gilles mca_btl_sm_add_procs( int

Re: [OMPI users] Problem getting job to start

2015-06-23 Thread Gilles Gouaillardet
Jeff, it sounds like OpenMPI is not available on some nodes ! another possibility is it is installed but in another directory, or maybe it is not in your path and you did not configure with --enable-mpirun-prefix-by-default Cheers, Gilles On Wednesday, June 24, 2015, Jeff Layton wrote

Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-22 Thread Gilles Gouaillardet
Daniel, i double checked this and i cannot make any sense of these logs. if coll_ml_priority is zero, then i do not see any way ml_coll_hier_barrier_setup can be invoked. could you please run again with --mca coll_base_verbose 100 with and without --mca coll ^ml Cheers, Gilles On 6/22

Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-21 Thread Gilles Gouaillardet
Daniel, ok, thanks. it seems that even if priority is zero, some code gets executed. I will confirm this tomorrow and send you a patch to work around the issue if my guess is proven right Cheers, Gilles On Sunday, June 21, 2015, Daniel Letai wrote: > MCA coll: parame

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Gilles Gouaillardet
1.8, so an incorrect back port could be the root cause. Cheers, Gilles On Friday, June 19, 2015, Ralph Castain wrote: > Gilles > > I was fooled too, but that isn’t the issue. The problem is that > ompi_free_list is segfaulting: > > [csclprd3-0-13:30901] *** Process received signal

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Gilles Gouaillardet
? Cheers, Gilles On Friday, June 19, 2015, Lane, William wrote: > Ralph, > > I created a hostfile that just has the names of the hosts while > specifying no slot information whatsoever (e.g. csclprd3-0-0) > and received the following errors: > > mpirun -np 132 -report-b

Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-18 Thread Gilles Gouaillardet
This is really odd... you can run ompi_info --all and search coll_ml_priority it will display the current value and the origin (e.g. default, system wide config, user config, cli, environment variable) Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai wrote: > No, that's t

Re: [OMPI users] simple mpi hello world segfaults when coll ml not disabled

2015-06-18 Thread Gilles Gouaillardet
Daniel, ML module is not ready for production and is disabled by default. Did you explicitly enable this module ? If yes, I encourage you to disable it Cheers, Gilles On Thursday, June 18, 2015, Daniel Letai wrote: > given a simple hello.c: > > #include > #include > >

Re: [OMPI users] Error building openmpi-dev-1883-g7cce015 on Linux

2015-06-16 Thread Gilles Gouaillardet
Siegmar, these are just warnings, you can safely ignore them Cheers, Gilles On Tuesday, June 16, 2015, Siegmar Gross < siegmar.gr...@informatik.hs-fulda.de> wrote: > Hi, > > today I tried to build openmpi-dev-1883-g7cce015 on my machines > (Solaris 10 Sparc, Solaris 10 x8

Re: [OMPI users] Missing file "openmpi/ompi/mpi/f77/constants.h"

2015-06-15 Thread Gilles Gouaillardet
Dave, v1.6 is ok if configure'd with --with-devel-headers Cheers, Gilles On Monday, June 15, 2015, Dave Love wrote: > Gilles Gouaillardet > writes: > > > Dave, > > > > commit > > > https://github.com/nerscadmin/IPM/commit/8f628dadc502b3e0113d6ab307

Re: [OMPI users] Missing file "openmpi/ompi/mpi/f77/constants.h"

2015-06-15 Thread Gilles Gouaillardet
* handle MPI_IN_PLACE in Fortran with Open MPI > 1.6 Cheers, Gilles On 6/12/2015 1:09 AM, Dave Love wrote: Filippo Spiga writes: Dear OpenMPI experts, I am rebuilding IPM (https://github.com/nerscadmin/ipm) based on OpenMPI 1.8.5. However, despite OMPI is compiled with the "--wi

Re: [OMPI users] Undefined ompi_mpi_info_null issue

2015-06-12 Thread Gilles Gouaillardet
Ray, one possibility is one of the loaded libraries was built with -rpath and this causes the mess. another option is you have to link _errors.so with libmpi.so Cheers, Gilles On Friday, June 12, 2015, Ray Sheppard wrote: > Hi Gilles, > Thanks for the reply. I completely forgot that

Re: [OMPI users] Undefined ompi_mpi_info_null issue

2015-06-11 Thread Gilles Gouaillardet
Ray, this symbol is defined in libmpi.so. can you run ldd /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so and make sure this is linked with openmpi 1.8.4 ? Cheers, Gilles On 6/12/2015 1:29 AM, Ray Sheppard wrote: Hi List, I know I saw this issue

Re: [OMPI users] building openmpi-v1.8.5-46-g9f5f498 still breaks

2015-06-10 Thread Gilles Gouaillardet
On Jun 10, 2015, at 12:00 AM, Gilles Gouaillardet > wrote: > > > > that can happen indeed, in a complex but legitimate environment : > > > > mkdir ~/src > > cd ~/src > > tar xvfj openmpi-1.8.tar.bz2 > > mkdir ~/build/openmpi-v1.8 > > cd ~/build

Re: [OMPI users] Looking for LAM-MPI sources to create a mirror

2015-06-10 Thread Gilles Gouaillardet
Hi, you can find some source rpms for various distros for example at http://rpms.famillecollet.com/rpmphp/zoom.php?rpm=lam Cheers, Gilles On 6/10/2015 6:07 PM, Cian Davis wrote: Hi All, While OpenMPI is the way forward, there is a fair amount of software out there still compiled against

Re: [OMPI users] building openmpi-v1.8.5-46-g9f5f498 still breaks

2015-06-10 Thread Gilles Gouaillardet
rm -rf openmpi-1.8.6-${SYSTEM_ENV}.${MACHINE_ENV}.64_gcc mkdir openmpi-1.8.6-${SYSTEM_ENV}.${MACHINE_ENV}.64_gcc cd openmpi-1.8.6-${SYSTEM_ENV}.${MACHINE_ENV}.64_gcc Cheers, Gilles On 6/10/2015 2:42 PM, Siegmar Gross wrote: Hi Gilles, a simple workaround is you always run configure in an

Re: [OMPI users] building openmpi-v1.8.5-46-g9f5f498 still breaks

2015-06-10 Thread Gilles Gouaillardet
hat caused one crash at least, and likely silently compiles old sources most of the time) Cheers, Gilles On 6/10/2015 10:01 AM, Jeff Squyres (jsquyres) wrote: Siegmar -- I don't see any reason why this should be happening to you only sometimes; this code has been unchanged in *forever*.

Re: [OMPI users] Building OpenMPI on Raspberry Pi 2

2015-06-09 Thread Gilles Gouaillardet
ls with an understandable error message when building for armv6 Cheers, Gilles On 6/10/2015 2:18 AM, Jeff Layton wrote: Gilles, I was looking in Raspbian a little and do you know what I found? When I do "gcc -v" it says it was built with "--with-arch=armv6". Since I'm t

Re: [OMPI users] Building OpenMPI on Raspberry Pi 2

2015-06-09 Thread Gilles Gouaillardet
Jeff, btw, did you try a Pi 1 before a Pi 2 ? I checked some forums, and you will likely have to upgrade gcc to 4.8. a simpler option could be linaro https://www.raspberrypi.org/forums/viewtopic.php?f=56&t=98997 Cheers, Gilles On Tuesday, June 9, 2015, Gilles Gouaillardet wrote: &g

Re: [OMPI users] Building OpenMPI on Raspberry Pi 2

2015-06-09 Thread Gilles Gouaillardet
Jeff, can you try gcc -march=armv7-a foo.c ? Cheers, Gilles On Tuesday, June 9, 2015, Jeff Layton wrote: > Gilles, > > I'm not cross-compiling - I'm building on the Pi 2. > > I'm not sure how to check if gcc can generate armv7 code. > I'm using Raspbian an
