Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Ok, thanks for your answers! I was not aware that it is a known issue. I guess I will just try to find a machine with OpenMPI/2.0.2 and try there. On 16 February 2017 at 00:01, r...@open-mpi.org wrote: > Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread Gilles Gouaillardet
Ralph, I was able to rewrite some macros to make the Oracle compilers happy, and filed https://github.com/pmix/master/pull/309 for that. Siegmar, meanwhile, feel free to manually apply the attached patch. Cheers, Gilles On 2/16/2017 8:09 AM, r...@open-mpi.org wrote: I guess it was the

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread Siegmar Gross
Hi Ralph and Gilles, I guess it was the next nightly tarball, but not next commit. Yes. How do I know which commits are in which order applied to the trunk? How do I select/download a special commit? I can try the commits, if I know how to do it. Kind regards Siegmar However, it was

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
I guess it was the next nightly tarball, but not next commit. However, it was almost certainly 7acef48 from Gilles that updated the PMIx code. Gilles: can you perhaps take a peek? Sent from my iPad > On Feb 15, 2017, at 11:43 AM, Siegmar Gross > wrote: >

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want to give it a try Sent from my iPad > On Feb 15, 2017, at 1:14 PM, Jason Maldonis wrote: > > Just to throw this out there -- to me, that doesn't seem to be just a problem > with SLURM. I'm guessing the

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Jason Maldonis
Just to throw this out there -- to me, that doesn't seem to be just a problem with SLURM. I'm guessing the exact same error would be thrown interactively (unless I didn't read the above messages carefully enough). I had a lot of problems running spawned jobs on 2.0.x a few months ago, so I

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread Siegmar Gross
Hi Ralph, I get the error already with openmpi-master-201702100209-51def91 which is the next version after openmpi-master-201702080209-bc2890e, if I'm right. loki openmpi-master 146 grep Error \ openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc \

[OMPI users] MPI-1 ops which perform w/ CPU affinity ops

2017-02-15 Thread Sasso, John (GE Global Research, consultant)
While doing some benchmarking of an application from 32 ranks up to 512 ranks (increments in powers of 2) using an Intel 14 compiler build of OpenMPI 1.6.5, we are finding improved performance if '-bysocket -bind-to-socket' is spec'd to mpirun. The application uses a variety of MPI-1
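
For reference, a run of that kind with the Open MPI 1.6.x binding options the poster mentions would look roughly as follows; only the flags and the 512-rank count come from the message above, the executable name is a placeholder:

  # ranks mapped round-robin across sockets and bound to their socket
  mpirun -np 512 -bysocket -bind-to-socket ./app   # ./app is a placeholder
  # unbound baseline for comparison
  mpirun -np 512 ./app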

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Hi! I am doing it like this: sbatch -N 2 -n 5 ./job.sh, where job.sh is: #!/bin/bash -l module load openmpi/2.0.1-icc mpirun -np 1 ./manager 4 On 15 February 2017 at 17:58, r...@open-mpi.org wrote: > The cmd line looks fine - when you do your “sbatch” request, what is
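
Laid out as submitted, that is a two-node, five-task allocation whose batch script loads the MPI module and starts a single manager rank; everything below is exactly what is quoted above, nothing added:

  sbatch -N 2 -n 5 ./job.sh

  # job.sh
  #!/bin/bash -l
  module load openmpi/2.0.1-icc
  mpirun -np 1 ./manager 4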

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
The cmd line looks fine - when you do your “sbatch” request, what is in the shell script you give it? Or are you saying you just “sbatch” the mpirun cmd directly? > On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina > wrote: > > Hi, > > I am running like this:

Re: [OMPI users] numaif.h present but not usable with openmpi-master-201702080209-bc2890e on Linux

2017-02-15 Thread Jeff Squyres (jsquyres)
> On Feb 15, 2017, at 11:34 AM, Siegmar Gross > wrote: > >> Did adding these flags to CPPFLAGS/CXXCPPFLAGS also solve the cuda.h issues? > > Yes, but it would be great if "configure" would add the flags > automatically when "--with-cuda=..." is available.

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
If we knew what line in that file was causing the compiler to barf, we could at least address it. There is probably something added in recent commits that is causing problems for the compiler. So checking to see what commit might be triggering the failure would be most helpful. > On Feb 15,

Re: [OMPI users] numaif.h present but not usable with openmpi-master-201702080209-bc2890e on Linux

2017-02-15 Thread Siegmar Gross
Hi Jeff, Did adding these flags to CPPFLAGS/CXXCPPFLAGS also solve the cuda.h issues? Yes, but it would be great if "configure" would add the flags automatically when "--with-cuda=..." is available. Kind regards Siegmar On Feb 15, 2017, at 11:13 AM, Siegmar Gross

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread Siegmar Gross
Hi Gilles, this looks like a compiler crash, and it should be reported to Oracle. I can try, but I don't think that they are interested, because we don't have a contract any longer. I didn't get the error building openmpi-master-201702080209-bc2890e as you can see below. Would it be helpful

Re: [OMPI users] numaif.h present but not usable with openmpi-master-201702080209-bc2890e on Linux

2017-02-15 Thread Jeff Squyres (jsquyres)
Did adding these flags to CPPFLAGS/CXXCPPFLAGS also solve the cuda.h issues? > On Feb 15, 2017, at 11:13 AM, Siegmar Gross > wrote: > > Hi Jeff and Gilles, > > thank you very much for your answers. I added -I flags for > "valgrind.h" und "cuda.h" to the

Re: [OMPI users] numaif.h present but not usable with openmpi-master-201702080209-bc2890e on Linux

2017-02-15 Thread Siegmar Gross
Hi Jeff and Gilles, thank you very much for your answers. I added -I flags for "valgrind.h" and "cuda.h" to the CPPFLAGS and CXXCPPFLAGS and related -L flags to LDFLAGS. Now the header files are usable and I was able to build Open MPI master without errors with gcc. Tomorrow I can test the

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Hi, I am running it like this: mpirun -np 1 ./manager Should I do it differently? I also thought that all sbatch does is create an allocation and then run my script in it. But it seems that is not the case, since I am getting these results... I would like to upgrade to OpenMPI, but no clusters near me have

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Howard Pritchard
Hi Anastasia, Definitely check the mpirun used in the batch environment, but you may also want to upgrade to Open MPI 2.0.2. Howard r...@open-mpi.org wrote on Wed, 15 Feb 2017 at 07:49: > Nothing immediate comes to mind - all sbatch does is create an allocation > and then run

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Nothing immediate comes to mind - all sbatch does is create an allocation and then run your script in it. Perhaps your script is using a different “mpirun” command than when you type it interactively? > On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina > wrote: >
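
One quick way to test that suggestion is to have the batch script print which mpirun it resolves and compare against an interactive shell; the commands are standard, and the module name is the one given elsewhere in this thread:

  # inside job.sh, just before the mpirun line
  which mpirun
  mpirun --version

  # interactively, after loading the same module
  module load openmpi/2.0.1-icc
  which mpirun
  mpirun --version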

Re: [OMPI users] Specify the core binding when spawning a process

2017-02-15 Thread r...@open-mpi.org
Sorry for slow response - was away for awhile. What version of OMPI are you using? > On Feb 8, 2017, at 1:59 PM, Allan Ma wrote: > > Hello, > > I'm designing a program on a dual socket system that needs the parent process > and spawned child process to be at least

Re: [OMPI users] configure test doesn't find cuda.h and valgrind.h for openmpi-master-201702150209-404fe32

2017-02-15 Thread Jeff Squyres (jsquyres)
Siegmar -- Thanks for the reminder; sorry for not replying to your initial email earlier! I just replied about the valgrind.h issue -- check out https://www.mail-archive.com/users@lists.open-mpi.org/msg30631.html. I'm not quite sure what is going on with cuda.h, though -- I've asked Sylvain

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
> On Feb 15, 2017, at 5:45 AM, Mark Dixon wrote: > > On Wed, 15 Feb 2017, r...@open-mpi.org wrote: > >> Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - >> the logic is looking expressly for values > 1 as we hadn’t anticipated this >>

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread Mark Dixon
On Wed, 15 Feb 2017, r...@open-mpi.org wrote: Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - the logic is looking expressly for values > 1 as we hadn’t anticipated this use-case. Is it a sensible use-case, or am I crazy? I can make that change. I’m off to a

Re: [OMPI users] numaif.h present but not usable with openmpi-master-201702080209-bc2890e on Linux

2017-02-15 Thread Jeff Squyres (jsquyres)
Siegmar -- Sorry for the delay in replying. You should actually put -I flags in CPPFLAGS and CXXCPPFLAGS, not CFLAGS and CXXFLAGS. The difference is: 1. CFLAGS is given to the C compiler when compiling 2. CPPFLAGS is given to the C compiler when compiling and to the C preprocessor when
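
As an illustration of that split, a configure invocation along the following lines puts the -I flags where both the compiler and the preprocessor checks can see them, and the -L flags where the linker can; the install paths are placeholders rather than values taken from this thread, and the compiler settings assume the Sun C setup discussed in the neighbouring threads:

  # /usr/local/... prefixes below are placeholders
  ./configure CC=cc CXX=CC \
      CPPFLAGS="-I/usr/local/cuda/include -I/usr/local/valgrind/include" \
      CXXCPPFLAGS="-I/usr/local/cuda/include -I/usr/local/valgrind/include" \
      LDFLAGS="-L/usr/local/cuda/lib64" \
      --with-cuda=/usr/local/cuda --with-valgrind=/usr/local/valgrind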

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - the logic is looking expressly for values > 1 as we hadn’t anticipated this use-case. I can make that change. I’m off to a workshop for the next day or so, but can probably do this on the plane. > On Feb 15, 2017,

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread Gilles Gouaillardet
Siegmar, this looks like a compiler crash, and it should be reported to Oracle. Cheers, Gilles On Wednesday, February 15, 2017, Siegmar Gross < siegmar.gr...@informatik.hs-fulda.de> wrote: > Hi, > > I tried to install openmpi-master-201702150209-404fe32 on my "SUSE Linux > Enterprise Server

Re: [OMPI users] configure test doesn't find cuda.h and valgrind.h for openmpi-master-201702150209-404fe32

2017-02-15 Thread Gilles Gouaillardet
Siegmar, I do not think this should be fixed within Open MPI. Unlike GNU gcc, Sun cc does not search in /usr/local, so I believe it is up to the user to pass the right CPPFLAGS and LDFLAGS to the configure command line. If you think Sun cc should search in /usr/local, then I suggest you report

[OMPI users] configure test doesn't find cuda.h and valgrind.h for openmpi-master-201702150209-404fe32

2017-02-15 Thread Siegmar Gross
Hi, I tried to install openmpi-master-201702150209-404fe32 on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Unfortunately, configure tests don't find cuda.h and valgrind.h, although they are available. I had reported this problem already for

[OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread Siegmar Gross
Hi, I tried to install openmpi-master-201702150209-404fe32 on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.14. Unfortunately, "make" breaks with the following error. I've had no problems with gcc-6.3.0. ...

[OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread Mark Dixon
Hi, When combining OpenMPI 2.0.2 with OpenMP, I'm interested in launching a number of ranks and allocating a number of cores to each rank. Using "-map-by socket:PE=<n>", switching to "-map-by node:PE=<n>" if I want to allocate more than a single socket to a rank, seems to do what I want. Except
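
For instance, with eight ranks and four cores (and OpenMP threads) per rank, the intent described above corresponds to something like the following; the executable name and the counts are placeholders:

  export OMP_NUM_THREADS=4
  mpirun -np 8 --map-by socket:PE=4 --report-bindings ./hybrid_app   # ./hybrid_app is a placeholder
  # more than one socket per rank: map by node instead
  mpirun -np 2 --map-by node:PE=16 --report-bindings ./hybrid_app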

Re: [OMPI users] MPI_Win_allocate: Memory alignment

2017-02-15 Thread Joseph Schuchart
Gilles, Thanks for the quick reply and the immediate fix. I can confirm that allocations from both MPI_Win_allocate_shared and MPI_Win_allocate are now consistently aligned at 8-byte boundaries and the application runs fine now. For the record, allocations from malloc and MPI_Alloc_mem are