Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-24 Thread Gilles Gouaillardet
Siegmar, how did you configure openmpi ? which java version did you use ? i just found a regression and you currently have to explicitly add CFLAGS=-D_REENTRANT CPPFLAGS=-D_REENTRANT to your configure command line if you want to debug this issue (i cannot reproduce it on a solaris 11 x86

Re: [OMPI users] OMPI users] low CPU utilization with OpenMPI

2014-10-24 Thread Gilles Gouaillardet
Can you also check there is no cpu binding issue (several mpi tasks and/or OpenMP threads, if any, bound to the same core and doing time sharing) ? A simple way to check that is to log into a compute node, run top and then press 1 f j. If some cores have higher usage than others, you are likely

Re: [OMPI users] OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-25 Thread Gilles Gouaillardet
Hi Siegmar, You might need to configure with --enable-debug and add -g -O0 to your CFLAGS and LDFLAGS. Then once you attach with gdb, you have to find the thread that is polling : thread 1, bt, thread 2, bt, and so on until you find the right thread. If _dbg is a local variable, you need to select the
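
The "_dbg" variable referred to here is the classic wait-for-the-debugger idiom (the loop itself is quoted later in this archive as "while (_dbg) poll(NULL, 0, 1);"). A minimal standalone C sketch of how that idiom works; the program below is illustrative and is not the actual Open MPI Java binding code:

    /* attach_wait.c - sketch of the wait-for-debugger idiom discussed in
     * this thread; _dbg and the poll() call mirror the quoted snippet,
     * everything else is illustrative. */
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    /* volatile so the flag is re-read on every iteration and a debugger
     * can break the loop by setting it to 0 */
    static volatile int _dbg = 1;

    int main(void)
    {
        printf("pid %d waiting for a debugger, set _dbg=0 to continue\n",
               (int) getpid());
        while (_dbg) {
            poll(NULL, 0, 1);   /* sleep ~1 ms instead of spinning */
        }
        printf("continuing\n");
        return 0;
    }

After attaching (gdb -p <pid>), switch to the polling thread, then "set var _dbg=0" and "continue"; if _dbg is a local variable you must first select the frame that contains it, which is the point being made above.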

Re: [OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Gilles Gouaillardet
It looks like we faced a similar issue : opal_process_name_t is 64 bits aligned whereas orte_process_name_t is 32 bits aligned. If you run on an alignment-sensitive cpu such as sparc and you are not lucky (so to speak) you can run into this issue. i will make a patch for this shortly Ralph Castain
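
For readers not used to SPARC: loading a 64-bit value through a pointer that is only 32-bit aligned raises SIGBUS there, while x86 silently tolerates it. A simplified C sketch of the failure mode; the two typedefs are illustrative stand-ins, not the real opal/orte definitions:

    /* alignment_sketch.c - why overlaying a 64-bit-aligned type on a
     * 32-bit-aligned location crashes on sparc; simplified stand-ins for
     * the types named in this thread. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct { uint32_t jobid; uint32_t vpid; } name32_t; /* 4-byte alignment */
    typedef uint64_t name64_t;                                  /* 8-byte alignment */

    int main(void)
    {
        /* gcc syntax: force the buffer itself to 8-byte alignment so that
         * buf + 4 is deterministically misaligned for a 64-bit load */
        char buf[16] __attribute__((aligned(8)));
        name64_t value = 42;

        memcpy(buf + 4, &value, sizeof(value));      /* safe on any cpu */

        name64_t *p = (name64_t *)(buf + 4);         /* only 4-byte aligned */
        printf("%llu\n", (unsigned long long) *p);   /* ok on x86, SIGBUS on sparc */
        return 0;
    }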

Re: [OMPI users] OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-26 Thread Gilles Gouaillardet
variable declaration only. Any thought ? Ralph Castain <r...@open-mpi.org> wrote: >Will PR#249 solve it? If so, we should just go with it as I suspect that is >the long-term solution. > >> On Oct 26, 2014, at 4:25 PM, Gilles Gouaillardet >> <gilles.gouaillar...@gma

Re: [OMPI users] OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-27 Thread Gilles Gouaillardet
es to your branch, I can pass you a patch with my suggested > alterations. > >> On Oct 26, 2014, at 5:55 PM, Gilles Gouaillardet >> <gilles.gouaillar...@gmail.com> wrote: >> >> No :-( >> I need some extra work to stop declaring orte_process_name_t an

Re: [OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-27 Thread Gilles Gouaillardet
;>>>>> while >>> (_dbg) poll(NULL, 0, 1); >>>>>> tyr java 400 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i _dbg >>>>>> tyr java 401 nm /usr/local/openmpi-1.9.0_64_gcc/lib64/*.so | grep -i >>>>>> JNI_

Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Gilles Gouaillardet
Hi, i tested on a RedHat 6 like linux server and could not observe any memory leak. BTW, are you running 32 or 64 bits cygwin ? and what is your configure command line ? Thanks, Gilles On 2014/10/27 18:26, Marco Atzeri wrote: > On 10/27/2014 8:30 AM, maxinator333 wrote: >> Hello, >> >> I

Re: [OMPI users] OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-27 Thread Gilles Gouaillardet
Thanks Marco, I could reproduce the issue even with one node sending/receiving to itself. I will investigate this tomorrow Cheers, Gilles Marco Atzeri <marco.atz...@gmail.com> wrote: > > >On 10/27/2014 10:30 AM, Gilles Gouaillardet wrote: >> Hi, >> >> i teste

Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Gilles Gouaillardet
Michael, Could you please run "mpirun -np 1 df -h" and "mpirun -np 1 df -hi" on both compute and login nodes. Thanks Gilles michael.rach...@dlr.de wrote: >Dear developers of OPENMPI, > >We have now installed and tested the bugfixed OPENMPI Nightly Tarball of >2014-10-24

Re: [OMPI users] WG: Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-10-27 Thread Gilles Gouaillardet
Michael, The available space must be greater than the requested size + 5%. From the logs, the error message makes sense to me : there is not enough space in /tmp. Since the compute nodes have a lot of memory, you might want to try using /dev/shm instead of /tmp for the backing files Cheers,

Re: [OMPI users] OMPI users] OMPI users] OMPI users] which info is needed for SIGSEGV in Java for openmpi-dev-124-g91e9686 on Solaris

2014-10-27 Thread Gilles Gouaillardet
Ralph, On 2014/10/28 0:46, Ralph Castain wrote: > Actually, I propose to also remove that issue. Simple enough to use a > hash_table_32 to handle the jobids, and let that point to a > hash_table_32 of vpids. Since we rarely have more than one jobid > anyway, the memory overhead actually decreases

Re: [OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Gilles Gouaillardet
Marco, here is attached a patch that fixes the issue /* i could not find yet why this does not occurs on Linux ... */ could you please give it a try ? Cheers, Gilles On 2014/10/27 18:45, Marco Atzeri wrote: > > > On 10/27/2014 10:30 AM, Gilles Gouaillardet wrote: >> Hi,

Re: [OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc

2014-10-28 Thread Gilles Gouaillardet
Hi Siegmar, From the jvm logs, there is an alignment error in native_get_attr but i could not find it by reading the source code. Could you please do "ulimit -c unlimited", run "mpiexec ..." and then "gdb /bin/java core", and run bt on all threads until you get a line number in native_get_attr. Thanks

Re: [OMPI users] OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Gilles Gouaillardet
Thanks Marco, pthread_mutex_init calls calloc under cygwin but does not allocate memory under linux, so not invoking pthread_mutex_destroy causes a memory leak only under cygwin. Gilles Marco Atzeri <marco.atz...@gmail.com> wrote: >On 10/28/2014 12:04 PM, Gilles Gouaillardet wrote:
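
The fix is simply to pair every pthread_mutex_init with a pthread_mutex_destroy, even on platforms where destroy happens to be a no-op. A minimal sketch of the pattern (not the actual Open MPI code):

    /* mutex_pair.c - init/destroy pairing discussed here; on Cygwin
     * pthread_mutex_init allocates memory, so omitting
     * pthread_mutex_destroy leaks, while glibc happens not to allocate. */
    #include <pthread.h>
    #include <stdio.h>

    int main(void)
    {
        for (int i = 0; i < 1000; i++) {
            pthread_mutex_t m;
            pthread_mutex_init(&m, NULL);   /* may allocate (e.g. Cygwin) */
            pthread_mutex_lock(&m);
            pthread_mutex_unlock(&m);
            pthread_mutex_destroy(&m);      /* required for portability */
        }
        puts("done");
        return 0;
    }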

Re: [OMPI users] OMPI users] OMPI users] Possible Memory Leak in simple PingPong-Routine with OpenMPI 1.8.3?

2014-10-28 Thread Gilles Gouaillardet
Yep, will do today Ralph Castain <r...@open-mpi.org> wrote: >Gilles: will you be committing this to trunk and PR to 1.8? > > >> On Oct 28, 2014, at 11:05 AM, Marco Atzeri <marco.atz...@gmail.com> wrote: >> >> On 10/28/2014 4:41 PM, Gill

Re: [OMPI users] SIGBUS in openmpi-dev-178-ga16c1e4 on Solaris 10 Sparc

2014-10-29 Thread Gilles Gouaillardet
1 (LWP 1)] >>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to >>> satisfy query >>> (gdb) bt >>> #0 0x7f6173d0 in rtld_db_dlactivity () from >>> /usr/lib/sparcv9/ld.so.1 >>> #1 0x7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1 >>&g

Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-11-05 Thread Gilles Gouaillardet
Michael, could you please share your test program so we can investigate it ? Cheers, Gilles On 2014/10/31 18:53, michael.rach...@dlr.de wrote: > Dear developers of OPENMPI, > > There remains a hanging observed in MPI_WIN_ALLOCATE_SHARED. > > But first: > Thank you for your advices to employ

Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

2014-11-05 Thread Gilles Gouaillardet
ved with our large CFD-code. > > Are OPENMPI-developers nevertheless interested in that testprogram? > > Greetings > Michael > > > > > > > -Original Message- > From: users [mailto:users-boun...@open-mpi.org] On behalf of Gilles > Gouaillar

Re: [OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-05 Thread Gilles Gouaillardet
Michael, the root cause is openmpi was not compiled with the intel compilers but with the gnu compiler. fortran modules are not binary compatible, so openmpi and your application must be compiled with the same compiler. Cheers, Gilles On 2014/11/05 18:25, michael.rach...@dlr.de wrote: > Dear OPENMPI

Re: [OMPI users] OMPI users] OPENMPI-1.8.3: missing fortran bindings for MPI_SIZEOF

2014-11-05 Thread Gilles Gouaillardet
an >mpi.mod file, because the User can look inside the module >and can directly see, if something is missing or possibly wrongly coded. > >Greetings > Michael Rachner > > >-Original Message----- >From: users [mailto:users-boun...@open-mpi.org] On behalf of Gill

Re: [OMPI users] OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Gilles Gouaillardet
Brock, Is your post related to ib0/eoib0 being used at all, or being used with load balancing ? let me clarify this : --mca btl ^openib disables the openib btl aka *native* infiniband. This does not disable ib0 and eoib0 that are handled by the tcp btl. As you already figured out,

Re: [OMPI users] OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Gilles Gouaillardet
Ralph, IIRC there is load balancing across all the btls, for example between vader and scif. So load balancing between ib0 and eoib0 is just a particular case that might not necessarily be handled by the btl tcp. Cheers, Gilles Ralph Castain wrote: >OMPI discovers all

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Gilles Gouaillardet
Hi, IIRC there were some bug fixes between 1.8.1 and 1.8.2 in order to really use all the published interfaces. by any chance, are you running a firewall on your head node ? one possible explanation is the compute node tries to access the public interface of the head node, and packets get

Re: [OMPI users] OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Gilles Gouaillardet
Could you please send the output of netstat -nr on both head and compute node ? no problem obfuscating the ip of the head node, i am only interested in netmasks and routes. Ralph Castain wrote: > >> On Nov 12, 2014, at 2:45 PM, Reuti wrote: >> >>

Re: [OMPI users] mpirun fails across nodes

2014-11-13 Thread Gilles Gouaillardet
Hi, it seems you messed up the command line. could you try $ mpirun --mca btl ^openib --host compute-01-01,compute-01-06 ring_c can you also try to run mpirun from a compute node instead of the head node ? Cheers, Gilles On 2014/11/13 16:07, Syed Ahsan Ali wrote: > Here is what I see when

Re: [OMPI users] mpirun fails across nodes

2014-11-13 Thread Gilles Gouaillardet
9 > [compute-01-01.private.dns.zone][[11064,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] > connect() to 192.168.108.10 failed: No route to host (113) > > > On Thu, Nov 13, 2014 at 12:11 PM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote:

Re: [OMPI users] mpirun fails across nodes

2014-11-13 Thread Gilles Gouaillardet
.0 b) TX bytes:0 (0.0 b) > > > > So the point is why mpirun is following the ib path while I it has > been disabled. Possible solutions? > > On Thu, Nov 13, 2014 at 12:32 PM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wrote: >> mpirun complains about the

Re: [OMPI users] mpirun fails across nodes

2014-11-13 Thread Gilles Gouaillardet
; ib0 Link encap:InfiniBand HWaddr >>> 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 >>> inet addr:192.168.108.14 Bcast:192.168.108.255 >>> Mask:255.255.255.0 >>> UP BROADCAST MULTICAST MTU:65520 Metric:1 >>

Re: [OMPI users] mpirun fails across nodes

2014-11-13 Thread Gilles Gouaillardet
.0 255.0.0.0 U 0 0 0 eth0 > 0.0.0.0 10.0.0.10.0.0.0 UG0 0 0 eth0 > [pmdtest@compute-01-06 ~]$ > > > On Thu, Nov 13, 2014 at 12:56 PM, Gilles Gouaillardet > <gilles.gouaillar...@iferc.org> wro

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Gilles Gouaillardet
My 0.02 US$ : first, the root cause of the problem was that a default gateway was configured on the node, but this gateway was unreachable. imho, this is an incorrect system setting that can lead to unpredictable results : - openmpi 1.8.1 works (you are lucky, good for you) - openmpi 1.8.3 fails (no luck

Re: [OMPI users] OMPI users] error building openmpi-dev-274-g2177f9e with gcc-4.9.2

2014-11-16 Thread Gilles Gouaillardet
Siegmar, This is correct, --enable-heterogeneous is now fixed in the trunk. Please also note that -D_REENTRANT is now automatically set on solaris Cheers Gilles Siegmar Gross wrote: >Hi Jeff, hi Ralph, > >> This issue should now be fixed, too. > >Yes, it

Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently

2014-11-17 Thread Gilles Gouaillardet
Hi John, do you MPI_Init() or do you MPI_Init_thread(MPI_THREAD_MULTIPLE) ? does your program call MPI anywhere from an OpenMP region ? does your program call MPI only within an !$OMP MASTER section ? or does your program not invoke MPI at all from any OpenMP region ? can you reproduce this
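
For reference, the MPI_THREAD_MULTIPLE case asked about above looks like this in C; a minimal sketch, and the "provided" level should always be checked:

    /* init_thread.c - MPI_Init_thread with MPI_THREAD_MULTIPLE, the case
     * Gilles is asking about; check the granted level before calling MPI
     * from multiple OpenMP threads. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            /* concurrent MPI calls from several threads are not safe here */
            printf("requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
        }
        MPI_Finalize();
        return 0;
    }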

Re: [OMPI users] collective algorithms

2014-11-17 Thread Gilles Gouaillardet
Daniel, you can run $ ompi_info --parseable --all | grep _algorithm: | grep enumerator that will give you the list of supported algo for the collectives, here is a sample output : mca:coll:tuned:param:coll_tuned_allreduce_algorithm:enumerator:value:0:ignore

Re: [OMPI users] MPI_Neighbor_alltoallw fails with mpi-1.8.3

2014-11-21 Thread Gilles Gouaillardet
Hi Ghislain, that sounds like a bug in MPI_Dist_graph_create :-( you can use MPI_Dist_graph_create_adjacent instead : MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, degrees, [0], [0], degrees, [0], [0], info, rankReordering, ); it does not crash and as far as i
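
The arguments of the call above were stripped by the archive; as a hedged illustration only (a ring where every rank names its two neighbors, with made-up variable names rather than Ghislain's), the suggested workaround looks like this in C:

    /* adjacent_graph.c - sketch of the MPI_Dist_graph_create_adjacent
     * workaround; each rank lists its own sources and destinations, so no
     * global graph distribution step is involved. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Comm dist_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int degrees = 2;
        int neighbors[2] = { (rank + size - 1) % size, (rank + 1) % size };
        int weights[2]   = { 1, 1 };

        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       degrees, neighbors, weights,   /* sources */
                                       degrees, neighbors, weights,   /* destinations */
                                       MPI_INFO_NULL, 0 /* no reordering */,
                                       &dist_comm);

        MPI_Comm_free(&dist_comm);
        MPI_Finalize();
        return 0;
    }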

Re: [OMPI users] MPI_Neighbor_alltoallw fails with mpi-1.8.3

2014-11-21 Thread Gilles Gouaillardet
t reagrds, > Ghislain > > 2014-11-21 7:23 GMT+01:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org >> : >> Hi Ghislain, >> >> that sound like a but in MPI_Dist_graph_create :-( >> >> you can use MPI_Dist_graph_create_adjacent instead : >

Re: [OMPI users] MPI_Neighbor_alltoallw fails with mpi-1.8.3

2014-11-25 Thread Gilles Gouaillardet
based on prior knowledge. > > George. > > > On Fri, Nov 21, 2014 at 3:48 AM, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Ghislain, >> >> i can confirm there is a bug in mca_topo_base_dist_graph_distribute >> >> FYI a proof of

Re: [OMPI users] "default-only MCA variable"?

2014-11-27 Thread Gilles Gouaillardet
It could be because configure did not find the knem headers, and hence knem is not supported and this mca parameter is read-only. My 0.2 us$ ... Dave Love wrote: >Why can't I set parameters like this (not the only one) with 1.8.3? > > WARNING: A user-supplied value

Re: [OMPI users] Warning about not enough registerable memory on SL6.6

2014-12-08 Thread Gilles Gouaillardet
Folks, FWIW, i observe a similar behaviour on my system. imho, the root cause is OFED has been upgraded from a (quite) older version to latest 3.12 version here is the relevant part of code (btl_openib.c from the master) : static uint64_t calculate_max_reg (void) { if (0 ==

Re: [OMPI users] Open mpi based program runs as root and gives SIGSEGV under unprivileged user

2014-12-10 Thread Gilles Gouaillardet
Luca, your email mentions openmpi 1.6.5 but gdb output points to openmpi 1.8.1. could the root cause be a mix of versions that does not occur with root account ? which openmpi version are you expecting ? you can run pmap when your binary is running and/or under gdb to confirm the openmpi

Re: [OMPI users] Open mpi based program runs as root and gives SIGSEGV under unprivileged user

2014-12-11 Thread Gilles Gouaillardet
the max locked memory size should be >> unlimited. >> Check /etc/security/limits.conf and "ulimit -a". >> >> I hope this helps, >> Gus Correa >> >> On 12/10/2014 08:28 AM, Gilles Gouaillardet wrote: >>> Luca, >>> >>> your

Re: [OMPI users] MPI inside MPI (still)

2014-12-11 Thread Gilles Gouaillardet
Alex, can you try something like call system(sh -c 'env -i /.../mpirun -np 2 /.../app_name') ? -i starts with an empty environment. that being said, you might need to set a few environment variables manually : env -i PATH=/bin ... and that being also said, this "trick" could be just a bad idea :

Re: [OMPI users] MPI inside MPI (still)

2014-12-11 Thread Gilles Gouaillardet
ize > getting passed over a job scheduler with this approach might not work at > all... > > I have looked at the MPI_Comm_spawn call but I failed to understand how it > could help here. For instance, can I use it to launch an mpi app with the > option "-n 5"

Re: [OMPI users] MPI inside MPI (still)

2014-12-11 Thread Gilles Gouaillardet
MPI_COMM_WORLD,my_intercomm,MPI_ERRCODES_IGNORE,status) > enddo > > I do get 15 instances of the 'hello_world' app running: 5 for each parent > rank 1, 2 and 3. > > Thanks a lot, Gilles. > > Best regargs, > > Alex > > > > > 2014-12-12 1:32 GMT-02:00 Gilles Goua

Re: [OMPI users] OMPI users] MPI inside MPI (still)

2014-12-12 Thread Gilles Gouaillardet
just >a front end to use those, but since we have a lot of data to process > >it also benefits from a parallel environment. > > >Alex > >  > > >2014-12-12 2:30 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@iferc.org>: > >Alex, > >just to m

Re: [OMPI users] OMPI users] OMPI users] MPI inside MPI (still)

2014-12-13 Thread Gilles Gouaillardet
call back to the scheduler >queue. How would I track each one for their completion? > >Alex > > >2014-12-12 22:35 GMT-02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>: > >Alex, > >You need MPI_Comm_disconnect at least. >I am not sure if
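
A minimal C sketch of the parent side of what is being discussed: spawn a child job, then disconnect the intercommunicator so both jobs can finish independently; the spawned program name "hello_world" is just a placeholder taken from the thread:

    /* spawn_disconnect.c - MPI_Comm_spawn followed by MPI_Comm_disconnect,
     * the call mentioned in this reply; the children should likewise call
     * MPI_Comm_get_parent and MPI_Comm_disconnect. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm child;
        int errcodes[5];

        MPI_Init(&argc, &argv);

        MPI_Comm_spawn("hello_world", MPI_ARGV_NULL, 5, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &child, errcodes);

        /* ... exchange data with the children over the intercommunicator ... */

        MPI_Comm_disconnect(&child);   /* both sides must disconnect */

        MPI_Finalize();
        return 0;
    }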

Re: [OMPI users] OMPI users] OMPI users] OMPI users] MPI inside MPI (still)

2014-12-13 Thread Gilles Gouaillardet
n running these codes in serial mode. No need to say that >we could do a lot better if they could be executed in parallel. > >I am not familiar with DMRAA but it seems to be the right choice to deal with >job schedulers as it covers the ones I am interested in (pbs/torque and >loadl

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Gilles Gouaillardet
Eric, can you make your test case (source + input file + howto) available so i can try to reproduce and fix this ? Based on the stack trace, i assume this is a complete end user application. have you tried/been able to reproduce the same kind of crash with a trimmed test program ? BTW, what

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Gilles Gouaillardet
Eric, i checked the source code (v1.8) and the limit for the shared_fp_fname is 256 (hard coded). i am now checking if the overflow is correctly detected (that could explain the one byte overflow reported by valgrind) Cheers, Gilles On 2014/12/15 11:52, Eric Chamberland wrote: > Hi again, > >

Re: [OMPI users] ERROR: C_FUNLOC function

2014-12-15 Thread Gilles Gouaillardet
Hi Siegmar, a similar issue was reported in mpich with xlf compilers : http://trac.mpich.org/projects/mpich/ticket/2144 They concluded this is a compiler issue (e.g. the compiler does not implement TS 29113 subclause 8.1) Jeff, i made PR 315 https://github.com/open-mpi/ompi/pull/315 f08

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-15 Thread Gilles Gouaillardet
Eric, thanks for the simple test program. i think i see what is going wrong and i will make some changes to avoid the memory overflow. that being said, there is a hard coded limit of 256 characters, and your path is bigger than 300 characters. bottom line, and even if there is no more memory

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-15 Thread Gilles Gouaillardet
. Cheers, Gilles On 2014/12/16 12:43, Gilles Gouaillardet wrote: > Eric, > > thanks for the simple test program. > > i think i see what is going wrong and i will make some changes to avoid > the memory overflow. > > that being said, there is a hard coded limit of 256 cha

Re: [OMPI users] OMPI users] OMPI users] OMPI users] OMPI users] MPI inside MPI (still)

2014-12-17 Thread Gilles Gouaillardet
> >So far, I could not find anything about how to set an stdin file for an >spawnee process. >Specifiyng it in a app context file doesn't seem to work. Can it be done? >Maybe through >an MCA parameter? > > >Alex > > > > > > >2014-12-15 2:43 GM

Re: [OMPI users] OMPI users] ERROR: C_FUNLOC function

2014-12-18 Thread Gilles Gouaillardet
FWIW I faced a similar issue on my linux virtualbox. My shared folder is a vboxfs filesystem, but statfs returns the nfs magic id. That causes some mess and the test fails. At this stage i cannot tell whether i should blame the glibc, the kernel, a virtualbox driver or myself Cheers, Gilles

Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Gilles Gouaillardet
Siegmar, could you please give a try to the attached patch ? /* and keep in mind this is just a workaround that happen to work */ Cheers, Gilles On 2014/12/22 22:48, Siegmar Gross wrote: > Hi, > > today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc, > Solaris 10 x86_64,

Re: [OMPI users] processes hang with openmpi-dev-602-g82c02b4

2014-12-24 Thread Gilles Gouaillardet
Kawashima-san, i'd rather consider this as a bug in the README (!) heterogeneous support has been broken for some time, but it was eventually fixed. truth is there are *very* limited resources (both human and hardware) maintaining heterogeneous support, but that does not mean heterogeneous

Re: [OMPI users] OMPI users] What could cause a segfault in OpenMPI?

2014-12-28 Thread Gilles Gouaillardet
Where does the error occur ? MPI_Init ? MPI_Finalize ? In between ? In the first case, the bug is likely a mishandled error case, which means OpenMPI is unlikely the root cause of the crash. Did you check infiniband is up and running on your cluster ? Cheers, Gilles Saliya Ekanayake

Re: [OMPI users] OMPI users] Icreasing OFED registerable memory

2014-12-30 Thread Gilles Gouaillardet
FWIW ompi does not yet support XRC with OFED 3.12. Cheers, Gilles Deva wrote: >Hi Waleed, > > >It is highly recommended to upgrade to latest OFED.  Meanwhile, can you try >latest OMPI release (v1.8.4), where this warning is ignored on older OFEDs > > >-Devendar  >

Re: [OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-02 Thread Gilles Gouaillardet
Diego, First, i recommend you redefine tParticle and add a padding integer so everything is aligned. Before invoking MPI_Type_create_struct, you need to call MPI_Get_address(dummy, base, MPI%err) displacements = displacements - base MPI_Type_create_resized might be unnecessary if tParticle
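
The thread is Fortran, but the displacement recipe is easier to see in a short C sketch; the particle type below is a made-up stand-in for tParticle, with the padding integer suggested above:

    /* struct_type.c - take addresses with MPI_Get_address, subtract the
     * base address, build the struct type; resizing is only needed if the
     * computed extent differs from the real size. */
    #include <mpi.h>

    typedef struct {
        double x[3];
        int    ip;
        int    pad;                      /* padding so the type is uniformly aligned */
    } particle_t;

    int main(int argc, char **argv)
    {
        particle_t   dummy;
        MPI_Aint     base, displs[2];
        int          blocklens[2] = { 3, 2 };
        MPI_Datatype types[2] = { MPI_DOUBLE, MPI_INT };
        MPI_Datatype particle_type;

        MPI_Init(&argc, &argv);

        MPI_Get_address(&dummy, &base);
        MPI_Get_address(&dummy.x[0], &displs[0]);
        MPI_Get_address(&dummy.ip,   &displs[1]);
        displs[0] -= base;               /* displacements relative to the base */
        displs[1] -= base;

        MPI_Type_create_struct(2, blocklens, displs, types, &particle_type);
        MPI_Type_commit(&particle_type);

        /* MPI_Type_create_resized would only be needed if the extent of
         * particle_type did not match sizeof(particle_t) */

        MPI_Type_free(&particle_type);
        MPI_Finalize();
        return 0;
    }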

Re: [OMPI users] OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-02 Thread Gilles Gouaillardet
DRESS(dummy[1]), newt) "" > > >What do you think? > >George, Did i miss something? > > >Thanks a lot > > > > >Diego > > >On 2 January 2015 at 12:51, Gilles Gouaillardet ><gilles.gouaillar...@gmail.com> wrote: > >Diego, > >Fi

Re: [OMPI users] OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-04 Thread Gilles Gouaillardet
nt you can find the program. > > What do you mean "remove mpi_get_address(dummy) from all displacements". > > Thanks for all your help > > Diego > > > > Diego > > > On 3 January 2015 at 00:45, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com>

Re: [OMPI users] OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-04 Thread Gilles Gouaillardet
ACEMENTS* > * ENDIF* > > and the results is: > >*139835891001320 -139835852218120 -139835852213832* > * -139835852195016 8030673735967299609* > > I am not able to understand it. > > Thanks a lot. > > In the attachment you can find the program > > > >

Re: [OMPI users] OMPI users] OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-05 Thread Gilles Gouaillardet
  > >Why do I have 16 spaces in displacements(2), I have only an integer in >dummy%ip? > >Why do you use dummy(1) and dummy(2)? > > >Thanks a lot     > > > >Diego > > >On 5 January 2015 at 02:44, Gilles Gouaillardet ><gilles.gouaillar...@iferc.

Re: [OMPI users] OMPI users] OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-07 Thread Gilles Gouaillardet
Diego, my bad, i should have passed displacements(1) to MPI_Type_create_struct. here is an updated version (note you have to use a REQUEST integer for MPI_Isend and MPI_Irecv, and you also have to call MPI_Wait to ensure the requests complete) Cheers, Gilles On 2015/01/08 8:23, Diego Avesani
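
The request/wait point made in parentheses above, sketched in C (a simple ring exchange, not Diego's program):

    /* isend_wait.c - every MPI_Isend/MPI_Irecv returns a request that must
     * be completed with MPI_Wait or MPI_Waitall before the buffers are
     * reused. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, sendbuf, recvbuf = -1;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        sendbuf = rank;

        int right = (rank + 1) % size;
        int left  = (rank + size - 1) % size;

        MPI_Irecv(&recvbuf, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendbuf, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* without this the requests never complete and the buffers must not
         * be touched */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        MPI_Finalize();
        return 0;
    }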

Re: [OMPI users] difference of behaviour for MPI_Publish_name between openmpi-1.4.5 and openmpi-1.8.4

2015-01-07 Thread Gilles Gouaillardet
Well, per the source code, this is not a bug but a feature : from the publish function in ompi/mca/pubsub/orte/pubsub_orte.c : ompi_info_get_bool(info, "ompi_unique", , ); if (0 == flag) { /* uniqueness not specified - overwrite by default */ unique = false; } fwiw, and
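
In application code the "ompi_unique" info key quoted above is passed like this; a hedged C sketch (the service name is arbitrary, and per the quoted source the key is an Open MPI extension, not standard MPI):

    /* publish_unique.c - ask MPI_Publish_name not to overwrite an existing
     * entry by setting the ompi_unique info key discussed here. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        char     port[MPI_MAX_PORT_NAME];
        MPI_Info info;

        MPI_Init(&argc, &argv);

        MPI_Open_port(MPI_INFO_NULL, port);

        MPI_Info_create(&info);
        MPI_Info_set(info, "ompi_unique", "true");  /* default is to overwrite */

        MPI_Publish_name("my_service", info, port);

        /* ... accept connections ... */

        MPI_Unpublish_name("my_service", info, port);
        MPI_Close_port(port);
        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }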

Re: [OMPI users] difference of behaviour for MPI_Publish_name between openmpi-1.4.5 and openmpi-1.8.4

2015-01-07 Thread Gilles Gouaillardet
b) it seemed just as > reasonable as the alternative (I believe we flipped a coin) > > >> On Jan 7, 2015, at 6:47 PM, Gilles Gouaillardet >> <gilles.gouaillar...@iferc.org> wrote: >> >> Well, per the source code, this is not a bug but a feature : >&

Re: [OMPI users] OMPI users] OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-08 Thread Gilles Gouaillardet
the program run in your case? > > Thanks again > > > > Diego > > > On 8 January 2015 at 03:02, Gilles Gouaillardet < > gilles.gouaillar...@iferc.org> wrote: > >> Diego, >> >> my bad, i should have passed displacements(1) to MPI_Type_create_

Re: [OMPI users] MPI_Type_Create_Struct + MPI_TYPE_CREATE_RESIZED

2015-01-12 Thread Gilles Gouaillardet
ed is my copy of your program with fixes for the above-mentioned issues. > > BTW, I missed the beginning of this thread -- I assume that this is an > artificial use of mpi_type_create_resized for the purposes of a small > example. The specific use of it in this program appears to

Re: [OMPI users] error building openmpi-dev-685-g881b1dc on Solaris 10

2015-01-13 Thread Gilles Gouaillardet
Hi Siegmar, could you please try again with adding '-D_STDC_C99' to your CFLAGS ? Thanks and regards, Gilles On 2015/01/12 20:54, Siegmar Gross wrote: > Hi, > > today I tried to build openmpi-dev-685-g881b1dc on my machines > (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 >

Re: [OMPI users] Problems compiling OpenMPI 1.8.4 with GCC 4.9.2

2015-01-14 Thread Gilles Gouaillardet
Ryan, this issue has already been reported. please refer to http://www.open-mpi.org/community/lists/users/2015/01/26134.php for a workaround Cheers, Gilles On 2015/01/14 16:35, Novosielski, Ryan wrote: > OpenMPI 1.8.4 does not appear to be buildable with GCC 4.9.2. The output, as > requested

Re: [OMPI users] Segfault in mpi-java

2015-01-22 Thread Gilles Gouaillardet
Alexander, i was able to reproduce this behaviour. basically, bad things happen when the garbage collector is invoked ... i was even able to reproduce some crashes (but that happen at random stages) very early in the code by manually inserting calls to the garbage collector (e.g. System.gc();)

Re: [OMPI users] using multiple IB connections between hosts

2015-02-01 Thread Gilles Gouaillardet
Dave, the QDR Infiniband uses the openib btl (by default : btl_openib_exclusivity=1024) i assume the RoCE 10Gbps card is using the tcp btl (by default : btl_tcp_exclusivity=100) that means that by default, when both openib and tcp btl could be used, the tcp btl is discarded. could you give a

Re: [OMPI users] cross-compiling openmpi-1.8.4 with static linking

2015-02-09 Thread Gilles Gouaillardet
Simona, On 2015/02/08 20:45, simona bellavista wrote: > I have two systems A (aka Host) and B (aka Target). On A a compiler suite > is installed (intel 14.0.2), on B there is no compiler. I want to compile > openmpi on A for running it on system B (in particular, I want to use > mpirun and

Re: [OMPI users] Open MPI collectives algorithm selection

2015-03-10 Thread Gilles Gouaillardet
Khalid, i am not aware of such a mechanism. /* there might be a way to use MPI_T_* mechanisms to force the algorithm, and i will let other folks comment on that */ you definitely cannot directly invoke ompi_coll_tuned_bcast_intra_binomial (abstraction violation, non portable, and you miss the

Re: [OMPI users] open mpi on blue waters

2015-03-25 Thread Gilles Gouaillardet
you know what you are doing, you can try mpirun -mca sec basic) on blue waters, that would mean ompi does not run out of the box, but fails with an understandable message. that would be less user friendly, but more secure any thoughts ? Cheers, Gilles [gouaillardet@node0 ~]$ echo c

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Gilles Gouaillardet
On 2015/03/26 13:00, Ralph Castain wrote: > Well, I did some digging around, and this PR looks like the right solution. ok then :-) following stuff is not directly related to ompi, but you might want to comment on that anyway ... > Second, the running of munge on the IO nodes is not only okay but

Re: [OMPI users] open mpi on blue waters

2015-03-26 Thread Gilles Gouaillardet
can see Munge is/can be used by both SLURM and > TORQUE. > (http://docs.adaptivecomputing.com/torque/4-0-2/Content/topics/1-installConfig/serverConfig.htm#usingMUNGEAuth) > > If I misunderstood the drift, please ignore ;-) > > Mark > > >> On 26 Mar 2015, at 5:38 , Gil

Re: [OMPI users] Running mpi with different account

2015-04-13 Thread Gilles Gouaillardet
Xing, an other approach is to use ompi-server and Publish_name / Lookup_name : run ompi-server and pass the uri to two jobs (one per user) then you will have to "merge" the two jobs. this is obviously a bit more effort, but this is a cleaner approach imho. while sharing accounts is generally

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-14 Thread Gilles Gouaillardet
Andy, what about reconfiguring Open MPI with LDFLAGS="-Wl,-rpath,/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic" ? IIRC, an other option is : LDFLAGS="-static-intel" last but not least, you can always replace orted with a simple script that sets the LD_LIBRARY_PATH and exec the

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-16 Thread Gilles Gouaillardet
this option could overwhelm it and cause failures. I’d try the static method first, or perhaps the LDFLAGS Gilles suggested. On Apr 14, 2015, at 5:11 PM, Gilles Gouaillardet <gil...@rist.or.jp <mailto:gil...@rist.or.jp>> wrote: Andy, what about reconfiguring Open MPI with LDFLAGS

Re: [OMPI users] MPI_Comm_spawn and shared memory

2015-05-14 Thread Gilles Gouaillardet
This is a known limitation of the sm btl. FWIW, the vader btl (available in Open MPI 1.8) has the same limitation, though i heard there is some work in progress to get rid of this limitation. Cheers, Gilles On 5/14/2015 3:52 PM, Radoslaw Martyniszyn wrote: Dear developers of Open MPI,

Re: [OMPI users] openmpi-1.8.5: Java UnsupportedClassVersionError for Solaris

2015-05-15 Thread Gilles Gouaillardet
Siegmar, do sunpc0 and sunpc1 run the same java version ? from sunpc1, can you run mpiexec -np 1 java InitFinalizeMain ? Cheers, Gilles On Friday, May 15, 2015, Siegmar Gross wrote: > Hi, > > I successfully installed openmpi-1.8.5 on my machines

Re: [OMPI users] openmpi-1.8.5: ORTE was unable to start daemons

2015-05-15 Thread Gilles Gouaillardet
Siegmar, can you run LD_LIBRARY_PATH= LD_LIBRARY_PATH64= /usr/bin/ssh on all your boxes ? the root cause could be you try to run ssh on box A with the env of box B can you also run with the -output-tag (or -tag-output) so we can figure out on which box ssh is failing Cheers, Gilles On

Re: [OMPI users] Open MPI collectives algorithm selection

2015-05-19 Thread Gilles Gouaillardet
Hi Khalid, i checked the source code and it turns out rules must be ordered : - first by communicator size - second by message size Here is attached an updated version of the ompi_tuned_file.conf you should use Cheers, Gilles On 5/20/2015 8:39 AM, Khalid Hasanov wrote: Hello, I am trying

Re: [OMPI users] Open MPI collectives algorithm selection

2015-05-19 Thread Gilles Gouaillardet
fig for the communicator size 16 (the second one). I am writing this just in case it is not expected behaviour. Thanks again. Best regards, Khalid On Wed, May 20, 2015 at 2:12 AM, Gilles Gouaillardet <gil...@rist.or.jp <mailto:gil...@rist.or.jp>> wrote: Hi Khalid, i che

Re: [OMPI users] 'The MPI_Comm_rank() function was called before MPI_INIT was invoked'

2015-05-20 Thread Gilles Gouaillardet
Hi Mohammad, the error message is self explanatory. you cannot invoke MPI functions before invoking MPI_Init or after MPI_Finalize the easiest way to solve your problem is to move the MPI_Init call to the beginning of your program. Cheers, Gilles On Wednesday, May 20, 2015, #MOHAMMAD ASIF
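
The required ordering, as a minimal C sketch:

    /* init_first.c - MPI_Init must precede any other MPI call, and nothing
     * may follow MPI_Finalize. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);               /* must come first */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* now legal */
        printf("hello from rank %d\n", rank);
        MPI_Finalize();                       /* no MPI calls after this */
        return 0;
    }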

Re: [OMPI users] Problems running linpack benchmark on old Sunfire opteron nodes

2015-05-23 Thread Gilles Gouaillardet
Bill, the root cause is likely there is not enough free space in /tmp. the simplest, but slowest, option is to run mpirun --mca btl tcp ... if you cannot make enough space under /tmp (maybe you run diskless) there are some options to create these kinds of files under /dev/shm Cheers, Gilles

Re: [OMPI users] OPENMPI only recognize 4 cores of AWS EC2 machine

2015-05-24 Thread Gilles Gouaillardet
Hi Xing, iirc, open MPI default behavior is to bind to cores (vs hyperthreads), hence the error message. I cannot remember the option to bind to threads, but you can mpirun --oversubscribe if you are currently stuck Cheers, Gilles On Sunday, May 24, 2015, XingFENG

Re: [OMPI users] OPENMPI only recognize 4 cores of AWS EC2 machine

2015-05-24 Thread Gilles Gouaillardet
ubscribe, would the > performance be influenced? > > On Sun, May 24, 2015 at 7:24 PM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > wrote: > >> Hi Xing, >> >> iirc, open MPI d

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Gilles Gouaillardet
Rahul, per the logs, it seems the /sys pseudo filesystem is not mounted in your chroot. at first, can you make sure this is mounted and try again ? Cheers, Gilles On 5/26/2015 12:51 PM, Rahul Yadav wrote: We were able to solve ssh problem. But now MPI is not able to use component yalla.

Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-26 Thread Gilles Gouaillardet
At first glance, it seems all mpi tasks believe they are rank zero and comm world size is 1 (!) Did you compile xhpl with OpenMPI (and not a stub library for serial version only) ? can you make sure there is nothing wrong with your LD_LIBRARY_PATH and you do not mix MPI libraries (e.g.

Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

2015-05-26 Thread Gilles Gouaillardet
entry *From:*users [mailto:users-boun...@open-mpi.org] *On Behalf Of *Gilles Gouaillardet *Sent:* Tuesday, May 26, 2015 8:14 PM *To:* Open MPI Users *Subject:* Re: [OMPI users] Running HPL on RPi cluster, seems like MPI is somehow not configured properly since it work with 1 node but not more

Re: [OMPI users] Building OpenMPI on Raspberry Pi 2

2015-05-29 Thread Gilles Gouaillardet
Jeff, shall I assume you made a typo and wrote CCFLAGS instead of CFLAGS ? also, can you double check the flags are correctly passed to the assembler with cd opal/asm make -n atomic-asm.lo Cheers, Gilles On Friday, May 29, 2015, Jeff Layton wrote: > Good morning, > > I'm

Re: [OMPI users] mpirun

2015-05-29 Thread Gilles Gouaillardet
Walt, can you disable firewall and network if possible and give it an other try ? Cheers, Gilles On Saturday, May 30, 2015, Walt Brainerd wrote: > It behaved this way with the Cygwin version (very recent update) > and with 1.8.5 that I built from source. > > On

Re: [OMPI users] problem starting a ompi job in a mix BE/LE cluster

2015-06-02 Thread Gilles Gouaillardet
Steve, MCA_BTL_OPENIB_MODEX_MSG_{HTON,NTOH} do not convert all the fields of the mca_btl_openib_modex_message_t struct. I would start here ... Cheers, Gilles On Wednesday, June 3, 2015, Jeff Squyres (jsquyres) wrote: > Steve -- > > I think that this falls directly in

Re: [OMPI users] Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations

2015-06-04 Thread Gilles Gouaillardet
Jeff, imho, this is a grey area ... 99.999% of the time, posix_memalign is a "pure" function. "pure" means it has no side effects. unfortunately, this part of the code is the 0.001% case in which we explicitly rely on a side effect (e.g. posix_memalign calls an Open MPI wrapper that updates a

Re: [OMPI users] Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations

2015-06-05 Thread Gilles Gouaillardet
de how to move forward on this. > > George. > > > > On Jun 4, 2015, at 22:47 , Gilles Gouaillardet <gil...@rist.or.jp > <javascript:;>> wrote: > > > > Jeff, > > > > imho, this is a grey area ... > > > > 99.999% of the ti

Re: [OMPI users] Bug: Disabled mpi_leave_pinned for GPUDirect and InfiniBand during run-time caused by GCC optimizations

2015-06-09 Thread Gilles Gouaillardet
i wrote a reproducer i sent to the GCC folks https://gcc.gnu.org/ml/gcc-bugs/2015-06/msg00757.html Cheers, Gilles On Tue, Jun 9, 2015 at 3:20 AM, Jeff Squyres (jsquyres) wrote: > On Jun 8, 2015, at 11:27 AM, Dave Goodell (dgoodell) > wrote: >> >> My

Re: [OMPI users] Building OpenMPI on Raspberry Pi 2

2015-06-09 Thread Gilles Gouaillardet
Jeff, dmb is available only on ARMv7 (Pi 2) if i remember correctly, you are building Open MPI on ARMv7 as well (Pi 2), so this is not a cross compilation issue. if you configure with -march=armv7, the relevant log is libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../opal/include

Re: [OMPI users] Building OpenMPI on Raspberry Pi 2

2015-06-09 Thread Gilles Gouaillardet
Jeff, can you try gcc -march=armv7-a foo.c ? Cheers, Gilles On Tuesday, June 9, 2015, Jeff Layton wrote: > Gilles, > > I'm not cross-compiling - I'm building on the Pi 2. > > I'm not sure how to check if gcc can generate armv7 code. > I'm using Raspbian and I'm just using the
