Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-26 Thread Ralph Castain
Kewl! Let us know if it breaks again. > On Apr 26, 2015, at 4:29 PM, Andy Riebs wrote: > > Yes, it just worked -- I took the old command line, just to ensure that I was > testing the correct problem, and it worked. Then I remembered that I had set >

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-26 Thread Andy Riebs
Yes, it just worked -- I took the old command line, just to ensure that I was testing the correct problem, and it worked. Then I remembered that I had set OMPI_MCA_plm_rsh_pass_path and OMPI_MCA_plm_rsh_pass_libpath in my test setup, so I removed those from my

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-26 Thread Ralph Castain
Not intentionally - I did add that new MCA param as we discussed, but don’t recall making any other changes in this area. There have been some other build system changes made as a result of more extensive testing of the 1.8 release candidate - it is possible that something in that area had an

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-26 Thread Andy Riebs
Hi Ralph, Did you solve this problem in a more general way? I finally sat down this morning to try this with the openmpi-dev-1567-g11e8c20.tar.bz2 nightly kit from last week, and can't reproduce the problem at all. Andy On 04/16/2015 12:15 PM, Ralph

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-16 Thread Ralph Castain
Sorry - I had to revert the commit due to a reported MTT problem. I'll reinsert it after I get home and can debug the problem this weekend. On Thu, Apr 16, 2015 at 9:41 AM, Andy Riebs wrote: > Hi Ralph, > > If I did this right (NEVER a good bet :-) ), it didn't work... > >

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-16 Thread Andy Riebs
Hi Ralph, If I did this right (NEVER a good bet :-) ), it didn't work... Using last night's master nightly, openmpi-dev-1515-gc869490.tar.bz2, I built with the same script as yesterday, but removing the LDFLAGS=-Wl, stuff: $ ./configure

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-16 Thread Ralph Castain
FWIW: I just added (last night) a pair of new MCA params for this purpose: plm_rsh_pass_path prepends the designated path to the remote shell's PATH prior to executing orted; plm_rsh_pass_libpath does the same thing for LD_LIBRARY_PATH. I believe that will resolve the problem for Andy regardless of
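For anyone following along, here is a rough sketch of how those two params might be set, either through the environment (the OMPI_MCA_ prefix form Andy mentions later in the thread) or on the command line. The directories are lifted from other messages in this thread and are placeholders for your own install; the params only exist in builds that include Ralph's change, and the launch command is printed rather than executed so the snippet runs without Open MPI.

```shell
#!/bin/sh
# Placeholder paths taken from other messages in this thread; adjust for your system.
MIC_LIB_DIR="/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic"
OMPI_BIN_DIR="/home/ariebs/mic/mpi-nightly/bin"

# Option 1: environment variables. Open MPI maps OMPI_MCA_<name> to the MCA parameter <name>.
export OMPI_MCA_plm_rsh_pass_path="$OMPI_BIN_DIR"
export OMPI_MCA_plm_rsh_pass_libpath="$MIC_LIB_DIR"

# Option 2: the same settings on the command line.
# Built as a string and printed (not run) so this sketch works without an Open MPI install.
launch_cmd="shmemrun --mca plm_rsh_pass_path $OMPI_BIN_DIR --mca plm_rsh_pass_libpath $MIC_LIB_DIR -H mic0,mic1 -n 2 ./mic.out"
echo "$launch_cmd"
```

Use one option or the other, not both; whether the bin directory is the right value for plm_rsh_pass_path on your setup is an assumption here.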

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-16 Thread Thomas Jahns
Hello, On Apr 15, 2015, at 02:11 , Gilles Gouaillardet wrote: what about reconfiguring Open MPI with LDFLAGS="-Wl,-rpath,/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic" ? IIRC, another option is: LDFLAGS="-static-intel" let me first state that I have no experience developing

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-16 Thread Gilles Gouaillardet
Ralph, now i remember this part ... IIRC, LD_LIBRARY_PATH was never forwarded when remote starting orted. i simply avoided this issue by using gnu compilers, or gcc/g++/ifort if i need intel fortran /* you already mentioned this is not officially supported by Intel */ What about adding a new

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-15 Thread Andy Riebs
Gilles and Ralph, thanks! $ shmemrun -H mic0,mic1 -n 2 -x SHMEM_SYMMETRIC_HEAP_SIZE=1M $PWD/mic.out [atl1-01-mic0:192474] [[29886,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 440 Hello World from process 0 of 2 Hello World

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-15 Thread Ralph Castain
I think Gilles may be correct here. In reviewing the code, it appears we have never (going back to the 1.6 series, at least) forwarded the local LD_LIBRARY_PATH to the remote node when exec’ing the orted. The only thing we have done is to set the PATH and LD_LIBRARY_PATH to support the OMPI

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-14 Thread Gilles Gouaillardet
Andy, what about reconfiguring Open MPI with LDFLAGS="-Wl,-rpath,/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic" ? IIRC, another option is: LDFLAGS="-static-intel" last but not least, you can always replace orted with a simple script that sets the LD_LIBRARY_PATH and exec the
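The wrapper-script idea could look roughly like the sketch below. It assumes the real daemon has been renamed to "orted.real" (a hypothetical name, not something Open MPI ships) and that the Intel library directory is the one quoted above. To keep it runnable anywhere, the demo builds both files in a scratch directory and uses a stand-in for the real orted that only echoes its environment.

```shell
#!/bin/sh
# Sketch only: demonstrates the wrapper idea in a scratch directory,
# using a stand-in for the real orted binary.
set -eu
demo_dir=$(mktemp -d)

# Stand-in for the renamed daemon ("orted.real" is a hypothetical name);
# it just reports the environment and arguments it was started with.
cat > "$demo_dir/orted.real" <<'EOF'
#!/bin/sh
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH args=$*"
EOF

# The wrapper installed under the name "orted": prepend the compiler
# library directory, then exec the real binary with the original arguments.
cat > "$demo_dir/orted" <<'EOF'
#!/bin/sh
MIC_LIB_DIR="/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic"
LD_LIBRARY_PATH="$MIC_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
exec "$(dirname "$0")/orted.real" "$@"
EOF

chmod +x "$demo_dir/orted.real" "$demo_dir/orted"

# Run the wrapper with LD_LIBRARY_PATH unset, as it might be in a remote ssh session.
wrapper_output=$(env -u LD_LIBRARY_PATH "$demo_dir/orted" --demo-arg)
echo "$wrapper_output"
rm -rf "$demo_dir"
```

Using exec means the wrapper does not linger as an extra process, and "$@" passes the launcher's arguments through untouched.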

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-14 Thread Ralph Castain
Hmmm…certainly looks that way. I’ll investigate. > On Apr 14, 2015, at 6:06 AM, Andy Riebs wrote: > > Hi Ralph, > > Still no happiness... It looks like my LD_LIBRARY_PATH just isn't getting > propagated? > > $ ldd /home/ariebs/mic/mpi-nightly/bin/orted >

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-14 Thread Andy Riebs
Hi Ralph, Still no happiness... It looks like my LD_LIBRARY_PATH just isn't getting propagated? $ ldd /home/ariebs/mic/mpi-nightly/bin/orted     linux-vdso.so.1 =>  (0x7fffa1d3b000)     libopen-rte.so.0 =>

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-13 Thread Ralph Castain
Weird. I’m not sure what to try at that point - IIRC, building static won’t resolve this problem (but you could try and see). You could add the following to the cmd line and see if it tells us anything useful: --leave-session-attached --mca mca_component_show_load_errors 1 You might also do an

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-13 Thread Andy Riebs
Ralph and Nathan, The problem may be something trivial, as I don't typically use "shmemrun" to start jobs. With the following, I *think* I've  demonstrated that the problem library is where it belongs on the remote system: $ ldd mic.out    

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-13 Thread Nathan Hjelm
For talking between PHIs on the same system I recommend using the scif BTL NOT tcp. That said, it looks like the LD_LIBRARY_PATH is wrong on the remote system. It looks like it can't find the intel compiler libraries. -Nathan Hjelm HPC-5, LANL On Mon, Apr 13, 2015 at 04:06:21PM -0400, Andy
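One quick way to check the "can't find the intel compiler libraries" theory is to ask ldd which dependencies fail to resolve. The helper below is a minimal sketch (the function name missing_libs is made up for illustration); it prints only the "not found" lines and stays silent when everything resolves.

```shell
#!/bin/sh
# Print any shared-library dependencies of a binary that the dynamic
# linker cannot resolve. Prints nothing when every dependency is found
# (or when ldd is unavailable on this system).
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Demo on a binary that exists on most systems; on the Phi you would
# point this at orted or at the application binary instead.
missing_libs /bin/sh
```

To reproduce what the launcher sees, the check would need to run through a non-interactive ssh session to the remote card, since that environment can differ from a login shell.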

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-13 Thread Ralph Castain
I don’t see that LD_PRELOAD showing up on the ssh path, Andy > /usr/bin/ssh mic1 PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ; export > PATH ; LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ; > export LD_LIBRARY_PATH ; >

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-13 Thread Andy Riebs
Progress!  I can run my trivial program on the local PHI, but not the other PHI, on the system. Here are the interesting parts: A pretty good recipe with last night's nightly master: $ ./configure --prefix=/home/ariebs/mic/mpi-nightly CC="icc -mmic" CXX="icpc

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-13 Thread Andy Riebs
Hi Ralph, Here are the results with last night's "master" nightly, openmpi-dev-1487-g9c6d452.tar.bz2, and adding the memheap_base_verbose option (yes, it looks like the "ERROR_LOG" problem has gone away): $ cat /proc/sys/kernel/shmmax 33554432 $ cat

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-12 Thread Riebs, Andy
My fault, I thought the tar ball name looked funny :-) Will try again tomorrow Andy -- Andy Riebs andy.ri...@hp.com Original message From: Ralph Castain Date:04/12/2015 3:10 PM (GMT-05:00) To: Open MPI Users Subject: Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-12 Thread Ralph Castain
Sorry about that - I hadn’t brought it over to the 1.8 branch yet. I’ve done so now, which means the ERROR_LOG shouldn’t show up any more. It won’t fix the memheap problem, though. You might try adding “--mca memheap_base_verbose 100” to your cmd line so we can see why none of the memheap

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-12 Thread Andy Riebs
Hi Ralph, Here's the output with openmpi-v1.8.4-202-gc2da6a5.tar.bz2: $ shmemrun -H localhost -N 2 --mca sshmem mmap  --mca plm_base_verbose 5 $PWD/mic.out [atl1-01-mic0:190189] mca:base:select:(  plm) Querying component [rsh] [atl1-01-mic0:190189]

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-11 Thread Ralph Castain
Got it - thanks. I fixed that ERROR_LOG issue (I think- please verify). I suspect the memheap issue relates to something else, but I probably need to let the OSHMEM folks comment on it > On Apr 11, 2015, at 9:52 AM, Andy Riebs wrote: > > Everything is built on the Xeon

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-11 Thread Andy Riebs
Everything is built on the Xeon side, with the icc "-mmic" switch. I then ssh into one of the PHIs, and run shmemrun from there. On 04/11/2015 12:00 PM, Ralph Castain wrote: Let me try to understand the setup a little better. Are you

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-11 Thread Ralph Castain
Let me try to understand the setup a little better. Are you running shmemrun on the PHI itself? Or is it running on the host processor, and you are trying to spawn a process onto the Phi? > On Apr 11, 2015, at 7:55 AM, Andy Riebs wrote: > > Hi Ralph, > > Yes, this is

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-11 Thread Andy Riebs
Hi Ralph, Yes, this is attempting to get OSHMEM to run on the Phi. I grabbed openmpi-dev-1484-g033418f.tar.bz2 and configured it with $ ./configure --prefix=/home/ariebs/mic/mpi-nightly CC="icc -mmic" CXX="icpc -mmic" \

Re: [OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-10 Thread Ralph Castain
Andy - could you please try the current 1.8.5 nightly tarball and see if it helps? The error log indicates that it is failing to get the topology from some daemon, I’m assuming the one on the Phi? You might also add --enable-debug to that configure line and then put -mca plm_base_verbose on the

[OMPI users] Problems using Open MPI 1.8.4 OSHMEM on Intel Xeon Phi/MIC

2015-04-10 Thread Andy Riebs
Summary: MPI jobs work fine, SHMEM jobs work just often enough to be tantalizing, on an Intel Xeon Phi/MIC system. Longer version: Thanks to the excellent write-up last June, I have been able to build a version of Open MPI for the Xeon Phi coprocessor