Howard,

On Wed, Nov 14, 2018 at 5:26 AM Howard Pritchard <hpprit...@gmail.com> wrote:
>
> Hello Bert,
>
> What OS are you running on your notebook?
Ubuntu 18.04

> If you are running Linux, and you have root access to your system, then
> you should be able to resolve the Open SHMEM support issue by installing
> the XPMEM device driver on your system, and rebuilding UCX so it picks
> up XPMEM support.
>
> The source code is on GitHub:
>
> https://github.com/hjelmn/xpmem
>
> Some instructions on how to build the xpmem device driver are at
>
> https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM
>
> You will need to install the kernel source and symbols rpms on your
> system before building the xpmem device driver.

I will try that. I already tried KNEM, which also did not work. That is
definitely leaving the realm of convenience, though. For a development
machine where performance doesn't matter, it's a huge step back for
Open MPI, I think. I will report back if that works.

Thanks.

Best,
Bert
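PS: For the record, this is roughly what I plan to try. It is only a sketch:
the /opt/... install prefixes are placeholders I picked, the exact configure
options and the path of the built kernel module may differ on my system, and
the XPMEM wiki linked above is the authoritative reference.

  # XPMEM kernel module (needs the headers for the running kernel on Ubuntu)
  $ sudo apt install linux-headers-$(uname -r)
  $ git clone https://github.com/hjelmn/xpmem && cd xpmem
  $ ./autogen.sh              # skip if a configure script already exists
  $ ./configure --prefix=/opt/xpmem
  $ make && sudo make install
  # load the module and make /dev/xpmem accessible
  # (module path and permissions handling may differ; see the wiki)
  $ sudo insmod kernel/xpmem.ko && sudo chmod 666 /dev/xpmem

  # rebuild UCX so it picks up XPMEM
  $ cd ../ucx-1.4.0
  $ ./configure --prefix=/opt/ucx --with-xpmem=/opt/xpmem
  $ make -j && sudo make install

  # rebuild Open MPI against that UCX
  $ cd ../openmpi-4.0.0
  $ ./configure --prefix=/opt/openmpi-4.0.0 --with-ucx=/opt/ucx
  $ make -j && sudo make install

  # check that the new UCX actually knows about xpmem
  $ /opt/ucx/bin/ucx_info -d | grep -i xpmem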
> Hope this helps,
>
> Howard
>
>
> On Tue, Nov 13, 2018 at 3:00 PM Bert Wesarg via users
> <users@lists.open-mpi.org> wrote:
>>
>> Hi,
>>
>> On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
>> <annou...@lists.open-mpi.org> wrote:
>> >
>> > The Open MPI Team, representing a consortium of research, academic, and
>> > industry partners, is pleased to announce the release of Open MPI version
>> > 4.0.0.
>> >
>> > v4.0.0 is the start of a new release series for Open MPI. Starting with
>> > this release, the OpenIB BTL supports only iWarp and RoCE by default.
>> > Starting with this release, UCX is the preferred transport protocol
>> > for Infiniband interconnects. The embedded PMIx runtime has been updated
>> > to 3.0.2. The embedded Romio has been updated to 3.2.1. This release is
>> > ABI compatible with the 3.x release streams. There have been numerous
>> > other bug fixes and performance improvements.
>> >
>> > Note that starting with Open MPI v4.0.0, prototypes for several
>> > MPI-1 symbols that were deleted in the MPI-3.0 specification
>> > (which was published in 2012) are no longer available by default in
>> > mpi.h. See the README for further details.
>> >
>> > Version 4.0.0 can be downloaded from the main Open MPI web site:
>> >
>> > https://www.open-mpi.org/software/ompi/v4.0/
>> >
>> >
>> > 4.0.0 -- September, 2018
>> > ------------------------
>> >
>> > - OSHMEM updated to the OpenSHMEM 1.4 API.
>> > - Do not build OpenSHMEM layer when there are no SPMLs available.
>> >   Currently, this means the OpenSHMEM layer will only build if
>> >   a MXM or UCX library is found.
>>
>> So what is the most convenient way to get SHMEM working on a single
>> shared-memory node (a.k.a. a notebook)? I just realized that I haven't
>> had a working SHMEM since Open MPI 3.0. Building with UCX does not help
>> either. I tried with UCX 1.4, but Open MPI SHMEM still does not work:
>>
>> $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
>> $ oshrun -np 2 ./shmem_hello_world-4.0.0
>> [1542109710.217344] [tudtug:27715:0] select.c:406 UCX ERROR
>> no remote registered memory access transport to tudtug:27716:
>> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
>> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
>> mm/posix - Destination is unreachable, cma/cma - no put short
>> [1542109710.217344] [tudtug:27716:0] select.c:406 UCX ERROR
>> no remote registered memory access transport to tudtug:27715:
>> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
>> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
>> mm/posix - Destination is unreachable, cma/cma - no put short
>> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
>> Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
>> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
>> Error: add procs FAILED rc=-2
>> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
>> Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
>> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
>> Error: add procs FAILED rc=-2
>> --------------------------------------------------------------------------
>> It looks like SHMEM_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during SHMEM_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open SHMEM
>> developer):
>>
>> SPML add procs failed
>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> [tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
>> initialize - aborting
>> [tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
>> initialize - aborting
>> --------------------------------------------------------------------------
>> SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with errorcode -1.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> A SHMEM process is aborting at a time when it cannot guarantee that all
>> of its peer processes in the job will be killed properly. You should
>> double check that everything has shut down cleanly.
>>
>> Local host: tudtug
>> PID: 27715
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code. Per user-direction, the job has been aborted.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> oshrun detected that one or more processes exited with non-zero
>> status, thus causing the job to be terminated.
>> The first process to do so was:
>>
>> Process name: [[2212,1],1]
>> Exit code: 255
>> --------------------------------------------------------------------------
>> [tudtug:27710] 1 more process has sent help message
>> help-shmem-runtime.txt / shmem_init:startup:internal-failure
>> [tudtug:27710] Set MCA parameter "orte_base_help_aggregate" to 0 to
>> see all help / error messages
>> [tudtug:27710] 1 more process has sent help message help-shmem-api.txt
>> / shmem-abort
>> [tudtug:27710] 1 more process has sent help message
>> help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all
>> killed
>>
>> MPI works as expected:
>>
>> $ mpicc -o mpi_hello_world-4.0.0 openmpi-4.0.0/examples/hello_c.c
>> $ mpirun -np 2 ./mpi_hello_world-4.0.0
>> Hello, world, I am 0 of 2, (Open MPI v4.0.0, package: Open MPI
>> wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
>> 2018, 108)
>> Hello, world, I am 1 of 2, (Open MPI v4.0.0, package: Open MPI
>> wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
>> 2018, 108)
>>
>> I'm attaching the output from 'ompi_info -a' and also from 'ucx_info
>> -b -d -c -s'.
>>
>> Thanks for the help.
>>
>> Best,
>> Bert
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users