Hi Bert,

If you'd prefer to return to the land of convenience and don't need to mix
MPI and OpenSHMEM, then you may want to try the path I outlined in the email
archived at the following link:
https://www.mail-archive.com/users@lists.open-mpi.org/msg32274.html

Howard

On Tue, Nov 13, 2018 at 11:10 PM Bert Wesarg via users
<users@lists.open-mpi.org> wrote:

> Dear Takahiro,
>
> On Wed, Nov 14, 2018 at 5:38 AM Kawashima, Takahiro
> <t-kawash...@jp.fujitsu.com> wrote:
> >
> > XPMEM moved to GitLab.
> >
> > https://gitlab.com/hjelmn/xpmem
>
> The first words of the README aren't very pleasant to read:
>
>   This is an experimental version of XPMEM based on a version provided by
>   Cray and uploaded to https://code.google.com/p/xpmem. This version
>   supports any kernel 3.12 and newer. *Keep in mind there may be bugs and
>   this version may cause kernel panics, code crashes, eat your cat, etc.*
>
> Installing this on my laptop, where I just want to develop with SHMEM,
> it would be a pity to lose work just because of that.
>
> Best,
> Bert
>
> > Thanks,
> > Takahiro Kawashima,
> > Fujitsu
> >
> > > Hello Bert,
> > >
> > > What OS are you running on your notebook?
> > >
> > > If you are running Linux, and you have root access to your system,
> > > then you should be able to resolve the OpenSHMEM support issue by
> > > installing the XPMEM device driver on your system and rebuilding UCX
> > > so it picks up XPMEM support.
> > >
> > > The source code is on GitHub:
> > >
> > > https://github.com/hjelmn/xpmem
> > >
> > > Some instructions on how to build the XPMEM device driver are at
> > >
> > > https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM
> > >
> > > You will need to install the kernel source and symbols RPMs on your
> > > system before building the XPMEM device driver.
> > >
> > > Hope this helps,
> > >
> > > Howard
> > >
> > > On Tue, Nov 13, 2018 at 3:00 PM Bert Wesarg via users
> > > <users@lists.open-mpi.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
> > > > <annou...@lists.open-mpi.org> wrote:
> > > > >
> > > > > The Open MPI Team, representing a consortium of research,
> > > > > academic, and industry partners, is pleased to announce the
> > > > > release of Open MPI version 4.0.0.
> > > > >
> > > > > v4.0.0 is the start of a new release series for Open MPI.
> > > > > Starting with this release, the OpenIB BTL supports only iWARP
> > > > > and RoCE by default. Starting with this release, UCX is the
> > > > > preferred transport protocol for InfiniBand interconnects. The
> > > > > embedded PMIx runtime has been updated to 3.0.2. The embedded
> > > > > ROMIO has been updated to 3.2.1. This release is ABI compatible
> > > > > with the 3.x release streams. There have been numerous other bug
> > > > > fixes and performance improvements.
> > > > >
> > > > > Note that starting with Open MPI v4.0.0, prototypes for several
> > > > > MPI-1 symbols that were deleted in the MPI-3.0 specification
> > > > > (which was published in 2012) are no longer available by default
> > > > > in mpi.h. See the README for further details.
> > > > >
> > > > > Version 4.0.0 can be downloaded from the main Open MPI web site:
> > > > >
> > > > > https://www.open-mpi.org/software/ompi/v4.0/
> > > > >
> > > > > 4.0.0 -- September, 2018
> > > > > ------------------------
> > > > >
> > > > > - OSHMEM updated to the OpenSHMEM 1.4 API.
> > > > > - Do not build the OpenSHMEM layer when there are no SPMLs
> > > > >   available. Currently, this means the OpenSHMEM layer will only
> > > > >   build if an MXM or UCX library is found.
> > > >
> > > > So what is the most convenient way to get SHMEM working on a single
> > > > shared-memory node (i.e., a notebook)? I just realized that I
> > > > haven't had SHMEM since Open MPI 3.0.
> > > > But building with UCX does not help either. I tried with UCX 1.4,
> > > > but Open MPI SHMEM still does not work:
> > > >
> > > > $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
> > > > $ oshrun -np 2 ./shmem_hello_world-4.0.0
> > > > [1542109710.217344] [tudtug:27715:0]  select.c:406  UCX ERROR
> > > > no remote registered memory access transport to tudtug:27716:
> > > > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > > > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > > > mm/posix - Destination is unreachable, cma/cma - no put short
> > > > [1542109710.217344] [tudtug:27716:0]  select.c:406  UCX ERROR
> > > > no remote registered memory access transport to tudtug:27715:
> > > > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > > > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > > > mm/posix - Destination is unreachable, cma/cma - no put short
> > > > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > > > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > > > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > > > Error: add procs FAILED rc=-2
> > > > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > > > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > > > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > > > Error: add procs FAILED rc=-2
> > > > --------------------------------------------------------------------------
> > > > It looks like SHMEM_INIT failed for some reason; your parallel
> > > > process is likely to abort. There are many reasons that a parallel
> > > > process can fail during SHMEM_INIT; some of which are due to
> > > > configuration or environment problems.
> > > > This failure appears to be an internal failure; here's some
> > > > additional information (which may only be relevant to an Open SHMEM
> > > > developer):
> > > >
> > > >   SPML add procs failed
> > > >   --> Returned "Out of resource" (-2) instead of "Success" (0)
> > > > --------------------------------------------------------------------------
> > > > [tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed
> > > > to initialize - aborting
> > > > [tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed
> > > > to initialize - aborting
> > > > --------------------------------------------------------------------------
> > > > SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with
> > > > errorcode -1.
> > > > --------------------------------------------------------------------------
> > > > --------------------------------------------------------------------------
> > > > A SHMEM process is aborting at a time when it cannot guarantee that
> > > > all of its peer processes in the job will be killed properly. You
> > > > should double check that everything has shut down cleanly.
> > > >
> > > >   Local host: tudtug
> > > >   PID:        27715
> > > > --------------------------------------------------------------------------
> > > > --------------------------------------------------------------------------
> > > > Primary job terminated normally, but 1 process returned
> > > > a non-zero exit code. Per user-direction, the job has been aborted.
> > > > --------------------------------------------------------------------------
> > > > --------------------------------------------------------------------------
> > > > oshrun detected that one or more processes exited with non-zero
> > > > status, thus causing the job to be terminated.
> > > > The first process to do so was:
> > > >
> > > >   Process name: [[2212,1],1]
> > > >   Exit code:    255
> > > > --------------------------------------------------------------------------
> > > > [tudtug:27710] 1 more process has sent help message
> > > > help-shmem-runtime.txt / shmem_init:startup:internal-failure
> > > > [tudtug:27710] Set MCA parameter "orte_base_help_aggregate" to 0 to
> > > > see all help / error messages
> > > > [tudtug:27710] 1 more process has sent help message
> > > > help-shmem-api.txt / shmem-abort
> > > > [tudtug:27710] 1 more process has sent help message
> > > > help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all
> > > > killed
> > > >
> > > > MPI works as expected:
> > > >
> > > > $ mpicc -o mpi_hello_world-4.0.0 openmpi-4.0.0/examples/hello_c.c
> > > > $ mpirun -np 2 ./mpi_hello_world-4.0.0
> > > > Hello, world, I am 0 of 2, (Open MPI v4.0.0, package: Open MPI
> > > > wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
> > > > 2018, 108)
> > > > Hello, world, I am 1 of 2, (Open MPI v4.0.0, package: Open MPI
> > > > wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
> > > > 2018, 108)
> > > >
> > > > I'm attaching the output from 'ompi_info -a' and also from
> > > > 'ucx_info -b -d -c -s'.
> > > >
> > > > Thanks for the help.
> >
> > _______________________________________________
> > users mailing list
> > users@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/users
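For future readers of this thread: the path Howard describes (install the XPMEM driver, rebuild UCX so it detects XPMEM, then rebuild Open MPI against that UCX) can be sketched roughly as the recipe below. This is an untested outline, not instructions from the thread: the install prefixes, source directory names, and the kernel module location are assumptions for illustration, and the Installing-XPMEM wiki page linked above remains the authoritative reference.

```shell
# Sketch only -- prefixes, directory names, and module path are assumptions.

# 1. Build and install the XPMEM kernel driver
#    (requires matching kernel headers/symbols and root access).
git clone https://github.com/hjelmn/xpmem.git
cd xpmem
./autogen.sh
./configure --prefix=/opt/xpmem
make
sudo make install
sudo insmod kernel/xpmem.ko   # actual module location depends on your build

# 2. Rebuild UCX so its configure step picks up XPMEM support.
cd ../ucx-1.4.0               # assumed UCX source directory
./configure --prefix=/opt/ucx --with-xpmem=/opt/xpmem
make -j && sudo make install

# 3. Rebuild Open MPI (with OSHMEM) against that UCX.
cd ../openmpi-4.0.0
./configure --prefix=/opt/openmpi --with-ucx=/opt/ucx
make -j && sudo make install
```

After loading the module, `ucx_info -d` should list an xpmem transport; if it does not, UCX did not actually pick up the XPMEM installation and the OSHMEM "Destination is unreachable" failure above will persist.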