Dear Takahiro,
On Wed, Nov 14, 2018 at 5:38 AM Kawashima, Takahiro wrote:
> XPMEM moved to GitLab.

The first words of the README aren't very pleasant to read:

This is an experimental version of XPMEM based on a version provided by
Cray and uploaded to This version supports
any kernel 3.12 and newer. *Keep in mind there may be bugs and this version
may cause kernel panics, code crashes, eat your cat, etc.*

I would be installing this on my laptop, where I just want to develop with
SHMEM; it would be a pity to lose work just because of that.
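
Before touching any kernel module, a harmless first step might be to list which transports the installed UCX already reports. The following is only a sketch: `list_transports` is a hypothetical helper name, and it assumes that `ucx_info -d` prints one `Transport: <name>` line per transport, which may differ between UCX versions.

```shell
#!/bin/sh
# Hypothetical helper: read `ucx_info -d` style output on stdin and print
# the unique transport names, one per line.
# Assumption: each transport is announced on a line containing
# "Transport: <name>"; adjust the pattern if your UCX prints it differently.
list_transports() {
    sed -n 's/.*Transport:[[:space:]]*\([^[:space:]]*\).*/\1/p' | sort -u
}

# Only query UCX if the tool is actually installed.
if command -v ucx_info >/dev/null 2>&1; then
    ucx_info -d | list_transports
else
    echo "ucx_info not found in PATH" >&2
fi
```

If `xpmem` is missing from that list, it would at least confirm that this UCX build has no XPMEM support, without installing anything.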


> Thanks,
> Takahiro Kawashima,
> Fujitsu
> > Hello Bert,
> >
> > What OS are you running on your notebook?
> >
> > If you are running Linux, and you have root access to your system,  then
> > you should be able to resolve the Open SHMEM support issue by installing
> > the XPMEM device driver on your system, and rebuilding UCX so it picks
> > up XPMEM support.
> >
> > The source code is on GitHub:
> >
> >
> >
> > Some instructions on how to build the xpmem device driver are at
> >
> >
> >
> > You will need to install the kernel source and symbols rpms on your
> > system before building the xpmem device driver.
> >
> > Hope this helps,
> >
> > Howard
> >
> >
> > On Tue, Nov 13, 2018 at 3:00 PM, Bert Wesarg via users wrote:
> >
> > > Hi,
> > >
> > > On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce wrote:
> > > >
> > > > The Open MPI Team, representing a consortium of research, academic, and
> > > > industry partners, is pleased to announce the release of Open MPI
> > > > version 4.0.0.
> > > >
> > > > v4.0.0 is the start of a new release series for Open MPI.  Starting with
> > > > this release, the OpenIB BTL supports only iWarp and RoCE by default.
> > > > Starting with this release,  UCX is the preferred transport protocol
> > > > for Infiniband interconnects. The embedded PMIx runtime has been updated
> > > > to 3.0.2.  The embedded Romio has been updated to 3.2.1.  This
> > > > release is ABI compatible with the 3.x release streams. There have been
> > > > numerous other bug fixes and performance improvements.
> > > >
> > > > Note that starting with Open MPI v4.0.0, prototypes for several
> > > > MPI-1 symbols that were deleted in the MPI-3.0 specification
> > > > (which was published in 2012) are no longer available by default in
> > > > mpi.h. See the README for further details.
> > > >
> > > > Version 4.0.0 can be downloaded from the main Open MPI web site:
> > > >
> > > >
> > > >
> > > >
> > > > 4.0.0 -- September, 2018
> > > > ------------------------
> > > >
> > > > - OSHMEM updated to the OpenSHMEM 1.4 API.
> > > > - Do not build OpenSHMEM layer when there are no SPMLs available.
> > > >   Currently, this means the OpenSHMEM layer will only build if
> > > >   a MXM or UCX library is found.
> > >
> > > So what is the most convenient way to get SHMEM working on a single
> > > shared-memory node (i.e., a notebook)? I just realized that I have not
> > > had a working SHMEM since Open MPI 3.0. Building with UCX does not help
> > > either. I tried UCX 1.4, but Open MPI SHMEM
> > > still does not work:
> > >
> > > $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
> > > $ oshrun -np 2 ./shmem_hello_world-4.0.0
> > > [1542109710.217344] [tudtug:27715:0]         select.c:406  UCX  ERROR
> > > no remote registered memory access transport to tudtug:27716:
> > > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > > mm/posix - Destination is unreachable, cma/cma - no put short
> > > [1542109710.217344] [tudtug:27716:0]         select.c:406  UCX  ERROR
> > > no remote registered memory access transport to tudtug:27715:
> > > self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
> > > tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
> > > mm/posix - Destination is unreachable, cma/cma - no put short
> > > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > > [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > > Error: add procs FAILED rc=-2
> > > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
> > > Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
> > > [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
> > > Error: add procs FAILED rc=-2
> > > --------------------------------------------------------------------------
> > > It looks like SHMEM_INIT failed for some reason; your parallel process is
> > > likely to abort.  There are many reasons that a parallel process can
> > > fail during SHMEM_INIT; some of which are due to configuration or
> > > environment
> > > problems.  This failure appears to be an internal failure; here's some
> > > additional information (which may only be relevant to an Open SHMEM
> > > developer):
> > >
> > >   SPML add procs failed
> > >   --> Returned "Out of resource" (-2) instead of "Success" (0)
> > > --------------------------------------------------------------------------
> > > [tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
> > > initialize - aborting
> > > [tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
> > > initialize - aborting
> > > --------------------------------------------------------------------------
> > > SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with errorcode
> > > -1.
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > A SHMEM process is aborting at a time when it cannot guarantee that all
> > > of its peer processes in the job will be killed properly.  You should
> > > double check that everything has shut down cleanly.
> > >
> > > Local host: tudtug
> > > PID:        27715
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > Primary job  terminated normally, but 1 process returned
> > > a non-zero exit code. Per user-direction, the job has been aborted.
> > > --------------------------------------------------------------------------
> > > --------------------------------------------------------------------------
> > > oshrun detected that one or more processes exited with non-zero
> > > status, thus causing
> > > the job to be terminated. The first process to do so was:
> > >
> > >   Process name: [[2212,1],1]
> > >   Exit code:    255
> > > --------------------------------------------------------------------------
> > > [tudtug:27710] 1 more process has sent help message
> > > help-shmem-runtime.txt / shmem_init:startup:internal-failure
> > > [tudtug:27710] Set MCA parameter "orte_base_help_aggregate" to 0 to
> > > see all help / error messages
> > > [tudtug:27710] 1 more process has sent help message help-shmem-api.txt
> > > / shmem-abort
> > > [tudtug:27710] 1 more process has sent help message
> > > help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all
> > > killed
> > >
> > > MPI works as expected:
> > >
> > > $ mpicc -o mpi_hello_world-4.0.0 openmpi-4.0.0/examples/hello_c.c
> > > $ mpirun -np 2 ./mpi_hello_world-4.0.0
> > > Hello, world, I am 0 of 2, (Open MPI v4.0.0, package: Open MPI
> > > wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
> > > 2018, 108)
> > > Hello, world, I am 1 of 2, (Open MPI v4.0.0, package: Open MPI
> > > wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
> > > 2018, 108)
> > >
> > > I'm attaching the output from 'ompi_info -a' and also from 'ucx_info
> > > -b -d -c -s'.
> > >
> > > Thanks for the help.
> _______________________________________________
> users mailing list