Howard,
Regarding PSM2_DEVICES: I went back to the roots and found that shm is the only
device supporting communication between ranks on the same node. Therefore, the
“Endpoint could not be reached” error below is expected.
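To make that point concrete, here is a minimal sketch of a pre-flight check a launcher script could run. It assumes only that PSM2_DEVICES is a comma-separated list of device names (as in "self,hfi"). The helper names psm2_devices and same_node_reachable are hypothetical and are not part of PSM2 or Open MPI.

```python
import os


def psm2_devices(value):
    """Split a PSM2_DEVICES-style string into a list of device names."""
    return [d.strip().lower() for d in value.split(",") if d.strip()]


def same_node_reachable(value):
    """Return True if 'shm' is listed, i.e. same-node ranks can connect.

    Assumption taken from the message above: shm is the only device that
    carries traffic between ranks on the same node.
    """
    return "shm" in psm2_devices(value)


if __name__ == "__main__":
    value = os.environ.get("PSM2_DEVICES")
    if value is None:
        # Nothing was overridden, so the library's own default applies.
        print("PSM2_DEVICES is not set")
    elif not same_node_reachable(value):
        print("warning: PSM2_DEVICES=%r omits shm; ranks on the same node "
              "will likely fail with 'Endpoint could not be reached'" % value)
    else:
        print("PSM2_DEVICES=%r includes shm" % value)
```

With PSM2_DEVICES="self,hfi" exported, the script takes the warning branch, which matches the connect error seen when two ranks share a node.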
Back to the psm2_ep_connect() hang: I cloned the same psm2 as you have f
Hello,
I have been investigating XRC on a cluster with a Mellanox
interconnect, and in a certain situation I get a segfault. I am
using 1.10.2 compiled with gcc 5.3.0, and the simplest configure
line I have found that still produces the segfault is as follows:
$
Hi Matias,
My usual favorites in ompi/examples/hello_c.c and ompi/examples/ring_c.c.
If I disable the shared memory device using the PSM2_DEVICES option,
it looks like psm2 is unhappy:
[kit001.localdomain:08222] PSM2 EP connect error (Endpoint could not be
reached):
[kit001.localdomain:08222] ki
Errata:
PSM2_DEVICES="self,hfi"
_MAC
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Cabral, Matias A
Sent: Tuesday, April 19, 2016 11:25 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] PSM2 Intel folks question
Hi Howard,
Couple more questions to understand a little better
Hi Howard,
Couple more questions to understand a little better the context:
- What type of job is running?
- Is this also under srun?
For PSM2 you may find more details in the programmer’s guide:
http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel
Hi Folks,
I'm making progress with issue #1559 (patches on the mail list didn't help),
and I'll open a PR to help the PSM2 MTL work on a single node, but I'm
noticing something more troublesome.
If I run on just one node, and I use more than one process, process zero
consistently hangs in psm2_ep