Re: [OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Cabral, Matias A
Howard, PSM2_DEVICES, I went back to the roots and found that shm is the only device supporting communication between ranks in the same node. Therefore, the below error “Endpoint could not be reached” would be expected. Back to the psm2_ep_connect() hanging, I cloned the same psm2 as you have f

[OMPI devel] seg fault when using yalla, XRC, and yalla

2016-04-19 Thread David Shrader
Hello, I have been investigating using XRC on a cluster with a Mellanox interconnect. I have found that in a certain situation I get a seg fault. I am using 1.10.2 compiled with gcc 5.3.0, and the simplest configure line that I have found that still results in the seg fault is as follows: $

Re: [OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Howard Pritchard
Hi Matias, My usual favorites in ompi/examples/hello_c.c and ompi/examples/ring_c.c. If I disable the shared memory device using the PSM2_DEVICES option it looks like psm2 is unhappy: [kit001.localdomain:08222] PSM2 EP connect error (Endpoint could not be reached): [kit001.localdomain:08222] ki

Re: [OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Cabral, Matias A
Errata: PSM2_DEVICES="self,hfi" _MAC

Re: [OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Cabral, Matias A
Hi Howard, Couple more questions to understand a little better the context: - What type of job running? - Is this also under srun? For PSM2 you may find more details in the programmer’s guide: http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel

[OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Howard Pritchard
Hi Folks, I'm making progress with issue #1559 (patches on the mail list didn't help), and I'll open a PR to help the PSM2 MTL work on a single node, but I'm noticing something more troublesome. If I run on just one node, and I use more than one process, process zero consistently hangs in psm2_ep