See my comment on https://github.com/open-mpi/ompi/issues/347
On Thu, Jan 15, 2015 at 05:01:00PM -0500, George Bosilca wrote:
> Skimming through the PSM code shows that the return values of the PSM
> functions are handled in most cases. Thus, removing the default error
> handler might not be such
It even says so in the code:
ompi/mca/mtl/psm/mtl_psm.c:
/* Default error handling is enabled, errors will not be returned to
* user. PSM prints the error and the offending endpoint's hostname
* and exits with -1 */
Disabling the default PSM error handler makes
As PSM on master is still broken, I applied your patch to 1.8.4.
Unfortunately it does not work: the error is the same as before.
Looking at your patch, I would also expect it to be the correct fix.
I even tried changing ompi_mtl_psm_cancel() to always return
OMPI_SUCCESS, but MPI_Cancel() still fails.
Doing
MPI_Isend()
followed by an
MPI_Cancel()
fails on my PSM-based system with 1.8.4 like this:
n040108:0.1.Cannot cancel send requests (req=0x2b6279787f80)
n040108:0.0.Cannot cancel send requests (req=0x2b3a3dc92f80)
---
Primary job