I am usually able to find the answer to my problems by searching the archive, 
but I've run up against one that I can't suss out.

bison-opt: relocation error: /home/pbme002/opt/gcc-4.8.2-tpls/openmpi-1.8.4/lib/libmpi.so.1: symbol rdma_get_src_port, version RDMACM_1.0 not defined in file librdmacm.so.1 with link time reference

That is the error I am getting. The problem is that it is not consistent: it 
hits a random few jobs in a series that runs the same job on different data 
sets, and the jobs that fail with this error run fine on a second attempt. I 
am the admin for this cluster, and the user is running their own build of 
OpenMPI rather than the system OpenMPI, so I can't say for certain that it was 
compiled correctly. Still, it strikes me as odd that jobs would fail with the 
above error and then run perfectly fine when a second attempt is made.
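
A minimal sketch of one check that could be run on each compute node, assuming 
(and this is only an assumption, nothing above confirms it) that the failures 
follow particular nodes whose librdmacm.so.1 differs from the rest. It uses 
Python's ctypes to load the library by the name given in the error and reports 
whether the symbol is exported. The helper name has_symbol is made up for 
illustration, and the check looks only at the symbol name, not at the 
RDMACM_1.0 version tag.

```python
import ctypes
import socket


def has_symbol(library, symbol):
    """Report whether `library` exports `symbol`.

    Returns True or False if the library loads, or None if it cannot be
    loaded at all. Hypothetical helper, not part of OpenMPI or librdmacm.
    """
    try:
        # Uses the dlopen search order of the Python process, which may
        # differ from what bison-opt sees (for example if it has an RPATH).
        lib = ctypes.CDLL(library)
    except OSError:
        return None
    # ctypes raises AttributeError for a missing symbol, so hasattr works.
    # This checks the symbol name only, not its version tag.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Library and symbol names are taken from the relocation error.
    result = has_symbol("librdmacm.so.1", "rdma_get_src_port")
    print(socket.gethostname(), "rdma_get_src_port exported:", result)
```

Comparing the output from nodes where jobs failed against nodes where they 
succeeded would show whether the library differs between them.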

I'm looking for any help sussing out what could be causing this issue.

Regards,

Mark L. Potter

