It may be because the other system is running an upgraded version of Linux which does not have the InfiniBand drivers. Any solution?
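One common workaround for this symptom (a sketch, assuming the Open MPI 1.x-era MCA parameter syntax) is to tell mpirun to skip the openib BTL entirely, so that every rank agrees on the TCP, shared-memory, and loopback transports:

```shell
# Sketch, not a verified fix: restrict Open MPI to BTLs that need no
# InfiniBand drivers. "tcp", "sm" (shared memory) and "self" (loopback)
# are the standard 1.x-era BTL names; the app path is from the post below.
MCA_ARGS="--mca btl tcp,self,sm"
APP=/home/MET/hrm/bin/hrm
echo mpirun $MCA_ARGS -np 40 "$APP"   # prints the command to run
```

With all ranks restricted to the same transports, the openib warning on pmd04 and the "unable to reach each other" abort should both go away, at the cost of losing InfiniBand performance on the nodes that do have working drivers.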
On Tue, Mar 26, 2013 at 12:42 PM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:

> Tried this, but mpirun exits with this error:
>
> mpirun -np 40 /home/MET/hrm/bin/hrm
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --------------------------------------------------------------------------
> [[33095,1],8]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
> Host: pmd04.pakmet.com
>
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> Process 1 ([[33095,1],28]) is on host: compute-02-00.private02.pakmet.com
> Process 2 ([[33095,1],0]) is on host: pmd02
> BTLs attempted: openib self sm
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
>
> Ahsan
>
> On Fri, Mar 22, 2013 at 7:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Mar 22, 2013, at 3:42 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:
>>
>> Actually, due to some database corruption I am not able to add any new
>> node to the cluster from the installer node, so I want to run a parallel
>> job on more nodes without adding them to the existing cluster.
>> You are right that the binaries must be present on the remote node as
>> well. Is this possible through NFS, just as the compute nodes are
>> NFS-mounted with the installer node?
>>
>> Sure - OMPI doesn't care how the binaries got there, just so long as
>> they are present on the compute node.
>>
>> Ahsan
>>
>> On Fri, Mar 22, 2013 at 3:33 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>
>>> On 22.03.2013 at 10:14, Syed Ahsan Ali wrote:
>>>
>>> > I have a very basic question. If we want to run an mpirun job on two
>>> > systems which are not part of a cluster, how can we make that
>>> > possible? Can a host be specified on mpirun which is not a compute
>>> > node, but rather a stand-alone system?
>>>
>>> Sure, the machines can be specified as arguments to `mpiexec`. But do
>>> you want to run applications just between these two machines, or should
>>> they participate in a larger parallel job with machines of the cluster?
>>> Then a direct network connection between the outside and inside of the
>>> cluster is necessary, by some kind of forwarding, in case these are
>>> separated networks.
>>>
>>> Also, the paths to the started binaries may be different in case the
>>> two machines do not share the same /home with the cluster, and this
>>> needs to be honored.
>>> In case you are using a queuing system and want to route jobs to
>>> machines outside of the set-up cluster: it's necessary to negotiate
>>> with the admin to allow jobs to be scheduled there.
>>>
>>> -- Reuti
>>>
>>> > Thanks
>>> > Ahsan
>>> > _______________________________________________
>>> > users mailing list
>>> > us...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> Syed Ahsan Ali Bokhari
>> Electronic Engineer (EE)
>>
>> Research & Development Division
>> Pakistan Meteorological Department, H-8/4, Islamabad.
>> Phone # off +92518358714
>> Cell # +923155145014
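Reuti's suggestion of naming the machines directly can be sketched with a hostfile. The hostnames and slot counts below are hypothetical, and the sketch assumes passwordless SSH between the machines and the binary present at the same path on each (e.g. via the NFS mount discussed above):

```shell
# Hypothetical hostfile for two stand-alone machines; the names "nodeA"
# and "nodeB" and the slot counts are made up for illustration.
cat > hosts.txt <<'EOF'
nodeA slots=4
nodeB slots=4
EOF
# Launch across both machines (echoed here rather than executed):
echo mpirun --hostfile hosts.txt -np 8 /home/MET/hrm/bin/hrm
```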