To be clear: this is a common misconception.

Open MPI does not determine which network to use for MPI communication by the 
hostname(s) you use to launch your application.  Specifically: the hostnames 
that you list in the hostfile, command line, or whatever your resource manager 
provides are *not* used to determine which networks to use for MPI 
communication.

Open MPI only uses hostnames to identify unique servers (so that we can launch 
processes on them).  We use different controls -- outlined by Ralph -- to 
determine which network(s) to use for MPI communication.
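
As a rough sketch of what that looks like in practice: leave the hostfile 
exactly as LSF wrote it (node1, node2) and pick the network with the btl MCA 
parameter instead.  The helper names (choose_btl, build_cmd), the "hostfile" 
fallback, and ./my_app below are made up for illustration; only the btl 
values come from Ralph's mail.  The script just prints the command line 
rather than running it, so you can check it before using it in a job script.

```shell
#!/bin/sh
# Hypothetical helper: map a network name to an Open MPI BTL list.
# The lists are the ones from Ralph's mail; "sm" and "self" cover
# on-node shared memory and process loopback.
choose_btl() {
    case "$1" in
        ib)  echo "openib,sm,self" ;;
        tcp) echo "tcp,sm,self" ;;
        *)   echo "unknown network: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the launch command for a network, a hostfile,
# and an application.  The hostfile keeps the names the resource manager
# allocated; only the -mca btl value changes between networks.
build_cmd() {
    btl=$(choose_btl "$1") || return 1
    echo "mpirun -mca btl $btl -machinefile $2 $3"
}

# Dry run: show the InfiniBand command for the unmodified LSF hostfile.
build_cmd ib "${HOSTFILE:-hostfile}" ./my_app
```

If the printed command looks right, run it directly from the job script in 
place of the plain mpiexec line.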

Hope that helps.


On Feb 2, 2013, at 6:43 AM, Ralph Castain <r...@open-mpi.org> wrote:

> I'm afraid this doesn't make much sense to me. LSF has dispatched node1 and 
> node2 - correct? It sounds like you have also given those names aliases that 
> refer to their IB ports - generally a very bad practice, but let's set that 
> aside for now.
> 
> If they are the same physical nodes, then the node name makes no difference - 
> OMPI will see both TCP and IB on the node and use them. You can control which 
> interfaces get used by simply telling OMPI on its command line:
> 
> mpirun -mca btl tcp,sm,self ...  will use shared memory and TCP
> 
> mpirun -mca btl openib,sm,self ...  will use IB and shared memory
> 
> Using host names to try and control which network gets used isn't going to 
> work - the software is too smart to be fooled that way.
> 
> 
> On Feb 2, 2013, at 6:33 AM, HM Li <li...@163.com> wrote:
> 
>> Can you help me?  
>> 
>> bnode1,bnode2 and node1,node2 are hostnames of the same nodes, 
>> corresponding to the InfiniBand and Ethernet networks respectively.
>> node1 and node2 are the nodes declared in lsf.cluster.name.
>> In order to use the IB network, I have modified the LSF mpijob script so 
>> that the HOSTFILE containing the nodes which LSF dispatched is rewritten 
>> from node to bnode.
>> I then use "mpiexec -machinefile $HOSTFILE $COMMANDLINE" to run my jobs.
>> But the job exits and shows:
>> -------------------------------------------------------------
>> A hostfile was provided that contains at least one node not
>> present in the allocation:
>> 
>>   hostfile:  /home/nic/hmli/.lsbatch/bhost23263.node1
>>   node:      bnode2
>> 
>> If you are operating in a resource-managed environment, then only
>> nodes that are in the allocation can be used in the hostfile. You
>> may find relative node syntax to be a useful alternative to
>> specifying absolute node names see the orte_hosts man page for
>> further information.
>> -------------------------------------------------------------
>> 
>> I don't want to change the hostname from node to bnode in lsf.cluster.name.
>> 
>> Thank you very much. 
>> 
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

