I don't think so. It's always the 66th node, even if I swap the 65th and 66th. I also get the same error when setting np=66 while having only 65 hosts in the hostfile. (I am using only the tcp btl.)
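To rule out a plain mismatch between np and what the hostfile provides, a quick count along these lines could help. This is only a sketch: it assumes a hostfile with one host per line and an optional `slots=N` field, it assumes one slot for a host that specifies none, and `count_slots` is a made-up helper name, not part of Open MPI.

```python
import re

def count_slots(hostfile_text):
    """Return (number_of_hosts, total_slots) for hostfile-style text."""
    hosts = 0
    slots = 0
    for raw in hostfile_text.splitlines():
        # Drop trailing comments, then skip lines that are empty.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        hosts += 1
        # Match "slots=N" but not "max_slots=N" or "max-slots=N".
        match = re.search(r"(?<![\w-])slots\s*=\s*(\d+)", line)
        # Assumption: a host without an explicit slots field counts as 1 slot.
        slots += int(match.group(1)) if match else 1
    return hosts, slots

if __name__ == "__main__":
    # Hypothetical sample; in practice, read the real hostfile contents instead.
    sample = "node01\nnode02 slots=2\n# spare\nnode03\n"
    hosts, slots = count_slots(sample)
    print(f"{hosts} hosts, {slots} slots")
```

If the slot total comes out below the requested np, the run is oversubscribing at least one host, which is worth knowing before looking for other limits.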
Lenny Verkhovsky
SW Engineer, Mellanox Technologies
www.mellanox.com
Office: +972 74 712 9244
Mobile: +972 54 554 0233
Fax: +972 72 257 9400

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Monday, August 11, 2014 1:07 AM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI fails with np > 65

Looks to me like your 65th host is missing the dstore library - is it possible you don't have your paths set correctly on all hosts in your hostfile?

On Aug 10, 2014, at 1:13 PM, Lenny Verkhovsky <len...@mellanox.com> wrote:

Hi all,

Trying to run OpenMPI (trunk revision 32428), I ran into a problem running OMPI with more than 65 procs. It looks like MPI fails to open the 66th connection, even when just running `hostname` over tcp. It also seems to be unrelated to any specific host. All hosts are Ubuntu 12.04.1 LTS.

mpirun -np 66 --hostfile /proj/SSA/Mellanox/tmp//20140810_070156_hostfile.txt --mca btl tcp,self hostname
[nodename] [[4452,0],65] ORTE_ERROR_LOG: Error in file base/ess_base_std_orted.c at line 288
.......................................

It looks like an environment issue, but I can't find any related limit. Any ideas?

Thanks.
Lenny Verkhovsky

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24961.php