Actually, all the machines use iptables as their firewall. I compared the rules on triops and kraken and found that triops had the rule "REJECT all -- anywhere anywhere reject-with icmp-host-prohibited", which kraken did not have (otherwise the rules were identical). I removed that rule from triops' configuration, restarted iptables, and now communication works in all directions!
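The comparison described above (finding the one rule that triops had and kraken did not) can be sketched in Python. This is a minimal sketch, assuming each host's rules were first dumped to a text file, for example with `iptables -S` run as root; the file names and the helper `rules_only_in` are hypothetical and not part of iptables or Open MPI. Because the script only compares lines, a saved `iptables -L` listing (the format of the REJECT line quoted above) works as input too, as long as both files use the same format.

```python
"""Sketch: list iptables rules present on one host but not on the other.

Assumes the rules of each host were saved to a text file beforehand,
e.g. `iptables -S > triops.rules` (as root). File names are hypothetical.
"""
import sys
from collections import Counter


def rules_only_in(rules_a: str, rules_b: str) -> list[str]:
    """Return the non-empty rule lines of rules_a with no match in rules_b.

    Lines are compared as a multiset (duplicates are counted), so rule
    order is ignored.
    """
    # Count how many times each rule appears in the second listing.
    remaining = Counter(line.strip() for line in rules_b.splitlines() if line.strip())
    extra = []
    for line in rules_a.splitlines():
        rule = line.strip()
        if not rule:
            continue  # skip blank lines
        if remaining[rule] > 0:
            remaining[rule] -= 1  # consume one matching occurrence in rules_b
        else:
            extra.append(rule)  # no remaining counterpart in rules_b
    return extra


def main(argv: list[str]) -> None:
    if len(argv) != 3:
        print(f"usage: {argv[0]} RULES_A RULES_B", file=sys.stderr)
        return
    with open(argv[1]) as fa, open(argv[2]) as fb:
        rules_a, rules_b = fa.read(), fb.read()
    for rule in rules_only_in(rules_a, rules_b):
        print(f"only in {argv[1]}: {rule}")
    for rule in rules_only_in(rules_b, rules_a):
        print(f"only in {argv[2]}: {rule}")


if __name__ == "__main__":
    main(sys.argv)
```

Ignoring order is enough to spot an extra rule such as the REJECT above, but iptables evaluates a chain top to bottom, so two hosts with the same set of rules in a different order can still behave differently.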
Thank you
Jody

On Tue, May 3, 2016 at 7:00 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Have you disabled firewalls between these machines?
>
>
> > On May 3, 2016, at 11:26 AM, jody <jody....@gmail.com> wrote:
> >
> > ...my bad!
> >
> > I had set things up so that PATH and LD_LIBRARY_PATH were correct in interactive mode,
> > but they were wrong when ssh was called non-interactively.
> >
> > Now I have a new problem:
> > When I do
> > mpirun -np 6 --hostfile krakenhosts hostname
> > from triops, it sometimes seems to hang (i.e. no output, doesn't end),
> > and at other times I get the output
> > ----
> > [aim-kraken:24527] [[45056,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> > ...
> > --------------------------------------------------------------------------
> > -----
> > Again, I can call mpirun on triops from kraken and from all squid_XX without a problem...
> >
> > What could cause this problem?
> >
> > Thank you
> > Jody
> >
> >
> > On Tue, May 3, 2016 at 2:54 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > Have you verified that you are running the same version of Open MPI on both servers when launched from non-interactive logins?
> >
> > This kind of error is somewhat typical if you accidentally mixed, for example, Open MPI v1.6.x and v1.10.2 (i.e., v1.10.2 understands the --hnp-topo-sig back end option, but v1.6.x does not).
> >
> >
> > > On May 3, 2016, at 6:35 AM, jody <jody....@gmail.com> wrote:
> > >
> > > Hi
> > > Today I installed Open MPI v1.10.2 on two machines, using only the prefix option for configure and then running 'make all install'.
> > >
> > > On both machines I changed .bashrc to set PATH and LD_LIBRARY_PATH correctly.
> > > (I checked by running 'mpirun --version' and verifying that the output does indeed say 1.10.2.)
> > >
> > > Password-less ssh is enabled on both machines in both directions.
> > >
> > > When I start mpirun from one machine (kraken) with a hostfile specifying the other machine ("triops slots=8 max-slots=8"),
> > > it works:
> > > -----
> > > jody@kraken ~ $ mpirun -np 3 --hostfile triopshosts uptime
> > > 12:24:04 up 7 days, 43 min, 17 users, load average: 0.06, 0.68, 0.65
> > > 12:24:04 up 7 days, 43 min, 17 users, load average: 0.06, 0.68, 0.65
> > > 12:24:04 up 7 days, 43 min, 17 users, load average: 0.06, 0.68, 0.65
> > > -----
> > >
> > > But when I start mpirun from triops with a hostfile specifying kraken ("kraken slots=8 max-slots=8"),
> > > it fails:
> > > -----
> > > jody@triops ~ $ mpirun -np 3 --hostfile krakenhosts hostname
> > > [aim-kraken:21973] Error: unknown option "--hnp-topo-sig"
> > > input in flex scanner failed
> > > --------------------------------------------------------------------------
> > > ORTE was unable to reliably start one or more daemons.
> > > This usually is caused by:
> > >
> > > * not finding the required libraries and/or binaries on
> > > one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> > > settings, or configure OMPI with --enable-orterun-prefix-by-default
> > >
> > > * lack of authority to execute on one or more specified nodes.
> > > Please verify your allocation and authorities.
> > >
> > > * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
> > > Please check with your sys admin to determine the correct location to use.
> > >
> > > * compilation of the orted with dynamic libraries when static are required
> > > (e.g., on Cray). Please check your configure cmd line and consider using
> > > one of the contrib/platform definitions for your system type.
> > >
> > > * an inability to create a connection back to mpirun due to a
> > > lack of common network interfaces and/or no route found between
> > > them. Please check network connectivity (including firewalls
> > > and network routing requirements).
> > > --------------------------------------------------------------------------
> > >
> > > The same error happens when I use '--host kraken'.
> > >
> > > I verified that PATH and LD_LIBRARY_PATH are correctly set on both machines.
> > > And on both machines /tmp is readable, writable and executable for all.
> > > The connection should be okay (I can ssh from kraken to triops and vice versa).
> > >
> > > Any idea what the problem is?
> > >
> > > Thank you
> > > Jody
> > >
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > > Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29074.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
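Jeff's first suggestion in this thread, checking that both servers run the same Open MPI version when reached through a non-interactive login, can be automated with a short Python sketch. It assumes `ssh` is on the local PATH, that password-less ssh is set up (as it is here), and that `mpirun --version` prints a line of the form `mpirun (Open MPI) 1.10.2`; the helper names are hypothetical. Running `ssh host mpirun --version` goes through a non-interactive shell, which is roughly the environment mpirun's ssh launcher sees when it starts the remote daemons, and that is where the PATH and LD_LIBRARY_PATH problem above showed up.

```python
"""Sketch: report the Open MPI version each host exposes to a
non-interactive ssh command. Helper names are hypothetical.
"""
import re
import subprocess
from typing import Optional

# Assumed output format: a line such as "mpirun (Open MPI) 1.10.2".
_VERSION_RE = re.compile(r"\(Open MPI\)\s+(\S+)")


def parse_ompi_version(output: str) -> Optional[str]:
    """Return the version string from `mpirun --version` output, or None."""
    match = _VERSION_RE.search(output)
    return match.group(1) if match else None


def remote_ompi_version(host: str, timeout: float = 10.0) -> Optional[str]:
    """Run `mpirun --version` on `host` through non-interactive ssh.

    BatchMode=yes makes ssh fail rather than prompt for a password.
    Raises subprocess.TimeoutExpired if the host does not answer in time.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "mpirun", "--version"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Search both streams; if mpirun is not on the non-interactive PATH,
    # the shell's error text contains no version and None is returned.
    return parse_ompi_version(result.stdout + result.stderr)
```

For example, `{h: remote_ompi_version(h) for h in ("kraken", "triops")}` should show the same version string for every host. A mismatch such as `1.6.5` vs `1.10.2` would fit the `--hnp-topo-sig` error above, and `None` suggests that `mpirun` was not found in the non-interactive environment.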