Re: [OMPI users] Error message related to infiniband

2014-01-19 Thread Gustavo Correa
Is your IB card in compute-01-10.private.dns.zone working? Did you check it with ibstat? Do you have a dual port IB card in compute-01-15.private.dns.zone? Did you connect both ports to the same switch on the same subnet? TCP "no route to host": If it is not a firewall problem, could it be bad Ether

Re: [OMPI users] Error message related to infiniband

2014-01-19 Thread Syed Ahsan Ali
I agree with you and am still struggling with the subnet ID settings because I couldn't find the /var/cache/opensm/opensm.opts file. Secondly, if OMPI is going for TCP, then it should be able to find the hosts, as the compute nodes are reachable via ping and ssh. On Sun, Jan 19, 2014 at 9:38 PM, Ralph Castain wrote: > If

Re: [OMPI users] hosfile issue of openmpi-1.7.4rc2

2014-01-19 Thread tmishima
Hi Ralph, I confirmed that it worked quite well for my purpose. Thank you very much. I would point out just one small thing. Since the debug information in the rank-file block is useful even when a host is initially detected, OPAL_OUTPUT_VERBOSE on line 302 should be moved out of the else-clause as

Re: [OMPI users] simple test problem hangs on mpi_finalize and consumes all system resources

2014-01-19 Thread Fischer, Greg A.
I just tried running "hello_f90.f90" and see the same behavior: 100% CPU usage, gradually increasing memory consumption, and failure to get past mpi_finalize. LD_LIBRARY_PATH is set as: /tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5/lib The installation target fo

Re: [OMPI users] random error bugging me..

2014-01-19 Thread George Bosilca
Thomas, Here is a quick way to see how a function gets called after MPI_Finalize. In the following I will use gdb scripting, but with little effort you can adapt this to work with your preferred debugger (lldb as an example). The idea is to break on the function generating the error you get on t

Re: [OMPI users] random error bugging me..

2014-01-19 Thread Ralph Castain
Hard to say what could be the cause of the problem without a better understanding of the code, but the root cause appears to be some code path that allows you to call an MPI function after you called MPI_Finalize. From your description, it appears you have a race condition in the code that activ

Re: [OMPI users] Error message related to infiniband

2014-01-19 Thread Ralph Castain
If OMPI finds infiniband support on the node, it will attempt to use it. In this case, it would appear you have an incorrectly configured IB adaptor on the node, so you get the additional warning about that fact. OMPI then falls back to look for another transport, in this case TCP. However, the

Re: [OMPI users] simple test problem hangs on mpi_finalize and consumes all system resources

2014-01-19 Thread Ralph Castain
The OFED warning about registration is something OMPI added at one point when we isolated the cause of jobs occasionally hanging, so you won't see that warning from other MPIs or earlier versions of OMPI (I forget exactly when we added it). The problem you describe doesn't sound like an OMPI is

Re: [OMPI users] hosfile issue of openmpi-1.7.4rc2

2014-01-19 Thread Ralph Castain
On Jan 19, 2014, at 1:36 AM, tmish...@jcity.maeda.co.jp wrote: > > > Thank you for your fix. I will try it tomorrow. > > Before that, although I could not understand everything, > let me ask some questions about the new hostfile.c. > > 1. The line 244-248 is included in else-clause, which mig

Re: [OMPI users] random error bugging me..

2014-01-19 Thread thomas . forde
Yes. It's a shared NFS partition on the nodes. Sent from my iPhone > On 19 Jan 2014 at 13:29, "Reuti" wrote: > > Hi, > > On 18.01.2014 at 22:43, thomas.fo...@ulstein.com wrote: > > > I have had a running cluster going good for a while, and 2 days ago we decided to upgrade it from 128 to 2

[OMPI users] simple test problem hangs on mpi_finalize and consumes all system resources

2014-01-19 Thread Fischer, Greg A.
Hello, I have a simple, 1-process test case that gets stuck on the mpi_finalize call. The test case is a dead-simple calculation of pi - 50 lines of Fortran. The process gradually consumes more and more memory until the system becomes unresponsive and needs to be rebooted, unless the job is kil

Re: [OMPI users] random error bugging me..

2014-01-19 Thread Reuti
Hi, On 18.01.2014 at 22:43, thomas.fo...@ulstein.com wrote: > I have had a running cluster going good for a while, and 2 days ago we > decided to upgrade it from 128 to 256 cores. > > Most of my deployment of nodes goes through cobbler and scripting, and it has > worked fine before. On the fi

[OMPI users] Error message related to infiniband

2014-01-19 Thread Syed Ahsan Ali
Dear All, I am getting InfiniBand errors while running mpirun applications on the cluster. I get these errors even when I don't include InfiniBand usage flags in the mpirun command. Please guide: mpirun -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in

Re: [OMPI users] hosfile issue of openmpi-1.7.4rc2

2014-01-19 Thread tmishima
Thank you for your fix. I will try it tomorrow. Before that, although I could not understand everything, let me ask some questions about the new hostfile.c. 1. Lines 244-248 are inside the else-clause, which might cause a memory leak (it seems to me). Should they be moved out of the clause? 244