Is your IB card in compute-01-10.private.dns.zone working?
Did you check it with ibstat?
Do you have a dual port IB card in compute-01-15.private.dns.zone?
Did you connect both ports to the same switch on the same subnet?
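A quick way to check is with the standard OFED diagnostics; this is only a sketch, and the device name mlx4_0 below is a placeholder for whatever your adapter is called:

  # List all HCAs with per-port State / Physical state / rate
  ibstat
  # Detailed view of one adapter (placeholder device name)
  ibv_devinfo -d mlx4_0

A healthy port should report "State: Active" and "Physical state: LinkUp".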
TCP "no route to host":
If it is not a firewall problem, could it be bad Ether
I agree with you and am still struggling with the subnet ID settings because I
couldn't find the /var/cache/opensm/opensm.opts file.
Secondly, if OMPI is falling back to TCP, then it should be able to find the
compute nodes, as they are reachable via ping and ssh.
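Note that ping and ssh only prove that ICMP and port 22 get through; Open MPI opens its own TCP connections on ephemeral ports, which a firewall can still block. A minimal way to isolate the TCP path, assuming the private interface is eth0 and a hostfile named hostlist (both placeholders):

  # Force the TCP BTL only and pin it to one interface
  mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 \
         -np 2 -hostfile hostlist ./hello

If this still fails with "no route to host", check iptables on the compute nodes before revisiting the subnet ID.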
On Sun, Jan 19, 2014 at 9:38 PM, Ralph Castain wrote:
> If
Hi Ralph,
I confirmed that it worked quite well for my purpose.
Thank you very much.
I would point out just one small thing. Since the debug
information in the rank-file block is useful even
when a host is initially detected, the OPAL_OUTPUT_VERBOSE
at line 302 should be moved out of the else-clause as
I just tried running "hello_f90.f90" and saw the same behavior: 100% CPU usage,
gradually increasing memory consumption, and failure to get past mpi_finalize.
LD_LIBRARY_PATH is set as:
/tools/casl_sles10/vera_clean/gcc-4.6.1/toolset/openmpi-1.6.5/lib
The installation target fo
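One hedged check for this kind of hang is a library mismatch between the mpirun used to launch and the libmpi the binary actually resolves; the binary name below is a placeholder for the compiled test case:

  # Which mpirun is first in PATH, and which build does it belong to?
  which mpirun
  ompi_info | head -n 5
  # Which libmpi does the test binary actually load?
  ldd ./hello_f90 | grep -i mpi

If these paths disagree with the LD_LIBRARY_PATH above, that mismatch is worth ruling out first.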
Thomas,
Here is a quick way to see how a function gets called after MPI_Finalize. In the
following I will use gdb scripting, but with little effort you can adapt this
to work with your preferred debugger (lldb, for example).
The idea is to break on the function generating the error you get on t
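A minimal batch-mode sketch of that idea; the breakpoint symbol ompi_mpi_abort is just one plausible choice (substitute whatever function name appears in your error output), and ./testcase is a placeholder binary:

  # Break where the error is raised and dump the calling stack
  gdb -batch -ex 'break ompi_mpi_abort' \
      -ex run -ex backtrace --args ./testcase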
It is hard to say what could be causing the problem without a better
understanding of the code, but the root cause appears to be some code path that
allows you to call an MPI function after you have called MPI_Finalize. From your
description, it appears you have a race condition in the code that activ
If OMPI finds infiniband support on the node, it will attempt to use it. In
this case, it would appear you have an incorrectly configured IB adaptor on the
node, so you get the additional warning about that fact.
OMPI then falls back to look for another transport, in this case TCP. However,
the
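One way to confirm this picture, sketched here rather than taken from the original thread, is to exclude the openib BTL so that OMPI goes straight to TCP without probing the broken adapter (hostfile and binary names are placeholders):

  # Disable the InfiniBand (openib) BTL; OMPI then selects TCP/sm
  mpirun --mca btl ^openib -np 4 -hostfile hostlist ./app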
The OFED warning about registration is something OMPI added at one point when
we isolated the cause of jobs occasionally hanging, so you won't see that
warning from other MPIs or earlier versions of OMPI (I forget exactly when we
added it).
The problem you describe doesn't sound like an OMPI is
On Jan 19, 2014, at 1:36 AM, tmish...@jcity.maeda.co.jp wrote:
>
>
> Thank you for your fix. I will try it tomorrow.
>
> Before that, although I could not understand everything,
> let me ask some questions about the new hostfile.c.
>
> 1. Lines 244-248 are included in the else-clause, which mig
Yes. It's a shared NFS partition on the nodes.
Sent from my iPhone
> On 19 Jan 2014, at 13:29, "Reuti" wrote:
>
> Hi,
>
> On 18.01.2014 at 22:43, thomas.fo...@ulstein.com wrote:
>
> > I have had a cluster running well for a while, and 2 days ago we
> > decided to upgrade it from 128 to 256 cores.
Hello,
I have a simple, 1-process test case that gets stuck on the mpi_finalize call.
The test case is a dead-simple calculation of pi - 50 lines of Fortran. The
process gradually consumes more and more memory until the system becomes
unresponsive and needs to be rebooted, unless the job is kil
Hi,
On 18.01.2014 at 22:43, thomas.fo...@ulstein.com wrote:
> I have had a cluster running well for a while, and 2 days ago we
> decided to upgrade it from 128 to 256 cores.
>
> Most of my deployment of nodes goes through Cobbler and scripting, and it has
> worked fine before. On the fi
Dear All
I am getting InfiniBand errors while running mpirun applications on the
cluster. I get these errors even when I don't include InfiniBand usage
flags in the mpirun command. Please guide me.
mpirun -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in
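Since OMPI probes InfiniBand whether or not it is named on the command line, a reasonable first step (a sketch, not a guaranteed fix) is to see which BTLs this build contains and to rerun with the IB adapter ruled out:

  # Which byte-transfer layers were compiled in?
  ompi_info | grep btl
  # Rerun over TCP and shared memory only, bypassing InfiniBand
  mpirun --mca btl tcp,sm,self -np 72 -hostfile hostlist ../bin/regcmMPI regcm.in

If the errors disappear, the IB stack on at least one node needs attention.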
Thank you for your fix. I will try it tomorrow.
Before that, although I could not understand everything,
let me ask some questions about the new hostfile.c.
1. Lines 244-248 are included in the else-clause, which might cause
a memory leak (it seems to me). Should they be moved out of the clause?
244