On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote:
Thanks, the option --mca btl ^openib works fine!
Half of the cluster has Infiniband/OpenFabrics (from node49 to
node96)
and the other half (nodes from 01 to 48) doesn't.
Ah... this explains things. I wonder if we have not
If you have a contact at Lahey support, it would be great to ask them.
Perhaps the support in Libtool 2.2.6a somehow wasn't complete...?
On Mar 5, 2009, at 7:28 PM, Tiago Silva wrote:
Yes, I am using 8.1a
lfc --version
Lahey/Fujitsu Linux64 Fortran Compiler Release L8.10a
Tiago
Hi Jeff,
Thanks, the option --mca btl ^openib works fine!
Half of the cluster has Infiniband/OpenFabrics (from node49 to node96)
and the other half (nodes from 01 to 48) doesn't.
I just wanted to make openmpi run over ethernet/tcp first.
I will try to make it run using OpenFabrics but I
Many thanks for your help; it was not clear to me whether it was opal,
my application, or the standard C libs that were causing the segfault. It
is already good news that the problem is not at the level of Open MPI,
since that would have meant upgrading the library. My first reaction
would be
Is gamess calling fork(), perchance? Perhaps through a system() or
popen() call?
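If you're not sure, the pattern to look for is something like this (a
hypothetical sketch, not actual gamess code):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Both of these fork() under the covers.  With the openib BTL,
       the child inherits registered memory that it must not touch --
       a classic source of hangs and segfaults. */
    system("hostname");             /* fork() + exec() */
    FILE *fp = popen("date", "r");  /* fork() + pipe */
    if (fp)
        pclose(fp);

    MPI_Finalize();
    return 0;
}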
On Mar 5, 2009, at 3:50 AM, Thomas Exner wrote:
Dear Jeff:
Thank you very much for your reply. Unfortunately, the overloading is
not the problem. The phenomenon also appears if we use only two
processes on the
Whoops; we shouldn't be seg faulting. :-\
The warning is exactly what it implies -- it found the OpenFabrics
network stack but no functioning OpenFabrics-capable hardware. You can
disable it (and the segv) by preventing the openib BTL from running:
mpirun --mca btl ^openib
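(Or, to name the transports you do want instead of excluding one --
I'm writing this from memory, so double-check the BTL names:
mpirun --mca btl tcp,self,sm
should be equivalent on your ethernet-only nodes.)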
But what
Oops. I ran it on the head node and not the compute node. Here is the
output from a compute node:
hca_id: mlx4_0
fw_ver: 2.3.000
node_guid: 0018:8b90:97fe:1b6d
sys_image_guid: 0018:8b90:97fe:1b70
vendor_id:
Thanks,
I am reporting what I found out for the benefit of other Lahey users out
there. I have been told by people at Lahey that Libtool has been updated
to support their compiler.
http://www.linux-archive.org/archlinux-development/156171-libtool-2-2-6a-1-a.html
Unfortunately this seems to
Do you have the same HCA adapter type on all of your machines?
In the error log I see an mlx4 error message, and mlx4 is the ConnectX
driver, but ibv_devinfo shows an older HCA.
Pasha
Jeff Layton wrote:
Pasha,
Here you go... :) Thanks for looking at this.
Jeff
hca_id: mthca0
fw_ver:
Thanks Pasha!
ibdiagnet reports the following:
-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Port localhost/P1 lid=0x00e2
On Thu, Mar 05, 2009 at 10:27:27AM +0200, Pavel Shamis (Pasha) wrote:
>
> >Time to dig up diagnostics tools and look at port statistics.
> >
> You may use the ibdiagnet tool for network debugging -
> http://linux.die.net/man/1/ibdiagnet. This tool is part of OFED.
>
> Pasha.
>
Pasha,
Here you go... :) Thanks for looking at this.
Jeff
hca_id: mthca0
fw_ver: 4.8.200
node_guid: 0003:ba00:0100:38ac
sys_image_guid: 0003:ba00:0100:38af
vendor_id: 0x02c9
On Mar 5, 2009, at 1:54 AM, Sangamesh B wrote:
The Fortran application I'm using here is CPMD-3.11.
I don't think the processor is Nehalem:
Intel(R) Xeon(R) CPU X5472 @ 3.00GHz
The installation procedure was the same on both clusters. I've not set
mpi_affinity.
This is a
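(A side note, in case it helps: if you do want to experiment with
affinity, my recollection is that Open MPI of this vintage enables it
with
mpirun --mca mpi_paffinity_alone 1
added to your usual command line -- worth verifying against the FAQ.)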
Hi All
Joe Landman wrote:
Ralph Castain wrote:
Ummm... not to put gasoline on the fire, but... if the data exchange is
blocking, why do you need to call a barrier op first? Just use an
appropriate blocking data exchange call (collective or whatever) and
it will "barrier" anyway.
Since I
Could you tell us what version of Open MPI you are using, a little
about your system (I would assume you are using ssh?), and how this
was configured?
Thanks
Ralph
On Mar 5, 2009, at 9:31 AM, justin oppenheim wrote:
Hi:
When I execute something like
mpirun -machinefile machinefile
Bah, I should have been more precise in this:
not just any old tests/benchmarks but
recommended, reliable tests/benchmarks?
Yury Tarasievich wrote:
Are there any recommended tests/benchmarks for the heterogeneous
installations? I'd like to have something measuring the throughput of
lengthy
Thank you, Jeff and Ganesh.
My current research is trying to rewrite some collective MPI
operations to work with our system. Barrier is my first step, maybe I
will have bcast and reduce in the future. I understand that some
applications used too many unnecessary barriers. But here what I
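If it helps, the textbook starting point for a barrier built purely on
point-to-point is the dissemination algorithm -- ceil(log2(P)) rounds of
zero-byte exchanges. A rough sketch (untested, error checking omitted):

#include <mpi.h>

/* Dissemination barrier: in round k, rank r signals rank
   (r + 2^k) mod P and waits to hear from (r - 2^k) mod P.
   After ceil(log2(P)) rounds every rank has transitively
   heard from every other, so all ranks must have arrived. */
void my_barrier(MPI_Comm comm)
{
    int rank, size, dist;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (dist = 1; dist < size; dist *= 2) {
        int to   = (rank + dist) % size;
        int from = (rank - dist + size) % size;
        MPI_Sendrecv(NULL, 0, MPI_BYTE, to, 42,
                     NULL, 0, MPI_BYTE, from, 42,
                     comm, MPI_STATUS_IGNORE);
    }
}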
Jeff Squyres wrote:
On Mar 5, 2009, at 10:33 AM, Gerry Creager wrote:
We've been playing with it in a coupled atmosphere-ocean model to
allow
the two to synchronize and exchange data. The models have differing
levels of physics complexity and the time step requirements are
significantly different. To sync them
Are there any recommended tests/benchmarks for the heterogeneous
installations? I'd like to have something measuring the throughput of
lengthy computations, which would be executed on the installation with
the heterogeneous nodes.
Thanks.
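For what it's worth, NetPIPE and the Intel MPI Benchmarks are the usual
suspects for raw point-to-point numbers. If you just want a quick figure
between a fast node and a slow node, a bare-bones ping-pong may be
enough (untested sketch; run it with exactly two ranks, one on each
node):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NBYTES (1 << 20)   /* 1 MiB messages */
#define REPS   100

int main(int argc, char **argv)
{
    int rank, i;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(NBYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {            /* ping */
            MPI_Send(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {     /* pong */
            MPI_Recv(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    /* 2*REPS messages of NBYTES each crossed the wire */
    if (rank == 0)
        printf("~%.1f MB/s one-way\n",
               2.0 * REPS * NBYTES / (t1 - t0) / 1.0e6);

    free(buf);
    MPI_Finalize();
    return 0;
}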