Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Jeff Squyres
On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote: Thanks, the option --mca btl ^openib works fine! Half of the cluster has InfiniBand/OpenFabrics (from node49 to node96) and the other half (nodes from 01 to 48) doesn't. Ah... this explains things. I wonder if we have not

Re: [OMPI users] Lahey 64 bit and openmpi 1.3?

2009-03-05 Thread Jeff Squyres
If you have a contact with Lahey support, it would be great to contact them. Perhaps somehow the support in Libtool 2.2.6a wasn't complete...? On Mar 5, 2009, at 7:28 PM, Tiago Silva wrote: Yes, I am using 8.1a lfc --version Lahey/Fujitsu Linux64 Fortran Compiler Release L8.10a Tiago

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Shinta Bonnefoy
Hi Jeff, Thanks, the option --mca btl ^openib works fine! Half of the cluster has InfiniBand/OpenFabrics (from node49 to node96) and the other half (nodes from 01 to 48) doesn't. I just wanted to make openmpi run over ethernet/tcp first. I will try to make it run using OpenFabrics but I

Re: [OMPI users] "casual" error

2009-03-05 Thread Biagio Lucini
Many thanks for your help, it was not clear to me whether it was opal, my application or the standard C libs that were causing the segfault. It is already good news that the problem is not at the level of OpenMPI, since this would have meant upgrading that library. My first reaction would be

Re: [OMPI users] Gamess with openmpi

2009-03-05 Thread Jeff Squyres
Is gamess calling fork(), perchance? Perhaps through a system() or popen() call? On Mar 5, 2009, at 3:50 AM, Thomas Exner wrote: Dear Jeff: Thank you very much for your reply. Unfortunately, the overloading is not the problem. The phenomenon also appears if we use only two processes on the

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Jeff Squyres
Whoops; we shouldn't be seg faulting. :-\ The warning is exactly what it implies -- it found the OpenFabrics network stack but no functioning OpenFabrics-capable hardware. You can disable it (and the segv) by disabling the openfabrics BTL from running: mpirun --mca btl ^openib But what

Re: [OMPI users] mlx4 error - looking for guidance

2009-03-05 Thread Jeff Layton
Oops. I ran it on the head node and not the compute node. Here is the output from a compute node: hca_id: mlx4_0 fw_ver: 2.3.000 node_guid: 0018:8b90:97fe:1b6d sys_image_guid: 0018:8b90:97fe:1b70 vendor_id:

Re: [OMPI users] Lahey 64 bit and openmpi 1.3?

2009-03-05 Thread Tiago Silva
Thanks, I am reporting what I found out for the benefit of other Lahey users out there. I have been told by people at Lahey that libtool has been updated to support their compiler. http://www.linux-archive.org/archlinux-development/156171-libtool-2-2-6a-1-a.html Unfortunately this seems to

Re: [OMPI users] mlx4 error - looking for guidance

2009-03-05 Thread Pavel Shamis (Pasha)
Do you have the same HCA adapter type on all of your machines? In the error log I see an mlx4 error message, and mlx4 is the ConnectX driver, but ibv_devinfo shows an older HCA. Pasha Jeff Layton wrote: Pasha, Here you go... :) Thanks for looking at this. Jeff hca_id: mthca0 fw_ver:

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Pavel Shamis (Pasha)
Thanks Pasha! ibdiagnet reports the following: -I--- -I- IPoIB Subnets Check -I--- -I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00 -W- Port localhost/P1 lid=0x00e2

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Jan Lindheim
On Thu, Mar 05, 2009 at 10:27:27AM +0200, Pavel Shamis (Pasha) wrote: > > >Time to dig up diagnostics tools and look at port statistics. > > > You may use ibdiagnet tool for the network debug - > http://linux.die.net/man/1/ibdiagnet. This tool is part of OFED. > > Pasha. >

Re: [OMPI users] mlx4 error - looking for guidance

2009-03-05 Thread Jeff Layton
Pasha, Here you go... :) Thanks for looking at this. Jeff hca_id: mthca0 fw_ver: 4.8.200 node_guid: 0003:ba00:0100:38ac sys_image_guid: 0003:ba00:0100:38af vendor_id: 0x02c9

Re: [OMPI users] Low performance of Open MPI-1.3 over Gigabit

2009-03-05 Thread Jeff Squyres
On Mar 5, 2009, at 1:54 AM, Sangamesh B wrote: The fortran application I'm using here is the CPMD-3.11. I don't think the processor is Nehalem: Intel(R) Xeon(R) CPU X5472 @ 3.00GHz Installation procedure was same on both the clusters. I've not set mpi_affinity. This is a

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-03-05 Thread Gus Correa
Hi All Joe Landman wrote: Ralph Castain wrote: Ummm... not to put gasoline on the fire, but...if the data exchange is blocking, why do you need to call a barrier op first? Just use an appropriate blocking data exchange call (collective or whatever) and it will "barrier" anyway. Since I

Re: [OMPI users] Run-time problem

2009-03-05 Thread Ralph Castain
Could you tell us what version of Open MPI you are using, a little about your system (I would assume you are using ssh?), and how this was configured? Thanks Ralph On Mar 5, 2009, at 9:31 AM, justin oppenheim wrote: Hi: When I execute something like mpirun -machinefile machinefile

Re: [OMPI users] tests for heterogenous installations?

2009-03-05 Thread Yury Tarasievich
Bah, I should have been more precise in this: not just any old tests/benchmarks but recommended, reliable tests/benchmarks? Yury Tarasievich wrote: Are there any recommended tests/benchmarks for heterogeneous installations? I'd like to have something measuring the throughput of lengthy

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-03-05 Thread Ganesh
Thank you, Jeff and Ganesh. My current research is trying to rewrite some collective MPI operations to work with our system. Barrier is my first step, maybe I will have bcast and reduce in the future. I understand that some applications used too many unnecessary barriers. But here what I

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-03-05 Thread Shanyuan Gao
Thank you, Jeff and Ganesh. My current research is trying to rewrite some collective MPI operations to work with our system. Barrier is my first step, maybe I will have bcast and reduce in the future. I understand that some applications used too many unnecessary barriers. But here what

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-03-05 Thread Joe Landman
Jeff Squyres wrote: On Mar 5, 2009, at 10:33 AM, Gerry Creager wrote: We've been playing with it in a coupled atmosphere-ocean model to allow the two to synchronize and exchange data. The models have differing levels of physics complexity and the time step requirements are significantly

Re: [OMPI users] Any scientific application heavily using MPI_Barrier?

2009-03-05 Thread Jeff Squyres
On Mar 5, 2009, at 10:33 AM, Gerry Creager wrote: We've been playing with it in a coupled atmosphere-ocean model to allow the two to synchronize and exchange data. The models have differing levels of physics complexity and the time step requirements are significantly different. To sync them

[OMPI users] tests for heterogenous installations?

2009-03-05 Thread Yury Tarasievich
Are there any recommended tests/benchmarks for heterogeneous installations? I'd like to have something measuring the throughput of lengthy computations executed on an installation with heterogeneous nodes. Thanks.