Re: [OMPI users] TCP instead of openIB doesn't work

2009-02-27 Thread Jeff Squyres
I notice the following: - you're creating an *enormous* array on the stack. You might be better off allocating it on the heap. - the value of "exchanged" will quickly grow beyond 2^31 - 1 (i.e., INT_MAX), which is the max that the MPI API can handle. Bad Things can/will happen beyond that value
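
A minimal sketch of both points, in plain C with no MPI calls: allocate the large buffer on the heap, and split an element count into pieces that each fit in an int, since MPI counts are int arguments. The helper names (alloc_buffer, next_chunk, chunks_needed) and the double element type are assumptions for illustration, not taken from the original program.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on the heap instead of the stack.
   Returns NULL if n * sizeof(double) would overflow or malloc fails. */
static double *alloc_buffer(size_t n) {
    if (n > SIZE_MAX / sizeof(double)) {
        return NULL;
    }
    return malloc(n * sizeof(double));
}

/* Hypothetical helper: element count for the next call, capped at INT_MAX
   so that it can safely be passed as an int count argument. */
static int next_chunk(size_t remaining) {
    return remaining > (size_t)INT_MAX ? INT_MAX : (int)remaining;
}

/* Hypothetical helper: how many capped calls are needed for `total` elements. */
static size_t chunks_needed(size_t total) {
    return total / (size_t)INT_MAX + (total % (size_t)INT_MAX != 0);
}
```

A sender would loop while elements remain, pass next_chunk(remaining) as the count of each send, and advance the buffer pointer by that many elements.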

Re: [OMPI users] TCP instead of openIB doesn't work

2009-02-27 Thread Vittorio Giovara
Hello, I've corrected the syntax and added the flag you suggested, but unfortunately the result doesn't change. randori ~ # mpirun --display-map --mca btl tcp,self -np 2 -host randori,tatami graph [randori:22322] Map for job: 1 Generated by mapping mode: byslot Starting vpid: 0 Vpid

Re: [OMPI users] Latest SVN failures

2009-02-27 Thread Rolf Vandevaart
With further investigation, I have reproduced this problem. I think I was originally testing against a version that was not recent enough. I do not see it with r20594 which is from February 19. So, something must have happened over the last 8 days. I will try and narrow down the issue.

Re: [OMPI users] OMPI, and HPUX

2009-02-27 Thread Jeff Squyres
I don't know if anyone has tried OMPI on HP-UX, sorry. On Feb 26, 2009, at 9:14 AM, Nader wrote: Hello, Has anyone installed OMPI on an HP-UX system? I would appreciate any info. Best Regards. Nader

Re: [OMPI users] defining different values for same environment variable

2009-02-27 Thread Nicolas Deladerriere
Matt, Thanks for your solution, but I thought about that and it is not really convenient in my configuration to change the executable on each node. I would like to change only the mpirun command. 2009/2/27 Matt Hughes > > 2009/2/27

Re: [OMPI users] valgrind problems

2009-02-27 Thread Douglas Guptill
On Thu, Feb 26, 2009 at 08:27:15PM -0700, Justin wrote: > Also the stable version of openmpi on Debian is 1.2.7rc2. Are there any > known issues with this version and valgrind? For a now-forgotten reason, I ditched the openmpi that comes on Debian etch, and installed 1.2.8 in /usr/local. HTH,

[OMPI users] Threading fault

2009-02-27 Thread Mahmoud Payami
Dear All, I am using intel lc_prof-11 (and its own mkl) and have built openmpi-1.3.1 with configure options: "FC=ifort F77=ifort CC=icc CXX=icpc". Then I have built my application. The Linux box is 2Xamd64 quad. In the middle of running my application (after some 15 iterations), I receive the

Re: [OMPI users] defining different values for same environment variable

2009-02-27 Thread Matt Hughes
2009/2/27 Nicolas Deladerriere : > I am looking for a way to set an environment variable with a different value on > each node before running the MPI executable. (not only export the environment > variable !) I typically use a script for things like this. So instead of
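
The wrapper idea can also be sketched as a small C launcher rather than a script: it derives a per-node value from the local hostname, sets GMON_OUT_PREFIX, and then replaces itself with the real program. The "gmon.<hostname>" naming scheme and the function names are assumptions for illustration; the value the original poster wanted is not shown in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical naming scheme: "gmon.<hostname>".
   Returns 0 on success, -1 if the result does not fit in `out`. */
static int format_prefix(char *out, size_t outlen, const char *host) {
    int n = snprintf(out, outlen, "gmon.%s", host);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Set GMON_OUT_PREFIX from the local hostname, then exec the program
   named in argv[1] with its arguments. Returns only on failure. */
static int launch_with_prefix(int argc, char **argv) {
    char host[256];
    char prefix[300];

    if (argc < 2) {
        return 2; /* no program to run */
    }
    if (gethostname(host, sizeof host) != 0) {
        return 1;
    }
    host[sizeof host - 1] = '\0'; /* gethostname may not terminate on truncation */
    if (format_prefix(prefix, sizeof prefix, host) != 0) {
        return 1;
    }
    if (setenv("GMON_OUT_PREFIX", prefix, 1) != 0) {
        return 1;
    }
    execvp(argv[1], &argv[1]);
    perror("execvp"); /* reached only if exec failed */
    return 1;
}
```

A main() that simply returns launch_with_prefix(argc, argv) would be built once and started as, for example, mpirun -np 2 -host n001,n002 ./launcher ./app, where launcher and app are placeholder names. Only the mpirun command line changes; the application binary stays as it is.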

[OMPI users] defining different values for same environment variable

2009-02-27 Thread Nicolas Deladerriere
Hello, I am looking for a way to set an environment variable with a different value on each node before running the MPI executable. (not only export the environment variable !) Let's consider that I have a cluster with two nodes (n001 and n002) and I want to set the environment variable GMON_OUT_PREFIX with

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Pavel Shamis (Pasha)
Usually "retry exceeded error" points to some network issue, like a bad cable or a bad connector. You may use the ibdiagnet tool to debug the network - http://linux.die.net/man/1/ibdiagnet. This tool is part of OFED. Pasha Brett Pemberton wrote: Hey, I've had a couple of errors recently,

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Åke Sandgren
On Fri, 2009-02-27 at 09:54 -0700, Matt Hughes wrote: > 2009/2/26 Brett Pemberton : > > [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org > > to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status > > number 12 for wr_id 38996224 opcode 0

Re: [OMPI users] openib RETRY EXCEEDED ERROR

2009-02-27 Thread Matt Hughes
2009/2/26 Brett Pemberton : > [[1176,1],0][btl_openib_component.c:2905:handle_wc] from tango092.vpac.org > to: tango090 error polling LP CQ with status RETRY EXCEEDED ERROR status > number 12 for wr_id 38996224 opcode 0 qp_idx 0 What OS are you using? I've seen this error and

[OMPI users] TCP instead of openIB doesn't work

2009-02-27 Thread Vittorio Giovara
Hello, I'm posting here another problem with my installation. I wanted to benchmark the differences between the tcp and openib transports. If I run a simple non-MPI application I get: randori ~ # mpirun --mca btl tcp,self -np 2 -host randori -host tatami hostname randori tatami but as soon as I switch

Re: [OMPI users] 3.5 seconds before application launches

2009-02-27 Thread Vittorio Giovara
Hello, and thanks for both replies. I've tried running a non-MPI program, but I still measured some latency before it started, around 2 seconds this time. SSH should be properly configured; in fact I can log in to both machines without a password. openmpi and mvapich use ssh by default.

Re: [OMPI users] Latest SVN failures

2009-02-27 Thread Rolf Vandevaart
I just tried trunk-1.4a1r20458 and I did not see this error, although my configuration was rather different. I ran across 100 2-CPU sparc nodes, np=256, connected with TCP. Hopefully George's comment helps out with this issue. One other thought to see whether SGE has anything to do with

[OMPI users] more XGrid Problems with openmpi1.2.9

2009-02-27 Thread Ricardo Fernández-Perea
Hi It seems to me more like time issues. All the runs end with something similar to Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x45485308 Crashed Thread: 0 Thread 0 Crashed: 0 libSystem.B.dylib 0x95208f04 strcmp + 84 1