Re: [OMPI users] excluding hosts

2010-04-06 Thread Ralph Castain
On Apr 6, 2010, at 4:59 PM, David Turner wrote:
> Hi Ralph,
>
>> Are you using a scheduler of some kind? If so, you can add this to your
>> default mca param file:
>
> Yes, we are running torque/moab.
>
>> orte_allocation_required = 1
>> This will prevent anyone running without having an allo

Re: [OMPI users] detect hung node

2010-04-06 Thread Jeff Squyres
On Apr 6, 2010, at 1:03 PM, Sam Preston wrote:
> I have a problem with the cluster I'm currently using where nodes
> 'hang' silently from time to time during an MPI call. This causes the
> blocked MPI processes to block indefinitely -- the only way to detect
> an error is to notice that no more o

Re: [OMPI users] excluding hosts

2010-04-06 Thread David Turner
Hi Ralph,

> Are you using a scheduler of some kind? If so, you can add this to your
> default mca param file:

Yes, we are running torque/moab.

> orte_allocation_required = 1
> This will prevent anyone running without having an allocation. You can also set

Ah. An "allocation". Not much info on

Re: [OMPI users] excluding hosts

2010-04-06 Thread Ralph Castain
Hi David

Are you using a scheduler of some kind? If so, you can add this to your default mca param file:

orte_allocation_required = 1

This will prevent anyone running without having an allocation. You can also set

rmaps_base_no_schedule_local = 1

which tells mpirun not to schedule any MPI pro
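As a sketch of what Ralph describes, the two settings would go into the site-wide MCA parameter file. The path below is the conventional default but depends on the install prefix, so treat it as an assumption and verify against your own installation:

```shell
# Sketch of a site-wide MCA parameter file.
# Conventional location (prefix-dependent -- verify for your install):
#   $OMPI_PREFIX/etc/openmpi-mca-params.conf

# Refuse to launch unless the job holds a scheduler allocation
# (Torque/Moab in this thread's case).
orte_allocation_required = 1

# Tell mpirun not to place MPI processes on the node where it was
# invoked, which keeps login nodes free of compute ranks.
rmaps_base_no_schedule_local = 1
```

Users can still override MCA parameters on the mpirun command line, so this is a guardrail for the common case rather than a hard enforcement mechanism.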

Re: [OMPI users] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-06 Thread Oliver Geisler
On 4/6/2010 2:53 PM, Jeff Squyres wrote:
>
> Try NetPIPE -- it has both MPI communication benchmarking and TCP
> benchmarking. Then you can see if there is a noticeable difference between
> TCP and MPI (there shouldn't be). There's also a "memcpy" mode in netpipe,
> but it's not quite the sam
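A hedged sketch of the TCP-vs-MPI comparison Jeff suggests, assuming NetPIPE has been built from source (the binary names NPtcp and NPmpi are the conventional ones from its makefile; host names are placeholders):

```shell
# TCP baseline between two nodes: start the receiver first, then the
# transmitter pointing at it; results are written to the -o file.
ssh node2 ./NPtcp &
./NPtcp -h node2 -o np.tcp.out

# Same pair of nodes, but through the MPI layer.
mpirun -np 2 -host node1,node2 ./NPmpi -o np.mpi.out

# Compare the latency/bandwidth columns of np.tcp.out and np.mpi.out.
# A large gap points at the MPI layer; near-identical curves point at
# the kernel/TCP path underneath both.
```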

[OMPI users] excluding hosts

2010-04-06 Thread David Turner
Hi, Our cluster has a handful of login nodes, and then a bunch of compute nodes. OpenMPI is installed in a global file system visible from both sets of nodes. This means users can type "mpirun" from an interactive prompt, and quickly oversubscribe the login node. So, is there a way to explicit

Re: [OMPI users] Hide Abort output

2010-04-06 Thread Jeff Squyres
BTW, we diverged quite a bit on this thread -- Yves -- does the functionality that was fixed by Ralph address your original issue?

On Apr 2, 2010, at 10:21 AM, Ralph Castain wrote:
> Testing found that I had missed a spot here, so we weren't fully suppressing
> messages (including MPI_Abort).

Re: [OMPI users] Hide Abort output

2010-04-06 Thread Jeff Squyres
For those wishing to follow the thread Dick started on the MPI forum list:

http://lists.mpi-forum.org/mpi-forum/2010/04/0606.php

On Apr 6, 2010, at 2:31 PM, Richard Treumann wrote:
> The MPI standard says that MPI_Abort makes a "best effort". It also says that
> an MPI implementation is fr

Re: [OMPI users] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-06 Thread Jeff Squyres
On Apr 1, 2010, at 4:17 PM, Oliver Geisler wrote:
> > However, reading through your initial description on Tuesday, none of these
> > fit: You want to actually measure the kernel time on TCP communication
> > costs.
>
> Since the problem occurs also on node only configuration and mca-option
> bt

Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-06 Thread Ralph Castain
If you run your cmd with the hostfile option and add --display-allocation, what does it say?

On Apr 6, 2010, at 12:18 PM, Serge wrote:
> Hi,
>
> OpenMPI integrates with Sun Grid Engine really well, and one does not need to
> specify any parameters for the mpirun command to launch the processes
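A sketch of the diagnostic Ralph is asking for; the hostfile name and contents are hypothetical, and the program name is a placeholder:

```shell
# Hypothetical hostfile listing the nodes and slots you expect to use.
cat > myhosts <<'EOF'
node01 slots=4
node02 slots=4
EOF

# --display-allocation makes mpirun print the node/slot allocation it
# actually sees before launching anything, which shows whether the
# hostfile (or the Grid Engine allocation) is being honored.
mpirun -np 8 -hostfile myhosts --display-allocation ./program
```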

Re: [OMPI users] Hide Abort output

2010-04-06 Thread Richard Treumann
The MPI standard says that MPI_Abort makes a "best effort". It also says that an MPI implementation is free to lose the value passed into MPI_Abort and deliver some other RC. The standard does not say that MPI_Abort becomes a valid way to end a parallel job if it is passed a zero. To me it see

[OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine

2010-04-06 Thread Serge
Hi, OpenMPI integrates with Sun Grid Engine really well, and one does not need to specify any parameters for the mpirun command to launch the processes on the compute nodes; that is, having "mpirun ./program" in the submission script is enough, with no need for "-np XX" or "-hostfile file_
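A minimal sketch of the tight-integration submission script Serge describes; the parallel environment name "orte" and slot count are assumptions, since the PE name is whatever the site configured for Open MPI:

```shell
#!/bin/sh
# Hypothetical SGE submission script.
# "#$" lines are SGE directives; the PE name "orte" is an assumption.
#$ -pe orte 16
#$ -cwd

# With tight SGE integration, mpirun learns the granted node/slot list
# from the Grid Engine environment itself, so no -np or -hostfile is
# needed here.
mpirun ./program
```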

Re: [OMPI users] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-06 Thread Oliver Geisler
On 4/1/2010 12:49 PM, Rainer Keller wrote:
> On Thursday 01 April 2010 12:16:25 pm Oliver Geisler wrote:
>> Does anyone know a benchmark program, I could use for testing?
> There's an abundance of benchmarks (IMB, netpipe, SkaMPI...) and performance
> analysis tools (Scalasca, Vampir, Paraver, Op

Re: [OMPI users] Hide Abort output

2010-04-06 Thread Terry Frankcombe
> Jeff -
>
> I started a discussion of MPI_Quit on the MPI Forum reflector. I raised
> the question because I do not think using MPI_Abort is appropriate.
>
> The situation is when a single task decides the parallel program has
> arrived at the desired answer and therefore whatever the other task

Re: [OMPI users] MPI Literature Survey on Multicores

2010-04-06 Thread Jeff Squyres
Since I moved to industry a few years ago (i.e., out of academia), I haven't kept as close tabs on research papers as I used to, but I would search for MPI papers dealing with shared memory optimizations. That would likely be a good starting point. You also might want to look at some (somew

[OMPI users] MPI Literature Survey on Multicores

2010-04-06 Thread vaibhav dutt
Hi all, I am doing a literature survey on MPI optimizations on multicore systems and was searching for some good papers. I have found some papers on MPI intra-node data transfer. Can anybody suggest how to go about it, I mean how I can organize the survey, and also some good sources of papers on the

[OMPI users] detect hung node

2010-04-06 Thread Sam Preston
Hi all, I have a problem with the cluster I'm currently using where nodes 'hang' silently from time to time during an MPI call. This causes the blocked MPI processes to block indefinitely -- the only way to detect an error is to notice that no more output is being written to the log files. We're

Re: [OMPI users] Problem running mpirun with ssh on remote nodes -Daemon did not report back when launched problem

2010-04-06 Thread Jeff Squyres
Open MPI opens random TCP sockets during the startup phase of MPI processes -- mostly from the "orted" helper process that is started on each node (or VM) back to the initiating mpirun process. Do you have firewalling or other TCP blocking software running? Or are the appropriate TCP routes se
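A hedged sketch of checks one might run when the orted daemons cannot report back to mpirun. Host names are placeholders, and the port-range MCA parameter names vary across Open MPI releases, so treat them as assumptions to be verified with ompi_info first:

```shell
# Confirm the remote node can reach the mpirun host at all.
ssh remote-node 'ping -c 1 head-node'

# List the TCP-related out-of-band parameters this build actually
# supports before relying on any of them.
ompi_info --param oob tcp

# If a firewall only permits a fixed port range, some releases let you
# pin the TCP BTL to it (names are release-dependent -- check above):
mpirun --mca btl_tcp_port_min_v4 10000 --mca btl_tcp_port_range_v4 100 \
       -np 2 -host head-node,remote-node ./hello
```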

Re: [OMPI users] Hide Abort output

2010-04-06 Thread Richard Treumann
Jeff - I started a discussion of MPI_Quit on the MPI Forum reflector. I raised the question because I do not think using MPI_Abort is appropriate. The situation is when a single task decides the parallel program has arrived at the desired answer and therefore whatever the other tasks are c

Re: [OMPI users] Hide Abort output

2010-04-06 Thread Jeff Squyres
I'm not sure I understand what your MPI_Quit function would do differently than MPI_Abort and/or MPI_Finalize...?

On Apr 6, 2010, at 3:13 AM, Yves Caniou wrote:
> I really understand the failure idea of the MPI_Abort() function, and it
> clearly appeared in the recent mails.
>
> There is an evi

Re: [OMPI users] IPoIB

2010-04-06 Thread Jeff Squyres
More specifically, Open MPI still uses a TCP layer for its run-time setup/teardown. That TCP can be regular ethernet, IPoIB, or any other emulation layer. The oob_tcp_if_in|exclude flags Ralph referred to let you specify your IPoIB devices, pure ethernet devices, etc.

On Apr 5
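As a sketch of the interface selection being discussed; the interface name "ib0" is the conventional IPoIB device name and is an assumption here, as is the program name:

```shell
# Route the out-of-band setup/teardown traffic over the IPoIB interface.
mpirun --mca oob_tcp_if_include ib0 -np 4 ./a.out

# If the MPI traffic itself should also run as TCP-over-IPoIB (rather
# than native verbs), restrict the TCP BTL to the same interface too:
mpirun --mca oob_tcp_if_include ib0 --mca btl_tcp_if_include ib0 \
       --mca btl tcp,self -np 4 ./a.out
```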

Re: [OMPI users] Trouble building openmpi 1.2.8 with intelcompilers 10.0.23

2010-04-06 Thread Jeff Squyres
Additionally, Open MPI v1.2.8 is ancient. Can you upgrade to Open MPI v1.4.1?

On Apr 6, 2010, at 3:05 AM, Dmitry Kuzmin wrote:
> If you have a problem with Intel compiler it would be better to ask at Intel
> Forum: http://software.intel.com/en-us/forums/intel-c-compiler/
> icpc 10.0.023 is qui

Re: [OMPI users] Help om Openmpi

2010-04-06 Thread jody
@Trent
> the 1024 RSA has already been cracked.

Yeah but unless you've got 3 guys spending 100 hours varying the voltage of your processors it is still safe... :)

On Tue, Apr 6, 2010 at 11:35 AM, Reuti wrote:
> Hi,
>
> On 06.04.2010 at 09:48, Terry Frankcombe wrote:
>
>>> 1. Run the following

Re: [OMPI users] Help om Openmpi

2010-04-06 Thread Reuti
Hi,

On 06.04.2010 at 09:48, Terry Frankcombe wrote:
>> 1. Run the following command on the client
>>    * -> ssh-keygen -t dsa
>> 2. File id_dsa and id_dsa.pub will be created inside $HOME/.ssh
>> 3. Copy id_dsa.pub to the server's .ssh directory
>>    * -> scp $HOME/.ssh/id_ds

Re: [OMPI users] Help om Openmpi

2010-04-06 Thread Trent Creekmore
I have, and the 1024 RSA has already been cracked.

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Terry Frankcombe
Sent: Tuesday, April 06, 2010 2:44 AM
To: Open MPI Users
Subject: Re: [OMPI users] Help om Openmpi

On Tue, 2010-04-06 a

Re: [OMPI users] Help om Openmpi

2010-04-06 Thread Terry Frankcombe
> 1. Run the following command on the client
>    * -> ssh-keygen -t dsa
> 2. File id_dsa and id_dsa.pub will be created inside $HOME/.ssh
> 3. Copy id_dsa.pub to the server's .ssh directory
>    * -> scp $HOME/.ssh/id_dsa.pub user@server:/home/user/.ssh
> 4. Change to /
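The truncated steps above can be sketched end-to-end as follows. Host and user names are placeholders; note as a hedge that the thread's "-t dsa" reflects 2010-era practice, and modern OpenSSH releases reject DSA keys, so ed25519 is shown instead:

```shell
# 1. Generate a key pair on the client (empty passphrase for
#    passwordless login). The thread uses -t dsa; newer OpenSSH
#    deprecates DSA, so ed25519 is the safer choice today.
ssh-keygen -t ed25519 -N "" -f "$HOME/.ssh/id_ed25519"

# 2-4. Install the public key on the server. ssh-copy-id performs the
#    append-to-authorized_keys and permission-fixing steps for you;
#    "user@server" is a placeholder.
ssh-copy-id -i "$HOME/.ssh/id_ed25519.pub" user@server

# 5. Verify: this should log in without prompting for a password.
ssh user@server true
```

If ssh-copy-id is unavailable, the manual equivalent is appending the .pub file to the server's ~/.ssh/authorized_keys and ensuring ~/.ssh is mode 700 and authorized_keys is mode 600.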

Re: [OMPI users] Help om Openmpi

2010-04-06 Thread Terry Frankcombe
On Tue, 2010-04-06 at 02:33 -0500, Trent Creekmore wrote:
> SSH means SECURE Shell. That being said, it would not be very secure
> without a password, now would it?

I think you need to read about public key authentication. It is secure.

Re: [OMPI users] Help om Openmpi

2010-04-06 Thread Trent Creekmore
SSH means SECURE Shell. That being said, it would not be very secure without a password, now would it? Besides, it is the user account that requires passwords, not SSH.

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Huynh Thuc Cuoc
Sent: Monday, April 05, 2

Re: [OMPI users] Hide Abort output

2010-04-06 Thread Yves Caniou
I really understand the failure idea of the MPI_Abort() function, and it clearly appeared in the recent mails. There is an evident advantage for me to have an MPI_Quit() function: having such a function would be great in the sense that someone would not have to code the termination mechanism, who

Re: [OMPI users] Trouble building openmpi 1.2.8 with intel compilers 10.0.23

2010-04-06 Thread Dmitry Kuzmin
If you have a problem with the Intel compiler it would be better to ask at the Intel Forum: http://software.intel.com/en-us/forums/intel-c-compiler/ icpc 10.0.023 is quite old and probably worth updating. Regards! Dmitry

2010/4/6 Peter Kjellstrom
> On Monday 05 April 2010, Steve Swanekamp (L3-Ti

Re: [OMPI users] Trouble building openmpi 1.2.8 with intel compilers 10.0.23

2010-04-06 Thread Peter Kjellstrom
On Monday 05 April 2010, Steve Swanekamp (L3-Titan Contractor) wrote:
> When I try to run the configure utility I get the message that the c++
> compiler can not compile simple c programs. Any ideas?

(at least some) Intel compilers need the gcc-c++ distribution package. Have you tested icpc with