Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Jack Bryan
Hi, I found the reason why the program is killed by the operating system when the problem size is large. It consumes more memory, which leads to more swapping. This also degrades the program's performance. But I cannot determine which function of the worker process causes the problem.

Re: [OMPI users] MPI one-sided passive synchronization.

2011-04-13 Thread James Dinan
Sudheer, Locks in MPI don't mean mutexes; they mark the beginning and end of a passive mode communication epoch. All MPI operations within an epoch logically occur concurrently and must be non-conflicting. So, what you've written below is incorrect: the get is not guaranteed to complete

Re: [OMPI users] MPI one-sided passive synchronization.

2011-04-13 Thread Abhishek Kulkarni
On Wed, Apr 13, 2011 at 2:49 PM, Barrett, Brian W wrote: > This is mostly an issue of how MPICH2 and Open MPI implement lock/unlock. > Some might call what I'm about to describe erroneous. I wrote the > one-sided code in Open MPI and may be among those people. > > In both

Re: [OMPI users] MPI one-sided passive synchronization.

2011-04-13 Thread Barrett, Brian W
This is mostly an issue of how MPICH2 and Open MPI implement lock/unlock. Some might call what I'm about to describe erroneous. I wrote the one-sided code in Open MPI and may be among those people. In both implementations, one-sided communication is not necessarily truly asynchronous. That is,

[OMPI users] MPI one-sided passive synchronization.

2011-04-13 Thread Abhishek Kulkarni
Hello, I am trying to better understand the semantics of passive synchronization in one-sided communication calls. Doesn't MPI_Win_unlock() block to ensure that all the preceding RMA calls in that epoch have been synced? In that case, why is an undefined value returned when trying to load from a

[OMPI users] Process to resource mapping in ompi-restart

2011-04-13 Thread kishor kharbas
Hello All, I have been enjoying using Transparent CR in Open MPI for my research! I have a few questions regarding the working of ompi-restart: 1. Is there a fixed mapping of processes to resources when ompi-restart is done? 2. Is there a way for the user to control it? If I am correct, ompi-restart

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Ralph Castain
On Apr 13, 2011, at 10:19 AM, Jack Bryan wrote: > Hi, I am using > > mpirun (Open MPI) 1.3.4 > > But, I have these, > > orte-clean orted orte-iof orte-ps orterun > > Can they do the same thing ? Unfortunately, no > > If I use them, will they use a lot of memory on each

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Ralph Castain
On Apr 13, 2011, at 10:29 AM, Jack Bryan wrote: > Hi , > > If I cannot ssh to a worker node, it means that my program cannot work > correctly ? No, that's not true. People thought you were on a cluster using ssh as the launcher. From prior notes, you were using Torque, so not being allowed

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Jack Bryan
Hi, I do not have qrsh. I have qrerun qrls qrttoppm qrun. Can they do the same thing? Thanks > From: re...@staff.uni-marburg.de > Date: Wed, 13 Apr 2011 16:28:14 +0200 > To: us...@open-mpi.org > Subject: Re: [OMPI users] OMPI monitor each process behavior > > Am 13.04.2011 um 05:55

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Jack Bryan
Hi, if I cannot ssh to a worker node, does it mean that my program cannot work correctly? I can run it on 32 nodes * 4 cores/node parallel processes. But, for larger parallel processes, 128 nodes * 1 CPU/node, it is killed by signal 9. Is this a reason? Thanks > Date: Wed, 13 Apr 2011

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Jack Bryan
Hi, I am using mpirun (Open MPI) 1.3.4. But I have these: orte-clean orted orte-iof orte-ps orterun. Can they do the same thing? If I use them, will they use a lot of memory on each worker node and print a lot of output to log files? Any help is really appreciated.

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
The 16 cores refers to x3755-m2s. We have a mix of 3550s and 3755s in the cluster. It could be memory, but I think not. The jobs are well within memory capacity, and the memory is mainly static. If out of memory then the jobs would be first candidate for the job. Larger jobs run on the 3755s

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
Inline -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Stergiou, Jonathan C CIV NSWCCD West Bethesda,6640 > Sent: 13 April 2011 16:52 > To: Open MPI Users > Subject: Re: [OMPI users] Over committing? > > Martin, > > We have seen

Re: [OMPI users] Over committing?

2011-04-13 Thread Reuti
Am 13.04.2011 um 17:09 schrieb Rushton Martin: > Version 1.3.2 > > Consider a job that will run with 28 processes. The user submits it > with: > > $ qsub -l nodes=4:ppn=7 ... > > which reserves 7 cores on (in this case) each of x3550x014 x3550x015 and > x3550x016 x3550x020. Torque generates

Re: [OMPI users] Over committing?

2011-04-13 Thread Stergiou, Jonathan C CIV NSWCCD West Bethesda, 6640
Martin, We have seen similar behavior when using certain codes. CodeA can run at ppn=8 with no problem, but CodeB will run much more slowly (or hang) with ppn=8; instead we use ppn=7 for CodeB. This becomes complicated when we run CodeA and CodeB together (coupled simulations). It requires

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
I'm afraid I can't comment on how OMPI was configured, "as supplied by the suppliers"! The users experiencing these problems use the Intel bindings, loaded via the modules command. We are running CentOS 5.3. Martin Rushton HPC System Manager, Weapons Technologies Tel: 01959 514777, Mobile:

Re: [OMPI users] Over committing?

2011-04-13 Thread Ralph Castain
Afraid I have no idea - we regularly run on Torque machines with the nodes fully populated. While most runs are only for a few hours, some runs go for days. How was OMPI configured? What OS version? On Apr 13, 2011, at 9:09 AM, Rushton Martin wrote: > Version 1.3.2 > > Consider a job that

Re: [OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
Version 1.3.2 Consider a job that will run with 28 processes. The user submits it with: $ qsub -l nodes=4:ppn=7 ... which reserves 7 cores on (in this case) each of x3550x014 x3550x015 and x3550x016 x3550x020. Torque generates a file (PBS_NODEFILE) which lists each node 7 times. The mpirun
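The nodefile layout described above (each reserved node listed once per requested core) can be checked with a short script. This is only a minimal sketch: the function name `slots_per_node` and the sample contents are hypothetical, and the sample is built from the host names in the message rather than read from a real PBS_NODEFILE.

```python
from collections import Counter


def slots_per_node(nodefile_text):
    """Count how many times each host name appears in nodefile-style text."""
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return dict(Counter(hosts))


# Hypothetical nodefile contents for `qsub -l nodes=4:ppn=7`:
# every host appears seven times, one line per reserved core.
example_hosts = ("x3550x014", "x3550x015", "x3550x016", "x3550x020")
example_nodefile = "\n".join(host for host in example_hosts for _ in range(7))

counts = slots_per_node(example_nodefile)
total_slots = sum(counts.values())

for host, slots in sorted(counts.items()):
    print(host, slots)
print("total slots:", total_slots)
```

Inside a real job the text would come from the file named by the PBS_NODEFILE environment variable; comparing the per-host counts with the number of cores on each node shows whether a node is fully populated.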

Re: [OMPI users] Over committing?

2011-04-13 Thread Ralph Castain
On Apr 13, 2011, at 8:13 AM, Rushton Martin wrote: > The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2). > Jobs are submitted by Torque/MOAB. When run with up to np=8 there is > good performance. Attempting to run with more processors brings > problems, specifically if any

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Reuti
Am 13.04.2011 um 05:55 schrieb Jack Bryan: > I need to monitor the memory usage of each parallel process on a linux Open > MPI cluster. > > But, top, ps command cannot help here because they only show the head node > information. > > I need to follow the behavior of each process on each

[OMPI users] Over committing?

2011-04-13 Thread Rushton Martin
The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2). Jobs are submitted by Torque/MOAB. When run with up to np=8 there is good performance. Attempting to run with more processors brings problems, specifically if any one node of a group of nodes has all 8 cores in use the job

Re: [OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Ralph Castain
What version are you using? If you are using 1.5.x, there is an "orte-top" command that will do what you ask. It queries the daemons to get the info. On Apr 12, 2011, at 9:55 PM, Jack Bryan wrote: > Hi , All: > > I need to monitor the memory usage of each parallel process on a linux Open >

Re: [OMPI users] Problem with setting up openmpi-1.4.3

2011-04-13 Thread Eugene Loh
amosl...@gmail.com wrote: Hi, I am embarrassed! I submitted a note to the users on setting up openmpi-1.4.3 using SUSE-11.3 under Linux and received several replies. I wanted to transfer them but they disappeared for no apparent reason. I hope that those that sent me messages

Re: [OMPI users] OpenMPI 1.4.2 Hangs When Using More Than 1 Proc

2011-04-13 Thread Stergiou, Jonathan C CIV NSWCCD West Bethesda, 6640
All, It looks like the issue is solved. Our sysadmin had been working on the issue too - he noticed a lot of "junk" in my /etc/ld.so.conf.d/ directory. After "cleaning" it out (I think he ended up wiping everything out, then rebooting the machine, then re-configuring specific items as

Re: [OMPI users] "Value out of bounds in file" error

2011-04-13 Thread hi
Hi Rainer, When executing "mpirun blacs_hello_example.exe" (reference: http://www.netlib.org/blacs/BLACS/Examples.html#HELLO), I am now getting following error... << C:\blacs_examples>mpirun blacs_hello_example.exe forrtl: severe (157): Program Exception - access violation Image

Re: [OMPI users] "Value out of bounds in file" error

2011-04-13 Thread hi
Hi Rainer, Thanks for the acknowledgment. > You may want to port/compile BLACS from netlib yourself, see here: > http://icl.cs.utk.edu/lapack-for-windows/VisualStudio_install.html With that I am seeing compilation errors as reported in... http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=12=2354

[OMPI users] OMPI monitor each process behavior

2011-04-13 Thread Jack Bryan
Hi all, I need to monitor the memory usage of each parallel process on a Linux Open MPI cluster. But the top and ps commands cannot help here because they only show the head node's information. I need to follow the behavior of each process on each cluster node. I cannot use ssh to access each node.
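When logging in to the nodes is not possible, one workaround is to have each process report its own memory use. The sketch below is an assumption-laden illustration, not an Open MPI feature: it relies on the Linux /proc/self/status file, and the helper names `parse_vm_kb` and `report_own_memory` are hypothetical. In a real MPI program the same file would be read from inside each rank.

```python
import os
import socket


def parse_vm_kb(status_text, field="VmRSS"):
    """Return the value in kB of a field such as VmRSS from /proc status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            # Line format on Linux: "VmRSS:    102400 kB"
            return int(line.split()[1])
    return None


def report_own_memory():
    """Describe this process's resident memory, or return None off Linux."""
    path = "/proc/self/status"
    if not os.path.exists(path):
        return None
    with open(path) as status_file:
        rss_kb = parse_vm_kb(status_file.read())
    return "%s pid=%d VmRSS=%s kB" % (socket.gethostname(), os.getpid(), rss_kb)


# Hypothetical status text, used so the parser can be exercised anywhere.
sample_status = "Name:\tworker\nVmPeak:\t  204800 kB\nVmRSS:\t  102400 kB\n"
print(parse_vm_kb(sample_status))
print(report_own_memory())
```

Calling the report function at the start and end of each suspect function in the worker, and writing the result to the job's output, narrows down where memory grows without needing ssh access to the node.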