Correct: Open MPI does not implement cancels for sends.

Cancels *could* be implemented in Open MPI, but no one has done so.  There are 
three reasons why:

1. It can be really, really hard to implement cancels correctly (there are lots 
of race conditions and corner cases involved).
2. Very, very few people ask for it (i.e., we can't justify the time to do #1 
properly).
3. The MPI spec allows MPI_CANCEL to fail (i.e., we still adhere to the MPI 
spec, even if we allow cancels to fail).  This also means that a portable 
program has to check whether a cancel actually took effect; see the sketch 
below.
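
As a reference point, here is a minimal C sketch (not Open MPI-specific) of the 
checking pattern the spec implies: after MPI_Cancel you still have to complete 
the request, and only MPI_Test_cancelled tells you whether the cancel actually 
took effect.  The tags, the 256-byte payload, and the assumption that such a 
small message goes out eagerly (so the sender's MPI_Wait does not need a posted 
receive) are illustrative choices only.

/* mpicc cancel_check.c -o cancel_check && mpirun -np 2 ./cancel_check */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define DATA_TAG 1
#define CTRL_TAG 2

int main(int argc, char **argv)
{
    int rank, size, cancelled = 0;
    char payload[256];
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 1) {
        memset(payload, 'x', sizeof(payload));
        MPI_Isend(payload, (int)sizeof(payload), MPI_BYTE, 0, DATA_TAG,
                  MPI_COMM_WORLD, &req);

        /* ... decide that the send has taken too long ... */

        /* MPI_Cancel is only a request; the implementation may ignore it
           for sends. */
        MPI_Cancel(&req);

        /* The request must still be completed after MPI_Cancel.  The payload
           is assumed small enough to be sent eagerly, so this wait does not
           depend on the receive below having been posted. */
        MPI_Wait(&req, &status);

        /* MPI_Test_cancelled is the only way to learn whether the cancel
           actually took effect. */
        MPI_Test_cancelled(&status, &cancelled);
        printf("sender: cancel %s\n", cancelled ? "succeeded" : "failed");

        /* Tell the receiver whether the data message is still coming. */
        MPI_Send(&cancelled, 1, MPI_INT, 0, CTRL_TAG, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&cancelled, 1, MPI_INT, 1, CTRL_TAG, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        if (!cancelled) {
            /* The cancel failed, so the data message will be delivered and
               must be received before MPI_Finalize. */
            MPI_Recv(payload, (int)sizeof(payload), MPI_BYTE, 1, DATA_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}

With Open MPI you would expect the sender to always print "cancel failed" and 
the receiver to drain the message; an implementation that does cancel sends 
would take the other branch.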

Sorry!  :-(


On Jun 7, 2010, at 5:00 PM, Jovana Knezevic wrote:

> Hello Gijsbert,
> 
> I had the same problem a few months ago. I could not even cancel
> messages for which I did not have a matching receive on the other side
> (thus, they could not have been received! :-)). I was really wondering
> what was going on... I have some experience with MPI, but I am not an
> expert. I would really appreciate an explanation from the developers.
> While googling for a potential solution, I found out that some
> implementations (not Open MPI) do not allow cancelling at all, so I think
> that one cannot rely on MPI_Cancel(). If I am right, the question is
> then: why implement it at all? Is the logic "better ever than never"?
> :-) So, use it when it is better to do the cancellation, but don't
> really rely on it...?! As I said, I am not an expert, but it would be
> great to hear about this from the developers. If, however, YOU find any
> solution, it would be great if you wrote about it on this list! Thanks
> in advance.
> 
> Regards,
> Jovana Knezevic
> 
> 2010/6/7  <users-requ...@open-mpi.org>:
> >
> >
> > Today's Topics:
> >
> >   1. Re: mpi_iprobe not behaving as expect (David Zhang)
> >   2. Re: mpi_iprobe not behaving as expect (David Zhang)
> >   4. Behaviour of MPI_Cancel when using 'large' messages
> >      (Gijsbert Wiesenekker)
> >   5. Re: [sge::tight-integration] slot scheduling and  resources
> >      handling (Eloi Gaudry)
> >   6. ompi-restart, ompi-ps problem (Nguyen Kim Son)
> >   7. ompi-restart failed (Nguyen Toan)
> >   8. Re: ompi-restart failed (Nguyen Toan)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Sun, 6 Jun 2010 11:08:41 -0700
> > From: David Zhang <solarbik...@gmail.com>
> > Subject: Re: [OMPI users] mpi_iprobe not behaving as expect
> > To: users <us...@open-mpi.org>
> > Message-ID:
> >        <aanlktincozh2n7w3n0z3eu4lbiklbaf4n2uw5jquq...@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > On Sat, Jun 5, 2010 at 2:44 PM, David Zhang <solarbik...@gmail.com> wrote:
> >
> >> Dear all:
> >>
> >> I'm using mpi_iprobe to serve as a way to send signals between different
> >> mpi executables. I'm using the following test codes (fortran):
> >>
> >> #1
> >> program send
> >> implicit none
> >>         include 'mpif.h'
> >>
> >> real*8 :: vec(20000)=1.0
> >> integer :: ierr,i=0,request(1)
> >>
> >>         call mpi_init(ierr)
> >>         do
> >>                 call mpi_isend(vec,20000,mpi_real8,
> >> 0,1,mpi_comm_world,request(1),ierr)
> >>                 i=i+1
> >>                 print *,i
> >>                 vec=-vec
> >>                 call usleep_fortran(2.d0)
> >>                 call mpi_wait(request(1),MPI_STATUS_IGNORE,ierr)
> >>         end do
> >>
> >> end program send
> >> --------------------------------------------------
> >> #2
> >> program recv
> >> implicit none
> >>         include 'mpif.h'
> >>
> >> real*8 :: vec(20000)
> >> integer :: ierr
> >>
> >>         call mpi_init(ierr)
> >>         do
> >>                 if(key_present()) then
> >>                         call
> >> mpi_recv(vec,20000,mpi_real8,1,1,mpi_comm_world,MPI_STATUS_IGNORE,ierr)
> >>                 end if
> >>                 call usleep_fortran(0.05d0)
> >>
> >>         end do
> >>
> >> contains
> >>
> >> function key_present()
> >> implicit none
> >>   logical :: key_present
> >>
> >>         key_present = .false.
> >>         call
> >> mpi_iprobe(1,1,mpi_comm_world,key_present,MPI_STATUS_IGNORE,ierr)
> >>         print *, key_present
> >>
> >> end function key_present
> >>
> >> end program recv
> >> -----------------------------------
> >> usleep_fortran is a routine I've written to pause the program for the
> >> given amount of time (in seconds).  As you can see, on the receiving end I'm
> >> probing every 0.05 seconds to see whether a message has been received,
> >> and each probe prints its result, while a message is sent once
> >> every 2 seconds.
> >>
> >> Doing
> >> mpirun -np 1 recv : -np 1 send
> >>  Naturally I expect the output to be something like:
> >>
> >> 1
> >> (forty or so F)
> >> T
> >> 2
> >> (another forty or so F)
> >> T
> >> 3
> >>
> >> however this is the output I get:
> >>
> >> 1
> >> (forty or so F)
> >> T
> >> 2
> >> (about a two second delay)
> >> T
> >> 3
> >>
> >> It seems to me that after the first set of probes, once the message was
> >> received, the non-blocking mpi probe becomes blocking for some strange
> >> reason.  I'm using mpi_iprobe for the first time, so I'm not sure if I'm
> >> doing something blatantly wrong.
> >>
> >>
> >> --
> >> David Zhang
> >> University of California, San Diego
> >>
> >
> >
> >
> > --
> > David Zhang
> > University of California, San Diego
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Sun, 6 Jun 2010 11:35:29 -0700
> > From: David Zhang <solarbik...@gmail.com>
> > Subject: Re: [OMPI users] mpi_iprobe not behaving as expect
> > To: users <us...@open-mpi.org>
> > Message-ID:
> >        
> > <15739_1275849354_o56izn5f009814_aanlktimvv-bomw2rknbqj26vham0ndefnk01alh5l...@mail.gmail.com>
> >
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > I have modified the code so that all the terminal output is done by one
> > executable.  I have attached the source files; after compiling, type "make
> > go" and the code will execute.
> >
> > The previous code output was from a supercomputer cluster where the two
> > processes reside on two different nodes.  When running the same code on a
> > regular multiprocessor machine (a Mac mini in this case), I got this output:
> >  F
> >  F
> >  T
> >           1
> >  F
> >  T
> >           2
> >  F
> >  T
> >           3
> >  F
> >  T
> >           4
> >
> > If I'm sending a message every 2 seconds and polling every 0.05 seconds,
> > I would expect 39 F and 1 T between each number.  At least when I ran it on
> > the supercomputer, this was true at the very beginning; however, I don't
> > even see that when running the code on my Mac mini.
> >
> > --
> > David Zhang
> > University of California, San Diego
> >
> >
> >
> >
> > -------------- next part --------------
> > A non-text attachment was scrubbed...
> > Name: send_recv.zip
> > Type: application/zip
> > Size: 1578 bytes
> > Desc: not available
> > URL: 
> > <http://www.open-mpi.org/MailArchives/users/attachments/20100606/329462e9/attachment.zip>
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Mon, 7 Jun 2010 07:53:19 +0200
> > From: Gijsbert Wiesenekker <gijsbert.wiesenek...@gmail.com>
> > Subject: [OMPI users] Behaviour of MPI_Cancel when using 'large'
> >        messages
> > To: Open MPI Users <us...@open-mpi.org>
> > Message-ID: <30252122-5026-42e7-bb9b-ca670e7c9...@gmail.com>
> > Content-Type: text/plain; charset=us-ascii
> >
> > The following code tries to send a message, but if it takes too long the 
> > message is cancelled:
> >
> >  #define DEADLOCK_ABORT   (30.0)
> >
> >  MPI_Isend(message, count, MPI_BYTE, comm_id,
> >    MPI_MESSAGE_TAG, MPI_COMM_WORLD, &request);
> >
> >  t0 = time(NULL);
> >  cancelled = FALSE;
> >
> >  while(TRUE)
> >  {
> >    //do some work
> >
> >    //test if message is delivered or cancelled
> >    MPI_Test(&request, &flag, &status);
> >    if (flag) break;
> >
> >    //test if it takes too long
> >    t1 = time(NULL);
> >    wall = difftime(t1, t0);
> >    if (!cancelled && (wall > DEADLOCK_ABORT))
> >    {
> >      MPI_Cancel(&request);
> >      cancelled = TRUE;
> >      my_printf("cancelled!\n");
> >    }
> >  }
> >
> > Now if I use a message size of about 5000 bytes and the message cannot be
> > delivered, then after DEADLOCK_ABORT seconds MPI_Cancel is executed, but
> > MPI_Test still never sets flag, so it looks like the message cannot be
> > cancelled for some reason.
> > I am using OpenMPI 1.4.2 on Fedora Core 13.
> > Any ideas?
> >
> > Thanks,
> > Gijsbert
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Mon, 7 Jun 2010 09:50:26 +0200
> > From: Eloi Gaudry <e...@fft.be>
> > Subject: Re: [OMPI users] [sge::tight-integration] slot scheduling and
> >        resources handling
> > To: Reuti <re...@staff.uni-marburg.de>
> > Cc: Open MPI Users <us...@open-mpi.org>
> > Message-ID: <201006070950.26323...@fft.be>
> > Content-Type: Text/Plain;  charset="iso-8859-1"
> >
> > Hi Reuti,
> >
> > I've been unable to reproduce the issue so far.
> >
> > Sorry for the inconvenience,
> > Eloi
> >
> > On Tuesday 25 May 2010 11:32:44 Reuti wrote:
> >> Hi,
> >>
> >> Am 25.05.2010 um 09:14 schrieb Eloi Gaudry:
> >> > I do no reset any environment variable during job submission or job
> >> > handling. Is there a simple way to check that openmpi is working as
> >> > expected with SGE tight integration (as displaying environment
> >> > variables, setting options on the command line, etc. ) ?
> >>
> >> a) put a command:
> >>
> >> env
> >>
> >> in the jobscript and check the output for $JOB_ID and various $SGE_*
> >> variables.
> >>
> >> b) to confirm the misbehavior: are the tasks on the slave nodes kids of
> >> sge_shepherd or any system sshd/rshd?
> >>
> >> -- Reuti
> >>
> >> > Regards,
> >> > Eloi
> >> >
> >> > On Friday 21 May 2010 17:35:24 Reuti wrote:
> >> >> Hi,
> >> >>
> >> >> Am 21.05.2010 um 17:19 schrieb Eloi Gaudry:
> >> >>> Hi Reuti,
> >> >>>
> >> >>> Yes, the openmpi binaries used were build after having used the
> >> >>> --with-sge during configure, and we only use those binaries on our
> >> >>> cluster.
> >> >>>
> >> >>> [eg@moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
> >> >>>
> >> >>>                MCA ras: gridengine (MCA v2.0, API v2.0, Component
> >> >>>                v1.3.3)
> >> >>
> >> >> ok. As your goal is a Tight Integration and you set "control_slaves TRUE"
> >> >> in your PE, SGE wouldn't allow `qrsh -inherit ...` to nodes
> >> >> which are not in the list of granted nodes. So it looks like your job
> >> >> is running outside of this Tight Integration, with its own `rsh` or
> >> >> `ssh`.
> >> >>
> >> >> Do you reset $JOB_ID or other environment variables in your jobscript,
> >> >> which could trigger Open MPI to assume that it's not running inside SGE?
> >> >>
> >> >> -- Reuti
> >> >>
> >> >>> On Friday 21 May 2010 16:01:54 Reuti wrote:
> >> >>>> Hi,
> >> >>>>
> >> >>>> Am 21.05.2010 um 14:11 schrieb Eloi Gaudry:
> >> >>>>> Hi there,
> >> >>>>>
> >> >>>>> I'm observing something strange on our cluster managed by SGE6.2u4
> >> >>>>> when launching a parallel computation on several nodes, using
> >> >>>>> OpenMPI/SGE tight- integration mode (OpenMPI-1.3.3). It seems that
> >> >>>>> the SGE allocated slots are not used by OpenMPI, as if OpenMPI was
> >> >>>>> doing is own
> >> >>>>> round-robin allocation based on the allocated node hostnames.
> >> >>>>
> >> >>>> you compiled Open MPI with --with-sge (and recompiled your
> >> >>>> applications)? You are using the correct mpiexec?
> >> >>>>
> >> >>>> -- Reuti
> >> >>>>
> >> >>>>> Here is what I'm doing:
> >> >>>>> - launch a parallel computation involving 8 processors, using
> >> >>>>> 14GB of memory for each of them. I'm using a qsub command where I
> >> >>>>> request the memory_free resource and use tight integration with openmpi
> >> >>>>> - 3 servers are available:
> >> >>>>> . barney with 4 cores (4 slots) and 32GB
> >> >>>>> . carl with 4 cores (4 slots) and 32GB
> >> >>>>> . charlie with 8 cores (8 slots) and 64GB
> >> >>>>>
> >> >>>>> Here is the output of the allocated nodes (OpenMPI output):
> >> >>>>> ======================   ALLOCATED NODES   ======================
> >> >>>>>
> >> >>>>> Data for node: Name: charlie   Launch id: -1 Arch: ffc91200  State: 2
> >> >>>>>
> >> >>>>> Daemon: [[44332,0],0] Daemon launched: True
> >> >>>>> Num slots: 4  Slots in use: 0
> >> >>>>> Num slots allocated: 4  Max slots: 0
> >> >>>>> Username on node: NULL
> >> >>>>> Num procs: 0  Next node_rank: 0
> >> >>>>>
> >> >>>>> Data for node: Name: carl.fft    Launch id: -1 Arch: 0 State: 2
> >> >>>>>
> >> >>>>> Daemon: Not defined Daemon launched: False
> >> >>>>> Num slots: 2  Slots in use: 0
> >> >>>>> Num slots allocated: 2  Max slots: 0
> >> >>>>> Username on node: NULL
> >> >>>>> Num procs: 0  Next node_rank: 0
> >> >>>>>
> >> >>>>> Data for node: Name: barney.fft    Launch id: -1 Arch: 0 State: 2
> >> >>>>>
> >> >>>>> Daemon: Not defined Daemon launched: False
> >> >>>>> Num slots: 2  Slots in use: 0
> >> >>>>> Num slots allocated: 2  Max slots: 0
> >> >>>>> Username on node: NULL
> >> >>>>> Num procs: 0  Next node_rank: 0
> >> >>>>>
> >> >>>>> =================================================================
> >> >>>>>
> >> >>>>> Here is what I see when my computation is running on the cluster:
> >> >>>>> #     rank       pid          hostname
> >> >>>>>
> >> >>>>>       0     28112          charlie
> >> >>>>>       1     11417          carl
> >> >>>>>       2     11808          barney
> >> >>>>>       3     28113          charlie
> >> >>>>>       4     11418          carl
> >> >>>>>       5     11809          barney
> >> >>>>>       6     28114          charlie
> >> >>>>>       7     11419          carl
> >> >>>>>
> >> >>>>> Note that the parallel environment used under SGE is defined as:
> >> >>>>> [eg@moe:~]$ qconf -sp round_robin
> >> >>>>> pe_name            round_robin
> >> >>>>> slots              32
> >> >>>>> user_lists         NONE
> >> >>>>> xuser_lists        NONE
> >> >>>>> start_proc_args    /bin/true
> >> >>>>> stop_proc_args     /bin/true
> >> >>>>> allocation_rule    $round_robin
> >> >>>>> control_slaves     TRUE
> >> >>>>> job_is_first_task  FALSE
> >> >>>>> urgency_slots      min
> >> >>>>> accounting_summary FALSE
> >> >>>>>
> >> >>>>> I'm wondering why OpenMPI didn't use the nodes allocated by
> >> >>>>> SGE (cf. the "ALLOCATED NODES" report) but instead placed the processes
> >> >>>>> of the parallel computation one at a time, using a round-robin method.
> >> >>>>>
> >> >>>>> Note that I'm using the '--bynode' option in the orterun command
> >> >>>>> line. If the behavior I'm observing is simply a consequence of
> >> >>>>> using this option, please let me know. This would mean that one
> >> >>>>> needs to state that SGE tight integration has lower priority
> >> >>>>> for orterun's behavior than its command-line
> >> >>>>> options.
> >> >>>>>
> >> >>>>> Any help would be appreciated,
> >> >>>>> Thanks,
> >> >>>>> Eloi
> >> >>>>
> >> >>>> _______________________________________________
> >> >>>> users mailing list
> >> >>>> us...@open-mpi.org
> >> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > --
> >
> >
> > Eloi Gaudry
> >
> > Free Field Technologies
> > Axis Park Louvain-la-Neuve
> > Rue Emile Francqui, 1
> > B-1435 Mont-Saint Guibert
> > BELGIUM
> >
> > Company Phone: +32 10 487 959
> > Company Fax:   +32 10 454 626
> >
> >
> > ------------------------------
> >
> > Message: 6
> > Date: Mon, 7 Jun 2010 10:48:24 +0200
> > From: Nguyen Kim Son <nguyenk...@gmail.com>
> > Subject: [OMPI users] ompi-restart, ompi-ps problem
> > To: us...@open-mpi.org
> > Message-ID:
> >        <aanlktimhdrxjpkxuahl7c3_fcasr6yuubfxd7ehih...@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hello,
> >
> > I'm trying to get tools like orte-checkpoint, orte-restart, ... to work, but
> > there are some errors that I don't have any clue about.
> >
> > BLCR (0.8.2) apparently works fine, and I have installed openmpi 1.4.2 from
> > source with BLCR support.
> > The command
> > mpirun -np 4  -am ft-enable-cr ./checkpoint_test
> > seemed OK, but
> > orte-checkpoint --term PID_of_checkpoint_test (the PID obtained from ps -ef |
> > grep mpirun)
> > does not return and shows no errors!
> >
> > Then I checked with
> > ompi-ps
> > and this time I obtain:
> > oob-tcp: Communication retries exceeded.  Can not communicate with peer
> >
> > Does anyone have the same problem?
> > Any ideas are welcome!
> > Thanks,
> > Son.
> >
> >
> > --
> > ---------------------------------------------------------
> > Son NGUYEN KIM
> > Antibes 06600
> > Tel: 06 48 28 37 47
> >
> > ------------------------------
> >
> > Message: 7
> > Date: Mon, 7 Jun 2010 23:51:07 +0900
> > From: Nguyen Toan <nguyentoan1...@gmail.com>
> > Subject: [OMPI users] ompi-restart failed
> > To: Open MPI Users <us...@open-mpi.org>
> > Message-ID:
> >        <aanlktim9rgffs_rcfdykhvz0vsnlt86uphvmqg3re...@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hello everyone,
> >
> > I'm using OpenMPI 1.4.2 with BLCR 0.8.2 to test checkpointing on 2 nodes but
> > it failed to restart (Segmentation fault).
> > Here are the details concerning my problem:
> >
> > + OS: Centos 5.4
> > + OpenMPI configure:
> > ./configure --with-ft=cr --enable-ft-thread --enable-mpi-threads \
> > --with-blcr=/home/nguyen/opt/blcr
> > --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
> > --prefix=/home/nguyen/opt/openmpi \
> > --enable-mpirun-prefix-by-default
> > + mpirun -am ft-enable-cr -machinefile host ./test
> >
> > I checkpointed the test program using "ompi-checkpoint -v -s PID" and the
> > checkpoint file was created successfully. However it failed to restart using
> > ompi-restart:
> > *"mpirun noticed that process rank 0 with PID 21242 on node rc014.local
> > exited on signal 11 (Segmentation fault)"
> > *
> > Did I miss something in the installation of OpenMPI?
> >
> > Regards,
> > Nguyen Toan
> >
> > ------------------------------
> >
> > Message: 8
> > Date: Tue, 8 Jun 2010 00:07:33 +0900
> > From: Nguyen Toan <nguyentoan1...@gmail.com>
> > Subject: Re: [OMPI users] ompi-restart failed
> > To: Open MPI Users <us...@open-mpi.org>
> > Message-ID:
> >        <aanlktilkha9fpbindtal_tpbwtshsf5sovvhwziss...@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Sorry, I just want to add 2 more things:
> > + I tried configure with and without --enable-ft-thread but nothing changed
> > + I also applied this patch for OpenMPI here and reinstalled but I got the
> > same error
> > https://svn.open-mpi.org/trac/ompi/raw-attachment/ticket/2139/v1.4-preload-part1.diff
> >
> > Can somebody help? Thank you very much.
> >
> > Nguyen Toan
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > End of users Digest, Vol 1594, Issue 1
> > **************************************
> >
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

