Hello Ralph,

Thanks for your quick reply and bug fix.  I have obtained the update and tried it
in my simple example, and also in the original program from which the simple
example was extracted.  The update works as expected :)

Sincerely,

Ted Sussman

On 27 Jun 2017 at 12:13, r...@open-mpi.org wrote:

> 
> Oh my - I finally tracked it down. A simple one-character error.
> 
> Thanks for your patience. Fix is https://github.com/open-mpi/ompi/pull/3773
> and will be ported to 2.x and 3.0.
> Ralph
> 
>     On Jun 27, 2017, at 11:17 AM, r...@open-mpi.org wrote:
> 
>     Ideally, we should be delivering the signal to all procs in the process
>     group of each dum.sh.  Looking at the code in the head of the 2.x branch,
>     that does indeed appear to be what we are doing, assuming that we found
>     setpgid in your system:
> 
>     static int odls_default_kill_local(pid_t pid, int signum)
>     {
>         pid_t pgrp;
>
>     #if HAVE_SETPGID
>         pgrp = getpgid(pid);
>         if (-1 != pgrp) {
>             /* target the lead process of the process
>              * group so we ensure that the signal is
>              * seen by all members of that group. This
>              * ensures that the signal is seen by any
>              * child processes our child may have
>              * started
>              */
>             pid = pgrp;
>         }
>     #endif
>         if (0 != kill(pid, signum)) {
>             if (ESRCH != errno) {
>                 OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
>                                      "%s odls:default:SENT KILL %d TO PID %d GOT ERRNO %d",
>                                      ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid, errno));
>                 return errno;
>             }
>         }
>         OPAL_OUTPUT_VERBOSE((2, orte_odls_base_framework.framework_output,
>                              "%s odls:default:SENT KILL %d TO PID %d SUCCESS",
>                              ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), signum, (int)pid));
>         return 0;
>     }
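>
>     A quick way to check, while the job is running, whether the shells and
>     their children really share a process group is to inspect the PGID column
>     (a generic procps invocation, nothing Open MPI specific):
>
>         ps -u $USER -o pid,ppid,pgid,args --forest
>
>     If dum.sh and aborttest.exe report the same PGID, a kill() aimed at the
>     group leader reaches both of them.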
> 
>     For some strange reason, it appears that you aren't seeing this? I'm
>     building the branch now and will see if I can reproduce it.
> 
>     On Jun 27, 2017, at 10:58 AM, Ted Sussman <ted.suss...@adina.com> wrote:
> 
>     Hello all,
> 
>     Thank you for your help and advice.  It has taken me several days to
>     understand what you were trying to tell me.  I have now studied the
>     problem in more detail, using a version of Open MPI 2.1.1 built with
>     --enable-debug.
> 
>     -----
> 
>     Consider the following scenario in Open MPI 2.1.1:
> 
>     mpirun --> dum.sh --> aborttest.exe  (rank 0)
>            --> dum.sh --> aborttest.exe  (rank 1)
> 
>     aborttest.exe calls MPI_Bcast several times, then aborttest.exe rank 0
>     calls MPI_Abort.
> 
>     As far as I can figure out, this is what happens after aborttest.exe
>     rank 0 calls MPI_Abort.
> 
>     1) aborttest.exe for rank 0 exits.  aborttest.exe for rank 1 is polling
>     (waiting for a message from MPI_Bcast).
> 
>     2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL
>     to both dum.sh processes.
> 
>     3) Both dum.sh processes are killed.
> 
>     4) aborttest.exe for rank 1 continues to poll. mpirun never exits.
> 
>     ----
> 
>     Now suppose that dum.sh traps SIGCONT, and that the trap handler in
>     dum.sh sends signal SIGINT to $PPID.  This is what seems to happen after
>     aborttest.exe rank 0 calls MPI_Abort:
> 
>     1) aborttest.exe for rank 0 exits.  aborttest.exe for rank 1 is polling
>     (waiting for a message from MPI_Bcast).
> 
>     2) mpirun (or maybe orted?) sends the signals SIGCONT, SIGTERM, SIGKILL
>     to both dum.sh processes.
> 
>     3) dum.sh for rank 0 catches SIGCONT and sends SIGINT to its parent.
>     dum.sh for rank 1 appears to be killed (I don't understand this; why
>     doesn't dum.sh for rank 1 also catch SIGCONT?)
> 
>     4) mpirun catches the SIGINT and kills aborttest.exe for rank 1, then
>     mpirun exits.
> 
>     So adding the trap handler to dum.sh solves my problem.
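>
>     For reference, a minimal sketch of the kind of trap handler described
>     above (the executable name is illustrative; the real dum.sh runs our
>     aborttest executable):
>
>         #!/bin/sh
>         # forward SIGCONT from the runtime to our parent (mpirun/orted)
>         # as SIGINT; cleanup commands could also go here
>         trap 'kill -INT $PPID; exit 1' CONT
>         # run the executable in the background and wait: a shell that is
>         # blocked on a foreground child defers traps until the child
>         # exits, but signals interrupt 'wait' immediately
>         ./aborttest.exe &
>         wait $!
>
>     The trap-deferral rule may also explain why dum.sh for rank 1 never ran
>     its handler: its foreground child was still polling in MPI_Bcast when
>     the SIGKILL arrived.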
> 
>     Is this the preferred solution to my problem?  Or is there a more
>     elegant solution?
> 
>     Sincerely,
> 
>     Ted Sussman
> 
> 
>     On 19 Jun 2017 at 11:19, r...@open-mpi.org wrote:
> 
>     >
>     >
>     >
>     >     On Jun 19, 2017, at 10:53 AM, Ted Sussman <ted.suss...@adina.com> wrote:
>     >
>     >     For what it's worth, the problem might be related to the following:
>     >
>     >     mpirun: -np 2 ... dum.sh
>     >     dum.sh: Invoke aborttest11.exe
>     >     aborttest11.exe: Call  MPI_Init, go into an infinite loop.
>     >
>     >     Now when mpirun is running, send signals to the processes, as follows:
>     >
>     >     1) kill -9 (pid for one of the aborttest11.exe processes)
>     >
>     >     The shell for this aborttest11.exe continues. Once this shell
>     >     exits, then Open MPI sends signals to both shells, killing the
>     >     other shell, but the remaining aborttest11.exe survives.  The PPID
>     >     for the remaining aborttest11.exe becomes 1.
>     >
>     > We have no visibility into your aborttest processes since we didn't
>     > launch them. So killing one of them is invisible to us. We can only
>     > see the shell scripts.
>     >
>     >
>     >     2) kill -9 (pid for one of the dum.sh processes).
>     >
>     >     Open MPI sends signals to both of the shells. Both shells are
>     >     killed off, but both aborttest11.exe processes survive, with PPID
>     >     set to 1.
>     >
>     > This again is a question of how you handle things in your program. The
>     > _only_ process we can see is your script. If you kill a script that
>     > started a process, then your process is going to have to know how to
>     > detect the script has died and "suicide" - there is nothing we can do
>     > to help.
>     >
>     > Honestly, it sounds to me like the real problem here is that your .exe
>     > program isn't monitoring the shell above it to know when to "suicide".
>     > I don't see how we can help you there.
>     >
>     >
>     >
>     >     On 19 Jun 2017 at 10:10, r...@open-mpi.org wrote:
>     >
>     >     >
>     >     > That is typical behavior when you throw something into "sleep" -
>     >     > not much we can do about it, I think.
>     >     >
>     >     >     On Jun 19, 2017, at 9:58 AM, Ted Sussman <ted.suss...@adina.com> wrote:
>     >     >
>     >     >     Hello,
>     >     >    
>     >     >     I have rebuilt Open MPI 2.1.1 on the same computer, including
>     >     >     --enable-debug.
>     >     >    
>     >     >     I have attached the abort test program aborttest10.tgz.  This
>     >     >     version sleeps for 5 sec before calling MPI_ABORT, so that I
>     >     >     can check the pids using ps.
>     >     >    
>     >     >     This is what happens (see run2.sh.out).
>     >     >    
>     >     >     Open MPI invokes two instances of dum.sh.  Each instance of
>     >     >     dum.sh invokes aborttest.exe.
>     >     >    
>     >     >     Pid    Process
>     >     >     ----------------------
>     >     >     19565  dum.sh
>     >     >     19566  dum.sh
>     >     >     19567  aborttest10.exe
>     >     >     19568  aborttest10.exe
>     >     >    
>     >     >     When MPI_ABORT is called, Open MPI sends SIGCONT, SIGTERM and
>     >     >     SIGKILL to both instances of dum.sh (pids 19565 and 19566).
>     >     >    
>     >     >     ps shows that both the shell processes vanish, and that one
>     >     >     of the aborttest10.exe processes vanishes.  But the other
>     >     >     aborttest10.exe remains and continues until it is finished
>     >     >     sleeping.
>     >     >    
>     >     >     Hope that this information is useful.
>     >     >    
>     >     >     Sincerely,
>     >     >    
>     >     >     Ted Sussman
>     >     >    
>     >     >    
>     >     >    
>     >     >     On 19 Jun 2017 at 23:06,  gil...@rist.or.jp   wrote:
>     >     >
>     >     >    
>     >     >      Ted,
>     >     >      
>     >     >     some traces are missing because you did not configure with
>     >     >     --enable-debug.  I am afraid you have to do it (and you
>     >     >     probably want to install that debug version in another
>     >     >     location, since its performance is not good for production)
>     >     >     in order to get all the logs.
>     >     >      
>     >     >     Cheers,
>     >     >      
>     >     >     Gilles
>     >     >      
>     >     >     ----- Original Message -----
>     >     >        Hello Gilles,
>     >     >    
>     >     >        I retried my example, with the same results as I observed
>     >     >        before.  The process with rank 1 does not get killed by
>     >     >        MPI_ABORT.
>     >     >    
>     >     >        I have attached to this E-mail:
>     >     >    
>     >     >          config.log.bz2
>     >     >          ompi_info.bz2  (uses ompi_info -a)
>     >     >          aborttest09.tgz
>     >     >    
>     >     >        This testing is done on a computer running Linux 3.10.0.
>     >     >        This is a different computer than the computer that I
>     >     >        previously used for testing.  You can confirm that I am
>     >     >        using Open MPI 2.1.1.
>     >     >    
>     >     >        tar xvzf aborttest09.tgz
>     >     >        cd aborttest09
>     >     >        sh run2.sh
>     >     >    
>     >     >        run2.sh contains the command
>     >     >    
>     >     >        /opt/openmpi-2.1.1-GNU/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 10 ./dum.sh
>     >     >    
>     >     >        The output from this run is in aborttest09/run2.sh.out.
>     >     >    
>     >     >        The output shows that the "default" component is selected
>     >     >        by odls.
>     >     >    
>     >     >        The only messages from odls are: odls: launch spawning
>     >     >        child ...  (two messages).  There are no messages from
>     >     >        odls with "kill" and I see no SENDING SIGCONT / SIGKILL
>     >     >        messages.
>     >     >    
>     >     >        I am not running from within any batch manager.
>     >     >    
>     >     >        Sincerely,
>     >     >    
>     >     >        Ted Sussman
>     >     >    
>     >     >        On 17 Jun 2017 at 16:02, gil...@rist.or.jp wrote:
>     >     >
>     >     >     Ted,
>     >     >    
>     >     >     I do not observe the same behavior you describe with Open MPI 2.1.1:
>     >     >    
>     >     >     # mpirun -np 2 -mca btl tcp,self --mca odls_base_verbose 5 ./abort.sh
>     >     >    
>     >     >     abort.sh 31361 launching abort
>     >     >     abort.sh 31362 launching abort
>     >     >     I am rank 0 with pid 31363
>     >     >     I am rank 1 with pid 31364
>     >     >     --------------------------------------------------------------------------
>     >     >     MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>     >     >     with errorcode 1.
>     >     >
>     >     >     NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>     >     >     You may or may not see output from other processes, depending on
>     >     >     exactly when Open MPI kills them.
>     >     >     --------------------------------------------------------------------------
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>     >     >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],0]
>     >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31361 SUCCESS
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>     >     >     [linux:31356] [[18199,0],0] SENDING SIGCONT TO [[18199,1],1]
>     >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 18 TO PID 31362 SUCCESS
>     >     >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],0]
>     >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31361 SUCCESS
>     >     >     [linux:31356] [[18199,0],0] SENDING SIGTERM TO [[18199,1],1]
>     >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 15 TO PID 31362 SUCCESS
>     >     >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],0]
>     >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31361 SUCCESS
>     >     >     [linux:31356] [[18199,0],0] SENDING SIGKILL TO [[18199,1],1]
>     >     >     [linux:31356] [[18199,0],0] odls:default:SENT KILL 9 TO PID 31362 SUCCESS
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc working on WILDCARD
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],0]
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],0] is not alive
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc checking child process [[18199,1],1]
>     >     >     [linux:31356] [[18199,0],0] odls:kill_local_proc child [[18199,1],1] is not alive
>     >     >    
>     >     >    
>     >     >     Open MPI did kill both shells, and they were indeed killed,
>     >     >     as evidenced by ps:
>     >     >    
>     >     >     # ps -fu gilles --forest
>     >     >     UID        PID  PPID  C STIME TTY          TIME CMD
>     >     >     gilles    1564  1561  0 15:39 ?        00:00:01 sshd: gilles@pts/1
>     >     >     gilles    1565  1564  0 15:39 pts/1    00:00:00  \_ -bash
>     >     >     gilles   31356  1565  3 15:57 pts/1    00:00:00      \_ /home/gilles/local/ompi-v2.x/bin/mpirun -np 2 -mca btl tcp,self --mca odls_base
>     >     >     gilles   31364     1  1 15:57 pts/1    00:00:00 ./abort
>     >     >    
>     >     >    
>     >     >     so trapping SIGTERM in your shell and manually killing the
>     >     >     MPI task should work (as Jeff explained, as long as the shell
>     >     >     script is fast enough to do that between SIGTERM and SIGKILL)
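>     >     >
>     >     >     a minimal sketch of that approach (script and executable
>     >     >     names are illustrative):
>     >     >
>     >     >         #!/bin/sh
>     >     >         # launch the MPI task in the background so the shell can
>     >     >         # react to SIGTERM while the task is still running
>     >     >         ./abort &
>     >     >         child=$!
>     >     >         # on SIGTERM: kill the task, do any cleanup, and exit
>     >     >         # quickly, before the runtime follows up with SIGKILL
>     >     >         trap 'kill -TERM "$child"; exit 1' TERM
>     >     >         wait "$child"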
>     >     >    
>     >     >    
>     >     >     if you observe a different behavior, please double check your
>     >     >     Open MPI version and post the outputs of the same commands.
>     >     >    
>     >     >     btw, are you running from a batch manager? if yes, which one?
>     >     >    
>     >     >     Cheers,
>     >     >    
>     >     >     Gilles
>     >     >    
>     >     >     ----- Original Message -----
>     >     >     Ted,
>     >     >    
>     >     >     if you
>     >     >    
>     >     >     mpirun --mca odls_base_verbose 10 ...
>     >     >    
>     >     >     you will see which processes get killed and how
>     >     >    
>     >     >     Best regards,
>     >     >    
>     >     >    
>     >     >     Gilles
>     >     >    
>     >     >     ----- Original Message -----
>     >     >     Hello Jeff,
>     >     >    
>     >     >     Thanks for your comments.
>     >     >    
>     >     >     I am not seeing behavior #4, on the two computers that I have
>     >     >     tested on, using Open MPI 2.1.1.
>     >     >    
>     >     >     I wonder if you can duplicate my results with the files that
>     >     >     I have uploaded.
>     >     >    
>     >     >     Regarding what is the "correct" behavior, I am willing to
>     >     >     modify my application to correspond to Open MPI's behavior
>     >     >     (whatever behavior the Open MPI developers decide is best) --
>     >     >     provided that Open MPI does in fact kill off both shells.
>     >     >    
>     >     >     So my highest priority now is to find out why Open MPI 2.1.1
>     >     >     does not kill off both shells on my computer.
>     >     >    
>     >     >     Sincerely,
>     >     >    
>     >     >     Ted Sussman
>     >     >    
>     >     >       On 16 Jun 2017 at 16:35, Jeff Squyres (jsquyres) wrote:
>     >     >
>     >     >     Ted --
>     >     >    
>     >     >     Sorry for jumping in late.  Here's my $0.02...
>     >     >    
>     >     >     In the runtime, we can do 4 things:
>     >     >    
>     >     >     1. Kill just the process that we forked.
>     >     >     2. Kill just the process(es) that call back and identify
>     >     >        themselves as MPI processes (we don't track this right
>     >     >        now, but we could add that functionality).
>     >     >     3. Union of #1 and #2.
>     >     >     4. Kill all processes (to include any intermediate processes
>     >     >        that are not included in #1 and #2).
>     >     >    
>     >     >     In Open MPI 2.x, #4 is the intended behavior.  There may be
>     >     >     a bug or two that needs to get fixed (e.g., in your last
>     >     >     mail, I don't see offhand why it waits until the MPI process
>     >     >     finishes sleeping), but we should be killing the process
>     >     >     group, which -- unless any of the descendant processes have
>     >     >     explicitly left the process group -- should hit the entire
>     >     >     process tree.
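>     >     >
>     >     >     (For illustration: a descendant leaves its process group if
>     >     >     it is started along these lines -- generic POSIX/util-linux
>     >     >     behavior, not an Open MPI feature:
>     >     >
>     >     >         setsid ./aborttest.exe   # new session, new process group
>     >     >
>     >     >     a group-targeted kill() then no longer reaches it.)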
>     >     >    
>     >     >     Sidenote: there's actually a way to be a bit more aggressive
>     >     >     and do a better job of ensuring that we kill *all* processes
>     >     >     (via creative use of PR_SET_CHILD_SUBREAPER), but that's
>     >     >     basically a future enhancement / optimization.
>     >     >    
>     >     >     I think Gilles and Ralph proposed a good point to you: if you
>     >     >     want to be sure to be able to do cleanup after an MPI process
>     >     >     terminates (normally or abnormally), you should trap signals
>     >     >     in your intermediate processes to catch what Open MPI's
>     >     >     runtime throws and therefore know that it is time to cleanup.
>     >     >    
>     >     >     Hypothetically, this should work in all versions of Open MPI...?
>     >     >    
>     >     >     I think Ralph made a pull request that adds an MCA param to
>     >     >     change the default behavior from #4 to #1.
>     >     >    
>     >     >     Note, however, that there's a little time between when Open
>     >     >     MPI sends the SIGTERM and the SIGKILL, so this solution could
>     >     >     be racy.  If you find that you're running out of time to
>     >     >     cleanup, we might be able to make the delay between the
>     >     >     SIGTERM and SIGKILL be configurable (e.g., via MCA param).
>     >     >    
>     >     >    
>     >     >    
>     >     >
>     >     >     On Jun 16, 2017, at 10:08 AM, Ted Sussman <ted.suss...@adina.com> wrote:
>     >     >    
>     >     >     Hello Gilles and Ralph,
>     >     >    
>     >     >     Thank you for your advice so far.  I appreciate the time
>     >     >     that you have spent to educate me about the details of Open MPI.
>     >     >    
>     >     >     But I think that there is something fundamental that I don't
>     >     >     understand.  Consider Example 2 run with Open MPI 2.1.1.
>     >     >    
>     >     >     mpirun --> shell for process 0 --> executable for process 0 --> MPI calls, MPI_Abort
>     >     >            --> shell for process 1 --> executable for process 1 --> MPI calls
>     >     >    
>     >     >     After the MPI_Abort is called, ps shows that both shells are
>     >     >     running, and that the executable for process 1 is running (in
>     >     >     this case, process 1 is sleeping).  And mpirun does not exit
>     >     >     until process 1 is finished sleeping.
>     >     >    
>     >     >     I cannot reconcile this observed behavior with the statement
>     >     >
>     >     >          >     2.x: each process is put into its own process
>     >     >          >     group upon launch. When we issue a "kill", we
>     >     >          >     issue it to the process group. Thus, every child
>     >     >          >     proc of that child proc will receive it. IIRC,
>     >     >          >     this was the intended behavior.
>     >     >    
>     >     >     I assume that, for my example, there are two process groups.
>     >     >     The process group for process 0 contains the shell for
>     >     >     process 0 and the executable for process 0; and the process
>     >     >     group for process 1 contains the shell for process 1 and the
>     >     >     executable for process 1.  So what does MPI_ABORT do?
>     >     >     MPI_ABORT does not kill the process group for process 0,
>     >     >     since the shell for process 0 continues.  And MPI_ABORT does
>     >     >     not kill the process group for process 1, since both the
>     >     >     shell and executable for process 1 continue.
>     >     >    
>     >     >     If I hit Ctrl-C after MPI_Abort is called, I get the message
>     >     >
>     >     >     mpirun: abort is already in progress.. hit ctrl-c again to forcibly terminate
>     >     >
>     >     >     but I don't need to hit Ctrl-C again because mpirun
>     >     >     immediately exits.
>     >     >    
>     >     >     Can you shed some light on all of this?
>     >     >    
>     >     >     Sincerely,
>     >     >    
>     >     >     Ted Sussman
>     >     >    
>     >     >    
>     >     >     On 15 Jun 2017 at 14:44, r...@open-mpi.org wrote:
>     >     >
>     >     >    
>     >     >     You have to understand that we have no way of knowing who is
>     >     >     making MPI calls - all we see is the proc that we started,
>     >     >     and we know someone of that rank is running (but we have no
>     >     >     way of knowing which of the procs you sub-spawned it is).
>     >     >    
>     >     >     So the behavior you are seeking only occurred in some earlier
>     >     >     release by sheer accident. Nor will you find it portable, as
>     >     >     there is no specification directing that behavior.
>     >     >    
>     >     >     The behavior I've provided is to either deliver the signal
>     >     >     to _all_ child processes (including grandchildren etc.), or
>     >     >     _only_ the immediate child of the daemon.  It won't do what
>     >     >     you describe - kill the MPI proc underneath the shell, but
>     >     >     not the shell itself.
>     >     >    
>     >     >     What you can eventually do is use PMIx to ask the runtime to
>     >     >     selectively deliver signals to pid/procs for you. We don't
>     >     >     have that capability implemented just yet, I'm afraid.
>     >     >    
>     >     >     Meantime, when I get a chance, I can code an option that
>     >     >     will record the pid of the subproc that calls MPI_Init, and
>     >     >     then lets you deliver signals to just that proc. No promises
>     >     >     as to when that will be done.
>     >     >    
>     >     >    
>     >     >          On Jun 15, 2017, at 1:37 PM, Ted Sussman <ted.sussman@adina.com> wrote:
>     >     >    
>     >     >          Hello Ralph,
>     >     >    
>     >     >          I am just an Open MPI end user, so I will need to wait
>     >     >          for the next official release.
>     >     >    
>     >     >          mpirun --> shell for process 0 --> executable for process 0 --> MPI calls
>     >     >                 --> shell for process 1 --> executable for process 1 --> MPI calls
>     >     >                                          ...
>     >     >    
>     >     >          I guess the question is, should MPI_ABORT kill the
>     >     >          executables or the shells?  I naively thought that,
>     >     >          since it is the executables that make the MPI calls, it
>     >     >          is the executables that should be aborted by the call to
>     >     >          MPI_ABORT.  Since the shells don't make MPI calls, the
>     >     >          shells should not be aborted.
>     >     >    
>     >     >          And users might have several layers of shells in
>     >     >          between mpirun and the executable.
>     >     >    
>     >     >          So now I will look for the latest version of Open MPI
>     >     >          that has the 1.4.3 behavior.
>     >     >    
>     >     >          Sincerely,
>     >     >    
>     >     >          Ted Sussman
>     >     >    
>     >     >           On 15 Jun 2017 at 12:31, r...@open-mpi.org wrote:
>     >     >    
>     >     >          >
>     >     >          > Yeah, things jittered a little there as we debated
>     >     >          > the "right" behavior. Generally, when we see that
>     >     >          > happening it means that a param is required, but
>     >     >          > somehow we never reached that point.
>     >     >          >
>     >     >          > See if https://github.com/open-mpi/ompi/pull/3704
>     >     >          > helps - if so, I can schedule it for the next 2.x
>     >     >          > release if the RMs agree to take it
>     >     >          >
>     >     >          > Ralph
>     >     >           >
>     >     >          >     On Jun 15, 2017, at 12:20 PM, Ted Sussman <ted.sussman@adina.com> wrote:
>     >     >           >
>     >     >          >     Thank you for your comments.
>     >     >           >   
>     >     >          >     Our application relies upon "dum.sh" to clean up
>     >     >          >     after the process exits, either if the process
>     >     >          >     exits normally, or if the process exits abnormally
>     >     >          >     because of MPI_ABORT.  If the process group is
>     >     >          >     killed by MPI_ABORT, this clean up will not be
>     >     >          >     performed.  If exec is used to launch the
>     >     >          >     executable from dum.sh, then dum.sh is terminated
>     >     >          >     by the exec, so dum.sh cannot perform any clean up.
>     >     >          >   
>     >     >          >     I suppose that other user applications might work
>     >     >          >     similarly, so it would be good to have an MCA
>     >     >          >     parameter to control the behavior of MPI_ABORT.
>     >     >          >   
>     >     >          >     We could rewrite our shell script that invokes
>     >     >          >     mpirun, so that the cleanup that is now done by
>     >     >          >     dum.sh is done by the invoking shell script after
>     >     >          >     mpirun exits.  Perhaps this technique is the
>     >     >          >     preferred way to clean up after mpirun is invoked.
>     >     >           >   
>     >     >          >     By the way, I have also tested with Open MPI
>     >     >          >     1.10.7, and Open MPI 1.10.7 has different behavior
>     >     >          >     than either Open MPI 1.4.3 or Open MPI 2.1.1.  In
>     >     >          >     this explanation, it is important to know that the
>     >     >          >     aborttest executable sleeps for 20 sec.
>     >     >          >   
>     >     >          >     When running example 2:
>     >     >          >
>     >     >          >     1.4.3: process 1 immediately aborts
>     >     >          >     1.10.7: process 1 doesn't abort and never stops
>     >     >          >     2.1.1: process 1 doesn't abort, but stops after it
>     >     >          >     is finished sleeping
>     >     >          >   
>     >     >          >     Sincerely,
>     >     >          >   
>     >     >          >     Ted Sussman
>     >     >           >   
>     >     >          >     On 15 Jun 2017 at 9:18, r...@open-mpi.org wrote:
>     >     >          >
>     >     >          >     Here is how the system is working:
>     >     >          >
>     >     >          >     Master: each process is put into its own process
>     >     >          >     group upon launch. When we issue a "kill",
>     >     >          >     however, we only issue it to the individual
>     >     >          >     process (instead of the process group that is
>     >     >          >     headed by that child process). This is probably a
>     >     >          >     bug, as I don't believe that is what we intended,
>     >     >          >     but set that aside for now.
>     >     >          >
>     >     >          >     2.x: each process is put into its own process
>     >     >          >     group upon launch. When we issue a "kill", we
>     >     >          >     issue it to the process group. Thus, every child
>     >     >          >     proc of that child proc will receive it. IIRC,
>     >     >          >     this was the intended behavior.
>     >     >          >
>     >     >          >     It is rather trivial to make the change (it only
>     >     >          >     involves 3 lines of code), but I'm not sure of
>     >     >          >     what our intended behavior is supposed to be.
>     >     >          >     Once we clarify that, it is also trivial to add
>     >     >          >     another MCA param (you can never have too many!)
>     >     >          >     to allow you to select the other behavior.
>     >     >          >   
>     >     >          >
>     >     >          >     On Jun 15, 2017, at 5:23 AM, Ted Sussman <ted.sussman@adina.com> wrote:
>     >     >          >   
>     >     >          >     Hello Gilles,
>     >     >          >   
>     >     >          >     Thank you for your quick answer.  I confirm that
>     >     >          >     if exec is used, both processes immediately abort.
>     >     >          >   
>     >     >          >     Now suppose that the line
>     >     >          >
>     >     >          >     echo "After aborttest: OMPI_COMM_WORLD_RANK="$OMPI_COMM_WORLD_RANK
>     >     >          >
>     >     >          >     is added to the end of dum.sh.
>     >     >          >   
>     >     >          >     If Example 2 is run with Open MPI 1.4.3, the
>     >     >          >     output is
>     >     >          >
>     >     >          >     After aborttest: OMPI_COMM_WORLD_RANK=0
>     >     >          >
>     >     >          >     which shows that the shell script for the process
>     >     >          >     with rank 0 continues after the abort, but that
>     >     >          >     the shell script for the process with rank 1 does
>     >     >          >     not continue after the abort.
>     >     >          >   
>     >     >          >     If Example 2 is run with Open MPI 2.1.1, with exec
>     >     >          >     used to invoke aborttest02.exe, then there is no
>     >     >          >     such output, which shows that both shell scripts
>     >     >          >     do not continue after the abort.
>     >     >          >   
>     >     >          >     I prefer the Open MPI 1.4.3 behavior because our
>     >     >          >     original application depends upon the Open MPI
>     >     >          >     1.4.3 behavior.  (Our original application will
>     >     >          >     also work if both executables are aborted, and if
>     >     >          >     both shell scripts continue after the abort.)
>     >     >          >   
>     >     >          >     It might be too much to expect, but is there a
>     >     >          >     way to recover the Open MPI 1.4.3 behavior using
>     >     >          >     Open MPI 2.1.1?
>     >     >          >   
>     >     >           >     Sincerely,
>     >     >          >   
>     >     >          >     Ted Sussman
>     >     >          >   
>     >     >          >   
>     >     >           >     On 15 Jun 2017 at 9:50, Gilles Gouaillardet wrote:
>     >     >          >
>     >     >          >     Ted,
>     >     >          >   
>     >     >           >   
>     >     >          >     fwiw, the 'master' branch has the behavior you expect.
>     >     >          >   
>     >     >          >   
>     >     >          >     meanwhile, you can simply edit your 'dum.sh'
>     >     >          >     script and replace
>     >     >          >
>     >     >          >     /home/buildadina/src/aborttest02/aborttest02.exe
>     >     >          >
>     >     >          >     with
>     >     >          >
>     >     >          >     exec /home/buildadina/src/aborttest02/aborttest02.exe
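>     >     >          >
>     >     >          >     (With exec, the shell is replaced by the
>     >     >          >     executable and keeps the same PID, so the signals
>     >     >          >     the runtime sends to the launched PID reach the
>     >     >          >     MPI process directly.  A dum.sh reduced to this
>     >     >          >     idea -- sketch only, path taken from the thread:
>     >     >          >
>     >     >          >         #!/bin/sh
>     >     >          >         # exec replaces this shell; nothing after this
>     >     >          >         # line can run, so no cleanup is possible here
>     >     >          >         exec /home/buildadina/src/aborttest02/aborttest02.exe
>     >     >          >
>     >     >          >     note the tradeoff: any post-exit cleanup in dum.sh
>     >     >          >     is forfeited.)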
>     >     >           >   
>     >     >          >   
>     >     >          >     Cheers,
>     >     >          >   
>     >     >          >   
>     >     >          >     Gilles
>     >     >          >   
>     >     >           >   
>     >     >          >     On 6/15/2017 3:01 AM, Ted Sussman wrote:
>     >     >           >     Hello,
>     >     >          >   
>     >     >          >     My question concerns MPI_ABORT, indirect
>     >     >          >     execution of executables by mpirun and Open MPI
>     >     >          >     2.1.1.  When mpirun runs executables directly,
>     >     >          >     MPI_ABORT works as expected, but when mpirun runs
>     >     >          >     executables indirectly, MPI_ABORT does not work
>     >     >          >     as expected.
>     >     >          >   
>     >     >          >     If Open MPI 1.4.3 is used instead of Open MPI
>     >     >          >     2.1.1, MPI_ABORT works as expected in all cases.
>     >     >          >   
>     >     >          >     The examples given below have been simplified as
>     >     >          >     far as possible to show the issues.
>     >     >          >   
>     >     >          >     ---
>     >     >          >   
>     >     >           >     Example 1
>     >     >          >   
>     >     >           >     Consider an MPI job run in the following way:
>     >     >          >   
>     >     >           >     mpirun ... -app addmpw1
>     >     >          >   
>     >     >          >     where the appfile addmpw1 lists two executables:
>     >     >          >   
>     >     >          >     -n 1 -host gulftown ... aborttest02.exe
>     >     >          >     -n 1 -host gulftown ... aborttest02.exe
>     >     >           >   
>     >     >          >     The two executables are executed on the local
>     >     >          >     node gulftown.  aborttest02 calls MPI_ABORT for
>     >     >          >     rank 0, then sleeps.
>     >     >          >   
>     >     >          >     The above MPI job runs as expected.  Both
>     >     >          >     processes immediately abort when rank 0 calls
>     >     >          >     MPI_ABORT.
>     >     >          >   
>     >     >           >     ---
>     >     >          >   
>     >     >           >     Example 2
>     >     >          >   
>     >     >          >     Now change the above example as follows:
>     >     >          >   
>     >     >          >     mpirun ... -app addmpw2
>     >     >          >   
>     >     >          >     where the appfile addmpw2 lists shell scripts:
>     >     >          >   
>     >     >          >     -n 1 -host gulftown ... dum.sh
>     >     >          >     -n 1 -host gulftown ... dum.sh
>     >     >          >   
>     >     >          >     dum.sh invokes aborttest02.exe.  So
>     >     >          >     aborttest02.exe is executed indirectly by mpirun.
>     >     >          >   
>     >     >          >     In this case, the MPI job only aborts process 0
>     >     >          >     when rank 0 calls MPI_ABORT.  Process 1 continues
>     >     >          >     to run.  This behavior is unexpected.
>     >     >          >   
>     >     >          >     ----
>     >     >           >   
>     >     >          >     I have attached all files to this E-mail.  Since
>     >     >          >     there are absolute pathnames in the files, to
>     >     >          >     reproduce my findings, you will need to update
>     >     >          >     the pathnames in the appfiles and shell scripts.
>     >     >          >     To run example 1,
>     >     >           >   
>     >     >          >     sh run1.sh
>     >     >           >   
>     >     >          >     and to run example 2,
>     >     >          >   
>     >     >          >     sh run2.sh
>     >     >          >   
>     >     >           >     ---
>     >     >          >   
>     >     >          >     I have tested these examples with Open MPI 1.4.3
>     >     >          >     and 2.0.3.  In Open MPI 1.4.3, both examples work
>     >     >          >     as expected.  Open MPI 2.0.3 has the same
>     >     >          >     behavior as Open MPI 2.1.1.
>     >     >          >   
>     >     >          >     ---
>     >     >           >   
>     >     >          >     I would prefer that Open MPI 2.1.1 abort both
>     >     >          >     processes, even when the executables are invoked
>     >     >          >     indirectly by mpirun.  If there is an MCA setting
>     >     >          >     that is needed to make Open MPI 2.1.1 abort both
>     >     >          >     processes, please let me know.
>     >     >           >   
>     >     >          >   
>     >     >          >     Sincerely,
>     >     >          >   
>     >     >          >     Theodore Sussman
>     >     >           >   
>     >     >          >   
>     >     >          >     [Attachments: config.log.bz2, ompi_info.bz2, aborttest02.tgz]
>     >     >
>     >     >     --
>     >     >     Jeff Squyres
>     >     >     jsquy...@cisco.com
>     >     >    
>     >     >    
>     >     >
>     >     >     <aborttest10.tgz>
>     >     >
>     >
>     >       
>     >     <aborttest11.tgz>
>     >
> 
>       
> 



_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
