FYI: general Open MPI questions are better sent to the users mailing list.

Up through the v4.1.x series, the "orted" is a general helper process that Open 
MPI uses on the back-end.  It will not quit until all of its children have 
died.  Open MPI's run time is designed with the intent that some external 
helper will be there for the entire duration of the job; there is no option to 
run without one.

Two caveats:

1. In Open MPI v5.0.x, from the user's perspective, "orted" has been renamed to 
be "prted".  Since this is 99.999% behind the scenes, most users won't notice 
the difference.

2. You can run without "orted" (or "prted") if you use a different run-time 
environment (e.g., SLURM).  In this case, you'll use that environment's 
launcher (e.g., srun or sbatch in SLURM environments) to directly launch MPI 
processes -- you won't use "mpirun" at all.  Fittingly, this is called "direct 
launch" in Open MPI parlance (i.e., using another run-time's daemons to launch 
processes instead of first launching orteds (or prteds)).
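
As a rough illustration of the difference between the two launch styles, 
here is a minimal sketch that only prints the command line for each mode 
rather than running anything.  The helper name "launch_cmd" and the program 
"./my_mpi_app" are made up for this example; they are not part of Open MPI or 
SLURM.  Depending on how Open MPI and SLURM were built at your site, srun may 
need additional options for direct launch to work.

```shell
#!/bin/sh
# Sketch only: print the launch command for each mode instead of running it.
# "launch_cmd" and "./my_mpi_app" are hypothetical names used for illustration.

launch_cmd() {
    mode=$1   # "mpirun" (daemon-based launch) or "direct" (SLURM direct launch)
    np=$2     # number of MPI processes to start
    shift 2   # everything left is the program and its arguments
    case "$mode" in
        mpirun) echo "mpirun -np $np $*" ;;  # mpirun starts orted/prted on remote nodes
        direct) echo "srun -n $np $*" ;;     # SLURM's own daemons start the processes
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

launch_cmd mpirun 2 ./my_mpi_app
launch_cmd direct 2 ./my_mpi_app
```

In the first form, orted (or prted) stays alive for the duration of the job; 
in the second form, no orted/prted is started and SLURM's daemons play that 
role instead.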



On May 16, 2021, at 8:34 AM, 叶安华 <yean...@sensetime.com> wrote:

Code snippet:

# sleep.sh
sleep 10001 &
/bin/sh son_sleep.sh
sleep 10002

# son_sleep.sh
sleep 10003 &
sleep 10004 &

thanks
Anhua


From: 叶安华 <yean...@sensetime.com>
Date: Sunday, May 16, 2021 at 20:31
To: "jsquy...@cisco.com" <jsquy...@cisco.com>
Subject: [Help] Must orted exit after all spawned processes exit

Dear Jeff,

Sorry to bother you, but I am really curious about the conditions under which 
orted exits in the scenario below, and I am looking forward to hearing from you.

Scenario description:
• Step 1: start a remote process via "mpirun -np 1 -host 10.211.55.4 sh 
sleep.sh"
• Step 2: check pstree on the remote host:
<image001.jpg>
• Step 3: the mpirun process from step 1 does not exit until I kill all 
the sleeping processes, which have PIDs 15479, 15481, 15482, and 15483

To conclude, my questions are as follows:

  1.  Must orted wait until all spawned processes exit?
  2.  Is this behavior configurable? What if I want orted to exit immediately 
after any one of the spawned processes exits?
  3.  I could not find the specific logic where orted waits for spawned 
processes to exit; I hope you can give me a hint.


PS (scripts):
<image002.png>


thanks
Anhua



--
Jeff Squyres
jsquy...@cisco.com


