On 1/13/20 5:55 am, Youssef Eldakar wrote:

In an sbatch script, a user calls a shell script that starts a Java background process. The job completes immediately, but the child Java process is still running on the compute node.

Is there a way to prevent this from happening?

What I would recommend is to use Slurm's cgroups support so that processes that put themselves into the background this way are tracked as part of the job and cleaned up when the job exits.

https://slurm.schedmd.com/cgroups.html

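Depending on how the Java process puts itself into the background, you could try adding a "wait" command at the end of the shell script so that it doesn't exit immediately. That's not guaranteed to work, though: "wait" only waits for background children of the shell that runs it, so it won't help if the process detaches itself completely.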
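
As a rough sketch of the "wait" approach, the shell script that launches the process might look something like the following. The "sleep 1" is only a stand-in for the real Java command line, which I don't know, and "bg_pid" is just an illustrative variable name.

```shell
#!/bin/bash
# Sketch of the wrapper script that launches the background process.
# "sleep 1" is a stand-in for the real Java command line (an assumption).
sleep 1 &
bg_pid=$!          # PID of the most recently started background child

# "wait" with no arguments blocks until every background child of this
# shell has exited. It cannot see a process that has detached itself.
wait
echo "background process $bg_pid has exited"
```

This only helps when the process was started with "&" from this same shell; it does nothing for a program that forks and detaches on its own.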

With cgroups, the Slurm script could also check the processes in your cgroup to monitor the existence of the Java process, sleeping for a while between checks, and exit when it's no longer found. For instance, once you've got the PID of the Java process, you can use "kill -0 $PID" to check if it's still there (rather than using ps); signal 0 doesn't actually send anything, it only tests whether the process exists.
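
A minimal sketch of that polling loop, again with "sleep 2" standing in for the Java process. How you actually obtain the PID (a PID file, for example) depends on the application, so treat that part as an assumption; "checks" is just an illustrative counter.

```shell
#!/bin/bash
# "sleep 2" stands in for the Java process; in a real job the PID would
# have to come from the application itself (an assumption).
sleep 2 &
PID=$!

checks=0
# kill -0 sends no signal; it only reports whether the PID exists
# and whether we are allowed to signal it.
while kill -0 "$PID" 2>/dev/null; do
    checks=$((checks + 1))
    sleep 1        # pause between checks; use a longer interval for real jobs
done
echo "process $PID is gone after $checks checks"
```

Placed at the end of the batch script, a loop like this keeps the job alive until the process has exited.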

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
