We use a starter method here, and it looks something like this:

    function execute_normal_job() {
        # Start the job normally, with proper login shell handling
        if [ "$SGE_STARTER_USE_LOGIN_SHELL" == true ]
        then
            exec -l "$SGE_STARTER_SHELL_PATH" -c "${@}"
        else
            exec "$SGE_STARTER_SHELL_PATH" -c "${@}"
        fi
    }

    execute_normal_job "${@}"

The problem with 'exec "${@}"' is that exec expects the name of a program
to execute, but you are passing it a few shell commands *and then* a
binary, all as one string.
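
A quick way to see this for yourself (a rough sketch, assuming bash; the
DEMO_PREFIX variable and the command string are made up for illustration and
only mimic the shape of what mpirun hands over). Because the string contains
a '/', bash treats it as a relative path to a program, and my guess is that
this is why the working directory shows up in front of it in your error.

```shell
#!/bin/bash
# Made-up stand-in for the single string that gets handed to the starter.
cmd='DEMO_PREFIX=/tmp/demo; export DEMO_PREFIX; echo "prefix=$DEMO_PREFIX"'

# exec treats its first argument as the name of a program, not as shell
# code. Run it in a subshell so the failure does not end this script.
exec_err=$( (exec "$cmd") 2>&1 )
exec_rc=$?

echo "exec exit status: $exec_rc"
echo "exec error: $exec_err"
```

The exit status is non-zero and the error text names the whole string as the
file that could not be run.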
My version starts a shell and passes ${@} to it as shell commands.
If the argument is just a binary, great; if it is shell code followed by a
binary, that works too.
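
Roughly, the difference looks like this (a sketch only; /bin/bash stands in
for whatever $SGE_STARTER_SHELL_PATH is set to on your cluster, and the
command strings are made up):

```shell
#!/bin/bash
# Made-up stand-in for a string of shell commands followed by a program.
cmd='DEMO_PREFIX=/tmp/demo; export DEMO_PREFIX; echo "prefix=$DEMO_PREFIX"'

# A shell started with -c parses the string as commands, so the
# assignment and export take effect before the final program runs.
string_out=$(/bin/bash -c "$cmd")

# A plain program with arguments, given as one string, works the same way.
binary_out=$(/bin/bash -c 'echo hello world')

echo "$string_out"
echo "$binary_out"
```

One thing to keep in mind: -c takes only the first word after it as the
command string, and any further words become $0, $1 and so on. So this
relies on the command arriving as a single string, which is what your debug
output suggests is happening.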
--
Adam
On Wed, Apr 2, 2014 at 3:19 PM, <[email protected]> wrote:
>
> We're bringing up SoGE 8.1.6 and I've run into a problem with the use of a
> 'starter_method' that's affecting OpenMPI jobs.
>
> Following previous discussions on the list[1], we're using the
> 'environment modules' package, and using a starter_method to initialize
> the user's environment as if it was a login shell and to export the
> "module" function.
>
> Our starter_method script is "/lab/bin/starter" and contains:
>
> ---------------------------------------------
> [line 1] #!/bin/bash -l
> [line 2] # initialize modules, then run whatever was given
> [line 3] . /usr/share/Modules/init/bash
> [line 4]
> [line 5] # check if "module" is declared as a function
> [line 6] declare -f -F module 1> /dev/null 2>&1
> [line 7] if [ $? = 0 ] ; then
> [line 8] # there is a module function, export it
> [line 9] export -f module
> [line 10] fi
> [line 11]
> [line 12] printf "Debugging. About to:\n\texec \"${@}\"\n" 1>&2
> [line 12] exec "${@}"
> ---------------------------------------------
>
> (The debugging statement is not normally active.)
>
> This works fine for serial jobs.
>
> However, OpenMPI (1.3.3) jobs fail to start. It appears as if the
> starter_method is somehow corrupting the environment passed to mpirun.
>
> For example, I submitted a job when I was in the directory
> /lab/home/bergman/sge_job_output.
>
> The starter_method script reports:
>
> Debugging: About to:
> exec OPAL_PREFIX=/lab/bin/openmpi/sge; export OPAL_PREFIX;\
> PATH=/lab/bin/openmpi/bin:$PATH ;\
> export PATH ;\
>
> LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH ;\
> export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted
>
> which looks fine. However, that is followed by the error:
>
> /lab/bin/starter: line 12:
> /lab/home/bergman/sge_job_output/OPAL_PREFIX=/lab/bin/openmpi/sge;\
> export OPAL_PREFIX;\
> PATH=/lab/bin/openmpi/bin:$PATH ;\
> export PATH ;\
>
> LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH ;\
> export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted
>
> [Lines broken for readability.]
>
>
> The odd thing is that the current working directory (where the SGE
> job was submitted) is prepended to the definition of the OPAL_PREFIX
> variable. This is consistent, regardless of where the SGE job is launched (~,
> /tmp, etc.).
>
> Any suggestions?
>
> Thanks,
>
> Mark
>
>
>
> [1] http://gridengine.org/pipermail/users/2014-January/007121.html
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users