We use a starter method here, and it looks something like this:

function execute_normal_job() {
  # Start the job normally with proper login shell handling
  if [ "$SGE_STARTER_USE_LOGIN_SHELL" = true ]
  then
    exec -l "$SGE_STARTER_SHELL_PATH" -c "${@}"
  else
    exec "$SGE_STARTER_SHELL_PATH" -c "${@}"
  fi
}
execute_normal_job "${@}"

The problem with 'exec ${@}' is that exec expects a binary to execute,
but you are passing it a few shell commands *and* then a binary.
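
As a rough illustration of that failure, here is a minimal sketch. The
command string is a made-up stand-in for what mpirun passes in (it is not
taken from the real job), and the exec runs in a subshell so the demo
itself keeps going:

```shell
# Hypothetical stand-in for the single-string command handed to the starter.
cmd='GREETING=hello; export GREETING; printenv GREETING'

# exec treats the whole string as the name of one program. No such program
# exists, so the subshell exits with a non-zero status.
status=0
( exec "$cmd" ) 2>/dev/null || status=$?

echo "exec exit status: $status"
```

Nothing parses the semicolons or the assignments here; exec only looks for
a program with that literal name.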

My version says: execute a shell, and pass ${@} to it as shell commands.
If they are binaries, great; if not, the rest works too.
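
A small sketch of how the '-c' form behaves, under some assumptions:
run_via_shell is a hypothetical helper, /bin/sh stands in for
$SGE_STARTER_SHELL_PATH, and the shell is called directly rather than via
exec so the output can be captured. Note that with '-c' only the first
word is run as a command string; any further words become $0, $1, and so
on inside it:

```shell
shell_path=/bin/sh   # assumption: stand-in for $SGE_STARTER_SHELL_PATH

# Hypothetical helper mirroring the non-login branch, minus the exec.
run_via_shell() {
  "$shell_path" -c "$@"
}

# One argument holding a whole command line, as in the mpirun case:
# the shell parses the assignments and then runs the final command.
single=$(run_via_shell 'GREETING=hello; export GREETING; printenv GREETING')

# Several arguments: the first is the command string, the rest are
# available to it as positional parameters.
multi=$(run_via_shell 'printf "name=%s first=%s\n" "$0" "$1"' job.sh --verbose)

echo "$single"
echo "$multi"
```

So a single-string command line works as intended, while a job passed as
separate words would need its command string to reference those
parameters.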
--
Adam

On Wed, Apr 2, 2014 at 3:19 PM,  <[email protected]> wrote:
>
> We're bringing up SoGE 8.1.6 and I've run into a problem with the use of a
> 'starter_method' that's affecting OpenMPI jobs.
>
> Following previous discussions on the list[1], we're using the
> 'environment modules' package, and using a starter_method to initialize
> the user's environment as if it was a login shell and to export the
> "module" function.
>
> Our starter_method script is "/lab/bin/starter" and contains:
>
> ---------------------------------------------
> [line  1]       #!/bin/bash -l
> [line  2]       # initialize modules, then run whatever was given
> [line  3]       . /usr/share/Modules/init/bash
> [line  4]
> [line  5]       #  check if "module" is declared as a function
> [line  6]       declare -f -F module 1> /dev/null 2>&1
> [line  7]       if [ $? = 0 ] ; then
> [line  8]               # there is a module function, export it
> [line  9]               export -f module
> [line 10]       fi
> [line 11]
> [line 12]       printf "Debugging. About to:\n\texec \"${@}\"\n" 1>&2
> [line 12]       exec "${@}"
> ---------------------------------------------
>
> (The debugging statement is not normally active.)
>
> This works fine for serial jobs.
>
> However, OpenMPI (1.3.3) jobs fail to start. It appears as if the
> starter_method is somehow corrupting the environment passed to mpirun.
>
> For example, I submitted a job when I was in the directory
> /lab/home/bergman/sge_job_output.
>
> The starter_method script reports:
>
>         Debugging: About to:
>                 exec OPAL_PREFIX=/lab/bin/openmpi/sge; export OPAL_PREFIX;\
>                          PATH=/lab/bin/openmpi/bin:$PATH ;\
>                          export PATH ;\
>                          LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH ;\
>                          export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted
>
> which looks fine. However, that is followed by the error:
>
>         /lab/bin/starter: line 12:
>                 /lab/home/bergman/sge_job_output/OPAL_PREFIX=/lab/bin/openmpi/sge;\
>                          export OPAL_PREFIX;\
>                          PATH=/lab/bin/openmpi/bin:$PATH ;\
>                          export PATH ;\
>                          LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH ;\
>                          export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted
>
> [Lines broken for readability.]
>
>
> The odd thing is that the current working directory (where the SGE
> job was submitted) is prepended to the definition of the OPAL_PREFIX
> variable. This is consistent, regardless of where the SGE job is launched (~,
> /tmp, etc.).
>
> Any suggestions?
>
> Thanks,
>
> Mark
>
>
>
> [1] http://gridengine.org/pipermail/users/2014-January/007121.html
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users