> On 09.01.2019 at 23:39, Derrick Lin <[email protected]> wrote:
>
> Hi Reuti and Iyad,
>
> Here is my prolog script. It does just one thing: it sets a quota on the XFS
> volume for each job.
>
> The prolog_exec_xx_xx.log file was generated, so I assumed the first exec
> command got executed.
>
> Since the generated log file is empty, I think nothing was executed after
> that.
>
> Cheers
>
> [root@zeta-4-12 common]# cat prolog_exec.sh
> #!/bin/sh
Are the shells the same, i.e. the same version? Maybe you can also use the full
path /bin/bash here, as bash switches to a compatibility mode that mimics the
original sh when it is invoked under the name /bin/sh.
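
A minimal sketch of what could matter here, not tested against your setup: the
((xfs_quota_rc+=$?)) lines are a bash construct. I don't know what /bin/sh is
on the exec host, but if it is not bash (dash, for example), those lines are
parsed as nested subshells and fail. The POSIX form below behaves the same in
sh, dash and bash. The helper name sum_rc is made up for illustration, and
`true`/`false` only stand in for the two xfs_quota calls.

```shell
#!/bin/sh
# Hypothetical helper: run each argument as a command and print the sum
# of the exit statuses, using only POSIX arithmetic expansion.
sum_rc() {
    total=0
    for cmd in "$@"; do
        $cmd >/dev/null 2>&1
        # $((...)) is POSIX; ((total+=$?)) is a bash extension.
        total=$((total + $?))
    done
    echo "$total"
}

# `true` and `false` stand in for the two xfs_quota calls.
sum_rc true false
```

In the prolog itself this would mean replacing each ((xfs_quota_rc+=$?)) with
xfs_quota_rc=$((xfs_quota_rc + $?)), or keeping the lines and changing the
first line to #!/bin/bash.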
-- Reuti
>
> exec >> /tmp/prolog_exec_"$JOB_ID"_"$SGE_TASK_ID".log
> exec 2>&1
>
> SGE_TMP_ROOT="/scratch_local"
>
> pe_num=$(cat $PE_HOSTFILE | grep $HOSTNAME | awk '{print $2}')
>
> tmp_req_var=$(echo "$tmp_requested" | grep -o -E '[0-9]+')
> tmp_req_unit=$(echo "$tmp_requested" | sed 's/[0-9]*//g')
>
> if [ -z "$pe_num" ]; then
> quota=$tmp_requested
> else
> quota=$(expr $tmp_req_var \* $pe_num)$tmp_req_unit
> fi
>
> echo "############################# [$HOSTNAME PROLOG] - JOB_ID:$JOB_ID TASK_ID:$SGE_TASK_ID #############################"
> echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'project -s -p $TMP $JOB_ID' $SGE_TMP_ROOT"
> echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'limit -p bhard=$quota $JOB_ID' $SGE_TMP_ROOT"
>
> xfs_quota_rc=0
>
> /usr/sbin/xfs_quota -x -c "project -s -p $TMP $JOB_ID" $SGE_TMP_ROOT
> ((xfs_quota_rc+=$?))
>
> /usr/sbin/xfs_quota -x -c "limit -p bhard=$quota $JOB_ID" $SGE_TMP_ROOT
> ((xfs_quota_rc+=$?))
>
> if [ $xfs_quota_rc -eq 0 ]; then
> exit 0
> else
> exit 100 # Put job in error state
> fi
>
>
> On Wed, Jan 9, 2019 at 7:36 PM Reuti <[email protected]> wrote:
> Hi,
>
> > On 09.01.2019 at 01:14, Derrick Lin <[email protected]> wrote:
> >
> > Hi guys,
> >
> > I just brought up a new SGE cluster, but somehow the qrsh session does not
> > work:
> >
> > tester@login-gpu:~$ qrsh
> > ^Cerror: error while waiting for builtin IJS connection: "got select
> > timeout"
> >
> > After I hit Enter, the session just hung forever instead of bringing
> > me to a compute node. I had to press Ctrl+C to terminate it, which gave the
> > above error.
> >
> > I noticed that SGE did send my qrsh request to a compute node, as I could
> > tell from qstat:
> >
> > ---------------------------------------------------------------------------------
> > [email protected] BIP 0/1/80 0.01 lx-amd64
> > 15 0.55500 QRLOGIN tester r 01/09/2019 10:47:13 1
> >
> > We have a prolog script configured globally. The script handles the local
> > disk quota and writes all of its output to a per-job log file. So I went to
> > that compute node and checked, and found that a log file had been created
> > but was empty.
> >
> > So my thinking so far is that my qrsh is stuck because the prolog script
> > did not run to completion.
>
> Is there any statement in the prolog, which could wait for stdin – and in a
> batch job there is just no stdin, hence it continues? Could be tested with
> "-i" to a batch job.
>
> -- Reuti
>
>
> > qsub jobs are working fine.
> >
> > Any ideas would be appreciated.
> >
> > Cheers,
> > Derrick
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>