Hi Reuti and Iyad,
Here is my prolog script, it just does one thing, setting quota on the XFS
volume for each job:
The prolog_exec_xx_xx.log file was generated, so I assumed the first exec
command got executed.
Since the generated log file is empty, I think nothing was executed after
that.
Cheers
[root@zeta-4-12 common]# cat prolog_exec.sh
#!/bin/sh
exec >> /tmp/prolog_exec_"$JOB_ID"_"$SGE_TASK_ID".log
exec 2>&1
SGE_TMP_ROOT="/scratch_local"
pe_num=$(cat $PE_HOSTFILE | grep $HOSTNAME | awk '{print $2}')
tmp_req_var=$(echo "$tmp_requested" | grep -o -E '[0-9]+')
tmp_req_unit=$(echo "$tmp_requested" | sed 's/[0-9]*//g')
if [ -z "$pe_num" ]; then
quota=$tmp_requested
else
quota=$(expr $tmp_req_var \* $pe_num)$tmp_req_unit
fi
echo "############################# [$HOSTNAME PROLOG] - JOB_ID:$JOB_ID
TASK_ID:$SGE_TASK_ID #############################"
echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'project -s -p $TMP
$JOB_ID' $SGE_TMP_ROOT"
echo "`date` [$HOSTNAME PROLOG]: xfs_quota -x -c 'limit -p bhard=$quota
$JOB_ID' $SGE_TMP_ROOT"
xfs_quota_rc=0
/usr/sbin/xfs_quota -x -c "project -s -p $TMP $JOB_ID" $SGE_TMP_ROOT
((xfs_quota_rc+=$?))
/usr/sbin/xfs_quota -x -c "limit -p bhard=$quota $JOB_ID" $SGE_TMP_ROOT
((xfs_quota_rc+=$?))
if [ $xfs_quota_rc -eq 0 ]; then
exit 0
else
exit 100 # Put job in error state
fi
On Wed, Jan 9, 2019 at 7:36 PM Reuti <[email protected]> wrote:
> Hi,
>
> > Am 09.01.2019 um 01:14 schrieb Derrick Lin <[email protected]>:
> >
> > Hi guys,
> >
> > I just brought up a new SGE cluster, but somehow the qrsh session does
> not work:
> >
> > tester@login-gpu:~$ qrsh
> > ^Cerror: error while waiting for builtin IJS connection: "got select
> timeout"
> >
> > after I hit entered, the session just stuck there forever instead of
> bring me to a compute node. I have to entered Crtl+c to terminate and it
> gave the above error.
> >
> > I noticed, the SGE did send my qrsh request to a compute node as I could
> tell from qstat:
> >
> >
> ---------------------------------------------------------------------------------
> > [email protected] BIP 0/1/80 0.01 lx-amd64
> > 15 0.55500 QRLOGIN tester r 01/09/2019 10:47:13 1
> >
> > We have a prolog script configured globally, the script deals with local
> disk quota and keep all output to a log file for each job. So I went to
> that compute node, and check, found that a log file was created but it was
> empty.
> >
> > So my thinking so far is, my qrsh stuck because the prolog script is not
> fully executed.
>
> Is there any statement in the prolog, which could wait for stdin – and in
> a batch job there is just no stdin, hence it continues? Could be tested
> with "-i" to a batch job.
>
> -- Reuti
>
>
> > qsub job are working fine.
> >
> > Any idea will be appreciated
> >
> > Cheers,
> > Derrick
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users