Woah man! That's impressive!

I had to make a couple of adjustments to the script. Mainly, it first checks whether $JOB_ID is set (to exclude qsub jobs right away), and it adds an extra if to the PID check, because non-root qlogins need one more "step" up the process tree to find the shepherd PID.

For the record, this is my final script:

#cat /etc/profile.d/qlogin_timelimit_message.sh
#!/bin/bash

if [ ! -n "$JOB_ID" ]; then
        GO="false";
        MYPARENT=`ps -p $$ -o ppid --no-header`
#       echo $MYPARENT
        MYPARENT=`ps -p $MYPARENT -o ppid --no-header`
#       echo $MYPARENT
        MYSTARTUP=`ps -p $MYPARENT -o command --no-header`
#       echo $MYSTARTUP

        if [ "${MYSTARTUP:0:13}" = "sge_shepherd-" ]; then
                GO="true";
        else
                MYPARENT=`ps -p $MYPARENT -o ppid --no-header`
#               echo $MYPARENT
                MYSTARTUP=`ps -p $MYPARENT -o command --no-header`
#               echo $MYSTARTUP
                if [ "${MYSTARTUP:0:13}" = "sge_shepherd-" ]; then
                        GO="true";
                fi
        fi



#        if [ "${MYSTARTUP:0:13}" = "sge_shepherd-" ]; then
        if [ "$GO" = "true" ]; then
#               echo "Running inside SGE";
                MYJOBID=${MYSTARTUP:13}
                MYJOBID=${MYJOBID% -bg}
#               echo "Job $MYJOBID"

                if [ -n "$MYJOBID" ]; then
                        . /opt/gridengine/default/common/settings.sh
                        TIMELIMIT=`qstat -j $MYJOBID | sed -n -e "/^context/s/^context: *//p" | tr "," "\n" | sed -n -e "s/^QLOGIN_TIMELIMIT=//p"`
#               echo $TIMELIMIT
                        if [ -n "$TIMELIMIT" ]; then

echo -e "\n\n"
echo -e "\t\x1b\x5b1;31;49m#################################################\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\t\t* W A R N I N G *\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#################################################\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m\t\t\t\t\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
##print ("\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m | | | | | \x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m The qlogin job you submitted did not request\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m any time duration.\t\t\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m\t\t\t\t\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m This qlogin session has been assigned a\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m duration of\x1b\x5b1;32;49m ${TIMELIMIT}\x1b\x5b0;39;49m. After this time expires,\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
#echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m duration of \x1b\x5b1;32;49m2 hours\x1b\x5b0;39;49m. After this time expires,\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m the qlogin session will close.\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m\t\t\t\t\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m If you want to submit a qlogin session with\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m longer duration, please add to your resource\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m request a time petition by adding the\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m following parameter to your qlogin command:\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m \t-l h_rt=hh:mm:ss\t\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m as in\t\t\t\t\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m \t-l h_rt=02:30:00 (2 hours 30 minutes)\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m\t\t\t\t\t\t\x1b\x5b1;31;49m#\x1b\x5b0;39;49m"
echo -e "\t\x1b\x5b1;31;49m#################################################\x1b\x5b0;39;49m"
echo -e "\n"

                        fi
                fi
        fi
fi
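
As a side note, the two hard-coded parent lookups above could probably be folded into a loop, so that the number of "steps" up to the shepherd does not matter (builtin, ssh, rsh, root or non-root). This is only a rough sketch: the function names and the default depth of 5 are made up for illustration and are not part of Grid Engine, and it assumes a ps that accepts "-o ppid=" and "-o command=".

```shell
#!/bin/bash
# Sketch: walk up the process tree until an sge_shepherd ancestor is found.
# Function names and the default depth are illustrative assumptions.

# Print the job id embedded in a shepherd command line such as
# "sge_shepherd-4550191 -bg"; print nothing for any other command.
shepherd_job_id() {
    local cmd="$1"
    case "$cmd" in
        sge_shepherd-*)
            cmd="${cmd#sge_shepherd-}"   # drop the fixed prefix
            echo "${cmd%% *}"            # keep everything before the first space
            ;;
    esac
}

# Climb from PID $1 through at most $2 ancestors (default 5) and print the
# job id of the first sge_shepherd found; return 1 if there is none.
find_shepherd_job_id() {
    local pid="$1" max_depth="${2:-5}" depth cmd jobid
    for ((depth = 0; depth < max_depth; depth++)); do
        pid=$(ps -p "$pid" -o ppid= 2>/dev/null | tr -d ' ')
        [ -n "$pid" ] && [ "$pid" -gt 0 ] || return 1
        cmd=$(ps -p "$pid" -o command= 2>/dev/null)
        jobid=$(shepherd_job_id "$cmd")
        if [ -n "$jobid" ]; then
            echo "$jobid"
            return 0
        fi
    done
    return 1
}
```

In the profile script this would be used as MYJOBID=$(find_shepherd_job_id $$), followed by the same qstat lookup as before. Note that functions defined in /etc/profile.d stay defined in the user's shell, so it may be worth calling unset -f on them at the end.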



And this is the message it produces (in full-blown color).

$ qlogin -l h_vmem=500M
Your job 4550191 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 4550191 has been successfully scheduled.
Establishing /opt/gridengine/bin/rocks-qlogin.sh session to host compute-1-2.local ...
Warning: Permanently added '[compute-1-2.local]:53921' (RSA) to the list of known hosts.
Last login: Wed Jun 18 18:55:04 2014 from floquet.local
Rocks Compute Node
Rocks 6.0 (Mamba)
Profile built 15:58 25-Sep-2013

Kickstarted 16:05 25-Sep-2013



    #################################################
    #               * W A R N I N G *               #
    #################################################
    #                                               #
    # The qlogin job you submitted did not request  #
    # any time duration.                            #
    #                                               #
    # This qlogin session has been assigned a       #
    # duration of 2 hours. After this time expires, #
    # the qlogin session will close.                #
    #                                               #
    # If you want to submit a qlogin session with   #
    # longer duration, please add to your resource  #
    # request a time petition by adding the         #
    # following parameter to your qlogin command:   #
    #       -l h_rt=hh:mm:ss                        #
    # as in                                         #
    #       -l h_rt=02:30:00 (2 hours 30 minutes)   #
    #                                               #
    #################################################


[theredia@compute-1-2 ~]$




Thank you very much,

Txema






On 18/06/14 17:37, Reuti wrote:
On 18.06.2014 at 17:01, Txema Heredia wrote:

On 17/06/14 17:23, Reuti wrote:
On 17.06.2014 at 15:46, Txema Heredia wrote:

Basically, the JSV checks the CLIENT parameter. If it is equal to "qlogin", 
it then checks whether any h_rt was requested and, if there is none, sets a limit of 2h:10min while 
showing a colorful warning message so that users know there is a time limit.
It also sets all.q as a hard queue if none (or *) was requested, and sets 
the core binding policy.

This works wonders when I run it as a client jsv by using "qlogin -jsv 
/opt/gridengine/default/common/jsv.pl". The message appears, the limit is set, and 
the job runs fine.

But, once I set this as a server JSV (by qconf -mconf global), the time limit 
no longer applies.

As far as I have been able to find, the following behaviours differ between running 
it as a client JSV and as a server JSV:

- The 'CLIENT' parameter changed from 'qlogin' to 'qmaster'. This bypasses every 
"if" in my JSV, so it stops checking for time limits. Can I trust this? Why is 
this 'qmaster' appearing? Now both qsub and qlogin jobs show the same value. How can I 
distinguish them?
I found the same:

http://gridengine.org/pipermail/users/2012-September/004808.html

You can check for QRSH_PORT port according to William's post.

Thank you as usual, Reuti.

That QRSH_PORT env variable makes it possible to differentiate between qlogin and qsub 
commands. But I am still having some problems.

- The jsv_show_params() command shows nothing, neither on stdout nor in 
/opt/gridengine/default/spool/qmaster/messages. This makes debugging really 
cumbersome.
It works for me, in both Bash and Perl.

06/17/2014 17:13:30|worker|pc15370|I|got param: A='sge'
06/17/2014 17:13:30|worker|pc15370|I|got param: GROUP='users'
06/17/2014 17:13:30|worker|pc15370|I|got param: N='test.sh'
06/17/2014 17:13:30|worker|pc15370|I|got param: CMDNAME='test.sh'
06/17/2014 17:13:30|worker|pc15370|I|got param: CMDARGS='0'
06/17/2014 17:13:30|worker|pc15370|I|got param: JOB_ID='11553'
06/17/2014 17:13:30|worker|pc15370|I|got param: M='reuti@pc15370'
06/17/2014 17:13:30|worker|pc15370|I|got param: CLIENT='qmaster'
06/17/2014 17:13:30|worker|pc15370|I|got param: VERSION='1.0'
06/17/2014 17:13:30|worker|pc15370|I|got param: USER='reuti'
06/17/2014 17:13:30|worker|pc15370|I|got param: CONTEXT='server'
My bad. I had my loglevel set to log_warning instead of log_info. Now I can see 
these messages.

- No message can be sent to the user, be it info, warning or error. The 
user won't know that I have set a time limit on their session.
Yep, a message is only displayed for "jsv_reject_wait", even though one can 
also be specified for "jsv_correct" and "jsv_accept".

-- Reuti

Trying to overcome this, I thought of making my JSV add an environment variable 
"IS_QLOGIN=true" whenever it detects the QRSH_PORT. Then, a prolog script in 
the execution host would check that environment variable and print, if needed, the 
timelimit message to the user.
BUT, the prolog script cannot print anything to the standard output (because 
it is run before the actual session begins?).
Yep, its stdout is not connected to the terminal yet.


So, I thought about modifying the .bashrc file (or any other of the several scripts under /etc/profile.d/), 
and make it read that "IS_QLOGIN=true" environment variable. But, again, the fates are against me, 
and qlogin commands cannot use the "-v" parameter. Even if I use the jsv_add_env() command in the 
JSV script, that environment variable is passed to the prolog script, but is nowhere to be found once the 
"real" qlogin session starts.

I could also ignore both the JSV and the prolog scripts, go directly to the 
.bashrc script and check there for the presence (or absence) of variables 
like JOB_ID or JOB_NAME. That would allow me to distinguish between qlogin and 
qsub sessions (qlogin and ssh environments are identical). But then, again, I 
won't have access to anything able to tell the script if there is a timelimit 
set.

The only solution to all this mess that comes to my mind would be to make 
/etc/motd writable by all users, have the prolog script modify it with the 
timelimit message, and then have some sort of contraption in /etc/profile.d/ 
that resets the motd back to its previous non-qlogin version.

Does anyone have a better (or less-prone-to-failure) idea?
In the JSV you can also add some job context and fill it with a proper message. 
This could then be output:

$ qrsh -ac "MESSAGE=Time limit of 12 hrs set."
Running inside SGE
Job 11562
Time limit of 12 hrs set.

The necessary profile for bash (depending on builtin/ssh/rsh the parent must be 
looked up more than once):

MYPARENT=`ps -p $$ -o ppid --no-header`
#MYPARENT=`ps -p $MYPARENT -o ppid --no-header`
#MYPARENT=`ps -p $MYPARENT -o ppid --no-header`
MYSTARTUP=`ps -p $MYPARENT -o command --no-header`

if [ "${MYSTARTUP:0:13}" = "sge_shepherd-" ]; then
    echo "Running inside SGE"
    MYJOBID=${MYSTARTUP:13}
    MYJOBID=${MYJOBID% -bg}
    echo "Job $MYJOBID"

    if [ -n "$MYJOBID" ]; then
        . /usr/sge/default/common/settings.sh
        qstat -j $MYJOBID | sed -n -e "/^context/s/^context: *//p" | tr "," "\n" | sed -n -e "s/^MESSAGE=//p"
    fi
fi
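
To make the context lookup easier to try without a running job, the sed/tr pipeline above can be wrapped in a small function and fed some sample text. This is only a sketch: context_value is a hypothetical helper name, and the sample merely imitates the relevant lines of "qstat -j" output (the real output has many more fields).

```shell
#!/bin/bash
# Sketch: pull one KEY=value entry out of the "context:" line of `qstat -j`.
# context_value is a hypothetical helper; it reads qstat output on stdin.
context_value() {
    local key="$1"
    sed -n -e 's/^context: *//p' \
        | tr ',' '\n' \
        | sed -n -e "s/^${key}=//p"
}

# Sample input imitating the relevant part of `qstat -j <jobid>` output.
sample='job_number:                 4550191
context:                    QLOGIN_TIMELIMIT=2 hours,MESSAGE=Time limit set.'

# Prints: 2 hours
printf '%s\n' "$sample" | context_value QLOGIN_TIMELIMIT
```

With a real job it would be called as qstat -j $MYJOBID | context_value MESSAGE. Because the entries are split on commas, a context value that itself contains a comma would be cut short, so the message text should avoid them.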

HTH - Reuti
