Hi,

On 23.09.2015 at 21:18, Lane, William wrote:

> Reuti,
> 
> 1.
> If more than one compute node takes part in the compute ring, how does one 
> determine which one is the exechost?

What do you mean by compute ring - a parallel job?

The exechost is the one where the job script is executed. Hence you can use 
$HOSTNAME in the jobscript to get its name (which also shows up in `qstat`).
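
As a minimal sketch of the above (not taken from any existing setup): a job script that reports the node it runs on. `JOB_ID` and `PE_HOSTFILE` are environment variables Grid Engine sets for a job; the function names and the job name `whereami` are made up for illustration, and the `uname -n` fallback is only there for shells that do not set `$HOSTNAME`.

```shell
#!/bin/sh
#$ -N whereami
#$ -cwd

# Print the host that executes this job script, i.e. the exechost.
report_exechost() {
    # $HOSTNAME is set by bash; fall back to uname -n in other shells.
    host="${HOSTNAME:-$(uname -n)}"
    echo "Job ${JOB_ID:-unknown} runs its job script on: $host"
}

# For a parallel job, list every host granted to the job.
# Assumes the usual $PE_HOSTFILE layout with the hostname in column 1.
list_pe_hosts() {
    if [ -r "${PE_HOSTFILE:-}" ]; then
        awk '{ print $1 }' "$PE_HOSTFILE"
    fi
}

report_exechost
list_pe_hosts
```

Submitted with `qsub`, the first line of the job's output file should then name the same host that `qstat` shows for the running job.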


> Or is the exechost always the node on which you submit a job?

There are only rare circumstances where a submit host is also an exechost (or 
vice versa). Usually the jobscript is executed on one of the exechosts (which 
may not be reachable by a login at all), while you submit from the machine you 
are logged into.

(There are exceptions in case of a CRAY where the jobscript may indeed run on a 
submit host, as you need `aprun` to push the real executable to the nodes while 
there is no load by the jobs on the submit machine at all).


> 2.
>> Not by default. You will have to use a mail wrapper which will scan the 
>> messages file of the exechost for an entry of this particular job and append 
>> it to the email. I can supply a snippet if you need.
> 
> We would be interested in implementing the above. Is there any way to have an 
> email differentiate between a job being aborted because it exceeds the h_rt 
> constraint of a queue vs. other reasons?

After scanning the messages file you can decide to send the email or change the 
header as you like. See attached.
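
For readers without the attachment, here is a rough sketch of what such a wrapper could look like. It is not the attached mailer.sh. It assumes the wrapper is called like `mail` (`-s "<subject>" <recipient>`, body on stdin), that the subject starts with "Job <id>", and that the execd writes a line containing "exceeded hard wallclock time" when h_rt is hit; check all three against your installation. `MESSAGES_FILE` and `RUN_MAILER_SKETCH` are hypothetical variables introduced only for this example.

```shell
#!/bin/sh
# Hypothetical mail wrapper sketch; every function name here is made up.

# Print "h_rt" if the messages file records a wallclock kill for the job,
# otherwise "other". The log wording matched below is an assumption.
classify_abort() {
    messages_file=$1
    job_id=$2
    if [ -n "$job_id" ] &&
       grep -E "job ${job_id}\.[0-9]+ exceeded hard wallclock time" \
            "$messages_file" >/dev/null 2>&1; then
        echo "h_rt"
    else
        echo "other"
    fi
}

# Pull the job number out of a subject such as "Job 4711 (test.sh) Aborted".
job_id_from_subject() {
    printf '%s\n' "$1" | sed -n 's/^Job[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Prefix the subject when the abort was caused by the h_rt limit.
tag_subject() {
    case $2 in
        h_rt) printf '[h_rt exceeded] %s\n' "$1" ;;
        *)    printf '%s\n' "$1" ;;
    esac
}

# Assumed invocation: wrapper -s "<subject>" <recipient>, body on stdin.
main() {
    subject=$2
    recipient=$3
    job_id=$(job_id_from_subject "$subject")
    reason=$(classify_abort "$MESSAGES_FILE" "$job_id")
    # Send the original body followed by the matching messages entries.
    { cat; grep -F "job ${job_id}." "$MESSAGES_FILE"; } |
        mail -s "$(tag_subject "$subject" "$reason")" "$recipient"
}

# Only act as a mailer when explicitly enabled (hypothetical switch).
if [ "${RUN_MAILER_SKETCH:-0}" = "1" ]; then
    main "$@"
fi
```

The wrapper has to run somewhere that can read the exechost's messages file, e.g. via a shared spool directory; with local spooling it would need another way to fetch the file.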


> 3.
> Another issue is what kind of values for h_rt should be used?  I've had jobs 
> last for 8 hours as well as nearly 24. What kind of stats would be good to 
> look at to determine what values of h_rt should be used?

This you have to decide on your own. What is judged as a short job depends on 
the circumstances.
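
One possible starting point, purely as a sketch: look at the distribution of past wallclock times in the accounting file and pick the limit from a percentile. The function below assumes the colon-separated accounting format with `ru_wallclock` (in seconds) as field 14 and job names without colons; `wallclock_percentile` is a made-up name, and the nearest-rank percentile is just one reasonable choice.

```shell
#!/bin/sh
# wallclock_percentile FILE PCT
# Print the PCT-th percentile (nearest rank) of ru_wallclock in seconds.
# Assumes field 14 of each accounting record is ru_wallclock.
wallclock_percentile() {
    acct_file=$1
    pct=$2
    grep -v '^#' "$acct_file" | cut -d: -f14 | sort -n |
        awk -v p="$pct" '
            { v[NR] = $1 }
            END {
                if (NR == 0) exit 1
                rank = int((p * NR + 99) / 100)   # ceiling of p*NR/100
                if (rank < 1) rank = 1
                print v[rank]
            }'
}
```

Calling e.g. `wallclock_percentile "$SGE_ROOT/$SGE_CELL/common/accounting" 90` would print the runtime below which roughly 90% of the recorded jobs finished, which can help to judge where a "short" limit would cut.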

> 
> 4.
> Should there be nodes dedicated to the short.q queue?

Depends on your overall setup and goal. Do you want to have these nodes left 
free by other jobs, so that short jobs start instantly?

-- Reuti

[see attached file: mailer.sh]

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users