On 19.08.2014 at 17:06, Wouter Verhelst wrote:

>> Hi,
>> 
>> On 19.08.2014 at 15:26, Wouter Verhelst wrote:
>> 
>>> I've been experimenting with checkpoints, and things mostly seem to
>>> work; I can submit a shell script with qsub, passing on a -ckpt
>>> parameter, and then whenever the jobs get suspended (due to
>>> subordinate_queues), they migrate to a different host. So far so good.
>>> 
>>> However, my users actually submit jobs from Cadence adexl, which
>>> AFAICS does not have an option to specify a checkpointing environment.
>>> This would make all the work I've put in moot.
>> 
>> I don't know "adexl"
> 
> "Analog Design Environment XL"
> 
>> - is there an option to pass an arbitrary command line option to
>> the `qsub` command?
> 
> Unfortunately not.
> 
>> - how is this defined?
> 
> Cadence has created a dialog where I can specify things like the queue to 
> use, the priority, whatever resource requirements I might have and a number 
> of other things, but it's fairly limited.
> 
>> - you could use a wrapper to `qsub`
>> - you could use a JSV that also requests the checkpoint environment
>> whenever a queue is selected (how do you select the queue right now?)
> 
> I'm not entirely sure (lots of black magic here), but it would appear that
> Cadence just calls qsub with the parameters it was given in that dialog box.
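
If Cadence really just calls plain qsub, a wrapper placed ahead of the real qsub in the users' $PATH could add -ckpt on its behalf. Below is a minimal Python sketch of that idea. The queue name "sim_long.q", the checkpoint environment "migrate_ckpt", the REAL_QSUB variable and its default path are all hypothetical placeholders. The sketch is deliberately naive. It looks at every -q on the command line, whether hard or soft and including any that appear in the job script's own arguments. It does not expand wildcards. It never sees options embedded as #$ lines in the job script. Users can also bypass it by calling the real qsub directly.

```python
#!/usr/bin/env python3
# Hypothetical qsub wrapper: prepend "-ckpt <env>" when a mapped queue is requested.
import os
import sys

# Hypothetical mapping: queue name -> checkpoint environment to attach.
QUEUE_CKPT = {"sim_long.q": "migrate_ckpt"}


def requested_queues(args):
    """Queue names given via -q (comma-separated, "@host" parts stripped)."""
    queues = []
    for i, arg in enumerate(args[:-1]):
        if arg == "-q":
            queues += [item.split("@", 1)[0] for item in args[i + 1].split(",")]
    return queues


def add_ckpt(args):
    """Return args with "-ckpt <env>" prepended if needed; never override -ckpt."""
    if "-ckpt" not in args:
        for queue in requested_queues(args):
            if queue in QUEUE_CKPT:
                return ["-ckpt", QUEUE_CKPT[queue]] + list(args)
    return list(args)


def main():
    # REAL_QSUB and its default path are assumptions; point it at the real binary.
    real_qsub = os.environ.get("REAL_QSUB", "/opt/sge/bin/lx-amd64/qsub")
    # Replace this process, so qsub's output and exit status pass straight through.
    os.execv(real_qsub, [real_qsub] + add_ckpt(sys.argv[1:]))


# Only act when invoked under the name "qsub" (e.g. from a wrapper directory in $PATH).
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "qsub":
    main()
```

Prepending the option keeps it ahead of the job script name, where qsub expects its own options.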

A server-side JSV will be called for every job: it's something the admin can
set up to enforce certain policies, and users can't bypass it. There are some
examples in $SGE_ROOT/util/resources/jsv; see also the man pages "jsv" and
"jsv_script_interface" (a JSV written in Perl is much faster here than one in
Bash, but for a proof of concept Bash will do too).

Attaching a checkpoint environment whenever a dedicated queue is requested is
one such use.
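
As a rough illustration, here is a minimal JSV sketch in Python rather than Bash or Perl. The protocol is plain text lines on stdin/stdout, so any language should do. The command and parameter names (START/STARTED, PARAM, BEGIN, RESULT STATE, QUIT, "q_hard", "ckpt") are my reading of jsv_script_interface(3), so please check them against your release. The queue "sim_long.q" and the checkpoint environment "migrate_ckpt" are hypothetical.

```python
#!/usr/bin/env python3
# Sketch of a server-side JSV that attaches a checkpoint environment by queue.
# Protocol lines follow my reading of jsv_script_interface(3); verify locally.
import sys

# Hypothetical mapping: queue name -> checkpoint environment to attach.
QUEUE_CKPT = {"sim_long.q": "migrate_ckpt"}


class Jsv:
    def __init__(self):
        self.params = {}

    def handle(self, line):
        """Process one line from qmaster (or qsub); return the reply lines."""
        cmd, _, rest = line.strip().partition(" ")
        if cmd == "START":            # a new job is about to be verified
            self.params = {}
            return ["STARTED"]
        if cmd == "PARAM":            # e.g. "PARAM q_hard sim_long.q"
            name, _, value = rest.partition(" ")
            self.params[name] = value
            return []
        if cmd == "BEGIN":            # all parameters sent; decide now
            return self.verify()
        return []                     # ENV and other lines are ignored here

    def verify(self):
        """Add -ckpt if a mapped queue is hard-requested and no -ckpt is set."""
        if "ckpt" not in self.params:
            for item in self.params.get("q_hard", "").split(","):
                ckpt = QUEUE_CKPT.get(item.split("@", 1)[0])
                if ckpt:
                    return ["PARAM ckpt " + ckpt,
                            "RESULT STATE CORRECT checkpoint environment added"]
        return ["RESULT STATE ACCEPT job accepted"]


def run(stdin=sys.stdin, stdout=sys.stdout):
    """Serve the JSV protocol until QUIT or end of input."""
    jsv = Jsv()
    for line in stdin:
        if line.strip() == "QUIT":
            break
        for reply in jsv.handle(line):
            stdout.write(reply + "\n")
        stdout.flush()                # the other side waits for each answer
```

To try it, end the file with a call to run() under an `if __name__ == "__main__":` guard and make it executable. You can test it for single submissions with `qsub -jsv /path/to/script` before setting it cluster-wide through jsv_url in the global configuration (`qconf -mconf`, see sge_conf(5)).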

-- Reuti


>> - you could use a default options file for jobs submitted from a
>> certain directory (search `man qsub` for "ge_request": the users'
>> files are named ".sge_request", in addition to the system-wide global
>> "sge_request" file)
> 
> I gave that a try, but it doesn't seem to actually checkpoint the jobs. Not 
> sure why; I'll experiment a bit more. Thanks for the pointer!
> 
>>> I was thinking I should be able to define a default checkpoint
>>> environment for a particular queue, but that doesn't seem to be
>>> possible. Am I missing something? If not, does anyone have any other
>>> suggestions?
>> 
>> Defining it in the queue only makes the checkpoint environment
>> available in that queue; jobs can still be submitted to the queue
>> without requesting the checkpoint environment.
> 
> I had noticed that :-)
> 
>> There is no way to force a checkpoint environment (sure, this could be
>> an RFE, but making the calendar complex forced doesn't work the way I
>> expected either).
>> 
>> BTW: Once you use the checkpoint environment, you may no longer need any
>> other means of selecting a particular queue, as the checkpoint
>> environment is attached only to the queues where the job should run anyway.
> 
> No, but I do need queues to select which machine runs the job out of the
> variety of machines we have: some have lots of memory, some have little; some
> have modern CPUs, others have CPUs that are a few years old; some have 16
> cores, others have 8; etc. This could all be done by having my users request
> a given amount of resources, but we've tried that and it failed; it was much
> easier to define a number of queues based on the types of simulation they're
> trying to run, and then schedule based on the workload typical of that type
> of simulation.
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users

Reply via email to