On 19.08.2014 at 17:06, Wouter Verhelst wrote:

>> Hi,
>>
>> On 19.08.2014 at 15:26, Wouter Verhelst wrote:
>>
>>> I've been experimenting with checkpoints, and things mostly seem to
>>> work; I can submit a shell script with qsub, passing on a -ckpt
>>> parameter, and then whenever the jobs get suspended (due to
>>> subordinate_queues), they migrate to a different host. So far so good.
>>>
>>> However, my users actually submit jobs from cadence adexl, which
>>> AFAICS does not have an option to specify a checkpointing environment.
>>> This would make all the work I've put in moot.
>>
>> I don't know "adexl"
>
> "Analog Design Environment XL"
>
>> - is there an option to specify any arbitrary
>> command line option to the `qsub` command
>
> Unfortunately not.
>
>> - how is this defined?
>
> Cadence has created a dialog where I can specify things like the queue to
> use, the priority, whatever resource requirements I might have and a number
> of other things, but it's fairly limited.
>
>> - you could use a wrapper to `qsub`
>> - you could use a JSV, so that whenever a queue is selected (how do you
>> do this right now?), the checkpoint environment is requested too
>
> I'm not entirely sure (lots of black magic here), but it would appear that
> cadence just calls qsub with the parameters it was given in that dialog box.

A server-side JSV will be called for every job. It's something the admin can
set up to enforce certain policies, and it can't be bypassed. There are some
examples in $SGE_ROOT/util/resources/jsv and in the man pages "jsv" and
"jsv_script_interface" (a language like Perl is much faster here than Bash,
but for a proof of concept Bash will do too). Attaching a checkpoint
environment whenever a dedicated queue is selected is one use case.

-- Reuti

>> - you could use a default options file for jobs submitted from a
>> certain directory (`man qsub` => "sge_request", actually it's
>> ".sge_request" for the users' files besides the system wide global
>> "sge_request" one)
>
> I gave that a try, but it doesn't seem to actually checkpoint the jobs. Not
> sure why; I'll experiment a bit more. Thanks for the pointer!
>
>>> I was thinking I should be able to define a default checkpoint
>>> environment for a particular queue, but that doesn't seem to be
>>> possible. Am I missing something? If not, does anyone have any other
>>> suggestions?
>>
>> The definition in the queue makes the checkpoint environment only
>> available to the queue, but you can also submit jobs to the queue
>> without requesting the checkpoint environment.
>
> I had noticed that :-)
>
>> A forced checkpoint
>> environment does not exist (sure, this could be an RFE - but making the
>> calendar complex forced is not working in the way I expected either).
>>
>> BTW: When you use the checkpoint environment, maybe you don't need any
>> means any longer to select a particular queue, as the checkpoint
>> environment is attached to the queue only where it should run anyway.
>
> No, but I do need queues to select on which machine to run the job out of the
> variety of machines we have: some have much memory, some have little; some
> have modern CPUs, others have CPUs of a few years old; some have 16 cores,
> others have 8; etc.
> This could all be done by having my users request a given
> amount of resources, but we've tried that and it failed; it was much easier
> to define a number of queues based on the types of simulation they're trying
> to run, and then schedule based on the expected type of workload common for
> that type of simulation.
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
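
As a rough illustration of the server-side JSV idea mentioned in the reply above, a minimal Bash-style sketch might look like the following. The queue name `sim.q` and the checkpoint environment name `migrate_ckpt` are hypothetical placeholders, and the assumption that the `q_hard` parameter arrives as a comma-separated queue list should be checked against the "jsv" man page. The script only adds the checkpoint environment when the job requests that queue and has not asked for one already.

```shell
#!/bin/sh
# Sketch of a JSV that attaches a checkpoint environment to jobs
# submitted to one particular queue.
# Hypothetical names (not from the thread): sim.q, migrate_ckpt.
CKPT_QUEUE="sim.q"
CKPT_ENV="migrate_ckpt"

# Return 0 if the comma-separated queue list in $1 names CKPT_QUEUE,
# either bare ("sim.q") or as a queue instance ("sim.q@host").
wants_ckpt_queue() {
    case ",$1," in
        *",$CKPT_QUEUE,"*|*",$CKPT_QUEUE@"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Called once when the JSV starts; nothing to prepare here.
jsv_on_start() {
    return
}

# Called for every submitted job.
jsv_on_verify() {
    queues=$(jsv_get_param q_hard)   # hard queue request (-q)
    ckpt=$(jsv_get_param ckpt)       # checkpoint environment (-ckpt)
    if wants_ckpt_queue "$queues" && [ -z "$ckpt" ]; then
        jsv_set_param ckpt "$CKPT_ENV"
        jsv_correct "checkpoint environment $CKPT_ENV added"
    else
        jsv_accept "job accepted unchanged"
    fi
    return
}

# Enter the JSV protocol loop only when the Grid Engine helper
# library is available, so the file can also be sourced for testing.
if [ -r "${SGE_ROOT:-}/util/resources/jsv/jsv_include.sh" ]; then
    . "$SGE_ROOT/util/resources/jsv/jsv_include.sh"
    jsv_main
fi
```

If I recall correctly, a server-side JSV like this is registered through the `jsv_url` setting in the global cluster configuration (`qconf -mconf`); please verify that against the "sge_conf" man page before relying on it.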
