> Hi,
>
> On 19.08.2014 at 15:26, Wouter Verhelst wrote:
>
> > I've been experimenting with checkpoints, and things mostly seem to
> > work; I can submit a shell script with qsub, passing on a -ckpt
> > parameter, and then whenever the jobs get suspended (due to
> > subordinate_queues), they migrate to a different host. So far so good.
> >
> > However, my users actually submit jobs from Cadence adexl, which
> > AFAICS does not have an option to specify a checkpointing environment.
> > This would make all the work I've put in moot.
>
> I don't know "adexl"
"Analog Design Environment XL"

> - is there an option to specify any arbitrary
>   command line option to the `qsub` command?

Unfortunately not.

> - how is this defined?

Cadence has created a dialog where I can specify things like the queue to use, the priority, whatever resource requirements I might have and a number of other things, but it's fairly limited.

> - you could use a wrapper to `qsub`
> - you could use a JSV which, whenever a queue is selected (how do you
>   do this right now?), requests the checkpoint environment too

I'm not entirely sure (lots of black magic here), but it would appear that Cadence just calls qsub with the parameters it was given in that dialog box.

> - you could use a default options file for jobs submitted from a
>   certain directory (`man qsub` => "ge_request", actually it's
>   ".sge_request" for the users' files besides the system wide global
>   "sge_request" one)

I gave that a try, but it doesn't seem to actually checkpoint the jobs. Not sure why; I'll experiment a bit more. Thanks for the pointer!

> > I was thinking I should be able to define a default checkpoint
> > environment for a particular queue, but that doesn't seem to be
> > possible. Am I missing something? If not, does anyone have any other
> > suggestions?
>
> The definition in the queue makes the checkpoint environment only
> available to the queue, but you can also submit jobs to the queue
> without requesting the checkpoint environment.

I had noticed that :-)

> A forced checkpoint
> environment does not exist (sure, this could be an RFE - but making the
> calendar complex forced doesn't work the way I expected either).
>
> BTW: When you use the checkpoint environment, maybe you no longer need
> any means to select a particular queue, as the checkpoint environment
> is attached only to the queues where the job should run anyway.
No, but I do need queues to select which machine, out of the variety of machines we have, runs the job: some have lots of memory, some have little; some have modern CPUs, others have CPUs that are a few years old; some have 16 cores, others have 8; etc. This could all be done by having my users request a given amount of resources, but we've tried that and it failed; it was much easier to define a number of queues based on the types of simulation they're trying to run, and then schedule based on the workload expected for each type of simulation.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
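For the `qsub` wrapper option suggested in the thread, here is a minimal sketch in Python. It assumes Cadence finds `qsub` through the users' `PATH`, which the thread does not confirm. The wrapper adds `-ckpt` only when the queue passed with `-q` appears in a mapping, so picking a queue in the Cadence dialog implies its checkpoint environment. The path to the real `qsub`, the queue names `bigmem.q`/`fast.q`, and the checkpoint environment name `migrate_ckpt` are hypothetical placeholders.

```python
#!/usr/bin/env python3
# Hypothetical qsub wrapper, installed as "qsub" in a directory that comes
# before the real Grid Engine bin directory in the users' PATH.
import os
import sys

# Assumed location of the real qsub ($SGE_ROOT/bin/<arch>/qsub); adjust locally.
REAL_QSUB = "/opt/sge/bin/lx-amd64/qsub"

# Hypothetical queue -> checkpoint environment names; replace with your own.
QUEUE_TO_CKPT = {
    "bigmem.q": "migrate_ckpt",
    "fast.q": "migrate_ckpt",
}


def requested_queue(args):
    """Return the value of the first -q option on the command line, or None."""
    for i, arg in enumerate(args):
        if arg == "-q" and i + 1 < len(args):
            return args[i + 1]
    return None


def add_ckpt(args):
    """Prepend '-ckpt NAME' when the requested queue maps to a checkpoint
    environment and the caller has not already passed -ckpt."""
    args = list(args)
    if "-ckpt" in args:
        return args
    queue = requested_queue(args)
    if queue is None:
        return args
    # "-q a.q,b.q" or "-q a.q@host": look up the first queue name only.
    name = queue.split(",")[0].split("@")[0]
    ckpt = QUEUE_TO_CKPT.get(name)
    if ckpt is None:
        return args
    # Options must precede the job script, so put -ckpt at the front.
    return ["-ckpt", ckpt] + args


def main():
    new_args = add_ckpt(sys.argv[1:])
    # Replace this process with the real qsub; output and exit status pass through.
    os.execv(REAL_QSUB, [REAL_QSUB] + new_args)


# Only act when invoked under the name "qsub", i.e. when installed as the wrapper.
if os.path.basename(sys.argv[0]) == "qsub":
    main()
```

With this in place, a submission with `-q bigmem.q` would reach the real qsub as `qsub -ckpt migrate_ckpt -q bigmem.q ...`. The sketch only sees options on the command line; a queue requested through `#$ -q` directives inside the job script, or a call to qsub by absolute path, bypasses it. A server-side JSV, the other option raised in the thread, would see every submission regardless of how qsub is invoked.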
