On 7 September 2012 19:41, Reuti <re...@staff.uni-marburg.de> wrote:
> On 07.09.2012 at 18:39, Lars van der bijl wrote:
>
>> On 7 September 2012 17:48, Reuti <re...@staff.uni-marburg.de> wrote:
>>> On 07.09.2012 at 17:45, Lars van der bijl wrote:
>>>
>>>> On 7 September 2012 17:23, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>>> Would it be possible to change the execd to put any job that does not
>>>>>> exit with 0 into an error state, regardless of it being a kill -9?
>>>>>
>>>>> You can rerun the job automatically if you exit the epilog with 99.
>>>>
>>>> Yes, but with 137 or 139 I can't. And as the task hasn't finished
>>>> successfully, I don't want it to start its dependencies. I'd rather it
>>>> just go into an error state.
>>>
>>> Do you observe that a job being rescheduled by exit 99 triggers its
>>> successors (submitted with -hold_jid) to start?
>>>
>> No, when I'm able to raise a 99 exit status it will not trigger its
>> dependencies. However, tasks killed with 137 or 139 do, and I'd rather
>> have them error out with 100 than be removed from the queue altogether.
>>
>> I know that the grid uses 137 when you request a qdel, and this makes
>> it kinda hard to stop a task if everything else were put in the 100
>> error state.
>
> No, the chain of events is the other way round: `qdel` sends SIGKILL
> to the job and removes it from the list of jobs in the system. Whatever
> you do or set in the epilog doesn't matter, as the job is removed by
> the `qdel` anyway.
>
> You can, for example:
>
> - Submit all successor jobs with a user hold; the epilog of the
> predecessor can remove this hold if the predecessor ran successfully.
> The name/job ID of the successor to be released could be put in the
> predecessor's job context, which you then read in the epilog to act
> accordingly.
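A minimal sketch of the user-hold approach for array jobs whose batches have different step sizes. It assumes the successor was submitted with a user hold (`qsub -h`) and that the predecessor's epilog knows its own task ID, step size and last task (for example from `SGE_TASK_ID`, `SGE_TASK_STEPSIZE` and `SGE_TASK_LAST`, assuming these are visible in the epilog) as well as the successor's job ID and range (for example read from the job context). The function names and the job ID `4712` are hypothetical, and the `jobid.first-last:step` task-range argument to `qrls` should be checked against the qrls(1) man page of your installation. The sketch only builds the command; it does not run it.

```python
# Hypothetical epilog helper: all names here are illustrative, not part of Grid Engine.
import shlex


def covered_successor_tasks(pred_task, pred_step, pred_last,
                            succ_first, succ_step, succ_last):
    """Return successor task IDs whose whole batch lies inside the
    predecessor batch that starts at pred_task."""
    # A task with ID t and step s covers t .. t+s-1, clipped to the last task.
    chunk_start = pred_task
    chunk_end = min(pred_task + pred_step - 1, pred_last)
    tasks = []
    for k in range(succ_first, succ_last + 1, succ_step):
        k_end = min(k + succ_step - 1, succ_last)
        # Only release a successor batch if this predecessor batch covers all of it.
        if k >= chunk_start and k_end <= chunk_end:
            tasks.append(k)
    return tasks


def to_task_range(tasks, step):
    """Format evenly spaced task IDs as a 'first-last:step' range string."""
    if not tasks:
        return None
    return f"{tasks[0]}-{tasks[-1]}:{step}"


def release_command(succ_job_id, tasks, step):
    """Build (but do not run) a qrls command removing the user hold
    from the given successor tasks; the task-range syntax is an assumption."""
    task_range = to_task_range(tasks, step)
    if task_range is None:
        return None
    return ["qrls", "-h", "u", f"{succ_job_id}.{task_range}"]


if __name__ == "__main__":
    # Predecessor submitted as 1-100:25, successor (hypothetical job 4712) as 1-100:1.
    tasks = covered_successor_tasks(26, 25, 100, 1, 1, 100)
    cmd = release_command("4712", tasks, 1)
    print(shlex.join(cmd))
```

With a predecessor submitted as 1-100:25 and a successor as 1-100:1, predecessor task 26 would release successor tasks 26-50. If a successor batch is larger than, or straddles, a predecessor batch, it depends on several predecessor tasks and this sketch never releases it; that case needs extra bookkeeping, such as a counter kept in a database.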
I could create this with my database layer; however, our system relies
very heavily on batching, so task1 -> task2 share the same batch range
but have different batch sizes, for example 1-100:25 for task1 and
1-100:1 for task2. How would I find out what the other range is, and how
would I un-hold that specific range?

>
> - Create a special queue for some kind of `enabler' jobs which run
> forever (looping, e.g., once a minute until they quit). The original job
> creates/touches a special file for which the `enabler' is waiting. Once
> the existence of the relevant file is detected, the `enabler' can
> release the hold of a certain job or even just submit the successor job.
>
> - A workflow can be created with the GEL tool from
> http://wildfire.bii.a-star.edu.sg/ (reference:
> http://wildfire.bii.a-star.edu.sg/docs/gel_ref.pdf), where you can check
> for files. But the jobs will be submitted during the workflow and not
> all in advance. Maybe it is useful anyway.
>
> -- Reuti

It would still be nice to know whether it would be possible to implement
the "dormant" task approach. The company I work for would be willing to
pay for such development, depending on the feasibility.

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users