We have pipelines in which each batch script ends with a qsub that submits the next step. Error tracking is an issue, but sometimes chaining jobs this way is easier than engineering a raft of job dependencies. As you note, concurrency can be an issue, but there are a number of ways to deal with that:

* Lock file in a POSIX-compliant filesystem
* Semaphore in a network-accessible database

You can prevent dead jobs from stalling other jobs by tying the lock/semaphore back to a job and ensuring that it's still running.

On Thu, Feb 25, 2016 at 11:16:49PM +0200, Ben Daniel Pere wrote:
> Where I work, we have jobs that submit jobs that submit jobs.. this could
> potentially cause a deadlock but we're somehow (probably luck) manage to
> live with it.. I'm wondering if that's a reasonable practice and if not if
> you can suggest a better way to do what we do..
>
> Example:
>
> we have these 3 tasks:
>
> - "analyze.day" job analyzed a day of data and returns some output
> - "analyze.month" job sends "analyze.day" jobs for a whole month and
> outputs summary
> - "analyze.year" job sends "analyze.month" jobs for a whole year and
> outputs summary
>
> usually people run analyze.day everyday on previous day but sometimes they
> test their new algorithm on a whole year so they dispatch analyze.year
> which dispatched analyze.month which dispatched analyze.day..
> We created a "dispatching" queue which is the only queue we allow
> submitting jobs from but since both analyze.year and analyze.month need to
> run there (both dispatch tasks) we could end up with a dead lock
> (theoretically, lots of analyze.year running together taking all
> dispatching queue slots and not leaving room for analyze.month tasks which
> they will forever wait for), also besides dispatching they also do some
> logic so it's a strange animal, this "dispatching" queue..
>
> What's the "correct" practice here?
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

--
-- Skylar Thompson ([email protected])
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
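
P.S. To make the lock file option concrete, here is a minimal sketch in Python, not what we actually run. The function names (job_is_running, try_acquire, release) and the lock path are made up for illustration. It assumes that `qstat -j <job_id>` exits non-zero once a job no longer exists, which you should verify on your own installation, and that exclusive file creation is atomic on the filesystem holding the lock. The stale-lock cleanup is itself racy if two jobs find the same stale lock at the same moment, so treat it as a starting point only.

```python
import os
import subprocess


def job_is_running(job_id):
    """Assumption: `qstat -j <id>` exits non-zero once the job is gone."""
    result = subprocess.run(
        ["qstat", "-j", str(job_id)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def try_acquire(lock_path, job_id, is_running=job_is_running):
    """Return True if this job now holds the lock, False otherwise."""
    for _ in range(2):  # the second pass only happens after a retry
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(lock_path) as f:
                    holder = f.read().strip()
            except FileNotFoundError:
                continue  # the holder released the lock in between; retry
            if holder and not is_running(holder):
                # The owning job is dead: clear the stale lock and retry.
                try:
                    os.unlink(lock_path)
                except FileNotFoundError:
                    pass
                continue
            return False  # held by a live job (or still being written)
        else:
            # Record the owner so other jobs can check that it is alive.
            with os.fdopen(fd, "w") as f:
                f.write(str(job_id))
            return True
    return False


def release(lock_path, job_id):
    """Remove the lock only if this job still owns it."""
    try:
        with open(lock_path) as f:
            if f.read().strip() == str(job_id):
                os.unlink(lock_path)
    except FileNotFoundError:
        pass
```

Inside a job you would call something like try_acquire("/shared/pipeline.lock", os.environ["JOB_ID"]) before the critical step and release(...) afterwards; the path here is a placeholder. Writing the job ID into the lock is what lets a later job tell a busy lock from one left behind by a job that died.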
