On Thu, Feb 25, 2016 at 05:09:41PM -0500, [email protected] wrote:
> So, the lockfile contains the jobID, or something similar?
> 
> Some jobs use a file as a flag (i.e., checking the existence or content), but
> we've largely avoided POSIX file locking (the mixture of NFS, GPFS, & CIFS
> here should work... but it gets complicated quickly).

The folks who have used this technique most successfully track job status
in a Postgres database, but having the job ID in a file would be OK too.
Note that if you run lots of jobs, older Grid Engine versions will wrap
job IDs around at 10 million. Our newer UGE clusters don't have this
limitation, but I'm unsure when this changed.
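
For the job-ID-in-a-file approach, a minimal sketch might look like the
following. The function names and the one-line "jobid timestamp" layout are
hypothetical, not something our labs use verbatim. The timestamp is stored
only because a bare job ID can be ambiguous on versions that wrap IDs around.
Writing to a temporary file and renaming it avoids readers seeing a
half-written flag on a local filesystem; how that behaves across NFS, GPFS,
or CIFS is not verified here.

```python
import os
import tempfile
import time


def write_job_flag(path, job_id, submitted_at=None):
    """Record job_id and a submission timestamp in the flag file at path."""
    if submitted_at is None:
        submitted_at = time.time()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{job_id} {submitted_at}\n")
    os.replace(tmp_path, path)


def read_job_flag(path):
    """Return (job_id, submitted_at), or None if the flag file is absent."""
    try:
        with open(path) as handle:
            job_id, submitted_at = handle.read().split()
    except FileNotFoundError:
        return None
    return int(job_id), float(submitted_at)
```

A dependent script could call read_job_flag() and compare both fields
before deciding whether the prerequisite it cares about is the one recorded.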
 
> Our better 'chained' jobs use the SGE "hold" feature, some are launched as
> array jobs, and some (the really ugly ones) loop within a shell script,
> checking for files that indicate that a prerequisite job finished,
> checking for errors in the prereq, or loop over 'qstat' checking if a
> specific jobid has completed.

Yep, we recommend folks use -hold_jid whenever possible. Some of our labs
also use DRMAA, but that obviously introduces even more complexity.
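
As a rough illustration of chaining with -hold_jid, here is a small sketch
that submits one job and makes a second wait on it. It assumes qsub is on the
PATH and supports -terse (which prints only the job ID); the helper names and
the script names in the usage comment are made up for the example.

```python
import subprocess


def build_qsub(script, hold_jids=()):
    """Build a qsub argument list, optionally holding on earlier job IDs."""
    cmd = ["qsub", "-terse"]
    if hold_jids:
        # -hold_jid accepts a comma-separated list of job IDs.
        cmd += ["-hold_jid", ",".join(str(j) for j in hold_jids)]
    cmd.append(script)
    return cmd


def submit(script, hold_jids=()):
    """Submit script via qsub and return its numeric job ID."""
    result = subprocess.run(
        build_qsub(script, hold_jids),
        check=True,
        capture_output=True,
        text=True,
    )
    # Array jobs print something like "123.1-10:1"; keep the leading job ID.
    return int(result.stdout.strip().split(".")[0])


# Usage (hypothetical script names):
#   prep_id = submit("prep.sh")
#   submit("analyze.sh", hold_jids=[prep_id])
```

The scheduler then releases the second job once the first one finishes, so
nothing has to loop over qstat or poll for flag files.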

-- 
-- Skylar Thompson ([email protected])
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine