I'm looking to add support for BLCR to our Grid Engine
configuration.  A quick Google search suggests that the scripts at
https://github.com/HPCKP/BLCR-GridEngine-Integration should be useful
for such an integration.  Not wishing to engage in cargo-cult sysadmin
work, I'm trying to understand what the scripts actually do.  It appears
that the ID to be fed to cr_checkpoint is supposed to be generated by
the following command:
pstree -p $pid | head -1 | perl -pe '$p="g\?time"; $p=cr_restart
if(/cr_restart\(\d+\)/);s/.*-$p\(\d+\)[-\+]+[^(]+\((\d+)\)/$1/g;'

As far as I can tell, this attempts to extract, from the first line of
the output of pstree -p $pid, the process ID of the first child of the
cr_restart command if one is present, or of time/gtime if not.
cr_restart should presumably be the ancestor of the useful parts of any
job that has been restored at least once, but I can't see any reason in
the scripts to expect that time will be the ancestor of all the useful
parts of a job prior to its first restart.  Feeding in a PID that
doesn't have time or cr_restart as a descendant on the first line just
produces the first line of pstree -p's output unchanged, which doesn't
look like it would be useful for feeding to cr_checkpoint.

William
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
