I'm looking to add support for BLCR to our Grid Engine configuration. A quick Google search suggests that the scripts at https://github.com/HPCKP/BLCR-GridEngine-Integration should be useful for such an integration. Not wishing to engage in cargo-cult sysadmin, I'm trying to understand what the scripts actually do. It appears that the ID to be fed to cr_checkpoint is supposed to be generated by the following command:

    pstree -p $pid | head -1 | perl -pe '$p="g\?time"; $p=cr_restart if(/cr_restart\(\d+\)/);s/.*-$p\(\d+\)[-\+]+[^(]+\((\d+)\)/$1/g;'

As far as I can tell, this attempts to extract, from the first line of the output of pstree -p $pid, the process ID of the first child of the cr_restart command if present, or of time/gtime if not. cr_restart should presumably be the ancestor of the useful parts of jobs that have been restored at least once, but I can't see any reason in the scripts to expect that time will be the ancestor of all the useful parts of a job prior to the first restart. Feeding it PIDs that don't have time or cr_restart as a descendant on the first line just produces the first line of pstree -p's output unchanged, which doesn't look like it would be useful for feeding to cr_checkpoint.

William