On 16.06.2011 at 15:03, baf035 wrote:

> We are using SoGE rel. 3910 for tests.
> Submitted jobs are correctly dispatched, but no information is stored in the
> spool directory <SPOOL_DIR>/qmaster/jobs.

Are you using classic spooling?

> The qmaster messages file contains entries about a missing file/folder at the
> time the job ends:
> ----------------
> 6/16/2011 10:06:30|schedu|sged2|E|can't find parallel task 50993.1 task
> past_usage for update in function pe_task_update_master_list_usage
> 06/16/2011 10:06:30|schedu|sged2|E|callback function for event "3941466.
> EVENT JOB 50993.1 task past_usage USAGE" failed
> 06/16/2011 10:07:10|worker|sged2|E|unlink(jobs/00/0005/0993/common) failed:
> No such file or directory
> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool file:
> jobs/00/0005/0993/common

The "common" is strange here. What I saw in the past was just a plain file like 0993 containing the binary information of the job.

> 06/16/2011 10:07:10|worker|sged2|E|can not remove file job spool directory:
> jobs/00/0005/0993
> ---------------
> qacct -j 50993 | grep end_time | uniq
> end_time Thu Jun 16 10:05:52 2011
> --------------
>
> A migration of the qmasterd leads to a total loss of job information. No
> jobs are shown in qstat after the migration.
>
> We have also encountered a case where files in <SPOOL_DIR>/qmaster/jobs were
> correctly created but disappeared during
> the migration, without a log entry in the messages file.

And is it in a shared space?

-- Reuti

> Please validate this behavior and thanks for a fix.
>
> baf035
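
For checking by hand whether a job's spool entry exists, here is a minimal Python sketch. It assumes the layout visible in the log above, where job 50993 appears as jobs/00/0005/0993, i.e. the job number zero-padded to ten digits and split 2/4/4. That pattern is inferred from this single log excerpt only, and the function name job_spool_path is made up for illustration; it is not part of any Grid Engine tool.

```python
import os


def job_spool_path(job_id: int, base: str = "jobs") -> str:
    """Guess the relative spool path for a job number.

    Assumption (inferred from the log in this thread only): the job
    number is zero-padded to 10 digits and split into 2/4/4 digits.
    """
    padded = f"{job_id:010d}"
    return "/".join([base, padded[:2], padded[2:6], padded[6:]])


def spool_entry_exists(qmaster_spool_dir: str, job_id: int) -> bool:
    """Check whether the guessed spool entry exists (file or directory)."""
    return os.path.exists(os.path.join(qmaster_spool_dir, job_spool_path(job_id)))


if __name__ == "__main__":
    print(job_spool_path(50993))
```

Calling spool_entry_exists with the qmaster spool directory (the <SPOOL_DIR>/qmaster from the report) right after submission and again after the migration would show at which point the entry goes missing.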
