Hi, we are running SGE 6.2u5. I am trying to restart jobs via checkpointing. On one of our clusters this works fine: a job is suspended via the suspend command, stopped, rescheduled in the queue, and restarted once resources are available.
With apparently the same SGE setup on a second cluster, my jobs are rescheduled but never get started. qstat -sj shows "cannot run on host XXX until clean up of an previous run has finished". If the job is deleted from the queue and restarted manually, it works perfectly. Is there a way to get a more detailed error message and to find out what exactly goes wrong with the cleanup?

Juryk