Hi all,

We are using Son of Grid Engine 8.1.7 with checkpointing and BLCR, and job migration itself works.
Even though everything works just fine, the SGE log shows the following message when a job is migrated:

2/05/2014 22:18:08|worker|hpc-s|W|job 3029146.1 failed on host compute-7-5.local migrating because: <unknown reason>

The email notification also shows "failed migrating because: <unknown reason>":

Job 3029146 (TEST) Migrates
 Exit Status = 0
 Signal = unknown signal
 User = me
 Queue [email protected]
 Host = compute-7-5.local
 Start Time = 12/05/2014 22:16:03
 End Time = 12/05/2014 22:18:07
 CPU = 00:00:00
 Max vmem = 12.078M
failed migrating because:
<unknown reason>

The checkpoint scripts all exit with 0, so I am not sure where this message is coming from. The checkpoint process works just fine, but it would be nice to correct this, as it alarms our users.

Thank you,
Joseph
