Hi All.

We are using Son of Grid Engine 8.1.7 with checkpointing and BLCR, and it all works great.

Even though everything works, the SGE log shows the following message
when a job is migrated:

2/05/2014 22:18:08|worker|hpc-s|W|job 3029146.1 failed on host compute-7-5.local migrating because: <unknown reason>
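
For anyone who wants to look for the same entries in their own qmaster messages file, here is a minimal Python sketch that splits a line like the one above on its pipe separators. The field names (timestamp, thread, host, level, message) and the helper names are my own labels, assumed from this one sample line rather than taken from the SGE documentation.

```python
import re

# Hypothetical labels for the pipe-separated columns seen in the sample line;
# these are not official SGE terminology.
FIELDS = ("timestamp", "thread", "host", "level", "message")


def parse_message_line(line):
    """Split one log line into a dict, or return None if it lacks five fields."""
    # Limit to 4 splits so any '|' inside the message text stays in the message.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return dict(zip(FIELDS, parts))


def is_migration_warning(entry):
    """True for warning-level entries that report a job 'failed ... migrating'."""
    pattern = r"failed on host \S+\s+migrating because:"
    return entry["level"] == "W" and re.search(pattern, entry["message"]) is not None


# The sample is the log line quoted above, joined onto one line.
sample = (
    "2/05/2014 22:18:08|worker|hpc-s|W|job 3029146.1 failed on host "
    "compute-7-5.local migrating because: <unknown reason>"
)
entry = parse_message_line(sample)
if entry is not None and is_migration_warning(entry):
    print(entry["timestamp"], entry["message"])
```

Running the same two functions over every line of the messages file would list all migrations that were logged this way.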

The email notification also shows "failed migrating because: <unknown reason>":

Job 3029146 (TEST) Migrates
 Exit Status      = 0
 Signal           = unknown signal
 User             = me
 Queue            [email protected]
 Host             = compute-7-5.local
 Start Time       = 12/05/2014 22:16:03
 End Time         = 12/05/2014 22:18:07
 CPU              = 00:00:00
 Max vmem         = 12.078M
failed migrating because:
<unknown reason>


The checkpoint scripts all exit with 0, so I am not sure where this is coming
from. The checkpoint process works just fine, but it would be nice to correct
this message, as it alarms our users.

Thank you,
Joseph

