On 07/10/2012 01:24 PM, Rayson Ho wrote:
On Tue, Jul 10, 2012 at 3:51 PM, Reuti<re...@staff.uni-marburg.de>  wrote:
Joseph,

To debug the difference in behavior:

1) make sure that you can always reproduce the job failure.

2) then submit jobs to a node that fails the job and to a node that does
not fail the job.

3) in the job script, add logic so that if something you expect to work
fails, the script runs sleep in a loop (an infinite sleep). That keeps
the failed job alive on the node so you can inspect it.

4) then log into the node, and in the job's spool directory, open
the "config" file and look for: shell_path, shell_start_mode,
use_login_shell. Compare the values for the 2 jobs and you should be
able to tell what's going on...
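
As a rough illustration of step 4, the comparison could be scripted along
the lines below. This is only a sketch: it assumes each job's "config" file
consists of plain key=value lines, and the function names (parse_config,
diff_settings, compare_files) are made up for this example and are not part
of Grid Engine.

```python
from pathlib import Path

# The three settings named in step 4.
KEYS = ("shell_path", "shell_start_mode", "use_login_shell")


def parse_config(text):
    """Parse key=value lines into a dict (assumed file format).

    Blank lines and lines without '=' are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(good, bad, keys=KEYS):
    """Return {key: (good_value, bad_value)} for every key that differs."""
    return {
        key: (good.get(key), bad.get(key))
        for key in keys
        if good.get(key) != bad.get(key)
    }


def compare_files(good_path, bad_path):
    """Read two config files and report how the selected settings differ."""
    good = parse_config(Path(good_path).read_text())
    bad = parse_config(Path(bad_path).read_text())
    return diff_settings(good, bad)
```

For instance, after copying the "config" file of the working job and of the
failing job to one machine under placeholder names of your choosing,
compare_files("good.config", "bad.config") returns a dict holding only the
settings whose values differ between the two jobs.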

This way, you can tell what's going on, instead of just working around things!

Rayson

Hi Rayson. Thank you for all of these good suggestions.

Let me say that I have been doing sysadmin work for 25 years, with the last 10
solely dedicated to cluster work.

If I may offer a suggestion in return, for you and others who
support this product: please do not assume we all know or understand Grid Engine to
the degree you do.

For example, on several occasions (not one, but several) I have not understood the
responses I received from you and others on this very helpful OGE forum, because
people here often assume we know this product as well as you do, and we do not.
Several times I have taken your forum responses to the Grid Engine
"gurus" (folks who have been using GE for years at this campus), and they
did not understand your answers either!

So if the folks who have been using GE for many years have a hard time
understanding your answers, I, who just started with OGE, will have
a much harder time.

I do understand the principles at play, however. I have been using Torque/Maui
for a long time and can navigate it very well. We now need to move to OGE,
so I am trying to make OGE work in a similar fashion to what we had. It is
the syntax and general concepts in OGE that I am struggling with.

So in short, thank you and the rest here for all of your good work and
helpfulness with OGE. Please keep in mind that we are here asking questions because
we don't understand OGE at the level and depth that you do. We
probably come across as dummies at times, so please don't
assume we get the entire picture. As I said, people here with much more
GE experience than I have don't get your responses either.

Best,
Joseph




