On 07/10/2012 01:24 PM, Rayson Ho wrote:
On Tue, Jul 10, 2012 at 3:51 PM, Reuti<re...@staff.uni-marburg.de> wrote:

Joseph,

To debug the difference in behavior:

1) make sure that you can always reproduce the job failure.
2) then submit jobs to a node that fails the job & to a node that does not fail the job.
3) in the job script, add logic so that if things that you expect would work fail to work, then run sleep in a loop (infinite sleep).
4) then log into the node, and in the job's spool directory, you go to the "config" file and look for: shell_path, shell_start_mode, use_login_shell - compare the values of the 2 jobs and you should be able to tell what's going on...

This way, you can tell what's going on, instead of just working around things!

Rayson
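The comparison in step 4 could be sketched roughly as below. This is only an illustration, not part of Grid Engine: it assumes the per-job "config" file holds one `key=value` pair per line, and the helper names (`parse_config`, `diff_shell_settings`) are made up for this example. You would copy the "config" file of the failing job and of the working job somewhere readable and pass both paths on the command line.

```python
import sys

# The three settings named in step 4 of the procedure above.
KEYS = ("shell_path", "shell_start_mode", "use_login_shell")


def parse_config(text):
    """Parse 'key=value' lines into a dict (assumed file format)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def diff_shell_settings(good_text, bad_text, keys=KEYS):
    """Return {key: (good_value, bad_value)} for every key that differs."""
    good = parse_config(good_text)
    bad = parse_config(bad_text)
    return {
        key: (good.get(key), bad.get(key))
        for key in keys
        if good.get(key) != bad.get(key)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_config.py <config-of-working-job> <config-of-failing-job>
    with open(sys.argv[1]) as good_file:
        good_text = good_file.read()
    with open(sys.argv[2]) as bad_file:
        bad_text = bad_file.read()
    differences = diff_shell_settings(good_text, bad_text)
    if not differences:
        print("The shell settings are identical in both config files.")
    for key, (good_value, bad_value) in differences.items():
        print(f"{key}: working job = {good_value!r}, failing job = {bad_value!r}")
```

A key that is missing from one file is reported with a value of `None`, so a setting that exists for only one of the two jobs still shows up as a difference.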
Hi Rayson.

Thank you for all of these good suggestions. Let me say that I have been doing sysadmin work for 25 years, with the last 10 dedicated solely to cluster work.

If I may offer a suggestion in return, to you and to others who support this product: please do not assume we all know or understand Grid Engine to the degree you do. For example, in several ( not one, but several ) of the responses I have received from you and others on this very helpful OGE forum, I did not understand what was being said, because people here often assume we know this product as well as you do, and we do not. On several occasions I have taken your forum responses to the Grid Engine "gurus" ( folks who have been using GE for years at this campus ) and they did not understand your answers either! So if the folks who have been using GE for many years have a hard time understanding your answers, I, who have just started with OGE, will have a much harder time.

I do understand the principles at play, however - I have been using Torque/Maui for a long time and I can navigate that very well. We now need to move to OGE, so I am trying to make OGE work in a similar fashion to what we had. It is the syntax and general concepts of OGE that I am struggling with.

So in short, thank you and the rest here for all of your good work and helpfulness with OGE, and please keep in mind that we are here asking questions because we don't understand OGE at the same level and depth that you do. We probably come across as dummies at times, so keep the above in mind and don't assume we get the entire picture because, as I said, people here with much more GE experience than I have don't get your responses either.

Best,
Joseph

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users