Thanks for that, Reuti - I found the trace file, and also discovered that administrative mail was not set up correctly for this new node.
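For the record, here's roughly where I looked (a sketch, not gospel: <execd_spool_dir> is a placeholder for whatever your cluster config reports, and the active_jobs directory only exists while the shepherd's files are kept around):

====================
# the execd spool location comes from the cluster configuration
qconf -sconf | grep execd_spool_dir
# while the job's shepherd files persist, its trace lives under active_jobs
cat <execd_spool_dir>/claw8/active_jobs/1438689.1/trace
====================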
But the base problem remains: the job continues to produce the following trace:

====================
12/14/2011 12:31:13| main|claw8|E|shepherd of job 1438689.1 exited with exit status = 7
12/14/2011 12:31:13| main|claw8|W|reaping job "1438689" ptf complains: Job does not exist
12/14/2011 12:31:13| main|claw8|E|abnormal termination of shepherd for job 1438689.1: "exit_status" file is empty
====================

with the job remaining in the Q. The job itself is a variant of Sun's 'simple.sh' with some testing of SGE variables for a new GPU Q:

====================
#!/bin/sh
#$ -S /bin/bash
#$ -q gpu

# print date and time
date
echo "job owner = " $USER
echo "job_id = " $JOB_ID
====================

This Q has a prolog that calls 'cuda_wrapper' <http://sourceforge.net/projects/cudawrapper/>:

Prolog: [<apppath>/cuda_wrapper/bin/wrapper_init -u $USER -k $JOB_ID -c /etc/cuda_wrapper.conf -g <apppath>/gpu.devices]

which executes without error from the command line, given a dummy $JOB_ID. But the trace and job status:

====================
error reason 1: shepherd exited with exit status 7: before prolog
             1: shepherd exited with exit status 7: before prolog
====================

imply that the shepherd exits before it even gets to the prolog. Do these errors imply a further setup error?

On Wednesday 14 December 2011 11:25:25 Reuti wrote:
> Hi,
>
> Am 14.12.2011 um 19:49 schrieb Harry Mangalam:
> > I'm trying to debug a failing qsub script (SGE 6.2) which is
> > stuck in the queue, returning:
> > ----------------------
> > error reason 1: shepherd exited with exit status 7: before prolog
> >              1: shepherd exited with exit status 7: before prolog
> >              1: shepherd exited with exit status 7: before prolog
> >              1: shepherd exited with exit status 7: before prolog
> >              1: shepherd exited with exit status 7: before prolog
> > ----------------------
> > Is there a way to extract more info from the shepherd via a
> > trace? I see repeated references to a 'shepherd trace' but
> > can't seem to see how to turn it on. Is this related to the
> > debug level? Does 'dl' have to be elevated in the launching
> > shell to turn it on?
>
> Can you check the trace file of the job in the spool directory of
> the node? Are administrative mails sent to anyone? The reason
> should be included there.
>
> -- Reuti

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
This signature has been OCCUPIED!
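P.S. For anyone hitting this in the archives later: my next step for squeezing more out of the shepherd is sketched below. Hedged accordingly - KEEP_ACTIVE is a documented execd_params setting, <execd_spool_dir> is a placeholder, and as far as I can tell 'dl' only raises the debug level of clients launched from that shell, so it won't touch the remote shepherd.

====================
# keep the shepherd's working directory after the job fails,
# so its trace/error files survive for post-mortem inspection
qconf -mconf claw8
#   add:  execd_params   KEEP_ACTIVE=true
# resubmit the job, then examine what the shepherd left behind
ls  <execd_spool_dir>/claw8/active_jobs/1438689.1/
cat <execd_spool_dir>/claw8/active_jobs/1438689.1/error
cat <execd_spool_dir>/claw8/active_jobs/1438689.1/trace
====================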