Thanks for that, Reuti. I found the trace file, and also discovered
that mail was not set up correctly on this new node.
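Reuti's pointer can be wrapped in a small helper. This is a minimal sketch that assumes the default execd spool layout ($SGE_ROOT/$SGE_CELL/spool/<host>/active_jobs/<job_id>.<task_id>/). The helper name show_shepherd_files and the override variable SGE_EXECD_SPOOL are hypothetical. If the directory is already gone by the time you look, setting KEEP_ACTIVE=TRUE in execd_params (qconf -mconf) tells execd to leave it in place for debugging.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS: the execd spool directory is the default
# $SGE_ROOT/$SGE_CELL/spool/<host> (compare execd_spool_dir in 'qconf -sconf');
# show_shepherd_files and SGE_EXECD_SPOOL are hypothetical names, not part of SGE.
show_shepherd_files() {
    job_id=$1
    task_id=${2:-1}                      # non-array jobs run as task 1
    host=${3:-$(hostname -s)}            # per-host spool dir, usually the short name
    spool=${SGE_EXECD_SPOOL:-$SGE_ROOT/${SGE_CELL:-default}/spool/$host}
    dir=$spool/active_jobs/$job_id.$task_id

    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # trace is the shepherd's own log; error and exit_status record how it ended
    for f in trace error exit_status; do
        echo "=== $f"
        cat "$dir/$f" 2>/dev/null || echo "(missing)"
    done
}
```

For this job on claw8 that would be `show_shepherd_files 1438689 1 claw8`.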

But the base problem remains: the job continues to produce the
following trace:
====================
12/14/2011 12:31:13|  main|claw8|E|shepherd of job 1438689.1 exited with exit status = 7
12/14/2011 12:31:13|  main|claw8|W|reaping job "1438689" ptf complains: Job does not exist
12/14/2011 12:31:13|  main|claw8|E|abnormal termination of shepherd for job 1438689.1: "exit_status" file is empty
====================

with the job remaining in the job Q.
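Those lines come from the execution daemon's messages file, whose fields are separated by '|'. A minimal sketch for pulling out everything logged about one job, assuming the default spool location (messages sits in the per-host spool directory) and using a hypothetical helper name, job_messages:

```shell
#!/bin/sh
# Sketch: list execd messages that mention one job id. ASSUMPTIONS: default
# spool location; job_messages is a hypothetical helper. Field layout as in
# the trace lines quoted above:
#   date time|module|host|level|message text
job_messages() {
    job_id=$1
    levels=${2:-EWI}    # levels to keep: E = error, W = warning, I = info
    file=${3:-$SGE_ROOT/${SGE_CELL:-default}/spool/$(hostname -s)/messages}
    awk -F'|' -v id="$job_id" -v levels="$levels" '
        index($0, id) && index(levels, $4) {
            msg = $5
            for (i = 6; i <= NF; i++) msg = msg "|" $i   # re-join any "|" inside the text
            print $1 " [" $4 "] " msg
        }' "$file"
}
```

Run on claw8, `job_messages 1438689 EW` should show the error and warning lines quoted above.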

The job itself is a variant of Sun's 'simple.sh' with some testing of SGE
variables for a new GPU Q.
====================
#!/bin/sh
# SGE directives: run the job under bash, in the 'gpu' queue
#$ -S /bin/bash
#$ -q gpu

# print date and time
date
echo "job owner = " $USER
echo "job_id = " $JOB_ID
====================
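A slightly broader version of the same test job could loop over more of the variables that qsub(1) documents for the job environment. This is only a sketch: the variable list is a selection, print_sge_env is a hypothetical helper, and "unset" simply flags anything not present when the job runs.

```shell
#!/bin/sh
#$ -S /bin/bash
#$ -q gpu

# Sketch: report several variables SGE sets for a job (see the environment
# variable list in qsub(1)); "unset" flags any that are missing.
print_sge_env() {
    for v in USER JOB_ID JOB_NAME QUEUE HOSTNAME NSLOTS TMPDIR SGE_O_WORKDIR SGE_TASK_ID; do
        eval "val=\${$v:-unset}"        # indirect lookup of the variable named by $v
        printf '%s = %s\n' "$v" "$val"
    done
}

date
print_sge_env
```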


This Q has a prolog that calls 'cuda_wrapper' 
<http://sourceforge.net/projects/cudawrapper/>:

Prolog: [<apppath>cuda_wrapper/bin/wrapper_init -u $USER -k $JOB_ID -c /etc/cuda_wrapper.conf -g <apppath>/gpu.devices]
(which executes without error from the command line, given a dummy
$JOB_ID)

but the trace and job status:
====================
error reason    1:          shepherd exited with exit status 7: before prolog
                1:          shepherd exited with exit status 7: before prolog
====================

imply that the shepherd exits before it even reaches the prolog. Do
these errors point to some further setup error?
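Since the status says the shepherd gives up "before prolog", one way to confirm that the prolog is never reached is to put a logging shim in front of wrapper_init. The sketch below is hypothetical throughout (the script name, PROLOG_LOG and the default log path are all made up): it appends a line per invocation and then execs the real command, so wrapper_init's exit status still goes back to the shepherd.

```shell
#!/bin/sh
# Hypothetical prolog shim: record that the prolog was reached, then exec the
# real prolog command passed as arguments. ASSUMPTIONS: PROLOG_LOG is a
# made-up variable, and the default log path must be writable by whichever
# user the prolog runs as.
log_and_run() {
    log=${PROLOG_LOG:-/tmp/prolog_trace.log}
    {
        echo "$(date '+%m/%d/%Y %H:%M:%S') host=$(hostname) user=$(id -un) JOB_ID=${JOB_ID:-unset}"
        echo "  cmd: $*"
    } >> "$log"
    # exec replaces this shell, so the real prolog's exit status is what
    # the shepherd sees
    exec "$@"
}

# only hand off when a command was given
[ "$#" -gt 0 ] && log_and_run "$@"
```

The queue's prolog (edited with qconf -mq gpu) would then become something like `/path/to/prolog_shim.sh <apppath>cuda_wrapper/bin/wrapper_init -u $USER -k $JOB_ID -c /etc/cuda_wrapper.conf -g <apppath>/gpu.devices`, where /path/to/prolog_shim.sh is a placeholder. If nothing is logged after the next failed attempt, that supports the reading that the failure happens before the prolog starts, not inside cuda_wrapper.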


On Wednesday 14 December 2011 11:25:25 Reuti wrote:
> Hi,
> 
> On 14.12.2011 at 19:49, Harry Mangalam wrote:
> > I'm trying to debug a failing qsub script (SGE 6.2) which is
> > stuck in the queue, returning:
> > ----------------------
> > error reason    1:          shepherd exited with exit status 7: before prolog
> >                 1:          shepherd exited with exit status 7: before prolog
> >                 1:          shepherd exited with exit status 7: before prolog
> >                 1:          shepherd exited with exit status 7: before prolog
> >                 1:          shepherd exited with exit status 7: before prolog
> > ----------------------
> > Is there a way to extract more info from the shepherd via a
> > trace? I see repeated references to a 'shepherd trace' but
> > can't see how to turn it on. Is this related to the
> > debug level? Does 'dl' have to be elevated in the launching shell
> > to turn it on?
> 
> can you check the trace file of the job in the job's spool
> directory? Are administrative mails sent to anyone? The trace
> should be included there.
> 
> -- Reuti

-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  Google Voice Multiplexer: (949) 478-4487 
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
This signature has been OCCUPIED!
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
