Hi Reuti,

> -----Original Message-----
> From: Reuti <[email protected]>
> Sent: Wednesday, March 28, 2018 11:52 PM
> To: Mun Johl <[email protected]>
> Cc: [email protected]
> Subject: Re: [gridengine users] Weird interaction between TCL and SGE
> 
> Hi,
> 
> Am 29.03.2018 um 06:40 schrieb Mun Johl:
> 
> > Hi,
> >
> > I'm updating some of our TCL scripts to submit jobs to grid and I've run
> > into an issue I can't explain.  In the TCL script I initialize a variable
> > thusly:
> >
> > set hostname ""
> >
> > and that gets passed into the qsub command as follows:
> >
> > set status [catch {exec qsub -b yes -cwd -sync n -V -q short.q -l vl $hostname tclsh $sim} result]
> 
> What does $hostname stand for? Do you want the job to start on a
> particular machine?

Precisely.  That is the end goal.

> > SGE v8.1.9 is not happy with that for some reason, and I get nasty emails
> > such as the following:
> >
> >
> > ===========================================================================
> > failed before prolog: shepherd exited with exit status 7: before prolog
> > Shepherd trace:
> > 03/28/2018 16:25:20 [495:19306]: shepherd called with uid = 0, euid = 495
> > 03/28/2018 16:25:20 [495:19306]: starting up 8.1.9
> > 03/28/2018 16:25:20 [495:19306]: setpgid(19306, 19306) returned 0
> > 03/28/2018 16:25:20 [495:19306]: do_core_binding: "binding" parameter not found in config file
> > 03/28/2018 16:25:20 [495:19306]: no prolog script to start
> >
> > Shepherd pe_hostfile:
> > sim.company.com 1 [email protected] UNDEFINED
> >
> > Furthermore, the spool/sim/messages has the following messages:
> >
> > 03/28/2018 21:15:36|  main|sim|E|shepherd of job 134.1 died through signal = 11
> > 03/28/2018 21:15:36|  main|sim|E|abnormal termination of shepherd for job 134.1: no "exit_status" file
> > 03/28/2018 21:15:36|  main|sim|E|can't open file active_jobs/134.1/error: No such file or directory
> >
> > If I initialize hostname to a space (" "), the job will run just fine.
> >
> > Moreover, if I assign a var to something like "-l vl" to reserve a
> > consumable resource, SGE complains thusly:
> >
> > qsub: invalid option argument "-l vl"
> 
> Looks like TCL passes the argument as one word, but SGE expects two.
> Separating them as "-l" and "vl" might work. On the command line the
> splitting is done by the shell, where $foo will be split but "$foo" won't,
> and the quoted form raises an error too. I have no clue whether TCL has an
> option to `eval` the expression to split the options.

I'm not sure I understand: It does appear as if the "-l" and "vl" are separated 
by a space character.  Is that not enough?
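
A minimal Tcl sketch of the point Reuti raises, assuming that exec hands each Tcl word to the program as exactly one argument. Under that assumption a variable holding "-l vl" reaches qsub as a single argument with an embedded space, and an empty string still reaches it as an (empty) argument. The {*} expansion syntax (Tcl 8.5 or later) splits a list-valued variable into separate words. The variable names are illustrative only, and list/llength stand in for exec so the words can be counted without calling qsub:

```tcl
# Illustrative value; "vl" is the consumable resource named in the thread.
set resource "-l vl"

# Plain substitution: the whole string stays ONE word ("-l vl").
set as_one_word [list qsub $resource]

# {*} expansion: the string is treated as a list and split into "-l" and "vl".
set as_two_words [list qsub {*}$resource]

puts [llength $as_one_word]    ;# 2 words: qsub, "-l vl"
puts [llength $as_two_words]   ;# 3 words: qsub, -l, vl

# An empty string is still a word; an empty list expanded with {*} vanishes.
set hostname ""
puts [llength [list qsub $hostname tclsh]]       ;# 3 words, one of them empty
puts [llength [list qsub {*}$hostname tclsh]]    ;# 2 words
```

If the assumption holds, writing the call as exec qsub ... {*}$resource {*}$hostname tclsh $sim would pass "-l" and "vl" separately and drop an empty hostname entirely. This is a sketch and has not been tested against SGE 8.1.9.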

Thanks,

-- 
Mun

> 
> -- Reuti
> 
> 
> > But if I "hard code" that option in the qsub command, the job will run
> > correctly.
> >
> > I've tried to scrub 'hostname' of non-printable characters, and to strip any
> > (unseen) white space, but nothing seems to work.  I don't know what gets
> > embedded into the TCL strings that SGE seems to dislike.  It doesn't help that
> > I'm relatively new to TCL.
> >
> > Any suggestions would be appreciated.
> >
> > Regards,
> >
> > --
> > Mun
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users


