Hi,

On 14.11.2014 at 00:34, Brian Small wrote:
> Hello all, this is my first time posting to this mailing list.
>
> About 1% or less of our qrsh grid jobs are failing in an unusual way.
>
> We are running Open Grid Scheduler 2011.11 on CentOS 6.5.
>
> The small percentage of failing qrsh jobs get a non-zero exit status back to
> the submit host (exit status 1), and display this message:

What do you start with `qrsh`: a binary or a script? This sounds as if the
script you presumably started wants to start another `qrsh` itself. If it's a
script, making its first line "#!/bin/sh -x" will list the executed commands.

-- Reuti

NB: A side effect of "-now n" is that the job will go to a queue whose "qtype"
is "BATCH", while "-now y" will route it to a queue whose "qtype" is
"INTERACTIVE" (the same applies when this option is used with `qsub`).


> Your "qrsh" request could not be scheduled, try again later.
>
> Note, we do include the “-now n” option on the command line.
>
> Also the qacct log shows the job as having completed successfully:
>
> qsub_time    Thu Nov 13 14:17:47 2014
> start_time   Thu Nov 13 14:21:13 2014
> end_time     Thu Nov 13 14:25:15 2014
> granted_pe   NONE
> slots        1
> failed       0
> exit_status  0
> ru_wallclock 242
> ru_utime     226.439
> ru_stime     5.383
>
> And reviewing the working directory, it does look like the job completed
> properly.
>
> I’m not sure how to take the next step in debugging this problem. Any advice?
>
> Brian Small
> Northwest Logic
> 1100 NW Compton Drive, Ste. 100
> Beaverton, OR 97006
> Desk - 503-533-5800 x-320
> Cell - 503-577-6869
> Fax: 503-533-5900
> E-mail - [email protected]
> Web - www.nwlogic.com
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
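For illustration, a minimal sketch of the "#!/bin/sh -x" tracing suggested above. The script name `job.sh` and its commands are hypothetical placeholders, and the script is run locally rather than through `qrsh` so the trace can be inspected without a cluster; submitted with e.g. `qrsh -now n ./job.sh`, the same trace lines would go to the job's stderr.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical job script: "-x" on the shebang line makes /bin/sh print
# each command, prefixed with "+", to stderr before executing it.
cat > job.sh <<'EOF'
#!/bin/sh -x
echo "step 1"
# A nested qrsh call made from here would also appear in the trace.
pwd
EOF
chmod +x job.sh

# On the cluster this would be submitted with e.g.: qrsh -now n ./job.sh
# Here it is run locally; the trace goes to job.trace, normal output to stdout.
./job.sh 2> job.trace
cat job.trace
```

Each traced command starts with "+", so if the script itself calls `qrsh` again, that call would show up as a "+ qrsh ..." line in the trace.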
