On 12.05.2014, at 20:12, Karun K wrote:
> The qsub man page says "If job scripts are available on the execution nodes,
> e.g. via NFS, binary submission can be the better choice".
This depends. In the case of a non-binary submission, the script is copied at submission time and this copy is executed later, i.e. you can even delete the job script afterwards, as our submission system does. Submitting a job script as binary might lead to the effect that you apply changes to the script and suddenly jobs submitted in the past fail due to an error in the script, as the current version on disk is the one that gets executed.

-- Reuti

> On Mon, May 12, 2014 at 11:01 AM, Karun K <[email protected]> wrote:
> It's writing output to the job submission directory (default behavior), which
> works for us.
> Regarding using -V for shell scripts, I need to consult my engineers about
> it. Other than exporting the current environment variables, is there any other
> difference?
>
> Thanks!
>
> On Sat, May 10, 2014 at 5:05 AM, Reuti <[email protected]> wrote:
> On 10.05.2014, at 01:18, Karun K wrote:
>
> > Here is the job script,
> >
> > qsub -N myusername$N -l h_vmem=5.0G ../job1.sh my1$N
> >
> > from .sge_request:
> > # default SGE options
> > -j y -cwd -b y
> > # -j y -cwd
> > # -cwd
>
> I don't see a -o option here - how does ../job1.sh decide where the output
> should go? And why are you submitting a script as binary?
>
> -- Reuti
>
> > On Fri, May 9, 2014 at 3:35 PM, Reuti <[email protected]> wrote:
> > On 10.05.2014, at 00:18, Karun K wrote:
> >
> > > Reuti,
> > >
> > > Some of them are array jobs; it looks like we have been using $task_id for
> > > array jobs.
> > > The issue we are seeing is with non-array jobs.
> > >
> > > Here is a snippet from one of the corrupted job output log files; the
> > > numbers in between the text lines are actually output from a different job.
> >
> > How exactly and where are you specifying this output path: command line or
> > inside the job script?
> >
> > What does the job script look like?
> >
> > -- Reuti
> >
> > > Processing Haplotype 7204 of 15166 ...
> > > Outputting Individual 450996750985279->450996750985279 ...
> > > Processing Haplotype 7205 of 15166 ...
> > > Processing Haplotype 7206 of 15166 ...
> > > Outputting Individual 632999004155376->632999004155376 ...
> > > Processing Haplotype 7207 of 15955 0.532 0.994 0.538 0.998
> > > 0.999 0.988 0.561 0.560 0.995 0.607 0.978 0.949 0.577
> > > 0.998 0.926 0.998
> > > 0.927 0.938 0.532 0.997 0.999 0.994 0.965 0.533
> > > 0.994 0.938 0.738 0.945 0.995 0.534 0.529 0.998 0.999
> > > 0.968 0.534 0.994
> > > 0.531 0.997 0.539 0.529 0.945 0.529 0.999 0.996
> > > 0.926 0.535 0.546 0.946 0.999 0.999 0.945 0.996 0.998
> > > 0.979 0.978 0.532
> > > 0.925 0.987 0.994 0.945 0.984 0.998 0.969 0.999
> > > 0.983 0.543 0.718 0.918 0.555 0.501 0.998 0.541 0.998
> > > 0.999 0.997 0.553
> > > 0.946 0.987 0.995 0.999 0.979 0.999 0.999 0.881
> > > 0.543 0.541 0.538 0.900 0.979 0.999 0.998 0.999 0.999
> > > 0.999 0.999 0.999
> > > 0.990 0.989 0.986 0.931 0.997 0.997 0.999 0.999
> > > 0.530 0.997 0.925 0.994 0.986 0.795 0.999 0.999 0.978
> > > 0.993 0.721 0.978
> > > 0.538 0.998 0.999 0.984 0.999 0.997 0.997 0.979
> > > 0.553 0.795 0.999 0.979 0.998 0.995 0.999 0.988 0.946
> > > 0.543 0.558 0.995
> > > 0.983 0.992 0.926 0.567 0.979 0.923 0.919 0.949
> > > 0.652 0.940 0.995 0.999 0.999 0.647 0.996 0.678 0.933
> > > 0.870 0.997 0.690
> > > 0.995 0.992 0.981 0.932 0.995 0.993 0.999 0.998 0.861
> > > 0.861 0.979 0.995 0.999 0.999 0.584 0.861 0.978 0.870
> > > 0.872 0.932
> > > 0.999 0.790 0.995 0.999 0.932 0.999 0.863 0. of 15166
> > > ...
> > > Processing Haplotype 8564 of 15166 ...
> > > Outputting Individual 770954964699120->770954964699120 ...
> > >
> > > On Fri, May 9, 2014 at 2:46 PM, Reuti <[email protected]> wrote:
> > > On 09.05.2014, at 23:29, Karun K wrote:
> > >
> > > > Thanks Reuti.
> > > >
> > > > But how come other log files are fine and we only see this behavior on a
> > > > few output logs, randomly?
> > >
> > > And all are array jobs?
> > >
> > > In case they just run one after the other, they will overwrite the old logfile.
> > >
> > > -- Reuti
> > >
> > > > Shouldn't it be consistent with all other output logs too?
> > > >
> > > > On Fri, May 9, 2014 at 2:17 PM, Reuti <[email protected]> wrote:
> > > > On 09.05.2014, at 23:04, Karun K wrote:
> > > >
> > > > > Yes, these are array jobs with the output path set to -cwd during job
> > > > > submission.
> > > >
> > > > Well, then you also have to use the $TASK_ID in the -o option to
> > > > distinguish between different tasks.
> > > >
> > > > -- Reuti
> > > >
> > > > > On Fri, May 9, 2014 at 12:20 PM, Reuti <[email protected]> wrote:
> > > > > On 09.05.2014, at 20:18, Karun K wrote:
> > > > >
> > > > > > Reuti,
> > > > > >
> > > > > > These are the job output logs, not /var/spool/sge/qmaster/messages.
> > > > > > These are in user job directories, named jobname.o$jobid.
> > > > >
> > > > > How exactly and where are you specifying this output path: command
> > > > > line or inside the job script?
> > > > >
> > > > > Are these array jobs?
> > > > >
> > > > > -- Reuti
> > > > >
> > > > > > On Fri, May 9, 2014 at 11:02 AM, Reuti <[email protected]> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > On 09.05.2014, at 19:43, Karun K wrote:
> > > > > >
> > > > > > > We are using OGS/GE 2011.11p1.
> > > > > > >
> > > > > > > We encountered log file corruption: in some GE log files there is
> > > > > > > output of other jobs written into them (only in very few log files).
> > > > > > > The filesystem is working fine; there is no corruption of data files,
> > > > > > > just of some GE log files, randomly.
> > > > > >
> > > > > > Which file do you refer to in detail - /var/spool/sge/qmaster/messages
> > > > > > and the like? Although it's best to have them local on each node, even
> > > > > > having them in an NFS location still means that only one process - the
> > > > > > sge_execd/sge_qmaster - will write to it.
> > > > > >
> > > > > > -- Reuti
> > > > > >
> > > > > > > Has anyone else seen this issue?
> > > > > > >
> > > > > > > Thanks!

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
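A rough sketch of the script-vs-binary difference described at the top of this thread, reusing the qsub line and .sge_request defaults quoted above (the -N/-l arguments come from the thread; everything else is illustrative only, not the poster's actual setup):

    # Script submission (-b n, qsub's built-in default): the script is copied
    # into the spool at submission time, so editing or deleting ../job1.sh
    # afterwards should not affect jobs that are already queued.
    qsub -b n -N myusername$N -l h_vmem=5.0G ../job1.sh my1$N

    # Binary submission (-b y, the default set in .sge_request here): only the
    # path ../job1.sh is recorded, so whatever version of the file exists when
    # the job starts on the execution node is what gets run.
    qsub -b y -N myusername$N -l h_vmem=5.0G ../job1.sh my1$N

Since command-line options take precedence over the .sge_request files, an explicit -b n per job should be enough to get the copy-at-submission behaviour without touching the cluster-wide defaults.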
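A minimal sketch of the array-job point about $TASK_ID (the task range 1-10 and the out/ directory are made up for illustration; only the pattern matters):

    # Without an explicit -o, each task should get its own default output
    # file, <jobname>.o<jobid>.<taskid>, so tasks do not collide.
    qsub -t 1-10 job1.sh

    # With an explicit -o that lacks $TASK_ID, all tasks write to the same
    # file and their output can end up mixed together.
    qsub -t 1-10 -o out/run.log job1.sh

    # Adding the pseudo variables $JOB_ID and $TASK_ID to the path (quoted so
    # the shell does not expand them at submission time) keeps the tasks apart.
    qsub -t 1-10 -o 'out/run.$JOB_ID.$TASK_ID.log' job1.sh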

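To track down where a particular job's output is actually being directed, something along these lines may help (the job id 123456 is only a placeholder):

    # While the job is pending or running, qstat -j lists the submitted
    # options, including the working directory and, if one was set, the
    # stdout path.
    qstat -j 123456

    # The defaults merged into every submission come from the request files,
    # so it is worth checking all three locations.
    cat $SGE_ROOT/$SGE_CELL/common/sge_request ~/.sge_request ./.sge_request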