On 31.01.2012 at 18:10, Lane Schwartz wrote:

> On Fri, Jan 27, 2012 at 3:42 PM, Rayson Ho <ray...@scalablelogic.com> wrote:
>> On Fri, Jan 27, 2012 at 2:49 PM, Lane Schwartz <dowob...@gmail.com> wrote:
>>> I have encountered a problem where sometimes (but not always) my jobs
>>> ignore the -cwd or -wd flags and run in my home directory instead of
>>> the specified working directory. I can run the same job multiple times
>>> launching from the same directory, and sometimes the job correctly
>>> runs from the current directory, and sometimes it runs from my home
>>> directory.
>>
>> I ran over 100 test jobs and all of them ran in the directory specified
>> by -cwd or -wd. How easy is it to reproduce the issue? Is the home
>> directory on NFS or some kind of network or cluster storage?
>
> The home directory is mounted via NFS. The correct directory (where
> the jobs are launched from) is also on NFS.

Do you use the automounter, or is it a hard mount? (/scratch4 is mounted on
all the exechosts, if I understand you correctly.)

-- Reuti

>> If Grid Engine cannot change the directory to the one specified by
>> -cwd/-wd, then it will simply turn the job into the "Eqw" state.
>
> When jobs run in the wrong directory, their job state remains in "r" state.
>
>
>> 2) So assume you have jobs that do not run in the "correct" directory, run:
>>
>> - qstat -j <job id>
>>
>> the "sge_o_workdir" should show you which directory SGE thinks
>> the job is supposed to run in.
>
> I ran a bunch of jobs. The job is a dummy script that simply runs
> `pwd` and echoes the value of $PWD, then checks the value of $PWD
> against the hardcoded directory where the job should be run. If $PWD
> fails to match the expected directory, the job echoes "Failure" then
> sleeps.
>
> For all of the jobs that printed "Failure", the log file shows that
> running 'pwd' returned my home directory instead of the correct
> directory. Likewise, $PWD reported my home directory.
>
> For those jobs that printed "Failure", when I run qstat -j <job id>
> the value of sge_o_workdir lists the directory where the job was
> launched (that is, the directory where the job should have been run).
>
>> - go into the $SGE_ROOT/$SGE_CELL/spool/<execution
>> host>/active_jobs/<job id.1> directory
>
> I ssh'd to the execution host for one of the jobs that reported
> "Failure" and went to the directory you specified above.
>
> The "environment" file lists the following:
> PWD=/scratch4/lane/2011-12-15_europarl
>
> That is where the job should be running, but when the job ran it
> printed out /home/lane as the value of $PWD.
>
> The "config" file lists the following:
> cwd=/scratch4/lane/2011-12-15_europarl.
>
> Again, this is the directory where the job should have run.
>
> Any ideas?
>
> Thanks,
> Lane
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
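
For anyone trying to reproduce this, the dummy job described in the quoted text (run `pwd`, echo $PWD, compare against a hardcoded directory, print "Failure" and sleep on a mismatch) could look roughly like the sketch below. This is not Lane's actual script, which was not posted. The function name `check_workdir` and the sleep duration in the usage comment are made up for illustration; the path is the one mentioned in the thread.

```shell
#!/bin/sh
# Hypothetical reconstruction of the dummy test job; not the original script.
# check_workdir EXPECTED_DIR
#   Prints what `pwd` and $PWD report, then "Success" or "Failure"
#   depending on whether the job is running in EXPECTED_DIR.
check_workdir() {
    expected="$1"
    actual="$(pwd)"
    echo "pwd reports: $actual"
    echo "PWD is:      $PWD"
    if [ "$actual" = "$expected" ]; then
        echo "Success"
        return 0
    fi
    echo "Failure"
    return 1
}

# Usage inside the job script (sleep length is an arbitrary choice, so that
# the job stays in the "r" state long enough to inspect it with qstat -j):
#   check_workdir /scratch4/lane/2011-12-15_europarl || sleep 600
```

Submitting the script repeatedly with -cwd from the expected directory and grepping the job output files for "Failure" should then show how often the job lands in the home directory instead.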