On 31.01.2012 at 18:10, Lane Schwartz wrote:

> On Fri, Jan 27, 2012 at 3:42 PM, Rayson Ho <ray...@scalablelogic.com> wrote:
>> On Fri, Jan 27, 2012 at 2:49 PM, Lane Schwartz <dowob...@gmail.com> wrote:
>>> I have encountered a problem where sometimes (but not always) my jobs
>>> ignore the -cwd or -wd flags and run in my home directory instead of
>>> the specified working directory. I can run the same job multiple times
>>> launching from the same directory, and sometimes the job correctly
>>> runs from the current directory, and sometimes it runs from my home
>>> directory.
>> 
>> I ran over 100 test jobs and all of them ran in the directory specified by
>> -cwd or -wd. How easy is it to reproduce the issue? Is the home
>> directory on NFS or some kind of network or cluster storage?
> 
> The home directory is mounted via NFS. The correct directory (where
> the jobs are launched from) is also on NFS.

Do you use an automounter, or is it a hard mount?

(/scratch4 is mounted on all the exechosts, if I understand you correctly.)

-- Reuti


>> If Grid Engine cannot change the directory to the one specified by
>> -cwd/-wd, then it will simply turn the job into the "Eqw" state.
> 
> When jobs run in the wrong directory, they remain in the "r" state.
> 
> 
>> 2) So, assuming you have jobs that do not run in the "correct" directory, run:
>> 
>> - qstat -j <job id>
>> 
>> the "sge_o_workdir" should show you which directory SGE thinks
>> the job is supposed to run in.
> 
> I ran a bunch of jobs. The job is a dummy script that simply runs
> `pwd` and echoes the value of $PWD, then checks the value of $PWD
> against the hardcoded directory where the job should be run. If $PWD
> fails to match the expected directory, the job echoes "Failure" then
> sleeps.
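
For anyone who wants to reproduce this, a dummy job along the lines described above could look roughly like the sketch below. It is not the original script from this thread. The names EXPECTED_DIR and SLEEP_SECONDS are made up for illustration, and the guard on JOB_ID (which Grid Engine sets inside a job) is only there so that sourcing the file interactively does not sleep.

```shell
#!/bin/sh
# Sketch of a dummy test job; not the original script from this thread.
# EXPECTED_DIR and SLEEP_SECONDS are hypothetical names chosen here.
EXPECTED_DIR="${EXPECTED_DIR:-/scratch4/lane/2011-12-15_europarl}"
SLEEP_SECONDS="${SLEEP_SECONDS:-600}"

# Print what pwd and $PWD report, then compare $PWD with the expected
# directory. Returns 0 on a match, 1 (after echoing "Failure") otherwise.
check_workdir() {
    expected="$1"
    echo "pwd:  $(pwd)"
    echo "PWD:  $PWD"
    if [ "$PWD" != "$expected" ]; then
        echo "Failure"
        return 1
    fi
    return 0
}

# Only act when running as a Grid Engine job (JOB_ID is set there), so
# that sourcing this file interactively does not sleep.
if [ -n "${JOB_ID:-}" ]; then
    # Sleep on failure so the job stays around long enough to inspect it.
    check_workdir "$EXPECTED_DIR" || sleep "$SLEEP_SECONDS"
fi
```

Submitting this repeatedly with -cwd from the expected directory and searching the output files for "Failure" should show how often the problem occurs.
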
> 
> For all of the jobs that printed "Failure", the log file shows that
> running 'pwd' returned my home directory instead of the correct
> directory. Likewise, $PWD reported my home directory.
> 
> For those jobs that printed "Failure", when I run qstat -j <job id>
> the value of sge_o_workdir lists the directory where the job was
> launched (that is, the directory where the job should have been run).
> 
>> - go into the $SGE_ROOT/$SGE_CELL/spool/<execution
>> host>/active_jobs/<job id.1> directory
> 
> I ssh'd to the execution host for one of the jobs that reported
> "Failure" and went to the directory you specified above.
> 
> The "environment" file lists the following:
> PWD=/scratch4/lane/2011-12-15_europarl
> 
> That is where the job should be running, but when the job ran it
> printed out /home/lane as the value of $PWD.
> 
> The "config" file lists the following:
> cwd=/scratch4/lane/2011-12-15_europarl
> 
> Again, this is the directory where the job should have run.
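
The manual check of the "environment" and "config" files could be scripted for several failing jobs. The following is a minimal sketch, assuming both files contain plain KEY=value lines as in the excerpts quoted above. The function names spool_value and compare_spool_dirs are made up here and are not part of Grid Engine.

```shell
#!/bin/sh
# Sketch: compare the working directory recorded in a job's spool files
# with the directory the job was expected to run in.
# Assumption: "environment" and "config" hold simple KEY=value lines.

# Print the value of KEY from a KEY=value file (first match only).
spool_value() {
    key="$1"
    file="$2"
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# job_dir is the active_jobs/<job id.1> directory on the execution host.
# Returns 0 only if both recorded directories equal the expected one.
compare_spool_dirs() {
    job_dir="$1"
    expected="$2"
    env_pwd="$(spool_value PWD "$job_dir/environment")"
    cfg_cwd="$(spool_value cwd "$job_dir/config")"
    echo "environment PWD: $env_pwd"
    echo "config cwd:      $cfg_cwd"
    [ "$env_pwd" = "$expected" ] && [ "$cfg_cwd" = "$expected" ]
}
```

On the execution host one would call compare_spool_dirs with the job's active_jobs directory and the expected path. If it succeeds while the job itself reports /home/lane, the wrong directory is not coming from what Grid Engine recorded for the job.
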
> 
> Any ideas?
> 
> Thanks,
> Lane
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

