> On 15.10.2015 at 14:33, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> 
> It is in state qw.
> 
> The home directory is mounted.
> 
> I ran the qalter command and it produced this output:
> instance "node" dropped because it is temporarily not available
> I checked the firewalls; all of them are disabled, and the daemons are 
> listening on the ports on the master and execution nodes.
> 
> I noticed that there is no directory in /opt/sge/default/spool/. Shouldn't a 
> directory with the name of the execution node be created in this path?

Yes.

Is the $SGE_ROOT shared too?

The location of the spool directory for the exechosts can be checked in
`qconf -sconf` ("execd_spool_dir").
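
A minimal sketch of that check, assuming `qconf` is in the PATH on the machine where you run it. The helper name `spool_dir_from_conf` is made up for illustration and is not part of SGE; it only picks the "execd_spool_dir" line out of the configuration listing.

```shell
# Hypothetical helper (not an SGE command): read `qconf -sconf` output on
# stdin and print the value of the execd_spool_dir entry, if present.
spool_dir_from_conf() {
    awk '$1 == "execd_spool_dir" { print $2; exit }'
}

# Usage on a live cluster (needs qconf, so left commented out here):
#   dir=$(qconf -sconf | spool_dir_from_conf)
#   ls -ld "$dir/node001"   # node001 is the exechost name from the qhost output
```

If the `ls` on the exechost fails, the spool directory for that host does not exist at the configured location, which matches the missing directory described above.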

-- Reuti


> 
> -- Shazly
> 
> On Thu, Oct 15, 2015 at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> Hi,
> 
> > On 15.10.2015 at 01:16, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> >
> > Hi there,
> >
> > I'm having a problem getting an execution host to work. The master node 
> > does not seem to detect the execution node; when I submit a job, it stalls 
> > in the queue.
> 
> Is it in state "qw" or "t"?
> 
> $ qalter -w v <job_id>
> 
> will check whether the job could be started in an empty cluster in the 
> current configuration.
> 
> Is the home directory shared in the cluster, so that the user's home 
> directory can be accessed on the nodes?
> 
> 
> > Both daemons are running on the master and the execution node. I added the 
> > execution node to the queue, made sure the ports are open, and can ssh 
> > without a password from/to both nodes
> 
> It's not necessary to have passphraseless SSH in the cluster. Even parallel 
> jobs can run without this setting. In fact, I allow SSH access to nodes only 
> for admin staff.
> 
> 
> > , sge_root and sge_cell are open to read and write. The strange thing is 
> > that when I change the ncpu of the execution node, the change is reflected 
> > when I use the qhost command on the master node.
> 
> You mean "num_proc"? This should be seen as a read-only value and it's 
> normally not necessary to adjust it. The slot count in the queues is 
> independent of this setting.
> 
> -- Reuti
> 
> 
> > This is the output of the qhost command (arch and mem are N/A although I 
> > set them in the node's values):
> >
> > HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > -------------------------------------------------------------------------------
> > global                  -               -     -       -       -       -       -
> > node001                 -               1     -       -       -       -       -
> > master                  linux-x64       1  0.01    3.7G  157.8M     0.0     0.0
> >
> >
> > Any suggestions on what might be wrong are really appreciated.
> >
> > Thanks.
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
> 
> 

