The spool directory is created when the execd starts. That is, in case of problems with this spool directory it can simply be removed, and it will be recreated on the next restart.
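A minimal sketch of that recovery, assuming the paths that come up later in this thread (SGE_ROOT=/opt/sge, cell "default") and using "node001" as a placeholder exechost name; `qconf -ke` and the `sgeexecd` startup script are the usual SGE tools for stopping and restarting the execd, but adjust everything for your own installation:

```shell
#!/bin/sh
# Sketch: clear a broken execd spool directory and let execd recreate it.
# SGE_ROOT, SGE_CELL and the host name are assumptions taken from this thread.
SGE_ROOT=${SGE_ROOT:-/opt/sge}
SGE_CELL=${SGE_CELL:-default}
EXEC_HOST=node001

clear_execd_spool() {
    qconf -ke "$EXEC_HOST"                          # shut down the execd on that host
    rm -rf "$SGE_ROOT/$SGE_CELL/spool/$EXEC_HOST"   # drop the problematic spool dir
    # execd must then be started again on the exechost itself; it recreates the dir:
    echo "now run on $EXEC_HOST: $SGE_ROOT/$SGE_CELL/common/sgeexecd start"
}

# Only act when the SGE client tools are actually installed on this machine:
if command -v qconf >/dev/null 2>&1; then
    clear_execd_spool
else
    echo "qconf not found; nothing done"
fi
```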
Is there any file in /tmp on the exechost having "execd" in its name? If execd runs into problems during startup, it's the only output you may get.

-- Reuti

> Am 15.10.2015 um 14:52 schrieb Hatem Elshazly <hmelsha...@gmail.com>:
> 
> Yes, it is.
> 
> Why do you think that the exec dirs weren't created? All the permissions
> and ownerships are granted.
> I'm using the inst_sge_sc script to do the installation on EC2 instances
> rather than apt-get gridengine-exec, because I want to do the installation
> in noninteractive mode, but it seems there is something I'm missing.
> 
> On Thu, Oct 15, 2015 at 2:38 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
> > > Am 15.10.2015 um 14:33 schrieb Hatem Elshazly <hmelsha...@gmail.com>:
> > > 
> > > It is in state qw.
> > > 
> > > The home directory is mounted.
> > > 
> > > I used the qalter command and it produced this output:
> > > 
> > >     instance "node" dropped because it is temporarily not available
> > > 
> > > I checked the firewalls and all of them are disabled, and the daemons
> > > are listening on their ports on the master and execution nodes.
> > > 
> > > I noticed that there is no directory in /opt/sge/default/spool/.
> > > Shouldn't a directory with the name of the execution node be created
> > > in this path?
> > 
> > Yes.
> > 
> > Is the $SGE_ROOT shared too?
> > 
> > The location of the spool directory for the exechosts can be checked in
> > `qconf -sconf` ("execd_spool_dir").
> > 
> > -- Reuti
> > 
> > > -- Shazly
> > > 
> > > On Thu, Oct 15, 2015 at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > > Am 15.10.2015 um 01:16 schrieb Hatem Elshazly <hmelsha...@gmail.com>:
> > > > > 
> > > > > Hi there,
> > > > > 
> > > > > I'm having a problem getting an execution host to work. The master
> > > > > node doesn't seem to sense the execution node; when I submit a job,
> > > > > it stalls in the queue.
> > > > 
> > > > Is it in state "qw" or "t"?
> > > > 
> > > >     $ qalter -w v <job_id>
> > > > 
> > > > will check whether the job could be started in an empty cluster in
> > > > the current configuration.
> > > > The home directory is shared in the cluster, so that the user's
> > > > home directory can be accessed?
> > > > 
> > > > > Both daemons are running on the master and the execution node. I
> > > > > added the execution node to the queue and made sure the ports are
> > > > > open and that I can ssh without a password from/to both nodes,
> > > > 
> > > > It's not necessary to have passphraseless SSH in the cluster. Even
> > > > parallel jobs can run without this setting. In fact, I allow SSH
> > > > access to nodes only for admin staff.
> > > > 
> > > > > and sge_root and sge_cell are open for read and write. The strange
> > > > > thing is that when I change the ncpu of the execution node, it gets
> > > > > reflected when I use the qhost command on the master node.
> > > > 
> > > > You mean "num_proc"? This should be seen as a read-only value and
> > > > it's normally not necessary to adjust it. The slot count in the
> > > > queues is independent of this setting.
> > > > 
> > > > -- Reuti
> > > > 
> > > > > This is the output of the qhost command (Arch and mem are NA,
> > > > > although I set them in the node's values):
> > > > > 
> > > > > HOSTNAME   ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > > > > ---------------------------------------------------------------
> > > > > global     -             -     -       -       -       -       -
> > > > > node001    -             1     -       -       -       -       -
> > > > > master     linux-x64     1  0.01    3.7G  157.8M     0.0     0.0
> > > > > 
> > > > > Any suggestions on what might be wrong are really appreciated.
> > > > > 
> > > > > Thanks.

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
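For reference, the individual checks suggested over the course of this thread can be collected into one script. The job id and exechost name below are placeholders, and the commands are the standard SGE client tools mentioned above (`qalter -w v`, `qconf -sconf`, `qhost`, plus the look into /tmp for execd startup output):

```shell
#!/bin/sh
# Sketch: the diagnostic checks suggested in this thread, in one place.
# JOB_ID and the exechost name are placeholders; adjust for your cluster.
JOB_ID=42

run_sge_checks() {
    qalter -w v "$JOB_ID"                    # would the job start in an empty cluster?
    qconf -sconf | grep execd_spool_dir      # where do exechosts keep their spool dirs?
    qhost                                    # does the exechost report arch/load/mem?
    ls /tmp/*execd* 2>/dev/null \
        || echo "no execd files in /tmp"     # startup errors from a failing execd, if any
}

# Only run when the SGE client tools are installed on this machine:
if command -v qalter >/dev/null 2>&1; then
    run_sge_checks
else
    echo "SGE client tools not found; nothing checked"
fi
```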