It is in state qw, and the home directory is mounted.
I used the qalter command and it produces this output:

    instance "node" dropped because it is temporarily not available

I checked the firewalls and they are all down, and the daemons are listening on their ports on the master and execution nodes.

I noticed that there is no directory in /opt/sge/default/spool/ -- shouldn't a directory with the name of the execution node be created in this path?

-- Shazly

On Thu, Oct 15, 2015 at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> Hi,
>
> > On 15.10.2015 at 01:16, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> >
> > Hi there,
> >
> > I'm having a problem getting an execution host to work. The master node seems unable to sense the execution node; when I submit a job it stalls in the queue.
>
> Is it in state "qw" or "t"?
>
>     $ qalter -w v <job_id>
>
> will check whether the job could be started in an empty cluster in the current configuration.
>
> Is the home directory shared in the cluster, so that the user's home directory can be accessed?
>
> > Both daemons are running on the master and execution node. I added the execution node to the queue, made sure the ports are open, and I can ssh without a password from/to both nodes.
>
> It's not necessary to have passphraseless SSH in the cluster. Even parallel jobs can run without this setting. In fact, I allow SSH access to nodes only for admin staff.
>
> > sge_root and sge_cell are open for reading and writing. The strange thing is that when I change the ncpu of the execution node it gets reflected when I use the qhost command on the master node.
>
> You mean "num_proc"? This should be seen as a read-only value, and it's normally not necessary to adjust it. The slot count in the queues is independent of this setting.
>
> -- Reuti
>
> > This is the output of the qhost command (ARCH and mem are NA although I set them in the node's values):
> >
> > HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > -------------------------------------------------------------------------------
> > global                  -               -     -       -       -       -       -
> > node001                 -               1     -       -       -       -       -
> > master                  linux-x64       1  0.01    3.7G  157.8M     0.0     0.0
> >
> > Any suggestions on what might be wrong are really appreciated.
> >
> > Thanks.
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
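[Editor's note: the question above asks whether a per-host directory should exist under the spool path. With classic spooling it normally should; a minimal sketch of the check, assuming the /opt/sge root and "default" cell mentioned in the mail (both are assumptions, adjust to your site):

```shell
# Hypothetical diagnostic sketch: check whether the execution host's
# spool directory exists. SGE_ROOT/SGE_CELL defaults are taken from the
# path quoted in the mail and may differ on your cluster.
SGE_ROOT=${SGE_ROOT:-/opt/sge}
SGE_CELL=${SGE_CELL:-default}

check_spool() {
    # Prints whether the per-host spool directory exists; a missing
    # directory often means sge_execd never started cleanly on that host.
    local host=$1
    if [ -d "$SGE_ROOT/$SGE_CELL/spool/$host" ]; then
        echo "$host: spool directory present"
    else
        echo "$host: spool directory missing"
    fi
}

check_spool node001
```

If the directory is missing, checking the execd startup messages on the node itself is usually the next step.]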
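[Editor's note: in the qhost output quoted above, hosts whose ARCH column shows "-" have never delivered a load report to the qmaster, which matches an execd that is not communicating. A small sketch that filters such hosts; it uses a captured copy of the sample output, but on a live cluster you would pipe `qhost` itself into the awk filter:

```shell
# Sample of the qhost output from the thread (captured here so the
# filter can be demonstrated without a running cluster).
sample_qhost() {
cat <<'EOF'
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node001                 -               1     -       -       -       -       -
master                  linux-x64       1  0.01    3.7G  157.8M     0.0     0.0
EOF
}

# Skip the header and separator lines and the pseudo-host "global";
# print hosts whose ARCH column is "-" (no load report received).
unreporting_hosts() {
    sample_qhost | awk 'NR>2 && $1!="global" && $2=="-" {print $1}'
}

unreporting_hosts
```

Here it flags node001, consistent with the missing spool directory described above.]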