Yes, it is.

Why do you think the exec dirs weren't created? All the permissions
and ownerships are granted.
I'm using this script, inst_sge_sc, to perform the installation on EC2
instances instead of `apt-get install gridengine-exec`, because I want to do
the installation in non-interactive mode, but it seems there is something
I'm missing.
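
For context, the non-interactive route I'm attempting follows the stock
inst_sge auto-install mode, roughly like this (a sketch only; the template
path and flags are the ones from a standard SGE tree and may not match my
script exactly):

$ cd $SGE_ROOT
$ cp util/install_modules/inst_template.conf my_execd.conf
# edit my_execd.conf: SGE_ROOT, SGE_CELL, EXEC_HOST_LIST, EXECD_SPOOL_DIR, ...
$ ./inst_sge -x -auto ./my_execd.conf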

On Thu, Oct 15, 2015 at 2:38 PM, Reuti <re...@staff.uni-marburg.de> wrote:

>
> > On 15.10.2015, at 14:33, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> >
> > It is in state qw.
> >
> > home directory is mounted.
> >
> > I used the qalter command and it produces this output:
> > instance "node" dropped because it is temporarily not available
> > I checked the firewalls and all of them are disabled, and the daemons are
> > listening on their ports on the master and execution nodes.
> >
> > I noticed that there is no directory in /opt/sge/default/spool/.
> > Shouldn't a directory with the name of the execution node be created in
> > this path?
>
> Yes.
>
> Is the $SGE_ROOT shared too?
>
> The location of the spool directory for the exechosts can be checked in
> `qconf -sconf` ("execd_spool_dir").
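>
> For example (output abbreviated; /opt/sge/default/spool is only the
> common default, not necessarily your value):
>
> $ qconf -sconf | grep execd_spool_dir
> execd_spool_dir              /opt/sge/default/spool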
>
> -- Reuti
>
>
> >
> > -- Shazly
> >
> > On Thu, Oct 15, 2015 at 11:45 AM, Reuti <re...@staff.uni-marburg.de>
> > wrote:
> > Hi,
> >
> > > On 15.10.2015, at 01:16, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > >
> > > Hi there,
> > >
> > > I'm having a problem getting an execution host to work. The master
> > > node doesn't seem to detect the execution node; when I submit a job it
> > > stalls in the queue.
> >
> > Is it in state "qw" or "t"?
> >
> > $ qalter -w v <job_id>
> >
> > will check whether the job could be started in an empty cluster in the
> > current configuration.
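> >
> > A related check (assuming "schedd_job_info" is enabled in the scheduler
> > configuration, see `qconf -ssconf`) is:
> >
> > $ qstat -j <job_id>
> >
> > whose "scheduling info" section lists why a pending job cannot be
> > dispatched.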
> >
> > Is the home directory shared in the cluster, so that the user's home
> > directory can be accessed?
> >
> >
> > > Both daemons are running on the master and execution nodes. I added the
> > > execution node to the queue, made sure the ports are open, and can ssh
> > > without a password from/to both nodes
> >
> > It's not necessary to have passphraseless SSH in the cluster. Even
> > parallel jobs can run without this setting. In fact, I allow SSH access to
> > nodes only for admin staff.
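> >
> > If you want to verify that the daemons can reach each other, qping is
> > an option (6444/6445 are only the usual default ports; check your own
> > configuration):
> >
> > $ qping master 6444 qmaster 1
> > $ qping node001 6445 execd 1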
> >
> >
> > > , and sge_root and sge_cell are open for read and write. The strange
> > > thing is that when I change the ncpu of the execution node, it gets
> > > reflected when I use the qhost command on the master node.
> >
> > You mean "num_proc"? This should be seen as a read-only value and it's
> > normally not necessary to adjust it. The slot count in the queues is
> > independent of this setting.
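> >
> > Slots are set per queue; for instance (the queue and host names here
> > are only placeholders):
> >
> > $ qconf -mattr queue slots 4 all.q@node001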
> >
> > -- Reuti
> >
> >
> > > This is the output of the qhost command (ARCH and memory are NA
> > > although I set them in the node's values):
> > >
> > > HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > > -------------------------------------------------------------------------------
> > > global                  -               -     -       -       -       -       -
> > > node001                 -               1     -       -       -       -       -
> > > master                  linux-x64       1  0.01    3.7G  157.8M     0.0     0.0
> > >
> > >
> > > Any suggestions on what might be wrong are really appreciated.
> > >
> > > Thanks.