Yup, it finally worked. Thanks very much for your help!
On Thu, Oct 15, 2015 at 3:48 PM, Reuti <re...@staff.uni-marburg.de> wrote:

> They must run as root to allow switching to any user to run a job.
>
> reuti@node:~> ps -eo user,ruser,command | grep sge
> sgeadmin root /usr/sge/bin/lx24-amd64/sge_execd
>
> > On 15.10.2015 at 15:43, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> >
> > No, the daemon process is owned by the user:
> > ehpcuser 5842 0.0 0.0 61376 1748 ? Sl 13:22 0:00 /opt/sge6/bin/linux-x64/sge_execd
> >
> > On Thu, Oct 15, 2015 at 3:32 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> >
> > > > On 15.10.2015 at 15:21, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > >
> > > > Exactly!
> > > > I found this message in an execd log file:
> > > > 10/15/2015 12:28:02| main|ip-172-31-49-241|E|getting configuration: denied: request for user "ehpcuser" does not match credentials for connection <ip-172-31-49-241.ec2.
> > > >
> > > > Does this mean that this user should get sudo privileges?
> > >
> > > No. Do you start the daemons as root user?
> > >
> > > -- Reuti
> > >
> > > > On Thu, Oct 15, 2015 at 2:56 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > >
> > > > > The spool directory is created when the execd starts, i.e. it can also be removed in case of problems in this spool directory, and with the next restart it is recreated.
> > > > >
> > > > > Is there any file in /tmp on the exechost having "execd" in its name? If execd runs into problems during startup, that is the only output you may get.
> > > > >
> > > > > -- Reuti
> > > > >
> > > > > > On 15.10.2015 at 14:52, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > > > >
> > > > > > Yes, it is.
> > > > > >
> > > > > > Why do you think that the exec dirs weren't created? All the permissions and ownerships are granted.
> > > > > > I'm using the script inst_sge_sc to do the installation on EC2 instances, rather than apt-get gridengine-exec, because I want to install in noninteractive mode, but it seems there is something I'm dropping.
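[Editor's note] Reuti's point above can be turned into a one-liner check; a minimal sketch, using the ps line quoted in this thread as canned sample data (on a live node you would feed the same awk filter from `ps -eo user,ruser,command` instead):

```shell
# Canned sample: the ps line Reuti posted above (user, real user, command).
ps_sample='sgeadmin root /usr/sge/bin/lx24-amd64/sge_execd'

# Extract the real user (second column) of the sge_execd process.
ruser=$(printf '%s\n' "$ps_sample" | awk '/sge_execd/ {print $2}')

if [ "$ruser" = "root" ]; then
    echo "sge_execd was started by root: OK"        # prints this for the sample
else
    echo "sge_execd was started by '$ruser' -- restart it as root"
fi
```

If the second column shows an unprivileged user, as in Hatem's output, the daemon cannot switch identities to run jobs as their submitting users.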
> > > > > > On Thu, Oct 15, 2015 at 2:38 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > > > >
> > > > > > > > On 15.10.2015 at 14:33, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > It is in state qw.
> > > > > > > >
> > > > > > > > The home directory is mounted.
> > > > > > > >
> > > > > > > > I used the qalter command and it produces this output:
> > > > > > > > instance "node" dropped because it is temporarily not available
> > > > > > > > I checked the firewalls and all of them are disabled, and the daemons are listening on the ports on the master and execution nodes.
> > > > > > > >
> > > > > > > > I noticed that there is no directory in /opt/sge/default/spool/. Shouldn't a directory with the name of the execution node be created in this path?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > Is the $SGE_ROOT shared too?
> > > > > > >
> > > > > > > The location of the spool directory for the exechosts can be checked in `qconf -sconf` ("execd_spool_dir").
> > > > > > >
> > > > > > > -- Reuti
> > > > > > >
> > > > > > > > -- Shazly
> > > > > > > >
> > > > > > > > On Thu, Oct 15, 2015 at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > On 15.10.2015 at 01:16, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi there,
> > > > > > > > > >
> > > > > > > > > > I'm having a problem getting an execution host to work. The master node seems unable to see the execution node: when I submit a job, it stalls in the queue.
> > > > > > > > >
> > > > > > > > > Is it in state "qw" or "t"?
> > > > > > > > >
> > > > > > > > >     $ qalter -w v <job_id>
> > > > > > > > >
> > > > > > > > > will check whether the job could be started in an empty cluster in the current configuration.
> > > > > > > > >
> > > > > > > > > The home directory is shared in the cluster, so that the user's home directory can be accessed?
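[Editor's note] The missing-spool-directory symptom discussed above can be checked directly; a minimal sketch, where the default paths and the host name `node001` are assumptions to be replaced with the "execd_spool_dir" value reported by `qconf -sconf`:

```shell
#!/bin/sh
# Sketch: verify that the exechost's spool directory was created by execd.
# SGE_ROOT/SGE_CELL defaults and EXEC_HOST below are assumptions; adjust
# them to match your installation and `qconf -sconf` output.
SGE_ROOT=${SGE_ROOT:-/opt/sge}
SGE_CELL=${SGE_CELL:-default}
EXEC_HOST=node001

spool_dir="$SGE_ROOT/$SGE_CELL/spool/$EXEC_HOST"

if [ -d "$spool_dir" ]; then
    echo "spool directory present: $spool_dir"
else
    echo "spool directory missing: $spool_dir (execd probably failed to start)"
fi
```

Since the directory is recreated on every execd start, a missing directory points at a startup failure rather than a one-off installation problem.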
> > > > > > > > > > Both daemons are running on master and execution node. I added the execution node to the queue and made sure the ports are open and that I can ssh without a password from/to both nodes
> > > > > > > > >
> > > > > > > > > It's not necessary to have passphraseless SSH in the cluster. Even parallel jobs can run without this setting. In fact, I allow SSH access to nodes only for admin staff.
> > > > > > > > >
> > > > > > > > > > , and sge_root and sge_cell are open to read and write. The strange thing is that when I change the ncpu of the execution node, it gets reflected when I use the qhost command on the master node.
> > > > > > > > >
> > > > > > > > > You mean "num_proc"? This should be seen as a read-only value and it's normally not necessary to adjust it. The slot count in the queues is independent of this setting.
> > > > > > > > >
> > > > > > > > > -- Reuti
> > > > > > > > >
> > > > > > > > > > This is the output of the qhost command (ARCH and mem are NA although I set them in the node's values):
> > > > > > > > > >
> > > > > > > > > > HOSTNAME   ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > > > > > > > > > -------------------------------------------------------------------
> > > > > > > > > > global     -             -     -       -       -      -      -
> > > > > > > > > > node001    -             1     -       -       -      -      -
> > > > > > > > > > master     linux-x64     1  0.01    3.7G  157.8M    0.0    0.0
> > > > > > > > > >
> > > > > > > > > > Any suggestions on what might be wrong are really appreciated.
> > > > > > > > > >
> > > > > > > > > > Thanks.
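[Editor's note] The qhost listing itself makes the failure visible: only hosts whose execd has reported in show an architecture and load. A minimal sketch that pulls the silent hosts out of such output, using the table quoted above as canned data (on the master you would pipe the data rows of `qhost` itself through the same filter, skipping the two header lines):

```shell
#!/bin/sh
# Canned sample: the qhost data rows quoted in this thread
# (header lines omitted for brevity).
qhost_out='global  -          -  -     -     -       -    -
node001 -          1  -     -     -       -    -
master  linux-x64  1  0.01  3.7G  157.8M  0.0  0.0'

# "global" is a pseudo host and always shows "-"; any other host with
# "-" in the ARCH column has an execd that never reported in.
silent=$(printf '%s\n' "$qhost_out" | awk '$1 != "global" && $2 == "-" {print $1}')

echo "hosts not reporting: $silent"   # prints: hosts not reporting: node001
```

Here node001 is flagged, matching the symptom in the thread: the host is registered with the qmaster but its execd has never delivered a load report.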
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users