Yup, it finally worked. Thanks very much for your help!
On Thu, Oct 15, 2015 at 3:48 PM, Reuti <re...@staff.uni-marburg.de> wrote:

> They must run as root to allow switching to any user to run a job.
>
> reuti@node:~> ps -eo user,ruser,command | grep sge
> sgeadmin root /usr/sge/bin/lx24-amd64/sge_execd
>
> > On 15.10.2015 at 15:43, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> >
> > No, the daemon process is owned by the user:
> > ehpcuser 5842 0.0 0.0 61376 1748 ? Sl 13:22 0:00 /opt/sge6/bin/linux-x64/sge_execd
> >
> > On Thu, Oct 15, 2015 at 3:32 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> >
> > > > On 15.10.2015 at 15:21, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > >
> > > > Exactly!
> > > > I found this message in an execd log file:
> > > > 10/15/2015 12:28:02| main|ip-172-31-49-241|E|getting configuration: denied: request for user "ehpcuser" does not match credentials for connection <ip-172-31-49-241.ec2.
> > > >
> > > > Does this mean that this user should get sudo privileges?
> > >
> > > No. Do you start the daemons as root user?
> > >
> > > -- Reuti
> > >
> > > > On Thu, Oct 15, 2015 at 2:56 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > >
> > > > > The spool directory is created when the execd starts, i.e. it can also be removed in case of problems in this spool directory, and with the next restart it is recreated.
> > > > >
> > > > > Is there any file in /tmp on the exechost having "execd" in its name? If execd runs into problems during startup, that is the only output you may get.
> > > > >
> > > > > -- Reuti
> > > > >
> > > > > > On 15.10.2015 at 14:52, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > > > >
> > > > > > Yes, it is.
> > > > > >
> > > > > > Why do you think that the exec dirs weren't created? All the permissions and ownerships are granted.
> > > > > > I'm using the script inst_sge_sc to do the installation on EC2 instances, rather than apt-get gridengine-exec, because I want to install in noninteractive mode, but it seems there is something I'm dropping.
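[Editor's note] Reuti's point above can be turned into a one-liner check; a minimal sketch, using the ps line quoted in this thread as canned sample data (on a live node you would feed the same awk filter from `ps -eo user,ruser,command` instead):

```shell
# Canned sample: the ps line Reuti posted above (user, real user, command).
ps_sample='sgeadmin root /usr/sge/bin/lx24-amd64/sge_execd'

# Extract the real user (second column) of the sge_execd process.
ruser=$(printf '%s\n' "$ps_sample" | awk '/sge_execd/ {print $2}')

if [ "$ruser" = "root" ]; then
    echo "sge_execd was started by root: OK"        # prints this for the sample
else
    echo "sge_execd was started by '$ruser' -- restart it as root"
fi
```

If the second column shows an unprivileged user, as in Hatem's output, the daemon cannot switch identities to run jobs as their submitting users.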
> > > > > > On Thu, Oct 15, 2015 at 2:38 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > > > >
> > > > > > > > On 15.10.2015 at 14:33, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > It is in state qw.
> > > > > > > >
> > > > > > > > The home directory is mounted.
> > > > > > > >
> > > > > > > > I used the qalter command and it produces this output:
> > > > > > > > instance "node" dropped because it is temporarily not available
> > > > > > > > I checked the firewalls and all of them are disabled, and the daemons are listening on the ports on the master and execution nodes.
> > > > > > > >
> > > > > > > > I noticed that there is no directory in /opt/sge/default/spool/. Shouldn't a directory with the name of the execution node be created in this path?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > Is the $SGE_ROOT shared too?
> > > > > > >
> > > > > > > The location of the spool directory for the exechosts can be checked in `qconf -sconf` ("execd_spool_dir").
> > > > > > >
> > > > > > > -- Reuti
> > > > > > >
> > > > > > > > -- Shazly
> > > > > > > >
> > > > > > > > On Thu, Oct 15, 2015 at 11:45 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > On 15.10.2015 at 01:16, Hatem Elshazly <hmelsha...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi there,
> > > > > > > > > >
> > > > > > > > > > I'm having a problem getting an execution host to work. The master node seems unable to see the execution node: when I submit a job, it stalls in the queue.
> > > > > > > > >
> > > > > > > > > Is it in state "qw" or "t"?
> > > > > > > > >
> > > > > > > > >     $ qalter -w v <job_id>
> > > > > > > > >
> > > > > > > > > will check whether the job could be started in an empty cluster in the current configuration.
> > > > > > > > >
> > > > > > > > > The home directory is shared in the cluster, so that the user's home directory can be accessed?
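[Editor's note] The missing-spool-directory symptom discussed above can be checked directly; a minimal sketch, where the default paths and the host name `node001` are assumptions to be replaced with the "execd_spool_dir" value reported by `qconf -sconf`:

```shell
#!/bin/sh
# Sketch: verify that the exechost's spool directory was created by execd.
# SGE_ROOT/SGE_CELL defaults and EXEC_HOST below are assumptions; adjust
# them to match your installation and `qconf -sconf` output.
SGE_ROOT=${SGE_ROOT:-/opt/sge}
SGE_CELL=${SGE_CELL:-default}
EXEC_HOST=node001

spool_dir="$SGE_ROOT/$SGE_CELL/spool/$EXEC_HOST"

if [ -d "$spool_dir" ]; then
    echo "spool directory present: $spool_dir"
else
    echo "spool directory missing: $spool_dir (execd probably failed to start)"
fi
```

Since the directory is recreated on every execd start, a missing directory points at a startup failure rather than a one-off installation problem.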
> > > > > > > > > > Both daemons are running on master and execution node. I added the execution node to the queue and made sure the ports are open and that I can ssh without a password from/to both nodes
> > > > > > > > >
> > > > > > > > > It's not necessary to have passphraseless SSH in the cluster. Even parallel jobs can run without this setting. In fact, I allow SSH access to nodes only for admin staff.
> > > > > > > > >
> > > > > > > > > > , and sge_root and sge_cell are open to read and write. The strange thing is that when I change the ncpu of the execution node, it gets reflected when I use the qhost command on the master node.
> > > > > > > > >
> > > > > > > > > You mean "num_proc"? This should be seen as a read-only value and it's normally not necessary to adjust it. The slot count in the queues is independent of this setting.
> > > > > > > > >
> > > > > > > > > -- Reuti
> > > > > > > > >
> > > > > > > > > > This is the output of the qhost command (ARCH and mem are NA although I set them in the node's values):
> > > > > > > > > >
> > > > > > > > > > HOSTNAME   ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> > > > > > > > > > -------------------------------------------------------------------
> > > > > > > > > > global     -             -     -       -       -      -      -
> > > > > > > > > > node001    -             1     -       -       -      -      -
> > > > > > > > > > master     linux-x64     1  0.01    3.7G  157.8M    0.0    0.0
> > > > > > > > > >
> > > > > > > > > > Any suggestions on what might be wrong are really appreciated.
> > > > > > > > > >
> > > > > > > > > > Thanks.
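[Editor's note] The qhost listing itself makes the failure visible: only hosts whose execd has reported in show an architecture and load. A minimal sketch that pulls the silent hosts out of such output, using the table quoted above as canned data (on the master you would pipe the data rows of `qhost` itself through the same filter, skipping the two header lines):

```shell
#!/bin/sh
# Canned sample: the qhost data rows quoted in this thread
# (header lines omitted for brevity).
qhost_out='global  -          -  -     -     -       -    -
node001 -          1  -     -     -       -    -
master  linux-x64  1  0.01  3.7G  157.8M  0.0  0.0'

# "global" is a pseudo host and always shows "-"; any other host with
# "-" in the ARCH column has an execd that never reported in.
silent=$(printf '%s\n' "$qhost_out" | awk '$1 != "global" && $2 == "-" {print $1}')

echo "hosts not reporting: $silent"   # prints: hosts not reporting: node001
```

Here node001 is flagged, matching the symptom in the thread: the host is registered with the qmaster but its execd has never delivered a load report.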
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users