From: Reuti <re...@staff.uni-marburg.de>
Subject: Re: [gridengine users] Configure gridengine on CentOS 6.3
Date: Wed, 7 Nov 2012 20:26:54 +0100

> On 07.11.2012 at 18:49, Petter Gustad wrote:
>
>> From: Reuti <re...@staff.uni-marburg.de>
>> Subject: Re: [gridengine users] Configure gridengine on CentOS 6.3
>> Date: Wed, 7 Nov 2012 16:37:22 +0100
>>
>>> On 07.11.2012 at 15:46, Petter Gustad wrote:
>>>
>>>>> From: Reuti <re...@staff.uni-marburg.de>
>>>>> Subject: Re: [gridengine users] Configure gridengine on CentOS 6.3
>>>>> Date: Tue, 30 Oct 2012 11:27:49 +0100
>>>>>
>>>>>> Just use the version you have already in the shared /usr/sge or your
>>>>>> particular mountpoint.
>>>>>
>>>>> I should probably try this first, at least to verify that it's
>>>>> working. But later I would like to migrate to CentOS on all my
>>>>> exechosts and leave the installation to somebody else.
>>>>
>>>> I did this and it worked out fine on the first machine I migrated.
>>>> However, on the next set of machines I ran into a problem where the
>>>> submitted job causes the queue to go into the error state.
>>>>
>>>> I observe that:
>>>>
>>>> 1) It will not be submitted
>>>> 2) The queue will be marked with the 'E' state
>>>> 3) I get an e-mail saying
>>>>      Shepherd pe_hostfile:
>>>>      node 1 queue@node UNDEFINED
>>>> 4) The node will log the following in the spool/node/messages file:
>>>>      11/07/2012 15:33:07| main|node|E|shepherd of job 48548.1 exited with
>>>>      exit status = 11
>>>> 5) qstat -j jobnumber returns
>>>>
>>>>      error reason 1: 11/07/2012 15:33:06 [555:29681]: unable to
>>>>      find job file "/work/gridengine/spool/node/job_scr
>>
>> Is this output always truncated,
>
> Yes.

OK. Good.

>
>> or could this be the source of the problem?
>
> No.
>
>
>>> This looks like an unusual path for the spool directory. The name of the
>>> node should be included.
>>
>> I've substituted the string "node" for the actual node name. It appears
>> to be the same for all the nodes, hence I just used "node".
>
> Good.
>
>
>>> $ qconf -sconf
>>>
>>> (at the top something like: execd_spool_dir /var/spool/sge,
>>> the directory for the particular node will be created automatically when
>>> the execd starts up)
>>
>> This will show the spool directory on the qmaster, which is different
>
> No, it's the global setting for the execd spool directory. This can be
> overridden in case you have different paths on the nodes.
>
> If all nodes are the same, you can even delete all the local definitions
> which were listed in `qconf -sconfl`.
>
> NB: The location of the qmaster spool directory is in
> "/usr/sge/default/common/bootstrap" (adjust the path for your installation);
> for me it reads "qmaster_spool_dir /var/spool/sge/qmaster"
>
>
>> from the above. But for all the nodes this is /work/gridengine/spool.
>
> Yes, but if you check the directory /work/gridengine/spool there should be a
> level for the node, /work/gridengine/spool/node001 or whatever. Is this
> directory readable by the sgeadmin user account?

That was the problem. Thanks! This directory was readable by the
gridengine account only. By making it world-readable I managed to
submit a job. These permissions also differed between the working and
non-working nodes.

>
>>> $ qconf -sconfl
>>>
>>> (get all exechost definitions [if any are present at all]), then for the
>>> particular node:
>>>
>>> $ qconf -sconf node42
>>>
>>> and check the path to the execd_spool_dir.
>>
>> They are all identical. If I do something like:
>>
>> qconf -sconf good-node > /tmp/good-node
>> qconf -sconf bad-node > /tmp/bad-node
>>
>> and diff the two, the only diff will be the hostname part.
>>
>> All the nodes are using spool on a local filesystem located at
>> /work/gridengine/spool
>>
>>
>> The only difference I see on the bad nodes is that there is a "." at
>> the end of the permissions in the spool directory so I think this
>> might be related to SELinux. I'll have to investigate this further.
>
> Yep.
> It indicates access restrictions imposed by another facility, just as a "+"
> does for an ACL.
>
> I suggest switching off SELinux.
>
> -- Reuti

Best regards
//Petter
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
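
For anyone comparing good and bad nodes in the same way, here is a minimal sketch of the check discussed above. It reads the permission field that `ls -ld` prints for the spool directory and reports whether "other" can read and traverse it, and whether a trailing "." (SELinux context) or "+" (ACL) marker is present. The function names `perm_field` and `describe_perms` are made up for this illustration, and the default path is simply the one mentioned in this thread; adjust it for your installation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Grid Engine.

# Print the first field of `ls -ld` for a path, e.g. "drwxr-xr-x."
perm_field() {
    ls -ld "$1" | awk '{print $1}'
}

# Describe what a permission field such as "drwx------." tells us.
describe_perms() {
    perms=$1
    # Characters 8 and 10 are the read and execute bits for "other".
    other_r=$(printf '%s' "$perms" | cut -c8)
    other_x=$(printf '%s' "$perms" | cut -c10)
    case "$other_r$other_x" in
        rx|rt) echo "world-readable" ;;
        *)     echo "not world-readable" ;;
    esac
    # GNU ls appends "." for an SELinux context and "+" for other
    # alternate access methods such as ACLs.
    case $perms in
        *.) echo "SELinux context present" ;;
        *+) echo "ACL present" ;;
    esac
}

# Assumed default: the spool location used in this thread.
spool=${1:-/work/gridengine/spool}
if [ -d "$spool" ]; then
    describe_perms "$(perm_field "$spool")"
fi
```

Running this on a working and a non-working exechost and comparing the output should show the same kind of difference described above. It only inspects the mode string; it does not tell you whether SELinux is actually enforcing.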