I killed all sge_* processes on the exec nodes and tried to restart execd,
but got this message:

root@compute010:/home/ubuntu# /usr/lib/gridengine/sge_execd
error: can't find connection
error: can't get configuration from qmaster -- backgrounding
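
One way to narrow down the "can't find connection" error is to check whether the node can open a TCP connection to the qmaster host at all. Below is a minimal sketch, not an SGE tool. It assumes the qmaster listens on port 6444 (the conventional sge_qmaster port; your install may differ), and the host name `headnode` in the usage note is hypothetical. The helper `can_connect` is illustrative only.

```python
import socket

# Assumption: 6444 is the conventional sge_qmaster port. Check your own
# installation (for example /etc/services or SGE_QMASTER_PORT) before relying on it.
QMASTER_PORT = 6444


def can_connect(host, port=QMASTER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the connection;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("headnode")` from a compute node returns `False` if the port is blocked, the qmaster is down, or the name does not resolve. If that happens, the host name recorded in the cell's `act_qmaster` file is worth comparing against the real head node.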


On Mon, May 30, 2016 at 10:36 AM, Radhouane Aniba <[email protected]> wrote:

> Hi Bill
>
> Yes I am sure
>
> This is what I get when I log in to one of the nodes and run:
>
> ubuntu@compute010:~$ ps -ef | grep sge_
> sgeadmin  1254     1  0 May28 ?        00:00:39 /usr/lib/gridengine/sge_qmaster
> sgeadmin  1446     1  0 May28 ?        00:00:22 /usr/lib/gridengine/sge_execd
> ubuntu    2552  2527  0 17:36 pts/0    00:00:00 grep --color=auto sge_
>
>
> On Mon, May 30, 2016 at 10:33 AM, Bill Bryce <[email protected]> wrote:
>
>> Hi Rad,
>>
>> Are you sure that the execution daemons are running on your compute
>> nodes? Can you log in to one of the nodes, say 'compute001', and run ps
>> to look for the execd? When an execd is functioning normally it reports
>> load, memory, etc. None of your nodes are showing that.
>>
>> Regards,
>>
>> Bill.
>>
>> On May 30, 2016, at 1:20 PM, Radhouane Aniba <[email protected]> wrote:
>>
>> Hello all,
>>
>> I am trying to submit a simple "hello world" job to test a Grid Engine
>> cluster (I have used it before with no problems).
>>
>> The problem is that my job waits in the queue forever.
>>
>> The qhost command shows the compute nodes in a weird state:
>>
>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -               -     -       -       -       -       -
>> compute001              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute002              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute003              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute004              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute005              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute006              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute007              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute008              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute009              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute010              lx26-amd64      4     -   31.4G       -     0.0       -
>> compute011              lx26-amd64      4     -   31.4G       -     0.0
>>
>> Normally, even when the compute nodes are idle, qhost shows some values
>> in the LOAD and MEMUSE columns.
>>
>> I am not an SGE person, but I am familiar with all the commands. Any help
>> would be much appreciated.
>>
>> The qstat -f command shows all my nodes in the au state. I've been reading
>> a lot about it, and I understand it's an alarm state (overloaded?).
>>
>> The only heavy activity I had on the head node was a script downloading
>> 19T of data. Could the head node be the problem and not the compute nodes?
>> sge_execd is running on all the compute/exec nodes :/
>>
>> --
>> *Rad*
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>>
>>
>> William Bryce | VP Products
>> Univa Corporation, Toronto
>> E: [email protected] | D: 647-9742841 | Toll-Free (800) 370-5320
>> W: Univa.com | FB: facebook.com/univa.corporation | T:
>> twitter.com/Grid_Engine
>>
>>
>
>
> --
> *Radhouane Aniba*
> *Bioinformatics Scientist*
> *BC Cancer Agency, Vancouver, Canada*
>



-- 
*Radhouane Aniba*
*Bioinformatics Scientist*
*BC Cancer Agency, Vancouver, Canada*
