Restarting one of the execd nodes solved the issue. I then found some
jobs, force-deleted them, and the problem seems to have gone away.
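
For anyone finding this thread later, here is a minimal sketch of the
force-delete step suggested in the quoted reply below. It is not
necessarily the exact sequence used here. The function names
force_delete_job and run, and the DRY_RUN switch, are hypothetical
helpers made up for illustration; only qstat -j and qdel -f are actual
Grid Engine commands. It assumes the caller is an SGE manager or
operator, since qdel -f is normally restricted to those roles.

```shell
#!/bin/sh
# Hypothetical helper, not part of SGE: force-delete a job that qmaster
# still lists even though it no longer runs anywhere.

# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

force_delete_job() {
    job_id="$1"
    # Show what qmaster still knows about the job.
    run qstat -j "$job_id"
    # Remove the job from qmaster even if the execd does not confirm.
    run qdel -f "$job_id"
}
```

Calling force_delete_job 9179498 (the job id from the log below) prints
the two commands; setting DRY_RUN=0 beforehand executes them instead.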

Thanks.

Simon

On Sun, Mar 6, 2016 at 10:07 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> Hi,
>
> On 04.03.2016 at 16:40, Simon Matthews wrote:
>
>> I am getting this error message:
>> 03/04/2016 07:30:14|listen|sgemaster|E|commlib error: local host name
>> error (remote rdata host name "turquoise" is not equal to local
>> resolved host name "h2.sj.bps")
>> 03/04/2016 07:30:23|worker|sgemaster|E|cqueue_list_locate_qinstance("(null)@(null)"):
>> cqueue == NULL("(null)", "(null)", 1, 0
>> 03/04/2016 07:30:23|worker|sgemaster|E|writing job finish information:
>> can't locate queue "(null)@(null)"
>> 03/04/2016 07:30:23|worker|sgemaster|W|job 9179498.1 failed on host
>> <unknown host> before writing exit_status because: shepherd exited
>> with exit status 19: before writing exit_status
>> 03/04/2016 07:30:23|worker|sgemaster|C|!!!!!!!!!! got NULL element for
>> QU_rerun !!!!!!!!!!
>>
>> I have seen references to this condition being fixed by deleting the
>> job, but how do I do this? We use BDB spooling. This grid is running
>> SGE 6.2U5.
>
> Is the job still running? It looks like it has already finished. Either
> way, did you try `qdel -f <job_id>`?
>
> -- Reuti
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
