Hello distinguished forum members,
We are using SoGE 8.1.8, and starting approximately 2 months ago our job
scheduling time has risen to 30-60 sec.
Our environment details:
We have about 3-4K concurrent jobs.
We have about 300 physical machines and 200 VMs as execution hosts.
In total we have about 5K cores (slots).
We have a lot of different projects/parallel environments and
quotas configured.
We run a simple check to see how long it takes to schedule a job:
>time qrsh date
Mon Oct 10 17:35:31 IDT 2016
0.015u 0.010s 0:22.38 0.0% 0+0k 0+0io 0pf+0w
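
Since a single measurement can be noisy, the same check could be repeated a few times to get a range. This is only a minimal sketch, assuming bash is available on the submit host; `sample_latency` is a hypothetical helper name, not part of Grid Engine, and it only wraps the `qrsh date` check shown above with one-second resolution.

```shell
#!/bin/bash
# Hypothetical helper: run a command N times and print the
# wall-clock seconds each run took (1-second resolution).
sample_latency() {
    local runs=$1
    shift
    local i start end
    for ((i = 1; i <= runs; i++)); do
        start=$(date +%s)
        # Discard the command's own output; only the elapsed time matters here.
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
    done
}

# Example (same check as above, five samples):
#   sample_latency 5 qrsh date
```

Running it at different times of day, alongside the number of pending jobs at that moment, would show whether the delay tracks the queue length.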
Previously we tried clearing out all the running jobs and the scheduling time
dropped to 1 sec, but as more jobs entered the queue the scheduling time
rose again to 10-20 sec, and now we have 30-60 sec.
Any tips or advice on where to look for the root cause and/or how we can improve
the situation will be greatly appreciated.
Thank you.
Yuri Burmachenko | Sr. Engineer | IT | Mellanox Technologies Ltd.
Work: +972 74 7236386 | Cell +972 54 7542188 |Fax: +972 4 959 3245