The docs we've found say that gid_range must be greater than the number of jobs expected to run concurrently on one host.
Our recent experience suggests that it has to be greater than the total number of jobs in the queue. If it isn't, a few jobs get mysteriously killed (typically about 1 in 30-40). Has anyone else had that experience?

We fixed this by expanding the range (it was the default of 20000-20100, which we changed to 20200-21000), but we would like to know whether there's a "best practice" for the range of values.

The server is running CentOS 7.3.1611. Below are the rpm details for the installed version of gridengine.

[root@HOST spool]# rpm -qi gridengine
Name        : gridengine
Version     : 8.1.9
Release     : 1.el7.centos
Architecture: x86_64
Install Date: Fri 12 May 2017 01:20:16 PM PDT
Group       : Applications/System
Size        : 48045546
License     : (SISSL and BSD and LGPLv3+ and MIT) and GPLv3+ and GFDL and others
Signature   : RSA/SHA1, Mon 29 Feb 2016 03:40:25 PM PST, Key ID 7aa656a092258035
Source RPM  : gridengine-8.1.9-1.el7.centos.src.rpm
Build Date  : Mon 29 Feb 2016 03:39:54 PM PST
Build Host  : copr-builder-480932275.novalocal
Relocations : /opt/sge
Vendor      : Fedora Project COPR (loveshack/SGE)
URL         : https://arc.liv.ac.uk/trac/SGE
Summary     : Grid Engine - Distributed Resource Manager
Description :
Grid Engine (often known as SGE) is a distributed resource manager,
typically deployed to manage batch jobs on computational clusters
(like Torque/Maui), but also capable of managing interactive jobs and
looser collections of resources, such as desktop PCs (like Condor).
The computational resources may be heterogeneous (including different
operating systems) with specified properties. Jobs are matched to
available resources according to the properties they request.

These are the files shared by both the qmaster and execd daemons,
required to run either the server or clients.

https://arc.liv.ac.uk/trac/SGE
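
For anyone comparing range sizes, here is a minimal Python sketch (not part of Grid Engine; the function names are made up for illustration) that counts the group IDs in a gid_range value and checks it against a job count. It assumes the value is a single m-n range or a comma-separated list of such ranges, and it only does the arithmetic; it says nothing about which job count the range actually needs to exceed.

```python
def gid_range_size(gid_range: str) -> int:
    """Count the group IDs covered by a gid_range string such as '20000-20100'.

    Assumption: the value is one 'm-n' range or a comma-separated list of
    ranges and single IDs. Both ends of a range are counted (inclusive).
    """
    total = 0
    for part in gid_range.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if high < low:
                raise ValueError(f"range end is below range start: {part!r}")
            total += high - low + 1
        else:
            int(part)  # raises ValueError if the entry is not a number
            total += 1
    return total


def range_exceeds_job_count(gid_range: str, job_count: int) -> bool:
    """Return True if the range holds strictly more GIDs than job_count."""
    return gid_range_size(gid_range) > job_count


if __name__ == "__main__":
    for value in ("20000-20100", "20200-21000"):
        print(value, gid_range_size(value))
```

By this count, the default 20000-20100 provides 101 GIDs and our new 20200-21000 provides 801.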
