I've spent the morning tracking down a scheduling problem on our cluster which arose from a misunderstanding of how complex values and parallel environments interact.
In our setup we have configured h_vmem as a consumable so we can schedule based on the memory requirements of jobs. We also have a parallel environment set up for SMP jobs which allows a user to reserve multiple cores on the same physical machine.

This morning we found a load of jobs which couldn't be scheduled, despite us appearing to have plenty of memory and cores free. Other jobs with similar memory requirements and core counts scheduled fine, but this one set of jobs would only stay queued. We eventually figured out that when a job requests both a pe and an h_vmem value, the actual memory reservation is the h_vmem multiplied by the number of cores, so we were actually requesting about 10X the memory we thought we were.

I can see that for MPI-type jobs this makes plenty of sense, since the processes run independently, potentially on different machines. For SMP jobs, though, we're just running different threads, so it seems odd to make our users calculate a 'memory per core' value rather than an overall value for the job.

Is there therefore any way to configure this behaviour within a pe? I couldn't see anything obvious in the pe or complex config, but this must be something people have addressed before. For memory it's not so bad, in that we can at least divide the allocation by the core count, but for something like licenses, where a large SMP job only needs one, I can't see how you could set this up.

Any pointers would be greatly appreciated.

Thanks,

Simon

The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT
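P.S. For concreteness, here is a sketch of the arithmetic we ran into and our current workaround. The slot count, memory sizes, pe name and job script are all made up for illustration:

```shell
# With h_vmem defined as a consumable, the scheduler appears to reserve
# the requested amount once per slot granted by the pe, e.g.:
#
#   qsub -pe smp 10 -l h_vmem=40G job.sh   # reserves 10 x 40G = 400G
#
# Our workaround is to have users divide the intended job total by the
# slot count before submitting:
slots=10
total_gb=40
per_slot_gb=$((total_gb / slots))
echo "qsub -pe smp ${slots} -l h_vmem=${per_slot_gb}G job.sh"
# prints: qsub -pe smp 10 -l h_vmem=4G job.sh
```

This works for memory, but as noted above it doesn't obviously help for a per-job resource like a license.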
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
