On 28/6/23 04:02, Rahmanpour Koushki, Maysam wrote:

> Upon reviewing the current FAQ, I found that it states node shrinking is only possible for pending jobs. Unfortunately, it does not provide additional information or examples to clarify if this functionality can be extended to running jobs.

You can definitely release nodes from a running job. What I believe the FAQ is saying is that you cannot change things like the number of cores per node or the amount of memory you requested once a job is running.
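
As a rough sketch of what releasing nodes can look like: `scontrol update` accepts a `NodeList=` value for a running job, which must be a subset of the nodes the job already holds. The Python helper below only builds that command line and, unless told otherwise, does not run it. The function names, the job ID and the node names are made up for illustration, and your site may restrict who can resize jobs.

```python
import shlex
import subprocess


def build_shrink_command(job_id, keep_nodes):
    """Build an scontrol command that shrinks a running job to keep_nodes.

    keep_nodes must be a subset of the nodes currently allocated to the job;
    this helper does not check that, Slurm will reject the request if not.
    """
    if not keep_nodes:
        raise ValueError("a job must keep at least one node")
    return [
        "scontrol",
        "update",
        f"JobId={job_id}",
        f"NodeList={','.join(keep_nodes)}",
    ]


def shrink_job(job_id, keep_nodes, dry_run=True):
    """Return the command as a string; execute it only when dry_run is False."""
    cmd = build_shrink_command(job_id, keep_nodes)
    if not dry_run:
        # Raises CalledProcessError if scontrol refuses the update.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Hypothetical job ID and node names, printed rather than executed.
    print(shrink_job(1234, ["node01", "node03"]))
```

After a job has been shrunk, later `srun` calls inside it need node and task counts that fit the smaller allocation.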

As for why you'd do that: before we set up a mechanism to automatically reboot nodes to address this, we had people who would request more nodes than they needed, check how fragmented the kernel hugepages were on each one, and then exclude the nodes that were too fragmented for their needs.
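
How those users measured fragmentation isn't something I can state here, so treat the following purely as one possible approach, not as what they did. On Linux, `/proc/buddyinfo` lists the number of free memory blocks per order for each NUMA node and zone; assuming 4 KiB base pages and 2 MiB hugepages, a hugepage needs a free block of order 9 or higher. The function name and the default threshold below are assumptions for illustration.

```python
def free_blocks_at_or_above(buddyinfo_text, min_order=9):
    """Sum free blocks of order >= min_order per NUMA node.

    buddyinfo_text is the content of /proc/buddyinfo, whose lines look like:
        Node 0, zone   Normal   c0 c1 c2 ... c10
    where cN is the number of free blocks of order N.
    """
    totals = {}
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        # Skip anything that does not match the expected layout.
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        numa_node = int(fields[1].rstrip(","))
        counts = [int(c) for c in fields[4:]]
        totals[numa_node] = totals.get(numa_node, 0) + sum(counts[min_order:])
    return totals


if __name__ == "__main__":
    with open("/proc/buddyinfo") as fh:
        print(free_blocks_at_or_above(fh.read()))
```

A node reporting very few high-order free blocks would be a candidate to drop from the job, with the cut-off depending entirely on the application.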

All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA

