Dear Slurm users, my team manages an HPC cluster (running Slurm) for a research centre. We are planning to expand the cluster over the next couple of years and are facing a problem: we would like to estimate how many resources (CPU cores, RAM, GPUs) will be needed on average per user, but we have almost one hundred researchers using the cluster for all sorts of different use cases, so there is no typical workload we could take as a model. Most of the work is, however, in machine learning and deep learning, and users span the whole range from first-year PhD students with limited skills to researchers and professors with many years of experience. In principle we could use a mix of approaches: looking at current usage patterns, user surveys, etc.
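For the "current usage patterns" part, one starting point is Slurm's own accounting database, e.g. `sacct --allusers --parsable2 --format=User,AllocCPUS,Elapsed --starttime=...` (RAM and GPU figures could be pulled similarly from the `AllocTRES` field). Below is a minimal sketch of how such output could be aggregated into per-user CPU-hours; the sample text stands in for real `sacct` output, and the field names match Slurm's defaults.

```python
# Sketch: aggregate per-user CPU-hours from sacct's parsable output.
# The sample string below is a stand-in for the output of:
#   sacct --allusers --parsable2 --format=User,AllocCPUS,Elapsed
from collections import defaultdict

sample_sacct_output = """\
User|AllocCPUS|Elapsed
alice|16|02:00:00
alice|32|01:00:00
bob|4|10:00:00
"""

def elapsed_to_hours(elapsed: str) -> float:
    """Convert Slurm's [DD-]HH:MM:SS Elapsed format to hours."""
    days = 0
    if "-" in elapsed:
        d, elapsed = elapsed.split("-")
        days = int(d)
    h, m, s = (int(x) for x in elapsed.split(":"))
    return days * 24 + h + m / 60 + s / 3600

def cpu_hours_per_user(text: str) -> dict:
    """Sum AllocCPUS * Elapsed per user over all accounted jobs."""
    totals = defaultdict(float)
    lines = text.strip().splitlines()
    header = lines[0].split("|")
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        if row["User"]:  # job steps have an empty User field; skip them
            totals[row["User"]] += (
                int(row["AllocCPUS"]) * elapsed_to_hours(row["Elapsed"])
            )
    return dict(totals)

print(cpu_hours_per_user(sample_sacct_output))
# alice: 16*2 + 32*1 = 64.0 CPU-hours; bob: 4*10 = 40.0 CPU-hours
```

Dividing such totals by the reporting window and head count gives a crude per-user average to scale purchases from; `sreport cluster UserUtilizationByAccount` gives similar aggregates directly, at the cost of less flexibility.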
I was wondering whether anyone here, working in a similar setting, has guidelines they use for budgeting hardware purchases and would be willing to share. Many thanks and regards -- Graziano D'Innocenzo (PGP key: 9213BE46) Systems Administrator - ADAPT Centre