In environments that run tens of thousands of jobs per day, tons of really short jobs, or a constant flow of always-active jobs, you may need a master node that is somewhat beefy. If you've never seen your head node get slammed then you can downsize. If there is a chance that your workload could change significantly then keep the size as is.
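
One rough way to answer "has the head node ever been slammed?" is to compare logged load averages against the core count. The sketch below is only an illustration: the function names, the 0.7 threshold and the sample numbers are all assumptions of mine, not anything Grid Engine provides, and load average is just one signal (qmaster memory and spool I/O matter too).

```python
def peak_load_ratio(load_samples, cores):
    """Return the highest observed load average divided by the core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not load_samples:
        raise ValueError("need at least one load sample")
    return max(load_samples) / cores


def looks_oversized(load_samples, cores, threshold=0.7):
    """True if the peak load never reached `threshold` of the available cores.

    The 0.7 default is an arbitrary illustration, not a recommendation.
    """
    return peak_load_ratio(load_samples, cores) < threshold


if __name__ == "__main__":
    # Hypothetical 1-minute load averages collected from a master node.
    samples = [0.4, 1.2, 0.9, 2.5, 0.7]
    print(peak_load_ratio(samples, cores=8))
    print(looks_oversized(samples, cores=8))
```

Feed it samples that cover your busiest periods (end-of-quarter runs, big array submissions); a quiet week will make any node look oversized.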

I'm in favor of massive login nodes. They are often used by users who are prototyping job scripts, and we can't always train them to 'qlogin' or 'qrsh' into a remote node for testing. It only takes a couple of people running large R or Matlab tasks, a few others doing a massive round of array job prep, and a couple of people who constantly run "qstat" to run the login node out of resources pretty quickly.

The cost of CPU and RAM at this scale is dirt cheap, effectively noise relative to the cost of networking and storage, so I also tend to make login and interactive nodes larger than strictly necessary.

My $.02!

Chris



Notorious Biggles wrote:
Hi all,

I have some money available to replace the infrastructure nodes of one of my company's grid engine clusters and I wanted a sanity check before I order anything new.

Initially we contacted the company we originally bought the cluster from and they quoted us for a combined login/storage/master node with loads of everything and a hefty price tag. I feel an aversion to combining login nodes with storage and master nodes - we already have that on one of the clusters, and a user being able to crash the entire cluster seems a bad thing to me. It has happened often enough.

I read Rayson's blog post about scaling grid engine to 10k nodes at http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html and it seems that 4 cores and 1 GB of memory is more than enough to run a grid engine master. Given that I'd be lucky to have 100 nodes to a master, can anybody see a reason to spec a high powered master node? I look at my existing master nodes with 8+ cores and 24+ GB of memory and in Ganglia all I see is acres of green from memory being used as cache and buffers. It seems rather a waste.
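
To put a number on that Ganglia picture, the memory actually held by applications can be estimated by subtracting free, buffer and page-cache memory from the total. This is a minimal sketch, assuming a Linux node with the usual `MemTotal`, `MemFree`, `Buffers` and `Cached` fields in /proc/meminfo; the function names and the sample figures are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def application_used_kb(info):
    """Memory that is neither free nor held as buffers or page cache, in kB."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


if __name__ == "__main__":
    # Hypothetical figures for a roughly 24 GB node.
    # On a real Linux host, read the text from /proc/meminfo instead.
    sample = """MemTotal:       24576000 kB
MemFree:         1024000 kB
Buffers:          512000 kB
Cached:         20480000 kB
"""
    info = parse_meminfo(sample)
    print(application_used_kb(info), "kB used by applications")
```

If that figure stays small at peak times, most of the installed memory is only serving as cache.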

The other thing I was curious about is what kind of spec seems reasonable to you for a login node. My one cluster with separate login nodes has similar specs to the master nodes - 8 cores and 24 GB of memory - and it seems wasted. I can see an argument for these nodes to be more than just a low end box, especially if anybody is trying to do some kind of visualization on them, but so far I've never had complaints about them being under-powered.

Any thoughts you might have are appreciated.

Thanks
Biggles

