In environments that run tens of thousands of jobs per day, tons
of really short jobs, or a constant flow of always-active jobs, you may
need a master node that is somewhat beefy. If you've never seen your
head node get slammed then you can downsize. If there is a chance that
your workload could change significantly then keep the size as is.
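One rough way to answer "has the head node ever been slammed?" is to
look at load-average history relative to core count. The sketch below
is only an illustration: the function names and the 0.5 load-per-core
threshold are assumptions, not anything Grid Engine provides, and the
samples would have to come from whatever monitoring you already have
(Ganglia, sar, a cron job, etc.).

```python
import os


def sample_now():
    """Return the current 1-minute load average (Unix only)."""
    return os.getloadavg()[0]


def peak_load_per_core(load_samples, cores):
    """Highest observed load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return max(load_samples, default=0.0) / cores


def can_downsize(load_samples, cores, threshold=0.5):
    """True if the node never exceeded `threshold` load per core.

    The threshold is an arbitrary illustrative cutoff; pick one that
    leaves headroom for the workload changes you think are plausible.
    """
    return peak_load_per_core(load_samples, cores) < threshold
```

For example, an 8-core master whose worst recorded load was 1.2 peaks
at 0.15 per core and would pass this check, while one that has hit 9.5
would not.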
I'm in favor of massive login nodes. They are often used by users who
are prototyping job scripts, and we can't always train them to 'qlogin'
or 'qrsh' into a remote node for testing. All it takes is a couple of
people running large R or Matlab tasks, a few others preparing a
massive set of array jobs, and a couple of people who constantly run
"qstat", and the login node runs out of resources pretty quickly.
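The arithmetic behind that is easy to sketch. Everything numeric here
is a made-up assumption for illustration (per-user memory and core
figures, user counts, node sizes); substitute what you actually
observe on your own login nodes.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    users: int
    gb_per_user: float     # assumed memory footprint per user
    cores_per_user: float  # assumed CPU demand per user


def total_demand(workloads):
    """Sum memory (GB) and CPU (cores) demand across all workloads."""
    mem = sum(w.users * w.gb_per_user for w in workloads)
    cpu = sum(w.users * w.cores_per_user for w in workloads)
    return mem, cpu


def fits(workloads, node_gb, node_cores):
    """True if the combined demand fits within the node's resources."""
    mem, cpu = total_demand(workloads)
    return mem <= node_gb and cpu <= node_cores


# Hypothetical mix of concurrent login-node users.
mix = [
    Workload("R/Matlab tasks", users=2, gb_per_user=8.0, cores_per_user=1.0),
    Workload("array job prep", users=2, gb_per_user=2.0, cores_per_user=2.0),
    Workload("qstat polling", users=2, gb_per_user=0.1, cores_per_user=0.25),
]
```

With these invented figures the six users need roughly 20 GB and 6.5
cores, so `fits(mix, 24, 8)` holds but `fits(mix, 16, 4)` does not.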
The cost of CPU and RAM at this scale is dirt cheap, effectively noise
relative to the cost of networking and storage, so I also tend to make
login and interactive nodes larger than strictly necessary.
My $.02!
Chris
Notorious Biggles wrote:
Hi all,
I have some money available to replace the infrastructure nodes of one
of my company's grid engine clusters and I wanted a sanity check
before I order anything new.
Initially we contacted the company we originally bought the cluster
from and they quoted us for a combined login/storage/master node with
loads of everything and a hefty price tag. I feel an aversion to
combining login nodes with storage and master nodes. We already have
that on one of the clusters, and a user being able to crash the entire
cluster seems a bad thing to me. It happened often enough.
I read Rayson's blog post about scaling grid engine to 10k nodes at
http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html
and it seems that 4 cores and 1 GB of memory is more than enough to
run a grid engine master. Given that I'd be lucky to have 100 nodes to
a master, can anybody see a reason to spec a high powered master node?
I look at my existing master nodes with 8+ cores and 24+ GB of memory
and in Ganglia all I see is acres of green from memory being used as
cache and buffers. It seems rather a waste.
The other thing I was curious about is what kind of spec seems
reasonable to you for a login node. My one cluster with separate login
nodes has similar specs to the master nodes - 8 cores, 24 GB memory
and it seems wasted. I can see an argument for these nodes to be more
than just a low end box, especially if anybody is trying to do some
kind of visualization on them, but I've never had complaints about
them being under-powered yet.
Any thoughts you might have are appreciated.
Thanks
Biggles