+1 for a homogeneous design across all resource types, as per Ian. An HPC community left waiting while one particular node gets serviced - that's far more expensive than any savings on hardware! Having spare login nodes in VMs is likely a good idea; at least one on standby is recommended!
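For example, something along these lines could bring the standby up automatically when the primary login node stops answering (the hostname, the libvirt domain name, and running it from cron are all hypothetical - adjust to your setup and test before trusting it):

    #!/bin/bash
    # Failover sketch: if the primary login node stops answering pings,
    # boot the standby login VM via libvirt. Run from cron on the
    # hypervisor, e.g. once a minute.
    PRIMARY=login1            # hypothetical primary login hostname
    STANDBY=login2-standby    # hypothetical libvirt domain name

    if ! ping -c 3 -W 2 "$PRIMARY" > /dev/null 2>&1; then
        # "virsh start" on an already-running domain just returns an
        # error, which is harmless for this purpose.
        virsh --connect qemu:///system start "$STANDBY" \
            && logger "brought up standby login VM $STANDBY"
    fi

Ping alone is a crude health check, of course - an SSH banner check would catch more failure modes.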
F.

On Wednesday, July 20, 2016, Ian Kaufman <[email protected]> wrote:

> I agree - separating the master, login, and storage is wise, for the
> specific reason that one user or one poorly constructed script can ruin
> everyone's day. For larger, more active clusters, this is my SOP. For
> smaller ones, I might skip the login node.
>
> As far as specs go, finding a low-spec system these days is not really
> worth it - the price delta isn't huge. But it also depends on what the
> head node is doing in addition to Grid Engine. If you are using ROCKS,
> Warewulf, Perceus, or some other software toolkit to manage and deploy
> the nodes, you might need more CPU, RAM, and storage. I personally
> prefer homogeneous hardware for frontend, login, and compute. Sure, it
> gives plenty of elbow room for the frontend and login, but I'd prefer
> that over those systems going down due to random loss of resources.
>
> Obviously give more RAM to the compute nodes, and either go RAM-only
> (lots of RAM) or big disks for scratch. I only do stateless nodes that
> run in RAM and use any local disks as scratch space.
>
> Ian
>
> On Wed, Jul 20, 2016 at 7:56 AM, Notorious Biggles
> <[email protected]> wrote:
>
>> Hi all,
>>
>> I have some money available to replace the infrastructure nodes of
>> one of my company's grid engine clusters, and I wanted a sanity check
>> before I order anything new.
>>
>> Initially we contacted the company we originally bought the cluster
>> from, and they quoted us for a combined login/storage/master node
>> with loads of everything and a hefty price tag. I have an aversion to
>> combining login nodes with storage and master nodes - we already have
>> that on one of our clusters, and a user being able to crash the
>> entire cluster seems a bad thing to me; it has happened often enough.
>>
>> I read Rayson's blog post about scaling grid engine to 10k nodes at
>> http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html
>> and it seems that 4 cores and 1 GB of memory are more than enough to
>> run a grid engine master. Given that I'd be lucky to have 100 nodes
>> to a master, can anybody see a reason to spec a high-powered master
>> node? I look at my existing master nodes with 8+ cores and 24+ GB of
>> memory, and in Ganglia all I see is acres of green from memory being
>> used as cache and buffers. It seems rather a waste.
>>
>> The other thing I was curious about is what kind of spec seems
>> reasonable to you for a login node. My one cluster with separate
>> login nodes has specs similar to the master nodes - 8 cores, 24 GB of
>> memory - and it seems wasted. I can see an argument for these nodes
>> to be more than just a low-end box, especially if anybody is trying
>> to do some kind of visualization on them, but I've never had
>> complaints about them being under-powered yet.
>>
>> Any thoughts you might have are appreciated.
>>
>> Thanks
>> Biggles
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering
> ikaufman AT ucsd DOT edu

--
echo "sysadmin know better bash than english" | sed s/min/mins/ \
  | sed 's/better bash/bash better/'  # signal detected in a CERN forum
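P.S. On the master sizing question: before ordering anything, it's worth measuring what the current sge_qmaster actually consumes under your real job mix. A rough sampling loop like this (the log path is arbitrary; sge_qmaster is the stock Grid Engine daemon name) gives concrete numbers to spec against:

    #!/bin/bash
    # Sample sge_qmaster CPU and memory once a minute so the new
    # master can be sized from measured usage rather than guesswork.
    # Columns: epoch timestamp, %CPU, resident set size in KB.
    while true; do
        ps -C sge_qmaster -o %cpu=,rss= | \
            awk -v ts="$(date +%s)" '{ print ts, $1, $2 }'
        sleep 60
    done >> /tmp/qmaster_usage.log

A week or two of that should show whether the 4-core/1-GB figure from Rayson's post holds for your workload.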
_______________________________________________ users mailing list [email protected] https://gridengine.org/mailman/listinfo/users
