+1 for a homogeneous design across all resource types, as per Ian. An HPC
community left waiting for one particular node to be repaired is far more
expensive than any hardware savings! Having spare login nodes in VMs is
likely a good idea; I'd recommend keeping at least one on standby.
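If you go the VM route, pressing the standby into service is mostly a
matter of registering it with the qmaster. A minimal sketch, assuming a
hypothetical hostname login-vm01 and a stock SGE install:

  # on the qmaster: let the VM submit jobs and run admin commands
  qconf -as login-vm01   # add as submit host
  qconf -ah login-vm01   # add as admin host (optional, for qconf etc.)

The hostname is made up; beyond that the VM only needs the SGE client
binaries (qsub, qstat, qrsh) and the shared $SGE_ROOT mounted.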

F.

On Wednesday, July 20, 2016, Ian Kaufman <[email protected]> wrote:

> I agree - separating the master, login, and storage is wise, specifically
> to keep one user or one poorly constructed script from ruining everyone's
> day. For larger, more active clusters, this is my SOP. For smaller ones, I
> might skip the login node.
>
> As far as specs go, these days hunting for a low-spec system is not really
> worth it - the price delta isn't huge. It also depends on what the head is
> doing in addition to Grid Engine: if you are using ROCKS, Warewulf,
> Perceus, or some other software toolkit to manage and deploy the nodes,
> you may need more CPU, RAM, and storage. I personally prefer homogeneous
> hardware for the frontend, login, and compute nodes. Sure, it leaves
> plenty of elbow room on the frontend and login, but I'd rather have that
> than watch those systems go down from a random resource shortage.
>
> Obviously give more RAM to the compute nodes, and either go RAM-only (lots
> of RAM) or put big disks in for scratch. I only do stateless nodes that
> run in RAM and use any local disks as scratch space.
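>
> The scratch setup is nothing fancy - roughly the following, with a
> made-up device name, so adjust for your hardware:
>
>   mkfs.xfs /dev/sdb1                  # format the local disk once
>   mount -o noatime /dev/sdb1 /scratch
>   chmod 1777 /scratch                 # sticky bit, same semantics as /tmp
>
> In a stateless image that lives in the provisioning scripts rather than
> fstab.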
>
> Ian
>
> On Wed, Jul 20, 2016 at 7:56 AM, Notorious Biggles <
> [email protected]> wrote:
>
>> Hi all,
>>
>> I have some money available to replace the infrastructure nodes of one of
>> my company's grid engine clusters and I wanted a sanity check before I
>> order anything new.
>>
>> Initially we contacted the company we originally bought the cluster from
>> and they quoted us for a combined login/storage/master node with loads of
>> everything and a hefty price tag. I'm averse to combining login nodes
>> with storage and master nodes: we already have that arrangement on one of
>> our clusters, and a single user being able to crash the entire cluster
>> seems a bad thing to me - it has happened often enough.
>>
>> I read Rayson's blog post about scaling grid engine to 10k nodes at
>> http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html
>> and it seems that 4 cores and 1 GB of memory is more than enough to run a
>> grid engine master. Given that I'd be lucky to have 100 nodes per master,
>> can anybody see a reason to spec a high-powered master node? I look at my
>> existing master nodes with 8+ cores and 24+ GB of memory, and in Ganglia
>> all I see is acres of green from memory being used as cache and buffers.
>> It seems rather a waste.
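>>
>> (For what it's worth, checking the qmaster's real footprint is a
>> one-liner - the process name assumes a standard SGE install:
>>
>>   ps -C sge_qmaster -o rss=,vsz=,cmd=
>>
>> which prints the resident and virtual sizes in kB for the running
>> master.)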
>>
>> The other thing I was curious about is what kind of spec seems reasonable
>> to you for a login node. My one cluster with separate login nodes gives
>> them specs similar to the master nodes - 8 cores, 24 GB of memory - and
>> it seems wasted. I can see an argument for these nodes being more than
>> just a low-end box, especially if anybody is trying to do some kind of
>> visualization on them, but I've never had complaints about them being
>> under-powered yet.
>>
>> Any thoughts you might have are appreciated.
>>
>> Thanks
>> Biggles
>>
>>
>
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>


-- 
echo "sysadmin know better bash than english"|sed s/min/mins/ \
  | sed 's/better bash/bash better/' # signal detected in a CERN forum
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
