[gridengine users] A 10,000-node Grid Engine Cluster in Amazon EC2

Rayson Ho Wed, 19 Dec 2012 09:11:15 -0800

A few weeks ago, we stress-tested commlib (for those who don't know
the code, commlib is the communication library in Grid Engine) to make
sure that Grid Engine works, and works efficiently, in clusters larger
than 10,000 nodes. We blogged about the experience; you can read it at:

http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html

We used smaller nodes (called instances in AWS terminology); in terms
of core count, the largest ones we used had only 8 cores per node. We
could have used larger nodes, like cc2.8xlarge (Cluster Compute Eight
Extra Large Instance), which has 16 Intel Xeon E5-2670 cores per node,
and reached the same core count with fewer nodes, but that would have
put less stress on the commlib...

There are some performance issues that we would like to fix before we
run something even larger (like 20,000 nodes and beyond :-D ), and I
think we are hitting the "C10K problem" that was encountered by web
servers a few years ago!

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

On Thu, Nov 15, 2012 at 10:52 AM, Rayson Ho <[email protected]> wrote:
>
> This year, we tested the scalability of Open Grid Scheduler / Grid
> Engine on the cloud -- we ran a 10,000-node cluster on EC2 (we could
> have used Gompute's hardware but obviously there are more important
> workloads in the dedicated HPC Clouds at Gompute).
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
