John Hearns wrote:
On 20 January 2011 16:50, Olivier SANNIER <olivier.sann...@actuaris.com> wrote:
I’ve started looking at Beowulf clusters, and that led me to PBS. Am I
right in assuming that PBS (PBSPro or TORQUE) could be used to do the
monitoring and the load balancing I thought of?

Yes, that is correct. An alternative is Gridengine.

To be honest, I think you should contact a company which sells
computational clusters.
They will send someone to tell you how these clusters work, and give
you an idea of how a small cluster could help with your work.
I can suggest some companies off-list.

Hi Olivier

1) Besides John's suggestions, there are some good and informative
articles on how clusters work, etc, at ClusterMonkey.net:

http://www.clustermonkey.net/

2) Since clusters != MPI != OpenMPI,
you may find general information about clusters
on the Beowulf and Rocks Clusters web sites
and mailing lists:

http://www.beowulf.org/
http://www.beowulf.org/archive/index.html
http://www.beowulf.org/mailman/listinfo/beowulf

http://www.rocksclusters.org/wordpress/
http://marc.info/?l=npaci-rocks-discussion
https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion

BTW, Rocks provides free software to set up a standard cluster with
minimal effort. It is an NSF-supported project at UCSD:

http://www.rocksclusters.org/wordpress/?page_id=57

3) Resource managers / job queuing systems:

Torque (which we use here) is free, available to download
from the AdaptiveComputing/ClusterResources web site:

http://www.adaptivecomputing.com/
http://www.clusterresources.com/products/torque-resource-manager.php
http://www.clusterresources.com/products/maui-cluster-scheduler.php
http://www.adaptivecomputing.com/resources/docs/

Torque descends from PBS (it is a fork of OpenPBS),
and PBS-Pro also exists as a licensed product:

http://en.wikipedia.org/wiki/Portable_Batch_System
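To make the PBS connection concrete: a Torque/PBS job is usually a shell script whose `#PBS` comment lines carry the resource requests, submitted with `qsub`. The Python sketch below only renders such a script as text. The helper name `make_pbs_script`, the program `./my_sim` and the queue name are hypothetical, and node syntax and queue names vary from site to site.

```python
from typing import Optional


def make_pbs_script(job_name: str, nodes: int, ppn: int, walltime: str,
                    command: str, queue: Optional[str] = None) -> str:
    """Render a Torque/PBS batch script as text (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown by qstat
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # node count and processors per node
        f"#PBS -l walltime={walltime}",      # run-time limit, HH:MM:SS
    ]
    if queue is not None:
        lines.append(f"#PBS -q {queue}")     # queue name is site-specific
    lines += [
        # PBS starts jobs in $HOME; go back to the directory qsub was run from.
        'cd "$PBS_O_WORKDIR"',
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # ./my_sim is a hypothetical MPI program.
    print(make_pbs_script("my_sim", nodes=2, ppn=8, walltime="01:00:00",
                          command="mpirun ./my_sim input.dat"), end="")
```

Save the output as, say, `my_sim.pbs` and submit it with `qsub my_sim.pbs`; `qstat` then shows it in the queue.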

Torque performs resource management, job queuing and control.
Together with its companion job scheduler Maui, also available
from the same site (see the links above), it gives you a handle
on resource optimization and load balancing in one or more clusters.
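As a toy illustration of what "load balancing" means here, the sketch below places each job on whichever node currently has the least estimated work, longest jobs first. This is a simplification for intuition only. Maui's real policy (priorities, fairshare, backfill) is far more elaborate, and the job names, node names and hour estimates are made up.

```python
import heapq
from typing import Dict, List, Tuple


def assign_jobs(jobs: List[Tuple[str, float]],
                nodes: List[str]) -> Dict[str, List[str]]:
    """Toy greedy load balancing: each job goes to the least-loaded node.

    jobs  -- (job_name, estimated_hours) pairs (made-up estimates)
    nodes -- node names
    Returns node name -> job names placed on that node.
    """
    # Min-heap of (current_load, node); the least-loaded node is on top.
    heap = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    # Longest jobs first (the "LPT" rule) usually gives more even loads.
    for name, hours in sorted(jobs, key=lambda job: job[1], reverse=True):
        load, node = heapq.heappop(heap)
        placement[node].append(name)
        heapq.heappush(heap, (load + hours, node))
    return placement


if __name__ == "__main__":
    jobs = [("a", 4), ("b", 3), ("c", 3), ("d", 2), ("e", 2)]
    print(assign_jobs(jobs, ["n1", "n2"]))
```

With those five jobs on two nodes, the greedy rule ends up with loads of 8 and 6 hours, even though a 7/7 split exists. It is a heuristic, not an optimal scheduler.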

There are other free resource managers, like Sun Grid Engine,
although its future is not completely clear after Sun was
bought by Oracle, and its development/maintenance
apparently has been taken over by Univa:

http://www.univa.com/about/contact/grid-engine-hotline.php
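Grid Engine job scripts look much the same, with `#$` lines instead of `#PBS`, and they are also submitted with `qsub`. Here is a minimal sketch, assuming a parallel environment named `mpi`. PE names are defined by each site's administrator, and `qconf -spl` lists them. The helper and program names are hypothetical.

```python
def make_sge_script(job_name: str, slots: int, walltime: str,
                    command: str, pe: str = "mpi") -> str:
    """Render a Grid Engine batch script as text (hypothetical helper).

    The default parallel environment name "mpi" is an assumption; sites
    define their own PE names.
    """
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",             # shell to run the job with
        f"#$ -N {job_name}",           # job name
        "#$ -cwd",                     # run in the submission directory
        f"#$ -pe {pe} {slots}",        # parallel environment and slot count
        f"#$ -l h_rt={walltime}",      # hard run-time limit, HH:MM:SS
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # $NSLOTS is set by Grid Engine to the number of granted slots.
    print(make_sge_script("my_sim", slots=16, walltime="01:00:00",
                          command="mpirun -np $NSLOTS ./my_sim input.dat"), end="")
```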

Lawrence Livermore produces another free scheduler named Slurm,
but my perception is that Slurm does not integrate with as many HPC
tools, or as easily, as Torque and SGE do:

https://computing.llnl.gov/linux/slurm/
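For comparison, a Slurm batch script uses `#SBATCH` lines, is submitted with `sbatch`, and uses `srun` to launch the parallel tasks. The options shown are standard `sbatch` flags. The helper and program names are hypothetical, and partitions and limits depend on the site.

```python
def make_slurm_script(job_name: str, nodes: int, tasks_per_node: int,
                      walltime: str, command: str) -> str:
    """Render a Slurm batch script as text (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",                     # number of nodes
        f"#SBATCH --ntasks-per-node={tasks_per_node}",  # e.g. MPI ranks per node
        f"#SBATCH --time={walltime}",                   # run-time limit, HH:MM:SS
        f"srun {command}",                              # srun launches the tasks
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_slurm_script("my_sim", nodes=2, tasks_per_node=8,
                            walltime="01:00:00",
                            command="./my_sim input.dat"), end="")
```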

Other licensed resource managers/batch systems also exist,
including Moab (Adaptive Computing),
LSF (Platform Computing),
Tivoli Workload Scheduler LoadLeveler (IBM),
and PBS-Pro (Altair):

http://www.adaptivecomputing.com/products/
http://www.platform.com/Products
http://www-03.ibm.com/systems/software/loadleveler/
http://www.pbsworks.com/Default.aspx

There are also "grid" resource managers (Condor, Globus, etc.):

http://www.globus.org/
http://www.globus.org/grid_software/computation/condor.php
http://www.globus.org/toolkit/
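Condor is driven by a submit description file rather than a shell script. `queue N` launches N copies of the program, and `$(Process)` is numbered 0 to N-1 across them, which suits many independent runs of the same program. The sketch below writes such a file. The executable and file names are hypothetical, and the file would be submitted with `condor_submit`.

```python
def make_condor_submit(executable: str, arguments: str, count: int) -> str:
    """Render a Condor submit description file (hypothetical helper).

    $(Process) is expanded by Condor to 0, 1, ..., count-1, so each copy
    gets its own argument and its own output/error files.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {arguments} $(Process)",
        "output     = out.$(Process)",
        "error      = err.$(Process)",
        "log        = jobs.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_condor_submit("my_sim", "input.dat", 10), end="")
```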

I hope this helps,
Gus Correa
