Michael Dalton wrote:
> Hi all,

Hi, sorry I've not replied sooner; I've been on holiday/vacation for two weeks.

> 
> I have a quick question regarding SmartFrog and managing Hadoop
> clusters. I'm setting up a reasonable-sized Hadoop cluster that's
> expected to grow fairly quickly. However, I'm not really sure what the
> appropriate cluster management and administration tools are. From what
> I understand, I'll need tools to manage software configurations
> (packages+config files), obtain hadoop-specific metrics, obtain
> overall machine metrics (CPU load, memory usage, etc), and continuously
> perform process health/service lifecycle management (i.e. restarting
> datanodes that crash, reporting on critical errors like EDAC errors in
> Linux indicating uncorrectable DRAM flaws).
> 
> It appears right now folks like Yahoo! use Yum (or apt) + bcfg2 for
> the package/configuration, some variant of Ganglia for Hadoop, Nagios
> for overall metrics, and there is nothing publicly available for
> process health. I'm fine using Ganglia and Nagios for metrics unless
> someone can point me to better tools, but I'd rather not cobble
> together something using bcfg2 and hacked up shell scripts for
> configuration/process health management.

Ganglia and Nagios are the way to go for monitoring. Matt Massie, author
of Ganglia, now works for Cloudera; we've been discussing adding better
monitoring/logging. Specifically, we want more stable, parseable events
logged.

> 
> It looks like SmartFrog in conjunction with HADOOP-3628 would provide
> both reasonable configuration and process health/service lifecycle
> management. If I understand correctly, SmartFrog will allow me to
> manage Hadoop configuration files, manage the installation of my
> Hadoop packages, _and_ also provide monitoring of the health of Hadoop
> nodes so I can automatically log and restart hadoop nodes when they
> crash, etc.
> 

Yes, and on VM infrastructure it also lets you ask for new VMs.
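To make the "log and restart when they crash" part concrete, here is a minimal sketch of the idea in Python. It is not how SmartFrog implements liveness checks; `supervise` and its parameters are hypothetical names made up for illustration, and the command you pass in would be whatever launches your daemon.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, backoff_seconds=0.0):
    """Run `command`, restarting it while it exits non-zero.

    Hypothetical helper: gives up after `max_restarts` restarts and
    returns the list of exit codes it observed, oldest first.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child process terminates.
        code = subprocess.call(command)
        exit_codes.append(code)
        if code == 0:
            # Clean shutdown: nothing to restart.
            break
        # Log the failure so there is a record of every crash.
        print("process exited with %d (attempt %d)" % (code, attempt + 1),
              file=sys.stderr)
        if attempt < max_restarts:
            time.sleep(backoff_seconds)
    return exit_codes
```

A real deployment would also want health probes against a running process rather than just waiting for it to exit, which is the part the lifecycle work is meant to address.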

> Is this a reasonably accurate description of the state of the art with
> Hadoop and cluster administration? Also, are there any estimates
> on when the HADOOP-3628 branch will be committed into SVN trunk, and
> when that occurs will the SmartFrog project still need to maintain its
> own Hadoop branch?

Once that change goes in, no more branch, which is good as it takes up 
too much time. We would still have our own subclasses of the main hadoop 
objects and other things which would need to stay in sync, but that is 
easier to handle, and would be in our codebase.

That said, once the lifecycle patch goes in, I have some other things I
want to deal with:
  * long-haul job submission/monitoring
  * explicit support for different configuration backends

The current state of the branch is that everything worked up to the big
three-way -core/-hdfs/-mapred split, and I've not yet sat down to update
it for those changes. That is down to a combination of the two-week break
and local pressure to get that version of Hadoop working on dynamic VM
clusters: clusters where things like hostnames aren't known until
startup, which makes configuration that much trickier/more fun.
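As a rough illustration of why late-bound hostnames matter: the site configuration has to be generated once the hosts are known, rather than shipped as a static file. The sketch below is an assumption-laden example, not SmartFrog code. `build_site_config` is a hypothetical helper, the port numbers are placeholder defaults, and `fs.default.name` and `mapred.job.tracker` are the Hadoop property names it fills in.

```python
import xml.etree.ElementTree as ET


def build_site_config(namenode_host, jobtracker_host,
                      namenode_port=8020, jobtracker_port=8021):
    """Return Hadoop-style site configuration XML as a string.

    Hypothetical helper: hostnames are passed in at startup, once the
    VMs exist; the default ports are illustrative assumptions.
    """
    properties = {
        "fs.default.name": "hdfs://%s:%d" % (namenode_host, namenode_port),
        "mapred.job.tracker": "%s:%d" % (jobtracker_host, jobtracker_port),
    }
    config = ET.Element("configuration")
    # Sort so the generated file is stable from run to run.
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")
```

On a freshly started VM you might call it as `build_site_config(socket.getfqdn(), socket.getfqdn())` after importing `socket`, then write the result out before starting the daemons.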

I don't plan to sit down and deal with the merge for another couple of
weeks, though I will start looking at it. My code was fairly stable; the
big problem is more that the Hadoop code tends to move around when you
are not looking.

> Lastly, I was unable to find a list of real-world
> projects running SmartFrog: are there any large-scale (> 1000 node)
> clusters running SF? Thanks for your help

We're using it for smaller, short-lived clusters. Once the OpenCirrus
cloud testbed is live I am going to try it on thousands of nodes; as it
is, we are only working at the 1-10-100 scale. Short cluster life also
stops me having to worry about how stable the filesystem is: keep the
data on other filestores and I can run against SVN_HEAD code without
worrying about what happens if HDFS goes wrong.

If you want to use SF with Hadoop, talk to me directly about your
cluster; it will give me extra motivation to get the branch in, and put
extra pressure on Y! to accept it. Julio and I can help you get started
with SmartFrog. It also generates extra motivation for me to create
better documentation, which is always something on my TODO list.

-Steve



------------------------------------------------------------------------------
_______________________________________________
Smartfrog-users mailing list
https://lists.sourceforge.net/lists/listinfo/smartfrog-users
