Guijarro, Julio wrote:
> *From:* Cai Cai [mailto:[email protected]]
> *Sent:* 21 September 2009 08:00
> *To:* Guijarro, Julio
> *Subject:* Re: [Smartfrog-users] smartfrog and hadoop
>
> Hi Julio,
> Currently I'm doing some simple experiments with Hadoop together with
> some classmates, and we are interested in building our own cloud
> platform (if our idea is workable, we may get more support from our
> institute). But we only have some personal computers with ordinary
> configurations. We have installed Xen (Xen 3.2-1-amd64) on our
> computers, running Debian Lenny, to set up simple clusters, and we have
> also installed Hadoop on them. Now we are thinking about how to do this
> with SmartFrog, since we think SmartFrog is suitable for automatic
> deployment and may be helpful for our platform in the future (for
> application deployment, etc.).
> I have downloaded "smartfrog.3.17.014_dist.tar.gz" and
> "smartfrog-rpm-bundle-3.17.014.tar.gz". With the former I have tested
> the examples in the SmartFrog User Manual (sfRun
> org/smartfrog/examples/arithnet/example1.sf and so on), but there is
> nothing related to Hadoop. I found the following on your website:
>
> Steps to deployability
> 1 Configure Hadoop from a SmartFrog description
> 2 Write components for the Hadoop nodes
> 3 Write the functional tests
> 4 Add workflow components to work with the filesystem; submit jobs
> 5 Get the tests to pass
>
> Should I add the Hadoop JAR files from
> "smartfrog-rpm-bundle-3.17.014.tar.gz" into the dist/lib folder of
> "smartfrog.3.17.014_dist", or could you please tell me what I should do
> next?
>
> I hope the text above supplies the information you need; if not, please
> email me.
> Thank you very much; we're looking forward to your help.
> Best wishes,
> Cai
If things aren't written up, it's my fault. We have tests and examples, but things are unstable and yes, I need to write all this up.

JARs

The sf-hadoop RPM contains all the JARs needed to bring up Hadoop under SmartFrog. In an RPM-based installation they should all go into SFHOME/lib automatically; if you are installing on Debian, then alien ought to be able to handle everything. A Debian package is something we would like to do at some point; it's just that there are lots of other things on my todo list too.

Installations

Once the JARs are in place, SmartFrog can bring up a node as a namenode, datanode, job tracker, task tracker, or all of the above; you just have to push out the right .sf file to each node to tell it what you want it to be.

Which leads to the question, "where do those .sf files live?" They are in our SVN repository. The hadoop-cluster package contains everything needed to configure a Hadoop cluster, plus some Ant targets to help push these descriptions out. This is something I am still busy developing.

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/hadoop-cluster/

Although it is not released as an RPM, it does generate a JAR file containing nothing but .sf files for different parts of a system. I'm busy working on this to drive it more dynamically, expanding templates with late-binding information (URLs of the master servers, number of task slots per host, etc.), so that when a cluster of machines is dynamically created, it's easy to push out the configurations. Hadoop is tricky in that while the workers will all spin waiting for the master nodes to come up, they do all need to know the URLs of the namenode and jobtracker before they start spinning, so you need to know the hostname/port of the master nodes before trying to start any workers.
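To give a feel for what such a role description might look like, here is a rough sketch of a .sf file for a worker node. The component and attribute names below are illustrative assumptions, not the real ones from the hadoop-cluster package; check components.sf in SVN for the actual templates:

```
// Hypothetical sketch only: the real component and attribute names live
// in the hadoop-cluster package's components.sf and will differ.
sfConfig extends Compound {
  datanode extends HadoopDatanode {        // assumed component name
    // workers must know the namenode URL before they start spinning
    nameNodeUrl "hdfs://master.example.org:8020";
  }
  tasktracker extends HadoopTaskTracker {  // assumed component name
    jobTrackerUrl "master.example.org:8021";
    taskSlots 2;                           // per-host slots, bound late
  }
}
```

The point of the sketch is the shape: one small description per node, with the master URLs and per-host settings filled in by template expansion before the file is pushed out.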
The templates are here:

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/hadoop-cluster/src/org/smartfrog/extras/hadoop/cluster/services/bondable/components.sf

These are what we push out to dynamically allocated machines once they are up, as part of my long-haul cluster management and job submission work:

http://www.slideshare.net/steve_l/long-haul-hadoop

The citerank package contains everything needed to run a PageRank-style algorithm on citation data; this is a basic example of a complex Hadoop MR sequence, and it is what I use for testing that clusters work.

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/

This is designed to build and run standalone (via standalone.xml) as well as part of our bigger build process. It uses the classic MapReduce APIs so it can run against older versions of Hadoop, provided you compile it against whichever version you intend to use.

My recommendations, then, are:

* Check out the main SmartFrog core source tree, including the extras/hadoop-cluster and citerank areas.
* Have a look at how we bring up test clusters in components/hadoop and hadoop-cluster.
* If there are bits that are confusing, where you want some documentation, email this list and it will force me to write things up.

-Steve

_______________________________________________
Smartfrog-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/smartfrog-users
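As an illustration of what a "complex Hadoop MR sequence" can look like at the SmartFrog level, here is a rough, hypothetical sketch of a workflow that chains two MapReduce jobs. Sequence is a standard SmartFrog workflow combinator, but the SubmitJob component and its attributes are assumed names, not taken from citerank itself; see the SVN link above for the real descriptors:

```
// Hypothetical sketch: a Sequence workflow running MR jobs in order.
// The real citerank descriptors are under extras/citerank in SVN.
sfConfig extends Sequence {
  prepare extends SubmitJob {   // assumed job-submission component
    jobName   "citerank-prepare";
    inputDir  "/data/citations";
    outputDir "/work/iteration-0";
  }
  rank extends SubmitJob {      // next job consumes the previous output
    jobName   "citerank-iterate";
    inputDir  "/work/iteration-0";
    outputDir "/work/iteration-1";
  }
}
```

The design choice the sketch reflects is that each stage only starts after the previous one terminates, which is what an iterative PageRank-style computation over citation data needs.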
