Guijarro, Julio wrote:
> *From:* Cai Cai [mailto:[email protected]]
> *Sent:* 21 September 2009 08:00
> *To:* Guijarro, Julio
> *Subject:* Re: [Smartfrog-users] smartfrog and hadoop
>
> Hi Julio,
> Currently I'm doing some simple experiments with Hadoop together with
> some classmates, and we are interested in building our own cloud
> platform (if our idea is workable, we may get more support from our
> institute). But we only have some personal computers with ordinary
> configurations. We have installed Xen (Xen 3.2-1-amd64) on our
> computers, running Debian Lenny, to set up simple clusters, and we have
> also installed Hadoop on them. Now we are thinking about how to do this
> with SmartFrog, since we think SmartFrog is suitable for automatic
> deployment and may be helpful for our platform in the future (for
> application deployment, etc.).
> I have downloaded "smartfrog.3.17.014_dist.tar.gz" and
> "smartfrog-rpm-bundle-3.17.014.tar.gz". With the former I have tested
> the examples in the SmartFrog User Manual (sfRun
> org/smartfrog/examples/arithnet/example1.sf and so on), but there is
> nothing related to Hadoop. I found the following on your website:
>
> Steps to deployability
> 1 Configure Hadoop from a SmartFrog description
> 2 Write components for the Hadoop nodes
> 3 Write the functional tests
> 4 Add workflow components to work with the filesystem; submit jobs
> 5 Get the tests to pass
>
> Should I add the Hadoop JAR files from
> "smartfrog-rpm-bundle-3.17.014.tar.gz" into the dist/lib folder of
> "smartfrog.3.17.014_dist", or could you please tell me what I should do
> next?
>
> I hope the text above supplies the information you need; if not, please
> email me.
> Thank you very much; we're looking forward to your help.
> Best wishes,
> Cai
If things aren't written up, it's my fault. We have tests and examples, but things are unstable and yes, I need to write all this up.

JARs

The sf-hadoop RPM contains all the JARs needed to bring up Hadoop under SmartFrog. In an RPM-based installation they should all go into SFHOME/lib automatically; if you are installing on Debian, then alien ought to be able to handle everything. A Debian package is something we would like to do at some point; it's just that there are lots of other things on my todo list too.

Installations

Once the JARs are in place, SmartFrog can bring up a node as a namenode, datanode, job tracker, task tracker, or all of the above; you just have to push out the right .sf file to each node to tell it what you want it to be.

Which leads to the question, "where do those .sf files live?" They are in our SVN repository. The hadoop-cluster package contains everything needed to configure a Hadoop cluster, plus some Ant targets to help push these descriptions out. This is something I am still busy developing.

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/hadoop-cluster/

Although it is not released as an RPM, it does generate a JAR file containing nothing but .sf files for different parts of a system. I'm busy working on this to drive it more dynamically, expanding templates with late-binding information (URLs of the master servers, number of task slots per host, etc.), so that when a cluster of machines is dynamically created, it's easy to push out the configurations. Hadoop is tricky in that while the workers will all spin waiting for the master nodes to come up, they do all need to know the URLs of the namenode and jobtracker before they start spinning, so you need to know the hostname/port of the master nodes before trying to start any workers.
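To give a feel for what such a role description might look like, here is a rough sketch of a .sf file for a worker node. The component and attribute names below are illustrative assumptions, not the real ones from the hadoop-cluster package; check components.sf in SVN for the actual templates:

```
// Hypothetical sketch only: the real component and attribute names live
// in the hadoop-cluster package's components.sf and will differ.
sfConfig extends Compound {
  datanode extends HadoopDatanode {        // assumed component name
    // workers must know the namenode URL before they start spinning
    nameNodeUrl "hdfs://master.example.org:8020";
  }
  tasktracker extends HadoopTaskTracker {  // assumed component name
    jobTrackerUrl "master.example.org:8021";
    taskSlots 2;                           // per-host slots, bound late
  }
}
```

The point of the sketch is the shape: one small description per node, with the master URLs and per-host settings filled in by template expansion before the file is pushed out.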
The templates are here:

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/hadoop-cluster/src/org/smartfrog/extras/hadoop/cluster/services/bondable/components.sf

These are what we push out to dynamically allocated machines once they are up, as part of my long-haul cluster management and job submission work:

http://www.slideshare.net/steve_l/long-haul-hadoop

The citerank package contains everything needed to run a PageRank-style algorithm on citation data; this is a basic example of a complex Hadoop MR sequence, and it is what I use for testing that clusters work.

http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/

This is designed to build and run standalone (via standalone.xml) as well as part of our bigger build process. It uses the classic MapReduce APIs so it can run against older versions of Hadoop, provided you compile it against whichever version you intend to use.

My recommendations, then, are:

* Check out the main SmartFrog core source tree, including the extras/hadoop-cluster and citerank areas.
* Have a look at how we bring up test clusters in components/hadoop and hadoop-cluster.
* If there are bits that are confusing, where you want some documentation, email this list and it will force me to write things up.

-Steve

_______________________________________________
Smartfrog-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/smartfrog-users
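As an illustration of what a "complex Hadoop MR sequence" can look like at the SmartFrog level, here is a rough, hypothetical sketch of a workflow that chains two MapReduce jobs. Sequence is a standard SmartFrog workflow combinator, but the SubmitJob component and its attributes are assumed names, not taken from citerank itself; see the SVN link above for the real descriptors:

```
// Hypothetical sketch: a Sequence workflow running MR jobs in order.
// The real citerank descriptors are under extras/citerank in SVN.
sfConfig extends Sequence {
  prepare extends SubmitJob {   // assumed job-submission component
    jobName   "citerank-prepare";
    inputDir  "/data/citations";
    outputDir "/work/iteration-0";
  }
  rank extends SubmitJob {      // next job consumes the previous output
    jobName   "citerank-iterate";
    inputDir  "/work/iteration-0";
    outputDir "/work/iteration-1";
  }
}
```

The design choice the sketch reflects is that each stage only starts after the previous one terminates, which is what an iterative PageRank-style computation over citation data needs.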
