Re: GSoc08 Application: Hadoop Map/Reduce SCA Integration Project

Jean-Sebastien Delfino Sun, 30 Mar 2008 16:13:20 -0700

Chris Trezzo wrote:

This is a great idea. I think, if I understand you correctly, adding a fourth implementation type is intended to address this. The extra type is going to act as something like an orchestrator, trying to intelligently manage the Map, Combine, and Reduce functions over distributed computing facilities and heterogeneous data sources. Like you said, this component could shape the deployment of computations based on things like cloud load, time of day, locality of resources and so on.
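To make the orchestrator idea a bit more concrete, here is a toy sketch in Java. It assumes hypothetical `MapFn`, `CombineFn` and `ReduceFn` interfaces, which are not Hadoop's Mapper/Reducer API or any Tuscany implementation type. It runs all three phases in one process and only shows how an orchestrator might wire map, per-split combine and reduce together. The interesting part of the proposal, deciding where each phase runs based on cloud load or locality, is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical phase interfaces for illustration only; they are NOT
// Hadoop's Mapper/Reducer API or any Tuscany SCA implementation type.
interface MapFn<I, K, V> { List<Map.Entry<K, V>> map(I item); }
interface CombineFn<K, V> { V combine(K key, List<V> values); }
interface ReduceFn<K, V, R> { R reduce(K key, List<V> values); }

// Toy in-process "orchestrator": it only wires the three phases together.
// Choosing where each phase runs (cloud load, locality, ...) is left out.
final class LocalOrchestrator<I, K, V, R> {
    private final MapFn<I, K, V> mapper;
    private final CombineFn<K, V> combiner;
    private final ReduceFn<K, V, R> reducer;

    LocalOrchestrator(MapFn<I, K, V> mapper, CombineFn<K, V> combiner, ReduceFn<K, V, R> reducer) {
        this.mapper = mapper;
        this.combiner = combiner;
        this.reducer = reducer;
    }

    // Each inner list stands in for one input split handled by one mapper.
    Map<K, R> run(List<List<I>> splits) {
        Map<K, List<V>> shuffled = new HashMap<>();
        for (List<I> split : splits) {
            // Map every item of the split and group the emitted pairs by key.
            Map<K, List<V>> local = new HashMap<>();
            for (I item : split) {
                for (Map.Entry<K, V> kv : mapper.map(item)) {
                    local.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
                }
            }
            // Combine per split, then hand the partial results to the shuffle.
            for (Map.Entry<K, List<V>> e : local.entrySet()) {
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(combiner.combine(e.getKey(), e.getValue()));
            }
        }
        // Reduce the combined values of each key.
        Map<K, R> result = new HashMap<>();
        for (Map.Entry<K, List<V>> e : shuffled.entrySet()) {
            result.put(e.getKey(), reducer.reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}

final class WordCountDemo {
    // Word count assembled from the three hypothetical phase functions.
    static Map<String, Integer> wordCount(List<List<String>> splits) {
        MapFn<String, String, Integer> tokenize = line -> {
            List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
            }
            return pairs;
        };
        CombineFn<String, Integer> partialSum = (key, counts) -> counts.stream().mapToInt(Integer::intValue).sum();
        ReduceFn<String, Integer, Integer> totalSum = (key, counts) -> counts.stream().mapToInt(Integer::intValue).sum();
        return new LocalOrchestrator<>(tokenize, partialSum, totalSum).run(splits);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of(List.of("a b a"), List.of("b c"))));
    }
}
```

Keeping the three phases behind separate interfaces is what would let a smarter orchestrator later place each one on a different node without touching the phase code.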


I should probably make this more clear in my proposal.

Thank you for the comment/suggestion!

Chris


On Mar 30, 2008, at 12:50 AM, Robert Burrell Donkin wrote:

On Sun, Mar 30, 2008 at 5:06 AM, Chris Trezzo <[EMAIL PROTECTED]> wrote:

Hello everyone,

I have posted a rough draft proposal for the project entitled
"Simplify the development of Map/Reduce applications and their
integration with various sources of information."

The draft is located here: http://www.cse.ucsd.edu/~ctrezzo/gsocapplication.html


Any comments/suggestions would be highly appreciated.


just throwing out an idea...

but would it be possible/beneficial to wrap Map/Reduce resources using
SCAs the other way round as well?

for example, take an abstract service which performs some possibly
intensive analytic computation. at smaller scales or during development, the
analytic components might be assembled into a simple web service
running in a single container. at the largest scales, the work may
need to be farmed out to one of a number of clouds.

perhaps an active management layer might be able to make decisions to
route the processing to different, possibly heterogeneous, resources
based on data and meta-data (cloud load, time of day and so on). for
example, during local night these computations might be directed to a
grid formed from general purpose PCs that are used during the day.

(who usually just lurks...)

- robert


Looks pretty good to me!

One comment: I think it would be good to introduce concrete use cases/scenarios to help drive the development of the project, and present them in a sentence or two in the proposal.

You could start with some of the existing Hadoop examples implemented as SCA components, then a slightly bigger application showing the benefits of reusing and wiring components - as counting words in a big document is a little simplistic :) - and the integration of external data sources, or the invocation of SCA services with the output of the map/reduce, for example.

For item (3) you could start by looking at the SCA interface types. SCA components can use local interfaces or remote interfaces on their services and references. Remote interfaces can be invoked across the network; local interfaces require the components to run in the same JVM, classloader, etc. You could start with that and use that info to control where components run on the Hadoop cloud: components with local interfaces would be packaged together, while components with remote interfaces could run on different nodes.
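The packaging rule above can be sketched as a union-find over the wires of a composite. This is a minimal sketch under stated assumptions: the `Wire` record and the `Colocation` class are hypothetical stand-ins for whatever the SCA assembly model actually exposes, and a real implementation would read the remotable flag from the interface definitions rather than take it as input. Every wire through a local interface forces both ends into the same package; remotable wires add no constraint.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: a wire from one component's reference to another
// component's service, flagged by whether its interface is remotable.
record Wire(String from, String to, boolean remotable) {}

final class Colocation {
    private final Map<String, String> parent = new HashMap<>();

    // Union-find lookup with path compression.
    private String find(String c) {
        parent.putIfAbsent(c, c);
        String p = parent.get(c);
        if (!p.equals(c)) {
            p = find(p);
            parent.put(c, p);
        }
        return p;
    }

    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) parent.put(ra, rb);
    }

    // Groups the components that must be packaged together (same JVM).
    // Assumes every wire endpoint is also listed in `components`.
    static List<Set<String>> packages(Collection<String> components, List<Wire> wires) {
        Colocation uf = new Colocation();
        for (String c : components) uf.find(c);
        for (Wire w : wires) {
            // Local interface: both ends must share a JVM and classloader.
            if (!w.remotable()) uf.union(w.from(), w.to());
        }
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String c : components) {
            groups.computeIfAbsent(uf.find(c), k -> new TreeSet<>()).add(c);
        }
        return new ArrayList<>(groups.values());
    }
}
```

Each returned set could then become one deployable unit that the scheduler is free to place on any node, while remotable wires between sets become network calls.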

Then once you have that running you could explore SCA policies, other requirements of your components, etc.

I like Robert's idea too. I can imagine a management layer that analyzes the SCA metadata and the shape and usage of the cloud, and uses some rules to decide the allocation of components and jobs. Sounds really cool!
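A rule-based allocator of the kind described here (and in Robert's routing idea) might look roughly like the following sketch. Everything in it is an assumption for illustration: the `Cluster` and `Job` records, the 64 MB "small job" threshold, the 0.8 load ceiling, the 22:00 to 06:00 night window, and the `"local-container"`/`"queue"` targets are invented, not taken from Hadoop, Tuscany or the proposal. Rules are checked in order and the first match wins.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical metadata types; not part of Hadoop or Tuscany.
record Cluster(String name, double load, boolean desktopGrid) {}
record Job(String name, long inputBytes) {}

final class AllocationRules {
    static final long SMALL_JOB_BYTES = 64L * 1024 * 1024; // assumed threshold
    static final double MAX_LOAD = 0.8;                    // assumed load ceiling

    // Ordered rules: the first rule that yields a target wins.
    static String allocate(Job job, int localHour, List<Cluster> clusters) {
        // Rule 1: small jobs stay in a single local container.
        if (job.inputBytes() < SMALL_JOB_BYTES) return "local-container";

        // Rule 2: at local night (assumed 22:00-06:00), prefer the least-loaded
        // desktop grid, since those PCs are busy with other work during the day.
        boolean night = localHour >= 22 || localHour < 6;
        if (night) {
            Optional<Cluster> grid = clusters.stream()
                    .filter(c -> c.desktopGrid() && c.load() < MAX_LOAD)
                    .min(Comparator.comparingDouble(Cluster::load));
            if (grid.isPresent()) return grid.get().name();
        }

        // Rule 3: otherwise the least-loaded dedicated cloud below the ceiling.
        return clusters.stream()
                .filter(c -> !c.desktopGrid() && c.load() < MAX_LOAD)
                .min(Comparator.comparingDouble(Cluster::load))
                .map(Cluster::name)
                .orElse("queue"); // nothing has capacity: defer the job
    }
}
```

Hard-coding the rules keeps the sketch short; a real management layer would more likely load them from the SCA metadata or a policy file so they can change without redeploying.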


Hope this helps.
--
Jean-Sebastien
