Chris Trezzo wrote:
This is a great idea. I think, if I understand you correctly, adding a fourth implementation type is intended to address this. The extra type is going to act something like an orchestrator, trying to intelligently manage the Map, Combine, and Reduce functions over distributed computing facilities and heterogeneous data sources. Like you said, this component could shape the deployment of computations based on things like cloud load, time of day, locality of resources and so on.

I should probably make this more clear in my proposal.

Thank you for the comment/suggestion!

Chris


On Mar 30, 2008, at 12:50 AM, Robert Burrell Donkin wrote:

On Sun, Mar 30, 2008 at 5:06 AM, Chris Trezzo <[EMAIL PROTECTED]> wrote:
Hello everyone,

I have posted a rough draft proposal for the project entitled
"Simplify the development of Map/Reduce applications and their
integration with various sources of information."

The draft is located here: http://www.cse.ucsd.edu/~ctrezzo/gsocapplication.html

Any comments/suggestions would be highly appreciated.

just throwing out an idea...

but would it be possible/beneficial to wrap Map/Reduce resources using
SCAs the other way round as well?

for example, take an abstract service which performs some possibly
intensive analytic computation. at smaller scales or development, the
analytic components might be assembled into a simple web service
running on a single container. at the largest scales, the work may
need to be farmed out to one of a number of clouds.

perhaps an active management layer might be able to make decisions to
route the processing to different, possibly heterogeneous, resources
based on data and meta-data (cloud load, time of day and so on). for
example, during local night these computations might be directed to a
grid formed on general purpose PCs used during the day.
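
A minimal sketch of the routing decision described above, in Python purely for illustration. Every name here (Resource, is_available, choose_resource, the max_load threshold, the example resources and their numbers) is hypothetical and not part of SCA, Hadoop, or the proposal; it only shows one way meta-data such as load and time of day could drive the choice.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Tuple


@dataclass(frozen=True)
class Resource:
    """A place where the computation could run (hypothetical model)."""
    name: str
    load: float                                   # fraction of capacity in use, 0.0 to 1.0
    window: Optional[Tuple[time, time]] = None    # daily availability; None means always


def is_available(resource: Resource, now: time) -> bool:
    """True if `now` falls inside the resource's daily availability window."""
    if resource.window is None:
        return True
    start, end = resource.window
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 20:00 to 06:00.
    return now >= start or now < end


def choose_resource(resources, now: time, max_load: float = 0.8):
    """Pick the least-loaded available resource, or None if none qualifies."""
    candidates = [r for r in resources
                  if is_available(r, now) and r.load < max_load]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.load)


# Assumed example data: a cloud that is always up, and an office PC grid
# that is only free during local night.
resources = [
    Resource("cloud", load=0.6),
    Resource("office-pc-grid", load=0.1, window=(time(20, 0), time(6, 0))),
]
```

With this example data, choose_resource(resources, time(23, 0)) picks the office grid, while the same call at time(12, 0) falls back to the cloud because the grid is outside its window.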

(who usually just lurks...)

- robert


Looks pretty good to me!

One comment: I think it would be good to introduce concrete use cases / scenarios to help drive the development of the project, and present them in a sentence or two in the proposal.

You could start with some of the existing Hadoop examples implemented as SCA components, then move to a slightly bigger application showing the benefits of reusing and wiring components (counting words in a big document is a little simplistic :) ), and then to the integration of external data sources, or the invocation of SCA services with the output of the map/reduce, for example.

For item (3) you could start by looking at the SCA interface types. SCA components can use local interfaces or remote interfaces on their services and references. Remote interfaces can cross a network boundary; local interfaces require the components to run in the same JVM, classloader etc. You could start with that and use that info to control where components run on the Hadoop cloud: components with local interfaces would be packaged together, while components with remote interfaces could run on different nodes.
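
As a rough illustration of that packaging rule, here is a small Python sketch that groups components connected by local interfaces so each group can be deployed together. It assumes the wiring has already been extracted into (source, target, kind) tuples; that representation, the function name colocation_groups, and the example component names are all made up for this sketch and are not an SCA or Hadoop API.

```python
def colocation_groups(components, wires):
    """Group components that must share a JVM.

    components: iterable of component names.
    wires: iterable of (source, target, kind), where kind is "local" or
           "remote" (an assumed, simplified view of the SCA wiring).
    Returns a sorted list of sorted name lists, one per deployable group.
    """
    components = list(components)
    parent = {c: c for c in components}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for source, target, kind in wires:
        if kind == "local":
            # A local interface forces both ends into the same group.
            parent[find(source)] = find(target)
        # Remote wires impose no co-location constraint.

    groups = {}
    for c in components:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical word-count style assembly.
components = ["Tokenizer", "Counter", "Merger", "Reporter"]
wires = [
    ("Tokenizer", "Counter", "local"),
    ("Counter", "Merger", "remote"),
    ("Merger", "Reporter", "local"),
]
groups = colocation_groups(components, wires)
```

Here Tokenizer and Counter end up in one group and Merger and Reporter in another, because the only wire between the two pairs is remote. Each group could then be packaged as one unit and scheduled on any node.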

Then once you have that running you could explore SCA policies, other requirements of your components etc.

I like Robert's idea too. I can imagine a management layer that analyzes the SCA metadata and the shape and usage of the cloud, and uses some rules to decide the allocation of components and jobs. Sounds really cool!

Hope this helps.
--
Jean-Sebastien
