Chris Trezzo wrote:
This is a great idea. I think, if I understand you correctly, adding a
fourth implementation type is intended to address this. The extra type
is going to act something like an orchestrator, trying to intelligently
manage the Map, Combine, and Reduce functions over distributed computing
facilities and heterogeneous data sources. Like you said, this component
could shape the deployment of computations based on things like cloud
load, time of day, locality of resources and so on.
I should probably make this clearer in my proposal.
Thank you for the comment/suggestion!
Chris
On Mar 30, 2008, at 12:50 AM, Robert Burrell Donkin wrote:
On Sun, Mar 30, 2008 at 5:06 AM, Chris Trezzo <[EMAIL PROTECTED]> wrote:
Hello everyone,
I have posted a rough draft proposal for the project entitled
"Simplify the development of Map/Reduce applications and their
integration with various sources of information."
The draft is located here:
http://www.cse.ucsd.edu/~ctrezzo/gsocapplication.html
Any comments/suggestions would be highly appreciated.
just throwing out an idea...
but would it be possible/beneficial to wrap Map/Reduce resources using
SCAs the other way round as well?
for example, take an abstract service which performs some possibly
intensive analytic computation. at smaller scales or development, the
analytic components might be assembled into a simple web service
running on a single container. at the largest scales, the work may
need to be farmed out to one of a number of clouds.
perhaps an active management layer might be able to make decisions to
route the processing to different, possibly heterogeneous resources
based on data and meta-data (cloud load, time of day and so on). for
example, during local night these computations might be directed to a
grid formed on general purpose PCs used during the day.
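
A rough sketch of the kind of routing rule such a management layer might apply. Everything here is hypothetical and for illustration only: the Resource class, the load and night_only fields, the 22:00 to 06:00 night window and the 0.8 load threshold are assumptions, not part of SCA, Hadoop or the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """A hypothetical compute resource known to the management layer."""
    name: str
    load: float               # fraction of capacity in use, 0.0 to 1.0
    night_only: bool = False  # e.g. a grid of desktop PCs that is idle at night


def is_night(hour: int) -> bool:
    # Assumed local night window: 22:00 to 05:59.
    return hour >= 22 or hour < 6


def choose_resource(resources: List[Resource], hour: int,
                    max_load: float = 0.8) -> Optional[Resource]:
    """Pick the least loaded resource that is usable at the given hour."""
    candidates = [
        r for r in resources
        if r.load < max_load and (not r.night_only or is_night(hour))
    ]
    if not candidates:
        return None  # nothing suitable; a caller might queue the job instead
    return min(candidates, key=lambda r: r.load)


resources = [
    Resource("cloud-a", load=0.7),
    Resource("desktop-grid", load=0.1, night_only=True),
]
daytime_choice = choose_resource(resources, hour=14)
night_choice = choose_resource(resources, hour=23)
```

With these made-up numbers the daytime job goes to cloud-a, because the desktop grid is not available, and the night job goes to the desktop grid, because it is the least loaded. A real layer would presumably read load and availability from monitoring data rather than from constants.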
(who usually just lurks...)
- robert
Looks pretty good to me!
One comment: I think it would be good to introduce concrete use cases /
scenarios to help drive the development of the project, and present them
in a sentence or two in the proposal.
You could start with some of the existing Hadoop examples implemented as
SCA components, then a slightly bigger application showing the benefits
of reusing and wiring components - as counting words in a big document
is a little simplistic :) - and the integration of external data
sources, or invocation of SCA services with the output of the map/reduce
for example.
For item (3) you could start by looking at the SCA interface types. SCA
components can use local interfaces or remote interfaces on their
services and references. Remote interfaces can cross a network
boundary, whereas local interfaces require the components to run in the
same JVM, classloader etc. You could start with that and use that info
to control where components run on the Hadoop cloud: components with
local interfaces would be packaged together, while components with
remote interfaces could run on different nodes.
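
To make the packaging idea concrete, here is a minimal sketch that groups components into co-location units: anything connected through a local interface ends up in the same group, while remote wires place no constraint. The component names, the (source, target, kind) wire tuples and the colocation_groups function are invented for this example and are not an SCA or Hadoop API.

```python
from collections import defaultdict


def colocation_groups(components, wires):
    """Group components that must be packaged together.

    components: iterable of component names.
    wires: iterable of (source, target, kind) tuples, where kind is
           "local" or "remote" (a hypothetical simplification of the
           interface metadata).
    """
    parent = {name: name for name in components}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for source, target, kind in wires:
        if kind == "local":
            # A local interface forces both ends into the same JVM.
            parent[find(source)] = find(target)

    groups = defaultdict(set)
    for name in parent:
        groups[find(name)].add(name)
    # Sorted only to make the result deterministic.
    return sorted(sorted(group) for group in groups.values())


components = ["Tokenizer", "WordMapper", "SumReducer", "ReportService"]
wires = [
    ("Tokenizer", "WordMapper", "local"),
    ("WordMapper", "SumReducer", "remote"),
    ("SumReducer", "ReportService", "remote"),
]
groups = colocation_groups(components, wires)
```

In this made-up assembly, Tokenizer and WordMapper share a local wire and so form one deployable unit, while SumReducer and ReportService each stand alone and could be scheduled on different nodes.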
Then once you have that running you could explore SCA policies, other
requirements of your components etc.
I like Robert's idea too, I can imagine a management layer that analyzes
the SCA metadata, the shape and usage of the cloud and uses some rules
to decide the allocation of components and jobs. Sounds really cool!
Hope this helps.
--
Jean-Sebastien