Hey Jean-Sebastien,
Thanks a lot for the input! I will update my proposal, and submit it
to the GSOC web app.
I am about to start the drive from the Bay Area down to San Diego, so
I will not have access to the internet again until late tonight.
The management layer sounds like a great idea. Definitely something I
would like to work on in the future. I will investigate both approaches.
I will also check out skeleton and ORC.
Thanks again,
Chris
On Mar 30, 2008, at 4:12 PM, Jean-Sebastien Delfino wrote:
Chris Trezzo wrote:
This is a great idea. I think, if I understand you correctly,
adding a fourth implementation type is intended to address this.
The extra type is going to act something like an orchestrator,
trying to intelligently manage the Map, Combine, and Reduce
functions over distributed computing facilities and heterogeneous
data sources. Like you said, this component could shape the
deployment of computations based on things like cloud load, time of
day, locality of resources and so on.
I should probably make this clearer in my proposal.
Thank you for the comment/suggestion!
Chris
On Mar 30, 2008, at 12:50 AM, Robert Burrell Donkin wrote:
On Sun, Mar 30, 2008 at 5:06 AM, Chris Trezzo <[EMAIL PROTECTED]>
wrote:
Hello everyone,
I have posted a rough draft proposal for the project entitled
"Simplify the development of Map/Reduce applications and their
integration with various sources of information."
The draft is located here: http://www.cse.ucsd.edu/~ctrezzo/gsocapplication.html
Any comments/suggestions would be highly appreciated.
just throwing out an idea...
but would it be possible/beneficial to wrap Map/Reduce resources using
SCAs the other way round as well?
for example, take an abstract service which performs some possibly
intensive analytic computation. at smaller scales or in development,
the analytic components might be assembled into a simple web service
running on a single container. at the largest scales, the work may
need to be farmed out to one of a number of clouds.
perhaps an active management layer might be able to make decisions to
route the processing to different, possibly heterogeneous, resources
based on data and meta-data (cloud load, time of day and so on). for
example, during local night these computations might be directed to a
grid formed from general-purpose PCs used during the day.
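A rough sketch of the routing rule described above, under loud assumptions: the Resource class, the resource names, the 0.8 load threshold, and the 22:00 to 06:00 night window are all hypothetical and do not come from any SCA or Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical description of a place where a job could run."""
    name: str
    load: float               # fraction of capacity in use, 0.0 to 1.0
    night_only: bool = False  # e.g. a grid of office PCs that is idle at night


def is_night(hour):
    # Assumed night window: 22:00 to 06:00 local time.
    return hour >= 22 or hour < 6


def choose_resource(resources, hour, max_load=0.8):
    """Pick the least-loaded resource that is usable at the given hour."""
    candidates = [
        r for r in resources
        if r.load < max_load and (not r.night_only or is_night(hour))
    ]
    if not candidates:
        return None  # nothing suitable; the caller decides what to do
    return min(candidates, key=lambda r: r.load)


# Hypothetical resources, for illustration only.
resources = [
    Resource("single-container", load=0.9),
    Resource("cloud-a", load=0.5),
    Resource("office-grid", load=0.1, night_only=True),
]

daytime_choice = choose_resource(resources, hour=14)
night_choice = choose_resource(resources, hour=2)
```

With these made-up numbers the office grid is only considered at night, so the same request would go to different resources depending on the hour.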
(who usually just lurks...)
- robert
Looks pretty good to me!
One comment: I think it would be good to introduce concrete use
cases / scenarios to help drive the development of the project, and
present them in a sentence or two in the proposal.
You could start with some of the existing Hadoop examples implemented
as SCA components, then a slightly bigger application showing the
benefits of reusing and wiring components - as counting words in a
big document is a little simplistic :) - and the integration of
external data sources, or invocation of SCA services with the output
of the map/reduce for example.
For item (3) you could start by looking at the SCA interface types.
SCA components can use local interfaces or remote interfaces on
their services and references. Remote interfaces can cross a network
boundary; local interfaces require the components to run in the
same JVM, classloader etc. You could start with that and use that
info to control where components run on the Hadoop cloud: components
with local interfaces would be packaged together, while components
with remote interfaces could run on different nodes.
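To make the packaging idea concrete, here is a minimal sketch that groups components joined by local interfaces into units that would have to be deployed together. The component names and the (source, target, kind) wire format are assumptions made up for illustration; this is not Tuscany or Hadoop API.

```python
from collections import defaultdict


def colocation_groups(components, wires):
    """Group components that must share a JVM because a local interface links them.

    components: iterable of component names (hypothetical).
    wires: iterable of (source, target, kind), where kind is "local" or "remote".
    """
    # Union-find: every component starts in its own group.
    parent = {c: c for c in components}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Only local wires force two components into the same group.
    for source, target, kind in wires:
        if kind == "local":
            parent[find(source)] = find(target)

    groups = defaultdict(set)
    for c in parent:
        groups[find(c)].add(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical assembly, for illustration only.
components = ["Tokenizer", "Mapper", "Combiner", "Reducer", "ReportService"]
wires = [
    ("Tokenizer", "Mapper", "local"),
    ("Mapper", "Combiner", "local"),
    ("Combiner", "Reducer", "remote"),
    ("Reducer", "ReportService", "remote"),
]

groups = colocation_groups(components, wires)
```

Each resulting group is a candidate unit to package onto one node; groups connected only by remote wires are free to land on different nodes.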
Then once you have that running you could explore SCA policies,
other requirements of your components etc.
I like Robert's idea too, I can imagine a management layer that
analyzes the SCA metadata, the shape and usage of the cloud and uses
some rules to decide the allocation of components and jobs. Sounds
really cool!
Hope this helps.
--
Jean-Sebastien
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]