Agreed. A first cut could be integrating Oozie and adding the gap functionality. If that works well, great ... am sure we can also contribute the required functionality back to Oozie .... If there are no volunteers, let me start digging through Oozie and propose a framework spanning the Orchestration-Automation-Provisioning-Configuration layers ...
Cheers
<k/>

On Fri, Oct 29, 2010, "Tom White (JIRA)" <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/WHIRR-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926400#action_12926400 ]
>
> Tom White commented on WHIRR-119:
> ---------------------------------
>
> I think Oozie (http://yahoo.github.com/oozie/) provides a lot of what you
> describe. It would be great to have Whirr able to run an Oozie service that
> integrates with the Hadoop service.
>
>> Job Submission and dynamic provisioning framework for Hadoop Clouds
>> -------------------------------------------------------------------
>>
>>                 Key: WHIRR-119
>>                 URL: https://issues.apache.org/jira/browse/WHIRR-119
>>             Project: Whirr
>>          Issue Type: New Feature
>>          Components: core
>>    Affects Versions: 0.2.0
>>            Reporter: Krishna Sankar
>>
>> A thin framework that can submit an MR job, run it and report results. Some thoughts:
>> # Most probably it will be a server-side daemon
>> # JSON over HTTP with REST semantics
>> # Functions - top level, preliminary
>> ## Accept a job and its components at a well-known URL
>> ## Parse & create the MR workflow
>> ## Create & store a job context - ID, security artifacts et al.
>> ## Return a status URL (can be used to query status or kill the job). This is the REST model
>> ## Run the job (might include dynamic elastic cloud provisioning, for example OpenStack)
>> ## As the job runs, collect status and store it in the job context
>> ## If the client queries, return status
>> ## Once the job is done, store the status and return results (most probably pointers to files and so forth)
>> ## Calculate & store performance metrics
>> ## Calculate & store charge-back in generic units (e.g. CPU, memory, network, storage)
>> ## As and when the client asks, return job results
>> # Some thoughts on implementation
>> ## Store context et al. in HBase
>> ## A Clojure implementation?
>> ## Packaging like OVF? (with embedded pointers to VM, data and so forth)
>> ## For the 1st release, assume a homogeneous Hadoop infrastructure in a cloud
>> ## Custom reporter/context counters?
>> ## Distributed cache for framework artifacts and runtime monitoring?
>> ## Most probably will have to use a task runner?
>> ## Extend classes with submission framework setup and teardown code?
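To make the submit / status-URL / query / kill flow from the issue concrete, here is a minimal in-memory sketch. Everything in it is an illustrative assumption, not part of WHIRR-119 or Whirr: the base URL, field names, and the dict-backed store (which the issue suggests would actually be HBase behind a JSON-over-HTTP REST front end).

```python
import uuid

class JobService:
    """Illustrative sketch of the REST job-submission flow from WHIRR-119.

    All names and paths here are assumptions for the sketch; a real
    implementation would expose these operations as HTTP endpoints and
    persist the job context (the issue suggests HBase).
    """

    def __init__(self, base_url="http://jobserver.example/jobs"):
        self.base_url = base_url
        self.jobs = {}  # job-context store, keyed by job ID

    def submit(self, job_spec):
        """Accept a job at a well-known URL; return a status URL."""
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {
            "id": job_id,
            "spec": job_spec,
            "state": "RUNNING",
            "metrics": {},          # performance metrics, filled in as it runs
            "chargeback": {},       # generic units: cpu, memory, network, storage
            "result_pointers": [],  # e.g. HDFS paths, returned once done
        }
        return f"{self.base_url}/{job_id}/status"

    def status(self, job_id):
        """Query status (the status URL from submit maps here)."""
        return self.jobs[job_id]["state"]

    def kill(self, job_id):
        """The status URL can also be used to kill the job."""
        self.jobs[job_id]["state"] = "KILLED"

    def complete(self, job_id, result_pointers, chargeback):
        """Once the job is done, store status, results and charge-back."""
        self.jobs[job_id].update(state="SUCCEEDED",
                                 result_pointers=result_pointers,
                                 chargeback=chargeback)

    def results(self, job_id):
        """As and when the client asks, return job results (pointers)."""
        ctx = self.jobs[job_id]
        return ctx["result_pointers"] if ctx["state"] == "SUCCEEDED" else None
```

A client would hold only the returned status URL, polling it until the state flips and then fetching the result pointers - keeping the server-side daemon stateless toward the client, per the REST semantics the issue calls for.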
