Great reply Cam

-----Original Message-----
From: Cameron Goodale <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, February 26, 2014 10:32 PM
To: "[email protected]" <[email protected]>
Subject: Re: Running operations over data

>Hey Tom,
>
>TLDR - Crawler ships with some actions, but you can write your own
>actions, and those actions can be wired into PreIngestion or
>PostIngestion. FileManager has MetExtractors that run before ingestion,
>they traditionally are meant to extract metadata (as the name implies)
>but you could just as easily have it run a checksum and store it in
>metadata, or convert an incoming file into PDF, then ingest the PDF.
>
>On the Snow Data System here at JPL we have a lights out operation that
>might be of interest, so I will try to explain it below.
>
>1. Every hour OODT PushPull wakes up and tries to download new data from
>   a Near Real Time Satellite Imagery service via FTP
>   (http://lance-modis.eosdis.nasa.gov/)
>2. Every 20 minutes OODT Crawler wakes up and crawls a local file
>   staging area where PushPull downloads Satellite Images
>3. When the crawler encounters files that have been downloaded and are
>   ready for ingestion then things get interesting. During the crawl
>   several pre-conditions need to be met (the file cannot already be in
>   the catalog - guarding against duplicates, the file has to be of the
>   correct mime-type, etc..)
>4. If preconditions pass then Crawler will ingest the file(s) into OODT
>   FileManager, but things don't stop here.
>5. Crawler has a post-ingest success hook that we leverage and we use
>   the "TriggerPostIngestWorkflow" action which automatically submits an
>   event to workflow
>6. OODT Workflow Manager receives the event (in this example it would be
>   "MOD09GANRTIngest") and it boils that down into tasks that get run.
>7. Workflow Manager then sends these tasks to the OODT Resource Manager
>   who farms the jobs off to Batchstubs that are running across 4
>   different machines.
>8. When the jobs complete, crawler will ingest the final outputs back
>   into the FileManager.
>
>Hope that helps.
>
>Best Regards,
>
>Cameron
>
>On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber <[email protected]> wrote:
>
>Hello folks,
>
>Preparing for this talk, so I figure I should probably work out how OODT
>works..... ;)
>
>Anyway I have some ideas as how to integrate some more non science like
>tools into OODT but I'm still figuring out some of the components.
>Namely, workflows.
>
>If for example, in OODT world I wanted to ingest a bunch of data and
>perform some operation on them, does this happen during the ingest phase,
>or post ingest?
>
>Normally you guys would write some crazy scientific stuff I guess to
>analyse the data you're ingesting and then dump it in some different
>format into the catalog, does that sound about right?
>
>Thanks
>
>Tom
>--
>Tom Barber | Technical Director
>
>meteorite bi
>T: +44 20 8133 3730
>W: www.meteorite.bi <http://www.meteorite.bi> | Skype: meteorite.consulting
>A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
>
>--
>
>Sent from a Tin Can attached to a String
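
For anyone skimming the thread, steps 2 to 6 of the quoted reply (crawl, check preconditions, ingest, fire a post-ingest action that submits a workflow event) can be sketched roughly as below. This is a plain in-memory Python illustration, not the OODT API: the Catalog class, the crawl function, the trigger callback and the file names are all made up for the example, and a file-suffix check stands in for the real mime-type precondition. Only the event name "MOD09GANRTIngest" comes from the message itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for the FileManager catalog."""
    products: Dict[str, dict] = field(default_factory=dict)

    def has(self, name: str) -> bool:
        return name in self.products

    def ingest(self, name: str, metadata: dict) -> None:
        self.products[name] = metadata


def crawl(
    staged: List[str],
    catalog: Catalog,
    allowed_suffixes: Tuple[str, ...],
    post_ingest_actions: List[Callable[[str], None]],
) -> List[str]:
    """Ingest each staged file that passes the preconditions, then run
    the post-ingest actions for it. Returns the names that were ingested."""
    ingested = []
    for name in staged:
        # Precondition 1: skip files already in the catalog (no duplicates).
        if catalog.has(name):
            continue
        # Precondition 2: suffix check, standing in for a mime-type check.
        if not name.endswith(allowed_suffixes):
            continue
        catalog.ingest(name, {"source": "staging"})
        # Post-ingest success hook: each action sees the ingested file.
        for action in post_ingest_actions:
            action(name)
        ingested.append(name)
    return ingested


if __name__ == "__main__":
    catalog = Catalog()
    events: List[Tuple[str, str]] = []  # stand-in for the workflow event queue

    def trigger_workflow(name: str) -> None:
        # Assumed behaviour: submit one named event per ingested file.
        events.append(("MOD09GANRTIngest", name))

    done = crawl(["granule1.hdf", "notes.txt"], catalog, (".hdf",), [trigger_workflow])
    print(done, events)
```

A pre-ingest action (a checksum, a format conversion) would slot in the same way, as a list of callbacks run just before catalog.ingest.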
