Hey Tom,

TL;DR: Crawler ships with some actions, but you can write your own, and those actions can be wired into PreIngestion or PostIngestion. FileManager has MetExtractors that run before ingestion. Traditionally they are meant to extract metadata (as the name implies), but you could just as easily have one run a checksum and store it in metadata, or convert an incoming file into PDF and then ingest the PDF.
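
To make the "run a checksum and store it in metadata" idea concrete, here is a rough, language-neutral sketch in Python. It is not the OODT API (real MetExtractors and crawler actions are Java classes wired in through configuration); the names `checksum_extractor`, `filename_extractor`, `extract_before_ingest` and the metadata keys are all made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List

Metadata = Dict[str, str]
# Hypothetical extractor shape: takes a file path, returns metadata fields to add.
Extractor = Callable[[Path], Metadata]


def checksum_extractor(path: Path) -> Metadata:
    """Hash the file in chunks and return the digest as metadata fields."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # Key names are assumptions, not OODT-defined metadata keys.
    return {"Checksum": digest.hexdigest(), "ChecksumAlgorithm": "SHA-256"}


def filename_extractor(path: Path) -> Metadata:
    """The traditional kind of extraction: describe the file itself."""
    return {"Filename": path.name}


def extract_before_ingest(path: Path, extractors: List[Extractor]) -> Metadata:
    """Run every extractor before ingestion and merge the results."""
    metadata: Metadata = {}
    for extractor in extractors:
        metadata.update(extractor(path))
    return metadata
```

Calling `extract_before_ingest(Path("some_file"), [filename_extractor, checksum_extractor])` would give you one metadata dictionary to hand to the catalog along with the file; the point is only that anything you can compute before ingestion can ride along as metadata.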

On the Snow Data System here at JPL we have a lights-out operation that might be of interest, so I will try to explain it below.

1. Every hour OODT PushPull wakes up and tries to download new data from a Near Real Time Satellite Imagery service via FTP (http://lance-modis.eosdis.nasa.gov/).
2. Every 20 minutes OODT Crawler wakes up and crawls a local file staging area where PushPull downloads satellite images.
3. When the crawler encounters files that have been downloaded and are ready for ingestion, things get interesting. During the crawl several preconditions need to be met (the file cannot already be in the catalog, which guards against duplicates; the file has to be of the correct MIME type; etc.).
4. If the preconditions pass, Crawler will ingest the file(s) into OODT FileManager, but things don't stop here.
5. Crawler has a post-ingest success hook that we leverage: we use the "TriggerPostIngestWorkflow" action, which automatically submits an event to the Workflow Manager.
6. OODT Workflow Manager receives the event (in this example it would be "MOD09GANRTIngest") and boils it down into tasks that get run.
7. Workflow Manager then sends these tasks to the OODT Resource Manager, which farms the jobs off to Batchstubs that are running across 4 different machines.
8. When the jobs complete, Crawler will ingest the final outputs back into the FileManager.

Hope that helps.

Best Regards,

Cameron

On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber <[email protected]> wrote:

> Hello folks,
>
> Preparing for this talk, so I figure I should probably work out how OODT
> works..... ;)
>
> Anyway I have some ideas as how to integrate some more non science like
> tools into OODT but I'm still figuring out some of the components. Namely,
> workflows.
>
> If for example, in OODT world I wanted to ingest a bunch of data and
> perform some operation on them, does this happen during the ingest phase,
> or post ingest?
>
> Normally you guys would write some crazy scientific stuff I guess to
> analyse the data you're ingesting and then dump it in some different format
> into the catalog, does that sound about right?
>
> Thanks
>
> Tom
> --
> *Tom Barber* | Technical Director
>
> meteorite bi
> *T:* +44 20 8133 3730
> *W:* www.meteorite.bi | *Skype:* meteorite.consulting
> *A:* Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG,
> UK
>

--
Sent from a Tin Can attached to a String
