Okay,
I have a follow-up question :)
So you run all the steps set out below.
What then? How do people get access to the data?
I've seen a bunch of screenshots of different frontends that run over
the FileManager and allow people to export the files that have been
ingested. Is that the "normal" way of giving people access to the data,
or have users come up with more novel ways of getting their hands on the data?
Cheers
Tom
On 02/03/14 07:18, Chris Mattmann wrote:
Great reply Cam
-----Original Message-----
From: Cameron Goodale <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, February 26, 2014 10:32 PM
To: "[email protected]" <[email protected]>
Subject: Re: Running operations over data
Hey Tom,
TLDR - Crawler ships with some actions, but you can write your own
actions, and those actions can be wired into PreIngestion or
PostIngestion. FileManager has MetExtractors that run before ingestion;
they are traditionally meant to extract metadata (as
the name implies), but you could just as easily have one run a checksum
and store it in the metadata, or convert an incoming file into a PDF and
then ingest the PDF.
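
To make that checksum idea concrete, here is a minimal sketch of the kind
of logic you would drop into a custom met extractor or crawler action. It
is plain Java (the OODT wiring varies by version, so I have left it out),
and the metadata key "ChecksumSHA256" is just a placeholder:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumExample {

    /** Compute a SHA-256 digest of a file and return it as a hex string. */
    static String sha256(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new FileInputStream(file)) {
            byte[] buf = new byte[8192];
            int read;
            while ((read = in.read(buf)) != -1) {
                digest.update(buf, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        File product = new File(args[0]);
        // In a real met extractor you would put this value into the product's
        // Metadata object (e.g. under a key like "ChecksumSHA256") so it ends
        // up in the FileManager catalog next to the rest of the metadata.
        System.out.println("ChecksumSHA256: " + sha256(product));
    }
}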
On the Snow Data System here at JPL we have a lights-out operation that
might be of interest, so I will try to explain it below.
1. Every hour, OODT PushPull wakes up and tries to download new data from
a near-real-time satellite imagery service via FTP
(http://lance-modis.eosdis.nasa.gov/).
2. Every 20 minutes, OODT Crawler wakes up and crawls the local file
staging area where PushPull deposits the downloaded satellite images.
3. When the crawler encounters files that have been downloaded and are
ready for ingestion, things get interesting. During the crawl,
several preconditions need to be met (the file cannot already be in the
catalog, guarding against duplicates; the file has to be of the correct
MIME type; etc.). A rough sketch of the duplicate check is included after this list.
4. If the preconditions pass, Crawler ingests the file(s) into the OODT
FileManager, but things don't stop there.
5. Crawler has a post-ingest success hook that we leverage: we use
the "TriggerPostIngestWorkflow" action, which automatically submits an
event to the Workflow Manager (a sketch of this handoff also follows the list).
6. OODT Workflow Manager receives the event (in this example it would be
"MOD09GANRTIngest") and boils it down into the tasks that get run.
7. Workflow Manager then sends these tasks to the OODT Resource Manager,
which farms the jobs out to batch stubs running across 4 different
machines.
8. When the jobs complete, Crawler ingests the final outputs back
into the FileManager.
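
To make step 3 concrete, here is a rough sketch of how a duplicate-guarding
precondition could query the FileManager catalog. It assumes the XML-RPC
FileManager client on its usual port 9000 and a hasProduct(productName)
style lookup; the exact client class and method names can differ between
OODT versions, so check against yours:

import java.io.File;
import java.net.URL;

import org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient;

public class DuplicateGuardExample {

    public static void main(String[] args) throws Exception {
        File candidate = new File(args[0]);

        // Connect to the FileManager's XML-RPC endpoint (default port assumed).
        XmlRpcFileManagerClient fm =
            new XmlRpcFileManagerClient(new URL("http://localhost:9000"));

        // Products are typically catalogued by file name, so the simplest
        // duplicate guard is: refuse to ingest if a product with this name
        // is already in the catalog.
        if (fm.hasProduct(candidate.getName())) {
            System.out.println("Skipping " + candidate.getName()
                + ": already in the catalog");
        } else {
            System.out.println(candidate.getName() + " is new, safe to ingest");
        }
    }
}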
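
And for steps 5 and 6, the stock "TriggerPostIngestWorkflow" action does the
work for us, but the handoff it performs looks roughly like the sketch below.
The client class and sendEvent call are written from memory of the 0.x
Workflow Manager API, the port is the usual default, and the product name is
a made-up placeholder, so treat it as an illustration rather than the exact
implementation:

import java.net.URL;

import org.apache.oodt.cas.metadata.Metadata;
import org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManagerClient;

public class PostIngestWorkflowTrigger {

    public static void main(String[] args) throws Exception {
        // The Workflow Manager's XML-RPC endpoint (default port assumed).
        XmlRpcWorkflowManagerClient wm =
            new XmlRpcWorkflowManagerClient(new URL("http://localhost:9001"));

        // Metadata describing the product that was just ingested; in a real
        // post-ingest action this would be the product's extracted metadata.
        // The file name below is a made-up placeholder.
        Metadata met = new Metadata();
        met.addMetadata("ProductName", "MOD09GA.A2014057.h08v05.005.NRT.hdf");

        // Raise the event that the Workflow Manager maps to a workflow
        // (e.g. "MOD09GANRTIngest" in the pipeline described above).
        wm.sendEvent("MOD09GANRTIngest", met);
    }
}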
Hope that helps.
Best Regards,
Cameron
On Tue, Feb 25, 2014 at 1:47 PM, Tom Barber
<[email protected]> wrote:
Hello folks,
I'm preparing for this talk, so I figure I should probably work out how OODT
works... ;)
Anyway, I have some ideas as to how to integrate some more non-science-like
tools into OODT, but I'm still figuring out some of the components,
namely workflows.
If, for example, in the OODT world I wanted to ingest a bunch of data and
perform some operation on it, does this happen during the ingest phase,
or post-ingest?
Normally, I guess, you guys would write some crazy scientific stuff to
analyse the data you're ingesting and then dump it into the catalog in
some different format. Does that sound about right?
Thanks
Tom
--
Tom Barber | Technical Director
meteorite bi
T: +44 20 8133 3730
W: www.meteorite.bi | Skype: meteorite.consulting
A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK
--
Sent from a Tin Can attached to a String
--
Tom Barber | Technical Director
meteorite bi
T: +44 20 8133 3730
W: www.meteorite.bi | Skype: meteorite.consulting
A: Surrey Technology Centre, Surrey Research Park, Guildford, GU2 7YG, UK