Hey Rishi,

There is no direct hookup between Push Pull and the crawler, because they are 
meant to be independent components. The best way to do this would be to run 
the crawler separately as a daemon, and then use File Guard Met Extraction 
Pre Conditions to prevent it from ingesting any files that don't match a 
particular pattern and/or that haven't been fully downloaded by Push Pull.
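
As a rough sketch of the kind of guard described above: the class below is 
plain Java and is not the OODT crawler or precondition API. The class name 
IngestGuard, the file-name regex, and the "quiet period" heuristic (treating a 
file as fully downloaded once it has not been modified for a while) are all 
assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

/** Hypothetical guard: decides whether a staged file looks safe to ingest. */
public class IngestGuard {
    private final Pattern namePattern;   // file names that are allowed through
    private final Duration quietPeriod;  // minimum time since last modification

    public IngestGuard(String nameRegex, Duration quietPeriod) {
        this.namePattern = Pattern.compile(nameRegex);
        this.quietPeriod = quietPeriod;
    }

    /**
     * Returns true only if the file is a regular file, its name matches the
     * pattern, and it has not been modified within the quiet period.
     */
    public boolean isReady(Path file, Instant now) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        String name = file.getFileName().toString();
        if (!namePattern.matcher(name).matches()) {
            return false;
        }
        Instant lastModified = Files.getLastModifiedTime(file).toInstant();
        // Heuristic assumption: no recent writes means the download finished.
        return Duration.between(lastModified, now).compareTo(quietPeriod) >= 0;
    }
}
```

A crawler-side check could call isReady(path, Instant.now()) for each 
candidate file and skip anything that returns false, leaving it for the next 
crawl pass.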

Cheers,
Chris

On Nov 10, 2012, at 8:06 AM, Verma, Rishi (388J) wrote:

> Hey Brian, Shreyl,
> 
> Thanks for your input and clarification on this.
> 
> Brian - the delegation of duties you described makes sense. Does cas-pushpull 
> have any way to invoke a local crawl process following completion of 
> downloads? I know it has a filemgr hookup, but I wonder whether a crawl 
> process can be invoked following the completion of all file downloads via 
> pushpull. The alternative way of doing this could, of course, be to schedule 
> the crawler daemon to run well after the pushpull daemon finishes its work.
> 
> Thanks to both of you for your help!
> rishi
> 
> On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:
> 
>> 
>> Hey Rishi,
>> 
>> You will need to use both cas-pushpull and cas-crawler to accomplish this...
>> 
>> cas-pushpull: Used for downloading files from remote sites to your local 
>> systems... the .tmp files contain cas-pushpull's known metadata and you can 
>> configure which of the known metadata gets written out or if a .tmp file 
>> gets created at all... however you can add custom metadata fields to it.
>> 
>> cas-crawler: Allows for metadata extraction (custom metadata) from files on 
>> your local system... and then allows you to ingest them into the filemgr 
>> (optionally can be turned off)
>> 
>> HTH
>> -brian
>> 
>> On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" 
>> <[email protected]> wrote:
>> 
>>> Hi All -
>>> 
>>> I'm wondering if anyone has experience with, or knows the details of how to 
>>> use custom MetExtractors on products that are remotely downloaded via 
>>> PushPull. 
>>> 
>>> By default, PushPull performs some basic met-extraction and creates a 
>>> ".tmp" file associated with downloaded products, but I'm wondering whether 
>>> this met generation step is customizable.
>>> 
>>> I've looked through the configuration files (e.g. [1], [2]) as well as the 
>>> code for PushPull, but I can't seem to locate configuration parameters to 
>>> support the invocation of custom met extractors on downloaded data.
>>> 
>>> If any of you have experience with this, or can point me on where to look, 
>>> I'd really appreciate it.
>>> 
>>> Thanks! 
>>> Rishi 
>>> 
>>> --
>>> [1] 
>>> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>>>  
>>> [2] 
>>> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
> 
