Hey Brian,

That sounds pretty reasonable. Thanks for your help on this!

rishi

On Nov 9, 2012, at 12:07 PM, Brian Foster wrote:

Hey Rishi,

The filemgr connection from the pushpull is only there to check whether the 
filemgr already has a file, so the pushpull doesn't redownload files (it has no 
ingest support). Usually you configure your pushpull daemon to run at longer 
intervals, while the crawler wakes up more often (every 30 seconds is a typical 
interval for it). So just have the pushpull download its files to a staging 
area that is the same directory the crawler is monitoring.
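
The polling pattern described above can be sketched roughly as follows. This is only an illustration in Python, not OODT code: the function names, the in-memory `already_ingested` set (standing in for asking the filemgr whether it already has a file), and the choice to skip files ending in ".tmp" as metadata sidecars are all assumptions made for the sketch.

```python
import os
import time


def find_new_files(staging_dir, already_ingested, skip_suffixes=(".tmp",)):
    """Return sorted file names in staging_dir that have not been ingested yet.

    Files ending in one of skip_suffixes are treated as metadata sidecars
    (an assumption of this sketch) and are never returned as products.
    """
    new_files = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if not os.path.isfile(path):
            continue
        if name.endswith(skip_suffixes):
            continue
        if name in already_ingested:
            continue
        new_files.append(name)
    return new_files


def crawl_once(staging_dir, already_ingested, ingest):
    """Hand every new file to the ingest callback and remember it."""
    for name in find_new_files(staging_dir, already_ingested):
        ingest(os.path.join(staging_dir, name))
        already_ingested.add(name)


def run_crawler(staging_dir, ingest, interval_seconds=30, max_cycles=None):
    """Wake up every interval_seconds and crawl the staging directory.

    max_cycles=None loops forever; a number stops after that many passes.
    """
    already_ingested = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        crawl_once(staging_dir, already_ingested, ingest)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
    return already_ingested
```

Because the crawler in this sketch only picks up whatever is in the directory at each wake-up, the downloader can run on its own, longer schedule without the two needing to know about each other.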

-brian

On Nov 09, 2012, at 11:06 AM, "Verma, Rishi (388J)" 
<[email protected]> wrote:

Hey Brian, Shreyl,

Thanks for your input and clarification on this.

Brian - the delegation of duties you described makes sense. Does cas-pushpull 
have any way to invoke a local crawl process once its downloads complete? 
I know it has a filemgr hookup, but I wonder whether a crawl process can 
be invoked after all file downloads via pushpull have finished. The 
alternative, of course, would be to schedule the crawler 
daemon to run well after the pushpull daemon finishes its work.

Thanks to both of you for your help!
rishi

On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:


Hey Rishi,

You will need to use both cas-pushpull and cas-crawler to accomplish this...

cas-pushpull: Used for downloading files from remote sites to your local 
systems... the .tmp files contain cas-pushpull's known metadata and you can 
configure which of the known metadata gets written out or if a .tmp file gets 
created at all... however you can add custom metadata fields to it.

cas-crawler: Allows for metadata extraction (custom metadata) from files on 
your local system... and then allows you to ingest them into the filemgr 
(ingestion can optionally be turned off)
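
As a rough illustration of that split, the sketch below merges metadata recorded at download time with custom fields extracted from the local file. It is plain Python, not the OODT MetExtractor interface: the function names and every metadata key (`Filename`, `FileSize`, `Extension`, `RemoteSite`) are hypothetical and chosen only for the example.

```python
import os


def extract_custom_metadata(path):
    """Hypothetical custom extractor: derive fields from the local file."""
    name = os.path.basename(path)
    _, ext = os.path.splitext(name)
    return {
        "Filename": name,
        "FileSize": os.path.getsize(path),
        "Extension": ext.lstrip("."),
    }


def build_product_metadata(path, download_metadata):
    """Combine download-time metadata with custom extracted metadata.

    download_metadata stands in for whatever the download step recorded;
    custom fields win if both sides define the same key.
    """
    merged = dict(download_metadata)
    merged.update(extract_custom_metadata(path))
    return merged
```

Calling `build_product_metadata(path, {"RemoteSite": "example"})` would return the download-time key alongside the three extracted ones, which is the shape of record a later ingest step could consume.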

HTH
-brian

On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" 
<[email protected]> wrote:

Hi All -

I'm wondering if anyone has experience with, or knows the details of how to use 
custom MetExtractors on products that are remotely downloaded via PushPull.

By default, PushPull performs some basic met-extraction and creates a ".tmp" 
file associated with downloaded products, but I'm wondering whether this met 
generation step is customizable.

I've looked through the configuration files (e.g. [1], [2]) as well as the code 
for PushPull, but I can't seem to locate configuration parameters to support 
the invocation of custom met extractors on downloaded data.

If any of you have experience with this, or can point me to where to look, I'd 
really appreciate it.

Thanks!
Rishi

--
[1] 
http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties

[2] 
http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/

