Hey Rishi,

There is no direct hookup between PushPull and the crawler; we kept them separate so that they would remain independent components. So the best way to do this would be to run the crawler separately as a daemon, and then use File Guard met extraction preconditions to prevent ingesting any files that don't match a particular pattern and/or that haven't been fully downloaded by PushPull.
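A minimal sketch of the kind of guard described above. It is written as a standalone Python function, not as an OODT precondition class, and it only illustrates the logic: a file is accepted only if its name matches a pattern, no in-progress marker sits next to it, and (optionally) it has not been modified for a settle period. The `.part` marker suffix, the settle period, and the function name are all hypothetical; how PushPull actually signals an incomplete download depends on how it is configured.

```python
import re
import time
from pathlib import Path

# Hypothetical marker convention: while "foo.hdf" is still downloading,
# a sibling file "foo.hdf.part" exists next to it.
PARTIAL_SUFFIX = ".part"


def is_ready_for_ingest(path: Path, pattern: str, settle_seconds: float = 0.0) -> bool:
    """Return True if `path` looks safe to hand to the crawler for ingest."""
    # Reject files whose names don't match the expected product pattern.
    if not re.fullmatch(pattern, path.name):
        return False
    # Reject anything that isn't an existing regular file.
    if not path.is_file():
        return False
    # Reject files that still have an in-progress marker next to them.
    if path.with_name(path.name + PARTIAL_SUFFIX).exists():
        return False
    # Optionally require the file to be untouched for `settle_seconds`,
    # a heuristic for "the writer has finished".
    if settle_seconds > 0:
        age = time.time() - path.stat().st_mtime
        if age < settle_seconds:
            return False
    return True
```

Relying on the settle period alone corresponds roughly to scheduling the crawler well after downloads finish; the marker check is stricter, but only if the downloader can be made to produce such a marker.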
Cheers,
Chris

On Nov 10, 2012, at 8:06 AM, Verma, Rishi (388J) wrote:

> Hey Brian, Shreyl,
>
> Thanks for your input and clarification on this.
>
> Brian - the delegation of duties you described makes sense. Does cas-pushpull
> have any way to invoke a local crawl process once downloads complete? I know
> it has a filemgr hookup, but I wonder whether a crawl process can be invoked
> after all file downloads via pushpull have completed. The alternative, of
> course, would be to schedule the crawler daemon to run well after the
> pushpull daemon finishes its work.
>
> Thanks to both of you for your help!
> rishi
>
> On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:
>
>> Hey Rishi,
>>
>> You will need to use both cas-pushpull and cas-crawler to accomplish this...
>>
>> cas-pushpull: Used for downloading files from remote sites to your local
>> systems... the .tmp files contain cas-pushpull's known metadata and you can
>> configure which of the known metadata gets written out or if a .tmp file
>> gets created at all... however you can add custom metadata fields to it.
>>
>> cas-crawler: Allows for metadata extraction (custom metadata) from files on
>> your local system... and then allows you to ingest them into the filemgr
>> (this can optionally be turned off)
>>
>> HTH
>> -brian
>>
>> On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)" wrote:
>>
>>> Hi All -
>>>
>>> I'm wondering if anyone has experience with, or knows the details of, how to
>>> use custom MetExtractors on products that are remotely downloaded via
>>> PushPull.
>>>
>>> By default, PushPull performs some basic met-extraction and creates a
>>> ".tmp" file associated with downloaded products, but I'm wondering whether
>>> this met generation step is customizable.
>>>
>>> I've looked through the configuration files (e.g. [1], [2]) as well as the
>>> code for PushPull, but I can't seem to locate configuration parameters to
>>> support the invocation of custom met extractors on downloaded data.
>>>
>>> If any of you have experience with this, or can point me to where to look,
>>> I'd really appreciate it.
>>>
>>> Thanks!
>>> Rishi
>>>
>>> --
>>> [1]
>>> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>>>
>>> [2]
>>> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
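Brian's split of duties puts custom metadata extraction in the crawler rather than in PushPull. As a rough illustration of what a simple filename-based extractor computes, here is a standalone Python sketch that turns a product file name into multi-valued metadata (each key maps to a list of values). The naming convention `<Instrument>_<Year>_<DayOfYear>.<ext>` and the key names are invented for this example; a real extractor would have to be written against and wired into the crawler's own extractor configuration, which this sketch does not attempt.

```python
import re
from pathlib import Path
from typing import Dict, List

# Hypothetical naming convention: <Instrument>_<Year>_<DayOfYear>.<ext>
FILENAME_PATTERN = re.compile(
    r"(?P<Instrument>[A-Z]+)_(?P<Year>\d{4})_(?P<DayOfYear>\d{3})\.\w+"
)


def extract_metadata(path: Path) -> Dict[str, List[str]]:
    """Derive metadata from a product's file name; each key maps to a list of values."""
    match = FILENAME_PATTERN.fullmatch(path.name)
    if match is None:
        # Unknown naming scheme: refuse rather than emit partial metadata.
        raise ValueError(f"unexpected file name: {path.name}")
    met = {key: [value] for key, value in match.groupdict().items()}
    # Record the original file name alongside the parsed fields.
    met["Filename"] = [path.name]
    return met
```

For example, `extract_metadata(Path("/data/staging/MODIS_2012_314.hdf"))` yields `Instrument`, `Year`, `DayOfYear`, and `Filename` entries.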
