+1...

Cheers,
Chris
On Nov 10, 2012, at 9:07 AM, Brian Foster wrote:

> Hey Rishi,
>
> The filemgr connection from the pushpull is just to verify if the filemgr
> already has a file, so the pushpull doesn't redownload files (no ingest
> support)... usually you configure your pushpull daemon to run at longer
> interval times, but the crawler usually will wake up more often (every 30
> seconds is a typical interval time for it)... so just have the pushpull
> download its files to a staging area which is the same directory which the
> crawler is monitoring.
>
> -brian
>
> On Nov 09, 2012, at 11:06 AM, "Verma, Rishi (388J)"
> <[email protected]> wrote:
>
>> Hey Brian, Shreyl,
>>
>> Thanks for your input and clarification on this.
>>
>> Brian - the delegation of duties you described makes sense. Does cas-pushpull
>> have any way to invoke a local crawl process following completion of
>> downloads? I know it has a filemgr hookup, but I wonder about whether a
>> crawl process can be invoked following the completion of all file downloads
>> via pushpull. The alternative way of doing this could, of course, be to
>> schedule the crawler daemon to run well after the pushpull daemon finishes
>> its work.
>>
>> Thanks to both of you for your help!
>> rishi
>>
>> On Nov 9, 2012, at 10:08 AM, Brian Foster wrote:
>>
>>>
>>> Hey Rishi,
>>>
>>> You will need to use both cas-pushpull and cas-crawler to accomplish this...
>>>
>>> cas-pushpull: Used for downloading files from remote sites to your local
>>> systems... the .tmp files contain cas-pushpull's known metadata and you can
>>> configure which of the known metadata gets written out or if a .tmp file
>>> gets created at all... however you can add custom metadata fields to it.
>>>
>>> cas-crawler: Allows for metadata extraction (custom metadata) from files on
>>> your local system... and then allows you to ingest them into the filemgr
>>> (optionally can be turned off)
>>>
>>> HTH
>>> -brian
>>>
>>> On Nov 08, 2012, at 06:11 PM, "Verma, Rishi (388J)"
>>> <[email protected]> wrote:
>>>
>>>> Hi All -
>>>>
>>>> I'm wondering if anyone has experience with, or knows the details of how
>>>> to use custom MetExtractors on products that are remotely downloaded via
>>>> PushPull.
>>>>
>>>> By default, PushPull performs some basic met-extraction and creates a
>>>> ".tmp" file associated with downloaded products, but I'm wondering whether
>>>> this met generation step is customizable.
>>>>
>>>> I've looked through the configuration files (e.g. [1], [2]) as well as the
>>>> code for PushPull, but I can't seem to locate configuration parameters to
>>>> support the invocation of custom met extractors on downloaded data.
>>>>
>>>> If any of you have experience with this, or can point me on where to look,
>>>> I'd really appreciate it.
>>>>
>>>> Thanks!
>>>> Rishi
>>>>
>>>> --
>>>> [1]
>>>> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>>>>
>>>> [2]
>>>> http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/examples/
>>
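
For anyone reading this thread later: the staging-area arrangement Brian describes (pushpull drops files into a directory, and a second process wakes up on a short interval and picks up whatever is new) can be illustrated with a small standalone Python sketch. This is not OODT code and does not call any cas-crawler or cas-pushpull API. The names scan_staging, poll, handle and met_suffix are made up for illustration, and skipping files that end in ".tmp" is only an assumption based on the thread's description of those files as pushpull metadata. The 30-second default mirrors the "typical interval" mentioned above.

```python
import os
import time


def scan_staging(staging_dir, seen, met_suffix=".tmp"):
    """Return data files in staging_dir that have not been reported before.

    `seen` is a set of paths already handed out; it is updated in place.
    Files ending in `met_suffix` are skipped (assumption: they are the
    pushpull metadata files, not products).
    """
    new_files = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if not os.path.isfile(path) or name.endswith(met_suffix):
            continue
        if path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def poll(staging_dir, handle, interval_seconds=30, max_passes=None):
    """Wake up every `interval_seconds` and pass each new file to `handle`.

    `handle` is a hypothetical stand-in for the extract-and-ingest step.
    `max_passes=None` means run until interrupted.
    """
    seen = set()
    passes = 0
    while max_passes is None or passes < max_passes:
        for path in scan_staging(staging_dir, seen):
            handle(path)
        passes += 1
        if max_passes is None or passes < max_passes:
            time.sleep(interval_seconds)
```

Calling poll("/path/to/staging", print) would simply print each new product path as it shows up. In a real deployment the crawler daemon plays this role, so the sketch is only meant to show why no explicit "invoke crawl after download" hook is needed when both processes share one directory.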
