Hi Tom,

> I finally got around to getting my AutoDetectProductCrawler working. In 
> response, Chris, I hope you don't mind that I've given some feedback about my 
> experiences with the crawler on the wiki page that you created below. I hope 
> that's okay. Please feel free to modify/add/revert as you wish.

Awesome!

Thanks for the contribution, Tom. Wow, you really rocked that page! Keep 'em 
comin'!

Cheers,
Chris

> 
> Cheers,
> Tom
> 
> On 4 June 2011 07:40, Mattmann, Chris A (388J) 
> <[email protected]> wrote:
> Brian, I created a wiki page with your guidance below:
> 
> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
> 
> Others can feel free to jump on and contribute.
> 
> Cheers,
> Chris
> 
> On Jun 1, 2011, at 2:20 PM, holenoter wrote:
> 
> > hey thomas,
> >
> > you are using StdProductCrawler which assumes a *.met file already exists 
> > for each file (it has only one precondition, which is the existence of the 
> > *.met file) . . . if you want a *.met file generated you will have to use 
> > one of the other 2 crawlers.  running: ./crawler_launcher -psc will give 
> > you a list of supported crawlers.  you can then run: ./crawler_launcher -h 
> > -cid <crawler_id> where crawler_id is one of the ids from the previous 
> > command . . . unfortunately i don't think the other crawlers are documented 
> > all that extensively . . . MetExtractorProductCrawler will use a single 
> > extractor for all files . . . AutoDetectProductCrawler requires a mapping 
> > file to be filled out and mime-types defined
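> >
> > as a quick sketch of that workflow (these flags are the ones mentioned 
> > above; the crawler id is just an example from this thread):
> >
> >    # list the supported crawler ids
> >    ./crawler_launcher -psc
> >
> >    # show the options for one of them
> >    ./crawler_launcher -h -cid AutoDetectProductCrawler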
> >
> > * MetExtractorProductCrawler example configuration can be found in the 
> > source:
> >  - allows you to specify how the crawler will run your extractor
> > https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
> >
> > * AutoDetectProductCrawler example configuration can be found in the source:
> >  - uses the same metadata extractor specification file (you will have one 
> > of these for each mime-type)
> >  - allows you to define your mime-types -- that is, give a mime-type for a 
> > given filename regular expression
> > https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
> >
> >    - your file might look something like:
> >
> > <mime-info>
> >    <mime-type type="product/hdf5">
> >       <glob pattern="*.h5"/>
> >    </mime-type>
> > </mime-info>
> >  - maps your mime-types to extractors
> > https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
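> >
> > to give you a feel for the shape of that file (i'm going from memory on 
> > the element and attribute names here, so treat this as a rough sketch and 
> > check the linked example for the real schema), it basically pairs each 
> > mime-type with the extractor class that should handle it plus that 
> > extractor's config file:
> >
> > <cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> >    <mimetype name="product/hdf5">
> >       <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
> >          <config file="/path/to/your/extractor.config"/>
> >       </extractor>
> >    </mimetype>
> > </cas:mimetypemap>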
> >
> > Hope this helps . . .
> > -brian
> >
> > On Jun 01, 2011, at 12:54 PM, Thomas Bennett <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I've successfully got the CmdLineIngester working with an 
> >> ExternMetExtractor (written in python):
> >>
> >> However, when I try to launch the crawler I get a warning telling me that 
> >> the preconditions for ingest have not been met. No .met file has been created.
> >>
> >> Two questions:
> >> 1) I'm just wondering if there is any configuration that I'm missing.
> >> 2) Where should I start hunting in the code or logs to find out why my met 
> >> extractor was not run?
> >>
> >> Kind regards,
> >> Thomas
> >>
> >> For your reference, here is the command and output.
> >>
> >> bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath 
> >> /usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl 
> >> http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile 
> >> MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer 
> >> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory 
> >> --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor 
> >> --metExtractorConfig 
> >> /usr/local/meerkat/extractors/katextractor/katextractor.config
> >> http://localhost:9000
> >> StdProductCrawler
> >> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
> >> INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
> >> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> >> INFO: Handling file 
> >> /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
> >> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
> >> WARNING: Failed to pass preconditions for ingest of product: 
> >> [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
> >>
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> 
> 
> 
> -- 
> Thomas Bennett
> 
> SKA South Africa
> 
> Office :  +2721 506 7341
> Mobile : +2779 523 7105
> Email  :  [email protected]
> 

