Same issue found here: http://mail-archives.apache.org/mod_mbox/oodt-user/201103.mbox/%[email protected]%3e
> o First question: why is the versioner run twice ?

FinalFileLocationExtractor runs the versioner during the "met extraction" phase of ingestion, but doesn't persist the datastore references. The second run is the actual "versioning" phase, with datastore persistence.

> It seems like the first time it is run,
> it has access to all the metadata that has been previously extracted by the
> NetCDFMetExtractor,
> but the second time it doesn't ?

Exactly what I'm seeing. This explains why the FileLocation met is persisted (on the 1st run) but the datastore reference is incorrect (on the 2nd run).

I guess Luca ended up using client-side (crawler) met extractors in order to fill in met elements prior to ingestion.

In short:
(1) client-side metExtractor + versioner = all client-extracted met is available to the versioner
(2) server-side metExtractor + versioner = server-extracted met is NOT available to the versioner (unless, as Chris suggested, the versioner re-runs the server-side metExtractor)

Is (2) the expected behavior?

-Ricky

On Nov 2, 2011, at 10:16 PM, Mattmann, Chris A (388J) wrote:

> Hi Ricky,
>
> You're running into the issue of where/when Versioning is done.
>
> Right now you are using a server-side met extractor -- that metadata is
> extracted on the server side, cataloged,
> but is _not_ passed back to the client, for use in client-side versioning
> (which I'm guessing you're using).
>
> One way around this is to take an approach similar to the
> FinalFileLocationExtractor -- that is: make your
> versioner run the server side met extractor as part of its versioning process
> to derive the same metadata
> that you want used for versioning. Or, alternatively, bake in somehow (to the
> metadata stream that you
> use in read-only form in the versioner) the field that you are interested in
> flowing through.
>
> HTH!
>
> Cheers,
> Chris
>
> On Nov 2, 2011, at 4:20 PM, Nguyen, Ricky wrote:
>
>> Hi,
>>
>> My MetadataBasedFileVersioner can't see the met produced by my custom
>> metExtractor.
>>
>> I've read OODT-72. That issue describes using the Versioner's calculated
>> Reference to assign Metadata (ver -> met). My issue is the opposite
>> direction, using extracted Metadata in the Versioner's Reference calculation.
>>
>> For example, suppose my metExtractor assigns a value to the "MRN" element.
>> Then I want my versioner to create a datastore reference at
>> "/[MRN]/[Filename]".
>>
>> My product-types.xml (abbreviated):
>> <type name="CustomProdType">
>>   <versioner class="CustomMetBasedFileVersioner"/>
>>   <extractor class="CoreMetExtractor"/>
>>   <extractor class="MimeTypeExtractor"/>
>>   <extractor class="MRNExtractor"/>
>>   <extractor class="FinalFileLocationExtractor"/>
>> </type>
>>
>> After I ingest the file, I dump the met (using MetadataDumper) and the
>> product (using ProductDumper). The met looks fine:
>> <key>FileLocation</key>
>> <val>%2FUsers%2Frnguyen%2Fvpicu%2Fdata%2Farchive%2FMRN_1010209</val>
>>
>> But the product reference doesn't:
>> <reference dataStore="file:/Users/rnguyen/vpicu/data/archive/MRN_null/null"
>> orig="file:///Users/rnguyen/vpicu/components/filemgr/policy/cerner/vps_demog.csv"
>> size="1114427"/>
>>
>> Is this an issue? Or am I not using the components correctly? Is there a
>> better way to achieve what I want?
>>
>> Thanks,
>> Ricky
>>
>>
>> ---------------------------------------------------------------------
>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>> is for the sole use of the intended recipient(s) and may contain confidential
>> or legally privileged information. Any unauthorized review, use, disclosure
>> or distribution is prohibited. If you are not the intended recipient, please
>> contact the sender by reply e-mail and destroy all copies of this original
>> message.
>>
>> ---------------------------------------------------------------------
>>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
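The `MRN_null/null` reference has one plausible cause. The versioner may have filled a path template from metadata that contained neither `MRN` nor `Filename`. In Java, concatenating a `null` reference into a string produces the literal text "null" rather than an error, which would give exactly this kind of path. The Python sketch below is illustrative only and is not OODT's actual versioner code. It rests on these assumptions:

- The `[Element]` placeholder syntax is assumed.
- The template `/archive/MRN_[MRN]/[Filename]` is guessed from the dumped paths.
- `fill_lenient` and `fill_strict` are hypothetical helpers.

The sketch reproduces the symptom and adds a stricter variant that fails loudly when an element is missing.

```python
import re
from typing import Dict

Metadata = Dict[str, str]

# Hypothetical placeholder syntax: "[Element]" is replaced by that element's value.
PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def fill_lenient(template: str, met: Metadata) -> str:
    """Substitute each [Element]; a missing element becomes the text "null",
    mimicking Java string concatenation with a null reference."""
    return PLACEHOLDER.sub(lambda m: met.get(m.group(1), "null"), template)


def fill_strict(template: str, met: Metadata) -> str:
    """Substitute each [Element]; raise KeyError if an element is missing."""
    def lookup(m):
        key = m.group(1)
        if key not in met:
            raise KeyError(f"metadata element {key!r} not available to the versioner")
        return met[key]
    return PLACEHOLDER.sub(lookup, template)


if __name__ == "__main__":
    template = "/archive/MRN_[MRN]/[Filename]"
    # Illustrative values: metadata as the first (met extraction) run might see it...
    print(fill_lenient(template, {"MRN": "1010209", "Filename": "vps_demog.csv"}))
    # prints /archive/MRN_1010209/vps_demog.csv
    # ...and as the second (versioning) run sees it, without server-extracted met.
    print(fill_lenient(template, {}))
    # prints /archive/MRN_null/null
```

The strict variant turns a silently wrong `MRN_null/null` path into an immediate error at ingestion time, which may be easier to diagnose.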

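Chris's first suggestion was to have the versioner re-run the server-side extractor itself. A minimal Python illustration of that idea follows, under these assumptions:

- `re_extracting_reference` and `demo_mrn_extractor` are hypothetical names.
- Metadata is modeled as a plain dict of strings.
- The demo extractor returns a fixed MRN, where a real MRNExtractor would read it from the file.

It is not OODT's Versioner API.

```python
from typing import Callable, Dict

Metadata = Dict[str, str]
# Hypothetical stand-in for a server-side extractor such as MRNExtractor:
# it maps the original file path to the metadata it would extract.
Extractor = Callable[[str], Metadata]


def re_extracting_reference(orig_path: str, met: Metadata,
                            extractor: Extractor, archive_root: str) -> str:
    """Return "<archive_root>/<MRN>/<Filename>" for a product.

    The metadata handed to the versioner may lack server-extracted elements,
    so re-run the extractor and fill in only what is missing.
    """
    merged = dict(met)  # copy: treat the caller's metadata as read-only
    if "MRN" not in merged or "Filename" not in merged:
        for key, value in extractor(orig_path).items():
            merged.setdefault(key, value)  # never override a supplied value
    return f"{archive_root}/{merged['MRN']}/{merged['Filename']}"


def demo_mrn_extractor(orig_path: str) -> Metadata:
    """Hypothetical extractor: a real one would read the MRN from the file."""
    filename = orig_path.rsplit("/", 1)[-1]
    return {"MRN": "1010209", "Filename": filename}


if __name__ == "__main__":
    # Second (versioning) run: only client-side metadata is available.
    client_met = {"ProductType": "CustomProdType"}
    print(re_extracting_reference("/data/staging/vps_demog.csv", client_met,
                                  demo_mrn_extractor, "/data/archive"))
    # prints /data/archive/1010209/vps_demog.csv
```

Merging with `setdefault` keeps any value the client already supplied and only fills the gaps. The trade-off is that the extractor runs once more, inside the versioner.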