Thanks Chris - wiki page on its way :)
On 19 March 2012 22:52, Mattmann, Chris A (388J) <
[email protected]> wrote:
> Hey Tom,
>
> AWESOME. I smell Wiki page :)
>
> Read on below:
>
> On Mar 19, 2012, at 8:18 PM, Thomas Bennett wrote:
>
> >
> > Versioner schemes
> >
> > The Data Transferers are tightly coupled to the Versioner scheme; case
> > in point: if you are doing InPlaceTransfer, you need a versioner that
> > will handle file paths that don't change from src to dest.
> >
> > The Versioner is used to describe how a target directory is created for
> > a file to archive, i.e., the directory structure where the data will be
> > placed. So if I have an archive root at /var/kat/archive/data/ and I use
> > a basic versioner, it will archive a file called 1234567890.h5 at
> > /var/kat/archive/data/1234567890.h5/1234567890.h5. So this describes
> > the destination for a local data transfer.
> >
> > I have the following versioner set in my policy/product-types.xml.
> >
> > policy/product-types.xml
> > <versioner class="org.apache.oodt.cas.filemgr.versioning.BasicVersioner"/>
>
> Ah, gotcha. You may consider using the MetadataBasedFileVersioner. It lets
> you define a filePathSpec,
> e.g., /[PrincipalInvestigator]/[Project]/[AcquisitionDate]/[Filename]
>
> And then versions or "places" the resulting product files in that
> specification structure.
>
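> Conceptually, the filePathSpec expansion is just [Key] -> metadata-value
> substitution. Here's a standalone sketch of that idea (a toy illustration,
> not OODT's actual implementation, which lives in MetadataBasedFileVersioner):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of [Key] -> value expansion in a filePathSpec.
// OODT's MetadataBasedFileVersioner does the real work (and also builds
// data store references/URIs); this only shows the substitution idea.
public class FilePathSpecSketch {

  // Replace every [Key] token in the spec with its metadata value.
  static String expand(String spec, Map<String, String> met) {
    String path = spec;
    for (Map.Entry<String, String> e : met.entrySet()) {
      path = path.replace("[" + e.getKey() + "]", e.getValue());
    }
    return path;
  }

  public static void main(String[] args) {
    Map<String, String> met = new HashMap<>();
    met.put("PrincipalInvestigator", "tbennett");
    met.put("Project", "KAT");
    met.put("AcquisitionDate", "2012-03-19");
    met.put("Filename", "1331871808.h5");
    System.out.println(expand(
        "/[PrincipalInvestigator]/[Project]/[AcquisitionDate]/[Filename]",
        met));
    // prints /tbennett/KAT/2012-03-19/1331871808.h5
  }
}
```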
> To create the above, you would simply subclass the Versioner like so:
>
> public class KATVersioner extends MetadataBasedFileVersioner {
>   private String filePathSpec =
>       "/[PrincipalInvestigator]/[Project]/[AcquisitionDate]/[Filename]";
>
>   public KATVersioner() {
>     setFilePathSpec(filePathSpec);
>   }
> }
>
> You can even refer to keys that don't exist yet, and then dynamically
> generate them (and their
> values) by overriding the createDataStoreReferences method:
>
> @Override
> public void createDataStoreReferences(Product product, Metadata met) {
>   // do work to generate AcquisitionDate here
>   met.replaceMetadata("AcquisitionDate", acqdate);
>   super.createDataStoreReferences(product, met);
> }
>
>
> >
> > Just out of curiosity... why is this called a versioner?
>
> Hehe, if it's weird in OODT, it most likely resulted from me :) I
> originally saw
> this as a great tool to "version" or allow for multiple copies of a file
> on disk, e.g., with different
> file (or directory-based) metadata to delineate the versions. Over time
> it really grew to be a
> "URIGenerationScheme" or "ArchivePathGenerator". Those would be better
> names, but Versioner
> stuck, so here we are :)
>
> >
> > Using the File Manager as the client
> >
> > Configuring a data transfer in filemgr.properties, and then not using
> > the crawler directly, but e.g., using the XmlRpcFileManagerClient directly,
> > you can tell the server (on the ingest(...) method) to handle all the
> file transfers for you. In that case, the server needs a
> > Data Transferer configured, and the above properties apply, with the
> caveat that the FM server is now the "client" that is transferring
> > the data to itself :)
> >
> > If I set the following property in the etc/filemgr.properties file
> >
> > filemgr.datatransfer.factory=org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory
> >
> > I did a quick try of this today, trying an ingest on my localhost (to
> > avoid any sticky network issues), and I was able to perform an ingest.
> >
> > I see you can specify the data transfer factory to use, so I assume then
> that the filemgr.datatransfer.factory setting is just the default if none
> is specified on the command line. Is this true?
>
> It's true, if you are doing server-based transfers (by calling the
> filemgr-client --ingestProduct method directly, without specifying the
> data transfer factory on the command line, yep).
>
> >
> > I ran a version of the command line client (my own version of
> filemgr-client with abs paths to the configuration files):
> >
> > cas-filemgr-client.sh --url http://localhost:9101 --operation
> > --ingestProduct --refs /Users/thomas/1331871808.h5 --productStructure Flat
> > --productTypeName KatFile --metadataFile /Users/thomas/1331871808.h5.met
> > --productName 1331871808.h5 --clientTransfer --dataTransfer
> > org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory
> >
> > With the data transfer factory also spec'ed as:
> >
> > etc/filemgr.properties
> >
> > filemgr.datatransfer.factory=org.apache.oodt.cas.filemgr.datatransfer.RemoteDataTransferFactory
> >
> > And the versioner set as:
> >
> > policy/product-types.xml
> > <versioner class="org.apache.oodt.cas.filemgr.versioning.BasicVersioner"/>
> >
> > And it ingested the file. +1 for OODT!
>
> WOOT!
>
> >
> > Local and remote transfers to the same filemgr
> >
> > One way to do this is to write a Facade java class, e.g.,
> > MultiTransferer, that can, e.g., on a per-product-type basis,
> > decide whether to call and delegate to LocalDataTransfer or
> > RemoteDataTransfer. If written in a configurable way, that would be
> > an awesome addition to the OODT code base. We could call it
> > ProductTypeDelegatingDataTransfer.
> >
> > I'm thinking I would prefer to have some crawlers specify how files
> > should be transferred. Is there any particular reason why this would not
> > be a good idea - as long as the client specifies the transfer method to use?
>
> Yeah this is totally acceptable -- you can simply tell the crawler which
> TransferFactory to use. If you wanted the crawlers to sense it
> automatically based on Product Type (which also has to be provided), then
> you could use a method similar to the above.
>
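> For concreteness, here's a standalone sketch of that per-product-type
> delegation idea (all names hypothetical; the real OODT DataTransfer
> interface deals in Product objects and exceptions, not plain strings):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a ProductTypeDelegatingDataTransfer: look up the transferer
// by product type name, falling back to a default. A real version would
// implement OODT's DataTransfer interface and be built by a factory class.
public class DelegatingTransferSketch {

  interface Transferer {
    String transfer(String ref); // returns a tag describing what happened
  }

  static class LocalTransferer implements Transferer {
    public String transfer(String ref) { return "local:" + ref; }
  }

  static class RemoteTransferer implements Transferer {
    public String transfer(String ref) { return "remote:" + ref; }
  }

  static class ProductTypeDelegatingTransferer {
    private final Map<String, Transferer> byType = new HashMap<>();
    private final Transferer fallback;

    ProductTypeDelegatingTransferer(Transferer fallback) {
      this.fallback = fallback;
    }

    void register(String productTypeName, Transferer t) {
      byType.put(productTypeName, t);
    }

    // Delegate based on the product type name (unknown types -> fallback).
    String transfer(String productTypeName, String ref) {
      return byType.getOrDefault(productTypeName, fallback).transfer(ref);
    }
  }

  public static void main(String[] args) {
    ProductTypeDelegatingTransferer t =
        new ProductTypeDelegatingTransferer(new LocalTransferer());
    t.register("KatFile", new RemoteTransferer());
    System.out.println(t.transfer("KatFile", "1331871808.h5")); // remote:1331871808.h5
    System.out.println(t.transfer("SomethingElse", "x.h5"));    // local:x.h5
  }
}
```

> The same lookup could be driven from configuration (e.g., a property per
> product type) rather than hard-coded register() calls.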
> >
> > Getting the product to a second archive
> >
> > One way to do it is to simply stand up a file manager and catalog at the
> > remote site, and then do remote data transfer (and met transfer) to take
> > care of that. Then, as long as your XML-RPC ports are open, both the data
> > and metadata can be backed up using the same ingestion mechanisms. You
> > could wire that up as a Workflow task to run periodically, or as part of
> > your std ingest pipeline (e.g., a Crawler action that on postIngestSuccess
> > backs up to the remote site by ingesting into the remote backup file
> > manager).
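> > The hook shape is simple: only on a successful primary ingest, re-ingest
> > into the backup. A standalone sketch (all names hypothetical; the real
> > thing would be a crawler action in the postIngestSuccess phase, with the
> > backup target being a second file manager client):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a post-ingest-success backup hook: after a successful
// primary ingest, mirror the same product into a backup archive. In OODT
// this would be a crawler action wired to the postIngestSuccess phase.
public class BackupOnIngestSketch {

  interface Archive {
    void ingest(String productRef);
  }

  static class RecordingArchive implements Archive {
    final List<String> ingested = new ArrayList<>();
    public void ingest(String productRef) { ingested.add(productRef); }
  }

  // Run the primary ingest; only on success, mirror to the backup.
  static boolean ingestWithBackup(String ref, Archive primary, Archive backup) {
    try {
      primary.ingest(ref);
    } catch (RuntimeException e) {
      return false; // primary failed: skip the backup
    }
    backup.ingest(ref); // the postIngestSuccess-style hook
    return true;
  }

  public static void main(String[] args) {
    RecordingArchive primary = new RecordingArchive();
    RecordingArchive backup = new RecordingArchive();
    ingestWithBackup("1331871808.h5", primary, backup);
    System.out.println(backup.ingested); // [1331871808.h5]
  }
}
```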
> >
> > Okay. Got it! I'll see if I can wire up both options!
>
> AWESOME.
>
> >
> > I'd be happy to help you down either path.
> >
> > Thanks! Much appreciated.
> >
> > > I was thinking, perhaps, of using the functionality described in OODT-84
> > > (Ability for File Manager to stage an ingested Product to one of its
> > > clients) and then having a second crawler on the backup archive which
> > > will then update its own catalogue.
> >
> > +1, that would work too!
> >
> > Once again, thanks for the input and advice - always informative ;)
>
> Haha anytime dude. Great work!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>