Hi Chris,

Thank you very much for your answer. And thank you for the link to your thesis.
Now I see things much more clearly. I'm sure I'll come up with more questions as I progress with the assembly of the system.

Thanks,
Ivan

On 21.01.2011, at 04:25, Mattmann, Chris A (388J) wrote:

> Hi Ivan,
>
> Thanks for your email! Comments inline below:
>
>> I'm currently working on my PhD project, where I'm building a distributed
>> archiving solution.
>
> Strangely familiar :)
>
> I was doing the same thing in the context of OODT from 2003-2007; see here
> for the culmination:
>
> http://sunset.usc.edu/~mattmann/Dissertation.pdf
>
>> Basically, the distributed archive will consist of a number of nodes (each
>> node belonging to a different organization), where every node will store
>> its data locally and replicas on a number of selected remote nodes.
>
> Gotcha.
>
>> There will be a number of predefined processes (e.g., integrity checking,
>> creating additional replicas, etc.) that will run either periodically or
>> when some event occurs (node-lost event, corrupted-object event, etc.). The
>> data that the system will archive will consist of RDF/XML files (metadata)
>> plus binary files (e.g., TIFF images, JPEG images, etc., referenced from
>> the RDF). The RDF/XML files together with the binary files will be the
>> products (in OODT language).
>
> Okey dokey.
>
>> I'm looking into OODT to see if it can be used to create such a system and
>> what components I would be using.
>>
>> The following is a list of components that I have identified that I could
>> use:
>> - CAS Workflow (to implement the processes)
>> - CAS Push/Pull component (to send products to remote nodes, and to get
>> products from remote nodes). What is the push/pull component communicating
>> with on the other side?
>
> The Pull communication in PushPull is the set of protocols like FTP, SCP,
> HTTP, etc. The Push part is its ability to accept emails over IMAPS,
> "pushed" to a mailbox, and then to take the URLs from those emails and go
> resolve them using the pull protocols.
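The email-driven "Push" mechanism described above -- read a notification message from a mailbox, extract the file URLs it contains, then resolve each one with the matching pull protocol -- can be sketched roughly like this. This is an illustrative sketch only; the function names and URL pattern are assumptions, not the actual CAS PushPull API:

```python
import re

# Illustrative URL pattern covering the pull protocols mentioned above
# (FTP, SCP, HTTP). Not the pattern PushPull itself uses.
URL_PATTERN = re.compile(r"(?:https?|ftp|scp)://\S+")

def extract_urls(email_body: str) -> list[str]:
    """Pull every resolvable URL out of a notification email body."""
    return URL_PATTERN.findall(email_body)

def choose_protocol(url: str) -> str:
    """Map a URL scheme to the pull protocol that would resolve it."""
    return url.split("://", 1)[0]

if __name__ == "__main__":
    body = ("A new granule is ready:\n"
            "ftp://archive.example.org/pub/granule_001.tiff\n"
            "metadata: https://archive.example.org/meta/granule_001.rdf\n")
    for url in extract_urls(body):
        print(choose_protocol(url), url)
```

So the "Push" is push in name only: the remote system pushes an email, and the actual byte transfer still happens over one of the pull protocols.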
> So it's really a simulated Push at this point, but it works well with
> systems that deliver emails (like NOAA, NASA, etc.) to indicate a file is
> ready to be pushed.
>
>> From where does the push/pull component get the data that it will send?
>> From the file manager?
>
> PushPull acquires remote content and then hands it off to a staging area
> that the crawler component picks up and reads from. The crawler only
> handles local data (intentionally -- the complexity of acquiring remote
> content was large enough to warrant the creation of its own component). The
> crawler takes the now-local content (and any other content dropped in the
> shared staging area) and then ingests it into the file manager, sending it
> metadata + references.
>
>> What I'm missing, but should be there somewhere:
>> - Security component. How do I create virtual organizations and manage
>> users and groups, so that I can restrict access?
>
> There is an SSO component, pretty lightweight at this point, that
> implements connections to LDAP to do single sign-on. At one point I did a
> RESTful implementation of the SSO interface that connected to Java's
> OpenSSO -- totally cleanroom, using web services and protocols to connect
> to an OpenSSO service. I'll create a JIRA issue for this and attach it in
> the next few days.
>
>> Probably also needed:
>> - File Manager. In my case I would have the products (RDF + binary files)
>> and would need to create the profiles on the fly with some basic
>> information. Do I need the file manager for anything other than for the
>> end user to access products and profiles?
>
> Yep, you sure do. You'll need the file manager, along with the cas-product
> webapp that lives in webapp/fmprod.
>
>> Since I'm going to load the RDF files into a triple store for further use,
>> is it possible to extend the file manager so that the profile catalog is
>> stored in a triple store?
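The staging-area hand-off described above -- the crawler walks the local staging area and, for each staged file, ingests metadata plus file references into the file manager -- might look roughly like this sketch. The function name and the dict layout are hypothetical, not the OODT crawler or file manager API:

```python
import hashlib
from pathlib import Path

def crawl_staging(staging_dir: str) -> list[dict]:
    """Build one catalog entry (metadata + references) per staged file.

    In OODT terms this is the crawler's side of ingestion: it only ever
    sees local files that PushPull (or anything else) dropped into the
    shared staging area, and hands metadata + references to the file
    manager rather than moving bytes itself.
    """
    products = []
    for path in sorted(Path(staging_dir).iterdir()):
        if not path.is_file():
            continue
        products.append({
            "name": path.name,
            # A reference says where the bytes live; here, a file: URI.
            "references": [path.resolve().as_uri()],
            "metadata": {
                "FileSize": path.stat().st_size,
                "MD5": hashlib.md5(path.read_bytes()).hexdigest(),
            },
        })
    return products
```

The split mirrors the design choice in the answer above: acquiring remote content (PushPull) and ingesting local content (crawler) are separate concerns, joined only by the staging directory.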
> Sure, you could do a catalog implementation that stores the metadata in a
> triple store. Alternatively, you could use the fmprod webapp to deliver RDF
> views of the metadata that's stored per product, and configure it using the
> rdfconf.xml file that's part of fmprod.
>
> Thanks!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
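The "catalog implementation that stores the metadata in a triple store" suggested in the reply boils down to flattening each product's metadata into (subject, predicate, object) triples, so that metadata lookups become triple-pattern queries. A minimal in-memory sketch of that idea -- class and method names are hypothetical, not the OODT Catalog extension-point interface, and a real version would sit on an actual triple store:

```python
class TripleStoreCatalog:
    """Toy catalog that keeps product metadata as RDF-style triples."""

    def __init__(self):
        self.triples = set()  # {(subject, predicate, object), ...}

    def add_metadata(self, product_id: str, metadata: dict) -> None:
        """Flatten a metadata dict into triples about one product."""
        for key, value in metadata.items():
            self.triples.add((product_id, key, value))

    def get_metadata(self, product_id: str) -> dict:
        """Rebuild the metadata dict for one product from its triples."""
        return {p: o for s, p, o in self.triples if s == product_id}

    def query(self, predicate: str, obj: str) -> set:
        """Find all products whose metadata contains predicate == obj."""
        return {s for s, p, o in self.triples if p == predicate and o == obj}
```

Either route reaches the same place: a custom catalog stores the triples at ingest time, while the fmprod/rdfconf.xml route leaves the catalog alone and renders RDF views of the stored metadata on demand.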
