Hi Chris,

Thank you very much for your answer. And thank you for the link to your thesis.
Now I see things much more clearly. I'm sure I'll come up with more questions as I progress with the assembly of the system.

Thanks,
Ivan

On 21.01.2011, at 04:25, Mattmann, Chris A (388J) wrote:

> Hi Ivan,
>
> Thanks for your email! Comments inline below:
>
>> I'm currently working on my PhD project, where I'm building a distributed
>> archiving solution.
>
> Strangely familiar :)
>
> I was doing the same thing in the context of OODT from 2003-2007; see here
> for the culmination:
>
> http://sunset.usc.edu/~mattmann/Dissertation.pdf
>
>> Basically, the distributed archive will consist of a number of nodes (each
>> node belonging to a different organization), where every node will store
>> its data locally and replicas on a number of selected remote nodes.
>
> Gotcha.
>
>> There will be a number of predefined processes (e.g., integrity checking,
>> creating additional replicas, etc.) that will run either periodically or
>> when some event occurs (node-lost event, corrupted-object event, etc.). The
>> data that the system will archive will consist of RDF/XML files (metadata)
>> plus binary files (e.g., TIFF images, JPEG images, etc., referenced from
>> the RDF). The RDF/XML files together with the binary files will be the
>> products (in OODT language).
>
> Okey dokey.
>
>> I'm looking into OODT to see if it can be used to create such a system and
>> what components I would be using.
>>
>> The following is a list of components that I have identified that I could
>> use:
>> - CAS Workflow (to implement the processes)
>> - CAS Push/Pull component (to send products to remote nodes, and to get
>> products from remote nodes). What is the push/pull component communicating
>> with on the other side?
>
> The Pull communication in PushPull is the set of protocols like FTP, SCP,
> HTTP, etc. The Push part is its ability to accept emails over IMAPS,
> "pushed" to a mailbox, and then to take the URLs from those emails and go
> resolve them using the pull protocols.
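The email-driven "Push" mechanism described above -- read a notification message from a mailbox, extract the file URLs it contains, then resolve each one with the matching pull protocol -- can be sketched roughly like this. This is an illustrative sketch only; the function names and URL pattern are assumptions, not the actual CAS PushPull API:

```python
import re

# Illustrative URL pattern covering the pull protocols mentioned above
# (FTP, SCP, HTTP). Not the pattern PushPull itself uses.
URL_PATTERN = re.compile(r"(?:https?|ftp|scp)://\S+")

def extract_urls(email_body: str) -> list[str]:
    """Pull every resolvable URL out of a notification email body."""
    return URL_PATTERN.findall(email_body)

def choose_protocol(url: str) -> str:
    """Map a URL scheme to the pull protocol that would resolve it."""
    return url.split("://", 1)[0]

if __name__ == "__main__":
    body = ("A new granule is ready:\n"
            "ftp://archive.example.org/pub/granule_001.tiff\n"
            "metadata: https://archive.example.org/meta/granule_001.rdf\n")
    for url in extract_urls(body):
        print(choose_protocol(url), url)
```

So the "Push" is push in name only: the remote system pushes an email, and the actual byte transfer still happens over one of the pull protocols.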
> So it's really a simulated Push at this point, but it works well with
> systems that deliver emails (like NOAA, NASA, etc.) to indicate a file is
> ready to be pushed.
>
>> From where does the push/pull component get the data that it will send?
>> From the file manager?
>
> PushPull acquires remote content and then hands it off to a staging area
> that the crawler component picks up and reads from. The crawler only
> handles local data (intentionally -- the complexity of acquiring remote
> content was large enough to warrant the creation of its own component). The
> crawler takes the now-local content (and any other content dropped in the
> shared staging area) and then ingests it into the file manager, sending it
> metadata + references.
>
>> What I'm missing, but should be there somewhere:
>> - Security component. How do I create virtual organizations and manage
>> users and groups, so that I can restrict access?
>
> There is an SSO component, pretty lightweight at this point, that
> implements connections to LDAP to do single sign-on. At one point I did a
> RESTful implementation of the SSO interface that connected to Java's
> OpenSSO -- totally cleanroom, using web services and protocols to connect
> to an OpenSSO service. I'll create a JIRA issue for this and attach it in
> the next few days.
>
>> Probably also needed:
>> - File Manager. In my case I would have the products (RDF + binary files)
>> and would need to create the profiles on the fly with some basic
>> information. Do I need the file manager for anything other than for the
>> end user to access products and profiles?
>
> Yep, you sure do. You'll need the file manager, along with the cas-product
> webapp that lives in webapp/fmprod.
>
>> Since I'm going to load the RDF files into a triple store for further use,
>> is it possible to extend the file manager so that the profile catalog is
>> stored in a triple store?
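The staging-area hand-off described above -- the crawler walks the local staging area and, for each staged file, ingests metadata plus file references into the file manager -- might look roughly like this sketch. The function name and the dict layout are hypothetical, not the OODT crawler or file manager API:

```python
import hashlib
from pathlib import Path

def crawl_staging(staging_dir: str) -> list[dict]:
    """Build one catalog entry (metadata + references) per staged file.

    In OODT terms this is the crawler's side of ingestion: it only ever
    sees local files that PushPull (or anything else) dropped into the
    shared staging area, and hands metadata + references to the file
    manager rather than moving bytes itself.
    """
    products = []
    for path in sorted(Path(staging_dir).iterdir()):
        if not path.is_file():
            continue
        products.append({
            "name": path.name,
            # A reference says where the bytes live; here, a file: URI.
            "references": [path.resolve().as_uri()],
            "metadata": {
                "FileSize": path.stat().st_size,
                "MD5": hashlib.md5(path.read_bytes()).hexdigest(),
            },
        })
    return products
```

The split mirrors the design choice in the answer above: acquiring remote content (PushPull) and ingesting local content (crawler) are separate concerns, joined only by the staging directory.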
> Sure, you could do a catalog implementation that stores the metadata in a
> triple store. Alternatively, you could use the fmprod webapp to deliver RDF
> views of the metadata that's stored per product, and configure it using the
> rdfconf.xml file that's part of fmprod.
>
> Thanks!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
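The "catalog implementation that stores the metadata in a triple store" suggested in the reply boils down to flattening each product's metadata into (subject, predicate, object) triples, so that metadata lookups become triple-pattern queries. A minimal in-memory sketch of that idea -- class and method names are hypothetical, not the OODT Catalog extension-point interface, and a real version would sit on an actual triple store:

```python
class TripleStoreCatalog:
    """Toy catalog that keeps product metadata as RDF-style triples."""

    def __init__(self):
        self.triples = set()  # {(subject, predicate, object), ...}

    def add_metadata(self, product_id: str, metadata: dict) -> None:
        """Flatten a metadata dict into triples about one product."""
        for key, value in metadata.items():
            self.triples.add((product_id, key, value))

    def get_metadata(self, product_id: str) -> dict:
        """Rebuild the metadata dict for one product from its triples."""
        return {p: o for s, p, o in self.triples if s == product_id}

    def query(self, predicate: str, obj: str) -> set:
        """Find all products whose metadata contains predicate == obj."""
        return {s for s, p, o in self.triples if p == predicate and o == obj}
```

Either route reaches the same place: a custom catalog stores the triples at ingest time, while the fmprod/rdfconf.xml route leaves the catalog alone and renders RDF views of the stored metadata on demand.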
