Hello Stanbol developers. We (Martin Dow and Stephen Bayliss) are working on the Stanbol early adopter integration with the Fedora Commons digital object repository[1].
This post is to give you a heads-up on what we are seeking to do, along with some background. We welcome your input and comments.

From a functional perspective, the central use case we are tackling can be described as applying ontologies and rules as a task-oriented lens over content repository data, making use of Stanbol's KReS components in particular.

First, a note on the integration between Fedora and Stanbol. We are basing this on Fedora's JMS capabilities, which we have enhanced. The JMS messages are used to "synchronise" Stanbol with Fedora. (Fedora also provides a REST API for access to content.)

We enhanced the existing Fedora JMS messaging because the messages were tightly coupled to the Fedora API, which means that message consumers need detailed knowledge of that API. Also, in some cases information that would be useful in interpreting changes made to Fedora content is not readily available in these messages. We hope that this new messaging component will be re-usable outside the Stanbol integration and will bring value to the Fedora community.

Conceptually the integration is similar to the CMS Adapter [2]. However, Fedora's central index of content, the "Resource Index" (or "RI" [3]), is implemented as an RDF graph of URIs representing Fedora's notion of digital objects and datastreams. This is not the same as the JCR or CMIS view of an object, so we are not planning to use the CMS Adapter component directly. In particular, we haven't identified a need to "bridge" to the Fedora object model, as it is already expressible (partly at least) in RDFS.

Currently there is no formal schema for Fedora's "internal" object and datastream relationships, but implementation relationships between objects expressed in Fedora's RDF datastreams (RELS-EXT and RELS-INT) do have an RDFS schema. (For more information on Fedora relationships see [4].)
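To make the Resource Index description above a little more concrete, here is a minimal sketch of the kind of triples the RI holds for one digital object. The PIDs (demo:image1, demo:collection1, demo:imageCModel), the datastream IDs and the helper function names are invented for illustration; the predicate namespaces are the ones we understand Fedora 3.x to use, but please check them against [4] rather than taking this as authoritative.

```python
# Sketch only: a hand-written sample of Resource Index triples.
# All PIDs and datastream IDs below are hypothetical.
FEDORA_MODEL = "info:fedora/fedora-system:def/model#"
FEDORA_VIEW = "info:fedora/fedora-system:def/view#"
FEDORA_RELS_EXT = "info:fedora/fedora-system:def/relations-external#"

IMAGE = "info:fedora/demo:image1"

triples = [
    # "Internal" relationships maintained by Fedora itself.
    (IMAGE, FEDORA_MODEL + "hasModel", "info:fedora/demo:imageCModel"),
    (IMAGE, FEDORA_VIEW + "disseminates", IMAGE + "/VRA"),
    (IMAGE, FEDORA_VIEW + "disseminates", IMAGE + "/RELS-EXT"),
    # A relationship asserted by the object in its RELS-EXT datastream.
    (IMAGE, FEDORA_RELS_EXT + "isMemberOfCollection",
     "info:fedora/demo:collection1"),
]


def objects_of(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def datastreams_of(graph, object_uri):
    """Return the datastream URIs that a digital object disseminates."""
    return objects_of(graph, object_uri, FEDORA_VIEW + "disseminates")


if __name__ == "__main__":
    for uri in datastreams_of(triples, IMAGE):
        print(uri)
```

The point is only that objects and datastreams are both plain URIs in one graph, which is why no separate bridge to a JCR or CMIS style object model seems necessary.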
Fedora uses the Mulgara quad-store [5] as its RDF database, and it is possible to expose this directly through a SPARQL endpoint. Furthermore, Clerezza can be configured to use Mulgara as its source. This may therefore be one nice way of accessing Fedora's Resource Index, particularly as the intent of the Resource Index is to provide access to the RDF view of Fedora's data.

The repository content we are working with is a collection of images and their metadata, catalogued using the VRA (Visual Resources Association) XML metadata schema [6]. Some elements in the metadata records are populated with items from controlled vocabularies and thesauri; in particular, artists are identified with persistent identifiers from the Getty ULAN thesaurus [7], which we have converted to SKOS (and SKOS-XL) [8].

So in terms of semantics, there are three kinds we need to work with in order to give the end-user a consistent experience during browse and discovery: the repository structures as expressed within the Fedora Resource Index, image metadata records (lifted from VRA into OWL), and Getty thesaurus concepts and interrelations expressed as SKOS.

These schemata are loaded into KReS scopes, and alignment between them is needed. The Fedora schema is static, as one would expect given Fedora's key role as a stable repository for digital content, so it is considered immutable in KReS. The VRA schema and our mapping to the SKOS/SKOS-XL thesaurus are still evolving, so we treat them as shared and loaded at run-time.

Scopes are intended for task-oriented partitioning of the semantic data. This comes into play in two scenarios. Firstly, when content is updated, the additional objects must be added. Secondly, constraints driven by user search or browse are added, e.g. during faceted browsing. So the final step in our use case demonstration will be to use per-user session scopes to handle the latter case.
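The partitioning described above could be pictured roughly as follows. This is a toy model in Python, not the KReS or Stanbol API: the class names Scope and Session, their fields, and the ontology labels are all hypothetical and exist only to show which data is immutable, which is shared, and which is private to one user's session.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Toy stand-in for a scope (hypothetical, not the Stanbol API)."""
    name: str
    immutable: frozenset = frozenset()         # e.g. the static Fedora schema
    shared: set = field(default_factory=set)   # e.g. evolving VRA schema, SKOS mapping

    def load_shared(self, ontology_id):
        """Load (or reload) an evolving ontology at run-time."""
        self.shared.add(ontology_id)


@dataclass
class Session:
    """Per-user partition layered on top of a scope."""
    scope: Scope
    user: str
    cached: set = field(default_factory=set)   # data matching this user's view

    def add_constraint(self, triple):
        """Record a constraint or matching triple from search or faceted browse."""
        self.cached.add(triple)

    def visible_ontologies(self):
        """Ontologies this session reasons over: immutable plus shared."""
        return set(self.scope.immutable) | self.scope.shared


if __name__ == "__main__":
    scope = Scope("images", immutable=frozenset({"fedora-schema"}))
    scope.load_shared("vra-owl")
    scope.load_shared("ulan-skos")

    alice = Session(scope, "alice")
    bob = Session(scope, "bob")
    alice.add_constraint(("facet", "artist", "ulan:example"))

    print(sorted(alice.visible_ontologies()))
    print(len(alice.cached), len(bob.cached))
```

Both sessions see the same schemata, while a constraint added during one user's browse stays out of the other user's cached data.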
The intention is that data matching the user's view must be synchronised in cached graphs in a dedicated session scope; the results will be materialised by the reasoner. The DL reasoner could also be used to check integrity constraints at the point of access.

Regards
Steve

[1] http://fedora-commons.org/
[2] http://wiki.iks-project.eu/index.php/CMS_Adaptor
[3] https://wiki.duraspace.org/display/FEDORA35/Resource+Index
[4] https://wiki.duraspace.org/display/FEDORA35/Triples+in+the+Resource+Index
[5] http://www.mulgara.org/
[6] http://www.vraweb.org/projects/vracore4/index.html
[7] http://www.getty.edu/research/tools/vocabularies/ulan/
[8] http://www.w3.org/TR/skos-reference/
