Fedora and Stanbol/KReS integration

Stephen Bayliss Fri, 30 Sep 2011 07:00:06 -0700

Hello Stanbol developers.

We (Martin Dow and Stephen Bayliss) are working on the Stanbol early adopter
integration with the Fedora Commons digital object repository [1].


This post is a heads-up on what we are seeking to do, with some background.
We welcome your input and comments.

From a functional perspective, the central use case we are tackling can be
described as applying ontologies and rules as a task-oriented lens over
content repository data, making use of Stanbol's KReS components in
particular.

First, a note on the integration between Fedora and Stanbol.  We are basing
this on Fedora's JMS messaging, which we have enhanced; the messages are
used to "synchronise" Stanbol with Fedora.  (Fedora also provides a REST API
for access to content.)  We enhanced the existing messaging for two reasons.
The messages were tightly coupled to the Fedora API, so message consumers
needed detailed knowledge of that API.  Also, in some cases information that
would be useful in interpreting changes made to Fedora content was not
readily available in the messages.  We're hoping that this new messaging
component will be re-usable outside of the Stanbol integration and will
bring value to the Fedora community.
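
To illustrate the decoupling idea, here is a minimal Python sketch; it is
not the actual messaging component.  It assumes the message body is an Atom
entry whose title names the Fedora API method and whose summary carries the
object PID.  The ChangeEvent type, the "kind" vocabulary and the sample
message are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical mapping from Fedora API method names to API-neutral change
# kinds, so that consumers do not need to know the Fedora API itself.
CHANGE_KINDS = {
    "ingest": "object-created",
    "modifyObject": "object-updated",
    "purgeObject": "object-deleted",
    "addDatastream": "datastream-created",
    "modifyDatastreamByValue": "datastream-updated",
    "modifyDatastreamByReference": "datastream-updated",
    "purgeDatastream": "datastream-deleted",
}


@dataclass
class ChangeEvent:
    kind: str  # API-neutral description of what changed
    pid: str   # identifier of the affected Fedora object


def to_change_event(message_body: str) -> ChangeEvent:
    """Translate an Atom-formatted message into a neutral change event."""
    entry = ET.fromstring(message_body)
    method = entry.findtext(ATOM + "title", default="").strip()
    pid = entry.findtext(ATOM + "summary", default="").strip()
    return ChangeEvent(CHANGE_KINDS.get(method, "unknown"), pid)


# Illustrative message only; real messages carry more elements.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title type="text">purgeObject</title>
  <summary type="text">demo:5</summary>
</entry>"""

event = to_change_event(SAMPLE)
```

A consumer such as the Stanbol synchroniser would then act on event.kind
and event.pid without inspecting Fedora method names.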

Conceptually the integration is similar to the CMS Adapter [2].  However,
Fedora's central index of content, the "Resource Index" (or "RI") [3], is
implemented as an RDF graph of URIs representing Fedora's notion of digital
objects and datastreams.  This is not the same as the JCR or CMIS view of
an object, so we're not planning on using the CMS Adapter component
directly.  In particular we haven't identified a need to "bridge" to the
Fedora object model, as it's already expressible, at least partly, in RDFS.
Currently there's no formal schema for Fedora's "internal" object and
datastream relationships, but implementation relationships between objects
expressed in Fedora's RDF datastreams (RELS-EXT and RELS-INT) do have an
RDFS schema.  (For more information on Fedora relationships see [4].)

Fedora uses the Mulgara quad-store [5] as its RDF database, and it is
possible to expose this directly with a SPARQL endpoint.  Furthermore,
Clerezza can be configured to use Mulgara as its source.  This may therefore
be a convenient way of accessing Fedora's Resource Index, particularly as
the intent of the Resource Index is to provide access to the RDF view of
Fedora's data.
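
As a rough sketch of what querying the Resource Index over SPARQL could
look like, the Python below builds a query for all objects asserting a given
content model, plus the GET request URL defined by the SPARQL protocol.  The
endpoint URL and the model PID are assumptions for illustration; the
hasModel predicate is the one Fedora uses for content model assertions.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real URL depends on how Mulgara is exposed.
ENDPOINT = "http://localhost:8080/sparql"

# Fedora's predicate linking an object to its content model.
HAS_MODEL = "info:fedora/fedora-system:def/model#hasModel"


def objects_with_model_query(model_pid: str, limit: int = 10) -> str:
    """Build a SPARQL query listing objects that assert the given model."""
    return (
        "SELECT ?object WHERE { "
        f"?object <{HAS_MODEL}> <info:fedora/{model_pid}> "
        f"}} LIMIT {limit}"
    )


def query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL (the query goes in 'query')."""
    return endpoint + "?" + urlencode({"query": query})


# "demo:ImageCModel" is a made-up content model PID.
query = objects_with_model_query("demo:ImageCModel")
url = query_url(ENDPOINT, query)
```

Fetching the URL (for example with urllib.request.urlopen) is left out so
that the sketch runs without a live repository.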

The repository content we are working with is a collection of images and
their metadata, catalogued using the VRA (Visual Resources Association) XML
metadata schema [6].  Some elements in the metadata records are populated
with items from controlled vocabularies and thesauri.  In particular,
artists are identified with persistent identifiers from the Getty ULAN
thesaurus [7], which we have converted to SKOS (and SKOS-XL) [8].

So in terms of semantics, there are three kinds we need to work with in
order to give the end-user a consistent experience during browse and
discovery: the repository structures as expressed within the Fedora
Resource Index, image metadata records (lifted from VRA into OWL), and Getty
thesaurus concepts and interrelations expressed as SKOS.  These schemata are
loaded into KReS scopes, and alignment is needed.  The Fedora schema is
static, as one would expect given Fedora's key role as a stable repository
for digital content, so it is considered immutable in KReS.  The VRA schema
and our mapping to the SKOS/SKOS-XL thesaurus are still evolving, so we
consider them shared and loaded at run-time.

Scopes are intended for task-oriented partitioning of the semantic data.
This comes into play in two scenarios.  Firstly, when content is updated,
the additional objects must be added.  Secondly, additional constraints are
driven by user search or browse, e.g. during faceted browsing.  So the final
step in our use case demonstration will be to use per-user session scopes to
handle this second case.  The intention is that data matching the user's
view must be synchronised in cached graphs in a dedicated session scope; the
results will be materialised by the reasoner.  The DL reasoner could also be
used to check integrity constraints at the point of access.
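
The layering described above can be pictured with a toy Python model.  It
does not use the Stanbol/KReS API; the Scope class, its method names and the
example triples are all hypothetical, and reasoning is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    core: frozenset                               # static schemata, immutable
    custom: set = field(default_factory=set)      # evolving, loaded at run time
    sessions: dict = field(default_factory=dict)  # session id -> cached triples

    def sync_session(self, session_id, repository_triples, matches):
        """Cache the repository triples that match the user's current view."""
        cached = {t for t in repository_triples if matches(t)}
        self.sessions[session_id] = cached
        return cached

    def view(self, session_id):
        """Everything a reasoner would see for this session."""
        return set(self.core) | self.custom | self.sessions.get(session_id, set())


# Made-up triples standing in for schema and repository data.
scope = Scope(core=frozenset({("ex:Image", "rdf:type", "owl:Class")}))
scope.custom.add(("ex:creator", "rdf:type", "owl:ObjectProperty"))

repository = {
    ("ex:img1", "ex:creator", "ex:artistA"),
    ("ex:img2", "ex:creator", "ex:artistB"),
}

# A facet selection, e.g. the user is browsing works by one artist.
cached = scope.sync_session("s1", repository, lambda t: t[2] == "ex:artistA")
```

Each session sees the shared schemata plus only the data matching its own
view, which is the partitioning the session scopes are meant to provide.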

Regards
Steve

[1] http://fedora-commons.org/
[2] http://wiki.iks-project.eu/index.php/CMS_Adaptor
[3] https://wiki.duraspace.org/display/FEDORA35/Resource+Index
[4] https://wiki.duraspace.org/display/FEDORA35/Triples+in+the+Resource+Index
[5] http://www.mulgara.org/
[6] http://www.vraweb.org/projects/vracore4/index.html
[7] http://www.getty.edu/research/tools/vocabularies/ulan/
[8] http://www.w3.org/TR/skos-reference/
