Re: Proposal for Integration of Linked Media Framework in Apache Stanbol

Rupert Westenthaler Fri, 22 Jul 2011 10:28:23 -0700

Hi

Let me add my first comments


>
> LMF Core
> ========
>
> The core component of the Linked Media Framework is a Linked Data Server that 
> allows to expose data following the Linked Data Principles:
> * Use URIs as names for things.
> * Use HTTP URIs, so that people can look up those names.
> * When someone looks up a URI, provide useful information, using the 
> standards (RDF, SPARQL).
> * Include links to other URIs, so that they can discover more things.
> The Linked Data Server implemented as part of the LMF goes beyond the Linked 
> Data principles by extending them with Linked Data Updates and by integrating 
> management of metadata and content and making both accessible in a uniform 
> way. In addition to the Linked Data Server, the LMF Core also offers a SPARQL 
> endpoint.
>

The Linked Data Server, including the LMF extensions (content+metadata
and full CRUD), would be a very nice extension for Stanbol: (1) it is
central to the planned Contenthub, (2) it is also relevant for the
Entityhub, and (3) if we design this component well, it could provide
an easy way for many existing CMS systems to support LOD for their
content.

> LMF Modules
> ===========
>
> As extension for the LMF Core, we are working on a number of optional modules 
> that can be used to extend the functionality of the Linked Media Server:
>
> Implemented:
> * LMF Semantic Search offers a highly configurable Semantic Search service 
> based on Apache SOLR. Several semantic search indexes can be configured in 
> the same LMF instance using an RDF Path Language that allows traversal over 
> several Linked Data sources.

At first this sounds similar to the SolrYard used by the Entityhub, but
in the details it is very different.
Here the goal is to use a Path Language to
* define the information indexed for a document
* create a Solr schema that stores exactly this information in fields
with simple names
* retrieve, based on the paths, information from different data sources
(such as the Entityhub, Enhancement Results, possibly CMS queries via
the CMSAdapter, ...)
* let users access the Solr index directly via the normal Solr
RESTful interface.
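
To make the idea more concrete, here is a minimal sketch of such a
path-based index configuration. It is only an illustration: the path
syntax (a plain list of predicates), the sample triples and the helper
names follow() and build_document() are assumptions of mine, not the
actual RDF Path Language of the LMF.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("doc:1", "dc:title", "Salzburg travel guide"),
    ("doc:1", "dc:subject", "ent:Salzburg"),
    ("ent:Salzburg", "rdfs:label", "Salzburg"),
    ("ent:Salzburg", "geo:country", "ent:Austria"),
    ("ent:Austria", "rdfs:label", "Austria"),
]

# Hypothetical index configuration: simple field name -> property path.
FIELDS = {
    "title": ["dc:title"],
    "subject_label": ["dc:subject", "rdfs:label"],
    "country_label": ["dc:subject", "geo:country", "rdfs:label"],
}


def follow(triples, start, path):
    """Return all values reached from `start` by following `path` step by step."""
    current = [start]
    for predicate in path:
        current = [
            o
            for node in current
            for (s, p, o) in triples
            if s == node and p == predicate
        ]
    return current


def build_document(triples, uri, fields):
    """Build a flat, Solr-like document with one simply named field per path."""
    doc = {"id": uri}
    for name, path in fields.items():
        values = follow(triples, uri, path)
        if values:  # skip fields for which the path yields nothing
            doc[name] = values
    return doc


doc = build_document(TRIPLES, "doc:1", FIELDS)
print(doc)
```

The point of the sketch is that the resulting document has flat fields
with simple names, so a user can query the index directly without
knowing anything about the underlying graph.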

The SolrYard aims to use a single schema that is capable of storing
Entities of any kind (without schema limitations). To provide this,
it uses rather tricky prefixes and postfixes for Solr fields. Its Solr
indices are not intended to be used directly by users, but via the
Java API/RESTful API of the Entityhub. However, all the Solr-related
utilities (like creating/loading SolrCores via the Sling Installer/
Stanbol DataFileprovider) will be used by both components. This is
also the reason why I have already started to move this
functionality over to the stanbol.commons.solr bundle.
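
As a rough illustration of the prefix/postfix approach, a single
generic schema can encode the language or datatype of a value into the
field name. The naming scheme below ("@en/" as prefix, "/int" as
postfix) and the functions encode_field() and decode_field() are
invented for this sketch and are not the actual SolrYard conventions.

```python
def encode_field(prop, lang=None, datatype=None):
    """Encode a property plus its language or datatype into one field name."""
    if lang is not None and datatype is not None:
        raise ValueError("a value has either a language or a datatype, not both")
    if lang is not None:
        return f"@{lang}/{prop}"      # language goes into a prefix
    if datatype is not None:
        return f"{prop}/{datatype}"   # datatype goes into a postfix
    return prop


def decode_field(name):
    """Inverse of encode_field(): return (property, language, datatype)."""
    if name.startswith("@"):
        lang, prop = name[1:].split("/", 1)
        return prop, lang, None
    if "/" in name:
        prop, datatype = name.rsplit("/", 1)
        return prop, None, datatype
    return name, None, None


print(encode_field("rdfs:label", lang="en"))
print(encode_field("ex:population", datatype="int"))
print(decode_field("@en/rdfs:label"))
```

This works for any property without changing the schema, but the field
names are not something one would want to expose to end users, which is
why access goes through the Entityhub APIs.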

I think the first and most intuitive use of this component is to
build the Search Interface for the Contenthub (similar to what it
already does within the LMF), but it is also closely related to the
work Fabian has started with the Factstore (the path configuration
would be the definition of the fact).

> * LMF Linked Data Cache implements a cache to the Linked Data Cloud that is 
> transparently used when querying the content of the LMF using either SPARQL 
> or the Semantic Search component. In case a local resource links to a remote 
> resource in the Linked Data Cloud and this relationship is queried, the 
> remote resource will be retrieved in the background and cached locally.

Linked Data Crawling functionality is definitely of interest to the
Entityhub. It could be used to build more efficient local caches (e.g.
to prefetch data that is only a single URI away from data already
used). However, such a feature is also of general interest to CMS.
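
The prefetching idea could look roughly like the following sketch. The
cache layout (a dict from URI to predicate/object pairs), the
example.org URIs and the fetch callback are assumptions for
illustration only; a real implementation would dereference the URIs
over HTTP.

```python
def is_uri(value):
    """Very rough check whether an object value is an HTTP URI."""
    return value.startswith("http://") or value.startswith("https://")


def one_hop_candidates(cache):
    """Return URIs that cached resources link to but that are not cached yet."""
    missing = set()
    for statements in cache.values():
        for _predicate, obj in statements:
            if is_uri(obj) and obj not in cache:
                missing.add(obj)
    return missing


def prefetch(cache, fetch):
    """Fetch every resource that is exactly one link away from the cache."""
    # Candidates are computed once, so this does not crawl recursively.
    for uri in sorted(one_hop_candidates(cache)):
        cache[uri] = fetch(uri)
    return cache


# Hypothetical local cache with a single resource.
cache = {
    "http://example.org/Salzburg": [
        ("rdfs:label", "Salzburg"),
        ("ex:country", "http://example.org/Austria"),
    ],
}


def fake_fetch(uri):
    """Stand-in for an HTTP lookup of the remote resource."""
    return [("rdfs:label", uri.rsplit("/", 1)[-1])]


candidates = one_hop_candidates(cache)
prefetch(cache, fake_fetch)
print(sorted(cache))
```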

> * LMF Reasoner implements a rule-based reasoner that allows to process 
> Datalog-style rules over RDF triples; the LMF Reasoner is based on the 
> reasoning component developed in the KiWi project, the predecessor of the LMF
>

In my opinion this component may have the biggest impact if used in
combination with the LMF Semantic Search/Indexing component, adding
the possibility to use rules (in addition to the RDF Path Language)
to specify how to build documents for the semantic index.
It would be interesting to see whether this can solve the licensing
issues with the current reasoners. I think the current Rule/Reasoner
functionality is based on OWL-DL and SWRL, and I do not know how that
relates to Datalog-style rule languages.
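
For readers not familiar with Datalog-style rules over triples, here is
a small sketch of naive forward chaining. This is a generic textbook
approach and not the KiWi/LMF reasoner; the rule representation (a head
pattern plus a list of body patterns, variables starting with "?") and
the ex: names are assumptions for the example.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; None on mismatch."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None      # constant does not match
    return result


def solve(body, facts, binding=None):
    """Yield every variable binding that satisfies all patterns in `body`."""
    binding = binding or {}
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def infer(facts, rules):
    """Apply all rules repeatedly until no new triple can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Materialise the bindings first so `facts` is not modified
            # while it is being iterated.
            for binding in list(solve(body, facts)):
                new = tuple(binding.get(term, term) for term in head)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts


FACTS = [
    ("ex:Salzburg", "ex:locatedIn", "ex:Austria"),
    ("ex:Austria", "ex:locatedIn", "ex:Europe"),
]
# locatedIn is transitive: (?x locatedIn ?y), (?y locatedIn ?z) -> (?x locatedIn ?z)
RULES = [
    (("?x", "ex:locatedIn", "?z"),
     [("?x", "ex:locatedIn", "?y"), ("?y", "ex:locatedIn", "?z")]),
]

closure = infer(FACTS, RULES)
print(sorted(closure))
```

In combination with the indexing component, inferred triples like the
one derived here could then be picked up by the path configuration when
the index documents are built.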

> Under Progress:
> * LMF Permissions implements and extends the WebID and WebACL specifications 
> for standards-conforming authentication and access control in the Linked 
> Media Framework. (state: almost completed)
> * LMF Enhancer offers semantic enhancement of content by analysing textual 
> and media content; the LMF Enhancer will build upon the Apache Stanbol 
> framework (state: started)
> * LMF Media Interlinking will implement support for multimedia interlinking 
> based on the work in the W3C Multimedia Fragments WG and the W3C Multimedia 
> Annotations WG
> * LMF Versioning implements versioning of metadata updates; versioning itself 
> is already carried out by LMF Core, but the management of versions will be 
> carried out by this module (state: started)
> --- 8< ---- 8< ---
>
> As far as I can see, Apache Stanbol and the Linked Media Framework currently 
> cover mostly complementary areas and I think that a combination could be of 
> benefit for both projects. In particular, I would think that the Linked Media 
> Framework can also offer an almost ready implementation of the Stanbol 
> Content Hub, as well as a free reasoner and full Linked Data capabilities 
> (server as well as client) and Semantic Search.
>
> Last but not least it is also of strategic interest for the KMT group at 
> Salzburg Research to (1) integrate the technologies developed in the group, 
> and (2) avoid duplication of effort if it is not necessary.
>
> What I can offer is that - following a discussion on the mailinglist - we 
> donate the LMF code base to Apache Stanbol and try integrating the two 
> projects in the next months. Since the LMF is easily deployable as a .war 
> file, a first step could be to deploy the web application inside Apache 
> Stanbol as an OSGi web application. This could demonstrate the usefulness of 
> the combination. Of course, in the course of the integration it would be 
> necessary to isolate the individual modules of the LMF as separate components.
>
> Difficulties I see at the moment (just to mention these...):
> - technological issue: LMF is currently not using OSGi, but it uses a Gradle 
> build system, Java 6 EE dependency injection (CDI) and a typical Java EE 
> architecture which is incompatible with OSGi for now; one of our selling 
> points is also "easy setup" and "lightweight", so I would not really like to 
> change this for a complicated architecture ...

I think the Stanbol community is a very good place for getting help
with OSGi-related questions.

> - license issue: LMF might still use libraries that are released under 
> incompatible licenses, so this needs to be checked. In particular, Hibernate 
> is still licensed under LGPL, and it is one of the core libraries of the LMF; 
> porting to other persistence frameworks might require a lot of effort
> - organisational issue: LMF is developed in several projects that have their 
> specific goals; we will in some way need to still be able to follow our goals 
> in these projects, e.g. by appropriate Stanbol extensions; all software 
> developed in these projects is Open Source though...
>
Relating to this, one simply has to consider that the focus of Stanbol
and the LMF is a little bit different:

Stanbol tries to be the Semantic Engine that brings Semantic
Capabilities to CMS. The LMF also provides a lot of such capabilities.
Luckily such capabilities are mostly complementary and therefore it
would be a really great thing to merge them and build a single more
capable thing with an even stronger community.

However, the LMF also provides a lot of Semantic CMS capabilities.
Features like "LMF Versioning" and "LMF Permissions", and its
dependency on Hibernate, are hints of that. Stanbol explicitly excludes
such things because one can typically already find these features in
the existing CMS stacks of potential Stanbol users.
For the LMF they are important because such features are required
for the kind of research projects we need to run here at Salzburg
Research.

What does that mean? In my opinion this simply indicates that there
will still be an LMF around after the merge with Stanbol. It will
have a very strong dependency on Stanbol, and the remaining parts will
be much more focused on Semantic Content Management.

best
Rupert Westenthaler


-- 
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
