Hi,
I agree that consuming the Stanbol Enhancements is currently the most complex
part for the average developer. But that is not only true for Java; it applies
to any programming language.
Because of that I would really like to tackle this issue not via a Java API,
but rather on a level where all Stanbol users can benefit.
Here is how it could work ...
In the "RESTful Design Workshop" at the WWW2012 conference Markus Lanthaler
gave a really nice presentation on JSON-LD [1]. While Apache Stanbol already
uses JSON-LD [2] as its default serialization, the current usage is very basic:
we just use JSON-LD as another RDF serialization format. However, JSON-LD would
allow for much more customization/control of the serialized JSON by using the
"@context".
However, improving the RDF -> JSON mapping alone will not be sufficient to
transform the raw enhancement results into an easy-to-consume JSON structure.
Because of that I suggest extending the "@context" with additional properties
that define transformation rules which are applied to the Enhancement
results before the actual serialization. In the examples in this
mail I will use LDpath [3], but one could also use Stanbol Rules, similar to the
RefactorEngine, to achieve the same thing.
Let's use an example to describe the whole idea.
Everything starts with a JSON-LD @context like the following (this tries to
resemble the example in the original mail by Bertrand):
{
  "@context": {
    "enhancer": "http://fise.iks-project.eu/ontology/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "label": "skos:prefLabel",
    "parent": {
      "@id": "skos:broader",
      "@type": "@id"
    },
    "categories": {
      "@id": "enhancer:suggested-topics",
      "@ldpath": "enhancer:extracted-from[rdf:type is enhancer:TopicEnhancement]/enhancer:referenced-entity",
      "@type": "skos:Concept",
      "@container": "@list"
    }
  }
}
This is a normal JSON-LD context as defined by [2] with a single exception: the
"@ldpath" property in the JSON object describing "categories".
This LD-path statement is used to "transform" the more complex modeling as used
by the Stanbol Enhancement structure
{content-item} -- extracted-from --> {topic-annotation-n} --
referenced-entity --> {category}
to the desired JSON structure
{
  "@id": "urn:content-item:SHA1-123456789",
  "categories": [{
    "@id": "http://cv.iptc.org/newscodes/subjectcode/15002001",
    "label": "downhill",
    "parent": "http://cv.iptc.org/newscodes/subjectcode/15002000"
  }]
}
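To make the transformation more concrete, here is a minimal sketch in plain Java of what evaluating the "@ldpath" statement amounts to: follow extracted-from, keep only topic annotations, then follow referenced-entity. The Triple class, the LdPathSketch name, the sample URNs for the enhancements and the prefixed string predicates are assumptions made for illustration; this is not the LDpath or Clerezza API, and the link direction simply follows the diagram above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LdPathSketch {

    /** Minimal stand-in for an RDF statement (hypothetical, not a Clerezza type). */
    static final class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Returns the objects of all statements with the given subject and predicate. */
    static List<String> objects(List<Triple> graph, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    /** Walks {content-item} -- extracted-from --> {annotation} -- referenced-entity --> {category}. */
    static List<String> categories(List<Triple> graph, String contentItem) {
        List<String> result = new ArrayList<>();
        for (String annotation : objects(graph, contentItem, "enhancer:extracted-from")) {
            // Corresponds to the [rdf:type is enhancer:TopicEnhancement] filter.
            if (!objects(graph, annotation, "rdf:type").contains("enhancer:TopicEnhancement")) {
                continue;
            }
            result.addAll(objects(graph, annotation, "enhancer:referenced-entity"));
        }
        return result;
    }

    /** Sample data: one topic annotation and one annotation the filter must skip. */
    static List<Triple> sampleGraph() {
        String ci = "urn:content-item:SHA1-123456789";
        return Arrays.asList(
            new Triple(ci, "enhancer:extracted-from", "urn:enhancement:1"),
            new Triple("urn:enhancement:1", "rdf:type", "enhancer:TopicEnhancement"),
            new Triple("urn:enhancement:1", "enhancer:referenced-entity",
                "http://cv.iptc.org/newscodes/subjectcode/15002001"),
            new Triple(ci, "enhancer:extracted-from", "urn:enhancement:2"),
            new Triple("urn:enhancement:2", "rdf:type", "enhancer:TextAnnotation"),
            new Triple("urn:enhancement:2", "enhancer:referenced-entity", "urn:example:ignored"));
    }

    public static void main(String[] args) {
        System.out.println(categories(sampleGraph(), "urn:content-item:SHA1-123456789"));
    }
}
```

The resulting list is what would end up under "categories" once the serializer applies the rest of the context (labels, parent links).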
For Java users: implementing SINR should be easily possible by using existing
frameworks that support mapping of Java objects to JSON (such as [4]).
I expect that similar frameworks are also available for other
programming languages, and JavaScript gets native JSON objects anyway.
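As a rough illustration of how little code the Java side would need, the sketch below maps one entry of the "categories" array onto the Category interface from Bertrand's mail. It assumes the JSON has already been parsed into a java.util.Map by whatever JSON framework is used; SimpleCategory, fromJson and sample are hypothetical names, and since "parent" is only an IRI in the example, the parent Category carries an id but no label.

```java
import java.util.HashMap;
import java.util.Map;

public class CategoryMapping {

    /** The Category interface as proposed in the quoted mail. */
    interface Category {
        String getId();
        String getLabel();
        Category getParent();
    }

    /** Hypothetical immutable implementation used only for this sketch. */
    static final class SimpleCategory implements Category {
        private final String id;
        private final String label;
        private final Category parent;

        SimpleCategory(String id, String label, Category parent) {
            this.id = id;
            this.label = label;
            this.parent = parent;
        }

        public String getId() { return id; }
        public String getLabel() { return label; }
        public Category getParent() { return parent; }
    }

    /** Builds a Category from one already-parsed entry of the "categories" array. */
    static Category fromJson(Map<String, Object> json) {
        Object parentId = json.get("parent");
        // The example JSON only carries the parent's IRI, so its label stays null.
        Category parent = parentId == null
            ? null
            : new SimpleCategory((String) parentId, null, null);
        return new SimpleCategory((String) json.get("@id"), (String) json.get("label"), parent);
    }

    /** The category entry from the example above, as a parsed map. */
    static Map<String, Object> sample() {
        Map<String, Object> json = new HashMap<>();
        json.put("@id", "http://cv.iptc.org/newscodes/subjectcode/15002001");
        json.put("label", "downhill");
        json.put("parent", "http://cv.iptc.org/newscodes/subjectcode/15002000");
        return json;
    }

    public static void main(String[] args) {
        Category c = fromJson(sample());
        System.out.println(c.getLabel() + " -> parent " + c.getParent().getId());
    }
}
```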
A positive side effect is that even users who want to process RDF (instead of
JSON) will profit from the simpler RDF graphs produced by the @ldpath (or
Stanbol Rule based) transformations as specified in the @contexts.
So while this will ensure that Enhancement results will be much easier to
consume, it also introduces a new weakness: the definition of the "@context",
as this will be the new "most complex task for the average developer".
However, the working assumption here is that in most cases users will not
need to define their own contexts, but will rather use existing ones that are
included in the Stanbol distribution.
Such contexts would include:
* the basic building blocks such as categories (the above example), named
entities (TextAnnotations), linked entities (EntityAnnotations), mentions
(positions within the content), ...
* combinations of such patterns tailored for typical use cases such as
entity-tagging (e.g. [6]) or inline-annotations (e.g. [7])
All those predefined contexts need to be available via the Stanbol web
interface so that users can easily link/use them in requests (see [3] and
especially the use of the Link header [5]).
The Stanbol infrastructure for this feature would include:
* the available contexts should also be available via the OSGi environment
(whiteboard pattern). The components performing the transformation will need
that configuration.
* the actual transformations based on the "@ldpath" instructions can be done by
an EnhancementEngine in the post-processing phase.
* for serialization we will need to update the JSON-LD serializer for
Apache Clerezza to make it compliant with the recent changes/additions to the
JSON-LD specification.
  * maybe we can use [8] as a base, but currently it does not define any
  license (I have already filed an issue about that)
* most of the LDpath utilities already exist (see the enhancer/ldpath module)
* an implementation of SINR that directly accesses the Java API could be
based on the transformed RDF graph.
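For the whiteboard part, a very rough sketch of a registry that the transforming components could consult is shown below. It deliberately avoids any OSGi imports: in a real setup the bind/unbind methods would be invoked by the framework (e.g. Declarative Services) as context-providing services come and go. ContextRegistry, its method names and the "categories" context name are all hypothetical and only meant to show the shape of the idea.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class ContextRegistry {

    // Context name -> JSON-LD context document (kept as a plain string here).
    private final Map<String, String> contexts = new ConcurrentHashMap<>();

    /** Would be called when a bundle registers a predefined context. */
    void bindContext(String name, String jsonLdContext) {
        contexts.put(name, jsonLdContext);
    }

    /** Would be called when the providing bundle goes away. */
    void unbindContext(String name) {
        contexts.remove(name);
    }

    /** Lookup used by the component that performs the transformation; null if unknown. */
    String getContext(String name) {
        return contexts.get(name);
    }

    /** Sorted snapshot of the names of all currently registered contexts. */
    Set<String> getContextNames() {
        return new TreeSet<>(contexts.keySet());
    }

    public static void main(String[] args) {
        ContextRegistry registry = new ContextRegistry();
        registry.bindContext("categories", "{ \"@context\": { } }");
        System.out.println(registry.getContextNames());
        registry.unbindContext("categories");
        System.out.println(registry.getContextNames());
    }
}
```

The same registry could back the web interface that exposes the predefined contexts, so both the HTTP and the OSGi side would see the same set.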
WDYT?
Rupert
[1] http://www.slideshare.net/lanthaler/jsonld-for-restful-services
[2] http://json-ld.org/spec/latest/json-ld-syntax/
[3] http://json-ld.org/spec/latest/json-ld-syntax/#the-context
[4] http://xstream.codehaus.org/json-tutorial.html
[5] http://json-ld.org/spec/latest/json-ld-syntax/#referencing-contexts-from-json-documents
[6] http://www.youtube.com/watch?v=957-bs16Fjg
[7] http://hallojs.org/annotate.html (press the annotate button)
[8] https://github.com/tristan/jsonld-java
On 09.05.2012, at 01:06, Bertrand Delacretaz wrote:
> Hi,
>
> I've been thinking recently that we could make Stanbol's content
> enhancement services more accessible to the average developer by
> providing a simplified POJO-like client API.
>
> A secondary idea is to use that same API for other content enhancement
> services, making it possible to combine them and/or make them
> interchangeable.
>
> This means losing the flexibility of RDF, but by using an Adapter
> pattern we can remain sufficiently flexible while making it much
> simpler to get started with Stanbol services.
>
> The suggested name of this API is SINR (SINR Is Not RDF). Pronounced "sinner".
>
> Here's an initial overview of what this could look like. Comments welcome.
>
> Simple interfaces like Category, Annotation, Keyword are used to
> represent content enhancements.
>
> Here's Category, for example (credits to Reto for this one). Plain and simple:
>
> interface Category {
> String getId();
> String getLabel();
> Category getParent();
> }
>
> To enhance content with categories and keywords, you call the
> SinrEnhancer service like this:
>
> InputStream content = ....
> String mimeType = ...
> // Specifying which enhancement types are desired allows the enhancer
> // to avoid doing unnecessary work, while making it possible to define
> // new types of enhancements later.
> Class[] desiredEnhancements = new Class[] { Category.class, Keyword.class };
> SinrResult r = enhancer.process(content, mimeType, desiredEnhancements);
>
> An Adapter pattern allows you to convert the SinrResult to the various
> data types:
>
> List<Category> c = r.getResultsOfType(Category.class);
> List<Keyword> k = r.getResultsOfType(Keyword.class);
>
> With this pattern, new enhancement types can be added without changing
> the SinrEnhancer interface.
>
> We might create two SINR implementations, one that talks to an OSGi
> service directly and another one that talks to a Stanbol server over
> HTTP.
>
> WDYT?
> -Bertrand