Re: [RT] SINR - a simplified client API for content enhancement

Rupert Westenthaler Wed, 09 May 2012 01:45:26 -0700

Hi,

I agree that consuming the Stanbol Enhancements is currently the most complex 
part for the average developer. But that is true not only for Java; it applies 
to any programming language.

Because of that I would really like to tackle this issue not via a Java API, 
but rather on a level where all Stanbol users can benefit.

Here is how it could work ...

In the "RESTful Design Workshop" at the WWW2012 conference Markus Lanthaler 
gave a really nice presentation on JSON-LD [1]. While Apache Stanbol already 
uses JSON-LD [2] as its default serialization, the current usage is very basic: 
we just use JSON-LD as another RDF serialization format. However, JSON-LD would 
allow for much more customization/control of the serialized JSON by using the 
"@context".

However, improving the RDF -> JSON mapping alone will not be sufficient to 
transform the raw enhancement results into an easy-to-consume JSON structure. 
Because of that I suggest extending the "@context" with additional properties 
that define transformation rules to be applied to the Enhancement results 
before the actual serialization. In the examples in this mail I will use 
LDpath [3], but one could also use Stanbol Rules, similar to the 
RefactorEngine, to achieve the same thing.

Let's use an example to describe the whole idea.

Everything starts with a JSON-LD @context like the following (this tries to 
resemble the example in the original mail by Bertrand):

{
  "@context":
  {
    "enhancer": "http://fise.iks-project.eu/ontology/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "label": "skos:prefLabel",
    "parent": {
        "@id": "skos:broader",
        "@type": "@id"
    },
    "categories": {
        "@id": "enhancer:suggested-topics",
        "@ldpath": "enhancer:extracted-from[rdf:type is enhancer:TopicEnhancement]/enhancer:referenced-entity",
        "@type": "skos:Concept",
        "@container": "@list"
    }
  }
}

This is a normal JSON-LD context as defined by [2] with a single exception: the 
"@ldpath" property in the JSON object describing "categories".
This LDpath statement is used to "transform" the more complex modeling used 
by the Stanbol Enhancement structure

    {content-item} -- extracted-from --> {topic-annotation-n} -- referenced-entity --> {category}

to the desired JSON structure

    {
        "@id" : "urn:content-item:SHA1-123456789",
        "categories" : [{
            "@id" : "http://cv.iptc.org/newscodes/subjectcode/15002001",
            "label" : "downhill",
            "parent" : "http://cv.iptc.org/newscodes/subjectcode/15002000"
        }]
    }
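
To make the intended transformation a bit more concrete, here is a minimal 
Java sketch of what the "@ldpath" statement above would do. It does not use 
the LDpath library or the Clerezza API; the Triple class, the annotation URNs 
and the example.org entity are made up for illustration, and the direction of 
"extracted-from" follows the diagram above.

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryPathSketch {

    /** Minimal stand-in for an RDF triple; not a Stanbol or Clerezza class. */
    public static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        public Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    static final String NS = "http://fise.iks-project.eu/ontology/";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String CONTENT_ITEM = "urn:content-item:SHA1-123456789";

    /** Returns all objects of triples matching (subject, predicate, ?). */
    public static List<String> objects(List<Triple> graph, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    /**
     * Follows extracted-from, keeps only nodes typed as TopicEnhancement,
     * then follows referenced-entity, mirroring the path in "@ldpath".
     */
    public static List<String> categories(List<Triple> graph, String contentItem) {
        List<String> result = new ArrayList<>();
        for (String annotation : objects(graph, contentItem, NS + "extracted-from")) {
            if (objects(graph, annotation, RDF_TYPE).contains(NS + "TopicEnhancement")) {
                result.addAll(objects(graph, annotation, NS + "referenced-entity"));
            }
        }
        return result;
    }

    /** Hypothetical sample data: one topic annotation and one other annotation. */
    public static List<Triple> sampleGraph() {
        List<Triple> g = new ArrayList<>();
        g.add(new Triple(CONTENT_ITEM, NS + "extracted-from", "urn:enhancement:topic-1"));
        g.add(new Triple("urn:enhancement:topic-1", RDF_TYPE, NS + "TopicEnhancement"));
        g.add(new Triple("urn:enhancement:topic-1", NS + "referenced-entity",
                "http://cv.iptc.org/newscodes/subjectcode/15002001"));
        g.add(new Triple(CONTENT_ITEM, NS + "extracted-from", "urn:enhancement:text-1"));
        g.add(new Triple("urn:enhancement:text-1", RDF_TYPE, NS + "TextAnnotation"));
        g.add(new Triple("urn:enhancement:text-1", NS + "referenced-entity",
                "http://example.org/entity/some-place"));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(categories(sampleGraph(), CONTENT_ITEM));
    }
}
```

Only the entity referenced by the topic annotation ends up in the result, 
which is what would be serialized under "categories".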

For Java users: implementing SINR should be easily possible using existing 
frameworks that support mapping between Java objects and JSON (such as [4]).
I expect that similar frameworks are also available for other programming 
languages, and JavaScript will get native JSON objects anyway.
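
As a rough illustration of how thin such a mapping layer could be, here is a 
sketch that turns the JSON structure above into objects implementing the 
Category interface from Bertrand's mail. It assumes some JSON library has 
already parsed the document into nested Map/List objects; SimpleCategory and 
fromParsedJson are hypothetical names, not part of any existing API. Because 
"parent" is serialized as a plain URI, the parent Category only carries an id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoryMappingSketch {

    /** The Category interface as proposed in the original mail. */
    public interface Category {
        String getId();
        String getLabel();
        Category getParent();
    }

    /** Hypothetical immutable implementation used only for this sketch. */
    public static final class SimpleCategory implements Category {
        private final String id;
        private final String label;
        private final Category parent;

        public SimpleCategory(String id, String label, Category parent) {
            this.id = id;
            this.label = label;
            this.parent = parent;
        }

        public String getId() { return id; }
        public String getLabel() { return label; }
        public Category getParent() { return parent; }
    }

    private static String asString(Object value) {
        return value == null ? null : value.toString();
    }

    /** Maps the "categories" list of an already parsed JSON document to Category objects. */
    public static List<Category> fromParsedJson(Map<String, Object> document) {
        List<Category> result = new ArrayList<>();
        Object raw = document.get("categories");
        if (!(raw instanceof List)) {
            return result; // no categories in this result
        }
        for (Object entry : (List<?>) raw) {
            if (!(entry instanceof Map)) {
                continue; // skip anything that is not a JSON object
            }
            Map<?, ?> json = (Map<?, ?>) entry;
            String parentId = asString(json.get("parent"));
            Category parent = parentId == null ? null : new SimpleCategory(parentId, null, null);
            result.add(new SimpleCategory(asString(json.get("@id")), asString(json.get("label")), parent));
        }
        return result;
    }

    /** The example result from this mail, as a JSON parser might deliver it. */
    public static Map<String, Object> sampleDocument() {
        Map<String, Object> category = new LinkedHashMap<>();
        category.put("@id", "http://cv.iptc.org/newscodes/subjectcode/15002001");
        category.put("label", "downhill");
        category.put("parent", "http://cv.iptc.org/newscodes/subjectcode/15002000");

        List<Object> categories = new ArrayList<>();
        categories.add(category);

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("@id", "urn:content-item:SHA1-123456789");
        document.put("categories", categories);
        return document;
    }

    public static void main(String[] args) {
        for (Category c : fromParsedJson(sampleDocument())) {
            System.out.println(c.getId() + " (" + c.getLabel() + ")");
        }
    }
}
```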

A positive side effect is that even users who want to process RDF (instead of 
JSON) will profit from the simpler RDF graphs produced by the @ldpath (or 
Stanbol Rule based) transformations as specified in the @contexts.

So while this will ensure that Enhancement results are much easier to 
consume, it also introduces a new weakness: the definition of the "@context", 
as this will be the new "most complex task for the average developer". 
However, the working assumption here is that in most cases users will not 
need to define their own contexts, but will rather use existing ones that are 
included in the Stanbol distribution.

Such contexts would include:
    * the basic building blocks such as categories (the above example), named 
entities (TextAnnotations), linked entities (EntityAnnotations), mentions 
(positions within the content), ...
    * combinations of such patterns tailored for typical use cases such as 
entity-tagging (e.g. [6]) or inline-annotations (e.g. [7])

All those predefined contexts need to be available via the Stanbol web 
interface so that users can easily link/use them in requests (see [3] and 
especially the use of the Link header [5]).
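
A request referencing such a predefined context could look roughly like the 
following sketch. It only builds the request and does not send it. The 
context URL is hypothetical, and accepting a Link header on enhancement 
requests is part of this proposal, not something Stanbol does today. The 
"rel" value is the one the JSON-LD specification defines for referencing a 
context from a plain JSON document; using "application/json" as the Accept 
type is an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ContextLinkSketch {

    /** Builds a Link header value that points to a JSON-LD context document. */
    public static String linkHeader(String contextUrl) {
        return "<" + contextUrl + ">; rel=\"http://www.w3.org/ns/json-ld#context\"; "
                + "type=\"application/ld+json\"";
    }

    /** Builds (but does not send) a POST request carrying the context link. */
    public static HttpRequest enhanceRequest(String enhancerUrl, String contextUrl, String text) {
        return HttpRequest.newBuilder(URI.create(enhancerUrl))
                .header("Content-Type", "text/plain")
                .header("Accept", "application/json")
                .header("Link", linkHeader(contextUrl))
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    public static void main(String[] args) {
        // Both URLs are placeholders for a local Stanbol instance.
        HttpRequest request = enhanceRequest(
                "http://localhost:8080/enhancer",
                "http://localhost:8080/enhancer/contexts/categories",
                "Some text about downhill skiing.");
        System.out.println(request.headers().map());
    }
}
```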

The Stanbol infrastructure for this feature would include:

* the available contexts should also be available within the OSGi environment 
(whiteboard pattern), because the components performing the transformation 
will need that configuration
* the actual transformations based on the "@ldpath" instructions can be done by 
an EnhancementEngine in the post-processing phase
* for serialization we will need to update the JSON-LD serializer for 
Apache Clerezza to make it compliant with the recent changes/additions to the 
JSON-LD specification
    * maybe we can use [8] as a base, but currently it does not define any 
license (I have already opened an issue about that)
* most of the LDpath utilities already exist (see the enhancer/ldpath module)
* implementations of SINR that directly access the Java API could also be 
based on the transformed RDF graph

WDYT?
Rupert

[1] http://www.slideshare.net/lanthaler/jsonld-for-restful-services
[2] http://json-ld.org/spec/latest/json-ld-syntax/
[3] http://json-ld.org/spec/latest/json-ld-syntax/#the-context
[4] http://xstream.codehaus.org/json-tutorial.html
[5] http://json-ld.org/spec/latest/json-ld-syntax/#referencing-contexts-from-json-documents
[6] http://www.youtube.com/watch?v=957-bs16Fjg
[7] http://hallojs.org/annotate.html (press the annotate button)
[8] https://github.com/tristan/jsonld-java

On 09.05.2012, at 01:06, Bertrand Delacretaz wrote:

> Hi,
> 
> I've been thinking recently that we could make Stanbol's content
> enhancement services more accessible to the average developer by
> providing a simplified POJO-like client API.
> 
> A secondary idea is to use that same API for other content enhancement
> services, making it possible to combine them and/or make them
> interchangeable.
> 
> This means losing the flexibility of RDF, but by using an Adapter
> pattern we can remain sufficiently flexible while making it much
> simpler to get started with Stanbol services.
> 
> The suggested name of this API is SINR (SINR Is Not RDF). Pronounced "sinner".
> 
> Here's an initial overview of what this could look like. Comments welcome.
> 
> Simple interfaces like Category, Annotation, Keyword are used to
> represent content enhancements.
> 
> Here's Category, for example (credits to Reto for this one). Plain and simple:
> 
> interface Category {
>  String getId();
>  String getLabel();
>  Category getParent();
> }
> 
> To enhance content with categories and keywords, you call the
> SinrEnhancer service like this:
> 
> InputStream content = ....
> String mimeType = ...
> // Specifying which enhancement types are desired allows the enhancer
> // to avoid doing unnecessary work, while making it possible to define
> // new types of enhancements later.
> Class[] desiredEnhancements = new Class[] { Category.class, Keyword.class };
> SinrResult r = enhancer.process(content, mimeType, desiredEnhancements);
> 
> An Adapter pattern allows you to convert the SinrResult to the various
> data types:
> 
> List<Category> c = r.getResultsOfType(Category.class);
> List<Keyword> k = r.getResultsOfType(Keyword.class);
> 
> With this pattern, new enhancement types can be added without changing
> the SinrEnhancer interface.
> 
> We might create two SINR implementations, one that talks to an OSGi
> service directly and another one that talks to a Stanbol server over
> HTTP.
> 
> WDYT?
> -Bertrand
