What's going on with reasoners (STANBOL-185)

Enrico Daga Tue, 30 Aug 2011 07:13:23 -0700

Hello,
this is to share the developments of the last three weeks around
STANBOL-185 (Support for Jena-based reasoning services).


The initial problems were:
1) No support for Jena
2) HermiT LGPL dependency (so no default reasoner with the current
implementation that can be distributed with Stanbol)
3) No extensibility for off-the-shelf reasoning services

All three points are addressed by the current version in
branches/jena-reasoners (even if a lot of things can still be improved).

What has been implemented?
* A /serviceapi for ReasoningServices, using SCR, and a REST endpoint
which automatically publishes all available services (in /web)
* Base OWLApi and Jena abstract services
* Jena RDFS, OWL, OWLMini reasoning services
* HermiT reasoning service (this will eventually be moved out of Stanbol)

At the end of this e-mail I include, for your convenience, the content
of the README.txt file, which includes some ideas and proposals
for improvements.
I would really appreciate it if the community could have a look at it
and provide some feedback.

I did my best to make this api as flexible and efficient as possible,
but I know more work must still be done in this direction (some ideas
are below).

I am going to verify the integritycheck demo against this
implementation; once it is working, I will move the code to /trunk.

Thank you all

Enrico

-------------------------------------

Description
=============

* A serviceapi for ReasoningServices, using SCR
* Base OWLApi and Jena abstract services
* Jena RDFS,OWL,OWLMini reasoning services
* HermiT reasoning service

* A common REST endpoint at /reasoners with the following preloaded services:
**    /rdfs
**    /owl
**    /owlmini
**    /owl2

Each can be accessed with one of three tasks: check, enrich, classify.
For example:

/reasoners/owl/check    (the Jena owl service with task check)
or
/reasoners/owl2/classify (the HermiT service with task classify)

Tasks description:
* check    : is the input consistent? 200 = true, 204 = false
* classify : return only rdf:type inferences
* enrich   : return all inferences
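
Since the check task answers through the HTTP status alone, a caller
has to translate the status code into a verdict. A minimal sketch of
that mapping follows; the function name and the verdict strings are
made up for illustration and are not part of Stanbol.

```shell
# Map the HTTP status of a "check" call to a verdict,
# following the task description (200 = true, 204 = false).
check_verdict() {
  case "$1" in
    200) echo "consistent" ;;
    204) echo "inconsistent" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Possible usage against a local instance (needs a running server):
# status=$(curl -s -o /dev/null -w '%{http_code}' \
#   "http://localhost:8080/reasoners/owl/check?url=http://xmlns.com/foaf/0.1/")
# check_verdict "$status"
```

Any other status (for example an error from the server) is reported
as unexpected rather than being treated as a negative answer.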

This is how the endpoint behaves:

GET (same if POST and Content-type: application/x-www-form-urlencoded)
params:
* url     // Loads the input from the given URL
* target  // (optional) If given, saves the output in the store
(TcManager) and does not return the stream

for example:
$ curl "http://localhost:8080/reasoners/owl2/classify?url=http://xmlns.com/foaf/0.1/"

POST   [Content-type: multipart/form-data]
* file    // Loads the input from the uploaded stream
* target  // (optional) If given, saves the output in the store
(TcManager) and does not return the stream

Additional parameters can be sent to support inputs from Ontonet and Rules:
* scope   // The ID of an Ontonet scope
* session // The ID of an Ontonet session
* recipe  // The ID of a recipe from the Rules module (only with
OWLApi based services)
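
As a rough sketch of how these optional parameters combine with the
service and task in the path, the helper below assembles a request
URL. The function name and the IDs "my-scope" and "my-session" are
made up for illustration, and the values are assumed to be URL-safe
already (no percent-encoding is done).

```shell
# Build a /reasoners URL from a service, a task and optional key=value pairs.
# Assumption: values are already URL-safe; no percent-encoding is done here.
reasoner_url() {
  base="http://localhost:8080/reasoners"
  service="$1"
  task="$2"
  shift 2
  query=""
  for pair in "$@"; do
    # Join pairs with "&", without a leading separator
    query="${query:+$query&}$pair"
  done
  echo "$base/$service/$task${query:+?$query}"
}

# "my-scope" and "my-session" are made-up IDs, for illustration only.
reasoner_url owl2 classify scope=my-scope session=my-session
```

The printed URL can then be passed to curl as in the other examples.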

Supported output formats:
Supported return formats are the classic RDF types (n3, turtle, rdf+xml)
and HTML. For HTML, the returned statements are provided in Turtle
(Jena) or OWL Manchester syntax (OWLApi), wrapped in the Stanbol
layout. It would be nice to have everything in the latter, which is
much more readable (todo).

Todo
=============
* Support for return types json and json-ld (need to write jersey writers)
* The front service currently returns only inferred statements. It would
also be useful to have the complete set of input+inferred statements
* Support for long-term operations. This is crucial for reasoning
tasks, since it can take some time with large graphs. This is needed
in general for Stanbol, something like "Stanbol Jobs".
* Decouple input preparation from the rest endpoint resource, creating
something like an InputProvider SCR api;  each InputProvider is bound
to a set of additional parameters.
This has several benefits:
** Removal of additional optional parameters, bound to specific input
sources, from the default rest api (e.g. session, scope, recipe)
** Removal of dependencies on ontonet, rules and other modules which are
not needed for standard usage. They could be implemented as
InputProvider/s, bound to specific parameters.
** Allow the addition of other input sources (for example 'graph',
'entity' or 'site')
* Implement a custom Jena ReasoningService, to use a Jena rules file
or a Stanbol recipe (once the toJena() functionality is implemented in
the rules module) from configuration. This could be done as multiple
SCR instances, as it is now for entityhub sites, for example.
* Provide a validation report in case of task CHECK (validity check).
* Implement a progress monitor for debugging purposes, relying on the
Jena and OWLApi apis, which have this feature
* Implement a benchmark endpoint, relying on OWL Manchester syntax, to
set up benchmark tests in the style of the ones made for the enhancer
* Implement an OWLlink client reasoning service
* Implement additional data preparation steps, for example to
implement a "consistent refactoring" task. For example, given a
parameter 'refactor=<recipe-id>' the service could refactor the graph
before executing the task.
* Implement off-the-shelf reasoning services (for example, targeted to
resolve only owl:sameAs links)

General issues
=============
The main problem is performance, which decreases as the input data
grows, in some cases dramatically. This could be addressed (IMHO) in two
directions:
* Improve input preparation. In particular, the preparation of input
from an ontonet scope/session needs to stream the ontologies (twice,
when additional input is provided via url), and this has some drawbacks
for performance.
* Support long-term operations, to start the process from the REST
call and then poll its progress through a dedicated endpoint
Notes (to be known)
=============
Differences between Jena and OWLApi services:
* CHECK has a different meaning depending on the reasoning service
implementation



Examples
=============
#
# Basic GET calls to the reasoning services.
# Send a URL and the service will return the inferred triples
#
# Classify the FOAF ontology, getting it from the web, using the Jena
# OWL reasoner, result in turtle
curl -v -H "Accept: application/turtle" \
  "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/"

# Classify the FOAF ontology, getting it from the web, using the Jena
# OWL reasoner, result in n3
curl -v -H "Accept: text/n3" \
  "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/"

# Enrich the FOAF ontology, getting it from the web, using the Jena
# RDFS reasoner, result in rdf/xml
curl -v -H "Accept: application/rdf+xml" \
  "http://localhost:8080/reasoners/rdfs/enrich?url=http://xmlns.com/foaf/0.1/"

# Check consistency of the FOAF ontology, getting it from the web,
# using the Jena OWL reasoner, result in turtle
curl -v "http://localhost:8080/reasoners/owl/check?url=http://xmlns.com/foaf/0.1/"

# Check consistency of the FOAF ontology, getting it from the web,
# using the HermiT OWL2 reasoner, result in turtle
curl -v "http://localhost:8080/reasoners/owl2/check?url=http://xmlns.com/foaf/0.1/"

# Trying with an ontology network (a large ontology composed of a set of
# small ontologies connected through owl:imports statements)
curl -v "http://localhost:8080/reasoners/owl2/check?url=http://www.cnr.it/ontology/cnr/cnr.owl"
# or
curl -v "http://localhost:8080/reasoners/owl2/enrich?url=http://www.cnr.it/ontology/cnr/cnr.owl"

#
# POST calls (send a file)
#
# Send the foaf.rdf file to a reasoning service and see the output
# (get it with
curl -H "Accept: application/rdf+xml"  http://xmlns.com/foaf/0.1/ > foaf.rdf
# )
curl -X POST -H "Content-type: multipart/form-data" \
  -H "Accept: text/turtle" -F file=@foaf.rdf \
  "http://localhost:8080/reasoners/rdfs/enrich"

# Save output in the triple store instead of returning it
# >> Add the "target" parameter, with the graph identifier
curl "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/&target=example-foaf-inferred"
# or, posting a file
curl -X POST -H "Content-type: multipart/form-data" -F file=@foaf.rdf \
  -F target=example-rdfs-inferences \
  "http://localhost:8080/reasoners/rdfs/enrich"



-- 
Enrico Daga

--
http://www.enridaga.net
skype: enri-pan
