Hello,
this is to share the developments of the last three weeks around STANBOL-185 (Support for Jena-based reasoning services).

The initial problems were:
1) No support for Jena
2) HermiT LGPL dependency (so, with the current implementation, no default reasoner that can be distributed with Stanbol)
3) No extensibility for off-the-shelf reasoning services

All three points are addressed by the current version in branches/jena-reasoners (even if a lot of things can be improved).

What has been implemented?
* A /serviceapi for ReasoningServices, using SCR, and a REST endpoint which automatically publishes all available services (in /web)
* Base OWLApi and Jena abstract services
* Jena RDFS, OWL, OWLMini reasoning services
* HermiT reasoning service (this will eventually be moved out of Stanbol)

At the end of this e-mail I include, for your convenience, the content of the README.txt file, which includes some ideas and proposals for improvements.

I would really appreciate it if the community could have a look at it and provide some feedback.
I did my best to make this api as flexible and efficient as possible, but I know more work must still be done in this direction (some ideas are below).

I am going to verify the integritycheck demo with respect to this implementation; once it is working, I will move the code to /trunk.

Thank you all
Enrico

-------------------------------------

Description
=============
* A serviceapi for ReasoningServices, using SCR
* Base OWLApi and Jena abstract services
* Jena RDFS, OWL, OWLMini reasoning services
* HermiT reasoning service
* A common REST endpoint at /reasoners with the following preloaded services:
** /rdfs
** /owl
** /owlmini
** /owl2

Each can be accessed with one of three tasks: check, enrich, classify. For example:
/reasoners/owl/check (the Jena OWL service with task check)
or
/reasoners/owl2/classify (the HermiT service with task classify)

Tasks description:
* check : is the input consistent? 200 = true, 204 = false
* classify : return only rdf:type inferences
* enrich : return all inferences

This is how the endpoint behaves:

GET (same if POST and Content-type: application/x-www-form-urlencoded)
params:
* url // Loads the input from url
* target // (optional) If given, saves the output in the store (TcManager) and does not return the stream
for example:
$ curl "http://localhost:8080/reasoners/owl2/classify?url=http://xmlns.com/foaf/0.1/"

POST [Content-type: multipart/form-data]
* file // Loads from the input stream
* target // (optional) If given, saves the output in the store (TcManager) and does not return the stream

Other parameters can be sent, to support inputs from Ontonet and Rules:
* scope // The ID of an Ontonet scope
* session // The ID of an Ontonet session
* recipe // The ID of a recipe from the Rules module (only with OWLApi based services)

Supported output formats:
Supported return formats are all classic RDF types (n3, turtle, rdf+xml) and HTML. For HTML, the returned statements are provided in Turtle (Jena) or OWL Manchester syntax (OWLApi), wrapped in the Stanbol layout. It would be nice to have everything in the latter, which is much more readable (todo).

Todo
=============
* Support for return types json and json-ld (need to write jersey writers)
* The front service currently returns only inferred statements. It would be useful to also have the complete set of input+inferred statements
* Support for long-term operations. This is crucial for reasoning tasks, since they can take some time with large graphs. This is needed in general for Stanbol, something like "Stanbol Jobs".
* Decouple input preparation from the rest endpoint resource, creating something like an InputProvider SCR api; each InputProvider is bound to a set of additional parameters.
  This has several benefits:
** Removal of additional optional parameters, bound to specific input sources, from the default rest api (e.g. session, scope, recipe)
** Removal of dependencies on ontonet, rules and other modules which are not needed for standard usage. They could be implemented as InputProvider/s, bound to specific parameters.
** Allows the addition of other input sources (for example 'graph', 'entity' or 'site')
* Implement a custom Jena ReasoningService, to use a Jena rules file or a Stanbol recipe (once the toJena() functionality is implemented in the rules module) from configuration. This could be done as multiple SCR instances, as it is now for entityhub sites, for example.
* Provide a validation report in case of task CHECK (validity check).
* Implement a progress monitor, relying on the Jena and OWLApi apis, which have this feature, for debugging purposes
* Implement a benchmark endpoint, relying on OWL Manchester syntax, to set up benchmark tests in the style of the ones made for the enhancer
* Implement an OWLlink client reasoning service
* Implement additional data preparation steps, for example to implement a "consistent refactoring" task. For example, given a parameter 'refactor=<recipe-id>', the service could refactor the graph before executing the task.
* Implement off-the-shelf reasoning services (for example, targeted to resolve only owl:sameAs links)

General issues
=============
The main problem is performance, which decreases as the input data grows, in some cases dramatically. This could be tackled (IMHO) in two directions:
* Improve input preparation. In particular, the preparation of input from an Ontonet scope/session needs to stream the ontologies (twice, in the case of additional input provided by url), and this has some drawbacks on performance.
* Support long-term operations, to start the process from the REST call and then ping the process through a dedicated endpoint

Notes (to be known)
=============
Differences between Jena and OWLApi services:
* CHECK has a different meaning depending on the reasoning service implementation

Examples
=============
#
# Basic GET calls to the reasoning services.
# Send a URL and the service will return the inferred triples
#
# Classify the FOAF ontology, getting it from the web using the Jena OWL reasoner, result in turtle
curl -v -H "Accept: text/turtle" "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/"

# Classify the FOAF ontology, getting it from the web using the Jena OWL reasoner, result in n3
curl -v -H "Accept: text/n3" "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/"

# Enrich the FOAF ontology, getting it from the web using the Jena RDFS reasoner, result in rdf/xml
curl -v -H "Accept: application/rdf+xml" "http://localhost:8080/reasoners/rdfs/enrich?url=http://xmlns.com/foaf/0.1/"

# Check consistency of the FOAF ontology, getting it from the web using the Jena OWL reasoner (the answer is the HTTP status code)
curl -v "http://localhost:8080/reasoners/owl/check?url=http://xmlns.com/foaf/0.1/"

# Check consistency of the FOAF ontology, getting it from the web using the HermiT OWL2 reasoner (the answer is the HTTP status code)
curl -v "http://localhost:8080/reasoners/owl2/check?url=http://xmlns.com/foaf/0.1/"

# Trying with an ontology network (large ontology composed of a set of little ontologies connected through owl:imports statements)
curl -v "http://localhost:8080/reasoners/owl2/check?url=http://www.cnr.it/ontology/cnr/cnr.owl"
# or
curl -v "http://localhost:8080/reasoners/owl2/enrich?url=http://www.cnr.it/ontology/cnr/cnr.owl"

#
# POST calls (send a file)
#
# Send the foaf.rdf file to a reasoning service and see the output
# (get it with: curl -H "Accept: application/rdf+xml" http://xmlns.com/foaf/0.1/ > foaf.rdf )
curl -X POST -H "Content-type: multipart/form-data" -H "Accept: text/turtle" -F file=@foaf.rdf "http://localhost:8080/reasoners/rdfs/enrich"

# Save output in the triple store instead of returning it
# >> Add the "target" parameter, with the graph identifier
curl "http://localhost:8080/reasoners/owl/classify?url=http://xmlns.com/foaf/0.1/&target=example-foaf-inferred"
# or, posting a file
curl -X POST -H "Content-type: multipart/form-data" -F file=@foaf.rdf -F target=example-rdfs-inferences "http://localhost:8080/reasoners/rdfs/enrich"

--
Enrico Daga

--
http://www.enridaga.net
skype: enri-pan

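
As a footnote to the Examples section: since the check task answers through the HTTP status code (200 = true, 204 = false), the check calls could be wrapped in a small shell helper. This is only a minimal sketch and is not part of the branch. The function names (reasoner_endpoint, check_verdict, check_ontology) and the STANBOL_BASE variable are made up for illustration; the only things taken from the README are the /reasoners/<service>/<task> path, the url parameter and the two status codes.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not shipped with Stanbol).
# STANBOL_BASE is an assumed variable; the default matches the examples above.
STANBOL_BASE="${STANBOL_BASE:-http://localhost:8080}"

# Build the endpoint for a service (rdfs, owl, owlmini, owl2)
# and a task (check, classify, enrich).
reasoner_endpoint() {
    printf '%s/reasoners/%s/%s\n' "$STANBOL_BASE" "$1" "$2"
}

# Translate the HTTP status of a check call into a verdict,
# following the README: 200 = true (consistent), 204 = false.
check_verdict() {
    case "$1" in
        200) echo "consistent" ;;
        204) echo "inconsistent" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Run the check task of a service on an ontology URL.
# curl -G with --data-urlencode appends url=<value> to the query
# string, URL-encoded; -w '%{http_code}' prints only the status code.
check_ontology() {
    status=$(curl -s -o /dev/null -w '%{http_code}' -G \
        --data-urlencode "url=$2" "$(reasoner_endpoint "$1" check)")
    check_verdict "$status"
}
```

With a running instance, a call would look like: check_ontology owl2 http://xmlns.com/foaf/0.1/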