Hi Sebastian, Jakob, Stanbol team,

Based on Anil's positive feedback about participating in this, I decided to
start a separate thread to plan the next steps.

Next steps:

The first step will be to define a Java API that allows for different
implementations. I think the idea was to create a separate project (should we
use GitHub or Google Code? MIT/BSD/Apache licensed?) that focuses only on the
specification of the language [1] and the Java API. Sebastian needs to take
the lead on this. If I remember correctly, his plan was to start next week.

As soon as a first version of this specification is available, we can start
working on implementations.

* Kiwi TripleStore: I assume Sebastian and Jakob will work on that
* Clerezza: Anil, could you take the lead on that?
* Entityhub: This will be my responsibility
* SPARQL based implementation: I think that would be interesting. Is anyone
  interested in working on that?
* CMS Adapter: Suat, could you follow this effort and check for possible
  usage scenarios?
* Fact Store: This could also be interesting. But as with the CMS Adapter, we
  first need to check usage scenarios.

best
Rupert

On 28.10.2011, at 10:07, Ali Anil SINACI wrote:
> Dear Rupert,
>
> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>>
>>>> * The LMF semantic search component overlaps greatly with the
>>>> "contenthub/search/engines/solr" component recently contributed by
>>>> Anil. Related to this, it would be great if Anil could have a look at
>>>> [2] and check for similarities/differences and possible integration
>>>> paths.
>>>>
>>> I had a look at the semantic search component of LMF. As you pointed
>>> out, LMF semantic search provides a convenient way to index any part of
>>> documents with the help of the RDFPath language. I think that we can
>>> make use of this feature in contenthub. As I described in my previous
>>> e-mail, contenthub currently indexes a number of semantic fields based
>>> on DBPedia relations. These relations are hardcoded.
>>> The RDFPath language can be used to indicate specific semantic fields
>>> to be indexed along with the content itself. Let me describe what we
>>> have in mind with a scenario:
>>>
>>> A user provides a domain ontology (e.g. music domain) and submits it to
>>> the Entityhub to be used in the enhancement process. Suppose the domain
>>> ontology includes a vast amount of information about artists, their
>>> albums, etc. I assume that this ontology does not include conceptual
>>> definitions (it only includes ABox definitions). The user writes an RDF
>>> Path Program (in LMF terminology) to indicate the fields to be indexed
>>> when a content item has an enhancement related to any path in that
>>> program. Suppose the user submits a content item along with the RDF
>>> Path Program(s) to be used to determine the fields to be indexed. The
>>> enhancement engines find an entity (or lots of entities). Now, we
>>> execute the selected RDF Path Program(s) and embed the results into the
>>> Solr representation of the content item.
>>>
>>> If you have any other suggestions, please let me know so that we can
>>> discuss them in detail (in SRDC) before the meeting.
>>>
>> This is exactly what I was thinking about. Let me only add that such
>> additional knowledge to be included within the Semantic Index might not
>> only come from the Entityhub, but also from other sources (like the CMS
>> via the CMS adapter).
>>
>> If you would like to help me with an implementation of the
>> RdfPathLanguage (e.g. the Clerezza based implementation, or maybe a Jena
>> based implementation), please let me know. Help would be very welcome,
>> because I already have a lot of things on my TODO list before the
>> meeting in November (such as defining a proposal for the Stanbol
>> Enhancement Structure).
>>
>
> We would like to get involved in the implementation of RDFPathLanguage for
> Stanbol. We plan to work on this starting next week. I think you and the
> LMF team already have a design in mind.
> I would appreciate it if you could share your thoughts with us.
>
>>>> * The Semantic Search Interface: The Contenthub currently defines its
>>>> own query API (supports keyword based search as well as
>>>> "field -> value" like constraints, supports facets). The LMF directly
>>>> exposes the RESTful API of the semantic Solr index. I strongly prefer
>>>> the approach of the LMF, because of the two points already described
>>>> above.
>>> We think that we do not have to make a selection here. We can keep a
>>> simple wrapper around the Solr interface (contenthub's own query API)
>>> while providing the Solr RESTful API as is. IMO a wrapper around the
>>> Solr interface would be beneficial. On the other hand, in this
>>> interface we try to make use of an ontology in the
>>> OntologyResourceSearchEngine. This might help to figure out new
>>> keywords based on the subsumption hierarchy inside the ontology.
>>> However, I think this may lead to performance issues and may not be
>>> useful at all. We can decide on this later.
>> You forgot to mention one additional advantage of using the Solr RESTful
>> API: if we do that, one could create the Semantic Index and then copy it
>> over to some other SolrServer without the need to run Stanbol directly
>> on the production infrastructure.
>>
>> In general I would suggest to first focus the discussion on the unique
>> features we would like to provide with the Semantic Search component. I
>> already included three features I would like to have in my first mail
>> (Query preprocessing, Entity Facets, Semantic Facets). As you now
>> mention, the OntologyResourceSearchEngine is very relevant to such
>> features. However, adding such features does not necessarily mean
>> creating a separate query language. One could also try to add such
>> features directly to Solr by implementing some Solr extensions.
>>
>
> Let me briefly comment on your suggestions about the semantic search.
>
>>>> But I am also of the opinion that a semantic search interface should
>>>> at least provide the following three additional features:
>>>> 1. Query preprocessing: e.g. substitute "Paris" in the query with
>>>> "http://dbpedia.org/resource/Paris";
>>>> 2. Entity Facets: if a keyword matches an Entity (e.g. "Paris" ->
>>>> "dbpedia:Paris", "dbpedia:Paris_Texas", "dbpedia:Paris_Hilton") then
>>>> provide a Facet to the user over such possible matches;
>
> As far as we understand, the first and second features will be handled by
> querying the Entityhub with the query keyword (Paris), i.e. the first
> entity obtained from the Entityhub will help us recognize its type, and
> the other entities will serve as facet values of the Paris facet.
>
>>>> 3. Semantic Facets: if a user uses an instance of an ontology type
>>>> (e.g. a Place, Person, Organization) in a query, then provide facets
>>>> over semantic relations for such types (e.g. friends for persons,
>>>> products/services for Organizations, nearby Points-Of-Interests for
>>>> Places, Participants for Events, …). To implement features like that
>>>> we need components that provide query preprocessing capabilities based
>>>> on data available in the Entityhub, Ontonet, … . To me it seems that
>>>> the contenthub/search/engines/ontologyresource component already
>>>> provides some functionality related to this, so this might be a good
>>>> starting point.
>
> Currently, we are trying to integrate an exploration mechanism like the
> one you describe above. It is also based on the DBPedia ontology. The
> OntologyResourceEngine can be used for this purpose for user registered
> ontologies. The current implementation of this engine only computes
> closures by exploiting the hierarchy in the ontology. RDFPath Programs
> can also be an option at this point. With an RDF Path Program, the user
> may specify the relations to be used in the exploration process.
> But I think this means the user decides beforehand which fields should be
> presented to him as exploration fields. I think this is open to
> discussion.
>
>> best
>> Rupert
>>
>
> Regards,
> Anil.
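
To make the first step from the plan at the top of this mail more concrete,
here is a minimal sketch of what a backend-neutral Java API could look like.
All names in it (RdfBackend, InMemoryBackend, evaluate, the "ex:" identifiers)
are hypothetical and are not taken from the LMF or Stanbol code; the real API
still has to be specified. The only point is that path evaluation is written
once against a small interface, and each store (Kiwi, Clerezza, Entityhub,
SPARQL) would supply its own implementation of that interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the LMF or Stanbol code.
public class RdfPathSketch {

    /** Assumed backend abstraction that each store would implement. */
    interface RdfBackend<N> {
        /** Returns the objects of all triples (subject, property, ?). */
        Set<N> listObjects(N subject, N property);
    }

    /** Toy in-memory backend for illustration; nodes are plain strings. */
    static class InMemoryBackend implements RdfBackend<String> {
        private final Map<String, Map<String, Set<String>>> triples = new HashMap<>();

        void add(String subject, String property, String object) {
            triples.computeIfAbsent(subject, k -> new HashMap<>())
                   .computeIfAbsent(property, k -> new LinkedHashSet<>())
                   .add(object);
        }

        @Override
        public Set<String> listObjects(String subject, String property) {
            return triples.getOrDefault(subject, Collections.emptyMap())
                          .getOrDefault(property, Collections.emptySet());
        }
    }

    /** Follows a sequence of properties from a start node, using only the interface. */
    static <N> Set<N> evaluate(RdfBackend<N> backend, N start, List<N> path) {
        Set<N> current = new LinkedHashSet<>();
        current.add(start);
        for (N property : path) {
            Set<N> next = new LinkedHashSet<>();
            for (N node : current) {
                next.addAll(backend.listObjects(node, property));
            }
            current = next;
        }
        return current;
    }

    /** Music-domain example: collect the album titles of one (made-up) artist. */
    static List<String> demo() {
        InMemoryBackend backend = new InMemoryBackend();
        backend.add("ex:artist1", "ex:album", "ex:album1");
        backend.add("ex:artist1", "ex:album", "ex:album2");
        backend.add("ex:album1", "ex:title", "First Album");
        backend.add("ex:album2", "ex:title", "Second Album");
        return new ArrayList<>(
                evaluate(backend, "ex:artist1", Arrays.asList("ex:album", "ex:title")));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the indexing scenario Anil describes, the values returned by such an
evaluation would be the ones embedded into the Solr representation of the
content item; how the path itself is parsed from an RDF Path Program is left
out here on purpose.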
