Hi Sebastian, Jakob, Stanbol team

Based on Anil's positive feedback about participating in this, I decided to 
start a separate thread to plan the next steps.

Next steps:

The first step will be to define a Java API that allows different 
implementations to be provided. I think the idea was to create a separate 
project (should we use GitHub or Google Code? MIT/BSD/Apache licensed?) that 
focuses only on the specification of the language [1] and the Java API. 
Sebastian needs to take the lead on this. If I remember correctly, his plan 
was to start on this next week.

As soon as a first version of this specification is available we can start to 
work on implementations.

* Kiwi TripleStore: I assume Sebastian and Jakob will work on that
* Clerezza: Anil, could you take the lead for that?
* Entityhub: This will be my responsibility
* SPARQL based implementation: I think that would be interesting. Is anyone 
interested in working on that?
* CMS Adapter: Suat, could you follow this effort and check for possible usage 
scenarios?
* Fact Store: This could also be interesting. But as with the CMS Adapter, we 
first need to check usage scenarios.
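
To make the idea of one Java API with several backend implementations a bit 
more concrete, here is a minimal sketch. All names (PathBackend, 
InMemoryBackend, PathEvaluator, the "ex:" identifiers) are hypothetical and 
are not taken from the LMF or Stanbol code; the real API still has to be 
specified. It only shows how a path evaluator could be written once against 
an interface and reused for each store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical backend abstraction: one implementation per store. */
interface PathBackend<N> {
    /** Returns the objects of all triples (subject, property, ?o). */
    Collection<N> listObjects(N subject, N property);
}

/** Toy in-memory backend using plain strings as nodes (illustration only). */
final class InMemoryBackend implements PathBackend<String> {
    private final List<String[]> triples = new ArrayList<String[]>();

    void add(String subject, String property, String object) {
        triples.add(new String[] {subject, property, object});
    }

    @Override
    public Collection<String> listObjects(String subject, String property) {
        List<String> result = new ArrayList<String>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(property)) {
                result.add(t[2]);
            }
        }
        return result;
    }
}

/** Follows a sequence of properties from a start node on any backend. */
final class PathEvaluator {
    static <N> Set<N> follow(PathBackend<N> backend, N start, List<N> path) {
        Set<N> current = new LinkedHashSet<N>();
        current.add(start);
        for (N property : path) {
            Set<N> next = new LinkedHashSet<N>();
            for (N node : current) {
                next.addAll(backend.listObjects(node, property));
            }
            current = next;
        }
        return current;
    }
}

final class PathDemo {
    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        backend.add("ex:Beatles", "ex:album", "ex:AbbeyRoad");
        backend.add("ex:AbbeyRoad", "ex:title", "Abbey Road");
        // Follow ex:album, then ex:title, starting from ex:Beatles.
        System.out.println(PathEvaluator.follow(
                backend, "ex:Beatles", Arrays.asList("ex:album", "ex:title")));
    }
}
```

A Clerezza, Entityhub or SPARQL based implementation would then only need to 
provide its own PathBackend, assuming the final API ends up with a similar 
split between backend and evaluator.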

best
Rupert



On 28.10.2011, at 10:07, Ali Anil SINACI wrote:

> Dear Rupert,
> 
> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>> 
>>>> * The LMF semantic search component overlaps greatly with the 
>>>> "contenthub/search/engines/solr" component recently contributed by Anil. 
>>>> Related to this, it would be great if Anil could have a look at [2] and 
>>>> check for similarities/differences and possible integration paths.
>>>> 
>>> I had a look at the semantic search component of LMF. As you pointed out, 
>>> LMF semantic search provides a convenient way to index any part of 
>>> documents with the help of the RDFPath Language. I think that we can make 
>>> use of this feature in contenthub. As I described in my previous e-mail, 
>>> contenthub currently indexes a number of semantic fields based on DBPedia 
>>> relations. These relations are hardcoded. The RDFPath language can be used 
>>> to indicate specific semantic fields to be indexed along with the content 
>>> itself. Let me describe what we have in mind with a scenario:
>>> 
>>> A user provides a domain ontology (e.g. for the music domain) and submits 
>>> it to the Entityhub to be used in the enhancement process. Suppose the 
>>> domain ontology includes a vast amount of information about artists, their 
>>> albums, etc. I assume that this ontology does not include conceptual 
>>> definitions (it only includes ABox definitions). The user writes an RDF 
>>> Path Program (in LMF terminology) to indicate the fields to be indexed 
>>> when a content item has an enhancement related to any path in that 
>>> program. Suppose the user submits a content item along with the RDF Path 
>>> Program(s) to be used to determine the fields to be indexed. The 
>>> enhancement engines find an entity (or many entities). Now, we execute 
>>> the selected RDF Path Program(s) and embed the results into the Solr 
>>> representation of the content item.
>>> 
>>> If you have any other suggestions, please let me know so that we can 
>>> discuss in detail (in SRDC) before the meeting.
>>> 
>> This is exactly what I was thinking about. Let me only add that such 
>> additional knowledge to be included in the Semantic Index might come not 
>> only from the Entityhub, but also from other sources (like the CMS via the 
>> CMS adapter).
>> 
>> If you would like to help me with an implementation of the RdfPathLanguage 
>> (e.g. the Clerezza-based implementation, or maybe a Jena-based 
>> implementation), please let me know. Help would be greatly welcome, because 
>> I already have a lot of things on my TODO list before the meeting in 
>> November (such as defining a proposal for the Stanbol Enhancement 
>> Structure).
>> 
> 
> We would like to get involved in the implementation of the RDFPathLanguage 
> for Stanbol. We plan to start working on this next week. I think you and 
> the LMF team already have a design in mind. I would appreciate it if you 
> could share your thoughts with us.
> 
>>>> * The Semantic Search Interface: The Contenthub currently defines its own 
>>>> query API (supports keyword based search as well as "field -> value" 
>>>> like constraints, supports facets). The LMF directly exposes the RESTful 
>>>> API of the semantic Solr index. I strongly prefer the approach of the LMF, 
>>>> because of the two points already described above.
>>> We think that we do not have to choose here. We can keep a simple wrapper 
>>> around the Solr interface (contenthub's own query API) while providing 
>>> the Solr RESTful API as is. IMO a wrapper around the Solr interface would 
>>> be beneficial. On the other hand, in this interface we try to make use of 
>>> an ontology in the OntologyResourceSearchEngine. This might help to 
>>> figure out new keywords based on the subsumption hierarchy inside the 
>>> ontology. However, I think this may lead to performance issues and may not 
>>> be useful at all. We can decide on this later.
>> You forgot to mention one additional advantage of using the Solr RESTful 
>> API: if we do that, one could create the Semantic Index and then copy it 
>> over to some other SolrServer without the need to run Stanbol directly on 
>> the production infrastructure.
>> 
>> In general I would suggest first focusing the discussion on the unique 
>> features we would like to provide with the Semantic Search component. I 
>> already included three features I would like to have in my first mail (Query 
>> preprocessing, Entity Facets, Semantic Facets). As you mention, the 
>> OntologyResourceSearchEngine is very relevant to such features.
>> However, adding such features does not necessarily mean creating our own 
>> query language. One could also try to add such features directly to Solr by 
>> implementing some Solr extensions.
>> 
> 
> Let me briefly comment on your suggestions about the semantic search.
> 
>>>>  But I am also of the opinion that a semantic search interface should at 
>>>> least provide the following three additional features:
>>>>     1. Query preprocessing: e.g. substitute "Paris" in the query with 
>>>> "http://dbpedia.org/resource/Paris";
>>>>     2. Entity Facets: if a keyword matches an Entity (e.g. "Paris" -> 
>>>> "dbpedia:Paris", "dbpedia:Paris_Texas", "dbpedia:Paris_Hilton") then 
>>>> provide a Facet to the user over such possible matches;
> 
> As far as we understand, the first and second features will be handled by 
> querying the Entityhub with the query keyword (Paris), i.e. the first entity 
> obtained from the Entityhub will help us to recognize its type, and the other 
> entities will be served as facet values of the Paris facet.
> 
>>>>     3. Semantic Facets: if a user uses an instance of an ontology type 
>>>> (e.g. a Place, Person, Organization) in a query, then provide facets over 
>>>> semantic relations for such types (e.g. friends for persons, 
>>>> products/services for Organizations, nearby Points-Of-Interests for 
>>>> Places, Participants for Events, …). To implement features like that we 
>>>> need components that provide query preprocessing capabilities based on 
>>>> data available in the Entityhub, Ontonet … . To me it seems that the 
>>>> contenthub/search/engines/ontologyresource component already provides some 
>>>> functionality related to this, so this might be a good starting point.
> 
> Currently, we are trying to integrate an exploration mechanism like the one 
> you described above. It is also based on the DBPedia ontology. The 
> OntologyResourceEngine can be used for this purpose for user-registered 
> ontologies. The current implementation of this engine only computes closures 
> by exploiting the hierarchy in the ontology. RDFPath Programs can also be an 
> option at this point. With an RDF Path Program the user may specify the 
> relations to be used in the exploration process. But I think this means the 
> user decides beforehand which fields should be presented to him as 
> exploration fields. I think this is open to discussion.
> 
>> best
>> Rupert
>> 
> 
> Regards,
> Anil.
