Dear Valentina and all,

This is indeed very interesting. We developed "RDFPath" originally as a
configuration language for the semantic search index, but later realised
that it is a very natural way of querying over the Linked Data Cloud. In
contrast to SPARQL, it is resource-centric like Linked Data, and it is
restricted in a way that makes it more suitable for Linked Data servers.
The path language is documented here:
http://code.google.com/p/kiwi/wiki/RdfPathLanguage

Combining this with pattern extraction would be very interesting, e.g.
for automatically suggesting a configuration for our search index.

Greetings,

Sebastian

On 28.10.2011, at 15:17, [email protected] wrote:

> Actually, we have developed experience on this issue (SPARQL-based
> implementation for path extraction). I think that Andrea could lead
> the task and bring in the work already done as a starting point.
>
> We have also identified a number of indicators that are worth
> computing and associating with paths. Everything is in the two papers
> (and the vocabulary) referred to by Andrea. For example, we compute
> the occurrences of paths in a dataset, their popularity (we have
> defined a measure for this), and others.
> They can be optional features offered by the SPARQL-based services.
>
> Val
>
> Quoting [email protected]:
>
>> Hi Rupert, Stanbol team
>>
>> I am interested in the SPARQL-based implementation.
>> We have recently worked on extracting paths from linked data.
>> The work has just been presented at ISWC [1, 2].
>> Hence I already have some experience with the issue and some
>> preliminary results to reuse:
>>
>> * a number of scripts for extracting paths
>> * a vocabulary for representing and storing them [3]
>>
>> I am not sure I will be in Salzburg, but we can discuss the
>> possibility of including these issues at the hackathon.
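As a rough illustration of what such path-extraction scripts might do, here is a minimal, self-contained sketch over a hypothetical toy triple set (not Andrea's actual scripts; the real measures are defined in the papers he cites). It enumerates property paths starting from each subject and counts their occurrences in the dataset, one of the indicators Valentina mentions:

```python
# Toy path extraction: enumerate property paths up to a fixed length
# starting from each resource, and count how often each path occurs.
from collections import Counter

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("ex:Bob",   "foaf:knows",      "ex:Alice"),
    ("ex:Alice", "foaf:based_near", "ex:Rome"),
    ("ex:Carol", "foaf:knows",      "ex:Alice"),
]

def paths_from(node, triples, max_len=2):
    """Yield property paths (tuples of properties) starting at node."""
    for s, p, o in triples:
        if s != node:
            continue
        yield (p,)
        if max_len > 1:
            for tail in paths_from(o, triples, max_len - 1):
                yield (p,) + tail

def path_occurrences(triples, max_len=2):
    """Count how often each property path occurs in the dataset."""
    counts = Counter()
    for subject in {s for s, _, _ in triples}:
        counts.update(paths_from(subject, triples, max_len))
    return counts

counts = path_occurrences(TRIPLES)
# ("foaf:knows",) occurs twice (from ex:Bob and from ex:Carol), and so
# does the two-step path ("foaf:knows", "foaf:based_near").
```

Occurrence counts like these could then be stored using Andrea's vocabulary [3] and exposed as optional features of the SPARQL-based services.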
>>
>> --
>> Andrea Giovanni Nuzzolese
>> Semantic Technology Laboratory (STLab)
>> Institute for Cognitive Science and Technology (ISTC)
>> National Research Council (CNR)
>> Via Nomentana 56, Roma - Italy
>>
>> [1] http://www.stlab.istc.cnr.it/documents/papers/cold2011.pdf
>> [2] http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf
>> [3] http://www.ontologydesignpatterns.org/ont/lod-analysis-properties.owl
>>
>>> Hi Sebastian, Jakob, Stanbol team
>>>
>>> Based on Anil's positive feedback about participating in this, I
>>> decided to create a separate thread to plan the next steps.
>>>
>>> Next steps:
>>>
>>> The first step will be to define a Java API that allows us to
>>> provide different implementations. I think the idea was to create a
>>> separate project (should we use GitHub or Google Code? MIT/BSD/Apache
>>> licensed?) that focuses only on the specification of the language [1]
>>> and the Java API. Sebastian needs to take the lead on this. If I
>>> remember correctly, his plan was to start this next week.
>>>
>>> As soon as a first version of this specification is available we can
>>> start to work on implementations.
>>>
>>> * KiWi TripleStore: I assume Sebastian and Jakob will work on that
>>> * Clerezza: Anil, could you take the lead on that?
>>> * Entityhub: This will be my responsibility
>>> * SPARQL-based implementation: I think that would be interesting -
>>>   anyone interested in working on that?
>>> * CMS Adapter: Suat, could you follow this effort and check for
>>>   possible usage scenarios?
>>> * Fact Store: This could also be interesting, but as with the CMS
>>>   Adapter we first need to check usage scenarios.
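To make the pluggable-implementation idea concrete: the real API will be defined in Java as part of the specification work, but a loose Python sketch (all names hypothetical, with a toy in-memory backend) can illustrate how KiWi, Clerezza, Entityhub, or SPARQL backends might share one evaluator interface while the resource-centric traversal stays generic:

```python
# Sketch of a pluggable path-evaluator API. Each backend implements
# `select`; the generic `follow_path` applies a property path to a
# resource, carrying a set of context nodes (resource-centric, as in
# the RdfPath language description).
from abc import ABC, abstractmethod

class PathEvaluator(ABC):
    @abstractmethod
    def select(self, context, prop):
        """Return all nodes reachable from `context` via `prop`."""

class InMemoryEvaluator(PathEvaluator):
    """Trivial backend over an iterable of (s, p, o) triples."""
    def __init__(self, triples):
        self.triples = triples

    def select(self, context, prop):
        return {o for s, p, o in self.triples
                if s == context and p == prop}

def follow_path(evaluator, context, path):
    """Apply each property in `path` in turn, starting from `context`."""
    nodes = {context}
    for prop in path:
        nxt = set()
        for n in nodes:
            nxt |= evaluator.select(n, prop)
        nodes = nxt
    return nodes

ev = InMemoryEvaluator([
    ("ex:Bob",   "foaf:knows", "ex:Alice"),
    ("ex:Alice", "foaf:name",  '"Alice"'),
])
follow_path(ev, "ex:Bob", ["foaf:knows", "foaf:name"])  # {'"Alice"'}
```

A triple-store-backed implementation would push `select` down into its query engine instead of iterating over triples.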
>>>
>>> best
>>> Rupert
>>>
>>> On 28.10.2011, at 10:07, Ali Anil SINACI wrote:
>>>
>>>> Dear Rupert,
>>>>
>>>> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>>>>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>>>>>
>>>>>>> * The LMF semantic search component overlaps greatly with the
>>>>>>>   "contenthub/search/engines/solr" component recently contributed
>>>>>>>   by Anil. Related to this, it would be great if Anil could have
>>>>>>>   a look at [2] and check for similarities/differences and
>>>>>>>   possible integration paths.
>>>>>>>
>>>>>> I had a look at the semantic search component of the LMF. As you
>>>>>> pointed out, LMF semantic search provides a convenient way to
>>>>>> index any part of a document with the help of the RDFPath
>>>>>> language. I think that we can make use of this feature in the
>>>>>> Contenthub. As I described in my previous e-mail, the Contenthub
>>>>>> currently indexes a number of semantic fields based on DBpedia
>>>>>> relations. These are hardcoded relations. The RDFPath language
>>>>>> can be used to indicate specific semantic fields to be indexed
>>>>>> along with the content itself. Let me describe what we have in
>>>>>> mind with a scenario:
>>>>>>
>>>>>> A user provides a domain ontology (e.g. for the music domain) and
>>>>>> submits it to the Entityhub to be used in the enhancement process.
>>>>>> Suppose the domain ontology includes a vast amount of information
>>>>>> about artists, their albums etc. I assume that this ontology does
>>>>>> not include conceptual definitions (it only includes ABox
>>>>>> definitions). The user writes an RDF Path Program (in LMF
>>>>>> terminology) to indicate the fields to be indexed when a content
>>>>>> item has an enhancement related to any path in that program.
>>>>>> Suppose the user submits a content item along with the RDF Path
>>>>>> Program(s) to be used to determine the fields to be indexed. The
>>>>>> enhancement engines find an entity (or many entities).
>>>>>> Now, we execute the selected RDF Path Program(s) and embed the
>>>>>> results into the Solr representation of the content item.
>>>>>>
>>>>>> If you have any other suggestions, please let me know so that we
>>>>>> can discuss them in detail (in SRDC) before the meeting.
>>>>>>
>>>>> This is exactly what I was thinking about. Let me only add that
>>>>> such additional knowledge to be included within the semantic index
>>>>> might not only come from the Entityhub, but also from other
>>>>> sources (like the CMS via the CMS Adapter).
>>>>>
>>>>> If you would like to help me with an implementation of the
>>>>> RdfPathLanguage (e.g. the Clerezza-based implementation, or maybe
>>>>> a Jena-based implementation), please let me know. Help would be
>>>>> greatly welcome, because I already have a lot of things on my TODO
>>>>> list before the meeting in November (such as defining a proposal
>>>>> for the Stanbol enhancement structure).
>>>>>
>>>> We would like to get involved in the implementation of the
>>>> RdfPathLanguage for Stanbol. We plan to start working on this next
>>>> week. I think you & the LMF team already have a design in mind. I
>>>> would appreciate it if you could share your thoughts with us.
>>>>
>>>>>>> * The semantic search interface: the Contenthub currently
>>>>>>>   defines its own query API (it supports keyword-based search as
>>>>>>>   well as "field -> value" like constraints, and supports
>>>>>>>   facets). The LMF directly exposes the RESTful API of the
>>>>>>>   semantic Solr index. I strongly prefer the approach of the
>>>>>>>   LMF, because of the two points already described above.
>>>>>> We think that we do not have to make a selection here. We can
>>>>>> keep a simple wrapper around the Solr interface (the Contenthub's
>>>>>> own query API) while providing the Solr RESTful API as is. IMO a
>>>>>> wrapper around the Solr interface would be beneficial. On the
>>>>>> other hand, in this interface we try to make use of an ontology
>>>>>> in the OntologyResourceSearchEngine.
>>>>>> This might help to figure out new keywords based on the
>>>>>> subsumption hierarchy inside the ontology. However, I think this
>>>>>> may lead to performance issues and may not be useful at all. We
>>>>>> can decide on this later.
>>>>> You forgot to mention one additional advantage of using the Solr
>>>>> RESTful API: if we do that, one could create the semantic index
>>>>> and then copy it over to another Solr server, without the need to
>>>>> run Stanbol directly on the production infrastructure.
>>>>>
>>>>> In general I would suggest first focusing the discussion on the
>>>>> unique features we would like to provide with the semantic search
>>>>> component. I already included three features I would like to have
>>>>> in my first mail (query preprocessing, entity facets, semantic
>>>>> facets). As you now mention, the OntologyResourceSearchEngine is
>>>>> very relevant in relation to such features.
>>>>> However, adding such features does not necessarily mean creating
>>>>> our own query language. One could also try to add such features
>>>>> directly to Solr by implementing some Solr extensions.
>>>>>
>>>> Let me briefly comment on your suggestions about the semantic
>>>> search.
>>>>
>>>>>>> But I am also of the opinion that a semantic search interface
>>>>>>> should at least provide the following three additional features:
>>>>>>> 1. Query preprocessing: e.g. substitute "Paris" in the query
>>>>>>>    with "http://dbpedia.org/resource/Paris";
>>>>>>> 2. Entity facets: if a keyword matches an entity (e.g. "Paris"
>>>>>>>    -> "dbpedia:Paris", "dbpedia:Paris_Texas",
>>>>>>>    "dbpedia:Paris_Hilton"), then provide a facet to the user
>>>>>>>    over such possible matches;
>>>>
>>>> As far as we understand, the first and second features will be
>>>> handled by querying the Entityhub with the query keyword (Paris),
>>>> i.e. the first entity obtained from the Entityhub will help us to
>>>> recognize its type, and the other entities will be served as facet
>>>> values of a "Paris" facet.
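A minimal sketch of features 1 and 2 under these assumptions (the dictionary below stands in for the Entityhub label lookup; all names are hypothetical): the top-ranked match replaces the keyword in the query, and the full match list becomes the values of an entity facet:

```python
# Toy query preprocessing and entity-facet extraction. In Stanbol the
# lookup would be an Entityhub query; here a dict stands in for it,
# mapping a lowercased keyword to ranked entity URIs.
ENTITY_INDEX = {
    "paris": [
        "http://dbpedia.org/resource/Paris",
        "http://dbpedia.org/resource/Paris_Texas",
        "http://dbpedia.org/resource/Paris_Hilton",
    ],
}

def preprocess_query(keyword):
    """Return (rewritten query term, entity-facet values) for a keyword."""
    matches = ENTITY_INDEX.get(keyword.lower(), [])
    if not matches:
        return keyword, []      # no entity found: keep the raw keyword
    return matches[0], matches  # substitute best match, facet over all

term, facet = preprocess_query("Paris")
# term  -> "http://dbpedia.org/resource/Paris"
# facet -> all three dbpedia URIs, offered to the user as a facet
```

The same shape would extend to feature 3 by following semantic relations of the matched entity's type to build the facet values.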
>>>>
>>>>>>> 3. Semantic facets: if a user uses an instance of an ontology
>>>>>>>    type (e.g. a Place, Person, Organization) in a query, then
>>>>>>>    provide facets over semantic relations for such types (e.g.
>>>>>>>    friends for Persons, products/services for Organizations,
>>>>>>>    nearby points of interest for Places, participants for
>>>>>>>    Events, …). To implement features like that we need
>>>>>>>    components that provide query preprocessing capabilities
>>>>>>>    based on data available in the Entityhub, Ontonet … . To me
>>>>>>>    it seems that the contenthub/search/engines/ontologyresource
>>>>>>>    component already provides some functionality related to
>>>>>>>    this, so this might be a good starting point.
>>>>
>>>> Currently, we are trying to integrate an exploration mechanism
>>>> like the one you describe above. It is also based on the DBpedia
>>>> ontology. The OntologyResourceEngine can be used for this purpose
>>>> with user-registered ontologies. The current implementation of this
>>>> engine only computes closures by exploiting the hierarchy in the
>>>> ontology. RDF Path Programs can also be an option at this point.
>>>> With an RDF Path Program the user may specify the relations to be
>>>> used in the exploration process. But I think this means the user
>>>> decides beforehand which fields should be presented to him as
>>>> exploration fields. I think this is open to discussion.
>>>>
>>>>> best
>>>>> Rupert
>>>>>
>>>> Regards,
>>>> Anil.
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.

Sebastian

--
| Dr. Sebastian Schaffert                     [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group         +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
