Hi Sebastian,

Andrea will go through RDFPath in order to better understand how the work we've done can benefit from it. He'll be back with more details.
Just to give you an idea: our approach is focused on extracting paths by looking at their occurrences in the data. What we call paths may be what you call path-types. Such path-types are very useful for producing a summary of a dataset and for identifying invariances in how the data is used. In short, you can use such information for better access to the data (e.g., as a support to developers designing queries for a dataset, designing interfaces, etc.). My intuition (but let me look at RDFPath in more detail, I could be wrong) is that we could use an implementation of RDFPath to gather such information and extract dataset patterns more efficiently.

Val

On Nov 2, 2011, at 9:48 AM, Sebastian Schaffert wrote:

> Dear Valentina and all,
>
> This is indeed very interesting. We developed "RDFPath" originally as a
> configuration language for the semantic search index, but later realised that
> it is a very natural way of querying over the Linked Data Cloud. In contrast
> to SPARQL, it is resource-centric like Linked Data, and it is restricted in a
> way that is more suitable for Linked Data servers. The path language is
> documented here:
>
> http://code.google.com/p/kiwi/wiki/RdfPathLanguage
>
> Combining this with pattern extraction would be very interesting, e.g. for
> automatically suggesting a configuration of our search index.
>
> Greetings,
>
> Sebastian
>
> On 28.10.2011 at 15:17, [email protected] wrote:
>
>> Actually, we have developed experience on this issue (a SPARQL-based
>> implementation for path extraction). I think that Andrea could lead the task
>> and bring in the work already done as a starting point.
>>
>> We have also identified a number of indicators that are worth computing and
>> associating with paths. Everything is in the two papers (and the vocabulary)
>> referred to by Andrea. For example, we compute the occurrences of paths in a
>> dataset, their popularity (we have defined a measure for this), and others.
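A rough, purely illustrative sketch of the kind of path occurrence counting described above; the toy triples, the two-step limit, and all names are my own assumptions, not our actual scripts or the RDFPath API:

```python
from collections import Counter

# Toy dataset of (subject, predicate, object) triples -- illustrative only.
triples = [
    ("ex:Rome", "ex:capitalOf", "ex:Italy"),
    ("ex:Italy", "ex:partOf", "ex:Europe"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("ex:France", "ex:partOf", "ex:Europe"),
]

def path_types(triples, length=2):
    """Count occurrences of each path-type up to `length` steps.

    A path-type abstracts away the concrete resources and keeps only
    the sequence of predicates, e.g. (ex:capitalOf, ex:partOf).
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    counts = Counter()

    def walk(node, preds):
        if preds:  # every non-empty predicate sequence is one path occurrence
            counts[tuple(preds)] += 1
        if len(preds) < length:
            for p, o in by_subject.get(node, []):
                walk(o, preds + [p])

    for s in {s for s, _, _ in triples}:
        walk(s, [])
    return counts

counts = path_types(triples)
# e.g. the path-type (ex:capitalOf, ex:partOf) occurs twice above
```

Such counts are exactly the kind of summary information that could feed query or interface design, and an RDFPath implementation could presumably produce them more efficiently over a real store.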
>> They can be optional features offered by the SPARQL-based services.
>>
>> Val
>>
>> Quoting [email protected]:
>>
>>> Hi Rupert, Stanbol team,
>>>
>>> I am interested in the SPARQL-based implementation.
>>> We have recently worked on extracting paths from linked data.
>>> The work has just been presented at ISWC [1, 2].
>>> Hence I already have some experience with the issue and some preliminary
>>> results to reuse:
>>>
>>> * a number of scripts for extracting paths
>>> * a vocabulary for representing and storing them [3]
>>>
>>> I am not sure I will be in Salzburg, but we can discuss the possibility
>>> of including these issues in the hackathon.
>>>
>>> --
>>> Andrea Giovanni Nuzzolese
>>> Semantic Technology Laboratory (STLab)
>>> Institute for Cognitive Science and Technology (ISTC)
>>> National Research Council (CNR)
>>> Via Nomentana 56, Roma - Italy
>>>
>>> [1] http://www.stlab.istc.cnr.it/documents/papers/cold2011.pdf
>>> [2] http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf
>>> [3] http://www.ontologydesignpatterns.org/ont/lod-analysis-properties.owl
>>>
>>>> Hi Sebastian, Jakob, Stanbol team,
>>>>
>>>> Based on Anil's positive feedback about participating in this, I decided
>>>> to create a separate thread to plan the next steps.
>>>>
>>>> Next steps:
>>>>
>>>> The first step will be to define a Java API that allows different
>>>> implementations to be provided. I think the idea was to create a separate
>>>> project (should we use GitHub or Google Code? MIT/BSD/Apache licensed?)
>>>> that focuses only on the specification of the language [1] and the Java
>>>> API. Sebastian needs to take the lead on this. If I remember correctly,
>>>> his plan was to start next week.
>>>>
>>>> As soon as a first version of this specification is available, we can
>>>> start to work on implementations.
>>>>
>>>> * KiWi TripleStore: I assume Sebastian and Jakob will work on that
>>>> * Clerezza: Anil, could you take the lead on that?
>>>> * Entityhub: This will be my responsibility
>>>> * SPARQL-based implementation: I think that would be interesting - is
>>>> someone interested in working on that?
>>>> * CMS Adapter: Suat, could you follow this effort and check for possible
>>>> usage scenarios?
>>>> * FactStore: This could also be interesting, but as for the CMS
>>>> Adapter, we first need to check usage scenarios.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>> On 28.10.2011, at 10:07, Ali Anil SINACI wrote:
>>>>
>>>>> Dear Rupert,
>>>>>
>>>>> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>>>>>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>>>>>>
>>>>>>>> * The LMF semantic search component overlaps greatly with the
>>>>>>>> "contenthub/search/engines/solr" component recently contributed
>>>>>>>> by Anil. Related to this, it would be great if Anil could have a
>>>>>>>> look at [2] and check for similarities/differences and possible
>>>>>>>> integration paths.
>>>>>>>>
>>>>>>> I had a look at the semantic search component of the LMF. As you
>>>>>>> pointed out, LMF semantic search provides a convenient way to index
>>>>>>> any part of documents with the help of the RDFPath language. I think
>>>>>>> that we can make use of this feature in the Contenthub. As I described
>>>>>>> in my previous e-mail, the Contenthub currently indexes a number of
>>>>>>> semantic fields based on DBpedia relations. These are hardcoded
>>>>>>> relations. The RDFPath language can be used to indicate specific
>>>>>>> semantic fields to be indexed along with the content itself. Let me
>>>>>>> describe what we have in mind with a scenario:
>>>>>>>
>>>>>>> A user provides a domain ontology (e.g. the music domain) and submits
>>>>>>> it to the Entityhub to be used in the enhancement process. Suppose the
>>>>>>> domain ontology includes a vast amount of information about artists,
>>>>>>> their albums, etc. I assume that this ontology does not include
>>>>>>> conceptual definitions (it only includes ABox definitions).
>>>>>>> The user writes an RDF Path Program (in LMF terminology) to indicate
>>>>>>> the fields to be indexed when a content item has an enhancement
>>>>>>> related to any path in that program. Suppose the user submits a
>>>>>>> content item along with the RDF Path Program(s) to be used to
>>>>>>> determine the fields to be indexed. The enhancement engines find an
>>>>>>> entity (or many entities). Now, we execute the selected RDF Path
>>>>>>> Program(s) and embed the results into the Solr representation of the
>>>>>>> content item.
>>>>>>>
>>>>>>> If you have any other suggestions, please let me know so that we can
>>>>>>> discuss them in detail (in SRDC) before the meeting.
>>>>>>>
>>>>>> This is exactly what I was thinking about. Let me only add that such
>>>>>> additional knowledge to be included in the Semantic Index might not
>>>>>> only come from the Entityhub, but also from other sources (like the
>>>>>> CMS via the CMS Adapter).
>>>>>>
>>>>>> If you would like to help me with an implementation of the
>>>>>> RdfPathLanguage (e.g. the Clerezza-based implementation, or maybe a
>>>>>> Jena-based implementation), please let me know. Help would be greatly
>>>>>> welcome, because I already have a lot of things on my TODO list before
>>>>>> the meeting in November (such as defining a proposal for the Stanbol
>>>>>> Enhancement Structure).
>>>>>>
>>>>> We would like to get involved in the implementation of the
>>>>> RdfPathLanguage for Stanbol. We plan to start working on this next
>>>>> week. I think you & the LMF team already have a design in mind. I would
>>>>> appreciate it if you could share your thoughts with us.
>>>>>
>>>>>>>> * The Semantic Search Interface: The Contenthub currently defines
>>>>>>>> its own query API (it supports keyword-based search as well as
>>>>>>>> "field -> value" constraints, and supports facets). The LMF directly
>>>>>>>> exposes the RESTful API of the semantic Solr index.
>>>>>>>> I strongly prefer the approach of the LMF, because of the two
>>>>>>>> points already described above.
>>>>>>> We think that we do not have to make a selection here. We can keep a
>>>>>>> simple wrapper on the Solr interface (the Contenthub's own query API)
>>>>>>> while providing the Solr RESTful API as is. IMO a wrapper on the Solr
>>>>>>> interface would be beneficial. On the other hand, in this interface
>>>>>>> we try to make use of an ontology in the
>>>>>>> OntologyResourceSearchEngine. This might help to figure out new
>>>>>>> keywords based on the subsumption hierarchy inside the ontology.
>>>>>>> However, I think this may lead to performance issues and may not be
>>>>>>> useful at all. We can decide on this later.
>>>>>> You forgot to mention one additional advantage of using the Solr
>>>>>> RESTful API: if we do that, one could create the Semantic Index and
>>>>>> then copy it over to some other Solr server without the need to run
>>>>>> Stanbol directly on the production infrastructure.
>>>>>>
>>>>>> In general I would suggest first focusing the discussion on the unique
>>>>>> features we would like to provide with the Semantic Search component.
>>>>>> I already included three features I would like to have in my first
>>>>>> mail (query preprocessing, entity facets, semantic facets). As you now
>>>>>> mention, the OntologyResourceSearchEngine is very relevant to such
>>>>>> features.
>>>>>> However, adding such features does not necessarily mean creating our
>>>>>> own query language. One could also try to add such features directly
>>>>>> to Solr by implementing some Solr extensions.
>>>>>>
>>>>> Let me briefly comment on your suggestions about the semantic search.
>>>>>
>>>>>>>> But I am also of the opinion that a semantic search interface should
>>>>>>>> at least provide the following three additional features:
>>>>>>>> 1. Query preprocessing: e.g. substitute "Paris" in the query
>>>>>>>> with "http://dbpedia.org/resource/Paris";
>>>>>>>> 2.
>>>>>>>> Entity Facets: if a keyword matches an entity (e.g. "Paris" ->
>>>>>>>> "dbpedia:Paris", "dbpedia:Paris_Texas", "dbpedia:Paris_Hilton"),
>>>>>>>> then provide a facet to the user over such possible matches;
>>>>>
>>>>> As far as we understand, the first and second features will be handled
>>>>> by querying the Entityhub with the query keyword (Paris), i.e. the
>>>>> first entity obtained from the Entityhub will help us to recognize its
>>>>> type, and the other entities will be served as facet values of the
>>>>> Paris facet.
>>>>>
>>>>>>>> 3. Semantic Facets: if a user uses an instance of an ontology
>>>>>>>> type (e.g. a Place, Person, Organization) in a query, then
>>>>>>>> provide facets over semantic relations for such types (e.g.
>>>>>>>> friends for Persons, products/services for Organizations, nearby
>>>>>>>> Points of Interest for Places, participants for Events, …). To
>>>>>>>> implement features like that we need components that provide
>>>>>>>> query preprocessing capabilities based on data available in the
>>>>>>>> Entityhub, OntoNet, … . To me it seems that the
>>>>>>>> contenthub/search/engines/ontologyresource component already
>>>>>>>> provides some functionality related to this, so it might be a
>>>>>>>> good starting point.
>>>>>
>>>>> Currently, we are trying to integrate an exploration mechanism like
>>>>> the one you describe above. It is also based on the DBpedia ontology.
>>>>> The OntologyResourceEngine can be used for this purpose with
>>>>> user-registered ontologies. The current implementation of this engine
>>>>> only computes closures by exploiting the hierarchy in the ontology.
>>>>> RDF Path Programs can also be an option at this point. With an RDF
>>>>> Path Program the user may specify the relations to be used in the
>>>>> exploration process. But I think this means the user decides
>>>>> beforehand which fields should be presented to him as exploration
>>>>> fields. I think this is open to discussion.
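For what it is worth, features 1 and 2 above (query preprocessing and entity facets) can be sketched in a few lines; the entity table below merely stands in for an Entityhub lookup, and all names and data are illustrative assumptions:

```python
# Toy entity table standing in for an Entityhub lookup -- illustrative only.
entity_index = {
    "paris": [
        "http://dbpedia.org/resource/Paris",
        "http://dbpedia.org/resource/Paris_Texas",
        "http://dbpedia.org/resource/Paris_Hilton",
    ],
}

def preprocess(query):
    """Substitute known keywords with their top-ranked entity URI and
    collect the remaining candidates as facet values."""
    terms, facets = [], {}
    for word in query.split():
        candidates = entity_index.get(word.lower())
        if candidates:
            terms.append(candidates[0])   # feature 1: query preprocessing
            facets[word] = candidates     # feature 2: entity facet values
        else:
            terms.append(word)
    return " ".join(terms), facets

query, facets = preprocess("hotels in Paris")
# query now ends with http://dbpedia.org/resource/Paris, and the
# "Paris" facet offers all three candidate entities
```

In a real implementation the ranking of candidates (which match to substitute) is of course the hard part; the sketch just takes the first one.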
>>>>>> best
>>>>>> Rupert
>>>>>>
>>>>> Regards,
>>>>> Anil.
>>
>> ----------------------------------------------------------------
>> This message was sent using IMP, the Internet Messaging Program.
>
> Sebastian
> --
> | Dr. Sebastian Schaffert              [email protected]
> | Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
> | Head of Knowledge and Media Technologies Group        +43 662 2288 423
> | Jakob-Haringer Strasse 5/II
> | A-5020 Salzburg
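As a closing aside on the RDF Path Program indexing scenario discussed in the thread: the core idea (a program mapping index fields to property paths, evaluated from the entity found by the enhancement engines) can be sketched as below. The graph, the program format, and every name here are illustrative assumptions, not the LMF or Stanbol APIs:

```python
# Tiny in-memory graph: (node, predicate) -> list of values. Illustrative only.
graph = {
    ("ex:Beatles", "mo:member"): ["ex:JohnLennon", "ex:PaulMcCartney"],
    ("ex:Beatles", "mo:album"): ["ex:AbbeyRoad"],
    ("ex:AbbeyRoad", "dc:title"): ["Abbey Road"],
}

# A drastically simplified "path program": index field -> property path.
path_program = {
    "members": ["mo:member"],
    "album_titles": ["mo:album", "dc:title"],
}

def eval_path(graph, start, path):
    """Follow a list of predicates from `start` and return all values reached."""
    nodes = [start]
    for pred in path:
        nodes = [v for n in nodes for v in graph.get((n, pred), [])]
    return nodes

def index_fields(graph, entity, program):
    """Compute the extra fields to embed in the Solr document of a
    content item whose enhancements point at `entity`."""
    return {field: eval_path(graph, entity, path)
            for field, path in program.items()}

fields = index_fields(graph, "ex:Beatles", path_program)
# fields["album_titles"] contains "Abbey Road"
```

Real RDFPath programs are richer (selectors, filters, transformations), but the field-to-path mapping above is the essential shape of the configuration.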
