Re: RdfPathLanguage: Next steps (was Re: Updates to LMF/Stanbol integration)

nuzzoles Fri, 28 Oct 2011 05:31:32 -0700

Hi Rupert, Stanbol team

I am interested in the SPARQL-based implementation.
We have recently worked on extracting paths from linked data.
The work has just been presented at ISWC [1, 2].
Hence I already have some experience with the issue and some preliminary
results to reuse:


 * a number of scripts for extracting paths
 * a vocabulary for representing and storing them [3]

I am not sure I will be in Salzburg, but we can discuss the possibility
of including these issues at the hackathon.

--
Andrea Giovanni Nuzzolese
Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council  (CNR)
Via Nomentana 56, Roma - Italy


[1] http://www.stlab.istc.cnr.it/documents/papers/cold2011.pdf
[2] http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf
[3] http://www.ontologydesignpatterns.org/ont/lod-analysis-properties.owl



> Hi Sebastian, Jakob, Stanbol team
>
> Based on Anil's positive feedback about participating in this, I decided to
> create a separate thread to plan the next steps.
>
> Next steps:
>
> The first step will be to define a Java API that allows for different
> implementations. I think the idea was to create a separate project (should we
> use GitHub or Google Code? MIT/BSD/Apache licensed?) that focuses only on
> the specification of the language [1] and the Java API. Sebastian needs to
> take the lead on this. If I remember correctly, his plan was to start this
> next week.
>
> As soon as a first version of this specification is available we can start
> to work on implementations.
>
> * Kiwi TripleStore: I assume Sebastian and Jakob will work on that
> * Clerezza: Anil could you take the lead for that?
> * Entityhub: This will be my responsibility
> * SPARQL-based implementation: I think that would be interesting - is anyone
> interested in working on that?
> * CMS Adapter: Suat, could you follow this effort and check for possible
> usage scenarios?
> * Fact Store: This could also be interesting. But as with the CMS
> Adapter, we first need to check usage scenarios.
>
> best
> Rupert
>
>
>
> On 28.10.2011, at 10:07, Ali Anil SINACI wrote:
>
>> Dear Rupert,
>>
>> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>>>
>>>>> * The LMF semantic search component overlaps greatly with the
>>>>> "contenthub/search/engines/solr" component recently contributed by
>>>>> Anil. Related to this, it would be great if Anil could have a
>>>>> look at [2] and check for similarities/differences and possible
>>>>> integration paths.
>>>>>
>>>> I had a look at the semantic search component of LMF. As you pointed
>>>> out, LMF semantic search provides a convenient way to index any
>>>> part of documents with the help of the RDFPath Language. I think that we
>>>> can make use of this feature in contenthub. As I described in my
>>>> previous e-mail, contenthub currently indexes a number of semantic
>>>> fields based on DBPedia relations. These relations are hardcoded.
>>>> The RDFPath language can be used to indicate specific semantic fields to
>>>> be indexed along with the content itself. Let me describe what we have
>>>> in mind with a scenario:
>>>>
>>>> A user provides a domain ontology (e.g. music domain) and submits it to
>>>> Entityhub to be used in the enhancement process. Suppose the domain
>>>> ontology includes a vast amount of information about artists, their
>>>> albums, etc. I assume that this ontology does not include conceptual
>>>> definitions (it only includes ABox definitions). The user writes an RDF
>>>> Path Program (in LMF terminology) to indicate the fields to be indexed
>>>> when a content item has an enhancement related to any path in that
>>>> program. Suppose the user submits a content item along with the RDF Path
>>>> Program(s) to be used to determine the fields to be indexed.
>>>> Enhancement engines find an entity (or lots of entities). Now, we
>>>> execute the selected RDF Path Program(s) and embed the results into
>>>> the Solr representation of the content item.
>>>>
>>>> If you have any other suggestions, please let me know so that we can
>>>> discuss in detail (in SRDC) before the meeting.
>>>>
>>> This is exactly what I was thinking about. Let me only add that such
>>> additional knowledge to be included within the Semantic Index might not
>>> only come from the Entityhub, but also from other sources (like the CMS
>>> via the CMS adapter).
>>>
>>> If you would like to help me with an implementation of the
>>> RdfPathLanguage (e.g. the Clerezza-based implementation, or maybe a
>>> Jena-based implementation) please let me know. Help would be greatly
>>> welcome, because I already have a lot of things on my TODO list before
>>> the meeting in November (such as defining a proposal for the Stanbol
>>> Enhancement Structure).
>>>
>>
>> We would like to get involved in the implementation of RDFPathLanguage
>> for Stanbol. We plan to start working on this next week. I think
>> you & the LMF team already have a design in mind. I would appreciate it if
>> you could share your thoughts with us.
>>
>>>>> * The Semantic Search Interface: The Contenthub currently defines its
>>>>> own query API (supports keyword based search as well as "field ->
>>>>> value" like constraints, supports facets). The LMF directly exposes
>>>>> the RESTful API of the semantic Solr index. I strongly prefer the
>>>>> approach of the LMF, because of the two points already described above.
>>>> We think that we do not have to make a selection here. We can keep a
>>>> simple wrapper around the Solr interface (contenthub's own query API)
>>>> while providing the Solr RESTful API as is. IMO a wrapper around the Solr
>>>> interface would be beneficial. On the other hand, in this interface we
>>>> try to make use of an ontology to be used in
>>>> OntologyResourceSearchEngine. This might help to figure out new
>>>> keywords based on the subsumption hierarchy inside the ontology.
>>>> However, I think this may lead to performance issues and may not be
>>>> useful at all. We can decide on this later.
>>> You forgot to mention one additional advantage of using the Solr
>>> RESTful API: If we do that, one could create the Semantic Index and then
>>> copy it over to some other SolrServer without the need to run Stanbol
>>> directly on the production infrastructure.
>>>
>>> In general I would suggest first focusing the discussion on the unique
>>> features we would like to provide with the Semantic Search component. I
>>> already included three features I would like to have in my first mail
>>> (Query preprocessing, Entity Facets, Semantic Facets). As you now
>>> mention, the OntologyResourceSearchEngine is very relevant in relation
>>> to such features.
>>> However, adding such features does not necessarily mean creating a
>>> separate query language. One could also try to add such features directly
>>> to Solr by implementing some Solr extensions.
>>>
>>
>> Let me briefly comment on your suggestions about the semantic search.
>>
>>>>>  But I am also of the opinion that a semantic search interface should at
>>>>> least provide the following three additional features:
>>>>>     1. Query preprocessing: e.g. substitute "Paris" in the query
>>>>> with "http://dbpedia.org/resource/Paris";
>>>>>     2. Entity Facets: if a keyword matches an Entity (e.g. "Paris" ->
>>>>>  "dbpedia:Paris", "dbpedia:Paris_Texas", "dbpedia:Paris_Hilton")
>>>>> then provide a Facet to the user over such possible
>>>>> matches;
>>
>> As far as we understand, the first and second features will be handled by
>> querying the Entityhub with the query keyword (Paris), i.e. the first
>> entity obtained from the Entityhub will help us to recognize its type
>> and the other entities will serve as facet values of the Paris facet.
>>
>>>>>     3. Semantic Facets: if a user uses an instance of an ontology
>>>>> type (e.g. a Place, Person, Organization) in a query, then
>>>>> provide facets over semantic relations for such types (e.g.
>>>>> friends for persons, products/services for Organizations, nearby
>>>>> Points-Of-Interests for Places, Participants for Events, ...). To
>>>>> implement features like that we need components that provide
>>>>> query preprocessing capabilities based on data available in the
>>>>> Entityhub, Ontonet ... To me it seems that the
>>>>> contenthub/search/engines/ontologyresource component already
>>>>> provides some functionality related to this, so this might be a
>>>>> good starting point.
>>
>> Currently, we are trying to integrate an exploration mechanism like the
>> one you described above. It is also based on the DBPedia ontology.
>> OntologyResourceEngine can be used for this purpose for
>> user-registered ontologies. The current implementation of this engine only
>> computes closures by exploiting the hierarchy in the ontology. RDFPath
>> Programs can also be an option at this point. With an RDF Path Program
>> the user may specify the relations to be used in the exploration process.
>> But I think this means the user decides beforehand which fields should
>> be presented to him as exploration fields. I think this is open to
>> discussion.
>>
>>> best
>>> Rupert
>>>
>>
>> Regards,
>> Anil.
>
>
