Dear Valentina and all,

This is indeed very interesting. We developed "RDFPath" originally as a
configuration language for the semantic search index, but later realised
that it is a very natural way of querying over the Linked Data Cloud. In
contrast to SPARQL, it is resource-centric like Linked Data, and it is
restricted in a way that makes it more suitable for Linked Data servers.
The path language is documented here:
http://code.google.com/p/kiwi/wiki/RdfPathLanguage

Combining this with pattern extraction would be very interesting, e.g.
for automatically suggesting a configuration for our search index.

Greetings,

Sebastian

On 28.10.2011, at 15:17, [email protected] wrote:

> Actually, we have developed experience on this issue (SPARQL-based
> implementation for path extraction). I think that Andrea could lead
> the task and bring in the work already done as a starting point.
>
> We have also identified a number of indicators that are worth
> computing and associating with paths. Everything is in the two papers
> (and the vocabulary) referred to by Andrea. For example, we compute
> the occurrences of paths in a dataset, their popularity (we have
> defined a measure for this), and others.
> They can be optional features offered by the SPARQL-based services.
>
> Val
>
> Quoting [email protected]:
>
>> Hi Rupert, Stanbol team
>>
>> I am interested in the SPARQL-based implementation.
>> We have recently worked on extracting paths from linked data.
>> The work has just been presented at ISWC [1, 2].
>> Hence I already have some experience with the issue and some
>> preliminary results to reuse:
>>
>> * a number of scripts for extracting paths
>> * a vocabulary for representing and storing them [3]
>>
>> I am not sure I will be in Salzburg, but we can discuss the
>> possibility of including these issues at the hackathon.
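As a rough illustration of what such path-extraction scripts might do, here is a minimal, self-contained sketch over a hypothetical toy triple set (not Andrea's actual scripts; the real measures are defined in the papers he cites). It enumerates property paths starting from each subject and counts their occurrences in the dataset, one of the indicators Valentina mentions:

```python
# Toy path extraction: enumerate property paths up to a fixed length
# starting from each resource, and count how often each path occurs.
from collections import Counter

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("ex:Bob",   "foaf:knows",      "ex:Alice"),
    ("ex:Alice", "foaf:based_near", "ex:Rome"),
    ("ex:Carol", "foaf:knows",      "ex:Alice"),
]

def paths_from(node, triples, max_len=2):
    """Yield property paths (tuples of properties) starting at node."""
    for s, p, o in triples:
        if s != node:
            continue
        yield (p,)
        if max_len > 1:
            for tail in paths_from(o, triples, max_len - 1):
                yield (p,) + tail

def path_occurrences(triples, max_len=2):
    """Count how often each property path occurs in the dataset."""
    counts = Counter()
    for subject in {s for s, _, _ in triples}:
        counts.update(paths_from(subject, triples, max_len))
    return counts

counts = path_occurrences(TRIPLES)
# ("foaf:knows",) occurs twice (from ex:Bob and from ex:Carol), and so
# does the two-step path ("foaf:knows", "foaf:based_near").
```

Occurrence counts like these could then be stored using Andrea's vocabulary [3] and exposed as optional features of the SPARQL-based services.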
>>
>> --
>> Andrea Giovanni Nuzzolese
>> Semantic Technology Laboratory (STLab)
>> Institute for Cognitive Science and Technology (ISTC)
>> National Research Council (CNR)
>> Via Nomentana 56, Roma - Italy
>>
>> [1] http://www.stlab.istc.cnr.it/documents/papers/cold2011.pdf
>> [2] http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf
>> [3] http://www.ontologydesignpatterns.org/ont/lod-analysis-properties.owl
>>
>>> Hi Sebastian, Jakob, Stanbol team
>>>
>>> Based on Anil's positive feedback about participating in this, I
>>> decided to create a separate thread to plan the next steps.
>>>
>>> Next steps:
>>>
>>> The first step will be to define a Java API that allows us to
>>> provide different implementations. I think the idea was to create a
>>> separate project (should we use GitHub or Google Code? MIT/BSD/Apache
>>> licensed?) that focuses only on the specification of the language [1]
>>> and the Java API. Sebastian needs to take the lead on this. If I
>>> remember correctly, his plan was to start this next week.
>>>
>>> As soon as a first version of this specification is available we can
>>> start to work on implementations.
>>>
>>> * KiWi TripleStore: I assume Sebastian and Jakob will work on that
>>> * Clerezza: Anil, could you take the lead on that?
>>> * Entityhub: This will be my responsibility
>>> * SPARQL-based implementation: I think that would be interesting -
>>>   anyone interested in working on that?
>>> * CMS Adapter: Suat, could you follow this effort and check for
>>>   possible usage scenarios?
>>> * Fact Store: This could also be interesting, but as with the CMS
>>>   Adapter we first need to check usage scenarios.
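To make the pluggable-implementation idea concrete: the real API will be defined in Java as part of the specification work, but a loose Python sketch (all names hypothetical, with a toy in-memory backend) can illustrate how KiWi, Clerezza, Entityhub, or SPARQL backends might share one evaluator interface while the resource-centric traversal stays generic:

```python
# Sketch of a pluggable path-evaluator API. Each backend implements
# `select`; the generic `follow_path` applies a property path to a
# resource, carrying a set of context nodes (resource-centric, as in
# the RdfPath language description).
from abc import ABC, abstractmethod

class PathEvaluator(ABC):
    @abstractmethod
    def select(self, context, prop):
        """Return all nodes reachable from `context` via `prop`."""

class InMemoryEvaluator(PathEvaluator):
    """Trivial backend over an iterable of (s, p, o) triples."""
    def __init__(self, triples):
        self.triples = triples

    def select(self, context, prop):
        return {o for s, p, o in self.triples
                if s == context and p == prop}

def follow_path(evaluator, context, path):
    """Apply each property in `path` in turn, starting from `context`."""
    nodes = {context}
    for prop in path:
        nxt = set()
        for n in nodes:
            nxt |= evaluator.select(n, prop)
        nodes = nxt
    return nodes

ev = InMemoryEvaluator([
    ("ex:Bob",   "foaf:knows", "ex:Alice"),
    ("ex:Alice", "foaf:name",  '"Alice"'),
])
follow_path(ev, "ex:Bob", ["foaf:knows", "foaf:name"])  # {'"Alice"'}
```

A triple-store-backed implementation would push `select` down into its query engine instead of iterating over triples.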
>>>
>>> best
>>> Rupert
>>>
>>> On 28.10.2011, at 10:07, Ali Anil SINACI wrote:
>>>
>>>> Dear Rupert,
>>>>
>>>> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>>>>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>>>>>
>>>>>>> * The LMF semantic search component overlaps greatly with the
>>>>>>>   "contenthub/search/engines/solr" component recently contributed
>>>>>>>   by Anil. Related to this, it would be great if Anil could have
>>>>>>>   a look at [2] and check for similarities/differences and
>>>>>>>   possible integration paths.
>>>>>>>
>>>>>> I had a look at the semantic search component of the LMF. As you
>>>>>> pointed out, LMF semantic search provides a convenient way to
>>>>>> index any part of a document with the help of the RDFPath
>>>>>> language. I think that we can make use of this feature in the
>>>>>> Contenthub. As I described in my previous e-mail, the Contenthub
>>>>>> currently indexes a number of semantic fields based on DBpedia
>>>>>> relations. These are hardcoded relations. The RDFPath language
>>>>>> can be used to indicate specific semantic fields to be indexed
>>>>>> along with the content itself. Let me describe what we have in
>>>>>> mind with a scenario:
>>>>>>
>>>>>> A user provides a domain ontology (e.g. for the music domain) and
>>>>>> submits it to the Entityhub to be used in the enhancement process.
>>>>>> Suppose the domain ontology includes a vast amount of information
>>>>>> about artists, their albums etc. I assume that this ontology does
>>>>>> not include conceptual definitions (it only includes ABox
>>>>>> definitions). The user writes an RDF Path Program (in LMF
>>>>>> terminology) to indicate the fields to be indexed when a content
>>>>>> item has an enhancement related to any path in that program.
>>>>>> Suppose the user submits a content item along with the RDF Path
>>>>>> Program(s) to be used to determine the fields to be indexed. The
>>>>>> enhancement engines find an entity (or many entities).
>>>>>> Now, we execute the selected RDF Path Program(s) and embed the
>>>>>> results into the Solr representation of the content item.
>>>>>>
>>>>>> If you have any other suggestions, please let me know so that we
>>>>>> can discuss them in detail (in SRDC) before the meeting.
>>>>>>
>>>>> This is exactly what I was thinking about. Let me only add that
>>>>> such additional knowledge to be included within the semantic index
>>>>> might not only come from the Entityhub, but also from other
>>>>> sources (like the CMS via the CMS Adapter).
>>>>>
>>>>> If you would like to help me with an implementation of the
>>>>> RdfPathLanguage (e.g. the Clerezza-based implementation, or maybe
>>>>> a Jena-based implementation), please let me know. Help would be
>>>>> greatly welcome, because I already have a lot of things on my TODO
>>>>> list before the meeting in November (such as defining a proposal
>>>>> for the Stanbol enhancement structure).
>>>>>
>>>> We would like to get involved in the implementation of the
>>>> RdfPathLanguage for Stanbol. We plan to start working on this next
>>>> week. I think you & the LMF team already have a design in mind. I
>>>> would appreciate it if you could share your thoughts with us.
>>>>
>>>>>>> * The semantic search interface: the Contenthub currently
>>>>>>>   defines its own query API (it supports keyword-based search as
>>>>>>>   well as "field -> value" like constraints, and supports
>>>>>>>   facets). The LMF directly exposes the RESTful API of the
>>>>>>>   semantic Solr index. I strongly prefer the approach of the
>>>>>>>   LMF, because of the two points already described above.
>>>>>> We think that we do not have to make a selection here. We can
>>>>>> keep a simple wrapper around the Solr interface (the Contenthub's
>>>>>> own query API) while providing the Solr RESTful API as is. IMO a
>>>>>> wrapper around the Solr interface would be beneficial. On the
>>>>>> other hand, in this interface we try to make use of an ontology
>>>>>> in the OntologyResourceSearchEngine.
>>>>>> This might help to figure out new keywords based on the
>>>>>> subsumption hierarchy inside the ontology. However, I think this
>>>>>> may lead to performance issues and may not be useful at all. We
>>>>>> can decide on this later.
>>>>> You forgot to mention one additional advantage of using the Solr
>>>>> RESTful API: if we do that, one could create the semantic index
>>>>> and then copy it over to another Solr server, without the need to
>>>>> run Stanbol directly on the production infrastructure.
>>>>>
>>>>> In general I would suggest first focusing the discussion on the
>>>>> unique features we would like to provide with the semantic search
>>>>> component. I already included three features I would like to have
>>>>> in my first mail (query preprocessing, entity facets, semantic
>>>>> facets). As you now mention, the OntologyResourceSearchEngine is
>>>>> very relevant in relation to such features.
>>>>> However, adding such features does not necessarily mean creating
>>>>> our own query language. One could also try to add such features
>>>>> directly to Solr by implementing some Solr extensions.
>>>>>
>>>> Let me briefly comment on your suggestions about the semantic
>>>> search.
>>>>
>>>>>>> But I am also of the opinion that a semantic search interface
>>>>>>> should at least provide the following three additional features:
>>>>>>> 1. Query preprocessing: e.g. substitute "Paris" in the query
>>>>>>>    with "http://dbpedia.org/resource/Paris";
>>>>>>> 2. Entity facets: if a keyword matches an entity (e.g. "Paris"
>>>>>>>    -> "dbpedia:Paris", "dbpedia:Paris_Texas",
>>>>>>>    "dbpedia:Paris_Hilton"), then provide a facet to the user
>>>>>>>    over such possible matches;
>>>>
>>>> As far as we understand, the first and second features will be
>>>> handled by querying the Entityhub with the query keyword (Paris),
>>>> i.e. the first entity obtained from the Entityhub will help us to
>>>> recognize its type, and the other entities will be served as facet
>>>> values of a "Paris" facet.
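A minimal sketch of features 1 and 2 under these assumptions (the dictionary below stands in for the Entityhub label lookup; all names are hypothetical): the top-ranked match replaces the keyword in the query, and the full match list becomes the values of an entity facet:

```python
# Toy query preprocessing and entity-facet extraction. In Stanbol the
# lookup would be an Entityhub query; here a dict stands in for it,
# mapping a lowercased keyword to ranked entity URIs.
ENTITY_INDEX = {
    "paris": [
        "http://dbpedia.org/resource/Paris",
        "http://dbpedia.org/resource/Paris_Texas",
        "http://dbpedia.org/resource/Paris_Hilton",
    ],
}

def preprocess_query(keyword):
    """Return (rewritten query term, entity-facet values) for a keyword."""
    matches = ENTITY_INDEX.get(keyword.lower(), [])
    if not matches:
        return keyword, []      # no entity found: keep the raw keyword
    return matches[0], matches  # substitute best match, facet over all

term, facet = preprocess_query("Paris")
# term  -> "http://dbpedia.org/resource/Paris"
# facet -> all three dbpedia URIs, offered to the user as a facet
```

The same shape would extend to feature 3 by following semantic relations of the matched entity's type to build the facet values.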
>>>>
>>>>>>> 3. Semantic facets: if a user uses an instance of an ontology
>>>>>>>    type (e.g. a Place, Person, Organization) in a query, then
>>>>>>>    provide facets over semantic relations for such types (e.g.
>>>>>>>    friends for Persons, products/services for Organizations,
>>>>>>>    nearby points of interest for Places, participants for
>>>>>>>    Events, …). To implement features like that we need
>>>>>>>    components that provide query preprocessing capabilities
>>>>>>>    based on data available in the Entityhub, Ontonet … . To me
>>>>>>>    it seems that the contenthub/search/engines/ontologyresource
>>>>>>>    component already provides some functionality related to
>>>>>>>    this, so this might be a good starting point.
>>>>
>>>> Currently, we are trying to integrate an exploration mechanism
>>>> like the one you describe above. It is also based on the DBpedia
>>>> ontology. The OntologyResourceEngine can be used for this purpose
>>>> with user-registered ontologies. The current implementation of this
>>>> engine only computes closures by exploiting the hierarchy in the
>>>> ontology. RDF Path Programs can also be an option at this point.
>>>> With an RDF Path Program the user may specify the relations to be
>>>> used in the exploration process. But I think this means the user
>>>> decides beforehand which fields should be presented to him as
>>>> exploration fields. I think this is open to discussion.
>>>>
>>>>> best
>>>>> Rupert
>>>>>
>>>> Regards,
>>>> Anil.
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.

Sebastian

--
| Dr. Sebastian Schaffert                     [email protected]
| Salzburg Research Forschungsgesellschaft  http://www.salzburgresearch.at
| Head of Knowledge and Media Technologies Group         +43 662 2288 423
| Jakob-Haringer Strasse 5/II
| A-5020 Salzburg
