Hi Rupert, Stanbol team

I am interested in the SPARQL-based implementation. We have recently worked on extracting paths from linked data. The work has just been presented at ISWC [1, 2]. Hence I already have some experience with the issue and some preliminary results to reuse:
* a number of scripts for extracting paths
* a vocabulary for representing and storing them [3]

I am not sure I will be in Salzburg, but we can discuss the possibility of including these issues at the hackathon.

--
Andrea Giovanni Nuzzolese
Semantic Technology Laboratory (STLab)
Institute for Cognitive Science and Technology (ISTC)
National Research Council (CNR)
Via Nomentana 56, Roma - Italy

[1] http://www.stlab.istc.cnr.it/documents/papers/cold2011.pdf
[2] http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf
[3] http://www.ontologydesignpatterns.org/ont/lod-analysis-properties.owl

> Hi Sebastian, Jakob, Stanbol team
>
> Based on the positive feedback from Anil about participating in this, I
> decided to create a separate thread to plan the next steps.
>
> Next steps:
>
> The first step will be to define a Java API that allows different
> implementations to be provided. I think the idea was to create a separate
> project (should we use Github or GoogleCode? MIT/BSD/Apache licensed?)
> that only focusses on the Specification of the Language [1] and the Java
> API. Sebastian needs to take the lead on this. If I remember correctly
> his plan was to start this next week.
>
> As soon as a first version of this specification is available we can start
> to work on implementations.
>
> * Kiwi TripleStore: I assume Sebastian and Jakob will work on that
> * Clerezza: Anil could you take the lead for that?
> * Entityhub: This will be my responsibility
> * SPARQL based implementation: I think that would be interesting - is
>   someone interested in working on that?
> * CMS Adapter: Suat could you follow this effort and check for possible
>   usage scenarios.
> * Fact Store: This could also be interesting. But same as for the CMS
>   Adapter we first need to check usage scenarios.
>
> best
> Rupert
>
>
> On 28.10.2011, at 10:07, Ali Anil SINACI wrote:
>
>> Dear Rupert,
>>
>> On 10/28/2011 08:47 AM, Rupert Westenthaler wrote:
>>> On 27.10.2011, at 16:59, Ali Anil SINACI wrote:
>>>>>
>>>>> * The LMF semantic search component overlaps greatly with the
>>>>> "contenthub/search/engines/solr" component recently contributed
>>>>> by Anil. Related to this it would be great if Anil could have a
>>>>> look at [2] and check for similarities/differences and possible
>>>>> integration paths.
>>>>>
>>>> I had a look at the semantic search component of LMF. As you pointed
>>>> out, LMF semantic search provides a convenient way to index any
>>>> part of documents with the help of the RDFPath Language. I think that
>>>> we can make use of this feature in contenthub. As I described in my
>>>> previous e-mail, contenthub currently indexes a number of semantic
>>>> fields based on DBPedia relations. These are hardcoded relations.
>>>> The RDFPath language can be used to indicate specific semantic fields
>>>> to be indexed along with the content itself. Let me describe what we
>>>> have in mind in a scenario:
>>>>
>>>> A user provides a domain ontology (e.g. music domain) and submits it
>>>> to the Entityhub to be used in the enhancement process. Suppose the
>>>> domain ontology includes a vast amount of information about artists,
>>>> their albums etc. I assume that this ontology does not include
>>>> conceptual definitions (it only includes Abox definitions). The user
>>>> writes an RDF Path Program (in LMF terminology) to indicate the fields
>>>> to be indexed when a content item has an enhancement related to any
>>>> path in that program. Suppose the user submits a content item along
>>>> with the RDF Path Program(s) to be used to determine the fields to be
>>>> indexed. Enhancement engines find an entity (or lots of entities).
>>>> Now, we execute the selected RDF Path Program(s) and embed the results
>>>> into the Solr representation of the content item.
>>>>
>>>> If you have any other suggestions, please let me know so that we can
>>>> discuss in detail (in SRDC) before the meeting.
>>>>
>>> This is exactly what I was thinking about. Let me only add that such
>>> additional Knowledge to be included within the Semantic Index might not
>>> only come from the Entityhub, but also from other sources (like the CMS
>>> via the CMS adapter)
>>>
>>> If you would like to help me with an Implementation of the
>>> RdfPathLanguage (e.g. the Clerezza based Implementation, or maybe a
>>> Jena based implementation) please let me know. Help would be greatly
>>> welcome, because I already have a lot of things on my TODO list before
>>> the Meeting in November (such as defining a Proposal for the Stanbol
>>> Enhancement Structure).
>>>
>>
>> We would like to get involved in the implementation of RDFPathLanguage
>> for Stanbol. We plan to work on this starting from next week. I think
>> you & the LMF team already have a design in mind. I would appreciate it
>> if you could share your thoughts with us.
>>
>>>>> * The Semantic Search Interface: The Contenthub currently defines its
>>>>> own query API (supports keyword based search as well as "field ->
>>>>> value" like constraints, supports facets). The LMF directly exposes
>>>>> the RESTful API of the semantic Solr index. I strongly prefer the
>>>>> approach of the LMF, because of the two points already described
>>>>> above.
>>>> We think that we do not have to make a selection here. We can keep a
>>>> simple wrapper on the Solr interface (contenthub's own query API)
>>>> while providing the Solr RESTful API as is. IMO a wrapper on the Solr
>>>> interface would be beneficial. On the other hand, in this interface we
>>>> try to make use of an ontology to be used in
>>>> OntologyResourceSearchEngine. This might help to figure out new
>>>> keywords based on the subsumption hierarchy inside the ontology.
>>>> However, I think this may lead to performance issues and may not be
>>>> useful at all. We can decide on this later.
>>> You forgot to mention one additional advantage of using the Solr
>>> RESTful API: If we do that one could create the Semantic Index and then
>>> copy it over to some other SolrServer without the need to run Stanbol
>>> directly on the production infrastructure.
>>>
>>> In general I would suggest to first focus the discussion on the unique
>>> features we would like to provide with the Semantic Search component. I
>>> already included three features I would like to have in my first Mail
>>> (Query preprocessing, Entity Facets, Semantic Facets). As you now
>>> mention the OntologyResourceSearchEngine is very relevant in relation
>>> to such features.
>>> However adding such features does not necessarily mean creating a
>>> separate query language. One could also try to add such features
>>> directly to Solr by implementing some Solr extensions.
>>>
>>
>> Let me briefly comment on your suggestions about the semantic search.
>>
>>>>> But I am also of the opinion that a semantic search interface should
>>>>> at least provide the following three additional features:
>>>>> 1. Query preprocessing: e.g. substitute "Paris" in the query
>>>>> with "http://dbpedia.org/resource/Paris";
>>>>> 2. Entity Facets: if a keyword matches an Entity (e.g. "Paris" ->
>>>>> "dbpedia:Paris", "dbpedia:Paris_Texas", "dbpedia:Paris_Hilton")
>>>>> then provide a Facet to the user over such possible
>>>>> matches;
>>
>> As far as we understand, the first and second features will be handled
>> by querying the Entityhub with the query keyword (Paris), i.e. the first
>> entity obtained from the Entityhub will help us to recognize its type
>> and the other entities will be served as facet values of the Paris
>> facet.
>>
>>>>> 3. Semantic Facets: if a user uses an instance of an ontology
>>>>> type (e.g. a Place, Person, Organization) in a query, then
>>>>> provide facets over semantic relations for such types (e.g.
>>>>> friends for persons, products/services for Organizations, nearby
>>>>> Points-Of-Interest for Places, Participants for Events). To
>>>>> implement features like that we need components that provide
>>>>> query preprocessing capabilities based on data available in the
>>>>> Entityhub, Ontonet. To me it seems that the
>>>>> contenthub/search/engines/ontologyresource component already
>>>>> provides some functionality related to this so this might be a
>>>>> good starting point.
>>
>> Currently, we are trying to integrate an exploration mechanism like the
>> one you described above. It is also based on the DBPedia ontology.
>> OntologyResourceEngine can be used for this purpose for the user
>> registered ontologies. The current implementation of this engine only
>> computes closures by exploiting the hierarchy in the ontology. RDFPath
>> Programs can also be an option at this point. With an RDF Path Program
>> the user may specify the relations to be used in the exploration
>> process. But I think this means the user decides beforehand which fields
>> should be presented to him as exploration fields. I think this is open
>> to discussion.
>>
>>> best
>>> Rupert
>>>
>>
>> Regards,
>> Anil.
>
>
