Hi Pablo, all,

I am personally very interested in DBpedia Spotlight, and a good integration with Stanbol would be really great. While the Stanbol Enhancer and Entityhub can already be used to link entities with DBpedia, these components are not optimized for use with DBpedia. In short, there are still plenty of reasons why one would want a dedicated entity-linking engine for DBpedia!

Regarding the validation part: it would be really great if you worked on Benchmarking. Bertrand Delacretaz implemented this really important feature more than a year ago, but it has hardly been used so far because nobody has contributed test data. Anyone interested can try Benchmarking under "/benchmark" (e.g. on the demo server [1]).

I have already checked out the Spotlight source code and will try to take a detailed look at it. I plan to provide detailed feedback on the technical details, with a focus on potential integration paths and synergies.

Thx for the really nice proposal,
regards
Rupert

[1] http://dev.iks-project.eu:8081/benchmark/

On Mon, Feb 27, 2012 at 12:30 PM, Pablo Mendes wrote:
> Hi all,
> We are interested in joining the Early Adopters Programme (EAP) as a way to
> seed a long-lasting collaboration with the Stanbol community.
>
> We are the creators of DBpedia Spotlight, a Java/Scala Open Source
> Enhancement Engine (Apache V2 license) that is complementary to Stanbol.
> DBpedia Spotlight has the ambitious goal of annotating any of the 3.5M
> entities from all 320 classes in the DBpedia Ontology. At the core of our
> proposal is the idea of remaining generic and configurable for many use
> cases. Besides the open source code, we also provide a freely available
> REST service that has been used to annotate cultural goods [1], generate
> RDFa annotations in WordPress [2], and enhance the content in Wikipedia
> through a MediaWiki toolbar [3], among others [4].
>
> [1] http://dme.ait.ac.at/annotation
> [2] http://aksw.org/Projects/RDFaCE
> [3] http://pedia.sztaki.hu/
> [4] More at: http://wiki.dbpedia.org/spotlight/knownuses
>
> We have a demo interface that lets you tweak some parameters and see how
> the system works in practice:
> http://spotlight.dbpedia.org/demo
>
> As a first step through the EAP, should our proposal be selected, our
> intention is to provide Stanbol enhancement engines based on the different
> strategies that DBpedia Spotlight uses for term recognition and
> disambiguation (more technical details below). For the validation part, one
> idea is to provide a benchmark comparing the performance (esp. accuracy) of
> the different enhancement engines on different annotated corpora that we
> have already collected. Would this be interesting for IKS/Stanbol? Is there
> another type of validation that would be more appealing to the community?
>
> Looking forward to discussing possibilities with you.
>
> Best regards,
> Pablo
>
> For the More Technical Folks
>
> Our content enhancement is performed in 4 stages:
> - Spotting recognizes terms in some input text. It can be done via
>   substring matches in a dictionary, or with more sophisticated approaches
>   such as NER and keyphrase extraction.
> - Candidate mapping matches the "spotted" terms with their possible
>   interpretations (entity identifiers). This can also be done with a
>   dictionary (hashmap), but offers the possibility of fancier matching
>   with name variations (acronyms, approximate matching, etc.).
> - Disambiguation ranks the "candidates" given the context (e.g. words
>   around the spotted phrase). This can also be done in many ways: locally,
>   globally, with different scoring functions, etc.
> - Linking decides which of the spots to keep, given that after the previous
>   steps we have more information about confidence, topical pertinence, etc.
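To make the four stages concrete, here is a toy Java sketch that chains them together. Everything in it is a hypothetical illustration, not DBpedia Spotlight's actual code or API: the class and method names, the two-entry surface-form dictionary, the per-entity context-word sets, the 50-character context window and the word-overlap score are all assumptions chosen for brevity. Spotting uses plain dictionary substring matching and candidate mapping a plain hashmap lookup, as the first two stages describe. Disambiguation and linking use a deliberately simple overlap score and a fixed threshold in place of Spotlight's real scoring functions.

```java
import java.util.*;

/**
 * Toy four-stage entity-linking pipeline: spotting, candidate mapping,
 * disambiguation and linking. All dictionaries, URIs, context words and
 * thresholds are hypothetical illustration data, not DBpedia Spotlight's
 * actual models or API.
 */
public class SpotlightPipelineSketch {

    /** A dictionary surface form found in the text at a character offset. */
    public record Spot(String form, int offset) {}

    /** A spot resolved to an entity URI with a confidence score in [0, 1]. */
    public record Annotation(Spot spot, String uri, double score) {}

    // Hypothetical surface-form dictionary: lowercase phrase -> candidate URIs.
    static final Map<String, List<String>> CANDIDATES = Map.of(
        "berlin", List.of("dbpedia:Berlin", "dbpedia:Berlin,_New_Hampshire"),
        "apple", List.of("dbpedia:Apple_Inc.", "dbpedia:Apple"));

    // Hypothetical context words associated with each candidate entity.
    static final Map<String, Set<String>> CONTEXT = Map.of(
        "dbpedia:Berlin", Set.of("germany", "capital", "city"),
        "dbpedia:Berlin,_New_Hampshire", Set.of("hampshire", "town"),
        "dbpedia:Apple_Inc.", Set.of("iphone", "company", "computer"),
        "dbpedia:Apple", Set.of("fruit", "tree", "pie"));

    // Characters of context taken on each side of a spot (assumed value).
    static final int WINDOW = 50;

    /** Stage 1, spotting: case-insensitive substring matches against the dictionary. */
    static List<Spot> spot(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        List<Spot> spots = new ArrayList<>();
        for (String form : CANDIDATES.keySet()) {
            for (int i = lower.indexOf(form); i >= 0; i = lower.indexOf(form, i + form.length())) {
                spots.add(new Spot(form, i));
            }
        }
        spots.sort(Comparator.comparingInt(Spot::offset)); // report spots in text order
        return spots;
    }

    /** Stage 2, candidate mapping: plain hashmap lookup of possible entities. */
    static List<String> candidates(Spot s) {
        return CANDIDATES.getOrDefault(s.form(), List.of());
    }

    /** Stage 3, disambiguation: pick the candidate whose context words best overlap nearby words. */
    static Annotation disambiguate(String text, Spot s, List<String> candidates) {
        String lower = text.toLowerCase(Locale.ROOT);
        int from = Math.max(0, s.offset() - WINDOW);
        int to = Math.min(lower.length(), s.offset() + s.form().length() + WINDOW);
        Set<String> nearby = new HashSet<>(Arrays.asList(lower.substring(from, to).split("[^a-z]+")));
        Annotation best = null;
        for (String uri : candidates) {
            Set<String> ctx = CONTEXT.getOrDefault(uri, Set.of());
            long hits = ctx.stream().filter(nearby::contains).count();
            double score = ctx.isEmpty() ? 0.0 : (double) hits / ctx.size();
            if (best == null || score > best.score()) {
                best = new Annotation(s, uri, score);
            }
        }
        return best; // null only if there were no candidates
    }

    /** Stage 4, linking: keep only spots whose best candidate reaches the threshold. */
    public static List<Annotation> annotate(String text, double threshold) {
        List<Annotation> linked = new ArrayList<>();
        for (Spot s : spot(text)) {
            Annotation best = disambiguate(text, s, candidates(s));
            if (best != null && best.score() >= threshold) {
                linked.add(best);
            }
        }
        return linked;
    }

    public static void main(String[] args) {
        String text = "Berlin is the capital city of Germany. Apple makes the iPhone.";
        for (Annotation a : annotate(text, 0.3)) {
            System.out.printf(Locale.ROOT, "%s@%d -> %s (%.2f)%n",
                a.spot().form(), a.spot().offset(), a.uri(), a.score());
        }
    }
}
```

On the sample sentence, "Berlin" resolves to dbpedia:Berlin with score 1.0 and "Apple" to dbpedia:Apple_Inc. with score 1/3. Raising the threshold to 0.5 drops the Apple annotation and keeps Berlin, which is the kind of keep-or-discard decision the Linking stage makes.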
>
> Other potentially interesting technical details:
> - Our Web service uses Jersey (JAX-RS).
> - The Web service is CORS-enabled, and we have both pure JS and jQuery
>   clients. We also have Java, Scala and PHP clients.
> - Users can provide SPARQL queries to blacklist/whitelist results
>   (currently in the Linking step only, but work is in progress for the
>   other steps).

--
| Rupert Westenthaler
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
