On 25 March 2012 at 16:53, Allel Benbrahim <[email protected]> wrote:
> Hello
>
> Regarding the previous issue, is there any means to set the sensitivity of
> the Stanbol extraction engine, meaning being able to determine an
> acceptability threshold when it interrogates the index base?
> We tried to look at the Stanbol configuration screen "osgi" in order to
> enhance the matching with the detected words (person, localisation,
> organisation) but did not find the way to do this.
>
> Is there a way to set the sensitivity or is it planned on the project
> road-map if it is an open issue?

It really depends on which engine you are referring to. Which kind of failure is most annoying to you?

- Phrases detected as names of people, locations or organizations whereas they should not be detected at all (e.g. verbs, adverbs or other non-name noun phrases)?
- Or names of people, locations or organizations that are linked to the wrong entity in the knowledge base?

For the first type of errors, there is a relevance score on the TextAnnotation object returned in the results. This score behaves as a normalized probability, so you can apply a threshold to it to adjust the sensitivity. Alternatively, you could use the OpenNLP tools to train new NameFinder models on hand-annotated text, but this is a big effort.

For the second type of errors:

- build a larger index of entities to increase the rate of unambiguous exact name matches,
- build a new engine in charge of entity disambiguation based on contextual information, which would make it possible to return both better linking results and an ambiguity or confidence score.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
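
As an illustration of the thresholding idea for the first type of errors, here is a minimal client-side sketch. It assumes the annotations have already been extracted from the engine results into plain Python dicts; the dict layout, the field name `score`, the sample values and the helper `filter_by_score` are all hypothetical and are not Stanbol's actual output format or API.

```python
# Hypothetical, simplified annotations: each dict stands in for one
# TextAnnotation with its relevance score (field names are assumptions).
annotations = [
    {"text": "Paris", "type": "Place", "score": 0.92},
    {"text": "quickly", "type": "Person", "score": 0.31},
    {"text": "Apache", "type": "Organisation", "score": 0.75},
]


def filter_by_score(items, threshold):
    """Keep only the annotations whose score is at least `threshold`."""
    return [item for item in items if item["score"] >= threshold]


# A higher threshold drops more spurious detections, at the cost of
# possibly discarding some correct ones.
kept = filter_by_score(annotations, 0.7)
for item in kept:
    print(item["text"], item["type"], item["score"])
```

The threshold value itself would have to be tuned on your own documents, since the right trade-off between missed names and spurious detections depends on the corpus.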
