Hi Rupert,

Thanks for the quick answer and the pointer. In summary, if I understand correctly, it is the enhancer's normal behaviour to return such entities (e.g., anybody called Sean will be recognised as Sean Connery), and the only thing for me to do is to apply some post-processing/filtering.

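A minimal sketch of that post-processing, applying the two rules suggested in the quoted message below (keep only the highest-confidence suggestion, and drop Person mentions whose selected text is a lone given or family name). It assumes the enhancement results have already been parsed into plain Python objects: the `Mention` and `Suggestion` classes and the `best_entities` function are hypothetical names for illustration, not part of Stanbol's API, and "only a given or family name" is approximated as "a single token". The expansion of `dbp-ont:Person` to a full URI is also an assumption.

```python
from dataclasses import dataclass, field

# Assumed expansion of the dbp-ont: prefix used in the quoted message.
PERSON_TYPE = "http://dbpedia.org/ontology/Person"
PLACE_TYPE = "http://dbpedia.org/ontology/Place"


@dataclass
class Suggestion:
    """Hypothetical stand-in for one EntityAnnotation."""
    entity_uri: str
    confidence: float


@dataclass
class Mention:
    """Hypothetical stand-in for one TextAnnotation and its suggestions."""
    selected_text: str  # value of fise:selected-text
    dc_type: str        # value of dc:type
    suggestions: list = field(default_factory=list)


def best_entities(mentions):
    """Return (selected_text, entity_uri) pairs after applying both rules."""
    results = []
    for mention in mentions:
        if not mention.suggestions:
            continue
        # Rule 2: skip Person mentions made of a single token, since these
        # are the "sean" -> "Sean Connery" cases.
        is_person = mention.dc_type == PERSON_TYPE
        if is_person and len(mention.selected_text.split()) < 2:
            continue
        # Rule 1: keep only the suggestion with the highest confidence.
        top = max(mention.suggestions, key=lambda s: s.confidence)
        results.append((mention.selected_text, top.entity_uri))
    return results


sample = [
    # Confidence values taken from the quoted message.
    Mention("Germany", PLACE_TYPE, [
        Suggestion("http://dbpedia.org/resource/Germany", 1704736.125),
        Suggestion("http://dbpedia.org/resource/Nazi_Germany", 121766.984),
        Suggestion("http://dbpedia.org/resource/West_Germany", 38052.215),
    ]),
    # The confidence here is a made-up placeholder.
    Mention("Sean", PERSON_TYPE, [
        Suggestion("http://dbpedia.org/resource/Sean_Connery", 1.0),
    ]),
]

print(best_entities(sample))
```

With this sample, only the "Germany" mention survives, linked to `http://dbpedia.org/resource/Germany`. The single-token test is deliberately crude: it would also discard people who are genuinely known by one name, so it trades recall for precision.
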
Would there be some documentation explaining more comprehensively what kind of filters should be applied for different types of entities? I noticed, for example, that the enhancer is biased towards American presidents and American universities. Actually, it is generally quite biased towards American things.

Thanks!
Mathieu.

On 14 Mar 2012, at 12:00, Rupert Westenthaler wrote:

> Hi
> On 14.03.2012, at 12:25, Mathieu D'Aquin wrote:
>
>> Hi All,
>>
>> I'm trying to use the enhancer service, currently with the default settings,
>> but it seems to be behaving rather funnily.
>> (note that I only care about EntityAnnotation's with references to dbpedia
>> entities).
>>
>> For example, I have tried with the text of the page
>> http://sssw.org/2012/invited-speakers-tutors/
>>
>> And it gives very weird (even random looking) results, such as "Sean
>> Connery" or "Nazi Germany".
>>
> If you find "Germany" as a location Stanbol will return three suggested
> entities. In this case this will be
>
> 1. http://dbpedia.org/resource/Germany (confidence: 1704736.125)
> 2. http://dbpedia.org/resource/Nazi_Germany (confidence: 121766.984)
> 3. http://dbpedia.org/resource/West_Germany (confidence: 38052.215)
>
> (confidence values for the NamedEntityTaggingEngine are the Solr scores for
> the used query)
>
> I guess this is the reason why you are getting Nazi_Germany as a suggestion
> for a lot of pages.
>
> For Persons the problem is with cases where OpenNLP NER (Named Entity
> Recognition) marks a Person in the text, but only provides the given or
> family (e.g. "sean"). In this case the Entity linking will provide you with
> the most prominent person in DBpedia with that name - in your case "Sean
> Connery".
>
> This problem is also described by
> [STANBOL-320](https://issues.apache.org/jira/browse/STANBOL-320).
>
>> This weird behaviour is not limited to this page. I have processed several
>> thousand pages and clearly the results have not been what we would have
>> expected (very often, for example, it gives us the entity "Jesus" for no
>> obvious reason).
>>
>
> Jesus is also a "Person" in DBpedia. So I assume that this is similar to
> "sean" -> "Sean Connery"
>
>> Am I doing something wrong?
>> Do the default enhancer services need some kind of configuration?
>>
>
> related to this I would suggest to
>
> * only consider the suggestion with the highest confidence
> * ignore TextAnnotations with "dc:type=dbp-ont:Person" if the
>   "fise:selected-text" property only has a given or family name
>
>
> best
> Rupert
>
>> I have looked at the documentation but couldn't find anything that seemed to
>> be helpful with this respect.
>>
>> Thanks!
>> Mathieu.
>>
>> --
>> The Open University is incorporated by Royal Charter (RC 000391), an exempt
>> charity in England & Wales and a charity registered in Scotland (SC 038302).
>
