Hi 
On 14.03.2012, at 12:25, Mathieu D'Aquin wrote:

> Hi All, 
> 
> I'm trying to use the enhancer service, currently with the default settings, 
> but it seems to be behaving rather funnily.
> (note that I only care about EntityAnnotation's with references to dbpedia 
> entities).
> 
> For example, I have tried with the text of the page 
> http://sssw.org/2012/invited-speakers-tutors/
> 
> And it gives very weird (even random looking) results, such as "Sean Connery" 
> or "Nazi Germany".
> 
If Stanbol finds "Germany" as a location in the text, it will return three 
suggested entities. In this case these are:

1. http://dbpedia.org/resource/Germany (confidence: 1704736.125)
2. http://dbpedia.org/resource/Nazi_Germany (confidence: 121766.984)
3. http://dbpedia.org/resource/West_Germany (confidence: 38052.215)

(confidence values for the NamedEntityTaggingEngine are the Solr scores for the 
used query)

I guess this is the reason why you are getting Nazi_Germany as a suggestion 
for a lot of pages.
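
To illustrate with the three suggestions listed above, here is a minimal 
Python sketch that keeps only the one with the highest confidence. The list 
of (uri, confidence) pairs and the name `best_suggestion` are hypothetical; 
this is not a Stanbol API, and you would first have to extract these values 
from the enhancement results yourself.

```python
# Hypothetical in-memory form of the suggestions for "Germany":
# (entity URI, confidence) pairs copied from the list above.
suggestions = [
    ("http://dbpedia.org/resource/Germany", 1704736.125),
    ("http://dbpedia.org/resource/Nazi_Germany", 121766.984),
    ("http://dbpedia.org/resource/West_Germany", 38052.215),
]


def best_suggestion(pairs):
    """Return the (uri, confidence) pair with the highest confidence.

    Returns None when there are no suggestions at all.
    """
    if not pairs:
        return None
    return max(pairs, key=lambda pair: pair[1])


best = best_suggestion(suggestions)
print(best[0])
```

Since the values are Solr scores, they are probably only comparable among the 
suggestions for the same mention, not across different mentions.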

For persons, the problem arises in cases where OpenNLP NER (Named Entity 
Recognition) marks a person in the text but only provides the given or family 
name (e.g. "sean"). In this case the entity linking will provide you with the 
most prominent person in DBpedia with that name, in your case "Sean Connery". 

This problem is also described by 
[STANBOL-320](https://issues.apache.org/jira/browse/STANBOL-320).

> This weird behaviour is not limited to this page. I have processed several 
> thousand pages and clearly the results have not been what we would have 
> expected (very often, for example, it gives us the entity "Jesus" for no 
> obvious reason).
> 

Jesus is also a "Person" in DBpedia, so I assume that this is similar to 
"sean" -> "Sean Connery".

> Am I doing something wrong?
> Do the default enhancer services need some kind of configuration? 
> 

Related to this, I would suggest that you:

* only consider the suggestion with the highest confidence
* ignore TextAnnotations with "dc:type=dbp-ont:Person" if the 
"fise:selected-text" property only has a given or family name
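
The second suggestion could be approximated as in the Python sketch below. It 
assumes each TextAnnotation has already been read into a plain dict with the 
hypothetical keys "type" and "selected_text" (this is not the Stanbol data 
model), and it treats any single-token selected text as "only a given or 
family name", which is a rough heuristic.

```python
PERSON_TYPE = "dbp-ont:Person"


def is_partial_person(annotation):
    """True for a Person annotation whose selected text is a single token.

    Assumption: a one-word selection such as "sean" is only a given or
    family name; two or more words are treated as a full name.
    """
    tokens = annotation["selected_text"].split()
    return annotation["type"] == PERSON_TYPE and len(tokens) < 2


def filter_annotations(annotations):
    """Drop Person annotations that look like partial names."""
    return [a for a in annotations if not is_partial_person(a)]


# Hypothetical example input
annotations = [
    {"type": "dbp-ont:Person", "selected_text": "sean"},
    {"type": "dbp-ont:Person", "selected_text": "Sean Connery"},
    {"type": "dbp-ont:Place", "selected_text": "Germany"},
]
kept = filter_annotations(annotations)
```

Single-word mentions of places or organisations are left untouched, since the 
problem described above only concerns persons.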


best
Rupert

> I have looked at the documentation but couldn't find anything that seemed to 
> be helpful with this respect. 
> 
> Thanks!
> Mathieu.
> 
> -- 
> The Open University is incorporated by Royal Charter (RC 000391), an exempt 
> charity in England & Wales and a charity registered in Scotland (SC 038302).
