If you are using DBpedia as the source of candidate entities for enhancement, 
I wonder whether that has more to do with a bias in the DBpedia dataset than 
with any bias in Stanbol.

---
A. Soroka
Software & Systems Engineering :: Online Library Environment
the University of Virginia Library

On Mar 14, 2012, at 1:20 PM, Mathieu D'Aquin wrote:

> Hi Rupert, 
> 
> Thanks for the quick answer and the pointer. 
> In summary, if I understand correctly, it is the enhancer's normal behaviour to 
> return such entities (e.g., that everybody called Sean will be recognised as 
> Sean Connery) and the only thing for me to do is to apply some post 
> processing/filtering. 
> 
> Would there be some documentation explaining more comprehensively what kind 
> of filters should be applied for different types of entities? I noticed, for 
> example, that the enhancer is biased towards American presidents and American 
> universities. In fact, it is generally quite biased towards American 
> things. 
> 
> Thanks!
> Mathieu.
> 
> On 14 Mar 2012, at 12:00, Rupert Westenthaler wrote:
> 
>> Hi 
>> On 14.03.2012, at 12:25, Mathieu D'Aquin wrote:
>> 
>>> Hi All, 
>>> 
>>> I'm trying to use the enhancer service, currently with the default 
>>> settings, but it seems to be behaving rather funnily.
>>> (note that I only care about EntityAnnotation's with references to dbpedia 
>>> entities).
>>> 
>>> For example, I have tried with the text of the page 
>>> http://sssw.org/2012/invited-speakers-tutors/
>>> 
>>> And it gives very weird (even random looking) results, such as "Sean 
>>> Connery" or "Nazi Germany".
>>> 
>> If "Germany" is found as a location, Stanbol will return three suggested 
>> entities. In this case these will be
>> 
>> 1. http://dbpedia.org/resource/Germany (confidence: 1704736.125)
>> 2. http://dbpedia.org/resource/Nazi_Germany (confidence: 121766.984)
>> 3. http://dbpedia.org/resource/West_Germany (confidence: 38052.215)
>> 
>> (confidence values for the NamedEntityTaggingEngine are the Solr scores for 
>> the used query)
>> 
>> I guess this is the reason why you are getting Nazi_Germany as a suggestion 
>> for a lot of pages.
>> 
>> For Persons the problem is with cases where OpenNLP NER (Named Entity 
>> Recognition) marks a Person in the text, but only provides the given or 
>> family name (e.g. "sean"). In this case the entity linking will provide you 
>> with the most prominent person in DBpedia with that name - in your case 
>> "Sean Connery". 
>> 
>> This problem is also described by 
>> [STANBOL-320](https://issues.apache.org/jira/browse/STANBOL-320).
>> 
>>> This weird behaviour is not limited to this page. I have processed several 
>>> thousand pages and clearly the results have not been what we would have 
>>> expected (very often, for example, it gives us the entity "Jesus" for no 
>>> obvious reason).
>>> 
>> 
>> Jesus is also a "Person" in DBpedia. So I assume that this is similar to 
>> "sean" -> "Sean Connery"
>> 
>>> Am I doing something wrong?
>>> Do the default enhancer services need some kind of configuration? 
>>> 
>> 
>> Related to this, I would suggest that you
>> 
>> * only consider the suggestion with the highest confidence
>> * ignore TextAnnotations with "dc:type=dbp-ont:Person" if the 
>> "fise:selected-text" property only has a given or family name
>> 
>> 
>> best
>> Rupert
>> 
>>> I have looked at the documentation but couldn't find anything that seemed 
>>> to be helpful in this respect. 
>>> 
>>> Thanks!
>>> Mathieu.
>>> 
>>> -- 
>>> The Open University is incorporated by Royal Charter (RC 000391), an exempt 
>>> charity in England & Wales and a charity registered in Scotland (SC 038302).
>> 
> 
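The two post-processing rules Rupert suggests in the quoted thread (keep only 
the highest-confidence suggestion, and ignore Person annotations whose selected 
text is a single name) could be sketched roughly as follows. This is only an 
illustration: the dict layout, the helper names, the "dbp-ont:Place" type and 
the score for "Sean" are assumptions made up for the example, not the actual 
Stanbol response format or API. The Germany scores are the ones quoted above.

```python
from typing import Optional

# Type value taken from the thread; how it appears in real output is an assumption.
PERSON_TYPE = "dbp-ont:Person"


def is_single_name(selected_text: str) -> bool:
    """True if the mention has fewer than two tokens (e.g. just "Sean")."""
    return len(selected_text.split()) < 2


def best_suggestion(annotation: dict) -> Optional[dict]:
    """Return the highest-confidence suggestion, or None if it is filtered out."""
    suggestions = annotation.get("suggestions", [])
    if not suggestions:
        return None
    # Rule 2: skip persons for which only a given or family name was selected.
    if annotation.get("type") == PERSON_TYPE and is_single_name(
        annotation.get("selected_text", "")
    ):
        return None
    # Rule 1: keep only the suggestion with the highest confidence.
    return max(suggestions, key=lambda s: s["confidence"])


# Hypothetical, simplified input (NOT the real enhancer output structure).
annotations = [
    {
        "selected_text": "Germany",
        "type": "dbp-ont:Place",  # assumed type label for a location
        "suggestions": [
            {"uri": "http://dbpedia.org/resource/Germany", "confidence": 1704736.125},
            {"uri": "http://dbpedia.org/resource/Nazi_Germany", "confidence": 121766.984},
            {"uri": "http://dbpedia.org/resource/West_Germany", "confidence": 38052.215},
        ],
    },
    {
        "selected_text": "Sean",
        "type": PERSON_TYPE,
        "suggestions": [
            # The score here is a made-up placeholder.
            {"uri": "http://dbpedia.org/resource/Sean_Connery", "confidence": 1.0},
        ],
    },
]

kept = []
for annotation in annotations:
    suggestion = best_suggestion(annotation)
    if suggestion is not None:
        kept.append(suggestion["uri"])

print(kept)
```

With this sample input only the Germany resource survives; the "Sean" mention 
is dropped because it is a Person with a single-token selected text. A token 
count is a crude test for "only a given or family name", so a real filter may 
need something stricter.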
