Sure, Wikipedia covers American things far more than others. What is
unclear to me, however, is how the enhancer gets to choose "Sean Connery" as
the universal representative of all the Seans in the world and, by extension,
how I can recognise when it is wrong.

I understand that, directly or indirectly, the enhancer would favour common
entities. I'm just unsure how it determines that one entity is more common
than another.

Has there been any evaluation of the results of the enhancer that could show 
this bias? 

Thanks, 
Mathieu.

On 14 Mar 2012, at 20:00, Pablo Mendes wrote:

> I can confirm that from my experience with DBpedia Spotlight, the bias
> seems to come from Wikipedia itself.
> 
> As a simple exercise, not intended to convince more than to entertain:
> 230,447 results for organization [1]
> 75,414 results for organisation [2]
> 
> Cheers,
> Pablo
> [1]
> http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=~organization&fulltext=Search
> [2]
> http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=~organisation&fulltext=Search
> 
> 
> On Wed, Mar 14, 2012 at 7:39 PM, [email protected] <[email protected]> wrote:
> 
>> 
>> If you are using DBpedia as a source of enhancement possibilities, I
>> wonder if that has to do more with a bias in the DBpedia dataset than any
>> bias in Stanbol?
>> 
>> ---
>> A. Soroka
>> Software & Systems Engineering :: Online Library Environment
>> the University of Virginia Library
>> 
>> On Mar 14, 2012, at 1:20 PM, Mathieu D'Aquin wrote:
>> 
>>> Hi Rupert,
>>> 
>>> Thanks for the quick answer and the pointer.
>>> In summary, if I understand well, it is the enhancer's normal behaviour
>> to return such entities (e.g., that everybody called Sean will be
>> recognised as Sean Connery) and the only thing for me to do is to apply
>> some post processing/filtering.
>>> 
>>> Would there be some documentation explaining more comprehensively what
>> kind of filters should be applied for different types of entities? I
>> noticed, for example, that the enhancer is biased towards American presidents
>> and American universities. Actually, generally, it is quite biased towards
>> American things.
>>> 
>>> Thanks!
>>> Mathieu.
>>> 
>>> On 14 Mar 2012, at 12:00, Rupert Westenthaler wrote:
>>> 
>>>> Hi
>>>> On 14.03.2012, at 12:25, Mathieu D'Aquin wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> I'm trying to use the enhancer service, currently with the default
>> settings, but it seems to be behaving rather funnily.
>>>>> (note that I only care about EntityAnnotation's with references to
>> dbpedia entities).
>>>>> 
>>>>> For example, I have tried with the text of the page
>>>>> http://sssw.org/2012/invited-speakers-tutors/
>>>>> 
>>>>> And it gives very weird (even random looking) results, such as "Sean
>> Connery" or "Nazi Germany".
>>>>> 
>>>> If you find "Germany" as a location Stanbol will return three suggested
>> entities. In this case this will be
>>>> 
>>>> 1. http://dbpedia.org/resource/Germany (confidence: 1704736.125)
>>>> 2. http://dbpedia.org/resource/Nazi_Germany (confidence: 121766.984)
>>>> 3. http://dbpedia.org/resource/West_Germany (confidence: 38052.215)
>>>> 
>>>> (confidence values for the NamedEntityTaggingEngine are the Solr scores
>> for the used query)
>>>> 
>>>> I guess this is the reason why you are getting Nazi_Germany as a
>> suggestion for a lot of pages.
>>>> 
>>>> For Persons the problem is with cases where OpenNLP NER (Named Entity
>> Recognition) marks a Person in the text, but only provides the given or
>> family name (e.g. "sean"). In this case the Entity linking will provide you with
>> the most prominent person in DBpedia with that name - in your case "Sean
>> Connery".
>>>> 
>>>> This problem is also described by [STANBOL-320](
>> https://issues.apache.org/jira/browse/STANBOL-320).
>>>> 
>>>>> This weird behaviour is not limited to this page. I have processed
>> several thousand pages and clearly the results have not been what we would
>> have expected (very often, for example, it gives us the entity "Jesus" for
>> no obvious reason).
>>>>> 
>>>> 
>>>> Jesus is also a "Person" in DBpedia. So I assume that this is similar
>> to "sean" -> "Sean Connery"
>>>> 
>>>>> Am I doing something wrong?
>>>>> Do the default enhancer services need some kind of configuration?
>>>>> 
>>>> 
>>>> Related to this, I would suggest that you
>>>> 
>>>> * only consider the suggestion with the highest confidence
>>>> * ignore TextAnnotations with "dc:type=dbp-ont:Person" if the
>> "fise:selected-text" property only has a given or family name
>>>> 
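A minimal sketch of the two filters suggested above, assuming the enhancement
results have already been parsed out of the returned RDF into plain Python
objects. The `TextAnnotation` and `Suggestion` classes and their field names
are hypothetical stand-ins, not part of the Stanbol API, and the single-token
check is only a rough heuristic for "given or family name only".

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, simplified stand-ins for parsed enhancer output.
# The real service returns RDF; these names are illustrative only.
@dataclass
class Suggestion:
    entity_uri: str
    confidence: float  # raw score, as reported with each suggestion


@dataclass
class TextAnnotation:
    selected_text: str  # value of fise:selected-text
    dc_type: str        # e.g. "dbp-ont:Person" or "dbp-ont:Place"
    suggestions: List[Suggestion] = field(default_factory=list)


PERSON_TYPE = "dbp-ont:Person"


def best_suggestions(annotations: List[TextAnnotation]) -> List[Tuple[str, str]]:
    """Return (selected_text, entity_uri) pairs after applying both filters."""
    results = []
    for ann in annotations:
        # Heuristic (an assumption): a one-token Person mention is treated
        # as a bare given or family name and is skipped.
        if ann.dc_type == PERSON_TYPE and len(ann.selected_text.split()) < 2:
            continue
        if not ann.suggestions:
            continue
        # Keep only the suggestion with the highest confidence.
        top = max(ann.suggestions, key=lambda s: s.confidence)
        results.append((ann.selected_text, top.entity_uri))
    return results


if __name__ == "__main__":
    sample = [
        TextAnnotation("sean", PERSON_TYPE, [
            Suggestion("http://dbpedia.org/resource/Sean_Connery", 10.0),
        ]),
        TextAnnotation("Germany", "dbp-ont:Place", [
            Suggestion("http://dbpedia.org/resource/Germany", 1704736.125),
            Suggestion("http://dbpedia.org/resource/Nazi_Germany", 121766.984),
            Suggestion("http://dbpedia.org/resource/West_Germany", 38052.215),
        ]),
    ]
    print(best_suggestions(sample))
```

With the sample data, the bare "sean" mention is dropped and only the
top-scoring Germany entity is kept. This kind of filtering hides the
low-ranked suggestions; it does not remove the underlying popularity bias.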
>>>> 
>>>> best
>>>> Rupert
>>>> 
>>>>> I have looked at the documentation but couldn't find anything that
>> seemed to be helpful in this respect.
>>>>> 
>>>>> Thanks!
>>>>> Mathieu.
>>>>> 
>>>>> --
>>>>> The Open University is incorporated by Royal Charter (RC 000391), an
>> exempt charity in England & Wales and a charity registered in Scotland (SC
>> 038302).
>>>> 
>>> 
>> 
