I can confirm that from my experience with DBpedia Spotlight, the bias
seems to come from Wikipedia itself.

As a simple exercise, intended more to entertain than to convince:
230,447 results for organization [1]
75,414 results for organisation [2]

Cheers,
Pablo
[1] http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=~organization&fulltext=Search
[2] http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=~organisation&fulltext=Search


On Wed, Mar 14, 2012 at 7:39 PM, [email protected] <[email protected]> wrote:

>
> If you are using DBpedia as a source of enhancement possibilities, I
> wonder if that has to do more with a bias in the DBpedia dataset than any
> bias in Stanbol?
>
> - ---
> A. Soroka
> Software & Systems Engineering :: Online Library Environment
> the University of Virginia Library
>
> On Mar 14, 2012, at 1:20 PM, Mathieu D'Aquin wrote:
>
> > Hi Rupert,
> >
> > Thanks for the quick answer and the pointer.
> > In summary, if I understand correctly, it is the enhancer's normal behaviour
> to return such entities (e.g., that everybody called Sean will be
> recognised as Sean Connery) and the only thing for me to do is to apply
> some post processing/filtering.
> >
> > Would there be some documentation explaining more comprehensively what
> kind of filters should be applied for different types of entities? I
> noticed for example that the enhancer is biased towards American presidents
> and American universities. Actually, generally, it is quite biased towards
> American things.
> >
> > Thanks!
> > Mathieu.
> >
> > On 14 Mar 2012, at 12:00, Rupert Westenthaler wrote:
> >
> >> Hi
> >> On 14.03.2012, at 12:25, Mathieu D'Aquin wrote:
> >>
> >>> Hi All,
> >>>
> >>> I'm trying to use the enhancer service, currently with the default
> settings, but it seems to be behaving rather funnily.
> >>> (note that I only care about EntityAnnotation's with references to
> dbpedia entities).
> >>>
> >>> For example, I have tried with the text of the page
> >>> http://sssw.org/2012/invited-speakers-tutors/
> >>>
> >>> And it gives very weird (even random looking) results, such as "Sean
> Connery" or "Nazi Germany".
> >>>
> >> If you find "Germany" as a location Stanbol will return three suggested
> entities. In this case this will be
> >>
> >> 1. http://dbpedia.org/resource/Germany (confidence: 1704736.125)
> >> 2. http://dbpedia.org/resource/Nazi_Germany (confidence: 121766.984)
> >> 3. http://dbpedia.org/resource/West_Germany (confidence: 38052.215)
> >>
> >> (confidence values for the NamedEntityTaggingEngine are the Solr scores
> for the used query)
> >>
> >> I guess this is the reason why you are getting Nazi_Germany as a
> suggestion for a lot of pages.
> >>
> >> For Persons the problem is with cases where OpenNLP NER (Named Entity
> Recognition) marks a Person in the text, but only provides the given or
> family name (e.g. "sean"). In this case the entity linking will provide you with
> the most prominent person in DBpedia with that name - in your case "Sean
> Connery".
> >>
> >> This problem is also described by [STANBOL-320](
> https://issues.apache.org/jira/browse/STANBOL-320).
> >>
> >>> This weird behaviour is not limited to this page. I have processed
> several thousand pages and clearly the results have not been what we would
> have expected (very often, for example, it gives us the entity "Jesus" for
> no obvious reason).
> >>>
> >>
> >> Jesus is also a "Person" in DBpedia. So I assume that this is similar
> to "sean" -> "Sean Connery"
> >>
> >>> Am I doing something wrong?
> >>> Do the default enhancer services need some kind of configuration?
> >>>
> >>
> >> Related to this, I would suggest the following:
> >>
> >> * only consider the suggestion with the highest confidence
> >> * ignore TextAnnotations with "dc:type=dbp-ont:Person" if the
> "fise:selected-text" property only has a given or family name
> >>
> >>
> >> best
> >> Rupert
> >>
> >>> I have looked at the documentation but couldn't find anything that
> seemed to be helpful in this respect.
> >>>
> >>> Thanks!
> >>> Mathieu.
> >>>
> >>> --
> >>> The Open University is incorporated by Royal Charter (RC 000391), an
> exempt charity in England & Wales and a charity registered in Scotland (SC
> 038302).
> >>
> >
>
>
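
The two post-filtering suggestions Rupert makes in the quoted message (keep only the highest-confidence suggestion, and skip Person annotations whose selected text is only a given or family name) could be sketched roughly as below. This is only an illustration, not Stanbol's API: the dictionary layout, the field names `selected_text`, `type` and `suggestions`, the `dbp-ont:Place` label, and the function `best_suggestion` are all assumptions made for the example, and "only a given or family name" is approximated by "a single token". The Germany scores are the ones quoted in the thread; the score for "Sean" is a placeholder.

```python
from typing import Optional

PERSON_TYPE = "dbp-ont:Person"  # type label as written in the thread


def best_suggestion(annotation: dict) -> Optional[dict]:
    """Return the top-scoring suggestion, or None if the annotation is filtered out.

    `annotation` is a hypothetical plain-dict view of one text annotation and
    its suggested entities; it is not a Stanbol data structure.
    """
    suggestions = annotation.get("suggestions", [])
    if not suggestions:
        return None

    # Heuristic (assumption): a Person mention consisting of a single token
    # is treated as "only a given or family name" and is skipped.
    tokens = annotation.get("selected_text", "").split()
    if annotation.get("type") == PERSON_TYPE and len(tokens) < 2:
        return None

    # Keep only the suggestion with the highest confidence.
    return max(suggestions, key=lambda s: s["confidence"])


annotations = [
    {
        "selected_text": "Germany",
        "type": "dbp-ont:Place",  # hypothetical type label
        "suggestions": [
            {"uri": "http://dbpedia.org/resource/Germany", "confidence": 1704736.125},
            {"uri": "http://dbpedia.org/resource/Nazi_Germany", "confidence": 121766.984},
            {"uri": "http://dbpedia.org/resource/West_Germany", "confidence": 38052.215},
        ],
    },
    {
        "selected_text": "Sean",
        "type": PERSON_TYPE,
        "suggestions": [
            # Placeholder score; the thread does not give one.
            {"uri": "http://dbpedia.org/resource/Sean_Connery", "confidence": 1.0},
        ],
    },
]

for annotation in annotations:
    kept = best_suggestion(annotation)
    print(annotation["selected_text"], "->", kept["uri"] if kept else "skipped")
```

A single-token check is deliberately crude: it would also drop people who are genuinely known by one name, so any real filter would probably need something more careful.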
