Hi,
Although I cannot answer for the Stanbol enhancer specifically, I can
report my experience.

Please see inline.

On Wed, Mar 14, 2012 at 9:56 PM, Mathieu D'Aquin <[email protected]> wrote:

> Sure, Wikipedia is a lot more populated with American things than others.
> What is unclear to me however, is how the enhancer gets to choose "Sean
> Connery" as the universal representative of all the Seans in the world and
> by extension how I can recognise when it is wrong.
>

Presumably because more links in Wikipedia point to Sean_Connery than to
the page of any other Sean.


> I understand that, directly or indirectly, the enhancer would favour
> common entities. I'm just unsure how it is evaluated that an entity is more
> common than another.
>

The number of links pointing to an entity's page in Wikipedia is commonly
used as a prior probability estimate for that entity.
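
As a rough illustration of such a prior (the link counts below are invented,
and `link_prior` is a hypothetical helper, not part of Stanbol or DBpedia
Spotlight), one could normalise inbound-link counts over the candidates for
a surface form:

```python
# Hypothetical inbound-link counts for pages that the surface form "Sean"
# could refer to. The numbers are made up, not taken from Wikipedia.
inlinks = {
    "Sean_Connery": 9000,
    "Sean_Penn": 6000,
    "Sean_Bean": 5000,
}


def link_prior(counts):
    """Normalise raw link counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {entity: n / total for entity, n in counts.items()}


prior = link_prior(inlinks)
# With no other evidence, the candidate with the most links wins.
best = max(prior, key=prior.get)
```

With only a bare "Sean" to go on, a linker relying on this prior alone would
always pick the most linked-to candidate.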


> Has there been any evaluation of the results of the enhancer that could
> show this bias?
>

State-of-the-art entity linkers include such a prior as one of the
components in their disambiguation algorithms. The trick is to find the
right bias. There is some preliminary analysis by Fader et al. [1], and
the feature also appears at TAC-KBP-2011.
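
A toy sketch of what "finding the right bias" can look like, assuming a
log-linear mix of the prior and a context similarity score. The candidate
names, the scores, and the `prior_weight` parameter are all made up for
illustration; they are not taken from Stanbol, DBpedia Spotlight, or the
work cited above:

```python
import math


def combined_score(prior, context_sim, prior_weight=0.5):
    """Log-linear mix of a popularity prior and a context similarity.

    Both inputs are assumed to lie in (0, 1].
    """
    return (prior_weight * math.log(prior)
            + (1.0 - prior_weight) * math.log(context_sim))


# entity: (hypothetical prior, hypothetical context similarity)
candidates = {
    "Sean_Connery": (0.45, 0.05),  # very popular, poor fit to the text
    "Sean_Other": (0.01, 0.90),    # obscure, good fit to the text
}


def rank(candidates, prior_weight):
    """Return candidate names sorted from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda e: combined_score(*candidates[e], prior_weight),
        reverse=True,
    )


popularity_heavy = rank(candidates, prior_weight=0.9)
context_heavy = rank(candidates, prior_weight=0.1)
```

In this toy setup, a heavy weight on the prior ranks the popular entity
first, while a light weight lets the context decide.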

We have proposed to perform evaluations of different enhancement chains
within the EAP. Analyzing the impact of this bias would certainly be one of
the evaluation points.


>
> Thanks,
> Mathieu.
>
> On 14 Mar 2012, at 20:00, Pablo Mendes wrote:
>
> > I can confirm that from my experience with DBpedia Spotlight, the bias
> > seems to come from Wikipedia itself.
> >
> > As a simple exercise, intended more to entertain than to convince:
> > 230,447 results for organization [1]
> > 75,414 results for organisation [2]
> >
> > Cheers,
> > Pablo
> > [1] http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=~organization&fulltext=Search
> > [2] http://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=~organisation&fulltext=Search
> >
> >
> > On Wed, Mar 14, 2012 at 7:39 PM, [email protected] <[email protected]> wrote:
> >
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> If you are using DBpedia as a source of enhancement possibilities, I
> >> wonder if that has to do more with a bias in the DBpedia dataset than
> >> any bias in Stanbol?
> >>
> >> - ---
> >> A. Soroka
> >> Software & Systems Engineering :: Online Library Environment
> >> the University of Virginia Library
> >>
> >> On Mar 14, 2012, at 1:20 PM, Mathieu D'Aquin wrote:
> >>
> >>> Hi Rupert,
> >>>
> >>> Thanks for the quick answer and the pointer.
> >>> In summary, if I understand well, it is the enhancer's normal behaviour
> >>> to return such entities (e.g., that everybody called Sean will be
> >>> recognised as Sean Connery) and the only thing for me to do is to apply
> >>> some post-processing/filtering.
> >>>
> >>> Would there be some documentation explaining more comprehensively what
> >>> kind of filters should be applied for different types of entities? I
> >>> noticed for example that the enhancer is biased towards American
> >>> presidents and American universities. Actually, generally, it is quite
> >>> biased towards American things.
> >>>
> >>> Thanks!
> >>> Mathieu.
> >>>
> >>> On 14 Mar 2012, at 12:00, Rupert Westenthaler wrote:
> >>>
> >>>> Hi
> >>>> On 14.03.2012, at 12:25, Mathieu D'Aquin wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> I'm trying to use the enhancer service, currently with the default
> >>>>> settings, but it seems to be behaving rather funnily.
> >>>>> (note that I only care about EntityAnnotation's with references to
> >>>>> dbpedia entities).
> >>>>>
> >>>>> For example, I have tried with the text of the page
> >>>>> http://sssw.org/2012/invited-speakers-tutors/
> >>>>>
> >>>>> And it gives very weird (even random looking) results, such as "Sean
> >>>>> Connery" or "Nazi Germany".
> >>>>>
> >>>> If you find "Germany" as a location Stanbol will return three
> >>>> suggested entities. In this case this will be
> >>>>
> >>>> 1. http://dbpedia.org/resource/Germany (confidence: 1704736.125)
> >>>> 2. http://dbpedia.org/resource/Nazi_Germany (confidence: 121766.984)
> >>>> 3. http://dbpedia.org/resource/West_Germany (confidence: 38052.215)
> >>>>
> >>>> (confidence values for the NamedEntityTaggingEngine are the Solr
> >>>> scores for the used query)
> >>>>
> >>>> I guess this is the reason why you are getting Nazi_Germany as a
> >>>> suggestion for a lot of pages.
> >>>>
> >>>> For Persons the problem is with cases where OpenNLP NER (Named Entity
> >>>> Recognition) marks a Person in the text, but only provides the given or
> >>>> family name (e.g. "sean"). In this case the Entity linking will provide
> >>>> you with the most prominent person in DBpedia with that name - in your
> >>>> case "Sean Connery".
> >>>>
> >>>> This problem is also described by [STANBOL-320](https://issues.apache.org/jira/browse/STANBOL-320).
> >>>>
> >>>>> This weird behaviour is not limited to this page. I have processed
> >>>>> several thousand pages and clearly the results have not been what we
> >>>>> would have expected (very often, for example, it gives us the entity
> >>>>> "Jesus" for no obvious reason).
> >>>>>
> >>>>
> >>>> Jesus is also a "Person" in DBpedia. So I assume that this is similar
> >>>> to "sean" -> "Sean Connery"
> >>>>
> >>>>> Am I doing something wrong?
> >>>>> Do the default enhancer services need some kind of configuration?
> >>>>>
> >>>>
> >>>> Related to this, I would suggest that you
> >>>>
> >>>> * only consider the suggestion with the highest confidence
> >>>> * ignore TextAnnotations with "dc:type=dbp-ont:Person" if the
> >>>>   "fise:selected-text" property only has a given or family name
> >>>>
> >>>>
> >>>> best
> >>>> Rupert
> >>>>
> >>>>> I have looked at the documentation but couldn't find anything that
> >>>>> seemed to be helpful in this respect.
> >>>>>
> >>>>> Thanks!
> >>>>> Mathieu.
> >>>>>
> >>>>> --
> >>>>> The Open University is incorporated by Royal Charter (RC 000391), an
> >>>>> exempt charity in England & Wales and a charity registered in Scotland
> >>>>> (SC 038302).
> >>>>
> >>>
> >>
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
> >> Comment: GPGTools - http://gpgtools.org
> >>
> >> iQEcBAEBAgAGBQJPYOX0AAoJEATpPYSyaoIkckEIAMr+BIkDTgram4Ow7NeEOSxj
> >> K+vSWHStUfaOXnWSj8v6unwDls/yS6H+CZn20rezeLkJZ7nckOc+9TQIcwhbl0yV
> >> LxYsx7NIfiefPKwCGyDH1n8Y4080CspXgWKO5+38pTT5+EjHtU4ienLhDIRjETY7
> >> +cTh2mQN4fe8VoYgpgl1YQgpafCMmZHwP36ftA3likEO2ZGdOJmPzTpEGR/2A2FQ
> >> kYVZshoX6Y6sjSnD+gCfxwPPliE9Td8tJGxKECmAKn8/JRRaDSsQ9AckN3E3hGEg
> >> 1guc4HHkIRmJcu7wTbJR6gHmXm5zLWtdMHqLxf6z7KYRb3TkwA22erO+WD8PWs0=
> >> =aYov
> >> -----END PGP SIGNATURE-----
> >>
>
>
