Dear members of the Stanbol community,
I would like to discuss the next few iterations of the Disambiguation
Engine. Several versions of the engine have been prepared, and I briefly
describe them below. I hope to become a permanent committer for Stanbol if
my contribution is considered after this GSoC period. I will be committing
the code versions soon and attaching a patch to JIRA.
1. How the disambiguation problem was approached.
For certain text annotations there may be many entity annotations
mapped, and they need to be ranked in order of their likelihood.
Consider the example sentence:
"Paris is a small city in the United States."
a. Take "Paris" in this sentence without disambiguation (using DBpedia as
the vocabulary). Three entity annotations are mapped: 1. Paris, France,
2. Paris, Texas, 3. Paris, *Something*. The entity mapped with the highest
fise:confidence is Paris, France.
b. How would a human disambiguate this? On reading the line, a reader
thinks of the context the text is referring to. Since the text mentions
both Paris and the United States, the reader concludes that the Paris
mentioned here is more like Paris, Texas (which is in the United States)
and therefore must refer to it.
c. The approach followed in the implementation takes inspiration from this
example and roughly follows the pseudocode below.
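for (TextAnnotation k : textAnnotations) {
    // candidate entities already suggested for this text annotation
    List entityAnnotations = getEntityAnnotationsRelated(k);
    // context used for disambiguation (see section 2)
    Context context = getContextInformation(k);
    // MLT query against the vocabularies
    List results = queryMLTVocabularies(k, context);
    // adjust the confidence of each candidate based on the results
    updateConfidences(results, entityAnnotations);
}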
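
To make the loop above more concrete, here is a minimal, self-contained
sketch in plain Java. It does not use the Stanbol API. The Candidate class,
the word-overlap similarity (a toy stand-in for the MLT query) and the
averaging rule for updating confidences are all assumptions made for
illustration, not the engine's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class DisambiguationSketch {

    /** Hypothetical stand-in for an entity annotation: label, description, confidence. */
    static final class Candidate {
        final String label;
        final String description;
        double confidence;

        Candidate(String label, String description, double confidence) {
            this.label = label;
            this.description = description;
            this.confidence = confidence;
        }
    }

    /** Lower-cases the text and splits it into a set of word tokens. */
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    /** Fraction of context words that also occur in the description (toy similarity). */
    static double similarity(Set<String> context, String description) {
        if (context.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(context);
        shared.retainAll(tokens(description));
        return (double) shared.size() / context.size();
    }

    /** Updates each candidate's confidence in place and returns them sorted best-first. */
    static List<Candidate> disambiguate(List<Candidate> candidates, String contextText) {
        Set<String> context = tokens(contextText);
        List<Candidate> ranked = new ArrayList<>(candidates);
        for (Candidate c : ranked) {
            // Assumed update rule: average the original confidence with the similarity.
            c.confidence = (c.confidence + similarity(context, c.description)) / 2.0;
        }
        ranked.sort(Comparator.comparingDouble((Candidate c) -> c.confidence).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
            new Candidate("Paris, France", "capital city of France in Europe", 0.6),
            new Candidate("Paris, Texas", "small city in Texas in the United States", 0.5));
        for (Candidate c : disambiguate(candidates, "small city United States")) {
            System.out.println(c.label + " " + c.confidence);
        }
    }
}
```

With the context words taken from the example sentence, Paris, Texas
overtakes Paris, France even though it started with the lower confidence.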
d. My approach to disambiguation involved many variations. For simplicity,
I will only discuss the differences in how the "Context" is obtained.
2. Context procurement:
a. All Entity Context: The context is built from all the text annotations
of the text. It shows good results for shorter texts, but introduces many
redundant annotations in longer ones, making the context less useful.
b. All Link Context: The context is built from the site or reference links
associated with the text annotations. These links may themselves need to be
disambiguated, so this variant does not perform very well.
c. Selection Context: The context contains the text of the current sentence
plus one sentence before and one after it. Another version worked with the
text annotations found in this region of text instead of the raw text.
d. Vicinity Entity Context: The context is built from the annotations in
the neighborhood of the current text annotation, selected by measuring
their distance from it.
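
As an illustration of variants (c) and (d), the sketch below shows one
possible way to collect a selection context and a vicinity context in plain
Java. The Mention class, the use of java.text.BreakIterator for sentence
splitting and the character-distance threshold are assumptions made for
this example. They are not taken from the engine's code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ContextSketch {

    /** Hypothetical stand-in for a text annotation: surface text and start offset. */
    static final class Mention {
        final String text;
        final int start;

        Mention(String text, int start) {
            this.text = text;
            this.start = start;
        }
    }

    /** Selection context: the sentence containing the offset plus one sentence on each side. */
    static String selectionContext(String document, int offset) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(document);
        // Collect [start, end) character ranges for every sentence.
        List<int[]> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(new int[] {start, end});
        }
        for (int i = 0; i < sentences.size(); i++) {
            int[] s = sentences.get(i);
            if (offset >= s[0] && offset < s[1]) {
                int from = sentences.get(Math.max(0, i - 1))[0];
                int to = sentences.get(Math.min(sentences.size() - 1, i + 1))[1];
                return document.substring(from, to).trim();
            }
        }
        return ""; // offset lies outside the document
    }

    /** Vicinity context: other mentions starting within maxDistance characters of the target. */
    static List<Mention> vicinityContext(List<Mention> all, Mention target, int maxDistance) {
        List<Mention> near = new ArrayList<>();
        for (Mention m : all) {
            if (m != target && Math.abs(m.start - target.start) <= maxDistance) {
                near.add(m);
            }
        }
        return near;
    }
}
```

Measuring distance in characters is only one option. Counting tokens or
sentences between annotations would fit the same method shape.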
3. Future
a. With a running proof of concept (POC) of this engine in place, it can
serve as the basis for a more advanced version, such as the Spotlight
approach or the Markov Logic Networks discussed earlier.