The numbers depend on your training data. The outcomes in the name
finder are start, cont, and other.
Depending on how you train, they may also contain the entity type, e.g.
start-person, cont-person, other.
These are the outcomes of the classifier that the name finder uses to
predict which tokens belong to an entity and which do not.
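
To make the outcome scheme concrete, here is a minimal sketch of how tokens could be mapped to outcomes. It is illustrative only and is not the name finder's own code: the function `encode_outcomes`, the `(start, end, type)` span layout, and the exact label strings (taken from the naming used in this message) are assumptions.

```python
def encode_outcomes(tokens, spans, with_type=False):
    """Assign one outcome per token.

    spans is a list of (start, end, entity_type) token index ranges,
    with end exclusive. This layout is an assumption for the sketch.
    """
    # Every token is "other" unless an entity span covers it.
    outcomes = ["other"] * len(tokens)
    for start, end, entity_type in spans:
        suffix = "-" + entity_type if with_type else ""
        # The first token of an entity gets "start", the rest get "cont".
        outcomes[start] = "start" + suffix
        for i in range(start + 1, end):
            outcomes[i] = "cont" + suffix
    return outcomes


tokens = ["Jim", "Smith", "lives", "in", "London", "."]
spans = [(0, 2, "person")]  # "Jim Smith" is one person entity

print(encode_outcomes(tokens, spans))
# ['start', 'cont', 'other', 'other', 'other', 'other']
print(encode_outcomes(tokens, spans, with_type=True))
# ['start-person', 'cont-person', 'other', 'other', 'other', 'other']
```

Note that even with a single entity type the classifier sees three outcomes, not two, because the first token of a name is labelled differently from the tokens that continue it.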
Jörn
On 12/12/2012 06:39 PM, Jim - FooBar(); wrote:
Hmmm... this definitely makes more sense... Thank you Jörn. So, to come
back to the NameFinder, what would be the outcomes? In a previous
response you said:
E.g. for the name it could be:
start 5%
cont 10%
other 85%
Now, from what I understand, your 2 responses contradict each
other... Judging by your latest answer, the outcomes for the NameFinder
would be the entities we're trying to find plus something like "no"
(for tokens that are NOT any of the entities we're looking for). So
if we're looking for a single type of entity (e.g. person), then the
outcomes would be "Person" & "None" - yes? I apologise for asking
again and again, but I need to be clear about what this feature does
(in the NER context, not tokenizing) if I'm going to include it in my
publication...
Thanks again, I really appreciate your time and your responses...
Jim
On 12/12/12 16:38, Jörn Kottmann wrote:
The default feature (the prior feature) produced by the prior feature
generator is the same for every context and can be used to measure the
distribution of the outcomes in the training data.
Some outcomes are usually much more frequent than others, depending on
the task, e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
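
As a rough illustration of what "distribution of the outcomes" means, the sketch below counts outcome labels in a toy training set and turns the counts into relative frequencies. A feature that fires identically in every context cannot tell contexts apart, so the most it can capture is this overall distribution. The function name `outcome_distribution` and the toy data are made up for the example and are not taken from any real corpus or library.

```python
from collections import Counter


def outcome_distribution(outcomes):
    """Return the relative frequency of each outcome label."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {outcome: count / total for outcome, count in counts.items()}


# Toy training labels (invented): 1 start, 2 cont, 17 other.
training_outcomes = ["start"] * 1 + ["cont"] * 2 + ["other"] * 17

for outcome, share in outcome_distribution(training_outcomes).items():
    print(f"{outcome} {share:.0%}")
```

With real training data the shares will differ, but "other" will typically dominate, just as NO_SPLIT dominates in the tokenizer.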
HTH,
Jörn