Hmmm... this definitely makes more sense. Thank you, Jörn. So, to come
back to the NameFinder, what would the outcomes be? In a previous
response you said:
E.g. for the name it could be:
start 5%
cont 10%
other 85%
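To make the quoted percentages concrete, here is a minimal Java sketch that counts per-token outcome labels and turns them into relative frequencies. It assumes the start/cont/other labels from the quoted example. The class name, the method, and the labeled sentence are hypothetical illustrations, not OpenNLP code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutcomeDistribution {

    // Relative frequency of each outcome label in a flat list of per-token labels.
    // Hypothetical helper; OpenNLP does not expose a method with this name.
    static Map<String, Double> distribution(List<String> outcomes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String o : outcomes) {
            counts.merge(o, 1, Integer::sum);
        }
        Map<String, Double> dist = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            dist.put(e.getKey(), e.getValue() / (double) outcomes.size());
        }
        return dist;
    }

    public static void main(String[] args) {
        // Hypothetical labels for the 12 tokens of
        // "Pierre Vinken , 61 years old , will join the board ."
        // (one person name spanning two tokens, everything else "other").
        List<String> labels = List.of(
            "start", "cont", "other", "other", "other", "other",
            "other", "other", "other", "other", "other", "other");
        System.out.println(distribution(labels));
    }
}
```

On a real training set the counts come from every token in every sentence, so "other" typically dominates in the way the quoted 85% suggests.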
Now, from what I understand, your two responses contradict each
other... Judging by your latest answer, the outcomes for the NameFinder
would be the entities we're trying to find plus something like "None"
(for tokens that are NOT any of the entities we're looking for). So if
we're looking for a single type of entity (e.g. person), then the
outcomes would be "Person" and "None", yes? I apologise for asking again
and again, but I need to be clear about what this feature does (in the
NER context, not tokenizing) if I'm going to include it in my
publication...
Thanks again, I really appreciate your time and your responses...
Jim
On 12/12/12 16:38, Jörn Kottmann wrote:
The default feature (the prior feature) produced by the prior feature
generator is the same for every context and can be used to measure the
distribution of the outcomes in the training data.
Some outcomes are usually much more frequent than others, depending on
the task, e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
HTH,
Jörn
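As a minimal sketch of why a feature that fires identically in every context ends up measuring the outcome distribution: in an idealized, unsmoothed maximum entropy model whose only feature is that constant prior feature, there is one weight w_o per outcome and p(o) = exp(w_o) / sum over o' of exp(w_o'). Maximizing the training log-likelihood, sum over o of n_o * (w_o - log Z), gives p(o) = n_o / N, i.e. w_o = log(n_o / N) up to a shared constant. The class, method names, and the NO_SPLIT/SPLIT counts below are hypothetical, and this closed-form estimate is an assumption for illustration, not how OpenNLP's trainer actually computes its weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PriorOnlyModel {

    // Maximum-likelihood weights for a model whose only feature is the constant
    // prior feature: w_o = log(count(o) / N). Any shared additive constant would
    // give the same predictions, so this is one valid solution.
    static Map<String, Double> mleWeights(Map<String, Integer> counts) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> weights = new LinkedHashMap<>();
        counts.forEach((outcome, c) -> weights.put(outcome, Math.log(c / (double) total)));
        return weights;
    }

    // Softmax over the per-outcome weights. Because the prior feature is the same
    // for every context, this is the prediction for ANY context.
    static Map<String, Double> predict(Map<String, Double> weights) {
        double z = 0.0;
        for (double w : weights.values()) {
            z += Math.exp(w);
        }
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            probs.put(e.getKey(), Math.exp(e.getValue()) / z);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical tokenizer counts: NO_SPLIT far more frequent than SPLIT.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("NO_SPLIT", 900);
        counts.put("SPLIT", 100);
        System.out.println(predict(mleWeights(counts)));
    }
}
```

In a full model the prior feature's weights play this role alongside all the context-specific features, which is why they reflect how common each outcome is overall.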