The numbers depend on your training data. The outcomes in the name finder are start, cont, and other. Depending on how you train, they might also contain the entity type, e.g. start-person, cont-person, other.

Here we are talking about the outcomes of the classifier that the name finder uses to predict
whether or not a token belongs to an entity.
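
To make the outcome scheme concrete, here is a minimal sketch in Python (not OpenNLP code) of how tokens could be labeled with start, cont, and other. The function name encode_outcomes, the (start, end, type) span format, and the example sentence are assumptions made up for illustration; the typed labels follow the start-person / cont-person spelling used above.

```python
def encode_outcomes(tokens, spans, with_type=False):
    """Label every token as start, cont, or other.

    spans is a list of (start, end, entity_type) tuples, with end exclusive.
    This span format is a hypothetical one chosen for this sketch.
    """
    outcomes = ["other"] * len(tokens)
    for start, end, entity_type in spans:
        suffix = f"-{entity_type}" if with_type else ""
        outcomes[start] = "start" + suffix      # first token of the entity
        for i in range(start + 1, end):
            outcomes[i] = "cont" + suffix       # remaining tokens of the entity
    return outcomes


tokens = ["Yesterday", "John", "Fitzgerald", "Kennedy", "visited", "Berlin", "."]
spans = [(1, 4, "person")]  # tokens 1..3 form one person name

outcomes = encode_outcomes(tokens, spans)
typed_outcomes = encode_outcomes(tokens, spans, with_type=True)
```

So even with a single entity type the classifier has three outcomes, not two: the first token of a name, a continuation token, and everything else.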

Jörn

On 12/12/2012 06:39 PM, Jim - FooBar(); wrote:
Hmmm... this definitely makes more sense... Thank you Jorn. So, to come back to the NameFinder, what would be the outcomes? In a previous response you said:

E.g. for the name it could be:
start 5%
cont 10%
other 85%

Now, from what I understand, your two responses contradict each other... Judging by your latest answer, the outcomes for the NameFinder would be the entities we're trying to find plus something like "no" (for tokens that are NOT any of the entities we're looking for). So if we're looking for a single type of entity (e.g. person), then the outcomes would be "Person" and "None", yes? I apologise for asking again and again, but I need to be clear about what this feature does (in the NER context, not tokenizing) if I'm going to include it in my publication...

Thanks again, I really appreciate your time and your responses...

Jim


On 12/12/12 16:38, Jörn Kottmann wrote:
The default feature (the prior feature) produced by the prior feature generator is the same for every context, so it can be used to measure the distribution of the outcomes in the training data. Some outcomes are usually much more frequent than others, depending on the task; e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
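
As a rough illustration of that idea, the following Python sketch (not OpenNLP code) counts how often a constant feature co-occurs with each outcome. The feature string "prior", the other feature strings, the event format, and the four sample events are all assumptions made up for this example; a real model learns weights rather than storing raw frequencies.

```python
from collections import Counter


def outcome_distribution(events, prior_feature="prior"):
    """Relative frequency of each outcome among events carrying the prior feature.

    events is a list of (features, outcome) pairs. Because the prior feature
    is present in every context, this equals the overall outcome distribution.
    """
    counts = Counter(
        outcome for features, outcome in events if prior_feature in features
    )
    total = sum(counts.values())
    return {outcome: count / total for outcome, count in counts.items()}


# Hypothetical tokenizer events: every context carries the same prior feature.
events = [
    (["prior", "char=a"], "NO_SPLIT"),
    (["prior", "char=b"], "NO_SPLIT"),
    (["prior", "char=,"], "SPLIT"),
    (["prior", "char=c"], "NO_SPLIT"),
]

distribution = outcome_distribution(events)
```

With these made-up events NO_SPLIT gets 0.75 and SPLIT gets 0.25, which is just the outcome distribution of the sample.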

HTH,
Jörn

