Daniel,

The cutoff value filters out one-off or special cases that may hinder
training.  With the cutoff set to 5, anything that does not occur at
least 5 times in the same context in the training data is ignored.
This happens while the trainer filters the data and determines the
number of outcomes the model is trained to predict.
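
To make the cutoff idea concrete, here is a minimal Java sketch of
frequency-based filtering.  It is not OpenNLP's actual code: the class
CutoffSketch, the method keepFrequent and the feature strings are all
made up for illustration, and it assumes each training event is simply
a list of context features.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the OpenNLP API.
public class CutoffSketch {

    /** Counts every feature and keeps only those seen at least `cutoff` times. */
    static Map<String, Integer> keepFrequent(List<List<String>> events, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> event : events) {
            for (String feature : event) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        // Drop everything below the cutoff; the trainer never sees it.
        counts.values().removeIf(count -> count < cutoff);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up features: "w=John" occurs twice, the others once.
        List<List<String>> events = Arrays.asList(
                Arrays.asList("w=John", "prev=Mr."),
                Arrays.asList("w=John"),
                Arrays.asList("w=rare"));
        System.out.println(keepFrequent(events, 2));
    }
}
```

With a cutoff of 2 only "w=John" survives in this toy data.  A higher
cutoff gives a smaller model with less noise, but it can also throw
away rare evidence that was actually useful.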

The iterations value is the maximum number of passes through the
training set the trainer will make before stopping.  With the Maxent
models, keeping the number of iterations small makes training faster.
The right value really depends on the model being generated and its
type.  Most of the testing is done with 100 iterations just to get the
training done quickly, because the training data sets used are
sometimes large.  The key things to watch are the two numbers printed
for each run (iteration); they indicate how close the model is to
being fully trained on the data set.  Be careful: the trick is to
train the models without getting them to memorize the training set.
The training set is only a snapshot in time (most of the current
training data is limited to news articles), so a model that has
memorized it will not carry over well to other text.  Too small a
value is also bad, though.

There is also another stopping point: when the model is completely
trained and knows the training set.  This happens when the numbers
reach their optimal values; my guess is that this is when the model
predicts the training set perfectly.
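
The two stopping points described above (the iteration limit and the
model having nothing left to learn) can be sketched as a plain loop.
This is a generic illustration, not OpenNLP's trainer: the class
StoppingSketch, the tolerance parameter and the toy log-likelihood
curve are assumptions made for the example, and it treats "nothing
left to learn" as the per-pass improvement falling below the tolerance.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only; not part of the OpenNLP API.
public class StoppingSketch {

    /**
     * Runs at most maxIterations passes and returns how many were run.
     * logLikelihoodAt stands in for one real training pass: given the
     * pass number, it returns the log-likelihood reached after it.
     */
    static int train(int maxIterations, double tolerance,
                     IntToDoubleFunction logLikelihoodAt) {
        double previous = Double.NEGATIVE_INFINITY;
        for (int i = 1; i <= maxIterations; i++) {
            double current = logLikelihoodAt.applyAsDouble(i);
            System.out.printf("%d: loglikelihood=%.6f%n", i, current);
            // Second stopping point: the numbers have stopped improving.
            if (i > 1 && current - previous < tolerance) {
                return i;
            }
            previous = current;
        }
        // First stopping point: the iteration limit was reached.
        return maxIterations;
    }

    public static void main(String[] args) {
        // Toy curve that rises toward 0 more slowly on every pass.
        int passes = train(100, 1e-3, i -> -1.0 / i);
        System.out.println("stopped after " + passes + " passes");
    }
}
```

With a limit of 100 the toy run stops early, once a pass improves the
number by less than the tolerance.  With a small limit such as 10 it
stops at the limit while the number is still improving, which is the
"too small" case mentioned above.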

James


On 7/5/2012 6:04 AM, Daniel wrote:
> When I am training nameFinders, I usually use 500 iterations and a
> cutoff of 5, but I really don't know why I should use these values or
> others. Can anybody tell me something about these parameters?
>
> Thanks!

