Thanks, John Miedema, for the information and the book reference.
Can you please explain what a "feature" is in this context? Suppose I have a
corpus like the data below, in which I am tagging the names of people and
organisations, and the size of the corpus is around 1.5 million. What would
ideally be the iterations and cutoff values in that case?
My name is <START:name> Nikhil Jain <END> I am talking to <START:name> John Miedema <END>
<START:name> Ram <END> is very helpful.
I am part of <START:org> XYZ <END> organization.
Ram is working with <START:org> AAA <END> firm.......
Thanks again,
Nikhil
On Sunday, October 5, 2014 5:42 PM, John Miedema <[email protected]>
wrote:
The cutoff is the minimum number of times a feature has to occur
before it is included in the model. This reduces noise: you don't want
features that only occur once to appear in the model. You might reduce
the cutoff if you have a small training set.
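To make the cutoff concrete, here is a minimal Python sketch. It is not OpenNLP's actual code, only an illustration of the idea described above. The feature strings (w=nikhil, prev=is, cap=yes) and the function name apply_cutoff are made up for the example; OpenNLP generates its own features from each token and its surrounding context.

```python
from collections import Counter

def apply_cutoff(events, cutoff):
    """Keep only features seen at least `cutoff` times across all events."""
    counts = Counter(f for features in events for f in features)
    kept = {f for f, n in counts.items() if n >= cutoff}
    filtered = [[f for f in features if f in kept] for features in events]
    return filtered, kept

# Hypothetical features for three name tokens: the word itself,
# the previous word, and whether the token is capitalised.
events = [
    ["w=nikhil", "prev=is", "cap=yes"],
    ["w=john", "prev=to", "cap=yes"],
    ["w=ram", "prev=with", "cap=yes"],
]

filtered, kept = apply_cutoff(events, cutoff=2)
print(kept)      # only features that occurred at least twice survive
print(filtered)  # each event reduced to its surviving features
```

With cutoff=2, every feature that occurs only once (the individual words and previous words) is dropped, and only the capitalisation feature remains. On a small training set a high cutoff can remove most of the features, which is why lowering it may help there.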
For iterations=100: "The iterations parameter can largely be ignored,
but as the model trains, it'll output for each step of these 100
iterations." (Taming Text, Ingersoll, pg. 134).
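As a rough picture of what "output for each step" means, the Python sketch below trains a tiny model for a fixed number of iterations and prints a progress line on each pass. It uses plain gradient ascent on a logistic model as a stand-in and is not OpenNLP's own training algorithm; the function name train, the learning rate, the feature strings, and the toy data are all assumptions made for illustration.

```python
import math

def train(data, iterations, lr=0.5):
    """Fit weights for a tiny logistic model, reporting progress each iteration."""
    weights = {}
    history = []
    for i in range(1, iterations + 1):
        grad = {}
        loglik = 0.0
        for features, label in data:
            # Score the example with the current weights.
            score = sum(weights.get(f, 0.0) for f in features)
            p = 1.0 / (1.0 + math.exp(-score))
            loglik += math.log(p if label == 1 else 1.0 - p)
            # Accumulate the gradient of the log-likelihood.
            for f in features:
                grad[f] = grad.get(f, 0.0) + (label - p)
        # One update per iteration, after seeing all examples.
        for f, g in grad.items():
            weights[f] = weights.get(f, 0.0) + lr * g
        history.append(loglik)
        print(f"{i}: loglikelihood={loglik:.4f}")
    return weights, history

# Toy data: label 1 means "part of a name", 0 means "not a name".
data = [
    (["cap=yes", "prev=is"], 1),
    (["cap=yes", "prev=to"], 1),
    (["cap=no", "prev=the"], 0),
    (["cap=no", "prev=is"], 0),
]

weights, history = train(data, iterations=10)
```

Each extra iteration is one more pass that adjusts the weights, so the printed log-likelihood creeps upward. More iterations fit the training data more closely, which is why changing the value changes the resulting model.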
On Sat, Oct 4, 2014 at 5:35 AM, nikhil jain
<[email protected]> wrote:
> Hi,
> I am using the OpenNLP Token Name Finder module for parsing some documents. For
> generating the model, I am using the command line below, which is mentioned in
> the documentation page and is working fine.
> opennlp TokenNameFinderTrainer -model <model_name> -lang en -data <training file> -encoding UTF-8
> My question is, can someone explain the significance of the -iterations and
> -cutoff parameters in layman's terms? When I modify these parameters by
> adding them to my command line with different values, such as 80 or 120
> for iterations and 20 or 40 for cutoff, I can see a difference in my
> model, but I do not understand exactly what is happening. I know the
> default is 100 for iterations and 5 for cutoff.
>
>
> BTW, I am new to machine learning and natural language processing.
> Please explain with an example.
> Thanks in advance.
> Nikhil Jain
--
_________________________________________
johnmiedema.com