Training new models

Sanjeev Sharma Thu, 27 Mar 2014 15:36:07 -0700

Hi,



I am new to OpenNLP.  I've been playing with chunking, tokenizing, POS
tagging, and name recognition for a few days, following the example code
and using the pre-trained models from
http://opennlp.sourceforge.net/models-1.5/.  I've been having some trouble
with person and organization recognition: with those models I can only
identify common names and organizations such as "Mike Smith" and "IBM".
In addition, I need to be able to find date ranges and technical terms
like "Java", "C++", and "HTML" (I should mention that my input is going
to be resumes).



I figure I need to train my own models, especially since training data
that looks like my input (i.e. resumes) should give better context.
I've been trying to find information on how to do this in the
documentation and through Google searches.  I found a few simple
examples, but not much more.  I did see the example in the documentation
with the "<START:person> <END>" tags and the command line to process the
training data into a .bin file, but nothing for organization names.  I
tried looking at one or two of the annotation guides, and that created
more questions than answers.  For example, the annotation guides are not
consistent with each other or with the example in the documentation.  Are
there pros and cons between the different formats?  Are the examples in
the documentation in a native format?  Is there a conversion utility?  If
so, and I'm creating data from scratch, would it not be better to just
put it in the native format?



I lack understanding of OpenNLP and of NLP in general, and the OpenNLP
Manual hasn't worked for me.  Maybe I'm misinterpreting the
documentation or not looking in the right place.  I would greatly
appreciate it if someone could point me in the right direction:
real-world examples of training a model, a book I can read through, or
maybe just some good examples of training data.  Beyond the specific
task I'm trying to accomplish, I would like to get a deeper
understanding of how OpenNLP works.



Thanks for any help.
