Hi,
I am new to OpenNLP. For the past few days I've been experimenting with chunking, tokenizing, POS tagging, and name recognition, following the example code and using the pre-trained models from http://opennlp.sourceforge.net/models-1.5/.
I'm having trouble with person and organization recognition: with those models I can only identify common names and organizations such as "Mike Smith" and "IBM". In addition, I need to find date ranges and technical terms such as "Java", "C++", and "HTML" (I should mention that my input will be resumes). I figure I need to train my own models, especially since training data that resembles my input (i.e. resumes) should give better context.
I've searched the documentation and Google for information on how to do this, and found a few simple examples but not much more. I did see the example in the documentation with the "<START:person> <END>" tags and the command line that processes the training data into a .bin file, but nothing for organization names. I also looked at one or two of the annotation guides, and that created more questions than answers. For example, the annotation guides are not consistent with each other or with the example in the documentation. Are there pros and cons to the different formats? Are the examples in the documentation in a native format? Is there a conversion utility? If so, and I'm creating data from scratch, wouldn't it be better to just put it in the native format?
I lack understanding of OpenNLP and of NLP in general, and the OpenNLP Manual hasn't worked for me. Maybe I'm misinterpreting the documentation or not looking in the right place. I would greatly appreciate it if someone could point me in the right direction: real-world examples of training a model, a book I can read through, or just some good examples of training data.
Beyond the specific task I'm trying to accomplish, I would like to get a deeper understanding of how OpenNLP works. Thanks for any help.
