I am using OpenNLP in my research to extract terms from an educational corpus, and I have a question about the OpenNLP models (chunker, sentence detector, tokenizer, maximum-entropy POS tagger): what training data sets were used to build them? It is clearly stated that CoNLL-2000 was used to train the chunker; however, no information is provided about the training data used for the sentence detector, the tokenizer, or the maximum-entropy POS tagger.
- Training data sets used in OpenNLP (demaidim)
- Re: Training data sets used in OpenNLP (Joern Kottmann)
