On 03/09/13 17:25, Danica Damljanovic wrote:
I was trying to find the original OpenNLP corpora used for training, but
could not get anything apart from the binary model...

Does anyone have any idea whether it is possible to get this, and how?

If I'm not mistaken, the original corpora cannot be redistributed due to licensing issues. However, don't take my word for it; someone with the appropriate authority (someone from the dev team) should answer this.

Also, if I remember correctly, you can get a pretty decent sentence-detection model with fewer than 100 sentences, whereas the rest of the components (Tokenizer, POSTagger, NER, etc.) need thousands of sentences!

Jim
