Hello, I'm exploring the possibility of using OpenNLP in commercial software. As part of this, I'd like to assess the quality of some of the models available on http://opennlp.sourceforge.net/models-1.5/ and also learn more about the applicable license terms.
My primary interest for now is in the English models for the Tokenizer, Sentence Detector and POS Tagger. The documentation at http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html gives scores for various models as part of its evaluation run examples. Do these scores generally reflect the performance of the models on the SourceForge download page? Are further details available on model quality, source corpora, features used, and so on?

I've seen posts to this list explain, as a general comment, that "the models are subject to the licensing restrictions of the copyright holders of the corpus used to train them." I understand that the models on SourceForge aren't part of any Apache OpenNLP release, but I'd very much appreciate it if someone in the know could share further insight into the licensing terms that apply. I'd be glad to be wrong about this, but my understanding is that the models can't be used commercially.

Many thanks for any insight.

Christian
