OpenNLP uses either a Maxent or a Perceptron classifier to classify a piece of text. It can give you back the probabilities for the various categories, but it's not designed to tell you how much each topic is represented in your input document.
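To make the point about per-category probabilities concrete, here is a minimal sketch of how such an output could be inspected. It does not call OpenNLP: the category names and the probability values are made up for illustration, and `CategoryRanking` and `rank` are hypothetical names. It only assumes the classifier hands back one probability per category.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CategoryRanking {

    /**
     * Pairs each category with its probability and sorts the pairs
     * from most to least probable.
     */
    static List<Map.Entry<String, Double>> rank(String[] categories, double[] probabilities) {
        if (categories.length != probabilities.length) {
            throw new IllegalArgumentException("one probability per category is required");
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (int i = 0; i < categories.length; i++) {
            ranked.add(new AbstractMap.SimpleImmutableEntry<>(categories[i], probabilities[i]));
        }
        // Highest probability first.
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up values standing in for a classifier's output (assumption).
        String[] categories = {"Computational Biology", "Cell Biology", "Molecular Biology"};
        double[] probabilities = {0.30, 0.50, 0.20};

        for (Map.Entry<String, Double> entry : rank(categories, probabilities)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Keep in mind that these values express how confident the classifier is in each single label. They are not a measure of how much of the document belongs to each topic.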
You could take a document, assume each paragraph has one topic, and then classify it paragraph by paragraph. Sadly, we don't have support for topic models such as LDA.

All the training logs are still written to the console. We have plans to properly capture them and report the training progress back via an API. This output should then be logged and maybe just stored inside the model for later debugging.

Jörn

On 04/23/2012 07:41 PM, Alex Kudlick wrote:
Hi,

I've just started using OpenNLP for a project to classify scientific articles into subjects. I have a few questions:

1. How do I configure logging for the model? I'm using slf4j-log4j for the rest of my application, but the training output from the model just goes to stdout.
2. Is there any support for classifying documents with multiple classes? For instance, a given article may be classified as Computational Biology, Cell Biology, and Molecular Biology.

Thanks,
Alex Kudlick
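
The paragraph-by-paragraph suggestion from the reply could be sketched roughly as follows. This is only an illustration under stated assumptions: paragraphs are taken to be separated by blank lines, `ParagraphTopics` and `countTopics` are hypothetical names, and the classifier is passed in as a plain function from paragraph text to a single category. In a real setup that function would wrap the trained document categorizer; here a toy keyword rule stands in for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class ParagraphTopics {

    /**
     * Splits the document on blank lines, classifies each paragraph on its
     * own, and counts how many paragraphs were assigned to each category.
     */
    static Map<String, Integer> countTopics(String document, Function<String, String> classifier) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // A line break, optional whitespace, then another line break marks a paragraph boundary.
        for (String paragraph : document.split("\\R\\s*\\R")) {
            String text = paragraph.trim();
            if (text.isEmpty()) {
                continue;
            }
            counts.merge(classifier.apply(text), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy stand-in for a trained classifier (assumption, not a real model).
        Function<String, String> toyClassifier =
                p -> p.toLowerCase().contains("algorithm") ? "Computational Biology" : "Cell Biology";

        String article = "We describe an algorithm for sequence alignment.\n\n"
                + "Cells were cultured for 48 hours.\n\n"
                + "The algorithm scales linearly with input size.";

        System.out.println(countTopics(article, toyClassifier));
    }
}
```

Every category with a non-zero count can then be treated as one of the article's subjects, which gives a rough form of multi-class output without a topic model.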
