Re: Document Classification

Jörn Kottmann Mon, 23 Apr 2012 15:12:50 -0700

OpenNLP uses either a Maxent or a Perceptron classifier
to classify a piece of text. This can give you back the probabilities
for the various categories, but it's not designed to tell you how
much each topic is represented in your input document.


You could take a document and assume each paragraph has one topic
and then classify it paragraph by paragraph.
We sadly don't have support for topic models, such as LDA.
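
A minimal sketch of the paragraph-by-paragraph idea, in plain Java. The
`classify` function is a hypothetical stand-in for a trained categorizer
(with OpenNLP it would wrap something like DocumentCategorizerME's
categorize and getBestCategory calls; check the signatures for your
version). The class name, method name, and the toy keyword classifier are
assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ParagraphTopics {

    /**
     * Splits a document on blank lines, labels each paragraph with the
     * given classifier, and returns the share of paragraphs per label.
     */
    public static Map<String, Double> topicShares(String document,
                                                  Function<String, String> classify) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        // A paragraph boundary is a line break, optional whitespace, line break.
        for (String paragraph : document.split("\\R\\s*\\R")) {
            String text = paragraph.trim();
            if (text.isEmpty()) {
                continue;
            }
            counts.merge(classify.apply(text), 1, Integer::sum);
            total++;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Toy keyword classifier standing in for a trained model (assumption).
        Function<String, String> toy = p ->
            p.toLowerCase().contains("cell") ? "Cell Biology" : "Computational Biology";
        String doc = "Cell membranes regulate transport.\n\n"
                   + "We align sequences with dynamic programming.\n\n"
                   + "The cell cycle has four phases.";
        System.out.println(topicShares(doc, toy));
    }
}
```

The shares are only paragraph counts per label, so they are a rough proxy
for how much of the document each topic covers, not a real topic model.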

All the training logs are still written to the console. We have plans
to properly capture them and report training progress back via an
API. This output should then be logged and maybe just stored inside
the model for later debugging.
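
Until then, one possible workaround is to redirect System.out while
training runs and forward the captured lines to your own logger. The sketch
below is not an OpenNLP feature; it uses only the JDK, and the class name,
method names, and the placeholder training output are assumptions.
java.util.logging is used to keep it self-contained; an slf4j logger could
be substituted.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.logging.Logger;

public class StdoutCapture {

    /** Runs the task with System.out redirected and returns what it printed. */
    public static String capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream redirected = new PrintStream(buffer, true)) {
            System.setOut(redirected);
            task.run();
        } finally {
            // Always restore the real console stream, even if the task throws.
            System.setOut(original);
        }
        return buffer.toString();
    }

    /** Forwards each non-blank captured line to the given logger at INFO level. */
    public static void logLines(String output, Logger logger) {
        for (String line : output.split("\\R")) {
            if (!line.trim().isEmpty()) {
                logger.info(line);
            }
        }
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("training");
        // Placeholder for the real training call, which prints to stdout.
        String output = capture(() -> System.out.println("placeholder training output"));
        logLines(output, logger);
    }
}
```

System.setOut is global to the JVM, so anything other threads print during
training ends up in the same buffer.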

Jörn

On 04/23/2012 07:41 PM, Alex Kudlick wrote:

Hi,

I've just started using OpenNLP for a project to classify scientific
articles into subjects. I have a few questions:

1. How do I configure logging for the model? I'm using slf4j-log4j for the
rest of my application, but the training output from the model just goes to
stdout.

2. Is there any support for classifying documents with multiple classes?
For instance, a given article may be classified as Computational Biology,
Cell Biology, and Molecular Biology.

Thanks,

Alex Kudlick
