Naive Bayes, perceptron variants (including passive-aggressive), faster training for maxent, and a better overall architecture. My students and I are working on these independently, and I will bring them into OpenNLP when time frees up.

On Tue, Apr 24, 2012 at 2:26 AM, Jörn Kottmann <[email protected]> wrote:

> What are you planning to add?
>
> Jörn
>
> On 04/24/2012 03:53 AM, Jason Baldridge wrote:
>
>> FWIW, there will be more classification capabilities coming in the next
>> several months.
>>
>> -Jason
>>
>> On Mon, Apr 23, 2012 at 5:12 PM, Jörn Kottmann <[email protected]> wrote:
>>
>>> OpenNLP uses either a Maxent or a Perceptron classifier
>>> to classify a piece of text. This can give you back the probabilities
>>> for the various categories, but it's not designed to tell you how
>>> much each topic is represented in your input document.
>>>
>>> You could take a document, assume each paragraph has one topic,
>>> and then classify it paragraph by paragraph.
>>> We sadly don't have support for topic models, such as LDA.
>>>
>>> All the training logs are still written to the console; we have plans
>>> to properly capture them and report the training process back via an
>>> API. This output should then be logged and maybe just stored inside
>>> the model for later debugging.
>>>
>>> Jörn
>>>
>>> On 04/23/2012 07:41 PM, Alex Kudlick wrote:
>>>
>>>> Hi,
>>>>
>>>> I've just started using OpenNLP for a project to classify scientific
>>>> articles into subjects. I have a few questions:
>>>>
>>>> 1. How do I configure logging for the model? I'm using slf4j-log4j for
>>>> the rest of my application, but the training output from the model
>>>> just goes to stdout.
>>>>
>>>> 2. Is there any support for classifying documents with multiple classes?
>>>> For instance, a given article may be classified as Computational
>>>> Biology, Cell Biology, and Molecular Biology.
>>>>
>>>> Thanks,
>>>>
>>>> Alex Kudlick

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
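
For anyone reading this thread later: the paragraph-by-paragraph idea from the quoted reply can be sketched roughly as below. This is only an illustration in Python, not OpenNLP code. The names `split_paragraphs`, `keyword_classifier`, and `topic_shares` are made up for the example, and the keyword lookup is a toy stand-in for a trained single-label classifier (such as a maxent or perceptron model), assuming that classifier returns one label per piece of text.

```python
import re
from collections import Counter


def split_paragraphs(text):
    """Split text on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def keyword_classifier(paragraph):
    """Toy stand-in for a trained single-label classifier (hypothetical).

    Returns the label whose keywords occur most often in the paragraph.
    A real setup would call a trained model here instead.
    """
    keywords = {
        "Computational Biology": ("algorithm", "simulation"),
        "Cell Biology": ("membrane", "organelle"),
        "Molecular Biology": ("protein", "dna"),
    }
    words = re.findall(r"[a-z]+", paragraph.lower())
    scores = {
        label: sum(words.count(k) for k in kws)
        for label, kws in keywords.items()
    }
    return max(scores, key=scores.get)


def topic_shares(text, classify):
    """Label each paragraph, then report the fraction of paragraphs per label."""
    paragraphs = split_paragraphs(text)
    if not paragraphs:
        return {}
    counts = Counter(classify(p) for p in paragraphs)
    return {label: n / len(paragraphs) for label, n in counts.items()}
```

Calling `topic_shares(document_text, keyword_classifier)` returns a dictionary mapping each assigned label to its share of paragraphs. That gives a rough multi-label view of a document, under the assumption that each paragraph covers a single topic; it is not a substitute for a real topic model such as LDA.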
