Hi, OpenNLP Team,

I am new to Java and OpenNLP.

I am trying to use OpenNLP 1.6.0 for document categorization and have three
questions.

1. The online documentation at
http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.doccat.classifying.api
gives this example:

  InputStream dataIn = new FileInputStream("en-sentiment.train");

  ObjectStream<String> lineStream =
                new PlainTextByLineStream(dataIn, "UTF-8");
  ObjectStream<DocumentSample> sampleStream =
                new DocumentSampleStream(lineStream);

  model = DocumentCategorizerME.train("en", sampleStream);

This PlainTextByLineStream constructor is deprecated, and the deprecation note
suggests using InputStreamFactory instead. However, InputStreamFactory is just
an interface with a single method, createInputStream. Would you mind showing me
an example of how to construct an InputStreamFactory from a text file (each row
is: category docText) and then use it to train a model?
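
To be concrete about the file format: each row is the category, then
whitespace, then the document text. Below is a minimal plain-Java sketch of
splitting one row that way. The class name, method name, and sample row are
hypothetical and only for illustration; no OpenNLP calls are involved.

```java
public class RowFormatSketch {

    // Splits one training row into { category, document text } at the
    // first run of whitespace. Hypothetical helper, not part of OpenNLP.
    public static String[] splitRow(String row) {
        String[] parts = row.trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                    "Expected '<category> <text>' but got: " + row);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up sample row in the "category docText" layout.
        String[] parts = splitRow("positive I really enjoyed this film");
        System.out.println("category = " + parts[0]);
        System.out.println("text     = " + parts[1]);
    }
}
```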

2. I found that doccat accepts the training parameter ALGORITHM_PARAM, which
defaults to "MAXENT". Are any other algorithms available in the package?

3. I see that QNMinimizer was added recently. It implements L-BFGS and supports
L1 and L2 regularization as well as Elastic Net. Would you be so kind as to
provide an example of how to add an L1 penalty when training a document
categorization model?

I appreciate your help. If this list is not the right place for these
questions, please direct me to a better one.

Thank you very much.

Best Regards,
Guang Yang
