Hi, OpenNLP Team, I am new to Java and OpenNLP.
I tried to use OpenNLP 1.6.0 for document categorization and have three questions.

1. The online documentation at http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.doccat.classifying.api gives this example:

    InputStream dataIn = new FileInputStream("en-sentiment.train");
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
    model = DocumentCategorizerME.train("en", sampleStream);

The PlainTextByLineStream constructor called here is deprecated, and the deprecation notice suggests using InputStreamFactory instead. I see that InputStreamFactory is a straightforward interface built around createInputStream. Would you mind showing me an example of how to construct an InputStreamFactory from a text file (each row is: category docText) and then use it to train a model?

2. I found that doccat can take the training parameter ALGORITHM_PARAM, which defaults to "MAXENT". Are any other algorithms available in the package?

3. I see that QNMinimizer was added recently. It implements L-BFGS to support L1 and L2 regularization and Elastic Net. Would you be so kind as to provide an example of how to add an L1 penalty when training a document categorization model?

I appreciate your help. If this is not the right place for these questions, please direct me to a better one.

Thank you very much.

Best Regards,
Guang Yang
