Hi all!
I am trying to develop a Java program that takes a document, extracts its text,
analyzes that text and extracts the main topic of the document.
I think this is a document categorizer problem, right?
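For the text-extraction step, when the input is RTF (as the training file below is), one option that needs no extra library is the JDK's own javax.swing.text.rtf.RTFEditorKit. This is a minimal sketch assuming simple RTF input; the class name RtfToText and its extract methods are hypothetical, and RTFEditorKit may not handle every RTF feature, so complex files can lose content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: turns RTF markup into plain text using only the JDK.
public class RtfToText {

    // Parses RTF from the stream into a Swing document and returns its plain text.
    public static String extract(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    // Convenience overload for RTF held in a String (RTF source is normally plain ASCII).
    public static String extract(String rtf) throws IOException, BadLocationException {
        return extract(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII)));
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java RtfToText <file.rtf>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extract(in));
        }
    }
}
```

The plain text printed this way could then be saved as a UTF-8 .txt file, which is what the PlainTextByLineStream in the training code reads line by line.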
I tried the example on the manual page.
I created the training file, an RTF file with these lines:
GMDecrease Major acquisitions that have a lower gross margin than the existing network also \
had a negative impact on the overall gross margin, but it should improve following \
the implementation of its integration strategies .
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
to obligations towards dealers .
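In the sample above each document spans several lines ending in `\`, which looks like the way RTF stores line breaks. The doccat training format described in the OpenNLP manual puts one document per line: the category, whitespace, then the text. The sketch below is hypothetical (TrainingLines and joinContinuations are made-up names): it merges such continuation lines so that each document ends up on a single line. It does not strip any other RTF markup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges lines ending in a backslash so that each
// training document sits on one line ("category text ...").
public class TrainingLines {

    public static List<String> joinContinuations(List<String> rawLines) {
        List<String> documents = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : rawLines) {
            String line = raw.trim();
            boolean continues = line.endsWith("\\");
            if (continues) {
                // Drop the trailing backslash; the next line continues this document.
                line = line.substring(0, line.length() - 1).trim();
            }
            if (!line.isEmpty()) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line);
            }
            // A line without a trailing backslash closes the current document.
            if (!continues && current.length() > 0) {
                documents.add(current.toString());
                current.setLength(0);
            }
        }
        // A dangling continuation at the end of the input still forms a document.
        if (current.length() > 0) {
            documents.add(current.toString());
        }
        return documents;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
                "GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \\",
                "to obligations towards dealers .");
        for (String doc : joinContinuations(raw)) {
            System.out.println(doc);
        }
    }
}
```

Running main prints the GMIncrease example as one line, category first.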
Then, in my code, I use this method to train a model:
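package opennlp_prova;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.util.InvalidFormatException;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class Opennlp_prova {

    public static void Train() throws InvalidFormatException, IOException {
        DoccatModel model = null;
        InputStream dataIn = null;
        try {
            dataIn = new FileInputStream(
                    "/Users/andry85mae/Desktop/apache-opennlp-1.5.2-incubating/bin/train.train");
            // Every line of the file is read as one training sample
            ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
            // Each line is split into a category and the document text
            ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
            model = DocumentCategorizerME.train("en", sampleStream);
            // Note: the trained model is neither saved nor returned here
        } catch (IOException e) {
            // Failed to read or parse training data, training failed
            e.printStackTrace();
        } finally {
            if (dataIn != null) {
                try {
                    dataIn.close();
                } catch (IOException e) {
                    // Not an issue, training already finished.
                    // The exception should be logged and investigated
                    // if part of a production system.
                    e.printStackTrace();
                }
            }
        }
    }
}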
but it gives me an error:
java.io.IOException: Empty lines, or lines with only a category string are not allowed!
Computing event counts... Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
    at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
    at opennlp.maxent.GIS.trainModel(GIS.java:256)
    at opennlp.model.TrainUtil.train(TrainUtil.java:182)
    at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
    at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
    at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:207)
    at opennlp_prova.Opennlp_prova.Train(Opennlp_prova.java:55)
    at opennlp_prova.Opennlp_prova.main(Opennlp_prova.java:96)
Java Result: 1
What is causing this error?
Thanks in advance!
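The first line of the error names two rejected line shapes: empty lines and lines that contain only a category. The sketch below is a hypothetical pre-flight check, not OpenNLP code (DoccatLineCheck and findBadLines are invented names), assuming the rule is "at least two whitespace-separated tokens per line". It lists offending line numbers before training; a line made of a single token, such as an RTF header like {\rtf1, would be flagged too.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check: reports lines that are empty or hold only one token
// (a category with no text), the two cases named in the error message.
public class DoccatLineCheck {

    public static List<String> findBadLines(List<String> lines) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            int lineNo = i + 1; // 1-based, as an editor would show it
            if (line.isEmpty()) {
                problems.add("line " + lineNo + ": empty");
            } else if (line.split("\\s+").length < 2) {
                problems.add("line " + lineNo + ": category only (\"" + line + "\")");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java DoccatLineCheck <training-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> problems = findBadLines(lines);
        if (problems.isEmpty()) {
            System.out.println("No empty or category-only lines found.");
        }
        for (String p : problems) {
            System.out.println(p);
        }
    }
}
```

Pointing it at the same file passed to FileInputStream in Train() would show which lines the sample stream is likely to reject.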