The error is thrown because you do not have enough training samples.
Try running your code with at least 10 to 20 training samples.
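
A minimal sketch of a sanity check that could be run over the training lines before training. It assumes the format shown in the manual example: one sample per line, with the category first and the text after whitespace. It rejects the two cases named in the IOException (empty lines and lines with only a category) and warns when there are fewer than 10 samples. The class TrainingFileCheck and the method countSamples are hypothetical helpers written for illustration; they are not part of OpenNLP.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: sanity-checks doccat training lines.
class TrainingFileCheck {

    // Returns the number of samples; throws if a line is empty
    // or contains only a category string.
    static int countSamples(List<String> lines) {
        int count = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                throw new IllegalArgumentException("Line " + (i + 1) + " is empty");
            }
            // Split into the category and the remaining text.
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException(
                        "Line " + (i + 1) + " has only a category: " + parts[0]);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Shortened sample lines, for illustration only.
        List<String> lines = Arrays.asList(
                "GMDecrease Major acquisitions had a negative impact on the overall gross margin .",
                "GMIncrease The upward movement of gross margin resulted from adjustments .");
        int n = countSamples(lines);
        System.out.println(n + " samples found");
        if (n < 10) {
            System.out.println("Probably too few samples; try at least 10 to 20.");
        }
    }
}
```

To check a real file, the lines could be loaded with Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8) and passed to countSamples.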

Jörn

On 08/23/2012 03:15 PM, andrea maestroni wrote:
Hi all!

I am trying to develop a Java program that takes a document, extracts the
text, analyzes it, and determines the main topic of the document.

I think this is a document categorization problem, right?

I tried the example from the manual page.

I created the training file, an RTF file, with these lines:

GMDecrease Major acquisitions that have a lower gross margin than the existing network also \
            had a negative impact on the overall gross margin, but it should improve following \
            the implementation of its integration strategies .
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
            to obligations towards dealers .
Then in my code I use this function to train a model:

public static void Train() throws InvalidFormatException, IOException {
    DoccatModel model = null;

    InputStream dataIn = null;
    try {
        dataIn = new FileInputStream("/Users/andry85mae/Desktop/apache-opennlp-1.5.2-incubating/bin/train.train");
        ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
        ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

        model = DocumentCategorizerME.train("en", sampleStream);
    } catch (IOException e) {
        // Failed to read or parse training data, training failed
        e.printStackTrace();
    } finally {
        if (dataIn != null) {
            try {
                dataIn.close();
            } catch (IOException e) {
                // Not an issue, training already finished.
                // The exception should be logged and investigated
                // if part of a production system.
                e.printStackTrace();
            }
        }
    }
}

but it gives me an error...

java.io.IOException: Empty lines, or lines with only a category string are not allowed!
        Computing event counts...  Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
        at opennlp.maxent.GIS.trainModel(GIS.java:256)
        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:207)
        at opennlp_prova.Opennlp_prova.Train(Opennlp_prova.java:55)
        at opennlp_prova.Opennlp_prova.main(Opennlp_prova.java:96)
Java Result: 1

What is the error?

Thanks in advance!


