Hi all!

I am trying to develop a Java program that takes a document, extracts its text, analyzes the text, and extracts the main topic of the document.

I think this is a document categorization problem, right?
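To make the idea of document categorization concrete: every training document carries a category label, and a new document gets the label it most resembles. The sketch below is a toy stand-in for illustration only, not how OpenNLP works (the stack trace further down shows DocumentCategorizerME training a maximum-entropy model with GIS). The class `ToyCategorizer` and its word-overlap scoring are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy word-overlap categorizer; NOT OpenNLP's maximum-entropy model.
class ToyCategorizer {
    // Per-category word counts built from labeled training lines.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // A training line is: category, whitespace, document text.
    void addSample(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("need a category and some text: " + line);
        }
        Map<String, Integer> words = counts.computeIfAbsent(parts[0], k -> new HashMap<>());
        for (String w : tokenize(parts[1])) {
            words.merge(w, 1, Integer::sum);
        }
    }

    // Returns the category whose training words occur most often in the text
    // (null if nothing was trained; ties go to whichever category is seen first).
    String categorize(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int score = 0;
            for (String w : tokens) {
                score += e.getValue().getOrDefault(w, 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    // Lower-cases the text and splits it on non-word characters.
    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyCategorizer c = new ToyCategorizer();
        c.addSample("GMDecrease Major acquisitions that have a lower gross margin than the existing network also had a negative impact on the overall gross margin, but it should improve following the implementation of its integration strategies .");
        c.addSample("GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments to obligations towards dealers .");
        System.out.println(c.categorize("a negative impact"));           // prints GMDecrease
        System.out.println(c.categorize("obligations towards dealers")); // prints GMIncrease
    }
}
```

A real categorizer learns weights from many labeled documents per category; counting shared words is only meant to show the shape of the problem.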

I tried the example from the manual page.

I created the training file, an RTF file, with these lines:

GMDecrease Major acquisitions that have a lower gross margin than the existing network also \
           had a negative impact on the overall gross margin, but it should improve following \
           the implementation of its integration strategies .
GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
           to obligations towards dealers .
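If the backslashes copied from the manual only mark where long lines were wrapped for display (an assumption; the training stream appears to read one document per line, category first, then the text), a small helper could join such lines before training. `ContinuationJoiner` is a hypothetical name, and this sketch only handles lines that end in a backslash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: rebuilds one-document-per-line samples from lines
// that were wrapped with a trailing backslash, as in the manual's example.
class ContinuationJoiner {
    static List<String> join(List<String> rawLines) {
        List<String> samples = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : rawLines) {
            String line = raw.trim();
            boolean continues = line.endsWith("\\");
            if (continues) {
                // Drop the backslash marker; the next line belongs to this document.
                line = line.substring(0, line.length() - 1).trim();
            }
            if (!line.isEmpty()) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line);
            }
            // A line without a trailing backslash ends the current document;
            // blank lines add no text of their own.
            if (!continues && current.length() > 0) {
                samples.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            // The input ended on a continuation line; keep what was collected.
            samples.add(current.toString());
        }
        return samples;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \\",
                "           to obligations towards dealers .");
        for (String sample : join(raw)) {
            System.out.println(sample);
        }
    }
}
```

Writing the joined samples back to a plain-text file (not RTF) would give one line per document.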

Then, in my code, I use this method to train a model:

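import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.util.InvalidFormatException;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

// Trains a document categorizer from train.train and returns the model
// (null if reading or parsing the training data failed).
public static DoccatModel Train() throws InvalidFormatException, IOException {
        DoccatModel model = null;
        InputStream dataIn = null;
        try {
            dataIn = new FileInputStream("/Users/andry85mae/Desktop/apache-opennlp-1.5.2-incubating/bin/train.train");
            // Read the file line by line; each line becomes one DocumentSample.
            ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
            ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

            model = DocumentCategorizerME.train("en", sampleStream);
        } catch (IOException e) {
            // Failed to read or parse training data, training failed
            e.printStackTrace();
        } finally {
            if (dataIn != null) {
                try {
                    dataIn.close();
                } catch (IOException e) {
                    // Not an issue, training already finished.
                    // The exception should be logged and investigated
                    // if part of a production system.
                    e.printStackTrace();
                }
            }
        }
        return model;
    }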

But it gives me an error:

java.io.IOException: Empty lines, or lines with only a category string are not allowed!
        Computing event counts...  Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
        at opennlp.maxent.GIS.trainModel(GIS.java:256)
        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:207)
        at opennlp_prova.Opennlp_prova.Train(Opennlp_prova.java:55)
        at opennlp_prova.Opennlp_prova.main(Opennlp_prova.java:96)
Java Result: 1
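The IOException message points at empty lines and at lines holding only a category. A minimal checker, assuming one document per line split on whitespace, could list such lines before training. It also flags a first line starting with `{\rtf`, since RTF files begin with that markup and it would otherwise be read as training text. `TrainingFileChecker` is a hypothetical name, and this only approximates OpenNLP's own parsing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for a one-document-per-line training file.
class TrainingFileChecker {
    // Describes every line that is empty, holds a single token (a category
    // with no text), or suggests the file was saved as RTF, not plain text.
    static List<String> problems(List<String> lines) {
        List<String> found = new ArrayList<>();
        if (!lines.isEmpty() && lines.get(0).trim().startsWith("{\\rtf")) {
            found.add("line 1: looks like RTF markup; save the file as plain text");
        }
        for (int i = 0; i < lines.size(); i++) {
            // Split on runs of whitespace, roughly like a whitespace tokenizer.
            String[] tokens = lines.get(i).trim().split("\\s+");
            if (tokens.length == 1 && tokens[0].isEmpty()) {
                found.add("line " + (i + 1) + ": empty line");
            } else if (tokens.length == 1) {
                found.add("line " + (i + 1) + ": only a category, no text");
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the training file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String p : problems(lines)) {
            System.out.println(p);
        }
    }
}
```

For example, `java TrainingFileChecker train.train` would print one message per offending line, and nothing if every line has a category followed by text.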

What is causing this error?

Thanks in advance!
