Dear Apache OpenNLP Project Team,
I have another error with the command-line tool:
- I followed the instructions on the site exactly
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
The test file contains only the sample sentence from the site:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
And here is the error:
Computing event counts... done. 0 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:184)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
at opennlp.tools.cmdline.CLI.main(CLI.java:222)
Another point: the tool cannot read more than 2 sentences from one
training file.
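Since the log reports 0 events, it may be worth confirming that the training file matches the layout that, as I read the manual, the chunker expects: one token per line with three fields separated by a single space, and an empty line between sentences. The following is only a minimal sketch of such a check in plain Java. The class name ChunkFormatCheck and its methods are hypothetical and not part of OpenNLP, and whether the trainer tolerates tabs or stray blank lines is an assumption that has not been verified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: scans chunker training lines
// and reports how many sentences it sees and which lines look suspicious.
public class ChunkFormatCheck {

    /** Result of a scan: number of sentences plus a list of suspicious lines. */
    public static final class Report {
        public int sentences = 0;
        public final List<String> problems = new ArrayList<>();
    }

    public static Report check(List<String> lines) {
        Report report = new Report();
        int tokensInSentence = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                if (tokensInSentence > 0) {
                    // An empty line closes the current sentence.
                    report.sentences++;
                } else {
                    // Leading or doubled blank line (assumed to be worth flagging).
                    report.problems.add("line " + (i + 1) + ": blank line with no preceding tokens");
                }
                tokensInSentence = 0;
                continue;
            }
            // Assumed layout: word, POS tag, chunk tag, separated by single spaces.
            String[] parts = line.split(" ");
            if (parts.length != 3) {
                report.problems.add("line " + (i + 1) + ": expected 3 fields, found " + parts.length);
            } else {
                tokensInSentence++;
            }
        }
        if (tokensInSentence > 0) {
            // The last sentence may end at end of file without a blank line.
            report.sentences++;
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, check that file; otherwise check a tiny built-in sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("He PRP B-NP", "reckons VBZ B-VP", "", "the DT B-NP");
        Report report = check(lines);
        System.out.println("sentences: " + report.sentences);
        for (String problem : report.problems) {
            System.out.println(problem);
        }
    }
}
```

Running it with the path to the training file (for example E:\test\tmp.txt) lists any line that does not split into exactly three space-separated fields, which would be the case if the columns were separated by tabs.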
Would you please check these points for me?
Thank you so much for your help.
Best regards,
Trung Tran.
On 05/17/2016 02:06 PM, [email protected] wrote:
Dear Apache OpenNLP Project Team,
I have a critical issue when training with the Chunker tool in Java:
- Firstly, the sample code on the documentation site
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
does not work with either version 1.5.3 or 1.6.0.
- Secondly, I had to edit the code myself as follows (using version 1.5.3):
try {
    Charset charset = Charset.forName("UTF-8");
    ObjectStream lineStream = new PlainTextByLineStream(new FileInputStream(fileChunker), charset);
    ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
    chunkerModel = ChunkerME.train("vn", sampleStream, TrainingParameters.defaultParams(), new ChunkerFactory());
    modelApacheChunkerPath = UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
    chunkerModel.serialize(modelOut);
} catch (FileNotFoundException fe) {
} catch (IOException ie) {
}
- Thirdly, I get the error "java.lang.String cannot be cast to
opennlp.tools.parser.Parse". The reason is:
+ The constructor of class ChunkSampleStream requires the
parameter "ObjectStream<Parse> in"
+ However, the second parameter of method ChunkerME.train
is "ObjectStream<ChunkSample> in"
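To illustrate how a cast error of this kind can arise, here is a minimal, self-contained sketch in plain Java. It does not use OpenNLP: ObjectStream, Parse, LineStream and ParseSampleStream below are hypothetical stand-ins written only for this example. The sketch assumes the mechanism is the raw-typed "ObjectStream lineStream" declaration, which lets a stream of String be handed to a constructor that expects a stream of Parse, so the mismatch only surfaces at run time. As far as I can tell, OpenNLP has a class named ChunkSampleStream in both opennlp.tools.chunker and opennlp.tools.parser, so it may be worth checking which one the import resolves to, but that is an assumption about this setup.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Self-contained illustration; none of these types are OpenNLP classes.
public class RawStreamCastDemo {

    /** Hypothetical stand-in for a stream interface; read() returns null at end of stream. */
    interface ObjectStream<T> {
        T read();
    }

    /** Hypothetical stand-in for a parse-tree type. */
    static final class Parse {
        final String text;
        Parse(String text) { this.text = text; }
    }

    /** Stream of text lines backed by a list. */
    static final class LineStream implements ObjectStream<String> {
        private final Iterator<String> it;
        LineStream(List<String> lines) { this.it = lines.iterator(); }
        public String read() { return it.hasNext() ? it.next() : null; }
    }

    /** Hypothetical sample stream whose constructor expects a stream of Parse objects. */
    static final class ParseSampleStream implements ObjectStream<String> {
        private final ObjectStream<Parse> in;
        ParseSampleStream(ObjectStream<Parse> in) { this.in = in; }
        public String read() {
            Parse p = in.read(); // the element is cast to Parse here at run time
            return p == null ? null : p.text;
        }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    public static String readThroughRawStream(List<String> lines) {
        // Raw type: the compiler no longer knows this is a stream of String.
        ObjectStream raw = new LineStream(lines);
        // Accepted with only an unchecked warning, although the element types differ.
        ObjectStream<String> samples = new ParseSampleStream(raw);
        // Declaring the variable as ObjectStream<String> instead would turn the
        // line above into a compile-time error rather than a run-time failure.
        try {
            samples.read();
            return "no error";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(readThroughRawStream(Arrays.asList("He PRP B-NP")));
    }
}
```

Declaring the line stream with its element type, for example ObjectStream<String>, lets the compiler report the mismatch before the program runs, which can help narrow down which constructor is actually being called.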
I cannot find any way to work around this issue.
Would you please check this point for me?
Thank you so much for your help.
Best regards,
Trung Tran.