Dear Apache OpenNLP Project Team,
I have a critical issue when training a model with the Chunker tool in Java:
- Firstly, the sample code on the documentation site
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
does not work with either version 1.5.3 or 1.6.0.
- Secondly, I had to edit the code myself as follows (using version 1.5.3):
    try {
        Charset charset = Charset.forName("UTF-8");
        ObjectStream lineStream = new PlainTextByLineStream(
                new FileInputStream(fileChunker), charset);
        ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
        chunkerModel = ChunkerME.train("vn", sampleStream,
                TrainingParameters.defaultParams(), new ChunkerFactory());
        modelApacheChunkerPath =
                UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
        OutputStream modelOut = new BufferedOutputStream(
                new FileOutputStream(modelApacheChunkerPath));
        chunkerModel.serialize(modelOut);
    } catch (FileNotFoundException fe) {
    } catch (IOException ie) {
    }
- Thirdly, at run time I get the error "java.lang.String cannot be cast to
opennlp.tools.parser.Parse". The reason appears to be:
+ The constructor of the class ChunkSampleStream requires the
parameter "ObjectStream<Parse> in", whereas my lineStream supplies
String objects
+ However, the second parameter of the method ChunkerME.train
is "ObjectStream<ChunkSample> in"
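To illustrate what I suspect is happening (this is my assumption, not something I have verified against the OpenNLP source): because lineStream in my code is declared as a raw ObjectStream, the compiler accepts it where a stream of Parse is expected, and the mismatch only shows up as a cast failure when the first element is read. The stand-alone sketch below reproduces that mechanism with plain JDK types. Tree and TreeSampleStream are hypothetical stand-ins that I made up for Parse and ChunkSampleStream; no OpenNLP class is used.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RawTypeCastDemo {

    /** Hypothetical stand-in for a parsed sentence (plays the role of Parse). */
    static final class Tree {
        final String text;
        Tree(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for a filter stream that expects a stream of Tree. */
    static final class TreeSampleStream {
        private final Iterator<Tree> in;
        TreeSampleStream(Iterator<Tree> in) { this.in = in; }

        String read() {
            // The compiler inserts a cast to Tree here; it fails if the element is a String.
            Tree tree = in.next();
            return tree.text;
        }
    }

    /** Passes a raw stream of String to TreeSampleStream and reports what happens on read. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static String readThroughRawStream() {
        List<String> lines = Arrays.asList("He PRP B-NP");
        Iterator rawLines = lines.iterator();                       // raw type, like my lineStream
        TreeSampleStream samples = new TreeSampleStream(rawLines);  // unchecked, but it compiles
        try {
            return samples.read();
        } catch (ClassCastException e) {
            // Same kind of failure as "String cannot be cast to ...Parse".
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(readThroughRawStream());
    }
}
```

If this is indeed the mechanism, declaring the line stream with its element type would presumably turn the run-time error into a compile-time error at the constructor call, which would at least confirm which ChunkSampleStream constructor my code resolves to.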
I cannot find any way to work around this issue.
Could you please look into this for me?
Thank you so much for your help.
Best regards,
Trung Tran.