Dear Apache OpenNLP Project Team,
I have re-tested with the sample sentence from the site
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
And I still receive the same error:
Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe
DT B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN
I-NPwill MD B-VPnarrow VB I-VPto TO B-PPonly
RB B-NP# # I-NP1.8 CD I-NPbillion CD I-NPin
IN B-PPSeptember NNP B-NP. . O
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
at java.awt.Component.processMouseEvent(Component.java:6535)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
at java.awt.Component.processEvent(Component.java:6300)
at java.awt.Container.processEvent(Container.java:2236)
at java.awt.Component.dispatchEventImpl(Component.java:4891)
at java.awt.Container.dispatchEventImpl(Container.java:2294)
at java.awt.Component.dispatchEvent(Component.java:4713)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
at java.awt.Container.dispatchEventImpl(Container.java:2280)
at java.awt.Window.dispatchEventImpl(Window.java:2750)
at java.awt.Component.dispatchEvent(Component.java:4713)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
at java.awt.EventQueue$4.run(EventQueue.java:731)
at java.awt.EventQueue$4.run(EventQueue.java:729)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
Sorting and merging events...
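In the "Skipping corrupt line" output above, the whole sentence appears as a single line, so it may help to check which separator characters the training file really contains. The following is only a minimal sketch using plain JDK classes; SeparatorReport is a hypothetical helper, not part of OpenNLP, and the default file name is simply the one used in the code of this message.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper (not part of OpenNLP): counts every
// whitespace, space-like, or control character in a text, keyed by its
// code point, so that tabs, missing newlines, or non-breaking spaces
// in a training file become visible.
final class SeparatorReport {

    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean separatorLike = Character.isWhitespace(cp)
                    || Character.isSpaceChar(cp)
                    || Character.isISOControl(cp);
            if (separatorLike) {
                // Key is the code point, e.g. "U+0020" for a plain space.
                counts.merge(String.format("U+%04X", cp), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Assumed default file name, taken from the code in this message.
        String file = args.length > 0 ? args[0] : "trainApacheChunker.txt";
        String text = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
        count(text).forEach((codePoint, n) -> System.out.println(codePoint + " x " + n));
    }
}
```

If the report shows no U+000A or U+000D at all, the file has no line breaks; if it shows U+0009 or U+00A0 but no U+0020, the columns are not separated by plain spaces.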
Here is the whole Java code:
try {
    Charset charset = Charset.forName("UTF-8");
    File fileChunker = new File("trainApacheChunker.txt");
    MarkableFileInputStreamFactory i = new MarkableFileInputStreamFactory(fileChunker);
    ObjectStream lineStream = new PlainTextByLineStream(i, charset);
    ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
    chunkerModel = ChunkerME.train("en", sampleStream,
            TrainingParameters.defaultParams(), new ChunkerFactory());
    modelApacheChunkerPath = "chunkerModel.bin";
    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
    chunkerModel.serialize(modelOut);
} catch (FileNotFoundException fe) {
} catch (IOException ie) {
}
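A side note on the code above: modelOut is never closed and both catch blocks are empty, so a failed or incomplete write would go unnoticed. The following is a rough sketch of one way to handle that with plain JDK classes only. SafeWrite and its Serializer interface are hypothetical stand-ins for the call chunkerModel.serialize(modelOut); they are not OpenNLP types.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of OpenNLP): writes through a buffered
// stream that is always closed, and reports I/O errors instead of
// swallowing them.
final class SafeWrite {

    // Stand-in for anything that writes itself to a stream,
    // e.g. a call such as chunkerModel.serialize(out).
    interface Serializer {
        void serialize(OutputStream out) throws IOException;
    }

    static boolean write(Path target, Serializer serializer) {
        // try-with-resources flushes and closes the stream even on failure.
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            serializer.serialize(out);
            return true;
        } catch (IOException e) {
            // Make the failure visible rather than ignoring it.
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("chunkerModel", ".bin");
        boolean ok = write(target, out -> out.write(new byte[] {1, 2, 3}));
        System.out.println("written: " + ok + ", bytes: " + Files.size(target));
        Files.delete(target);
    }
}
```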
Would you please check this point for me?
Thank you so much for your help.
Best regards,
Trung Tran.
On 05/18/2016 04:56 AM, [email protected] wrote:
Dear Apache OpenNLP Project Team,
Thank you so much for the very useful information about the class
/opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java.
It works very well.
There is one more point: I get an error when training on Vietnamese sentences
(more than 2 sentences in one training file).
Here are 2 example sentences from the file trainChunker.txt:
buo^?i _T_C B-ADVP
tru+a _T_C I-ADVP
, , O
cu+`u A_C B-NP
cha.y IT_M B-VP
theo IT_M I-VP
me. H_C I-VP
ra IT_M B-PP
bo+` S_C I-PP
suo^'i S_C I-PP
. . O
nó C_N_T B-NP
tha^'y S_P B-VP
ba^`y A_G B-NP
hu+o+u A_C I-NP
nai A_C I-NP
?ã ST_P_S B-CONJP
o+? IT_P_C B-PP
?a^'y C_N_T I-PP
ro^`i T_G I-PP
. . O
Here is the error that occurs right after training on the first sentence:
Skipping corrupt line: buo^?i _T_C B-ADVP
Skipping corrupt line: tru+a _T_C I-ADVP
Skipping corrupt line: , , O
Skipping corrupt line: cu+`u A_C B-NP
Skipping corrupt line: cha.y IT_M B-VP
Skipping corrupt line: theo IT_M I-VP
Skipping corrupt line: me. H_C I-VP
Skipping corrupt line: ra IT_M B-PP
Skipping corrupt line: bo+` S_C I-PP
Skipping corrupt line: suo^'i S_C I-PP
Skipping corrupt line: . . O
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
Sorting and merging events...
at java.awt.Component.processMouseEvent(Component.java:6535)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
at java.awt.Component.processEvent(Component.java:6300)
at java.awt.Container.processEvent(Container.java:2236)
at java.awt.Component.dispatchEventImpl(Component.java:4891)
at java.awt.Container.dispatchEventImpl(Container.java:2294)
at java.awt.Component.dispatchEvent(Component.java:4713)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
at java.awt.Container.dispatchEventImpl(Container.java:2280)
at java.awt.Window.dispatchEventImpl(Window.java:2750)
at java.awt.Component.dispatchEvent(Component.java:4713)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
at java.awt.EventQueue.access$500(EventQueue.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:709)
at java.awt.EventQueue$3.run(EventQueue.java:703)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
at java.awt.EventQueue$4.run(EventQueue.java:731)
at java.awt.EventQueue$4.run(EventQueue.java:729)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
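Since every line of the sentence is reported as corrupt, one thing worth checking is whether each line really consists of three fields separated by single spaces, as in the sample data on the site. The sketch below assumes that this is what the chunker expects; ChunkLineCheck is a hypothetical helper written with plain JDK classes, not OpenNLP code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check (not part of OpenNLP). Assumption: a valid
// training line has exactly three fields (word, POS tag, chunk tag)
// separated by single spaces, and an empty line ends a sentence.
final class ChunkLineCheck {

    // Returns the 1-based numbers of the non-empty lines that do not
    // split into exactly three fields on a single space.
    static List<Integer> badLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // sentence boundary, not a data line
            }
            if (line.split(" ").length != 3) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "He PRP B-NP",        // three space-separated fields
                "reckons\tVBZ\tB-VP", // tab-separated: one field when split on " "
                "",                   // empty line between sentences
                "the  DT B-NP");      // double space: four fields
        System.out.println(badLines(lines)); // prints [2, 4]
    }
}
```

To check a real file, the lines can be read with Files.readAllLines(path, StandardCharsets.UTF_8) and passed to badLines.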
Would you please check these points for me?
Thank you so much for your help.
Best regards,
Trung Tran.
On 05/17/2016 08:15 PM, [email protected] wrote:
Dear Apache OpenNLP Project Team,
I have another error with the command line tool:
- I followed the instructions on the site exactly
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
The test file contains only the sample sentence from the site:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
deficit NN I-NP
will MD B-VP
narrow VB I-VP
to TO B-PP
only RB B-NP
# # I-NP
1.8 CD I-NP
billion CD I-NP
in IN B-PP
September NNP B-NP
. . O
And here is the error:
Computing event counts... done. 0 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:184)
at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
at opennlp.tools.cmdline.CLI.main(CLI.java:222)
Another point: the tool cannot read more than 2 sentences from one
training file.
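For the multi-sentence case, a quick way to see how many sentences a file contains is to count the blocks of lines separated by empty lines. This sketch assumes that an empty line is what marks the end of a sentence in the training format; SentenceCounter is a hypothetical helper, not part of OpenNLP.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of OpenNLP). Assumption: sentences in
// the training file are separated by an empty (or whitespace-only) line.
final class SentenceCounter {

    static int countSentences(List<String> lines) {
        int sentences = 0;
        boolean inSentence = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                inSentence = false;      // a blank line closes the current sentence
            } else if (!inSentence) {
                sentences++;             // first data line of a new sentence
                inSentence = true;
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> separated = Arrays.asList(
                "He PRP B-NP", ". . O", "", "It PRP B-NP", ". . O");
        List<String> fused = Arrays.asList(
                "He PRP B-NP", ". . O", "It PRP B-NP", ". . O");
        System.out.println(countSentences(separated)); // prints 2
        System.out.println(countSentences(fused));     // prints 1
    }
}
```

If the count is 1 for a file that should hold several sentences, the separating empty lines are missing.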
Would you please check these points for me?
Thank you so much for your help.
Best regards,
Trung Tran.
On 05/17/2016 02:06 PM, [email protected] wrote:
Dear Apache OpenNLP Project Team,
I have a critical issue when training with the Chunker tool in Java:
- Firstly, the sample code on the documentation site
(https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
does not work, for either version 1.5.3 or 1.6.0.
- Secondly, I had to edit the code myself as follows (using version
1.5.3):
try {
    Charset charset = Charset.forName("UTF-8");
    ObjectStream lineStream = new PlainTextByLineStream(new FileInputStream(fileChunker), charset);
    ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
    chunkerModel = ChunkerME.train("vn", sampleStream,
            TrainingParameters.defaultParams(), new ChunkerFactory());
    modelApacheChunkerPath = UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
    chunkerModel.serialize(modelOut);
} catch (FileNotFoundException fe) {
} catch (IOException ie) {
}
- Thirdly, I get the error "java.lang.String cannot be cast to
opennlp.tools.parser.Parse". The reason is:
+ The constructor of class ChunkSampleStream requires the
parameter "ObjectStream<Parse> in"
+ However, the second parameter of method
ChunkerME.train is "ObjectStream<ChunkSample> in"
I cannot find any way to work around this issue.
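To illustrate why such code compiles but fails only at run time, here is a small self-contained sketch of the same effect with a raw generic type. Stream, Parse and ParseConsumer are hypothetical stand-ins written for this illustration; they are not the OpenNLP classes.

```java
// Hypothetical stand-ins (not the OpenNLP types) showing how a raw
// generic type lets a stream of Strings reach code that expects a
// stream of Parse objects, so the error appears only at run time.
final class RawTypeDemo {

    interface Stream<T> {
        T read();
    }

    static final class Parse {
    }

    static final class ParseConsumer {
        private final Stream<Parse> in;

        ParseConsumer(Stream<Parse> in) {
            this.in = in;
        }

        Parse next() {
            // The compiler inserts a cast to Parse here.
            return in.read();
        }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String demo() {
        // Raw type: the compiler no longer knows this stream yields Strings.
        Stream lines = new Stream<String>() {
            @Override
            public String read() {
                return "He PRP B-NP";
            }
        };
        // Compiles with only an unchecked warning.
        ParseConsumer consumer = new ParseConsumer(lines);
        try {
            consumer.next();
            return "no error";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints ClassCastException
    }
}
```

Declaring the line stream with its element type (for example a stream of String instead of a raw type) would turn this kind of mismatch into a compile-time error.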
Would you please check this point for me?
Thank you so much for your help.
Best regards,
Trung Tran.