Dear Apache OpenNLP Project Team,

I have re-tested with the sample sentence from the site (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP
current   JJ   I-NP
account   NN   I-NP
deficit   NN   I-NP
will      MD   B-VP
narrow    VB   I-VP
to        TO   B-PP
only      RB   B-NP
#         #    I-NP
1.8       CD   I-NP
billion   CD   I-NP
in        IN   B-PP
September NNP  B-NP
.         .    O

And I still receive the same error:

Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe DT B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN I-NPwill MD B-VPnarrow VB I-VPto TO B-PPonly RB B-NP# # I-NP1.8 CD I-NPbillion CD I-NPin IN B-PPSeptember NNP B-NP. . O
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
    at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
    at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
    at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
    at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
    at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
    at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
    at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
    at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
    at java.awt.Component.processMouseEvent(Component.java:6535)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
    at java.awt.Component.processEvent(Component.java:6300)
    at java.awt.Container.processEvent(Container.java:2236)
    at java.awt.Component.dispatchEventImpl(Component.java:4891)
    at java.awt.Container.dispatchEventImpl(Container.java:2294)
    at java.awt.Component.dispatchEvent(Component.java:4713)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
    at java.awt.Container.dispatchEventImpl(Container.java:2280)
    at java.awt.Window.dispatchEventImpl(Window.java:2750)
    at java.awt.Component.dispatchEvent(Component.java:4713)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
    at java.awt.EventQueue.access$500(EventQueue.java:97)
    at java.awt.EventQueue$3.run(EventQueue.java:709)
    at java.awt.EventQueue$3.run(EventQueue.java:703)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
    at java.awt.EventQueue$4.run(EventQueue.java:731)
    at java.awt.EventQueue$4.run(EventQueue.java:729)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
Sorting and merging events...
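
One possible reading of the "Skipping corrupt line" message, offered as an assumption to be checked against the ChunkSampleStream source rather than as a confirmed diagnosis: the reader may split each line on a single space and expect exactly three fields, in which case the column-aligned sample (with runs of spaces between fields) would be rejected. The sketch below uses only the Java standard library and collapses each run of whitespace to one space. The class and method names are hypothetical and not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of OpenNLP): rewrites training lines as "token tag chunk". */
class ChunkLineNormalizer {

    /** Collapses runs of spaces or tabs to single spaces; a blank line stays blank (sentence separator). */
    static String normalize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return String.join(" ", trimmed.split("\\s+"));
    }

    /** Applies normalize() to every line, preserving line order and blank separators. */
    static List<String> normalizeAll(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(normalize(line));
        }
        return out;
    }

    public static void main(String[] args) {
        // Lines copied from the sample above, including the aligned spacing.
        List<String> sample = Arrays.asList(
                "He        PRP  B-NP",
                "reckons   VBZ  B-VP",
                "",
                ".         .    O");
        for (String line : normalizeAll(sample)) {
            System.out.println(line);
        }
    }
}
```

The normalized lines could then be written to a new file and passed to the trainer in place of the original; whether that removes the error depends on the assumption above being correct.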

Here is the complete Java code:

try {
    Charset charset = Charset.forName("UTF-8");
    File fileChunker = new File("trainApacheChunker.txt");
    MarkableFileInputStreamFactory i = new MarkableFileInputStreamFactory(fileChunker);
    ObjectStream lineStream = new PlainTextByLineStream(i, charset);
    ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);

    chunkerModel = ChunkerME.train("en", sampleStream, TrainingParameters.defaultParams(), new ChunkerFactory());

    modelApacheChunkerPath = "chunkerModel.bin";
    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
    chunkerModel.serialize(modelOut);
} catch (FileNotFoundException fe) {

} catch (IOException ie) {

}

Would you please check this point for me?

Thank you so much for your help.

Best regards,

Trung Tran.


On 05/18/2016 04:56 AM, [email protected] wrote:
Dear Apache OpenNLP Project Team,

Thank you so much for the very useful information about the class (/opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java).

It works very well.

There is one more point: I get an error when training on Vietnamese sentences (more than 2 sentences in one training file).

Here are 2 example sentences from the file trainChunker.txt:

buo^?i                _T_C               B-ADVP
tru+a               _T_C               I-ADVP
,                       ,               O
cu+`u               A_C               B-NP
cha.y               IT_M               B-VP
theo               IT_M               I-VP
me.               H_C               I-VP
ra               IT_M               B-PP
bo+`               S_C               I-PP
suo^'i               S_C               I-PP
.               .               O

nó               C_N_T           B-NP
tha^'y               S_P               B-VP
ba^`y               A_G               B-NP
hu+o+u               A_C               I-NP
nai               A_C               I-NP
?ã               ST_P_S          B-CONJP
o+?               IT_P_C          B-PP
?a^'y               C_N_T           I-PP
ro^`i               T_G               I-PP
.               .               O

Here is the error that appears right after training on the first sentence:

Skipping corrupt line: buo^?i           _T_C           B-ADVP
Skipping corrupt line: tru+a           _T_C           I-ADVP
Skipping corrupt line: ,           ,           O
Skipping corrupt line: cu+`u           A_C           B-NP
Skipping corrupt line: cha.y           IT_M           B-VP
Skipping corrupt line: theo           IT_M           I-VP
Skipping corrupt line: me.           H_C           I-VP
Skipping corrupt line: ra           IT_M           B-PP
Skipping corrupt line: bo+`           S_C           I-PP
Skipping corrupt line: suo^'i           S_C           I-PP
Skipping corrupt line: .           .           O
Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
    at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
    at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
    at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
    at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
    at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
    at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
    at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
    at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
Sorting and merging events...
    at java.awt.Component.processMouseEvent(Component.java:6535)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
    at java.awt.Component.processEvent(Component.java:6300)
    at java.awt.Container.processEvent(Container.java:2236)
    at java.awt.Component.dispatchEventImpl(Component.java:4891)
    at java.awt.Container.dispatchEventImpl(Container.java:2294)
    at java.awt.Component.dispatchEvent(Component.java:4713)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
    at java.awt.Container.dispatchEventImpl(Container.java:2280)
    at java.awt.Window.dispatchEventImpl(Window.java:2750)
    at java.awt.Component.dispatchEvent(Component.java:4713)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
    at java.awt.EventQueue.access$500(EventQueue.java:97)
    at java.awt.EventQueue$3.run(EventQueue.java:709)
    at java.awt.EventQueue$3.run(EventQueue.java:703)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
    at java.awt.EventQueue$4.run(EventQueue.java:731)
    at java.awt.EventQueue$4.run(EventQueue.java:729)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
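
Since the log above reports every line of the first sentence as corrupt, a small pre-flight check on the training file may help narrow things down before calling the trainer. The sketch below is standard-library Java only; it counts sentences (blocks of lines separated by an empty line) and lists lines that do not consist of exactly three non-empty fields separated by single spaces. That validity rule is an assumption about what the reader expects, and the class and method names are hypothetical, not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight check (not part of OpenNLP) for chunker training data. */
class ChunkFileCheck {

    /** Scan result: number of sentences and 1-based numbers of lines that fail isValid(). */
    static final class Report {
        int sentences;
        final List<Integer> badLines = new ArrayList<>();
    }

    /** Assumed rule: exactly three non-empty fields when the line is split on a single space. */
    static boolean isValid(String line) {
        String[] parts = line.split(" ");
        if (parts.length != 3) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    /** Treats each run of non-empty lines as one sentence; only a truly empty line ends a sentence. */
    static Report scan(List<String> lines) {
        Report report = new Report();
        boolean inSentence = false;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                inSentence = false;
                continue;
            }
            if (!inSentence) {
                report.sentences++;
                inSentence = true;
            }
            if (!isValid(line)) {
                report.badLines.add(i + 1);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Two short sentences; the second keeps the wide column spacing seen in the report.
        List<String> lines = Arrays.asList(
                "He PRP B-NP",
                ". . O",
                "",
                "nai               A_C               I-NP",
                ". . O");
        Report report = scan(lines);
        System.out.println("sentences: " + report.sentences);
        System.out.println("lines failing the assumed rule: " + report.badLines);
    }
}
```

If every data line of a file fails this check, the trainer would receive no usable samples, which would be consistent with the empty-list exception in the trace; this is an inference, not something the log states directly.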

Would you please check these points for me?

Thank you so much for your help.

Best regards,

Trung Tran.

On 05/17/2016 08:15 PM, [email protected] wrote:
Dear Apache OpenNLP Project Team,

I have another error with the command-line tool:

- I followed the instructions on the site exactly (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):

E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8

The test file contains only the sample sentence from the site:

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP
current   JJ   I-NP
account   NN   I-NP
deficit   NN   I-NP
will      MD   B-VP
narrow    VB   I-VP
to        TO   B-PP
only      RB   B-NP
#         #    I-NP
1.8       CD   I-NP
billion   CD   I-NP
in        IN   B-PP
September NNP  B-NP
.         .    O
And here is the error:

        Computing event counts...  done. 0 events
        Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
        at opennlp.maxent.GIS.trainModel(GIS.java:256)
        at opennlp.model.TrainUtil.train(TrainUtil.java:184)
        at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
        at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
        at opennlp.tools.cmdline.CLI.main(CLI.java:222)


Another point: the tool cannot read more than 2 sentences in one training file.

Would you please check these points for me?

Thank you so much for your help.

Best regards,

Trung Tran.

On 05/17/2016 02:06 PM, [email protected] wrote:
Dear Apache OpenNLP Project Team,

I have a critical issue when training with the Chunker tool in Java:

- Firstly, the sample code on the documentation site (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api) does not work, for either version 1.5.3 or 1.6.0.

- Secondly, I had to edit the code myself as follows (using version 1.5.3):

try {
    Charset charset = Charset.forName("UTF-8");
    ObjectStream lineStream = new PlainTextByLineStream(new FileInputStream(fileChunker), charset);
    ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);

    chunkerModel = ChunkerME.train("vn", sampleStream, TrainingParameters.defaultParams(), new ChunkerFactory());

    modelApacheChunkerPath = UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
    chunkerModel.serialize(modelOut);
} catch (FileNotFoundException fe) {

} catch (IOException ie) {

}

- Thirdly, I get the error "java.lang.String cannot be cast to opennlp.tools.parser.Parse". The reason is:

+ The constructor of class ChunkSampleStream requires the parameter "ObjectStream<Parse> in"

+ However, the second parameter of method ChunkerME.train is "ObjectStream<ChunkSample> in"

I cannot find any way to work around this issue.
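
A note on why this kind of mismatch shows up at run time rather than at compile time: the snippet above declares lineStream with the raw type ObjectStream, so the compiler cannot check what element type the stream carries. The sketch below reproduces the same effect with standard-library types only; Iterator and Integer are stand-ins for ObjectStream and Parse, and the class name is hypothetical. A later message in this thread reports that the class at opennlp/tools/chunker/ChunkSampleStream.java works, which suggests the import had resolved to a different class with the same simple name; that reading is an inference, not something the thread states outright.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Stand-in demo (no OpenNLP classes): a raw type lets Strings reach code that expects another type. */
class RawTypeCastDemo {

    /** Plays the role of a consumer that expects Integer elements (analogous to expecting Parse). */
    static int firstPlusOne(Iterator<Integer> in) {
        return in.next() + 1; // the compiler-inserted cast to Integer happens here
    }

    /** Passes Strings through a raw reference and returns the simple name of the resulting exception. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String runWithRawType() {
        List<String> lines = Arrays.asList("He PRP B-NP");
        Iterator raw = lines.iterator(); // raw type: the element type is no longer checked
        try {
            firstPlusOne(raw); // compiles with only an unchecked warning
            return "no error";
        } catch (ClassCastException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithRawType());
    }
}
```

Declaring the line stream with its element type (for example ObjectStream<String>) would let the compiler flag a constructor that expects a different element type, instead of leaving the failure to run time.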

Would you please check this point for me?

Thank you so much for your help.

Best regards,

Trung Tran.


