Hi,

Can you please try training with the CoNLL 2000 data and see if you
get the same error?

Also, can you please put your Vietnamese data somewhere online so we
can try to reproduce that error?

Best,

R



On Fri, Jun 3, 2016 at 11:30 PM, [email protected]
<[email protected]> wrote:
> Dear Apache OpenNLP Project Team,
>
> We really appreciate that you provide the wonderful OpenNLP tools, and we
> have already trained most of them successfully (Tokenizer, POS Tagger).
>
> There is only one small problem (we really believe this), described
> below, that we hit when training the Chunker.
>
> I hope that you will re-test it and send us some information soon so that
> we can fix this critical point.
>
> By the way, you are always an amazing team :)
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
>
> On 05/27/2016 08:23 AM, [email protected] wrote:
>>
>> Dear Apache OpenNLP Project Team,
>>
>> To help you reproduce the situation, I describe the experiment step by
>> step here:
>>
>>     - First, I carefully read the instructions on the site
>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>>
>> "The training data can be converted to the OpenNLP chunker training
>> format, that is based on CoNLL2000
>> <http://www.cnts.ua.ac.be/conll2000/chunking>. Other formats may also be
>> available. The train data consist of three columns separated by spaces. Each
>> word has been put on a separate line and there is an empty line after each
>> sentence. The first column contains the current word, the second its
>> part-of-speech tag and the third its chunk tag. The chunk tags contain the
>> name of the chunk type, for example I-NP for noun phrase words and I-VP for
>> verb phrase words. Most chunk types have two types of chunk tags, B-CHUNK
>> for the first word of the chunk and I-CHUNK for each other word in the
>> chunk. Here is an example of the file format:"
>>
>>     - Second, I created two test files (.txt). The first file contains
>> only the one sample sentence from the site:
>>
>> He        PRP  B-NP
>> reckons   VBZ  B-VP
>> the       DT   B-NP
>> current   JJ   I-NP
>> account   NN   I-NP
>> deficit   NN   I-NP
>> will      MD   B-VP
>> narrow    VB   I-VP
>> to        TO   B-PP
>> only      RB   B-NP
>> #         #    I-NP
>> 1.8       CD   I-NP
>> billion   CD   I-NP
>> in        IN   B-PP
>> September NNP  B-NP
>> .         .    O
>>
>>     - The second test file contains 300 Vietnamese sentences, laid out as
>> described on the site: each word is on a separate line and there is an
>> empty line after each sentence.
>>
>>     - Third, I ran the program twice, once on each file. Both times I got
>> the same error, right after the first sentence was read.
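A note on the format: as far as we can tell from the 1.6.0 source of opennlp.tools.chunker.ChunkSampleStream, "separated by spaces" means exactly one space. Each non-empty line appears to be split on a single ' ' character and must yield exactly three fields (token, POS tag, chunk tag); any other line is printed as "Skipping corrupt line". The manual's sample is aligned with several spaces, so every one of its lines would fail that test. The check below mirrors that assumed rule in plain Java; ChunkLineRule and isValidLine are hypothetical names, not OpenNLP API.

```java
public class ChunkLineRule {
    // Assumed rule (our reading of the chunker's ChunkSampleStream, not a
    // copy of it): split on a single space, require exactly three fields.
    static boolean isValidLine(String line) {
        return line.split(" ").length == 3;
    }

    public static void main(String[] args) {
        String aligned = "He        PRP  B-NP"; // column-aligned, as printed in the manual
        String single = "He PRP B-NP";         // one space between columns
        String tabbed = "He\tPRP\tB-NP";       // tabs are not treated as separators here

        System.out.println(isValidLine(aligned)); // false
        System.out.println(isValidLine(single));  // true
        System.out.println(isValidLine(tabbed));  // false
    }
}
```

The empty strings that String.split produces between consecutive spaces are what push the field count above three.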
>>
>> Would you please point out what I am missing?
>>
>> PS: I trained the Tokenizer and the POS Tagger successfully by following
>> the instructions on this site :)
>>
>> Thank you so much for helping me.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>>
>> On 05/26/2016 07:49 PM, [email protected] wrote:
>>>
>>> Dear Apache OpenNLP Project Team,
>>>
>>> I have re-tested with the sample sentence from the site
>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>>>
>>> He        PRP  B-NP
>>> reckons   VBZ  B-VP
>>> the       DT   B-NP
>>> current   JJ   I-NP
>>> account   NN   I-NP
>>> deficit   NN   I-NP
>>> will      MD   B-VP
>>> narrow    VB   I-VP
>>> to        TO   B-PP
>>> only      RB   B-NP
>>> #         #    I-NP
>>> 1.8       CD   I-NP
>>> billion   CD   I-NP
>>> in        IN   B-PP
>>> September NNP  B-NP
>>> .         .    O
>>>
>>> And I still receive the same error:
>>>
>>> Skipping corrupt line: He        PRP  B-NPreckons   VBZ B-VPthe       DT
>>> B-NPcurrent   JJ   I-NPaccount   NN I-NPdeficit   NN   I-NPwill      MD
>>> B-VPnarrow    VB I-VPto        TO   B-PPonly      RB   B-NP#         #
>>> I-NP1.8       CD   I-NPbillion   CD   I-NPin        IN B-PPSeptember NNP
>>> B-NP.         .    O
>>> Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>>     at java.util.ArrayList.get(ArrayList.java:429)
>>>     at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>>     at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>>     at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>>     at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>>     at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>>     at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
>>>     at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
>>>     at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>>     at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>>     at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>>     at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>>     at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>>     at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>>     at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>>     at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>>     at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>>     at java.awt.Component.processMouseEvent(Component.java:6535)
>>>     at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>>     at java.awt.Component.processEvent(Component.java:6300)
>>>     at java.awt.Container.processEvent(Container.java:2236)
>>>     at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>>     at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>>     at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>>     at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>>     at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>>     at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>>     at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>>     at java.awt.EventQueue.access$500(EventQueue.java:97)
>>>     at java.awt.EventQueue$3.run(EventQueue.java:709)
>>>     at java.awt.EventQueue$3.run(EventQueue.java:703)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>>     at java.awt.EventQueue$4.run(EventQueue.java:731)
>>>     at java.awt.EventQueue$4.run(EventQueue.java:729)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>> Sorting and merging events...
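If the chunker's read loop works the way we read it in the 1.6.0 source, this trace follows from the "Skipping corrupt line" output: when every line of a sentence is skipped, read() collects no tokens and returns null, which also signals end of stream. The trainer then receives no events (compare the "0 events" line in the 17 May command-line log), and an index lookup on an empty list fails, which is consistent with "Index: 0, Size: 0" in sortAndMerge. It would also explain why, in the Vietnamese run, only the first sentence's lines are reported. The simulation below is a simplified stdlib re-creation under that assumption, not the OpenNLP class; ChunkReadSimulation, readSentence and countSentences are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ChunkReadSimulation {
    // Simplified re-creation of the read loop we believe
    // opennlp.tools.chunker.ChunkSampleStream uses; NOT the OpenNLP class itself.
    // Returns the tokens of the next sentence, or null if none were collected.
    static List<String> readSentence(BufferedReader in) throws IOException {
        List<String> tokens = new ArrayList<>();
        for (String line = in.readLine(); line != null && !line.isEmpty(); line = in.readLine()) {
            String[] parts = line.split(" ");
            if (parts.length != 3) {
                System.err.println("Skipping corrupt line: " + line);
            } else {
                tokens.add(parts[0]);
            }
        }
        // An empty result doubles as "end of stream", so later
        // sentences in the file are never reached.
        return tokens.isEmpty() ? null : tokens;
    }

    // Counts how many sentences a trainer would receive from the given text.
    static int countSentences(String text) {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            int count = 0;
            while (readSentence(in) != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // First sentence column-aligned, second one correctly single-spaced.
        String mixed = "He        PRP  B-NP\nreckons   VBZ  B-VP\n\nIt PRP B-NP\n";
        String single = "He PRP B-NP\nreckons VBZ B-VP\n\nIt PRP B-NP\n";
        System.out.println(countSentences(mixed));  // 0: the valid second sentence is never read
        System.out.println(countSentences(single)); // 2
    }
}
```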
>>>
>>> Here is the whole Java code:
>>>
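>>> // Assumed imports: java.io.*, java.nio.charset.Charset, opennlp.tools.util.*,
>>> // and from opennlp.tools.chunker: ChunkSample, ChunkSampleStream,
>>> // ChunkerFactory, ChunkerME, ChunkerModel
>>> try {
>>>     Charset charset = Charset.forName("UTF-8");
>>>     File fileChunker = new File("trainApacheChunker.txt");
>>>     MarkableFileInputStreamFactory in = new MarkableFileInputStreamFactory(fileChunker);
>>>     // Typed as ObjectStream<String>: each element is one line of the file
>>>     ObjectStream<String> lineStream = new PlainTextByLineStream(in, charset);
>>>     // Turns "token POS-tag chunk-tag" lines into ChunkSamples
>>>     ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
>>>     try {
>>>         chunkerModel = ChunkerME.train("en", sampleStream,
>>>                 TrainingParameters.defaultParams(), new ChunkerFactory());
>>>     } finally {
>>>         sampleStream.close();
>>>     }
>>>
>>>     modelApacheChunkerPath = "chunkerModel.bin";
>>>     // try-with-resources flushes and closes the model file
>>>     try (OutputStream modelOut = new BufferedOutputStream(
>>>             new FileOutputStream(modelApacheChunkerPath))) {
>>>         chunkerModel.serialize(modelOut);
>>>     }
>>> } catch (IOException e) {
>>>     // Also covers FileNotFoundException; report it instead of swallowing it
>>>     e.printStackTrace();
>>> }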
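One way to keep the training code unchanged is to normalize the file before it reaches PlainTextByLineStream. The stdlib-only sketch below collapses each run of spaces or tabs to a single space, keeps exactly one blank line between sentences, and drops blank lines at the top of the file. The blank-line handling rests on an assumption: if the chunker reader treats an empty sentence as end of stream (our reading of the 1.6.0 source), then a leading blank line or two consecutive blank lines would also stop it early. ChunkFileNormalizer and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChunkFileNormalizer {
    static List<String> normalize(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String raw : lines) {
            // Collapse runs of spaces/tabs between columns into one space
            String line = raw.trim().replaceAll("\\s+", " ");
            if (line.isEmpty()) {
                // At most one blank line between sentences, none at the top
                if (!out.isEmpty() && !out.get(out.size() - 1).isEmpty()) {
                    out.add("");
                }
            } else {
                out.add(line);
            }
        }
        return out;
    }

    static void normalizeFile(Path input, Path output) {
        try {
            List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
            Files.write(output, normalize(lines), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java ChunkFileNormalizer <input> <output>");
            return;
        }
        normalizeFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For example, `java ChunkFileNormalizer trainApacheChunker.txt trainApacheChunker.normalized.txt`, then point the File in the training code at the second (hypothetical) file name.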
>>>
>>> Would you please check this point for me?
>>>
>>> Thank you so much for your help.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
>>>
>>> On 05/18/2016 04:56 AM, [email protected] wrote:
>>>>
>>>> Dear Apache OpenNLP Project Team,
>>>>
>>>> Thank you so much for the very useful information about the class
>>>> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java.
>>>>
>>>> It works very well.
>>>>
>>>> One more point: I get an error when training on Vietnamese sentences
>>>> (more than two sentences in one training file).
>>>>
>>>> Here are two example sentences from the file trainChunker.txt:
>>>>
>>>> buo^?i                _T_C               B-ADVP
>>>> tru+a               _T_C               I-ADVP
>>>> ,                       ,               O
>>>> cu+`u               A_C               B-NP
>>>> cha.y               IT_M               B-VP
>>>> theo               IT_M               I-VP
>>>> me.               H_C               I-VP
>>>> ra               IT_M               B-PP
>>>> bo+`               S_C               I-PP
>>>> suo^'i               S_C               I-PP
>>>> .               .               O
>>>>
>>>> nó               C_N_T           B-NP
>>>> tha^'y               S_P               B-VP
>>>> ba^`y               A_G               B-NP
>>>> hu+o+u               A_C               I-NP
>>>> nai               A_C               I-NP
>>>> ?ã               ST_P_S          B-CONJP
>>>> o+?               IT_P_C          B-PP
>>>> ?a^'y               C_N_T           I-PP
>>>> ro^`i               T_G               I-PP
>>>> .               .               O
>>>>
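For a 300-sentence file it may help to list every offending line rather than read the "Skipping corrupt line" output by eye. The sketch below applies the same assumed rule (split on one space, expect three fields) and reports 1-based line numbers, flagging tabs separately since they would not count as separators under that rule. ChunkFileReport and problems are hypothetical names; for a real file, the list can come from Files.readAllLines(path, StandardCharsets.UTF_8).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkFileReport {
    // Returns one message per non-blank line that would not split into
    // exactly three fields on a single ' ' (the rule we believe the chunker
    // reader applies). Line numbers are 1-based, as in a text editor.
    static List<String> problems(List<String> lines) {
        List<String> report = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // sentence separator
            }
            int fields = line.split(" ").length;
            if (fields != 3) {
                String hint = line.indexOf('\t') >= 0 ? " (contains a tab)" : "";
                report.add("line " + (i + 1) + ": expected 3 fields, got " + fields + hint);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "buo^?i                _T_C               B-ADVP", // aligned, as in the mail
                "tru+a _T_C I-ADVP",                               // single spaces: accepted
                "",
                "nó\tC_N_T\tB-NP");                                // tab-separated
        problems(sample).forEach(System.out::println);
    }
}
```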
>>>> Here is the error, raised right after training on the first sentence:
>>>>
>>>> Skipping corrupt line: buo^?i           _T_C           B-ADVP
>>>> Skipping corrupt line: tru+a           _T_C           I-ADVP
>>>> Skipping corrupt line: ,           ,           O
>>>> Skipping corrupt line: cu+`u           A_C           B-NP
>>>> Skipping corrupt line: cha.y           IT_M           B-VP
>>>> Skipping corrupt line: theo           IT_M           I-VP
>>>> Skipping corrupt line: me.           H_C           I-VP
>>>> Skipping corrupt line: ra           IT_M           B-PP
>>>> Skipping corrupt line: bo+`           S_C           I-PP
>>>> Skipping corrupt line: suo^'i           S_C           I-PP
>>>> Skipping corrupt line: .           .           O
>>>> Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>>>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>>>     at java.util.ArrayList.get(ArrayList.java:429)
>>>>     at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>>>     at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>>>     at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>>>     at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>>>     at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>>>     at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>>>>     at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>>>>     at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>>>     at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>>>     at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>>>     at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>>>     at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>>>     at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>>>     at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>>>     at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>>>     at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>>>> Sorting and merging events...
>>>>     at java.awt.Component.processMouseEvent(Component.java:6535)
>>>>     at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>>>     at java.awt.Component.processEvent(Component.java:6300)
>>>>     at java.awt.Container.processEvent(Container.java:2236)
>>>>     at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>>>     at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>>>     at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>>>     at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>>>     at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>>>     at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>>>     at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>>>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>>>     at java.awt.EventQueue.access$500(EventQueue.java:97)
>>>>     at java.awt.EventQueue$3.run(EventQueue.java:709)
>>>>     at java.awt.EventQueue$3.run(EventQueue.java:703)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>>>     at java.awt.EventQueue$4.run(EventQueue.java:731)
>>>>     at java.awt.EventQueue$4.run(EventQueue.java:729)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>>>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>>>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>>>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>>>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>>>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>>>
>>>> Would you please check these points for me?
>>>>
>>>> Thank you so much for your help.
>>>>
>>>> Best regards,
>>>>
>>>> Trung Tran.
>>>>
>>>> On 05/17/2016 08:15 PM, [email protected] wrote:
>>>>>
>>>>> Dear Apache OpenNLP Project Team,
>>>>>
>>>>> I have another error with the command-line tool:
>>>>>
>>>>>     - I followed the instructions on the site exactly
>>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>>>>>
>>>>> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>>>>>
>>>>> The test file contains only the sample sentence from the site:
>>>>>
>>>>> He        PRP  B-NP
>>>>> reckons   VBZ  B-VP
>>>>> the       DT   B-NP
>>>>> current   JJ   I-NP
>>>>> account   NN   I-NP
>>>>> deficit   NN   I-NP
>>>>> will      MD   B-VP
>>>>> narrow    VB   I-VP
>>>>> to        TO   B-PP
>>>>> only      RB   B-NP
>>>>> #         #    I-NP
>>>>> 1.8       CD   I-NP
>>>>> billion   CD   I-NP
>>>>> in        IN   B-PP
>>>>> September NNP  B-NP
>>>>> .         .    O
>>>>>
>>>>> And here is the error:
>>>>>
>>>>>         Computing event counts...  done. 0 events
>>>>>         Indexing...  done.
>>>>> Sorting and merging events... Done indexing.
>>>>> Incorporating indexed data for training...
>>>>> Exception in thread "main" java.lang.NullPointerException
>>>>>         at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>>>>         at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>>>>         at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>>>>         at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
>>>>>         at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
>>>>>         at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>>>>
>>>>>
>>>>> Another point: the function cannot read more than two sentences in one
>>>>> training file.
>>>>>
>>>>> Would you please check these points for me?
>>>>>
>>>>> Thank you so much for your help.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Trung Tran.
>>>>>
>>>>> On 05/17/2016 02:06 PM, [email protected] wrote:
>>>>>>
>>>>>> Dear Apache OpenNLP Project Team,
>>>>>>
>>>>>> I have a critical issue when training the Chunker in Java:
>>>>>>
>>>>>>     - First, the sample code on the documentation site
>>>>>> (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
>>>>>> does not work, in either version 1.5.3 or 1.6.0.
>>>>>>
>>>>>>     - Second, I had to edit the code myself as follows (using version
>>>>>> 1.5.3):
>>>>>>
>>>>>> try {
>>>>>>     Charset charset = Charset.forName("UTF-8");
>>>>>>     ObjectStream lineStream = new PlainTextByLineStream(
>>>>>>             new FileInputStream(fileChunker), charset);
>>>>>>     ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
>>>>>>
>>>>>>     // "vi" is the ISO 639-1 code for Vietnamese ("vn" is the country code)
>>>>>>     chunkerModel = ChunkerME.train("vi", sampleStream,
>>>>>>             TrainingParameters.defaultParams(), new ChunkerFactory());
>>>>>>
>>>>>>     modelApacheChunkerPath =
>>>>>>             UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
>>>>>>     // try-with-resources flushes and closes the model file
>>>>>>     try (OutputStream modelOut = new BufferedOutputStream(
>>>>>>             new FileOutputStream(modelApacheChunkerPath))) {
>>>>>>         chunkerModel.serialize(modelOut);
>>>>>>     }
>>>>>> } catch (IOException e) {
>>>>>>     // Also covers FileNotFoundException; report it instead of swallowing it
>>>>>>     e.printStackTrace();
>>>>>> }
>>>>>>
>>>>>>     - Third, I get the error "java.lang.String cannot be cast to
>>>>>> opennlp.tools.parser.Parse". The reason is:
>>>>>>
>>>>>>             + The constructor of the ChunkSampleStream class requires the
>>>>>> parameter "ObjectStream<Parse> in".
>>>>>>
>>>>>>             + However, the second parameter of the method ChunkerME.train
>>>>>> is "ObjectStream<ChunkSample> in".
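If, as the 05/18 message suggests, the class used here was opennlp.tools.parser.ChunkSampleStream rather than opennlp.tools.chunker.ChunkSampleStream, the cast error comes from Java's raw types. lineStream is declared as a plain ObjectStream, so the compiler accepts it where an ObjectStream<Parse> is expected and only issues an unchecked warning; the String-to-Parse cast then fails at run time, when the first line is read. The stdlib-only sketch below reproduces that mechanism with a List. firstLength and the StringBuilder element type are hypothetical stand-ins for the parser stream and Parse.

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypeDemo {
    // Hypothetical stand-in for a wrapper whose input must hold one element
    // type (StringBuilder here plays the role of Parse).
    static int firstLength(List<StringBuilder> items) {
        StringBuilder first = items.get(0); // compiler-inserted cast happens here
        return first.length();
    }

    // Like the quoted code, the collection is declared with a raw type,
    // so the compiler cannot check what it holds.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int passRawList() {
        List raw = new ArrayList();
        raw.add("He PRP B-NP");  // a String, like a line from PlainTextByLineStream
        return firstLength(raw); // accepted at compile time with an unchecked warning
    }

    public static void main(String[] args) {
        try {
            passRawList();
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

Declaring the stream as ObjectStream<String> would turn the same mix-up into a compile-time error instead.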
>>>>>>
>>>>>> I cannot find any way to work around this issue.
>>>>>>
>>>>>> Would you please check this point for me?
>>>>>>
>>>>>> Thank you so much for your help.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Trung Tran.
>>>>>
>>>>>
>>>>
>>>
>>
>
