Which data exactly are you training on?
How many sentences does it contain?

Jörn

On Thu, May 26, 2016 at 2:49 PM, [email protected] <
[email protected]> wrote:

> Dear Apache OpenNLP Project Team,
>
> I have re-tested with the sample sentence from the site (
> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training)
> :
>
> He        PRP  B-NP
> reckons   VBZ  B-VP
> the       DT   B-NP
> current   JJ   I-NP
> account   NN   I-NP
> deficit   NN   I-NP
> will      MD   B-VP
> narrow    VB   I-VP
> to        TO   B-PP
> only      RB   B-NP
> #         #    I-NP
> 1.8       CD   I-NP
> billion   CD   I-NP
> in        IN   B-PP
> September NNP  B-NP
> .         .    O
>
> And I still receive the same error:
>
> Skipping corrupt line: He        PRP  B-NPreckons   VBZ B-VPthe       DT
>  B-NPcurrent   JJ   I-NPaccount   NN I-NPdeficit   NN   I-NPwill      MD
>  B-VPnarrow    VB I-VPto        TO   B-PPonly      RB   B-NP#         #
> I-NP1.8       CD   I-NPbillion   CD   I-NPin        IN B-PPSeptember NNP
> B-NP.         .    O
> Exception in thread "AWT-EventQueue-0"
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>     at java.util.ArrayList.get(ArrayList.java:429)
>     at
> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>     at
> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>     at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>     at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>     at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>     at
> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
>     at
> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
>     at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>     at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>     at
> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>     at
> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>     at
> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>     at
> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>     at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>     at
> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>     at
> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>     at java.awt.Component.processMouseEvent(Component.java:6535)
>     at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>     at java.awt.Component.processEvent(Component.java:6300)
>     at java.awt.Container.processEvent(Container.java:2236)
>     at java.awt.Component.dispatchEventImpl(Component.java:4891)
>     at java.awt.Container.dispatchEventImpl(Container.java:2294)
>     at java.awt.Component.dispatchEvent(Component.java:4713)
>     at
> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>     at
> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>     at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>     at java.awt.Container.dispatchEventImpl(Container.java:2280)
>     at java.awt.Window.dispatchEventImpl(Window.java:2750)
>     at java.awt.Component.dispatchEvent(Component.java:4713)
>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>     at java.awt.EventQueue.access$500(EventQueue.java:97)
>     at java.awt.EventQueue$3.run(EventQueue.java:709)
>     at java.awt.EventQueue$3.run(EventQueue.java:703)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>     at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>     at java.awt.EventQueue$4.run(EventQueue.java:731)
>     at java.awt.EventQueue$4.run(EventQueue.java:729)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at
> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>     at
> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>     at
> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>     at
> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>     at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
> Sorting and merging events...
>
> Here is the whole Java code:
>
> try {
>             Charset charset = Charset.forName("UTF-8");
>             File fileChunker = new File("trainApacheChunker.txt");
>             MarkableFileInputStreamFactory i = new
> MarkableFileInputStreamFactory(fileChunker);
>             ObjectStream lineStream = new PlainTextByLineStream(i,
> charset);
>             ObjectStream<ChunkSample> sampleStream = new
> ChunkSampleStream(lineStream);
>
>             chunkerModel = ChunkerME.train("en", sampleStream,
> TrainingParameters.defaultParams(), new ChunkerFactory());
>
>             modelApacheChunkerPath = "chunkerModel.bin";
>             OutputStream modelOut = new BufferedOutputStream(new
> FileOutputStream(modelApacheChunkerPath));
>             chunkerModel.serialize(modelOut);
>         } catch (FileNotFoundException fe) {
>
>         } catch (IOException ie) {
>
>         }
>
> Would you please check this point for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
>
> On 05/18/2016 04:56 AM, [email protected] wrote:
>
>> Dear Apache OpenNLP Project Team,
>>
>> Thank you so much for giving me very useful information about class (
>> /opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java
>> )
>>
>> It works very well.
>>
>> There is one more point: I get an error when training on Vietnamese
>> sentences (more than 2 sentences in one training file).
>>
>> Here are 2 example sentences from the file trainChunker.txt:
>>
>> buo^?i                _T_C               B-ADVP
>> tru+a               _T_C               I-ADVP
>> ,                       ,               O
>> cu+`u               A_C               B-NP
>> cha.y               IT_M               B-VP
>> theo               IT_M               I-VP
>> me.               H_C               I-VP
>> ra               IT_M               B-PP
>> bo+`               S_C               I-PP
>> suo^'i               S_C               I-PP
>> .               .               O
>>
>> nó               C_N_T           B-NP
>> tha^'y               S_P               B-VP
>> ba^`y               A_G               B-NP
>> hu+o+u               A_C               I-NP
>> nai               A_C               I-NP
>> ?ã               ST_P_S          B-CONJP
>> o+?               IT_P_C          B-PP
>> ?a^'y               C_N_T           I-PP
>> ro^`i               T_G               I-PP
>> .               .               O
>>
>> Here is the error right after training on the first sentence:
>>
>> Skipping corrupt line: buo^?i           _T_C           B-ADVP
>> Skipping corrupt line: tru+a           _T_C           I-ADVP
>> Skipping corrupt line: ,           ,           O
>> Skipping corrupt line: cu+`u           A_C           B-NP
>> Skipping corrupt line: cha.y           IT_M           B-VP
>> Skipping corrupt line: theo           IT_M           I-VP
>> Skipping corrupt line: me.           H_C           I-VP
>> Skipping corrupt line: ra           IT_M           B-PP
>> Skipping corrupt line: bo+`           S_C           I-PP
>> Skipping corrupt line: suo^'i           S_C           I-PP
>> Skipping corrupt line: .           .           O
>> Exception in thread "AWT-EventQueue-0"
>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>     at java.util.ArrayList.get(ArrayList.java:429)
>>     at
>> opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>     at
>> opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>     at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>     at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>     at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>     at
>> form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>>     at
>> form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>>     at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>     at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>     at
>> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>     at
>> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>     at
>> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>     at
>> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>     at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>     at
>> javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>     at
>> javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>> Sorting and merging events...     at
>> java.awt.Component.processMouseEvent(Component.java:6535)
>>     at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>     at java.awt.Component.processEvent(Component.java:6300)
>>     at java.awt.Container.processEvent(Container.java:2236)
>>     at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>     at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>     at
>> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>     at
>> java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>     at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>     at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>     at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>     at java.awt.EventQueue.access$500(EventQueue.java:97)
>>     at java.awt.EventQueue$3.run(EventQueue.java:709)
>>     at java.awt.EventQueue$3.run(EventQueue.java:703)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>     at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>     at java.awt.EventQueue$4.run(EventQueue.java:731)
>>     at java.awt.EventQueue$4.run(EventQueue.java:729)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at
>> java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>     at
>> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>     at
>> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>     at
>> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>     at
>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>     at
>> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>
>> Would you please check these points for me?
>>
>> Thank you so much for your help.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>> On 05/17/2016 08:15 PM, [email protected] wrote:
>>
>>> Dear Apache OpenNLP Project Team,
>>>
>>> I have another error with the command line tool:
>>>
>>>     - I followed the instructions on the site exactly (
>>> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool
>>> ):
>>>
>>> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model
>>> E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>>>
>>> The test file contains only the sample sentence from the site:
>>>
>>> He        PRP  B-NP
>>> reckons   VBZ  B-VP
>>> the       DT   B-NP
>>> current   JJ   I-NP
>>> account   NN   I-NP
>>> deficit   NN   I-NP
>>> will      MD   B-VP
>>> narrow    VB   I-VP
>>> to        TO   B-PP
>>> only      RB   B-NP
>>> #         #    I-NP
>>> 1.8       CD   I-NP
>>> billion   CD   I-NP
>>> in        IN   B-PP
>>> September NNP  B-NP
>>> .         .    O
>>> And here is the error:
>>>
>>>         Computing event counts...  done. 0 events
>>>         Indexing...  done.
>>> Sorting and merging events... Done indexing.
>>> Incorporating indexed data for training...
>>> Exception in thread "main" java.lang.NullPointerException
>>>         at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>>         at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>>         at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>>         at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
>>>         at
>>> opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
>>>         at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>>
>>>
>>> Another point: The function cannot read more than 2 sentences in one
>>> training file.
>>>
>>> Would you please check these points for me?
>>>
>>> Thank you so much for your help.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
>>> On 05/17/2016 02:06 PM, [email protected] wrote:
>>>
>>>> Dear Apache OpenNLP Project Team,
>>>>
>>>> I have a critical issue when training with the Chunker tool in Java:
>>>>
>>>>     - Firstly, the sample code on the documentation site (
>>>> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api)
>>>> does not work, for either version 1.5.3 or 1.6.0
>>>>
>>>>     - Secondly, I had to edit the code myself as follows (using version
>>>> 1.5.3):
>>>>
>>>> try {
>>>>             Charset charset = Charset.forName("UTF-8");
>>>>             ObjectStream lineStream = new PlainTextByLineStream(new
>>>> FileInputStream(fileChunker), charset);
>>>>             ObjectStream<ChunkSample> sampleStream = new
>>>> ChunkSampleStream(lineStream);
>>>>
>>>>             chunkerModel = ChunkerME.train("vn", sampleStream,
>>>> TrainingParameters.defaultParams(), new ChunkerFactory());
>>>>
>>>>             modelApacheChunkerPath =
>>>> UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
>>>>             OutputStream modelOut = new BufferedOutputStream(new
>>>> FileOutputStream(modelApacheChunkerPath));
>>>>             chunkerModel.serialize(modelOut);
>>>>         } catch (FileNotFoundException fe) {
>>>>
>>>>         } catch (IOException ie) {
>>>>
>>>>         }
>>>>
>>>>     - Thirdly, I get the error "java.lang.String cannot be cast to
>>>> opennlp.tools.parser.Parse". The reason is:
>>>>
>>>>             + The constructor of class ChunkSampleStream requires the
>>>> parameter "ObjectStream<Parse> in"
>>>>
>>>>             + However, the second parameter of method ChunkerME.train
>>>> is "ObjectStream<ChunkSample> in"
>>>>
>>>> I cannot find any way to work around this issue.
>>>>
>>>> Would you please check this point for me?
>>>>
>>>> Thank you so much for your help.
>>>>
>>>> Best regards,
>>>>
>>>> Trung Tran.
>>>>
>>>
>>>
>>
>
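
One possible cause of the "Skipping corrupt line" messages quoted above: as far as I can tell, the chunker's ChunkSampleStream splits each training line on a single space and expects exactly three fields (word, POS tag, chunk tag), so column-aligned samples padded with runs of spaces would be rejected. That is an assumption about the library internals and should be checked against the source of the version in use. It also would not explain the log in which a whole sentence is printed as one line, which may point to a line-ending problem in the file instead. A minimal sketch in plain Java (no OpenNLP calls; the class name ChunkLineNormalizer and its methods are made up for illustration) that collapses the whitespace before training:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: rewrites a chunker training file
// so that the three columns are separated by exactly one space.
final class ChunkLineNormalizer {

    // Trim the line and collapse every run of whitespace into one space.
    // A blank line (sentence separator) becomes the empty string.
    static String normalizeLine(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    // A normalized token line should have exactly word, POS tag, chunk tag.
    static boolean hasThreeFields(String normalized) {
        return normalized.split(" ").length == 3;
    }

    // Normalize all lines; fail early on a token line without three fields
    // instead of letting it be skipped silently later.
    static List<String> normalizeAll(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String normalized = normalizeLine(line);
            if (!normalized.isEmpty() && !hasThreeFields(normalized)) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            out.add(normalized);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChunkLineNormalizer <input> <output>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, normalizeAll(lines), StandardCharsets.UTF_8);
    }
}
```

Running it on trainApacheChunker.txt and training on the output file would show whether the column padding is what triggers the skipped lines. Blank lines are preserved, so sentence boundaries stay where they were.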
