On which data do you train exactly? How many sentences?

Jörn
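Both questions can be answered with a quick check of the training file before it is handed to the trainer. The sketch below is standalone Java and does not use OpenNLP. It rests on an assumption that the thread does not confirm: that ChunkSampleStream splits each line on a single space, expects exactly three fields (word, POS tag, chunk tag), and reports anything else as "Skipping corrupt line". Sentences are taken to be separated by an empty line. The class name ChunkTrainingFileCheck and its Report type are hypothetical helpers made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ChunkTrainingFileCheck {

    /** Counts collected while scanning the training file. */
    static final class Report {
        int sentences;   // blocks of non-empty lines separated by empty lines
        int tokenLines;  // lines with exactly three single-space-separated fields
        int badLines;    // non-empty lines with any other shape
    }

    static Report check(List<String> lines) {
        Report report = new Report();
        boolean inSentence = false;
        for (String line : lines) {
            if (line.isEmpty()) {
                // An empty line closes the current sentence.
                inSentence = false;
                continue;
            }
            if (!inSentence) {
                report.sentences++;
                inSentence = true;
            }
            // Assumption: the trainer splits on one space and wants three fields,
            // so tabs or runs of spaces between fields make a line unusable.
            if (line.split(" ").length == 3) {
                report.tokenLines++;
            } else {
                report.badLines++;
                System.err.println("Suspicious line: [" + line + "]");
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ChunkTrainingFileCheck.java trainApacheChunker.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Report r = check(lines);
        System.out.println("sentences=" + r.sentences
                + " tokenLines=" + r.tokenLines
                + " badLines=" + r.badLines);
    }
}
```

If the check reports zero token lines, the trainer would have no events to work with, which would be consistent with the "0 events" output and the IndexOutOfBoundsException quoted below. A file that comes back as a single bad line containing the whole sentence suggests the line breaks were lost when the file was written.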
On Thu, May 26, 2016 at 2:49 PM, [email protected] < [email protected]> wrote:
> Dear Apache OpenNLP Project Team,
>
> I have re-tested with sample sentence in the site (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training):
>
> He PRP B-NP
> reckons VBZ B-VP
> the DT B-NP
> current JJ I-NP
> account NN I-NP
> deficit NN I-NP
> will MD B-VP
> narrow VB I-VP
> to TO B-PP
> only RB B-NP
> # # I-NP
> 1.8 CD I-NP
> billion CD I-NP
> in IN B-PP
> September NNP B-NP
> . . O
>
> And I still receive the same error:
>
> Skipping corrupt line: He PRP B-NPreckons VBZ B-VPthe DT B-NPcurrent JJ I-NPaccount NN I-NPdeficit NN I-NPwill MD B-VPnarrow VB I-VPto TO B-PPonly RB B-NP# # I-NP1.8 CD I-NPbillion CD I-NPin IN B-PPSeptember NNP B-NP. . O
> Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>     at java.util.ArrayList.get(ArrayList.java:429)
>     at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>     at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>     at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>     at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>     at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>     at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2989)
>     at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1166)
>     at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>     at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>     at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>     at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>     at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>     at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>     at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>     at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>     at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>     at java.awt.Component.processMouseEvent(Component.java:6535)
>     at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>     at java.awt.Component.processEvent(Component.java:6300)
>     at java.awt.Container.processEvent(Container.java:2236)
>     at java.awt.Component.dispatchEventImpl(Component.java:4891)
>     at java.awt.Container.dispatchEventImpl(Container.java:2294)
>     at java.awt.Component.dispatchEvent(Component.java:4713)
>     at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>     at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>     at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>     at java.awt.Container.dispatchEventImpl(Container.java:2280)
>     at java.awt.Window.dispatchEventImpl(Window.java:2750)
>     at java.awt.Component.dispatchEvent(Component.java:4713)
>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>     at java.awt.EventQueue.access$500(EventQueue.java:97)
>     at java.awt.EventQueue$3.run(EventQueue.java:709)
>     at java.awt.EventQueue$3.run(EventQueue.java:703)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>     at java.awt.EventQueue$4.run(EventQueue.java:731)
>     at java.awt.EventQueue$4.run(EventQueue.java:729)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
> Sorting and merging events...
>
> Here are whole java code:
>
> try {
>     Charset charset = Charset.forName("UTF-8");
>     File fileChunker = new File("trainApacheChunker.txt");
>     MarkableFileInputStreamFactory i = new MarkableFileInputStreamFactory(fileChunker);
>     ObjectStream lineStream = new PlainTextByLineStream(i, charset);
>     ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
>
>     chunkerModel = ChunkerME.train("en", sampleStream, TrainingParameters.defaultParams(), new ChunkerFactory());
>
>     modelApacheChunkerPath = "chunkerModel.bin";
>     OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
>     chunkerModel.serialize(modelOut);
> } catch (FileNotFoundException fe) {
>
> } catch (IOException ie) {
>
> }
>
> Would you please check this point for me?
>
> Thank you so much for your help.
>
> Best regards,
>
> Trung Tran.
>
>
> On 05/18/2016 04:56 AM, [email protected] wrote:
>
>> Dear Apache OpenNLP Project Team,
>>
>> Thank you so much for giving me very useful information about class (/opennlp-tools/src/main/java/opennlp/tools/chunker/ChunkSampleStream.java)
>>
>> It works very well.
>>
>> There is one more point: I have error when train Vietnamese sentences (more than 2 sentences in one training file).
>>
>> Here is 2 example sentences in file trainChunker.txt:
>>
>> buo^?i _T_C B-ADVP
>> tru+a _T_C I-ADVP
>> , , O
>> cu+`u A_C B-NP
>> cha.y IT_M B-VP
>> theo IT_M I-VP
>> me. H_C I-VP
>> ra IT_M B-PP
>> bo+` S_C I-PP
>> suo^'i S_C I-PP
>> . . O
>>
>> nó C_N_T B-NP
>> tha^'y S_P B-VP
>> ba^`y A_G B-NP
>> hu+o+u A_C I-NP
>> nai A_C I-NP
>> ?ã ST_P_S B-CONJP
>> o+? IT_P_C B-PP
>> ?a^'y C_N_T I-PP
>> ro^`i T_G I-PP
>> . . O
>>
>> Here is the error right after train the first sentence:
>>
>> Skipping corrupt line: buo^?i _T_C B-ADVP
>> Skipping corrupt line: tru+a _T_C I-ADVP
>> Skipping corrupt line: , , O
>> Skipping corrupt line: cu+`u A_C B-NP
>> Skipping corrupt line: cha.y IT_M B-VP
>> Skipping corrupt line: theo IT_M I-VP
>> Skipping corrupt line: me. H_C I-VP
>> Skipping corrupt line: ra IT_M B-PP
>> Skipping corrupt line: bo+` S_C I-PP
>> Skipping corrupt line: suo^'i S_C I-PP
>> Skipping corrupt line: . . O
>> Exception in thread "AWT-EventQueue-0" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>>     at java.util.ArrayList.rangeCheck(ArrayList.java:653)
>>     at java.util.ArrayList.get(ArrayList.java:429)
>>     at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
>>     at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
>>     at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
>>     at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
>>     at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:217)
>>     at form.UtilitiesForm.handleTrainingTreebankWithApacheChunker(UtilitiesForm.java:2939)
>>     at form.UtilitiesForm.jmnitLoadTreebankActionPerformed(UtilitiesForm.java:1136)
>>     at form.UtilitiesForm.access$1400(UtilitiesForm.java:108)
>>     at form.UtilitiesForm$17.actionPerformed(UtilitiesForm.java:901)
>>     at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
>>     at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
>>     at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
>>     at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
>>     at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
>>     at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
>>     at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
>> Sorting and merging events...     at java.awt.Component.processMouseEvent(Component.java:6535)
>>     at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
>>     at java.awt.Component.processEvent(Component.java:6300)
>>     at java.awt.Container.processEvent(Container.java:2236)
>>     at java.awt.Component.dispatchEventImpl(Component.java:4891)
>>     at java.awt.Container.dispatchEventImpl(Container.java:2294)
>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>     at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888)
>>     at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525)
>>     at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466)
>>     at java.awt.Container.dispatchEventImpl(Container.java:2280)
>>     at java.awt.Window.dispatchEventImpl(Window.java:2750)
>>     at java.awt.Component.dispatchEvent(Component.java:4713)
>>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
>>     at java.awt.EventQueue.access$500(EventQueue.java:97)
>>     at java.awt.EventQueue$3.run(EventQueue.java:709)
>>     at java.awt.EventQueue$3.run(EventQueue.java:703)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
>>     at java.awt.EventQueue$4.run(EventQueue.java:731)
>>     at java.awt.EventQueue$4.run(EventQueue.java:729)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:76)
>>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
>>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
>>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
>>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
>>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>>
>> Would you please check these points for me?
>>
>> Thank you so much for your help.
>>
>> Best regards,
>>
>> Trung Tran.
>>
>> On 05/17/2016 08:15 PM, [email protected] wrote:
>>
>>> Dear Apache OpenNLP Project Team,
>>>
>>> I have another error with command line tool:
>>>
>>> - I did exactly as information in site (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.tool):
>>>
>>> E:\test\apache-opennlp-1.5.3\bin>opennlp.bat ChunkerTrainerME -model E:\test\en-chunker.bin -lang en -data E:\test\tmp.txt -encoding UTF-8
>>>
>>> File test only contains sample sentence as in the site :
>>>
>>> He PRP B-NP
>>> reckons VBZ B-VP
>>> the DT B-NP
>>> current JJ I-NP
>>> account NN I-NP
>>> deficit NN I-NP
>>> will MD B-VP
>>> narrow VB I-VP
>>> to TO B-PP
>>> only RB B-NP
>>> # # I-NP
>>> 1.8 CD I-NP
>>> billion CD I-NP
>>> in IN B-PP
>>> September NNP B-NP
>>> . . O
>>> And here is the error:
>>>
>>> Computing event counts... done. 0 events
>>> Indexing... done.
>>> Sorting and merging events... Done indexing.
>>> Incorporating indexed data for training...
>>> Exception in thread "main" java.lang.NullPointerException
>>>     at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>>     at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>>     at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>>     at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:214)
>>>     at opennlp.tools.cmdline.chunker.ChunkerTrainerTool.run(ChunkerTrainerTool.java:68)
>>>     at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>>
>>>
>>> Another point: The function cannot read more than 2 sentence in one train file.
>>>
>>> Would you please check these points for me?
>>>
>>> Thank you so much for your help.
>>>
>>> Best regards,
>>>
>>> Trung Tran.
>>>
>>> On 05/17/2016 02:06 PM, [email protected] wrote:
>>>
>>>> Dear Apache OpenNLP Project Team,
>>>>
>>>> I have an critical issue when training with Chunker tool in Java:
>>>>
>>>> - Firstly, the sample code in documentation site (https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.chunker.training.api) is not work, both for version 1.5.3 and 1.6.0
>>>>
>>>> - Secondly, I have to edit the codes myself to (using version 1.5.3):
>>>>
>>>> try {
>>>>     Charset charset = Charset.forName("UTF-8");
>>>>     ObjectStream lineStream = new PlainTextByLineStream(new FileInputStream(fileChunker), charset);
>>>>     ObjectStream<ChunkSample> sampleStream = new ChunkSampleStream(lineStream);
>>>>
>>>>     chunkerModel = ChunkerME.train("vn", sampleStream, TrainingParameters.defaultParams(), new ChunkerFactory());
>>>>
>>>>     modelApacheChunkerPath = UtilityHelper.getTemporaryFilePathInsideDir("chunkerModel.bin");
>>>>     OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelApacheChunkerPath));
>>>>     chunkerModel.serialize(modelOut);
>>>> } catch (FileNotFoundException fe) {
>>>>
>>>> } catch (IOException ie) {
>>>>
>>>> }
>>>>
>>>> - Thirdly, I have the error "java.lang.String cannot be cast to opennlp.tools.parser.Parse". The reason is:
>>>>
>>>>     + The constructor of class ChunkSampleStream requires parameter is "ObjectStream<Parse> in"
>>>>
>>>>     + However, the second parameter of method ChunkerME.train is "ObjectStream<ChunkSample> in"
>>>>
>>>> I cannot find any way to work around this issue.
>>>>
>>>> Would you please check this point for me?
>>>>
>>>> Thank you so much for your help.
>>>>
>>>> Best regards,
>>>>
>>>> Trung Tran.
>>>>
>>>
>>>
>>
>
