Hello,

do you think the examples are incorrect in general, or just the brackets? I noticed in the code that brackets should get translated to "LRB" automatically, so I found no obvious mistake in my training data. But I'm not sure whether the -NONE- tag is correct or whether I can omit unknown POS tags.
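For reference, a minimal sketch of how the raw bracket leaves could be escaped in the training file before running the trainer. It assumes the Penn Treebank convention of -LRB- and -RRB- for ( and ); the function name escape_bracket_leaves is hypothetical and not part of OpenNLP, and it is untested whether this makes the NullPointerException go away.

```python
import re

# Assumed Penn Treebank style escapes for round bracket tokens.
BRACKET_ESCAPES = {"(": "-LRB-", ")": "-RRB-"}

# A leaf whose token is a single raw bracket, e.g. "(-NONE- ()" or "(-NONE- ))".
# The tag may not contain whitespace or brackets.
RAW_BRACKET_LEAF = re.compile(r"\(([^\s()]+) ([()])\)")


def escape_bracket_leaves(tree_line):
    """Replace raw "(" / ")" tokens in leaves with their escaped form.

    Hypothetical helper for preprocessing one bracketed tree per line;
    all other leaves and the tree structure are left untouched.
    """
    def replace(match):
        tag, token = match.group(1), match.group(2)
        return "({} {})".format(tag, BRACKET_ESCAPES[token])

    return RAW_BRACKET_LEAF.sub(replace, tree_line)


if __name__ == "__main__":
    line = "(TOP (-NONE- () (CNP (NE AP) (NE jod) )(-NONE- /) (-NONE- )) (. .) )"
    print(escape_bracket_leaves(line))
```

Note that a plain count of opening and closing brackets would not flag the JOHANNESBURG example, because the extra ( and the extra ) cancel each other out; that is why the sketch looks for bracket leaves directly.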
All the best

Andreas

On 20.03.2014 17:48, Rodrigo Agerri wrote:
> Have you tried correcting them?
>
> Cheers,
>
> Rodrigo
>
> On 2014/03/20 at 16:34, Andreas Niekler wrote:
>> Hi,
>>
>> I converted the XML Tiger Corpus to the training format:
>>
>> (TOP (S (NN Zugeständnisse) (VP (ADJD unzureichend) (VVPP genannt)
>> ))(-NONE- /) )
>> (TOP (-NONE- ``) (VP (NN Land) (PP (APPR auf) (NN Konfrontationskurs)
>> )(VVPP gesteuert) )(-NONE- '') (-NONE- /) )
>> (TOP (ADJA Harte) (NN Töne) (NP (ART der) (NN Regierung) )(PP (APPR
>> gegen) (NN Nationalkongreß) ))
>> (TOP (NE JOHANNESBURG) (, ,) (NP (ADJA 5.) (NN Juli) )(-NONE- () (CNP
>> (NE AP) (NE jod) )(-NONE- /) (-NONE- )) (. .) )
>>
>> I copied some HeadRules from the
>> corenlp/edu/stanford/nlp/trees/international/negra class.
>>
>> When I now run the trainer for the parser, I get this error regarding
>> the punctuation:
>>
>> Building dictionary
>> Exception in thread "main" java.lang.NullPointerException
>>     at opennlp.tools.parser.AbstractBottomUpParser.lastChild(AbstractBottomUpParser.java:502)
>>     at opennlp.tools.parser.AbstractBottomUpParser.buildDictionary(AbstractBottomUpParser.java:552)
>>     at opennlp.tools.parser.chunking.Parser.train(Parser.java:287)
>>     at opennlp.tools.cmdline.parser.ParserTrainerTool.run(ParserTrainerTool.java:132)
>>     at opennlp.tools.cmdline.CLI.main(CLI.java:222)
>>
>> Does this have something to do with the training instances that have no end
>> marker? I also notice this when there is a ( in the text: (-NONE- ()
>>
>> Would that be the error, and do I have to replace those instances?
>>
>> Thank you
>>
>> Andreas
>>
>>
>>
>> On 20.03.2014 11:52, Andreas Niekler wrote:
>>> Hi,
>>>
>>> as I understand it, my examples are binarized within the training
>>> process and I have to provide rules for binarized trees?
>>>
>>> All the best
>>>
>>> Andreas
>>>
>>> On 19.03.2014 15:31, Rodrigo Agerri wrote:
>>>> Hi Andreas,
>>>>
>>>> This issue has already been discussed here, so I will summarize:
>>>>
>>>> The English head rules come from Michael Collins' thesis, check Annex A
>>>>
>>>> http://www.dfki.de/~neumann/dop-seminar/References/collins-thesis.pdf
>>>>
>>>> I have recently posted about the head rules in Spanish (Ancora corpus)
>>>>
>>>> https://issues.apache.org/jira/browse/OPENNLP-665
>>>>
>>>> Also check the 7th of March thread about language-specific head rules when
>>>> training the parser.
>>>>
>>>> Finally, Stanford Parser provides head rules for the Negra corpus, which
>>>> could
>>>> be useful for you.
>>>>
>>>> corenlp/edu/stanford/nlp/trees/international/negra
>>>>
>>>> Cheers,
>>>>
>>>> Rodrigo
>>>>
>>>> On 2014/03/19 at 15:02, Andreas Niekler wrote:
>>>>> Hi all,
>>>>>
>>>>> I want to train a German parser model with the Tiger corpus. For this
>>>>> reason I need some other HeadRules for the training process. At the
>>>>> moment I'm a bit stuck understanding what these rules are exactly for and
>>>>> whether it would be OK if I just provide empty rules.
>>>>>
>>>>> Can somebody comment on this or give me a short intuition for how those
>>>>> rules work or how I have to interpret / understand them?
>>>>>
>>>>> Thank you
>>>>>
>>>>> Andreas
>>>>> --
>>>>> Andreas Niekler, Dipl. Ing. (FH)
>>>>> NLP Group | Department of Computer Science
>>>>> University of Leipzig
>>>>> Johannisgasse 26 | 04103 Leipzig
>>>>>
>>>>> mail: [email protected]
>>>>
>>>
>>
>> --
>> Andreas Niekler, Dipl. Ing. (FH)
>> NLP Group | Department of Computer Science
>> University of Leipzig
>> Johannisgasse 26 | 04103 Leipzig
>>
>> mail: [email protected]
>

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig

mail: [email protected]
