Thanks Jörn, that worked.

Just in case anyone is wondering about the four steps Jörn mentioned: I looked at the chunking/Parser.java code again and found a reference to the author of the parsing approach used by the chunking parser (based on MaxEnt). The thesis can be found here:

http://www.ircs.upenn.edu/download/techreports/1998/98-15.pdf

Since the first two steps (tag and chunk, in this order) are already provided by the training data, you can configure the other two (build and check, in this order) in lang/TrainerParams.txt, as you suggested:

build.Cutoff=2
build.Iterations=200
build.Threads=4
check.Cutoff=2
check.Iterations=200
check.Threads=4

Cheers,

Rodrigo

On Tue, Apr 30, 2013 at 9:46 PM, Joern Kottmann <[email protected]> wrote:
> Short answer from my phone, instead of Cutoff the parameter name is
> check.Cutoff=0 for example. I will have a closer look tomorrow and reply on
> the list, would be nice to have a sample parameter file for the parser be
> checked in.
>
> Cheers Jörn
>
> On Apr 30, 2013 7:50 PM, "Rodrigo Agerri" <[email protected]> wrote:
>>
>> Hi,
>>
>> Thanks for your answers, I will explain myself better.
>>
>> I edit the lang/TrainerParams.txt file where I specify, for example:
>>
>> Algorithm=MAXENT
>> Iterations=1000
>> Cutoff=0
>> Threads=4
>>
>> Then I run the ParserTrainer from the CLI:
>>
>> bin/opennlp ParserTrainer -headRules
>> /home/ragerri/experiments/parsing/opennlp/es/data/es-head-rules
>> -parserType CHUNKING -params lang/TrainerParams.txt -lang es -model
>> test.bin -encoding UTF-8 -data
>> /home/ragerri/experiments/parsing/ancora-2.0/ancora2.treebank
>>
>> It trains fine, and the model works fine in a system using Apache
>> OpenNLP API, but it still uses the cutoff 5 and 100 iterations that
>> seems to be the default specification training parameters for
>> ParserTrainer.
>>
>> I can change these parameters for parser training using the API, that
>> works fine, but I cannot manage to do it from the command line.
>>
>> I did not understand your suggestion, Jörn, could you please provide
>> an example?
>>
>> Thanks,
>>
>> Rodrigo
>>
>> On Tue, Apr 30, 2013 at 4:21 PM, Jörn Kottmann <[email protected]> wrote:
>> > On 04/30/2013 04:03 PM, William Colen wrote:
>> >>
>> >> Are you using the command line tool? If yes, you should pass the path
>> >> to the parameters file in the command line argument -params <file-path>
>> >>
>> >
>> > The parser trains multiple models, to make the parameters work they are
>> > prefixed, the prefixes for the four models are: tagger, chunker, check
>> > and build. Just put them in front of the usual parameter names.
>> >
>> > HTH,
>> > Jörn
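
For anyone reading this thread later, here is a rough sketch of how the prefixing described above groups the entries of a parameter file by sub-model. It is only an illustration in Python, not OpenNLP's own code: the function name group_by_prefix and the sample file contents are assumptions made up for this example, and the real trainer may handle the file differently.

```python
from collections import defaultdict

# Hypothetical contents of lang/TrainerParams.txt, using the
# prefixes discussed in this thread (build and check).
SAMPLE_PARAMS = """\
build.Cutoff=2
build.Iterations=200
build.Threads=4
check.Cutoff=2
check.Iterations=200
check.Threads=4
"""


def group_by_prefix(text):
    """Group 'prefix.Key=value' lines by prefix.

    Keys without a prefix are collected under the empty string.
    This is an illustrative helper, not part of OpenNLP.
    """
    groups = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        prefix, dot, key = name.strip().partition(".")
        if not dot:
            # No dot in the name: treat the whole name as an unprefixed key.
            prefix, key = "", name.strip()
        groups[prefix][key] = value.strip()
    return dict(groups)


groups = group_by_prefix(SAMPLE_PARAMS)
# The four prefixes named in the thread; tagger and chunker are
# simply absent from this sample file.
for prefix in ("tagger", "chunker", "check", "build"):
    print(prefix, groups.get(prefix, {}))
```

In this sketch an unprefixed entry such as Cutoff=0 ends up in its own group under the empty prefix and never reaches build or check, which would be consistent with the unprefixed file in the quoted message having no visible effect on parser training.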
