If you like, you can take a look at chapters 6.6 and 6.8 of http://www.teses.usp.br/teses/disponiveis/45/45134/tde-02052013-135414/publico/WilliamColen_Dissertation.pdf
There I wrote about my experience tuning Portuguese models for the POS Tagger and Chunker. I tried out many OpenNLP configurations and measured their impact both with the performance monitor and in my final application itself.

2013/10/7 Jörn Kottmann <[email protected]>

> On 10/07/2013 11:00 PM, Michael Schmitz wrote:
>
>> Do you know how many sentences/tokens were annotated for the OpenNLP
>> POS and CHUNK models? Do you have an idea of the "sweet spot" for
>> number of annotations vs. performance?
>
> If the model gets bigger, the computations get more complex, but as far
> as I know the effect of the model no longer fitting in the CPU cache is
> much more significant than that. I am using hash-based int features to
> reduce the memory footprint in the name finder.
>
> I don't have much experience with the Chunker or POS Tagger in regards
> to performance, but it should be easy to do a series of tests; the
> command line tools have built-in performance monitoring.
>
> Jörn
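For what it's worth, a minimal sketch of such a test series with the OpenNLP CLI evaluators might look like the following (the model and data file names here are placeholders, not files from the thread):

```shell
# Evaluate a POS tagger model against held-out annotated data;
# the tool reports accuracy and monitors throughput while running.
opennlp POSTaggerEvaluator -model pt-pos-maxent.bin \
    -data pt-pos-test.txt -encoding UTF-8

# The same idea for the chunker.
opennlp ChunkerEvaluator -model pt-chunker.bin \
    -data pt-chunk-test.txt -encoding UTF-8
```

Rerunning these while varying the training set size (or training parameters) would give the annotations-vs-performance curve Michael asked about.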
