On 10/07/2013 11:00 PM, Michael Schmitz wrote:
Do you know how many sentences/tokens were annotated for the OpenNLP
POS and CHUNK models?  Do you have an idea of the "sweet spot" for
number of annotations vs performance?

If the model gets bigger, the computations get more complex, but as far as I know the effect of the model no longer fitting in the CPU cache is much more significant than that. I am using hash-based int features in the name finder to reduce the memory footprint.
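The general idea behind hash-based int features (often called the hashing trick) can be sketched as below; this is an illustration of the technique, not OpenNLP's actual implementation, and the bucket count and feature names are made up:

```java
public class FeatureHashing {
    // Fixed number of weight buckets: the model's memory footprint no
    // longer grows with the vocabulary, only with this constant.
    static final int NUM_BUCKETS = 1 << 20;

    // Map a feature string to an int index in [0, NUM_BUCKETS) instead
    // of storing the string itself in the model.
    static int bucket(String feature) {
        int h = feature.hashCode() % NUM_BUCKETS;
        return h < 0 ? h + NUM_BUCKETS : h; // hashCode() may be negative
    }

    public static void main(String[] args) {
        // The same feature string always maps to the same bucket.
        System.out.println(bucket("w=John"));
        System.out.println(bucket("w=John") == bucket("w=John"));
    }
}
```

The trade-off is that different features can collide in one bucket, which is usually an acceptable price for the smaller, cache-friendlier weight array.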

I don't have much experience with the Chunker or POS Tagger with regard to performance, but it should be easy to run a series of tests; the command line tools have built-in performance monitoring.
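For such a series of tests, the throughput number the tools report is essentially sentences processed per second. A minimal sketch of that kind of measurement, with a dummy tag() stand-in where the real tagger would go:

```java
public class ThroughputTest {
    // Stand-in for a real POS tagger call; only here to have something
    // to time. Replace with the actual model invocation.
    static String[] tag(String[] sentence) {
        String[] tags = new String[sentence.length];
        for (int i = 0; i < sentence.length; i++) {
            tags[i] = "NN"; // dummy tag
        }
        return tags;
    }

    public static void main(String[] args) {
        String[] sentence = "OpenNLP is a machine learning toolkit .".split(" ");
        int n = 100_000;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            tag(sentence);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.0f sentences/sec%n", n / seconds);
    }
}
```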

Jörn
