Usually you don't need a huge training data set to build an effective
model. You can measure the trade-off between training-set size, the
feature cutoff, and the training algorithm using the 10-fold
cross-validation tool included in the OpenNLP command-line interface;
you would need to run separate experiments varying these parameters.
In your case not only the F-measure is important, but also the model
size.
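For example, the cutoff and algorithm can be set in a training-parameters
file passed to the trainer or cross-validator. A minimal sketch (the file
name params.txt and the specific values are just illustrations, not
recommendations):

```
# params.txt -- hypothetical example of an OpenNLP training
# parameters file; Cutoff=5 and PERCEPTRON are illustrative values
Algorithm=PERCEPTRON
Cutoff=5
Iterations=100
```

You would then point the cross-validation tool at it, along the lines of
`opennlp TokenNameFinderCrossValidator -params params.txt -lang en -data
train.txt` (flag names from memory; running the tool without arguments
prints the exact usage), and compare F-measure and resulting model size
across runs.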


2014-05-27 18:59 GMT-03:00 Jeffrey Zemerick <[email protected]>:

> I do not, William. I assumed it was due to the large training data set. I
> will look into the things you mentioned. Thanks!
>
>
> On Tue, May 27, 2014 at 3:35 PM, William Colen
> <[email protected]> wrote:
>
> > Do you know why your model is so big?
> >
> > You can reduce its size by using a higher cutoff, or by trying the
> > Perceptron algorithm. You can also try using an entity dictionary,
> > which avoids having the algorithm store the entities in the form of
> > features.
> >
> > I am not aware of a way to avoid loading it into memory.
> >
> > Regards,
> > William
> >
> > 2014-05-27 16:11 GMT-03:00 Jeffrey Zemerick <[email protected]>:
> >
> > > Hi Users,
> > >
> > > Is anyone aware of a way to load a TokenNameFinder model and use
> > > it without storing the entire model in memory? My models take up
> > > about 6 GB of memory. I see in the code that the model files are
> > > unzipped and put into a HashMap. Is it possible to store the data
> > > structure off-heap somewhere?
> > >
> > > Thanks,
> > > Jeff
> > >
> >
>
