The model size depends on the number of features you have: each feature
is stored as a String object in memory, together with its weights, which are
stored as doubles.
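
For a very rough back-of-envelope check (every number below is just an
assumption, not a measurement of your model), something like this gives the
order of magnitude:

    // Rough sizing sketch -- feature count, outcome count and average
    // feature string length are made-up assumptions, not real figures.
    public class ModelSizeEstimate {
        public static void main(String[] args) {
            long numFeatures = 10000000L;   // assumed number of distinct features
            int numOutcomes = 5;            // assumed number of outcomes
            int avgChars = 20;              // assumed average feature string length

            // Assumed per-String cost: ~48 bytes of object/array overhead plus
            // 2 bytes per char, ignoring the HashMap entry overhead on top.
            long stringBytes = numFeatures * (48L + 2L * avgChars);
            // Assumed one 8-byte double per feature/outcome weight.
            long weightBytes = numFeatures * numOutcomes * 8L;

            System.out.printf("roughly %.1f GB%n", (stringBytes + weightBytes) / 1e9);
        }
    }

Actual overhead depends on the JVM and on how the model stores things
internally, so treat it only as a sanity check.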

How much training data do you have? How many features and outcomes does the data have?

Jörn

On 05/28/2014 12:32 AM, William Colen wrote:
Usually you don't need a huge training data set to have an effective model.
You can measure the tradeoff between the training data set size, the cutoff,
and the algorithm using the 10-fold cross-validation tool included in the
OpenNLP command line interface. You would need to run different experiments,
changing these parameters. In your case, not only the F-measure is important,
but also the model size.
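
If you would rather drive the same experiment from Java instead of the
command line, a minimal sketch could look like the following. It assumes the
1.5.x-era API and a hypothetical training file called ner.train in OpenNLP's
name finder format; constructor signatures differ a bit between releases:

    import java.io.FileInputStream;
    import java.io.InputStreamReader;

    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderCrossValidator;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class CrossValidate {
        public static void main(String[] args) throws Exception {
            // Training parameters to vary between experiments (cutoff, algorithm, ...).
            TrainingParameters params = new TrainingParameters();
            params.put(TrainingParameters.CUTOFF_PARAM, "5");
            params.put(TrainingParameters.ITERATIONS_PARAM, "100");

            // ner.train is a hypothetical file in the name finder training format.
            ObjectStream<NameSample> samples = new NameSampleDataStream(
                new PlainTextByLineStream(
                    new InputStreamReader(new FileInputStream("ner.train"), "UTF-8")));

            TokenNameFinderCrossValidator cv =
                new TokenNameFinderCrossValidator("en", null, params);
            cv.evaluate(samples, 10); // 10-fold cross-validation
            System.out.println(cv.getFMeasure());
        }
    }

The TokenNameFinderCrossValidator command line tool does the same thing
without any code.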


2014-05-27 18:59 GMT-03:00 Jeffrey Zemerick <[email protected]>:

I do not, William. I assumed it was due to the large training data set. I
will look into the things you mentioned. Thanks!


On Tue, May 27, 2014 at 3:35 PM, William Colen <[email protected]> wrote:
Do you know why your model is so big?

You can reduce its size by using a higher cutoff or by trying Perceptron.
You can also try using an entity dictionary, which keeps the algorithm from
storing the entities in the form of features.
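
As a starting point, the training parameters you would vary might look like
this (the values are only examples; the entity dictionary is wired in
separately through the feature generator configuration and is not shown here):

    import opennlp.tools.util.TrainingParameters;

    public class SmallerModelParams {
        public static void main(String[] args) {
            TrainingParameters params = new TrainingParameters();
            // A higher cutoff drops features seen fewer times than the
            // threshold; try values above the default of 5.
            params.put(TrainingParameters.CUTOFF_PARAM, "10");
            // Select the perceptron trainer instead of the default maxent (GIS) one.
            params.put(TrainingParameters.ALGORITHM_PARAM, "PERCEPTRON");
            params.put(TrainingParameters.ITERATIONS_PARAM, "100");

            // These parameters would then be handed to the name finder training
            // call, or to the command line trainer through a params file.
            System.out.println(params.getSettings());
        }
    }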

I am not aware of a way to avoid loading it into memory.

Regards,
William

2014-05-27 16:11 GMT-03:00 Jeffrey Zemerick <[email protected]>:

Hi Users,

Is anyone aware of a way to load a TokenNameFinder model and use it without
storing the entire model in memory? My models take up about 6 GB of memory.
I see in the code that the model files are unzipped and put into a HashMap.
Is it possible to store the data structure off-heap somewhere?
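
For reference, this is roughly how I am loading and using the model today,
which is what pulls everything onto the heap (the model path and sentence
below are just placeholders):

    import java.io.File;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.Span;

    public class LoadModel {
        public static void main(String[] args) throws Exception {
            // Loading the model reads the zipped model file and materializes
            // its contents (feature strings and weights) in the JVM heap.
            TokenNameFinderModel model =
                new TokenNameFinderModel(new File("my-ner-model.bin")); // placeholder path
            NameFinderME finder = new NameFinderME(model);

            String[] tokens = { "Jeff", "lives", "in", "Ohio", "." }; // placeholder sentence
            Span[] names = finder.find(tokens);
            for (Span span : names) {
                System.out.println(span);
            }
        }
    }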

Thanks,
Jeff

