I think the distributed cache is a good way to do this.
I did some similar work loading Stanford parser models in Hadoop using the
distributed cache, and I think it will solve the problem here as well. But
we should be careful: Hadoop clusters are normally data-intensive, and NLP
processing there can cause high CPU usage and interfere with other jobs.
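
Roughly, the pattern looks like the sketch below (a minimal, untested
example; the TokenizeMapper class, the en-token.bin model and the cache
path are placeholders, and it uses the OpenNLP tokenizer API just as an
illustration): ship the model file to every node with the distributed
cache at job submission, then open it once per task in setup() rather
than once per record.

import java.io.FileInputStream;
import java.io.IOException;

import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizeMapper extends Mapper<LongWritable, Text, Text, Text> {

    private TokenizerME tokenizer;

    @Override
    protected void setup(Context context) throws IOException {
        // Files registered with DistributedCache.addCacheFile(...) at job
        // submission are copied to the local disk of every task node.
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        // Assumes the model is the only cached file.
        FileInputStream in = new FileInputStream(cached[0].toString());
        try {
            tokenizer = new TokenizerME(new TokenizerModel(in));
        } finally {
            in.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Tokenize each input line and emit one token per output record.
        for (String token : tokenizer.tokenize(value.toString())) {
            context.write(new Text(token), new Text(""));
        }
    }
}

In the driver you would register the model before submitting, e.g.
DistributedCache.addCacheFile(new URI("/models/en-token.bin"),
job.getConfiguration()), so the model is pulled from HDFS once per node
instead of once per task.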

Sheng

> Date: Thu, 7 Jun 2012 20:17:26 -0400
> From: [email protected]
> To: [email protected]
> Subject: Re: openNLP with Hadoop MapReduce Programming
> 
> Hadoop seems to be a large-scale project, so the work would be spread
> across many servers/clients.  MapReduce would allow the processing to be
> done across many servers and then synchronized to provide the final
> results.  So each process would have to load its own model.  The HDFS
> file system should allow sharing of the models and the large data
> collection between them all.
> 
> On 6/7/2012 3:45 AM, Jörn Kottmann wrote:
> > On 06/07/2012 05:39 AM, James Kosin wrote:
> >> Hmm, good idea.  I'll have to try that soon... I do create models for my
> >> project and have them included in the JAR... but I haven't gotten around
> >> to testing with them embedded in the JAR file.  I know there will be
> >> issues with this, and it is usually best to keep them in either the
> >> Windows or Linux file system.
> >> Jörn has the start of supporting the web-server side; but I know it is
> >> far from complete... he still has this marked as a TODO for the
> >> interface.  Unless I'm a bit behind now.
> >
> > I usually load my models from an http server, because
> > they are getting updated much more frequently than
> > my jars, but if you use map reduce you will need to do
> > the loading yourself (very easy in java).
> >
> > Just including a model in a jar works great and many
> > people actually do that.
> >
> > If you have many threads and want to share the models
> > between them, I am not sure how this is done in map reduce.
> >
> > Jörn
> 
> 