On Jun 7, 2012, at 8:37 PM, Sheng Guo wrote:

> I think distributed cache is a good way to do this.
> I did some similar work about stanford parser model loading in Hadoop using 
> distributed cache.
> I think that will solve the problem. But we should be careful: Hadoop 
> clusters are normally data-intensive, and NLP processing there can cause 
> high CPU usage and problems for other jobs.

I have used the distributed cache with Hadoop on Elastic MapReduce, which 
worked really well with OpenNLP models.  Additionally, with EMR, models can be 
stored on S3 and added to the distributed cache using their s3:// paths.
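To sketch what that looks like: the bucket name and model file below are hypothetical, but the mechanics are standard Hadoop distributed cache usage. The "#" fragment on the URI names the local symlink that the cache creates in each task's working directory, which is how mappers find the model file. The runnable part here uses only the JDK; the Hadoop-specific calls are shown in comments since they need a cluster (or EMR) to execute.

```java
import java.net.URI;

public class CacheModelSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical S3 path to an OpenNLP model; "#en-sent.bin" is the
        // local symlink name the distributed cache will create for tasks.
        URI model = new URI("s3://my-bucket/models/en-sent.bin#en-sent.bin");
        System.out.println(model.getFragment());  // local name mappers open

        // With a real Job object (Hadoop 2.x mapreduce API) this is one call:
        //   Job job = Job.getInstance(new Configuration(), "nlp-job");
        //   job.addCacheFile(model);
        // Each mapper can then load the model from the symlinked local file
        // in its setup() method, e.g. with OpenNLP:
        //   new SentenceModel(new java.io.File("en-sent.bin"));
    }
}
```

Loading the model once in the mapper's setup() (rather than per record) keeps the CPU cost of model initialization out of the per-record path, which also helps with the resource concerns mentioned above.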
