Re: Scaling using Hadoop

Greg Holmberg Wed, 05 Oct 2011 11:25:05 -0700

On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <twgo...@gmx.de> wrote:

On 26/09/11 22:31, Greg Holmberg wrote:
This is what I'm doing. I use JavaSpaces (producer/consumer queue), but I'm
sure you can get the same effect with UIMA AS and ActiveMQ.
Or Hadoop.

Thilo, could you expand on this? Exactly how do you use Hadoop to scale UIMA?

What storage do you use under Hadoop (HDFS, HBase, Hive, etc.), and what is your final storage destination for the CAS data?


Are you doing on-demand, streaming, or batch processing of documents?

What are your key/value pairs? URLs? What's your map step, what's your reduce step?

How do you partition? Do you find the system is load balanced? What level of efficiency do you get? What level of CPU utilization?

Do you do just document (UIMA) analysis in Hadoop, or also collection (multi-doc) analytics?

The fit between UIMA and Hadoop isn't obvious to me. Just trying to figure it out.


Thanks,


Greg Holmberg
