On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <twgo...@gmx.de> wrote:

> On 26/09/11 22:31, Greg Holmberg wrote:
>
>> This is what I'm doing. I use JavaSpaces (producer/consumer queue), but I'm
>> sure you can get the same effect with UIMA AS and ActiveMQ.
>
> Or Hadoop.

Thilo, could you expand on this? Exactly how do you use Hadoop to scale UIMA?

What storage do you use under Hadoop (HDFS, HBase, Hive, etc.), and what is your final storage destination for the CAS data?

Are you doing on-demand, streaming, or batch processing of documents?

What are your key/value pairs? URLs? What's your map step, what's your reduce step?

How do you partition? Do you find the system is load balanced? What level of efficiency do you get? What level of CPU utilization?

Do you do just document (UIMA) analysis in Hadoop, or also collection (multi-doc) analytics?

The fit between UIMA and Hadoop isn't obvious to me. Just trying to figure it out.

Thanks,


Greg Holmberg
