On Tue, 27 Sep 2011 01:06:02 -0700, Thilo Götz <twgo...@gmx.de> wrote:
> On 26/09/11 22:31, Greg Holmberg wrote:
>> This is what I'm doing. I use JavaSpaces (producer/consumer queue), but I'm
>> sure you can get the same effect with UIMA AS and ActiveMQ.
>
> Or Hadoop.

Thilo, could you expand on this? Exactly how do you use Hadoop to scale
UIMA?
What storage do you use under Hadoop (HDFS, HBase, Hive, etc.), and what is
your final storage destination for the CAS data?
Are you doing on-demand, streaming, or batch processing of documents?
What are your key/value pairs? URLs? What's your map step, and what's your
reduce step?
How do you partition? Do you find the system is load balanced? What
level of efficiency do you get? What level of CPU utilization?
Do you do just document (UIMA) analysis in Hadoop, or also collection
(multi-doc) analytics?
The fit between UIMA and Hadoop isn't obvious to me. Just trying to
figure it out.
Thanks,
Greg Holmberg