Arun--

I don't know what the cause of your specific technical issue is, but in my opinion, there's a better way to slice the problem.

What you're doing is taking each step in your analysis engine and running it on one or more machines. This creates two problems.

One, it's a lot of network overhead. You're moving each document across the network many times--roughly once per pipeline step. You can easily spend more time just moving the data around than actually processing it. It also puts a low ceiling on scalability, since you chew up a lot of network bandwidth.
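A back-of-the-envelope sketch of that overhead (the numbers here are mine, just for illustration): if each document crosses the network once per pipeline step, total bytes moved grow linearly with the number of steps, while the self-contained engine moves each document only once.

```java
public class NetworkCost {
    // Lower bound on bytes moved: one network hop per step per document,
    // ignoring replies, retries, and intermediate results (which only
    // make the pipelined case worse).
    static long bytesMoved(long documents, long bytesPerDoc, int hops) {
        return documents * bytesPerDoc * hops;
    }

    public static void main(String[] args) {
        long pipelined = bytesMoved(1_000_000, 100_000, 3); // 3-step pipeline
        long sliced    = bytesMoved(1_000_000, 100_000, 1); // whole engine per node
        System.out.println(pipelined / sliced); // pipelined moves 3x the data
    }
}
```

So a three-step pipeline moves at least three times the data of the slice-by-document design, before counting intermediate results at all.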

Two, in order to use your hardware efficiently, you have to get the right ratio of machines/CPUs for each step. Some steps use more cycles than others. For example, you might find that for a given configuration and set of documents, the ratio of CPU usage for steps A, B, and C is 1:5:2. Now you need to instantiate A, B, and C services to use cores in that ratio. Then, suppose you want to add more machines--how should you allocate them to A, B, and C? It will always be lumpy, with some cores not being used much. But worse, with a different configuration (different dictionaries, for example), or with different documents (longer vs. shorter, for example), the ratios will change, and you will have to reconfigure your machines again. It's never-ending, and it's never completely right.
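Here's a tiny sketch of why the allocation is always lumpy (the 1:5:2 ratio comes from the example above; everything else is made up for illustration). With 12 cores and a 1:5:2 ratio, proportional allocation rounds down to 1, 7, and 3 cores, leaving one core with no clean home:

```java
public class CoreAllocation {
    // Allocate 'cores' across services in proportion to 'ratio',
    // rounding down. Whatever doesn't divide evenly is the "lumpy"
    // remainder that sits idle or over-serves one step.
    static int[] allocate(int cores, int[] ratio) {
        int sum = 0;
        for (int r : ratio) sum += r;
        int[] alloc = new int[ratio.length];
        for (int i = 0; i < ratio.length; i++) {
            alloc[i] = cores * ratio[i] / sum;
        }
        return alloc;
    }

    public static void main(String[] args) {
        int[] alloc = allocate(12, new int[] {1, 5, 2}); // A:B:C = 1:5:2
        System.out.println(alloc[0] + " " + alloc[1] + " " + alloc[2]);
        // 1 + 7 + 3 = 11 cores assigned; one core is left over
    }
}
```

And the moment the ratio shifts (new dictionaries, longer documents), this whole calculation has to be redone.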

So, it would be much easier to manage, more efficient, and more scalable if you just run your analysis engine self-contained in a single process, and then replicate the engine over your machines/CPUs. You slice by document, not by service--send each document to a different analysis engine instance. This makes your life easier, keeps the CPUs at 100%, and scales indefinitely. Just add more machines and it goes faster.

This is what I'm doing. I use JavaSpaces (as a producer/consumer queue), but I'm sure you can get the same effect with UIMA AS and ActiveMQ.
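The slice-by-document pattern can be sketched in plain Java, with an in-process BlockingQueue standing in for JavaSpaces or ActiveMQ (in practice the queue would be a distributed space or broker, and workers would be separate processes on separate machines). `analyze` here is a hypothetical stand-in for the full self-contained engine running all of its steps in-process:

```java
import java.util.concurrent.*;

public class DocumentWorkers {
    // Hypothetical stand-in for the self-contained analysis engine:
    // runs every step (A, B, C, ...) in-process on one document,
    // so no per-step network hops and no A:B:C core ratio to tune.
    static String analyze(String doc) {
        return doc.toUpperCase(); // placeholder for real analysis
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // One engine replica per core; each consumes whole documents
        // from the shared queue until it sees the shutdown marker.
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String doc = queue.take();
                        if (doc.equals("POISON")) break;
                        System.out.println(analyze(doc));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        for (String doc : new String[] {"alpha", "beta", "gamma"}) queue.put(doc);
        for (int i = 0; i < workers; i++) queue.put("POISON"); // one per worker
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

To scale, you add machines and point more workers at the same queue; nothing else needs to be rebalanced.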


Greg
