Re: Scaling out cTAKES

Greg Silverman Tue, 17 Nov 2020 09:11:23 -0800

We at the UMN NLP/IE Lab have developed NLP-ADAPT-kube to scale out four UIMA
NLP annotators using Kubernetes/UIMA-AS: cTAKES, CLAMP, MetaMap (via its UIMA
wrapper), and our own homegrown BioMedICUS. The project is here:
https://github.com/nlpie/nlp-adapt-kube

There are two versions: one for CPM, which also includes QuickUMLS, and one
for UIMA-AS. The UIMA-AS versions are under the docker and argo-k8s folders
and use the four engines mentioned above. There is a project wiki, but it is
slightly out of date. We are in the process of integrating non-UIMA engines
(such as QuickUMLS and our new version of BioMedICUS) into the UIMA-AS
workflow, which uses AMQ for message queuing.
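To make the queue-based layout concrete, here is a minimal sketch of the fan-out pattern in plain Python. It assumes an in-process queue.Queue as a stand-in for AMQ and threads as stand-ins for UIMA-AS backends; the annotate function is a hypothetical placeholder, not a cTAKES API, and the real deployment runs separate processes in separate pods.

```python
import queue
import threading
from typing import Iterable, List

_STOP = object()  # sentinel telling a backend worker to exit


def annotate(doc: str) -> dict:
    # Hypothetical stand-in for one backend annotator (e.g. a cTAKES instance).
    return {"text": doc, "n_tokens": len(doc.split())}


def run_fanout(docs: Iterable[str], n_backends: int = 6) -> List[dict]:
    """A client enqueues docs; n_backends workers consume them concurrently."""
    work = queue.Queue()
    results = queue.Queue()

    def backend() -> None:
        while True:
            item = work.get()
            try:
                if item is _STOP:
                    return
                results.put(annotate(item))
            finally:
                work.task_done()

    workers = [threading.Thread(target=backend) for _ in range(n_backends)]
    for w in workers:
        w.start()
    for doc in docs:        # "client" side: publish every document
        work.put(doc)
    for _ in workers:       # one sentinel per worker so each exits cleanly
        work.put(_STOP)
    for w in workers:
        w.join()
    # Results arrive in completion order, not submission order.
    return [results.get() for _ in range(results.qsize())]
```

Because every backend pulls from the same queue, adding backends raises throughput without any change on the client side, which is the property the Kubernetes setup relies on.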

We're currently running cTAKES with the Kubernetes Horizontal Pod Autoscaler
(HPA), using 6 backends and 2 clients across 3 compute nodes, and getting very
decent throughput (~150 docs/second). We could definitely scale it up further.
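For readers new to the HPA, the replica count it targets follows the ratio rule in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), left unchanged when the ratio is within a tolerance (0.1 by default) and bounded by minReplicas and maxReplicas. The sketch below reproduces only that arithmetic. The real controller also accounts for pod readiness, missing metrics, and stabilization windows, and the metric NLP-ADAPT-kube actually scales on is not stated here, so the CPU figures in the usage note are hypothetical.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     max_replicas: int,
                     min_replicas: int = 1,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation (ratio rule only)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: the HPA leaves the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With a 60% CPU target, six backends averaging 90% would be scaled to ceil(6 * 1.5) = 9 (or to maxReplicas if that is lower), while six backends at 63% would be left at six because the ratio is within tolerance.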

For a comparison of how well this scales: we were running 64 MetaMap backends
with 16 clients across 5 compute nodes and getting ~40 docs/second on very
large clinical documents (which for MetaMap is very decent).
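A back-of-the-envelope comparison of the two reported configurations, using only the approximate figures quoted above. The per-backend and per-node rates assume throughput divides evenly, the corpus size is hypothetical, and the MetaMap run used very large documents, so the two rows are not directly comparable.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    docs_per_second: float  # reported aggregate throughput (approximate)
    backends: int
    clients: int
    nodes: int

    def per_backend(self) -> float:
        return self.docs_per_second / self.backends

    def per_node(self) -> float:
        return self.docs_per_second / self.nodes

    def hours_for(self, n_docs: int) -> float:
        """Naive wall-clock estimate, assuming throughput stays constant."""
        return n_docs / self.docs_per_second / 3600


REPORTED = [
    Deployment("cTAKES", 150, backends=6, clients=2, nodes=3),
    Deployment("MetaMap", 40, backends=64, clients=16, nodes=5),
]

if __name__ == "__main__":
    corpus = 1_000_000  # hypothetical corpus size
    for d in REPORTED:
        print(f"{d.name}: {d.per_backend():.3f} docs/s per backend, "
              f"{d.per_node():.1f} docs/s per node, "
              f"~{d.hours_for(corpus):.1f} h for {corpus:,} docs")
```

On these numbers a cTAKES backend handles about 25 docs/s versus about 0.6 docs/s for a MetaMap backend, and a million-note corpus would take roughly 1.9 h versus 6.9 h if throughput held steady.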

If you're interested, we can assist with implementation. The client does
require some customization for the backend database you're using
(https://github.com/nlpie/nlp-adapt-kube/tree/master/docker/as/client), but
that is pretty straightforward.
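The database-specific part of a client is usually the document source. As a hedged illustration only, here is a sketch that streams notes from a Python DB-API connection, assuming a hypothetical notes table with note_id and note_text columns and using SQLite so the example is self-contained; the actual client at the link above may be structured quite differently.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_notes(conn: sqlite3.Connection,
               batch_size: int = 500) -> Iterator[Tuple[int, str]]:
    """Yield (note_id, note_text) rows from a hypothetical `notes` table.

    Rows are fetched in batches so a large table never has to fit in memory.
    """
    cur = conn.execute("SELECT note_id, note_text FROM notes ORDER BY note_id")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield from rows
```

Moving to another database would mostly mean changing the driver, connection, and query; fetchmany is part of the Python DB-API specification, so most drivers support the same batching loop.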

Best!

Greg--

On Tue, Nov 17, 2020 at 10:47 AM John Doe <[email protected]> wrote:

> Hello,
>
> I'm new to cTAKES and was wondering what the options are for scaling out
> the default clinical pipeline. I'm running it on a large number of clinical
> notes using runClinicalPipeline.bat and specifying the input directory with
> the notes. What are the best options for doing this in a more scalable way?
> For example, can I parallelize it with UIMA-AS? Or should I manually use
> multiple command prompts to run the clinical pipeline on a different set of
> clinical notes in parallel? I'm not sure if there is any built-in solution
> or community resource which uses EMR/Spark or some other method to achieve
> this.
>
> Thank you for your help.
>

-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
[email protected]
