Re: Scaling out cTAKES

John Doe Wed, 18 Nov 2020 06:23:51 -0800

Thank you all for the responses. For now, I am going to learn more about how
UIMA-AS works to determine whether it will work for my use case. If not, I
will check out your other suggestions.


On Tue, Nov 17, 2020 at 5:22 PM Greg Silverman <[email protected]> wrote:

> FYI, I just doubled the number of backends and clients and increased the
> throughput to ~1000 docs/second. Server utilization is only minimal now.
>
> I should note that, unlike on a Spark cluster, this is running on 2 old
> servers and a VM. The nice thing about Kubernetes is that you can easily
> scale up or down the number of instances using horizontal pod autoscaling.
> Plus, it's a lot easier to manage than a Spark cluster.
>
> We just started running the cTAKES pipeline on this, so it's an experiment
> in progress.
>
> So far, the results are very decent. I'll scale it up even more in a day
> or so.
>
> Greg--
>
>
>
> On Tue, Nov 17, 2020 at 11:10 AM Greg Silverman <[email protected]> wrote:
>
>> We at the UMN NLP/IE Lab have developed NLP-ADAPT-kube to scale out
>> 4 UIMA NLP annotators using Kubernetes/UIMA-AS, including cTAKES, CLAMP,
>> MetaMap (using the UIMA wrapper), and our own homegrown BioMedICUS. Our
>> project is here: https://github.com/nlpie/nlp-adapt-kube
>>
>> There are 2 versions: one for CPM, which includes QuickUMLS, and the
>> other for UIMA-AS. The AS versions are under the docker folder and the
>> argo-k8s folder, and use the 4 engines mentioned above. There is a project
>> Wiki (but it is slightly out of date). We are in the process of working
>> non-UIMA engines (like QuickUMLS and our new version of BioMedICUS) into
>> the AS workflow (we're using AMQ for message queuing).
>>
>> We're currently running cTAKES using Kubernetes HPA with 6 backends and
>> 2 clients across 3 compute nodes, getting very decent throughput (~150
>> docs/second). We could definitely scale it up even further.
>>
>> For a comparison of how well this scales, we were running 64 MetaMap
>> backends with 16 clients and getting ~40 docs/second for very large
>> clinical documents (which for MetaMap is very decent). This was across
>> 5 compute nodes.
>>
>> If you're interested, we can assist in implementation. The client does
>> require some customization based on the backend database you're using
>> (see https://github.com/nlpie/nlp-adapt-kube/tree/master/docker/as/client),
>> but that is pretty straightforward.
>>
>> Best!
>>
>> Greg--
>>
>>
>>
>>
>>
>>
>> On Tue, Nov 17, 2020 at 10:47 AM John Doe <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I'm new to cTAKES and was wondering what the options are for scaling out
>>> the default clinical pipeline. I'm running it on a large number of clinical
>>> notes using runClinicalPipeline.bat and specifying the input directory with
>>> the notes. What are the best options for doing this in a more scalable way?
>>> For example, can I parallelize it with UIMA-AS? Or should I manually use
>>> multiple command prompts to run the clinical pipeline on different sets of
>>> clinical notes in parallel? I'm not sure if there is any built-in solution
>>> or community resource that uses EMR/Spark or some other method to achieve
>>> this.
>>>
>>> Thank you for your help.
>>>
>>
>>
>> --
>> Greg M. Silverman
>> Senior Systems Developer
>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
>> Department of Surgery
>> University of Minnesota
>> [email protected]
>>
>>
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> [email protected]
>
>
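
For the "multiple command prompts" option raised in the original question, the notes first have to be split into separate input directories, one per pipeline instance. The following is only a minimal sketch of that step, not something taken from cTAKES or NLP-ADAPT-kube: the function name `shard_notes`, the `shard_NN` directory naming, and the round-robin assignment are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def shard_notes(input_dir, shard_root, n_shards):
    """Copy the files in input_dir into n_shards sub-directories, round-robin.

    Hypothetical helper for illustration; returns the list of shard directories.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")

    # Sort so the assignment of notes to shards is reproducible.
    notes = sorted(p for p in Path(input_dir).iterdir() if p.is_file())

    shard_dirs = []
    for i in range(n_shards):
        shard_dir = Path(shard_root) / f"shard_{i:02d}"
        shard_dir.mkdir(parents=True, exist_ok=True)
        shard_dirs.append(shard_dir)

    # Note k goes to shard k mod n_shards, which keeps shard sizes within one
    # file of each other.
    for index, note in enumerate(notes):
        shutil.copy2(note, shard_dirs[index % n_shards] / note.name)

    return shard_dirs
```

Each resulting directory could then be given as the input directory to its own runClinicalPipeline.bat run. Every run is a separate process that loads the pipeline resources itself, so memory on the machine is likely to limit how many shards are practical.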

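To put the throughput figures reported in the thread into perspective, they can be converted into a rough wall-clock estimate for a whole corpus. This is a back-of-the-envelope sketch only: the corpus size `N_DOCS` is a made-up number, the function `hours_to_process` is a hypothetical helper, and it assumes the reported rate is sustained for the entire run.

```python
def hours_to_process(n_docs, docs_per_second):
    """Wall-clock hours needed to process n_docs at a sustained throughput."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return n_docs / docs_per_second / 3600


# Throughput figures as reported in the thread, in docs/second.
REPORTED = {
    "cTAKES, 6 backends / 2 clients": 150,
    "cTAKES, backends and clients doubled": 1000,
    "MetaMap, 64 backends / 16 clients": 40,
}

N_DOCS = 10_000_000  # hypothetical corpus size, not a figure from the thread

for setup, rate in REPORTED.items():
    print(f"{setup}: {hours_to_process(N_DOCS, rate):.1f} hours")
```

The MetaMap figure was reported for very large clinical documents, so it is not directly comparable with the cTAKES numbers.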