Thank you all for the responses. For now, I am going to learn more about how UIMA-AS works to determine whether it will work for my use case. If not, I will check out your other suggestions.

On Tue, Nov 17, 2020 at 5:22 PM Greg Silverman <[email protected]> wrote:

> FYI, I just doubled the number of backends and clients and increased the
> throughput to ~1000 docs/second. Server utilization is only minimal now.
>
> I should note, that unlike on a Spark cluster, this is running on 2-old
> servers and a VM. The nice thing about Kubernetes is that you can easily
> scale up or down the number of instances using horizontal pod autoscaling.
> Plus, it's a lot easier to manage than a Spark cluster.
>
> We just started running the cTAKES pipeline on this, so it's an experiment
> in process.
>
> So far, the results are very decent. I'll scale it up even more in a day
> or so.
>
> Greg--
>
> On Tue, Nov 17, 2020 at 11:10 AM Greg Silverman <[email protected]> wrote:
>
>> We at the UMN NLP/IE Lab have developed NLP-ADAPT-kube to scale out
>> 4-UIMA NLP annotators using Kubernetes/UIMA-AS, including cTAKES, CLAMP,
>> MetaMap (using the UIMA wrapper), and our own homegrown BioMedICUS. Our
>> project is here: https://github.com/nlpie/nlp-adapt-kube
>>
>> There are 2-versions: One for CPM, which includes QuickUMLS; and the
>> other for UIMA-AS. The AS versions are under the docker folder and the
>> argo-k8s folder, and use the 4-engines mentioned above. There is a project
>> Wiki (but it is slightly out-of-date). We are in the process of working
>> non-UIMA engines (like QuickUMLS and our new version of BioMedICUS) into
>> the AS workflow (we're using AMQ for message queuing).
>>
>> We're currently running cTAKES using Kubernetes hpa with 6-backends and
>> 2-clients across 3-compute nodes getting very decent throughput (~150
>> docs/second). We could definitely scale it up even further.
>>
>> For comparison of how well this scales, we were running 64-MetaMap backends
>> with 16-clients and getting ~40 docs/second for very large clinical
>> documents (which for MetaMap is very decent). This was across 5-compute
>> nodes.
>>
>> If you're interested, we can assist in implementation. The client does
>> require some customizations based on the backend database you're using:
>> https://github.com/nlpie/nlp-adapt-kube/tree/master/docker/as/client,
>> but that is pretty straightforward.
>>
>> Best!
>>
>> Greg--
>>
>> On Tue, Nov 17, 2020 at 10:47 AM John Doe <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I'm new to cTAKES and was wondering what the options are for scaling out
>>> the default clinical pipeline. I'm running it on a large number of clinical
>>> notes using runClinicalPipeline.bat and specifying the input directory with
>>> the notes. What are the best options for doing this in a more scalable way?
>>> For example, can I parallelize it with UIMA-AS? Or should I manually use
>>> multiple command prompts to run the clinical pipeline on a different set of
>>> clinical notes in parallel? I'm not sure if there is any built-in solution
>>> or community resource which uses EMR/Spark or some other method to achieve
>>> this.
>>>
>>> Thank you for your help.
>>>
>>
>> --
>> Greg M. Silverman
>> Senior Systems Developer
>> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
>> Department of Surgery
>> University of Minnesota
>> [email protected]
>>
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> [email protected]
