No, we are not using the ctakes-rest-service. I ended up rolling our own so I could better tailor the web service.

I also built a generic Java API wrapping cTAKES which delivers content in a more easily consumable structure to be used by researchers here at Duke. That Java API is also used internally by the Tomcat web app, which delivers the same structure via web service. I use my own JSON-based API so that we have a standard response to NLP requests regardless of the internal implementation (meaning if we swapped out cTAKES for some other NLP engine, the JSON could remain the same).
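For illustration (this is a sketch, not Steve's actual code, and every name in it is hypothetical), an engine-agnostic layer like the one described above can be as small as one interface plus a stable response type:

    import java.util.List;

    // Hypothetical engine-agnostic service: callers depend on this
    // interface and the response shape, never on cTAKES types directly.
    public interface NlpService {
        NlpResponse annotate(String documentText);
    }

    // The stable structure that gets serialized to JSON. Swapping cTAKES
    // for another engine means writing a new NlpService implementation;
    // the JSON contract stays the same.
    class NlpResponse {
        public String documentId;
        public List<Annotation> annotations;

        static class Annotation {
            public int begin;           // character offsets into the note
            public int end;
            public String coveredText;  // e.g. "type 2 diabetes"
            public String semanticType; // e.g. "DiseaseDisorder"
            public String code;         // e.g. a UMLS CUI
            public String codingScheme; // e.g. "UMLS"
        }
    }

A JSON serializer such as Jackson can then render NlpResponse the same way no matter which engine produced it.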
I am (was) new to Docker, but I found it a really good tool for hiding implementation complexity and easing deployment. Plus, you can spin up as many containers as your hardware can bear for parallel processing.

I don't want to understate the time it took to put all this together. It did take a lot of effort, but I am happy with the result. Happy to share thoughts/code with anyone – beware: NO DOC!

Steve

From: [email protected] <[email protected]>
Sent: Tuesday, January 29, 2019 1:35 PM
To: [email protected]
Subject: RE: Processing large batches of files in cTAKES

Steve, are you using the ctakes-rest-service (https://github.com/GoTeamEpsilon/ctakes-rest-service)? If so, do you have any pointers as to how to configure a custom dictionary (such as for ICD-10) after installing the ctakes-rest-service? Because the ctakes-rest-service installation procedure involves building cTAKES from source, the runCustomDictionary tool does not work. I have logged an issue here: https://github.com/GoTeamEpsilon/ctakes-rest-service/issues/56

I was wondering if you had any pointers in this regard, or know anything about how to implement their suggested solution of creating custom tables, etc.

From: Steve Evans <[email protected]>
Sent: Tuesday, January 29, 2019 10:23 AM
To: [email protected]
Subject: RE: Processing large batches of files in cTAKES

Leah,

I run my cTAKES workload using Docker containers. I have built a container that serves cTAKES requests via Tomcat web services. That's not for the faint of heart and not for non-programmer types. But you might be able to install the cTAKES software in a container with the input/output directories on the host, and then run in parallel using file input/output. I run 10 containers to get the throughput we need (5 per second). This is on a 16-CPU, 64 GB host (each container consumes about 2 GB of RAM).

Not a slam-dunk type answer, but I thought it might help generate ideas.

Steve
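As an illustration of the container-per-worker pattern described above (a sketch only: the image name and host paths are placeholders, not the actual setup), starting ten identical workers with host-mounted input/output directories could look like this:

    # Start 10 identical cTAKES workers, each capped at ~2 GB of RAM.
    # Each worker gets its own host-side input/output sub-directory so
    # no two containers pick up the same file.
    for i in $(seq 1 10); do
      docker run -d \
        --name "ctakes-worker-$i" \
        --memory 2g \
        -v "/data/notes/in/worker-$i:/input" \
        -v "/data/notes/out/worker-$i:/output" \
        my-ctakes-batch:latest
    done

The host just has to feed each worker's input directory and collect from its output directory, so the number of workers scales with available RAM and CPUs.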
From: Baas,Leah <[email protected]>
Sent: Tuesday, January 29, 2019 12:59 PM
To: [email protected]
Subject: Processing large batches of files in cTAKES

Hi all,

I would like to process a batch of 13,414 files (average file size 6.2 KB) using the default clinical pipeline. I am new to cTAKES and to computer programming, and I'm looking for guidance on how to process these files with maximum time/CPU efficiency. I am currently running my program on an Ubuntu VM with 3 CPUs. It takes 28 seconds (real time) to process one 6.0 KB file. I'm reading up on parallel processing strategies, but I would be grateful for any suggestions, tips, etc. that you might have!

Thanks,
Leah
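For later readers of this thread: a large share of the 28 seconds measured for a single file is likely one-time pipeline start-up (JVM plus dictionary load), which a batch run over a whole directory amortizes across all files. Beyond that, the file-level parallelism suggested above can be approximated without Docker by splitting the corpus into one sub-directory per CPU and running an independent pipeline process over each. A rough sketch follows, with the caveat that the runClinicalPipeline.sh script name and its flags vary across cTAKES versions, so check the bin/ directory of your install:

    # Split the notes into 3 batches, one per available CPU.
    i=0
    for f in notes/*.txt; do
      d="batch$(( i % 3 ))"
      mkdir -p "$d"
      cp "$f" "$d/"
      i=$(( i + 1 ))
    done

    # Run one pipeline process per batch in the background, then wait.
    # Script name and flags are version-dependent; adjust to your install.
    for d in batch*; do
      mkdir -p "out/$d"
      bin/runClinicalPipeline.sh -i "$d" --xmiOut "out/$d" \
        --user "$UMLS_USER" --pass "$UMLS_PASS" &
    done
    wait

Each pipeline is a separate JVM loading its own dictionary (roughly 2 GB apiece, going by the numbers above), so available RAM limits the parallelism as much as the CPU count does.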
