Re: Processing large batches of files in cTAKES [EXTERNAL]

Baas,Leah Tue, 29 Jan 2019 11:34:22 -0800

Hi again Tim,

I am trying to check which version of the dictionary I am using when running 
the Default Clinical Pipeline. I have been running the pipeline according to 
the instructions detailed here:
https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
However, I haven’t been able to find documentation specifying which dictionary 
version is built into this pipeline. There must be a simple way to check; I am 
just ignorant. Could you enlighten me?

Thanks,

Leah

From: "Baas,Leah" <[email protected]>
Date: Tuesday, January 29, 2019 at 12:23 PM
To: "[email protected]" <[email protected]>
Subject: Re: Processing large batches of files in cTAKES [EXTERNAL]

Tim,

Thanks for your quick response! Probably unsurprisingly, I’ll have to do some 
googling to learn how to check those things. If you could point me in the right 
direction, that’d be great!

Thanks again,

Leah

From: "Miller, Timothy" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, January 29, 2019 at 12:14 PM
To: "[email protected]" <[email protected]>
Subject: Re: Processing large batches of files in cTAKES [EXTERNAL]

I am able to process that number of files in a reasonable amount of time (maybe 
an hour) on an average desktop. Luckily, debugging your setup should be much 
easier than doing a scaleout. A few possibilities:

* You are running the old (slow) dictionary instead of the new fast one
* Your documents have extremely long sentences
* Your VM is _extremely_ resource constrained and is thrashing constantly

Do you know how to check these things?
Tim
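
The second possibility can be screened for outside of cTAKES. Below is a 
minimal sketch, assuming the notes are plain-text files in a single directory. 
The function names and the "*.txt" pattern are hypothetical, and the sentence 
splitting is only a rough approximation (cTAKES uses its own sentence 
detector), so treat the output as a hint about which files to inspect rather 
than as what the pipeline actually sees.

```python
import re
from pathlib import Path

# Assumption: a "sentence" ends at ., ! or ? followed by whitespace, or at a
# blank line. This is a crude stand-in for a real sentence detector.
_SENTENCE_BREAK = re.compile(r"[.!?]+\s+|\n\s*\n")


def longest_sentence_chars(text):
    """Return the length in characters of the longest approximate sentence."""
    pieces = _SENTENCE_BREAK.split(text)
    return max((len(piece.strip()) for piece in pieces), default=0)


def rank_files_by_longest_sentence(input_dir, pattern="*.txt", top=10):
    """Return up to `top` (length, filename) pairs, longest sentence first."""
    results = []
    for path in Path(input_dir).glob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        results.append((longest_sentence_chars(text), path.name))
    return sorted(results, reverse=True)[:top]
```

Calling rank_files_by_longest_sentence("path/to/notes") returns the files with 
the longest unbroken runs of text. A file whose longest "sentence" runs to 
thousands of characters (for example, a table or list with no punctuation) is a 
reasonable first suspect.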

-----Original Message-----
From: "Baas,Leah" <[email protected]>
Reply-to: <[email protected]>
To: [email protected]
Subject: Processing large batches of files in cTAKES [EXTERNAL]
Date: Tue, 29 Jan 2019 17:58:48 +0000

Hi all,

I would like to process a batch of 13,414 files (avg file size 6.2 KB) using 
the default clinical pipeline. I am new to cTAKES and computer programming, and 
I’m looking for guidance on how to process these files with maximum time/CPU 
efficiency. I am currently running my program on an Ubuntu VM with 3 CPUs. It 
takes me 28 seconds (real time) to process one 6.0 KB file. I’m reading up on 
parallel processing strategies, but would be grateful for any suggestions, 
tips, etc. that you might have!

Thanks,

Leah
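
For scale, the figures in this message can be turned into a rough estimate. 
This is a back-of-the-envelope sketch, not a benchmark: it assumes every file 
costs the same 28 seconds and that work divides perfectly across CPUs, and the 
function name is made up for illustration. The 28-second figure may also 
include one-time start-up cost that a batch run would pay only once, in which 
case the real per-file time would be lower.

```python
def estimate_hours(n_files, seconds_per_file, workers=1):
    """Ideal wall-clock hours, assuming perfect scaling across workers."""
    return n_files * seconds_per_file / workers / 3600


N_FILES = 13_414        # batch size from the message
SECONDS_PER_FILE = 28   # measured real time for one 6.0 KB file

serial = estimate_hours(N_FILES, SECONDS_PER_FILE)
three_way = estimate_hours(N_FILES, SECONDS_PER_FILE, workers=3)

# Per-file time needed to finish the whole batch in about one hour.
one_hour_budget = 3600 / N_FILES

print(f"serial: {serial:.1f} h")                    # serial: 104.3 h
print(f"3 workers (ideal): {three_way:.1f} h")      # 3 workers (ideal): 34.8 h
print(f"1-hour budget: {one_hour_budget:.2f} s/file")  # 1-hour budget: 0.27 s/file
```

Even with ideal three-way parallelism the batch would take more than a day at 
28 seconds per file, whereas finishing in about an hour corresponds to roughly 
0.27 seconds per file. A gap of that size points at the per-file cost itself 
rather than at the number of CPUs.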

