Re: New to cTakes and need help [EXTERNAL]

Miller, Timothy Mon, 04 Feb 2019 07:17:57 -0800

Or you can get types within UIMA using the uimaFIT class 
org.apache.uima.fit.util.JCasUtil. This class shows an example of using it to 
get CUIs:


http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/util/JCasParser.java?view=markup
which references this class as well:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CuiResponse.java?view=markup

My normal workflow is to open the XMI files in the CVD (CAS Visual Debugger) to 
understand the structure of the CAS, and then write code using JCasUtil to 
extract pieces of information from the CAS without having to parse the XML 
manually. The two classes above show one example of how that might work.

Tim


-----Original Message-----
From: gandhi rajan <[email protected]>
Reply-to: <[email protected]>
To: [email protected]
Subject: Re: New to cTakes and need help [EXTERNAL]
Date: Mon, 4 Feb 2019 20:40:30 +0530

Hi Sajit, I'm not sure whether creating a custom piper is going to help, as I 
believe the default clinical pipeline has the bare minimum but necessary stuff.

You can probably parse the XML and extract out the stuff you want.
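
As a rough sketch of that XML route: the snippet below uses Python's standard 
xml.etree.ElementTree to collect CUIs from an XMI string. It assumes concepts 
are serialized as UmlsConcept elements carrying cui and preferredText 
attributes; check that against your own XMI (for example in the CVD) before 
relying on it. The sample document, its codes, and the helper names local_name 
and extract_cuis are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample for illustration only. Real cTAKES XMI contains many more
# elements and attributes. The element and attribute names used here
# (UmlsConcept, cui, preferredText) are assumptions to verify against your
# own output, and the CUI/code values are placeholders, not real concepts.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
         xmi:version="2.0">
  <refsem:UmlsConcept xmi:id="10" codingScheme="SNOMEDCT_US" code="0001"
                      cui="C0000001" preferredText="Example disorder"/>
  <refsem:UmlsConcept xmi:id="11" codingScheme="SNOMEDCT_US" code="0002"
                      cui="C0000002" preferredText="Example finding"/>
  <refsem:UmlsConcept xmi:id="12" codingScheme="SNOMEDCT_US" code="0003"
                      cui="C0000001" preferredText="Example disorder"/>
</xmi:XMI>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_cuis(xmi_text):
    """Return a dict mapping each distinct CUI to its preferred text."""
    root = ET.fromstring(xmi_text)
    cuis = {}
    for element in root.iter():
        # Match on the local name so the namespace prefix does not matter.
        if local_name(element.tag) != "UmlsConcept":
            continue
        cui = element.get("cui")
        if cui:
            # Keep the first preferred text seen for a repeated CUI.
            cuis.setdefault(cui, element.get("preferredText", ""))
    return cuis


if __name__ == "__main__":
    for cui, text in extract_cuis(SAMPLE_XMI).items():
        print(cui, text)
```

For a real file, read it and pass the text in, e.g. 
extract_cuis(open(path, encoding="utf-8").read()), where path is your XMI 
output file.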

On Monday, February 4, 2019, Sajit Kumar <[email protected]> wrote:
Hi Gandhi,

Thanks for your reply.

Let me elaborate on my task. I need to extract the CUIs for the medical 
concepts in clinical notes. I believe cTAKES can be a very good tool for this.
I went through the default clinical pipeline. The pipeline has various tasks 
in the pipe such as boundary detection, tokenization, entity recognition etc. I 
believe the output XMI document also has sections for each of these tasks. 
However, as I am only interested in the CUIs for medical concepts, I would only 
be interested in entity recognition and entity properties in my output. Is it 
possible to create a custom pipeline based on this? Or is it possible to turn 
off the output of unwanted sections? I hope you understand what I am trying to 
say. Please advise.

Also, is there any documentation on the structure of cTAKES output files such 
as XMI files?

Looking forward to your response.

Regards,
Sajit

On Sun, 3 Feb, 2019, 20:48 gandhi rajan <[email protected]> wrote:
Hi Sajit, I would say the default clinical pipeline is the best place to start 
for a beginner - 
https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline

Also, you need to elaborate on what information you are looking for when you 
say much of the information is irrelevant to you.

On Sunday, February 3, 2019, Sajit Kumar <[email protected]> wrote:
Hi All,

I am new to cTAKES. I have heard great things about cTAKES in processing 
clinical notes. I have been able to successfully install and launch cTAKES 
applications. However, I have not been able to find enough documentation for 
the XMI output from these applications such as CPE etc. If anyone can guide me 
to some documentation to understand the structure of these outputs, that would 
be helpful.

Additionally, I am working on a task where I am interested in extracting the 
UMLS and SNOMED medical concepts from the clinical notes. However, I see that 
the output usually has a lot of information that is not relevant to my task. I 
tried my hand at creating a custom pipeline to get rid of this information, but 
it was throwing an exception. Please find the script below.

//       ***  Piper File  ***
//       Created by Sajit
//       on February 03, 2019


//  Text Files Reader
//  Reads document texts from text files specified in a provided list.
#   files  The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files=C:\apache-ctakes-4.0.0\testdata\Input\SampleInputRadiologyNotes.txt

//  UMLS Dictionary Lookup (Old)
//  Annotates clinically-relevant terms.  This is an older, slower dictionary lookup implementation.
add org.apache.ctakes.dictionary.lookup.ae.UmlsDictionaryLookupAnnotator

//  XMI Writer
//  Writes XMI files with full representation of input text and all extracted information.
#   OutputDirectory  Output directory to write xmi files
add org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes OutputDirectory=C:\apache-ctakes-4.0.0\testdata\output

This passes the validation but fails to execute.
Please tell me if my approach is right or wrong. Also, is it possible to trim 
the XMI outputs based on one's needs in the cTAKES tool?

Any suggestion or help is most welcome. Thanks.

Regards,
Sajit



--
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"
