I've extracted data from XMI using Python. It gets a little ugly, but it
gives you an appreciation for the structure.
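
For what it's worth, here is a minimal sketch of that kind of extraction,
not a definitive recipe. It assumes that the CUIs sit in the "cui" attribute
of elements whose local name is UmlsConcept, which is worth confirming
against your own files (for example in the CVD). The sample XMI string, the
namespace URI in it, and the helper names local_name and extract_cuis are all
made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a cTAKES XMI file. Real output
# contains many more element types and attributes; the namespace URI below is
# an assumption and the code does not depend on it.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
    xmi:version="2.0">
  <refsem:UmlsConcept xmi:id="101" codingScheme="SNOMEDCT_US"
      cui="C0027051" tui="T047" preferredText="Myocardial Infarction"/>
  <refsem:UmlsConcept xmi:id="102" codingScheme="SNOMEDCT_US"
      cui="C0020538" tui="T047" preferredText="Hypertensive disease"/>
  <refsem:UmlsConcept xmi:id="103" codingScheme="SNOMEDCT_US"
      cui="C0027051" tui="T047" preferredText="Myocardial Infarction"/>
</xmi:XMI>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def extract_cuis(xmi_text):
    """Return a sorted, de-duplicated list of (cui, preferredText) pairs."""
    root = ET.fromstring(xmi_text)
    concepts = set()
    for elem in root.iter():
        # Match on the local element name so the exact namespace URI
        # does not matter.
        if local_name(elem.tag) == "UmlsConcept" and elem.get("cui"):
            concepts.add((elem.get("cui"), elem.get("preferredText", "")))
    return sorted(concepts)


if __name__ == "__main__":
    for cui, text in extract_cuis(SAMPLE_XMI):
        print(cui, text)
```

For a real file you would read the text from disk first and pass it to
extract_cuis; everything other than the concept elements is simply ignored,
which is one way to "trim" the output without touching the pipeline.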



On Mon, Feb 4, 2019 at 9:17 AM Miller, Timothy <
[email protected]> wrote:

> Or you can get types within UIMA using the uimaFIT class
> org.apache.uima.fit.util.JCasUtil. This class shows an example of using it
> to get CUIs:
>
>
> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/util/JCasParser.java?view=markup
> which references this class as well:
>
> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CuiResponse.java?view=markup
>
> My normal workflow is to open the xmi files in the CVD to understand the
> structure of the CAS, and then you can write code using JCasUtil to extract
> pieces of information from the CAS without having to parse the xml
> manually. The above 2 classes show you one example of how that might work.
>
> Tim
>
>
> -----Original Message-----
> *From*: gandhi rajan <[email protected]>
> Reply-to: <[email protected]>
> *To*: [email protected] <[email protected]>
> *Subject*: Re: New to cTakes and need help [EXTERNAL]
> *Date*: Mon, 4 Feb 2019 20:40:30 +0530
>
> Hi Sajit, I'm not sure whether creating a custom piper is gonna help, as I
> believe the default clinical pipeline has the bare minimum but necessary stuff.
>
> You can probably parse the XML and extract out the stuff you want.
>
> On Monday, February 4, 2019, Sajit Kumar <[email protected]> wrote:
>
> Hi Gandhi,
>
> Thanks for your reply.
>
> Let me elaborate my task. I need to extract the CUIs for the medical
> concepts in clinical notes. I believe cTakes can be a very good tool for
> this.
> I went through the default clinical pipeline. The pipeline has various
> tasks in the pipe such as boundary detection, tokenization, entity
> recognition etc. I believe the output XMI document also has sections for
> each of these tasks. However, as I am only interested in the CUIs for
> medical concepts, I would only be interested in entity recognition and
> entity properties in my output. Is it possible to create a custom pipeline
> based on this? Or is it possible to turn off the output of unwanted
> sections? I hope you understand what I am trying to say. Please advise.
>
> Also, is there any documentation on the structure of cTAKES output files
> such as XMI files?
>
> Looking forward to your response.
>
> Regards,
> Sajit
>
> On Sun, 3 Feb, 2019, 20:48 gandhi rajan <[email protected]> wrote:
>
> Hi Sajit, I would say default clinical pipeline is the best place to start
> for a beginner -
> https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
>
> Also, you need to elaborate on what information you are looking for when you
> say much of the information is irrelevant to you.
>
> On Sunday, February 3, 2019, Sajit Kumar <[email protected]> wrote:
>
> Hi All,
>
> I am new to cTakes. I have heard great things about cTakes in processing
> clinical notes. I have been able to successfully install and launch cTakes
> applications. However, I have not been able to find enough documentation
> for the XMI output from these applications such as CPE etc. If anyone can
> guide me to some documentation to understand the structure of these outputs
> that would be helpful.
>
> Additionally, I am working on a task where I am interested in extracting
> the UMLS, SNOMED medical concepts from the clinical notes. However, I see
> that the output usually has a lot of information that is not relevant to my
> task. I tried my hand at creating a custom pipeline to get rid of this
> information, but it was throwing an exception. Please find the script
> below.
>
> //       ***  Piper File  ***
> //       Created by Sajit
> //       on February 03, 2019
>
>
> //  Text Files Reader
> //  Reads document texts from text files specified in a provided list.
> #   files  The text files to be loaded
> reader org.apache.ctakes.core.cr.TextReader files=C:\apache-ctakes-4.0.0\testdata\Input\SampleInputRadiologyNotes.txt
>
> //  UMLS Dictionary Lookup (Old)
> //  Annotates clinically-relevant terms.  This is an older, slower dictionary lookup implementation.
> add org.apache.ctakes.dictionary.lookup.ae.UmlsDictionaryLookupAnnotator
>
> //  XMI Writer
> //  Writes XMI files with full representation of input text and all extracted information.
> #   OutputDirectory  Output directory to write xmi files
> add org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes OutputDirectory=C:\apache-ctakes-4.0.0\testdata\output
>
> This passes the validation but fails to execute.
> Please tell me if my approach is right or wrong. And is it possible to
> trim the XMI outputs based on one's needs in the cTakes tool?
>
> Any suggestion or help is most welcome. Thanks.
>
> Regards,
> Sajit
>
>
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>
>
>
>
>
>

-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Cardiovascular Informatics <http://www.med.umn.edu/cardiology/>
University of Minnesota
[email protected]

 ›  evaluate-it.org  ‹
