I've extracted data from XMI using Python. It gets a little ugly, but it gives you an appreciation for the structure.
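
For illustration, here is a minimal sketch of that kind of extraction using only the Python standard library. It is not the script mentioned above. It assumes that concepts are serialized as elements whose local name is UmlsConcept and that carry cui, tui, preferredText, codingScheme and code attributes; check that against your own XMI files (for example in the CVD) before relying on it. The names extract_cuis and local_name are made up for this example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_cuis(xmi_data):
    """Return one dict per UmlsConcept-like element found in the XMI data.

    Assumption: concepts appear as elements named 'UmlsConcept' that carry
    a 'cui' attribute. Matching on the local name avoids hard-coding the
    namespace URI, which may differ between type system versions.
    """
    root = ET.fromstring(xmi_data)  # accepts str or bytes
    concepts = []
    for elem in root.iter():
        if local_name(elem.tag) == "UmlsConcept" and elem.get("cui"):
            concepts.append({
                "cui": elem.get("cui"),
                "tui": elem.get("tui"),
                "preferred_text": elem.get("preferredText"),
                "coding_scheme": elem.get("codingScheme"),
                "code": elem.get("code"),
            })
    return concepts


# Example use (the file name is hypothetical):
#
#     with open("note.txt.xmi", "rb") as handle:
#         for concept in extract_cuis(handle.read()):
#             print(concept["cui"], concept["tui"], concept["preferred_text"])
```

This only lists the concepts themselves. Linking each concept back to the text span that produced it would mean following the ID references from the mention elements, which is where the parsing starts to get ugly.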

On Mon, Feb 4, 2019 at 9:17 AM Miller, Timothy <[email protected]> wrote:

> Or you can get types within UIMA using UIMAfit class called
> org.apache.uima.fit.util.JCasUtil. This class shows an example of using it
> to get cuis:
>
> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/util/JCasParser.java?view=markup
> which references this class as well:
>
> http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CuiResponse.java?view=markup
>
> My normal workflow is to open the xmi files in the CVD to understand the
> structure of the CAS, and then you can write code using JCasUtil to extract
> pieces of information from the CAS without having to parse the xml
> manually. The above 2 classes show you one example of how that might work.
>
> Tim
>
>
> -----Original Message-----
> From: gandhi rajan <[email protected]>
> Reply-to: <[email protected]>
> To: [email protected] <[email protected]>
> Subject: Re: New to cTakes and need help [EXTERNAL]
> Date: Mon, 4 Feb 2019 20:40:30 +0530
>
> Hi Sajit, I m not sure whether creating custom piper is gonna help as I
> believe default clinical pipeline has the bare minimum but necessary stuff.
>
> You can probably parse the XML and extract out the stuff you want.
>
> On Monday, February 4, 2019, Sajit Kumar <[email protected]> wrote:
>
> Hi Gandhi,
>
> Thanks for your reply.
>
> Let me elaborate my task. I need to extract the CUIs for the medical
> concepts in clinical notes. I believe cTakes can be a very good tool for
> this.
> I went through the default clinical pipeline. The pipeline is has various
> tasks in the pipe such as boundary detection, tokenization, entity
> recognition etc. I believe the output XMI document also has sections for
> each of these tasks. However, as I am only interested in the CUIs for
> medical concepts. I would only be interested in entity recognition and
> entity properties in my output. Is it possible to create a custom pipeline
> based on this. Or is it possible to turn off the output of unwanted
> sections. I hope you understand what I am trying to say. Please advise.
>
> Also is there any documentation on the structure of cTAKES output files
> such as XMI files.
>
> Looking forward to your response.
>
> Regards,
> Sajit
>
> On Sun, 3 Feb, 2019, 20:48 gandhi rajan <[email protected] wrote:
>
> Hi Sajit, I would say default clinical pipeline is the best place to start
> for a beginner -
> https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
>
> Also you got to elaborate what information you are looking for when you
> say many of the information are irrelevant for you.
>
> On Sunday, February 3, 2019, Sajit Kumar <[email protected]> wrote:
>
> Hi All,
>
> I am new to cTakes. I have heard great things about cTakes in processing
> clinical notes. I have been able to successfully install and launch cTakes
> applications. However, I have not been able to find enough documentation
> for the XMI output from these applications such as CPE etc. If anyone can
> guide me to some documentation to understand the structure of these outputs
> that would be helpful.
>
> Additionally, I am working on a task where i am interested in extracting
> the UMLS, SNOMED medical concepts from the clinical notes. However, i see
> that the output usually has lot of information that is not relevant to my
> task. I tried my hands at creating a custom pipeline to get rid of this
> information. But it was throwing an exception. Please find below the
> script.
>
> // *** Piper File ***
> // Created by Sajit
> // on February 03, 2019
>
>
> // Text Files Reader
> // Reads document texts from text files specified in a provided list.
> # files The text files to be loaded
> reader org.apache.ctakes.core.cr.TextReader files=C:\apache-ctakes-4.0.0\testdata\Input\SampleInputRadiologyNotes.txt
>
> // UMLS Dictionary Lookup (Old)
> // Annotates clinically-relevant terms. This is an older, slower dictionary lookup implementation.
> add org.apache.ctakes.dictionary.lookup.ae.UmlsDictionaryLookupAnnotator
>
> // XMI Writer
> // Writes XMI files with full representation of input text and all extracted information.
> # OutputDirectory Output directory to write xmi files
> add org.apache.ctakes.core.cc.XmiWriterCasConsumerCtakes OutputDirectory=C:\apache-ctakes-4.0.0\testdata\output
>
> This passes the validation but fails to execute.
> Please tell me if my approach is right or wrong. And is it possible to
> trim the XMI outputs based on ones need in the cTakes tool.
>
> Any suggestion or help is most welcome. Thanks.
>
> Regards,
> Sajit
>
>
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Cardiovascular Informatics <http://www.med.umn.edu/cardiology/>
University of Minnesota
[email protected]

› evaluate-it.org ‹
