Hi,

Yes, we are using that format. I have a parser that I wrote, but it isn't integrated into UIMA. It runs separately and loads the full clinical trial data into a triplestore (Stardog). I would be interested in your system, since I am not really familiar with how to write file readers in the UIMA framework. Perhaps I can merge my parser into it and end up with just the right thing. If you can make it available, I would definitely be interested. I will take a look at the other links as well.

Thanks!!
Bonnie MacKellar

On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler <erik.faess...@uni-jena.de> wrote:

> Dear Bonnie,
>
> are you talking about the clinical trial XML format used by
> ClinicalTrials.gov by any chance?
> If so, I did create a UIMA reader for these data. It's not perfect but
> perhaps enough for your purposes and also you might want to enhance it.
> Please let me know if you would be interested in that, I did not get
> around to making it publicly available yet but could do so quickly.
>
> To answer the general question to the best of my knowledge:
> There is no such thing as a general XML reader built into the UIMA
> framework. For all non-trivial formats, a specific reader is necessary.
> This also holds true with regard to the employed type system.
> That being said, there are UIMA readers that try to serve as a general XML
> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> However, in my experience XML inputs come in a lot of different forms
> which might often not be suitable to a generic approach which is why I
> wrote quite a few UIMA readers for specific XML formats in the past.
>
> Hope that helps,
>
> Erik
>
> > On 20. Feb 2019, at 01:13, Bonnie MacKellar <bkmackel...@gmail.com> wrote:
> >
> > This is probably a very naive question, but I can't seem to find anything
> > about this. I currently have a lot of XML files (clinical trial
> > descriptions). My current workflow is to run a preprocessor that parses the
> > XML and generates text files in a simple format. I then run these files in
> > a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
> > parse the simple format, the Metamap annotator to do some UMLS annotations,
> > and finally I have a writer that generates RDF triples from the UIMA
> > annotations and loads the triples into a database. This has worked but is
> > clunky, especially the preprocessing. I feel like there has to be a better
> > way. Is there any support for reading XML files or do I need to write my
> > own CollectionReader? Are there any other tools within UIMA for handling
> > XML text?
> >
> > thanks,
> > Bonnie MacKellar
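
For reference, the preprocessing step described in the original question (parse the trial XML, emit plain text) can be sketched in a few lines. This is only an illustration in Python, not the parser or the UIMA reader mentioned in this thread. The element names (brief_title, brief_summary, criteria), the function name and the sample record are assumptions made up for the example and may not match the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen for illustration only; the real
# clinical trial schema may use different names and nesting.
FIELDS = ("brief_title", "brief_summary", "criteria")


def trial_xml_to_text(xml_string, fields=FIELDS):
    """Return the text of each selected element as 'name: text' lines."""
    root = ET.fromstring(xml_string)
    blocks = []
    for name in fields:
        for element in root.iter(name):
            # itertext() also collects text from nested child elements;
            # split/join collapses runs of whitespace and newlines.
            text = " ".join("".join(element.itertext()).split())
            if text:
                blocks.append(f"{name}: {text}")
    return "\n".join(blocks)


# A made-up record, not real trial data.
SAMPLE = """<clinical_study>
  <brief_title>Example Trial</brief_title>
  <brief_summary><textblock>
    A made-up summary.
  </textblock></brief_summary>
</clinical_study>"""

if __name__ == "__main__":
    print(trial_xml_to_text(SAMPLE))
```

A format-specific UIMA CollectionReader would typically do the equivalent extraction in Java and place the resulting text in the CAS, which would remove the separate preprocessing pass.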