On 14.01.2016, at 15:09, Sean Crist <sean.cr...@humedica.com> wrote:
>
> Hi,
>
> I have a few questions on the basic concepts of UIMA. It’s fine if you tell
> me to read the manuals, but I haven’t been able to find the answers there so
> far, so a chapter reference would be a big help.
>
> 1) If Annotator A creates an annotation, is it OK for Annotator B to
> modify the information in the annotations which A created?
In general, yes. However, if you work with delta CAS or plan to modify feature values that are used as index keys (e.g. begin/end offsets), you should be careful, because the behavior depends on the UIMA version you are using. Cf. here:

https://uima.apache.org/d/uimaj-2.7.0/references.html#ugr.ref.config.protect-index

> 2) I’ve read that an annotation can contain a reference to another
> annotation, but I haven’t been able to find instructions or an example.
>
> Possibly, I could generate the annotation class using JCasGen, and then
> manually augment the auto-generated code to support references to other
> annotation objects. Is that a good way to do it? Or is there some kind of
> built-in support?

You first define the type X you want to reference. Then you define a type Y and, on type Y, a feature whose range type is X. That's it. Cf.

http://stackoverflow.com/questions/34685195/uima-custom-type-with-custom-feature-type-range

JCasGen will generate the appropriate getters and setters for that feature/type.

> 3) Suppose I want a parser to build a parse tree over tokens. A parse tree
> consists of a hierarchy of nodes.
>
> I could represent each node as an annotation. Is that the most UIMA-like
> solution?

Sure. A typical representation of a parse tree is this:

    Constituent extends Annotation {
      Constituent parent;
      Array of Constituent children;
    }

Cf. e.g. the documentation of the DKPro Core type system:

https://dkpro.github.io/dkpro-core/documentation/

Currently under the heading "DKPro Core 1.8.0-SNAPSHOT" - "Typesystem Reference". These types are all defined as UIMA types, and the documentation is actually auto-generated from the UIMA XML type descriptors in DKPro Core.

> The reason I hesitate is this. If I were writing a non-UIMA solution from
> scratch, I’d treat all of the nodes above the token level as abstract units,
> and those abstract units wouldn’t deal in concrete information such as the
> beginning and end of a character range. I’d keep track of that only at the
> token level.
> I think that all UIMA annotations are required to keep track of
> this information.

You can derive your types from AnnotationBase, which does not have begin/end features, if you do not wish to duplicate offset information. But it is often a good idea to repeat that information on higher-level annotations.

> Also, it sounds like the only way for an annotator to retrieve existing
> annotations is to create an iterator and pull them out one by one. I wish
> there were a way to just get a reference to the root node of my parse tree,
> so that I can simply step recursively through the tree (which assumes I’ve
> arranged for each node to contain references to its children).

The typical approach is to give the root node a dedicated type, e.g. ROOT (extends Constituent), and then iterate over all ROOT annotations.

There are a number of type systems for UIMA that already define all kinds of annotation types for linguistic annotations:

- DKPro Core
- ClearTK
- U-Compare
- JCoRe
- ...

I would recommend using one of them instead of inventing your own from scratch.

Cheers,

-- Richard

Disclaimer: I'm also working on DKPro Core, so sorry for all the respective references ;)
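
To make points 2) and 3) more concrete, here is a minimal sketch of what a type system descriptor for such a parse tree might look like. The type names under org.example are made up for illustration and are not taken from DKPro Core or any other existing type system; the real types in those projects may well be defined differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: all org.example.* names are hypothetical. -->
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>ParseTreeTypeSystem</name>
  <types>
    <!-- A tree node that can point to its parent and its children. -->
    <typeDescription>
      <name>org.example.Constituent</name>
      <description>A node in a parse tree.</description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <!-- A feature whose range is another annotation type. -->
        <featureDescription>
          <name>parent</name>
          <description>The parent node, unset for the root.</description>
          <rangeTypeName>org.example.Constituent</rangeTypeName>
        </featureDescription>
        <!-- An array-valued feature holding the child nodes. -->
        <featureDescription>
          <name>children</name>
          <description>The child nodes, in order.</description>
          <rangeTypeName>uima.cas.FSArray</rangeTypeName>
          <elementType>org.example.Constituent</elementType>
        </featureDescription>
      </features>
    </typeDescription>
    <!-- A dedicated root type, so annotators can iterate over roots only. -->
    <typeDescription>
      <name>org.example.ROOT</name>
      <description>The root node of a parse tree.</description>
      <supertypeName>org.example.Constituent</supertypeName>
    </typeDescription>
  </types>
</typeSystemDescription>
```

Running JCasGen on a descriptor like this should give you Java classes with getters and setters for parent and children, so an annotator can fetch the ROOT annotations and then walk down the tree through the children feature. If you would rather not store offsets on the inner nodes, the supertype of Constituent could be uima.cas.AnnotationBase instead of uima.tcas.Annotation.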