Re: Basic UIMA questions

Richard Eckart de Castilho Thu, 14 Jan 2016 06:32:38 -0800

On 14.01.2016, at 15:09, Sean Crist <sean.cr...@humedica.com> wrote:
> 
> Hi,
> 
> I have a few questions on the basic concepts of UIMA.  It’s fine if you tell 
> me to read the manuals, but I haven’t been able to find the answers there so 
> far, so a chapter reference would be a big help.
> 
> 1)    If Annotator A creates an annotation, is it OK for Annotator B to 
> modify the information in the annotations which A created?


In general, yes. If you work with delta CAS, or if you plan to modify feature 
values which are used as index keys (e.g. begin/end offsets), you should be 
careful, though, because what happens depends on the UIMA version you are 
using. Cf. here:

https://uima.apache.org/d/uimaj-2.7.0/references.html#ugr.ref.config.protect-index
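To illustrate why index keys are the delicate part, here is a plain-Java analogy. This is not UIMA API code. A `TreeSet` ordered by begin/end stands in for the annotation index. If you change a key while the element is indexed, the element stays in the wrong position. If you remove it, change it and re-add it, the index stays consistent. In JCas code the corresponding steps would be `removeFromIndexes()`, `setBegin(...)`/`setEnd(...)` and `addToIndexes()`. The `Span` and `IndexKeyDemo` names are made up for this sketch.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Plain-Java stand-in for an annotation (hypothetical, not a UIMA class).
class Span {
    int begin;
    int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

class IndexKeyDemo {
    // Begin ascending, then end descending, like the built-in annotation
    // index (which additionally uses type priorities).
    static final Comparator<Span> ORDER =
        Comparator.<Span>comparingInt(s -> s.begin)
                  .thenComparing(Comparator.<Span>comparingInt(s -> s.end).reversed());

    static TreeSet<Span> newIndex() {
        return new TreeSet<>(ORDER);
    }

    // Unsafe: changes a sort key while the element is still in the index,
    // so the set no longer finds it where the comparator says it should be.
    static void shiftInPlace(Span s, int newBegin) {
        s.begin = newBegin;
    }

    // Safe: remove, modify, re-add (cf. removeFromIndexes / setBegin /
    // addToIndexes in JCas code).
    static void shiftSafely(TreeSet<Span> index, Span s, int newBegin) {
        index.remove(s);
        s.begin = newBegin;
        index.add(s);
    }
}
```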

> 2)   I’ve read that an annotation can contain a reference to another 
> annotation, but I haven’t been able to find instructions or an example.
> 
> Possibly, I could generate the annotation class using JCasGen, and then 
> manually augment the auto-generated code to support references to other 
> annotation objects.  Is that a good way to do it?  Or is there some kind of 
> built-in support?

You first define the type X you want to refer to. Then you define a type Y 
with a feature whose range type is X. That's it. Cf. 

http://stackoverflow.com/questions/34685195/uima-custom-type-with-custom-feature-type-range

JCasGen will generate the appropriate getters and setters for that feature/type.

> 3)   Suppose I want a parser to build a parse tree over tokens.  A parse tree 
> consists of a hierarchy of nodes.
> 
> I could represent each node as an annotation.  Is that the most UIMA-like 
> solution?

Sure. A typical representation of a parse tree looks like this:

Constituent extends Annotation {
  Constituent parent;
  Array of Constituent children;
}

Cf. e.g. the documentation of the DKPro Core type system: 

https://dkpro.github.io/dkpro-core/documentation/

Currently under the heading "DKPro Core 1.8.0-SNAPSHOT" - "Typesystem 
Reference". These types are all defined as UIMA types, and the documentation is 
actually auto-generated from the UIMA XML type descriptors in DKPro Core.
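For illustration, here is the Constituent structure as a plain-Java model. The class is hypothetical and is not JCasGen output. The `parent` field corresponds to a feature with range Constituent, and `children` corresponds to an FSArray feature. UIMA does not keep the two directions in sync for you. The made-up `attach` helper sets both sides, which is what an annotator would have to do when filling in these features.

```java
// Plain-Java model of the Constituent type (hypothetical, not generated code).
class Constituent {
    final String label;                          // e.g. "S", "NP", "VP"
    Constituent parent;                          // null for the top node
    Constituent[] children = new Constituent[0];

    Constituent(String label) {
        this.label = label;
    }

    // Hypothetical helper: sets the children array and the back-pointer on
    // each child so the tree can be walked both downwards and upwards.
    static Constituent attach(Constituent parent, Constituent... children) {
        parent.children = children;
        for (Constituent child : children) {
            child.parent = parent;
        }
        return parent;
    }

    // Follows parent links up to the top of the tree.
    Constituent root() {
        Constituent node = this;
        while (node.parent != null) {
            node = node.parent;
        }
        return node;
    }
}
```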

> The reason I hesitate is this.  If I were writing a non-UIMA solution from 
> scratch, I’d treat all of the nodes above the token level as abstract units, 
> and those abstract units wouldn’t deal in concrete information such as the 
> beginning and end of a character range.  I’d keep track of that only at the 
> token level.  I think that all UIMA annotations are required to keep track of 
> this information.

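You can derive your types from AnnotationBase, which does not have begin/end 
features, if you do not wish to duplicate offset information. But it is often a 
good idea to repeat the offsets on higher-level annotations: with begin/end they 
take part in the offset-sorted annotation index, so offset-based 
(covered/covering) lookups and getCoveredText() work on them.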
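If you do keep offsets on the higher levels, you can derive them from the tokens instead of tracking them separately. Below is a minimal plain-Java sketch. The `Node` class is hypothetical and not UIMA API. It assumes that each node spans from the start of its first token to the end of its last token.

```java
// Plain-Java sketch (not UIMA API): inner nodes take their begin/end from
// their descendants, tokens carry the offsets from tokenization.
class Node {
    final String label;
    int begin;
    int end;
    final Node[] children;

    // Token (leaf) constructor: offsets are known.
    Node(String label, int begin, int end) {
        this.label = label;
        this.begin = begin;
        this.end = end;
        this.children = new Node[0];
    }

    // Inner-node constructor: offsets are filled in by computeSpan().
    Node(String label, Node... children) {
        this.label = label;
        this.children = children;
    }

    // Post-order: compute the children's spans first, then cover them.
    void computeSpan() {
        if (children.length == 0) {
            return; // token offsets are already set
        }
        begin = Integer.MAX_VALUE;
        end = Integer.MIN_VALUE;
        for (Node child : children) {
            child.computeSpan();
            begin = Math.min(begin, child.begin);
            end = Math.max(end, child.end);
        }
    }

    // Analogue of getCoveredText() once offsets are present.
    String coveredText(String documentText) {
        return documentText.substring(begin, end);
    }
}
```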

> Also, it sounds like the only way for an annotator to retrieve existing 
> annotations is to create an iterator and pull them out one by one.  I wish 
> there were a way to just get a reference to the root node of my parse tree, 
> so that I can simply step recursively through the tree (which assumes I’ve 
> arranged for each node to contain references to its children).

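The typical approach is to give the root node a dedicated type, e.g. ROOT 
(extending Constituent), and then iterate over all ROOT annotations. Each one is 
the entry point to one tree, from which you can descend recursively through the 
children feature.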
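Here is a rough plain-Java model of that approach. A flat list plays the role of the CAS index, and `Root` plays the role of ROOT. The traversal goes down through the children. With uimaFIT you would get the roots with `JCasUtil.select(jcas, ROOT.class)` instead of the `instanceof` filter. All class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (not UIMA API) of a constituent with its children.
class Constituent {
    final String label;
    final List<Constituent> children = new ArrayList<>();

    Constituent(String label, Constituent... kids) {
        this.label = label;
        for (Constituent kid : kids) {
            children.add(kid);
        }
    }
}

// Dedicated root type, analogous to ROOT extending Constituent.
class Root extends Constituent {
    Root(Constituent... kids) {
        super("ROOT", kids);
    }
}

class TreeWalk {
    // Picks the root nodes out of the flat "index" (stand-in for select).
    static List<Root> selectRoots(List<Constituent> index) {
        List<Root> roots = new ArrayList<>();
        for (Constituent c : index) {
            if (c instanceof Root) {
                roots.add((Root) c);
            }
        }
        return roots;
    }

    // Depth-first, pre-order traversal collecting node labels.
    static void visit(Constituent node, List<String> out) {
        out.add(node.label);
        for (Constituent child : node.children) {
            visit(child, out);
        }
    }
}
```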

There are a number of type systems for UIMA that already define all kinds of 
annotation types for linguistic annotations:

- DKPro Core
- ClearTK
- U-Compare
- JCoRe
- ... 

I would recommend using one of them instead of inventing your own from scratch.

Cheers,

-- Richard

Disclaimer: I'm also working on DKPro Core, so sorry for all the respective 
references ;)
