Hi, we're now starting the EUMSSI project, which deals with integrating annotation layers coming from audio, video and text analysis.

We're thinking of basing it all on UIMA, with different views holding separate sofas for audio, video, transcribed text, etc. To align the different views we need a common offset specification that lets us map e.g. character offsets to the corresponding timestamps.
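To make the idea concrete, here is a minimal sketch (plain Java, not UIMA API; all names are placeholders) of the kind of cross-view alignment table we have in mind: anchor points mapping character offsets in the transcript view to frame offsets in the audio/video view, with lookup falling back to the nearest preceding anchor.

```java
import java.util.TreeMap;

// Hypothetical alignment table between a text view and an audio/video view.
public class OffsetAlignment {
    // character offset -> frame offset (e.g. at 1000 frames/second)
    private final TreeMap<Integer, Integer> anchors = new TreeMap<>();

    public void addAnchor(int charOffset, int frame) {
        anchors.put(charOffset, frame);
    }

    // Frame of the nearest anchor at or before the given character offset.
    public int charToFrame(int charOffset) {
        Integer key = anchors.floorKey(charOffset);
        if (key == null) {
            throw new IllegalArgumentException("offset before first anchor");
        }
        return anchors.get(key);
    }
}
```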

In order to avoid float timestamps (which would mean we couldn't derive from Annotation), I was thinking of using audio/video frame offsets at e.g. 100 or 1000 frames/second. Annotation defines begin and end as signed 32-bit ints, which leaves sufficient room for very long documents even at 1000 fps, so I don't think we're going to run into any limits there. Is there anything that could become problematic when working with offsets that are considerably larger than typical character offsets?
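As a back-of-the-envelope check (plain arithmetic, nothing UIMA-specific): a signed 32-bit offset comfortably covers day-scale media even at 1000 frames/second.

```java
// How much media fits into a signed 32-bit begin/end offset
// at a given frame rate?
public class FrameRange {
    public static long maxSeconds(int framesPerSecond) {
        return Integer.MAX_VALUE / (long) framesPerSecond;
    }

    public static long maxDays(int framesPerSecond) {
        return maxSeconds(framesPerSecond) / (60 * 60 * 24);
    }
}
```

At 1000 fps this is roughly 24 days of material, and at 100 fps about ten times that, so single documents should never get close to the limit.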

Also, can I keep several indexes over the same annotations, so that text analysis works with character offsets, but I can still efficiently query for overlapping annotations from other views based on frame offsets?
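For illustration, this is the kind of secondary index we're after (a self-contained sketch with hypothetical names, not UIMA's AnnotationIndex API): the same annotation objects, sorted by frame offsets, answering overlap queries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical secondary index keyed by frame offsets; the text-analysis
// index would sort the same objects by character offsets instead.
public class FrameIndex<T> {
    private static class Entry<T> {
        final int beginFrame, endFrame;
        final T annotation;
        Entry(int b, int e, T a) { beginFrame = b; endFrame = e; annotation = a; }
    }

    private final List<Entry<T>> entries = new ArrayList<>();

    public void add(int beginFrame, int endFrame, T annotation) {
        entries.add(new Entry<>(beginFrame, endFrame, annotation));
        entries.sort(Comparator.comparingInt((Entry<T> x) -> x.beginFrame));
    }

    // All annotations whose [beginFrame, endFrame) span overlaps [begin, end).
    public List<T> overlapping(int begin, int end) {
        List<T> result = new ArrayList<>();
        for (Entry<T> e : entries) {
            if (e.beginFrame >= end) break;   // sorted by begin: no later match
            if (e.endFrame > begin) result.add(e.annotation);
        }
        return result;
    }
}
```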

Btw, if you're interested in the project we have a writeup (condensed from the project proposal) here: https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf and there will hopefully soon be some content on http://eumssi.eu/

Thanks,
Jens