Re: big offsets efficiency, and multiple offsets

Richard Eckart de Castilho Wed, 04 Dec 2013 06:36:17 -0800

Why is it bad if you cannot inherit from Annotation? getCoveredText() will 
not work anyway if you are working with audio/video data.


-- Richard

On 04.12.2013, at 12:31, Jens Grivolla <j+...@grivolla.net> wrote:

> Hi, we're now starting the EUMSSI project, which deals with integrating 
> annotation layers coming from audio, video and text analysis.
> 
> We're thinking of basing it all on UIMA, having different views with separate 
> audio, video, transcribed text, etc. sofas.  In order to align the different 
> views we need to have a common offset specification that allows us to map 
> e.g. character offsets to the corresponding timestamps.
> 
> In order to avoid float timestamps (which would mean we can't derive from 
> Annotation) I was thinking of using audio/video frames with e.g. 100 or 1000 
> frames/second.  Annotation has begin and end defined as signed 32-bit ints, 
> leaving sufficient room for very long documents even at 1000 fps, so I don't 
> think we're going to run into any limits there.  Is there anything that could 
> become problematic when working with offsets that are probably quite a bit 
> larger than what is typically found with character offsets?
> 
> Also, can I have several indexes on the same annotations in order to work 
> with character offsets for text analysis, but then efficiently query for 
> overlapping annotations from other views based on frame offsets?
> 
> Btw, if you're interested in the project we have a writeup (condensed from 
> the project proposal) here: 
> https://dl.dropboxusercontent.com/u/4169273/UIMA_EUMSSI.pdf and there will 
> hopefully soon be some content on http://eumssi.eu/
> 
> Thanks,
> Jens
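
To make the arithmetic behind the frame-offset idea concrete, here is a minimal 
sketch in plain Java. It does not use any UIMA API; the class name FrameOffsets 
and all of its methods are hypothetical names chosen for illustration, and the 
half-open [begin, end) overlap convention is an assumption, not something the 
thread specifies. It shows the seconds-to-frame mapping, the longest duration a 
signed 32-bit offset can address at a given frame rate, and a simple overlap 
test on integer offsets.

```java
// Illustrative helper only: not part of UIMA, all names are hypothetical.
final class FrameOffsets {
    private FrameOffsets() {}

    /** Converts a timestamp in seconds to the nearest frame index. */
    static int toFrame(double seconds, int framesPerSecond) {
        // toIntExact throws ArithmeticException instead of silently overflowing.
        return Math.toIntExact(Math.round(seconds * framesPerSecond));
    }

    /** Converts a frame index back to a timestamp in seconds. */
    static double toSeconds(int frame, int framesPerSecond) {
        return (double) frame / framesPerSecond;
    }

    /** Longest duration, in whole seconds, addressable with a signed 32-bit offset. */
    static long maxSeconds(int framesPerSecond) {
        return Integer.MAX_VALUE / framesPerSecond;
    }

    /** True if the half-open spans [beginA, endA) and [beginB, endB) overlap. */
    static boolean overlaps(int beginA, int endA, int beginB, int endB) {
        return beginA < endB && beginB < endA;
    }
}
```

Under these assumptions, FrameOffsets.maxSeconds(1000) is 2147483 seconds, 
roughly 24.8 days of media, and FrameOffsets.maxSeconds(100) is 21474836 
seconds, roughly 248 days, which is consistent with the estimate that 1000 fps 
leaves sufficient room for very long documents.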
