Hello UIMA Gurus, I am relatively new to UIMA, so please excuse the general nature of my question and any butchering of the terminology.
I am attempting to write an application to process transcripts of audio files. Each "raw" transcript is in its own HTML file with a section listing biographical information for the speakers on the call followed by a number of sections containing transcriptions of the discussion of different topics. I would like to be able to analyze each speaker's contributions separately by topic and then aggregate and compare these analyses between speakers and between each speaker and the full text.

I was thinking that I would break the document into a new segment each time the speaker or the section of the document changes (attaching relevant speaker metadata to each section), run additional Analysis Engines on each segment (tokenizer, etc.), and then arbitrarily recombine the results of the analysis by speaker, etc.

Looking through the documentation, I am considering two approaches:

1. Using a CAS Multiplier. Under this approach, I would follow the example in Chapter 7 of the documentation, divide on section and speaker demarcations, add metadata to each CAS, run additional AEs on the CASes, and then use a multiplier to recombine the many CASes for each document (one for the whole transcript, one for each section, one for each speaker, etc.). The advantage of this approach is that it seems easy to incorporate into a pipeline of AEs, since they are designed to run on each CAS. The disadvantage is that it seems unwieldy to have to keep track of all of the related CASes per document and aggregate statistics across the CASes.

2. Use CAS Views. This option is appealing because it seems like CAS Views were designed for associating many different aspects of the same document with one another. However, it looks to me that I would have to specify different views both when parsing the document into sections and when passing them through subsequent AEs, which would make it harder to drop into an existing pipeline. I may be misunderstanding how subsequent AEs work with Views, however.
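As a rough illustration of the segment-then-aggregate idea described above, here is a minimal plain-Java sketch. It deliberately uses no UIMA classes, so it says nothing about whether a CAS Multiplier or CAS Views is the better fit. Everything in it is an assumption made for illustration: the class name `TranscriptSegmenter`, the `Segment` type, and the simplified input format (a line starting with `## ` opens a new section, and a line of the form `Speaker: text` is one turn) stand in for the real HTML transcripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of UIMA: splits a simplified transcript into
// segments and aggregates a toy statistic (word count) across them.
class TranscriptSegmenter {

    /** One contiguous run of text by a single speaker within a single section. */
    static final class Segment {
        final String section;
        final String speaker;
        final String text;

        Segment(String section, String speaker, String text) {
            this.section = section;
            this.speaker = speaker;
            this.text = text;
        }
    }

    /** Starts a new segment whenever the section or the speaker changes. */
    static List<Segment> segment(List<String> lines) {
        List<Segment> segments = new ArrayList<>();
        String section = "(none)";
        String speaker = null;
        StringBuilder text = new StringBuilder();

        for (String line : lines) {
            if (line.startsWith("## ")) {          // assumed section marker
                flush(segments, section, speaker, text);
                section = line.substring(3).trim();
                speaker = null;
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                          // no speaker label: skip the line
            }
            String who = line.substring(0, colon).trim();
            String said = line.substring(colon + 1).trim();
            if (!who.equals(speaker)) {            // speaker changed: close the old segment
                flush(segments, section, speaker, text);
                speaker = who;
            }
            if (said.isEmpty()) {
                continue;
            }
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(said);
        }
        flush(segments, section, speaker, text);
        return segments;
    }

    /** Emits the pending segment, if any, and clears the text buffer. */
    private static void flush(List<Segment> out, String section,
                              String speaker, StringBuilder text) {
        if (speaker != null && text.length() > 0) {
            out.add(new Segment(section, speaker, text.toString()));
        }
        text.setLength(0);
    }

    /** Whitespace-based word count; a stand-in for a real tokenizer AE. */
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Recombines per-segment results by speaker across all sections. */
    static Map<String, Integer> wordsBySpeaker(List<Segment> segments) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Segment s : segments) {
            totals.merge(s.speaker, wordCount(s.text), Integer::sum);
        }
        return totals;
    }

    /** Recombines per-segment results by (section, speaker) pair. */
    static Map<String, Integer> wordsBySectionAndSpeaker(List<Segment> segments) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Segment s : segments) {
            totals.merge(s.section + " / " + s.speaker, wordCount(s.text), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "## Results",
                "Alice: Revenue grew this quarter.",
                "Alice: Margins were flat.",
                "Bob: What drove the growth?",
                "## Outlook",
                "Alice: We expect more of the same.");
        List<Segment> segments = segment(lines);
        System.out.println(segments.size() + " segments");
        System.out.println(wordsBySpeaker(segments));
        System.out.println(wordsBySectionAndSpeaker(segments));
    }
}
```

In a UIMA setting each `Segment` would presumably become either its own CAS or a region within a view, and the word count would be replaced by the output of real Analysis Engines; that mapping is the open question and is not shown here.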
For those more experienced with UIMA, how would you approach this problem? It's entirely possible that I am missing a third (fourth, fifth...) approach that would work better than either of those above, so any guidance would be much appreciated.

Regards and thanks,
Matt