UIMA works best when you are processing one document at a time.  My 
suggestion would be to run the initial pipeline to produce the annotations you 
need, which I assume are tokens in your case, then save those to a 
relational table.

From there, you can run the documents through a second time, load your document 
frequency (df) values as an external resource, and compute the term frequency 
(tf) for each document in that second pass.
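
To make the two passes concrete, here is a rough sketch in plain Python rather 
than UIMA code. Everything in it is an assumption for illustration: the function 
names are made up, an in-memory Counter stands in for the relational table and 
the external resource, each document is already a list of tokens, and 
idf = log(N / df) is just one common variant of the formula.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Pass 1: count how many documents contain each token."""
    df = Counter()
    for tokens in corpus:
        # set() so a token is counted once per document, not once per occurrence
        df.update(set(tokens))
    return df


def tf_idf(tokens, df, n_docs):
    """Pass 2: score one document against the precomputed df table."""
    tf = Counter(tokens)
    return {
        token: (count / len(tokens)) * math.log(n_docs / df[token])
        for token, count in tf.items()
        if df[token] > 0  # skip tokens never seen in pass 1
    }


# Hypothetical toy corpus; in practice these would be the saved token annotations.
corpus = [
    ["uima", "annotator", "token"],
    ["uima", "pipeline"],
    ["token", "token", "score"],
]

df = document_frequencies(corpus)                           # first run over the corpus
scores = [tf_idf(doc, df, len(corpus)) for doc in corpus]   # second run
```

The point is only that pass 2 handles one document at a time, which is the shape 
an annotator wants, while all of the corpus-wide state lives in the df table 
built by pass 1.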

There are ways to estimate the tf/idf values, but, frankly, the whole notion of 
“document frequency” means you’ve looked at the whole corpus at least once.

> On Nov 6, 2015, at 7:12 AM, Christopher Baechle <cbaec...@my.fau.edu> wrote:
> 
> I am working with an existing project that is built with UIMA. I am trying
> to create a tf-idf style score that looks at the set of documents as a
> whole.
> 
> Since the rest of the project uses UIMA heavily, I would like to implement
> this as an annotator if possible, rather than a separate program. Is it
> possible within UIMA to do this?
