Re: How to annotate based on document collection

buddha Fri, 06 Nov 2015 07:22:39 -0800

UIMA works best when you are investigating one document at a time.  My 
suggestion would be to run the initial pipeline to get the correct annotations, 
which I assume are tokens in your case, then save those off into some 
relational table.
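A minimal sketch of that first pass, written in plain Java so it runs without 
UIMA on the classpath. It assumes each document's tokens are already available 
as strings (in a real annotator they would come from Token annotations read in 
`process(JCas)`), and it keeps document-frequency counts in a map instead of a 
relational table. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical first pass: count, for each token, how many documents contain it.
 * In UIMA this logic would live in an annotator's process(JCas) method; here a
 * document is just a list of already-tokenized strings.
 */
class DocumentFrequencyPass {
    private final Map<String, Integer> docFreq = new HashMap<>();
    private int numDocs = 0;

    /** Called once per document, analogous to process(JCas). */
    public void processDocument(List<String> tokens) {
        numDocs++;
        // A term counts at most once per document, so de-duplicate first.
        Set<String> unique = new HashSet<>(tokens);
        for (String term : unique) {
            docFreq.merge(term, 1, Integer::sum);
        }
    }

    /** Copy of the df counts; this is what you would persist to your table. */
    public Map<String, Integer> documentFrequencies() {
        return new HashMap<>(docFreq);
    }

    /** Total number of documents seen, needed later for idf. */
    public int documentCount() {
        return numDocs;
    }
}
```

In an actual annotator, writing the map out once the whole corpus has been seen 
could be done from `collectionProcessComplete()`.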

From there, you can run the documents through again, load your df values as 
an external resource, and compute tf on that second pass.
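The second pass could then look roughly like this, again as a hypothetical 
plain-Java sketch: the df map and document count stand in for whatever you load 
as the external resource, and each call to `score` handles a single document, 
which fits UIMA's one-document-at-a-time model. It uses the unsmoothed form 
idf = log(N / df) and skips terms that never appeared in the first pass.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical second pass: with corpus-wide df counts loaded up front,
 * score one tokenized document at a time.
 */
class TfIdfPass {
    private final Map<String, Integer> docFreq;
    private final int numDocs;

    TfIdfPass(Map<String, Integer> docFreq, int numDocs) {
        this.docFreq = docFreq;
        this.numDocs = numDocs;
    }

    /** Returns a tf-idf score per term for a single document. */
    public Map<String, Double> score(List<String> tokens) {
        // Raw term frequency, counted within this document only.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : tokens) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue; // term not seen in the first pass; idf is undefined
            }
            double idf = Math.log((double) numDocs / df);
            scores.put(e.getKey(), e.getValue() * idf);
        }
        return scores;
    }
}
```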

There are ways to estimate the tf/idf values, but, frankly, the whole notion of 
“document frequency” means you’ve looked at the whole corpus at least once.
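For reference, one common unsmoothed form of tf-idf (many variants exist). Here 
N is the number of documents in the corpus and df(t) is the number of documents 
containing term t; both are corpus-level quantities, which is why they cannot be 
known while looking at a single document:

```latex
\mathrm{tf}(t,d) = \text{number of occurrences of } t \text{ in } d, \qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)
```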

> On Nov 6, 2015, at 7:12 AM, Christopher Baechle <cbaec...@my.fau.edu> wrote:
> 
> I am working with an existing project that is built with UIMA. I am trying
> to create a tf-idf style score that looks at the set of documents as a
> whole.
> 
> Since the rest of the project uses UIMA heavily, I would like to implement
> this as an annotator if possible, rather than a separate program. Is it
> possible within UIMA to do this?
