That was exactly what I was looking for! I can't believe I missed that in the documentation.
Thank you so much!
~Ben

On Wed, Oct 10, 2018 at 11:32 AM Zesch, Torsten <[email protected]> wrote:

> Hi Ben,
>
> each annotator can implement collectionProcessComplete().
>
> Quoting from the documentation:
> "The framework calls the collectionProcessComplete() method at the end of
> the collection (i.e., when all objects in the collection have been
> processed). At this point in time, no CAS is passed in as a parameter. This
> gives the CAS Consumer or Analysis Engine an opportunity to perform
> collection processing over the entire set of objects in the collection."
>
> In our implementation of tf.idf, we have an annotator collect the tf score
> for each document in process() and compute the idf part in
> collectionProcessComplete().
>
> -Torsten
>
> On 10.10.18, 17:21, "Benedict Holland" <[email protected]> wrote:
>
>     Hello all,
>
>     I continue to have a problem that comes up a lot. I have a collection
>     processing engine. I want something to run after all of the processing
>     is done. For example, I have a collection of texts and want to run a
>     tf-idf. I generate a tf for each document and at the end, I generate
>     an idf over the collection. I can't put that in an annotator as part
>     of my processing pipeline.
>
>     Is there an aggregate annotator that will run after the entire
>     collection is processed?
>
>     Thanks,
>     ~Ben
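
For anyone finding this thread later, here is a rough sketch of the pattern Torsten describes: collect term counts per document in process(), then compute the idf part once in collectionProcessComplete(). It is plain Java with no UIMA dependency, so the class name TfIdfCollector, the String argument to process(), the whitespace tokenizer, and the scores() accessor are all assumptions made for illustration. In a real pipeline the class would presumably extend JCasAnnotator_ImplBase, process() would receive a JCas, and the framework would call both methods for you. This is not Torsten's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a UIMA annotator; it only mirrors the two-phase lifecycle.
class TfIdfCollector {
    // Term counts per document, in the order the documents were processed.
    private final List<Map<String, Integer>> termCounts = new ArrayList<>();
    // Number of documents in which each term occurs.
    private final Map<String, Integer> docFreq = new HashMap<>();
    // Filled in collectionProcessComplete(): tf * idf per term, per document.
    private final List<Map<String, Double>> tfIdf = new ArrayList<>();

    // Called once per document (assumption: plain text instead of a JCas).
    void process(String documentText) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : documentText.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        termCounts.add(tf);
        // Each distinct term in this document raises its document frequency by one.
        for (String term : tf.keySet()) {
            docFreq.merge(term, 1, Integer::sum);
        }
    }

    // Called once after the last document; no document is passed in here.
    void collectionProcessComplete() {
        int n = termCounts.size();
        for (Map<String, Integer> tf : termCounts) {
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                // Unsmoothed idf = ln(N / df); one of several common variants.
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                scores.put(e.getKey(), e.getValue() * idf);
            }
            tfIdf.add(scores);
        }
    }

    // Hypothetical accessor so the result can be inspected.
    List<Map<String, Double>> scores() {
        return tfIdf;
    }
}
```

Calling process() on each text and then collectionProcessComplete() once fills scores() with one map per document. The point of the split is that the idf needs the document count and document frequencies for the whole collection, which are only known after the last process() call.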
