Indeed, you can access a shared resource in the collectionProcessComplete method. I wonder why I thought I could not.
So I was talking about shared resources and using UIMA-AS to scale out.

Thanks, Richard, for your answer.

On Thu, Nov 14, 2013 at 9:02 PM, Richard Eckart de Castilho <r...@apache.org> wrote:

> On 14.11.2013, at 18:19, Nicolas Hernandez <nicolas.hernan...@gmail.com>
> wrote:
>
>> Dear All,
>>
>> Let's say I want to count the occurrences of each word in a document
>> collection and to use these counters (possibly in the same workflow).
>> I have one CAS per document and I want to scale out the workflow.
>
> How do you scale it out?
>
>> To scale out the workflow, I use a resource to store a counter for
>> each word. The resource is written to by several instances of an
>> annotator, which process distinct CASes in parallel.
>
> What kind of resource do you use?
>
>> Here are my questions:
>> * I believe I cannot be sure that, when a later annotator in the same
>> workflow uses the resource, the resource will not still be modified
>> afterwards (by counter annotators that are still processing the
>> remaining CASes). Right? In other words, I have no way to run (or to
>> delay the run of) an annotator depending on the state of a resource?
>
> You can customize the flow by writing your own flow controller.
> But whether that is supported depends on how you do your scaling.
>
>> * So I may use two workflows: one to build the resource, the other one
>> to use it. But how can I export/save the resource? I cannot access
>> the resource in the collectionProcessComplete method of an AE, can I?
>
> I would personally use the two workflows. Why do you believe that you cannot
> access the resource in collectionProcessComplete?
>
>> The solution I imagined was inspired by the use of a CAS multiplier
>> to merge CASes. It is to use two workflows, one of them dedicated to
>> building the resource. In this workflow, I define an annotator (without
>> scaling out, so a CAS consumer). In that annotator, I check the
>> SourceDocumentInformation feature structure in the CAS to see whether
>> its lastSegment feature is set to true; in that case I can export the
>> resource. I know this is not a guarantee that all CASes have been
>> processed. I may also have a special counter resource in that
>> annotator to count the processed CASes and export the desired
>> resource once all CASes have been processed. In that case, I would
>> need a way to tell the "exporter" annotator how many CASes will be
>> processed... This is not the main problem.
>>
>> After writing that, I realize that to do it in a single workflow, I
>> could have written a CAS multiplier to hold each CAS until all have
>> been processed, then recreate as many CASes as were saved...
>>
>> These solutions are very complex...
>>
>> Any suggestions? A uimaFIT trick =) ?
>
> Well, to do small-scale scaling using a CPE, I'd do this:
>
> - build an aggregate which generates the word counts
> - use a custom shared resource to do the counting
> - in collectionProcessComplete, call some synchronized "save" method on
>   the resource
> - if "save" is called a second time, it does nothing
>
> - build an aggregate which uses the word counts
>
> Run both workflows, one after the other, using the CpePipeline of uimaFIT.
>
> -- Richard

--
Dr. Nicolas Hernandez
Associate Professor (Maître de Conférences)
Université de Nantes - LINA CNRS UMR 6241
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
+33 (0)2 51 12 53 94
+33 (0)2 40 30 60 67
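For illustration, here is a minimal plain-Java sketch of the counting resource from Richard's recipe. It is only an assumption-laden sketch, not the approach from the thread or a UIMA API. The class name WordCountResource and its count/getCount/save methods are hypothetical. The UIMA wiring (making it a shared resource, binding it to the annotator, running it with uimaFIT's CpePipeline) is left out so the block compiles on its own. It also assumes, as the recipe implies, that collectionProcessComplete is only called once every CAS has been counted. Under that assumption the first save() call writes the counts and every later call does nothing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical shared word-count resource (plain Java, not wired into UIMA here).
class WordCountResource {
    // Thread-safe counters, updated concurrently by parallel annotator instances.
    private final ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();
    // Guarded by "this": becomes true once the counts have been written out.
    private boolean saved = false;

    // Intended to be called from each annotator instance's process() method.
    void count(String word) {
        counts.computeIfAbsent(word, w -> new LongAdder()).increment();
    }

    // Current count for a word (0 if it was never seen).
    long getCount(String word) {
        LongAdder adder = counts.get(word);
        return adder == null ? 0 : adder.sum();
    }

    // Intended to be called from every instance's collectionProcessComplete().
    // Only the first successful call writes the counts and returns true;
    // later calls do nothing and return false.
    synchronized boolean save(Writer out) {
        if (saved) {
            return false;
        }
        // Sort by word so the exported file is deterministic.
        Map<String, Long> sorted = new TreeMap<>();
        counts.forEach((w, adder) -> sorted.put(w, adder.sum()));
        try {
            for (Map.Entry<String, Long> entry : sorted.entrySet()) {
                out.write(entry.getKey() + "\t" + entry.getValue() + "\n");
            }
            out.flush();
        } catch (IOException ex) {
            // "saved" stays false, so a later call may retry the export.
            throw new UncheckedIOException(ex);
        }
        saved = true;
        return true;
    }
}
```

Making save() synchronized and idempotent means it does not matter which annotator instance finishes first. The counts are exported exactly once, and the second workflow can then load the exported file.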