Hi!

On Tue, Apr 22, 2014 at 05:10:56PM -0400, Marshall Schor wrote:
> If you plan on running your pipeline in one JVM (rather than having it scaled
> out over multiple JVMs), you can consider using an external resource which
> would be a plain Java Set<String> of the unique covered text so far found.
> Then, in the annotator (or annotators) that are adding new FeatureStructures
> representing the possibly duplication annotation, you can first check the
> shared resource to see if its been already annotated, and if so, skip both
> creating the additional FeatureStructure, and adding it to the indexes.
>
> Would that work for your use case?

That's an interesting approach, thanks for the suggestion. While I could
do it this way now, I plan to scale out my setup to multiple machines in
the future, and a Set shared within a single JVM would become
inconvenient then.

For the time being, I have simply loaded all the FSes into a map keyed
by their coveredText and then removed the duplicates.

Petr "Pasky" Baudis
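
For reference, here is a minimal sketch of the map-based deduplication described above. It is an illustration under assumptions, not the code actually used: the `Span` class is a hypothetical stand-in for an annotation FS (so the example runs without UIMA on the classpath), and `CoveredTextDedup` and `findDuplicates` are made-up names. The first FS seen for each covered text is kept, and later ones are reported as duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CoveredTextDedup {
    /** Hypothetical stand-in for an annotation FS: offsets plus covered text. */
    static final class Span {
        final int begin;
        final int end;
        final String coveredText;

        Span(int begin, int end, String coveredText) {
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }
    }

    /** Returns every span whose covered text was already seen on an earlier span. */
    static List<Span> findDuplicates(List<Span> spans) {
        // Map keyed by covered text; the value is the first span with that text.
        Map<String, Span> firstByText = new LinkedHashMap<>();
        List<Span> duplicates = new ArrayList<>();
        for (Span s : spans) {
            // putIfAbsent returns the existing value, or null if the key was new.
            Span first = firstByText.putIfAbsent(s.coveredText, s);
            if (first != null) {
                duplicates.add(s);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
                new Span(0, 5, "Paris"),
                new Span(10, 16, "London"),
                new Span(20, 25, "Paris"));
        List<Span> duplicates = findDuplicates(spans);
        System.out.println(duplicates.size() + " duplicate(s)");
    }
}
```

In a real annotator, the key would presumably come from `getCoveredText()` and each duplicate would then be removed from the indexes. Collecting the duplicates first and removing them in a second pass avoids modifying an index while iterating over it.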