On 17.11.2014, at 20:59, Kameron Cole <kameronc...@us.ibm.com> wrote:
> I am trying to get rid of duplicates in the FSIndex. I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);

There is no universal definition of equality other than object equality, and this is what Java defaults to unless equals() and hashCode() are implemented. Since each UIMA user might have a different opinion on what is equal, UIMA defers this decision to its indexing mechanism instead of hard-baking it into equals()/hashCode() methods.

I suggest you do the following:

- implement a Comparator<FeatureStructure> or Comparator<AnnotationFS> according to your definition of equality
- create a TreeSet based on your comparator
- drop all your annotations into this TreeSet
- "duplicates" according to your definition (comparator returns 0) are dropped.

The rest is sorted (or not) depending on what your comparator returns in a non-equality case (return value != 0).

Cheers,

-- Richard
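A minimal sketch of the steps above, kept free of UIMA dependencies so it runs on its own. The Span class, the DedupSketch class and the dedup() method are hypothetical stand-ins, and "same begin, same end, same type name" is only one possible definition of equality. With real annotations the same idea would presumably use a Comparator<AnnotationFS> built on getBegin(), getEnd() and getType().getName().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class DedupSketch {

    // Hypothetical stand-in for an annotation: offsets plus a type name.
    // It deliberately does NOT override equals()/hashCode().
    static final class Span {
        final int begin;
        final int end;
        final String type;

        Span(int begin, int end, String type) {
            this.begin = begin;
            this.end = end;
            this.type = type;
        }
    }

    // Assumed definition of equality: same begin, same end, same type name.
    // A return value of 0 means "duplicate"; anything else defines the order.
    static final Comparator<Span> BY_OFFSETS_AND_TYPE =
            Comparator.comparingInt((Span s) -> s.begin)
                      .thenComparingInt(s -> s.end)
                      .thenComparing(s -> s.type);

    // A TreeSet decides membership with the comparator only, so elements
    // that compare as 0 to an element already present are dropped.
    static List<Span> dedup(Collection<Span> spans) {
        TreeSet<Span> unique = new TreeSet<>(BY_OFFSETS_AND_TYPE);
        unique.addAll(spans);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span(5, 9, "Token"),
                new Span(0, 4, "Token"),
                new Span(0, 4, "Token"),      // duplicate under the comparator
                new Span(0, 4, "Sentence"));  // same offsets, different type

        for (Span s : dedup(spans)) {
            System.out.println(s.begin + "-" + s.end + " " + s.type);
        }
    }
}
```

A HashSet would keep both (0, 4, "Token") entries here, because Span inherits identity-based equals() and hashCode() from Object. The TreeSet keeps only one of them, since it consults the comparator alone.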