On 17.11.2014, at 20:59, Kameron Cole <kameronc...@us.ibm.com> wrote:

> I am trying to get rid of duplicates in the FSIndex.  I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
> 
> ArrayList al = new ArrayList();
> // add elements to al, including duplicates
> HashSet hs = new HashSet();
> hs.addAll(al);
> al.clear();
> al.addAll(hs);

There is no universal definition of equality other than object identity, and 
that is what Java defaults to unless equals() and hashCode() are overridden.
Since each UIMA user might have a different opinion on what counts as equal, 
UIMA leaves this decision to its indexing mechanism instead of hard-coding it 
into equals()/hashCode() methods.
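
To illustrate the default behaviour, here is a minimal sketch. Span is a 
hypothetical stand-in for an annotation (not a UIMA class) that, like the 
situation described above, does not override equals()/hashCode(). Two objects 
with the same offsets both survive in a HashSet:

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityEqualitySketch {
    // Hypothetical stand-in for an annotation; NOT a UIMA class.
    // It deliberately does not override equals()/hashCode().
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns how many of the given elements remain after adding them to a HashSet.
    static int distinctCount(Span... spans) {
        Set<Span> set = new HashSet<>();
        for (Span s : spans) {
            set.add(s);
        }
        return set.size();
    }

    public static void main(String[] args) {
        Span a = new Span(0, 5);
        Span b = new Span(0, 5); // same offsets, but a different object

        System.out.println(distinctCount(a, b)); // prints 2: both are kept
        System.out.println(distinctCount(a, a)); // prints 1: same object twice
    }
}
```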

I suggest you do the following:

- implement a Comparator<FeatureStructure> or Comparator<AnnotationFS> 
according to your definition of equality

- create a TreeSet based on your comparator

- drop all your annotations into this TreeSet

- elements that your comparator reports as equal to one already in the set 
(return value == 0) are dropped as "duplicates". The rest is sorted (or not) 
depending on what your comparator returns in the non-equal case (return 
value != 0).
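
A minimal sketch of the steps above. To keep it runnable without UIMA on the 
classpath, it uses a hypothetical Span class as a stand-in for real 
annotations. The definition of equality used here (same begin, same end, same 
type name) is only an assumption for illustration. With real annotations you 
would write a Comparator<AnnotationFS> along the same lines, for example based 
on getBegin(), getEnd() and getType().getName().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class TreeSetDedupSketch {
    // Hypothetical stand-in for an annotation; NOT a UIMA class.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    // One possible definition of equality: two spans are "equal" (compare() == 0)
    // if they have the same begin, the same end and the same type name.
    // In the non-equal case the result also determines the sort order.
    static final Comparator<Span> SAME_OFFSETS_AND_TYPE =
            Comparator.comparingInt((Span s) -> s.begin)
                    .thenComparingInt(s -> s.end)
                    .thenComparing(s -> s.type);

    // Drops everything the comparator considers a duplicate.
    static List<Span> dedup(List<Span> input) {
        TreeSet<Span> unique = new TreeSet<>(SAME_OFFSETS_AND_TYPE);
        unique.addAll(input);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<Span> input = Arrays.asList(
                new Span("Token", 6, 9),
                new Span("Token", 0, 5),
                new Span("Token", 0, 5),      // duplicate under the comparator
                new Span("Sentence", 0, 5));  // same offsets, different type

        // prints [Sentence[0,5], Token[0,5], Token[6,9]]
        System.out.println(dedup(input));
    }
}
```

Note that the TreeSet never calls equals() or hashCode() here. It relies 
only on the comparator, which is why this works even though Span does not 
override either method.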

Cheers,

-- Richard
