Hello, I am trying to get rid of duplicates in the FSIndex. I thought a clever way to do this would be to push the annotations into a Java Set, which does not allow duplicates. This is very (very) standard Java:

ArrayList al = new ArrayList();
// add elements to al, including duplicates
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);

This list will contain no duplicates. However, I am not getting this to work in my UIMA code:

AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
System.out.println("Index size is: " + idx.size());
ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
FSIterator it = idx.iterator();

// load the Annotations into a temporary list. includes duplicates
while (it.hasNext()) {
    tempList.add((Annotation) it.next());
}

// remove all Annotations from the index. this works fine
Iterator tempIt = tempList.iterator();
while (tempIt.hasNext()) {
    ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
}

// push tempList into HashSet. this should not allow duplicates
HashSet<Annotation> hs = new HashSet<Annotation>();
hs.addAll(tempList);

// size should be less than the size of the FSIndex by the number of duplicates.
// it is not. This is the main problem
System.out.println("HS length: " + hs.size());

tempList.clear();
tempList.addAll(hs);
System.out.println("templist length: " + tempList.size());

// this should now be the clean list
Iterator<Annotation> it2 = tempList.iterator();
while (it2.hasNext()) {
    it2.next().addToIndexes(aJCas);
}
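
For reference, here is a minimal plain-Java sketch of the idiom described above, without any UIMA classes. It shows that a HashSet only collapses elements that are equal according to equals() and hashCode(). The Span class, DedupSketch, and both helper methods are hypothetical names invented for this illustration. Span deliberately does not override equals() or hashCode(); whether Annotation behaves the same way is an assumption and not something verified here. The second helper shows one possible workaround under that assumption: deduplicating on an explicit (type, begin, end) key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSketch {

    // Hypothetical stand-in for an annotation. It does NOT override
    // equals()/hashCode(), so two instances are equal only if they are
    // the same object.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // The "standard Java" idiom: copy through a HashSet and back.
    // Removes only elements that are equal according to equals()/hashCode().
    static <T> List<T> dedupWithHashSet(List<T> items) {
        return new ArrayList<T>(new HashSet<T>(items));
    }

    // Possible workaround: keep the first Span seen for each
    // (type, begin, end) key. A List key compares by content.
    static List<Span> dedupByKey(List<Span> items) {
        Map<List<Object>, Span> seen = new LinkedHashMap<List<Object>, Span>();
        for (Span s : items) {
            List<Object> key = Arrays.<Object>asList(s.type, s.begin, s.end);
            if (!seen.containsKey(key)) {
                seen.put(key, s);
            }
        }
        return new ArrayList<Span>(seen.values());
    }

    public static void main(String[] args) {
        // Strings define content-based equals(), so the idiom works.
        List<String> words = Arrays.asList("a", "b", "a");
        System.out.println(dedupWithHashSet(words).size()); // prints 2

        // Two Spans with identical content but different identity.
        List<Span> spans = Arrays.asList(
                new Span("Token", 0, 5),
                new Span("Token", 0, 5),
                new Span("Token", 6, 9));
        System.out.println(dedupWithHashSet(spans).size()); // prints 3
        System.out.println(dedupByKey(spans).size());       // prints 2
    }
}
```

If the set size matches the index size, as in the output described above, one thing worth checking is whether the elements being added compare equal by content at all.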