Input text:
------------------------------
bird, cat, bush, cat
------------------------------

Create the Annotations:
------------------------------
String docText = aJCas.getDocumentText();

// Annotate every occurrence of "cat" as an Animal.
int index = docText.indexOf("cat");
while (index >= 0) {
    int begin = index;
    int end = begin + 3;
    Animal animal = new Animal(aJCas);
    animal.setBegin(begin);
    animal.setEnd(end);
    animal.addToIndexes();
    index = docText.indexOf("cat", index + 1);
}

// Annotate every occurrence of "bird" as an Animal.
index = docText.indexOf("bird");
while (index >= 0) {
    int begin = index;
    int end = begin + 4;
    Animal animal = new Animal(aJCas);
    animal.setBegin(begin);
    animal.setEnd(end);
    animal.addToIndexes();
    index = docText.indexOf("bird", index + 1);
}

// Annotate every occurrence of "bush" as a Vegetable.
index = docText.indexOf("bush");
while (index >= 0) {
    int begin = index;
    int end = begin + 4;
    Vegetable vegetable = new Vegetable(aJCas);
    vegetable.setBegin(begin);
    vegetable.setEnd(end);
    vegetable.addToIndexes();
    index = docText.indexOf("bush", index + 1); // was "bird" in the posted code
}
------------------------------

Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameronc...@us.ibm.com | Tel: 305-389-8512

From: Marshall Schor <m...@schor.com>
To: user@uima.apache.org
Date: 11/17/2014 04:35 PM
Subject: Re: can't remove duplicate Annotations with Java Set Collection

Hi,

Two Feature Structures are considered "equal" in the sense used by HashSet if fs1.equals(fs2). The definition of "equals" for feature structures is: they are equal if they refer to the same underlying CAS, and the same "spot" in the CAS Heap.

How did you create the Annotations that you think are "equal" in the HashSet sense?

Here's an example of two annotations which are "equal" in the UIMA sorted index sense, but unequal in the HashSet sense.

    Annotation fs1 = new Annotation(myJCas, 0, 4); // create an instance of Annotation in myJCas, with begin = 0 and end = 4
    Annotation fs2 = new Annotation(myJCas, 0, 4); // create a second instance of Annotation in myJCas, with begin = 0 and end = 4

These will be "equal" in the UIMA sense (the same kind of annotation, in the same CAS, with the same feature values), but they are two distinct feature structures, so HashSet will consider them to be unequal.

Could this be what is happening in your case? Please respond so we can see if there's another straightforward solution that does what you're looking for.

-Marshall

On 11/17/2014 2:59 PM, Kameron Cole wrote:
> Hello,
>
> I am trying to get rid of duplicates in the FSIndex. I thought a very
> clever way to do this would be to just push them into a Set Collection in
> Java, which does not allow duplicates. This is very (very) standard Java:
>
>     ArrayList al = new ArrayList();
>     // add elements to al, including duplicates
>     HashSet hs = new HashSet();
>     hs.addAll(al);
>     al.clear();
>     al.addAll(hs);
>
> This list will contain no duplicates.
>
> However, I am not getting this to work in my UIMA code:
>
>     AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>
>     System.out.println("Index size is: " + idx.size());
>
>     ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>
>     FSIterator it = idx.iterator();
>
>     // load the Annotations into a temporary list. includes duplicates
>     while (it.hasNext()) {
>         tempList.add((Annotation) it.next());
>     }
>
>     Iterator tempIt = tempList.iterator();
>
>     // remove all Annotations from the index. this works fine
>     while (tempIt.hasNext()) {
>         ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
>     }
>
>     // push tempList into HashSet
>     HashSet<Annotation> hs = new HashSet<Annotation>();
>
>     hs.addAll(tempList);
>
>     // this should not allow duplicates
>     // size should be less than the size of the FSIndex by the number of
>     // duplicates. It is not. This is the main problem
>     System.out.println("HS length: " + hs.size());
>
>     tempList.clear();
>
>     tempList.addAll(hs);
>
>     System.out.println("templist length: " + tempList.size());
>
>     // this should now be the clean list
>     Iterator<Annotation> it2 = tempList.iterator();
>
>     while (it2.hasNext()) {
>         it2.next().addToIndexes(aJCas);
>     }
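
Since HashSet compares feature structures by identity, one possible workaround for the problem described in this thread is to deduplicate on an explicit key rather than on the objects themselves. The following is only a minimal sketch in plain Java with no UIMA dependency: `Span` is a hypothetical stand-in for an annotation (it deliberately does not override equals/hashCode), and treating "same type, begin, and end" as the definition of a duplicate is an assumption, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpanDedup {

    /**
     * Hypothetical stand-in for an annotation. It does NOT override
     * equals/hashCode, so a HashSet compares instances by identity.
     */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Keeps the first span seen for each (type, begin, end) combination. */
    static List<Span> dedupe(List<Span> spans) {
        Map<String, Span> firstByKey = new LinkedHashMap<>();
        for (Span s : spans) {
            // Assumed duplicate key: type name plus offsets.
            String key = s.type + ":" + s.begin + ":" + s.end;
            firstByKey.putIfAbsent(key, s);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        // Offsets taken from the input text "bird, cat, bush, cat".
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("Animal", 0, 4));      // bird
        spans.add(new Span("Animal", 6, 9));      // cat
        spans.add(new Span("Animal", 6, 9));      // same cat, created twice
        spans.add(new Span("Vegetable", 11, 15)); // bush

        // Identity-based equality: the HashSet keeps all four instances.
        System.out.println("HashSet size: " + new HashSet<>(spans).size());

        // Key-based deduplication: the repeated cat span is dropped.
        System.out.println("Deduped size: " + dedupe(spans).size());
    }
}
```

In an annotator, the same idea would presumably be applied to the annotations collected from the index, keeping one per key before adding them back; which features belong in the key depends on what should count as a duplicate.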