Awesome. Your change will work, and I will try it, thank you! But maybe you can help me get this version to work? As I posted, if I use Object as the parameter type in the compare method signature, Eclipse is OK; but when I change it to Annotation, it says I must override the methods, as though something about Annotation confuses Eclipse. Here's the code I really want to work:
-----------------------------------
public static ArrayList<Annotation> dedupe(AnnotationIndex<Annotation> idx2) {
    ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx2.size());
    FSIterator<Annotation> it2 = idx2.iterator();
    while (it2.hasNext()) {
        tempList.add((Annotation) it2.next());
    }
    Set set = new TreeSet(new Comparator() {
        @Override
        public int compare(Annotation o1, Annotation o2) {
            if (o1.getCoveredText() == o2.getCoveredText()) {
                return 0;
            }
            return 1;
        }
    });
    set.addAll(tempList);
    tempList.clear();
    tempList.addAll(set);
    System.out.println("templist length: " + tempList.size());
    return tempList;
}
-----------------------------

But look at what Eclipse gives me:

Kameron Arthur Cole
Watson Content Analytics Applications and Support
email: kameronc...@us.ibm.com | Tel: 305-389-8512

From: Marshall Schor <m...@schor.com>
To: user@uima.apache.org
Date: 11/18/2014 11:54 AM
Subject: Re: can't remove duplicate Annotations with Java Set Collection

An even simpler approach: use a HashMap, where the key is annotation.getCoveredText() and the value is the annotation, instead of a HashSet.

Replace this (in your original):

    // push tempList into HashSet
    HashSet<Annotation> hs = new HashSet<Annotation>();
    hs.addAll(tempList);

with

    // push tempList into HashMap
    HashMap<String, Annotation> hm = new HashMap<String, Annotation>();
    for (Annotation a : tempList) {
        hm.put(a.getCoveredText(), a);
    }

-Marshall

On 11/18/2014 9:45 AM, Marshall Schor wrote:
> Eclipse pointed out a bug in my code, fix is below
> On 11/18/2014 9:37 AM, Marshall Schor wrote:
>> Hi Kameron,
>>
>> Based on this code snip, the two "cat" annotations you create are "different"
>> using the HashSet definition, because they correspond to two distinct UIMA
>> Annotations. You could, for instance, update one of them, and not the other;
>> that is the sense in which they are distinct. In the case below, the two "cat"
>> annotations would have different begin and end offsets.
>>
>> I'm guessing that your goal was to have one of the two cat annotations be
>> dropped.
>>
>> You could do that by using your hash set approach, if you defined equal to mean
>> that just the covered text of the annotation was equal.
>>
>> Here's one way to do this: Create a "cover object" for your annotations that
>> contains a reference to the annotation and defines equals and hashCode (you have
>> to define these together). The easy way to do this is using Eclipse. Define a
>> new class, e.g.:
>>
>> public class MyAnnotationWithSpecialEquals {
>>   public final Annotation annotation; // the covered annotation
>>
>>   public MyAnnotationWithSpecialEquals(Annotation annotation) {
>>     this.annotation = annotation;
>>   }
>> }
>>
>> and then use Eclipse to define equals and hashCode: go to Menu -> Source ->
>> Generate hashCode() and equals()
>> and have it generate them based on just "annotation". This will not (yet) be
>> correct; it should add two methods like this:
>>
>> @Override
>> public int hashCode() {
>>   final int prime = 31;
>>   int result = 1;
>>   result = prime * result + ((annotation == null) ? 0 : annotation.hashCode());
>>   return result;
>> }
>>
>> @Override
>> public boolean equals(Object obj) {
>>   if (this == obj)
>>     return true;
>>   if (obj == null)
>>     return false;
>>   if (getClass() != obj.getClass())
>>     return false;
>>   MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> // buggy lines
>>   if (annotation == null) {
>>     if (other.annotation != null)
>>       return false;
> // replace above with
>   if (annotation == null && other.annotation != null)
>     return false;
>>   } else if (!annotation.equals(other.annotation))
>>     return false;
>>   return true;
>> }
>>
>> Now, to get these to be the definitions you want, which depend only on the
>> covered text, modify these as follows:
>>
>> First, for hashCode, use only the covered text string:
>>
>> @Override
>> public int hashCode() {
>>   final int prime = 31;
>>   int result = 1;
>>   // hash only the covered text; a null annotation or null text hashes as 0
>>   String coveredText = (annotation == null) ? null : annotation.getCoveredText();
>>   result = prime * result + ((coveredText == null) ? 0 : coveredText.hashCode());
>>   return result;
>> }
>>
>> and for equals: replace the test for annotation being "equal" with
>> annotation.getCoveredText() being "equal",
>> with some additional edge case testing in case of nulls:
>>
>> @Override
>> public boolean equals(Object obj) {
>>   if (this == obj)
>>     return true;
>>   if (obj == null)
>>     return false;
>>   if (getClass() != obj.getClass())
>>     return false;
>>   MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
>>   if (annotation == null || other.annotation == null)
>>     return annotation == other.annotation; // equal only if both are null
>>   String coveredText = annotation.getCoveredText();
>>   if (coveredText == null)
>>     return other.annotation.getCoveredText() == null; // handle special case if covered text is null
>>   // coveredText is not null
>>   return coveredText.equals(other.annotation.getCoveredText());
>> }
>>
>> HTH.
>> -Marshall
>>
>>
>> On 11/17/2014 4:49 PM, Kameron Cole wrote:
>>> Input text:
>>>
>>> ------------------------------
>>>
>>> bird, cat, bush, cat
>>>
>>> ----------------------------
>>>
>>> Create the Annotations:
>>>
>>> -------------------------------
>>> docText = aJCas.getDocumentText();
>>>
>>> int index = docText.indexOf("cat");
>>> while (index >= 0) {
>>>     int begin = index;
>>>     int end = begin + 3;
>>>     Animal animal = new Animal(aJCas);
>>>     animal.setBegin(begin);
>>>     animal.setEnd(end);
>>>     animal.addToIndexes();
>>>
>>>     index = docText.indexOf("cat", index + 1);
>>> }
>>>
>>> index = docText.indexOf("bird");
>>> while (index >= 0) {
>>>     int begin = index;
>>>     int end = begin + 4;
>>>     Animal animal = new Animal(aJCas);
>>>     animal.setBegin(begin);
>>>     animal.setEnd(end);
>>>     animal.addToIndexes();
>>>
>>>     index = docText.indexOf("bird", index + 1);
>>> }
>>>
>>> index = docText.indexOf("bush");
>>> while (index >= 0) {
>>>     int begin = index;
>>>     int end = begin + 4;
>>>     Vegetable vegetable = new Vegetable(aJCas);
>>>     vegetable.setBegin(begin);
>>>     vegetable.setEnd(end);
>>>     vegetable.addToIndexes();
>>>
>>>     index = docText.indexOf("bush", index + 1);
>>> }
>>> ------------------------------------------------------
>>>
>>> From: Marshall Schor <m...@schor.com>
>>> To: user@uima.apache.org
>>> Date: 11/17/2014 04:35 PM
>>> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>>>
>>> --------------------------------------------------------------------------------
>>>
>>> Hi,
>>>
>>> Two Feature Structures are considered "equal" in the sense used by HashSet if
>>> fs1.equals(fs2). The definition of "equals" for feature structures is: they
>>> are equal if they refer to the same underlying CAS, and the same "spot" in
>>> the CAS Heap.
>>>
>>> How did you create the Annotations that you think are "equal" in the HashSet
>>> sense?
>>>
>>> Here's an example of two annotations which are "equal" in the UIMA sorted index
>>> sense, but unequal in the HashSet sense.
>>>
>>> // create an instance of Annotation in myJCas, with begin = 0 and end = 4
>>> Annotation fs1 = new Annotation(myJCas, 0, 4);
>>> // create another instance of Annotation in myJCas, with begin = 0 and end = 4
>>> Annotation fs2 = new Annotation(myJCas, 0, 4);
>>>
>>> These will be "equal" in the UIMA sense (the same kind of annotation, in the
>>> same CAS, with the same feature values), but will be two distinct feature
>>> structures, so HashSet will consider them to be unequal.
>>>
>>> Could this be what is happening in your case? Please respond so we can see if
>>> there's another straightforward solution that does what you're looking for.
>>>
>>> -Marshall
>>> On 11/17/2014 2:59 PM, Kameron Cole wrote:
>>>> Hello,
>>>>
>>>> I am trying to get rid of duplicates in the FSIndex. I thought a very
>>>> clever way to do this would be to just push them into a Set Collection in
>>>> Java, which does not allow duplicates.
>>>> This is very (very) standard Java:
>>>>
>>>> ArrayList al = new ArrayList();
>>>> // add elements to al, including duplicates
>>>> HashSet hs = new HashSet();
>>>> hs.addAll(al);
>>>> al.clear();
>>>> al.addAll(hs);
>>>>
>>>> This list will contain no duplicates.
>>>>
>>>> However, I am not getting this to work in my UIMA code:
>>>>
>>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
>>>>
>>>> System.out.println("Index size is: " + idx.size());
>>>>
>>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
>>>>
>>>> FSIterator it = idx.iterator();
>>>>
>>>> // load the Annotations into a temporary list. includes duplicates
>>>> while (it.hasNext()) {
>>>>     tempList.add((Annotation) it.next());
>>>> }
>>>>
>>>> Iterator tempIt = tempList.iterator();
>>>>
>>>> // remove all Annotations from the index. this works fine
>>>> while (tempIt.hasNext()) {
>>>>     ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
>>>> }
>>>>
>>>> // push tempList into HashSet
>>>> HashSet<Annotation> hs = new HashSet<Annotation>();
>>>>
>>>> hs.addAll(tempList);
>>>>
>>>> // this should not allow duplicates
>>>>
>>>> // size should be smaller than the size of the FSIndex by the number of
>>>> // duplicates. It is not. This is the main problem.
>>>> System.out.println("HS length: " + hs.size());
>>>>
>>>> tempList.clear();
>>>>
>>>> tempList.addAll(hs);
>>>>
>>>> System.out.println("templist length: " + tempList.size());
>>>>
>>>> // this should now be the clean list
>>>> Iterator<Annotation> it2 = tempList.iterator();
>>>>
>>>> while (it2.hasNext()) {
>>>>     it2.next().addToIndexes(aJCas);
>>>> }
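
A note on the TreeSet/Comparator version at the top of the thread. A raw "new Comparator()" declares compare(Object, Object), so a compare method taking Annotation parameters does not override anything; that would explain why Eclipse accepts Object but rejects Annotation. Parameterizing the comparator (Comparator<Annotation>) should resolve the complaint. Two further points seem worth checking: == on strings compares references rather than contents, and a comparator that returns 1 for every unequal pair breaks the ordering contract that TreeSet relies on. The sketch below illustrates all three points. It uses a hypothetical stand-in class, Span, instead of the UIMA Annotation type so that it runs without UIMA on the classpath; DedupeSketch and dedupeByText are likewise illustrative names, and the sketch assumes covered text is never null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DedupeSketch {

    // Hypothetical stand-in for a UIMA Annotation: only what the sketch needs.
    static final class Span {
        final int begin;
        final int end;
        final String coveredText;

        Span(String documentText, int begin, int end) {
            this.begin = begin;
            this.end = end;
            this.coveredText = documentText.substring(begin, end);
        }

        String getCoveredText() {
            return coveredText;
        }
    }

    // Keeps one Span per distinct covered text (assumes non-null text).
    static List<Span> dedupeByText(List<Span> spans) {
        // Typed comparator: compare(Span, Span) really overrides the interface method.
        Comparator<Span> byText = new Comparator<Span>() {
            @Override
            public int compare(Span o1, Span o2) {
                // compareTo compares contents, returns 0 for equal text,
                // and gives a consistent ordering for unequal text.
                return o1.getCoveredText().compareTo(o2.getCoveredText());
            }
        };
        Set<Span> unique = new TreeSet<Span>(byText);
        unique.addAll(spans); // elements that compare as 0 to an existing one are skipped
        return new ArrayList<Span>(unique);
    }

    public static void main(String[] args) {
        String doc = "bird, cat, bush, cat";
        List<Span> spans = Arrays.asList(
                new Span(doc, 0, 4),    // bird
                new Span(doc, 6, 9),    // cat
                new Span(doc, 11, 15),  // bush
                new Span(doc, 17, 20)); // cat again, at a different offset
        for (Span s : dedupeByText(spans)) {
            System.out.println(s.getCoveredText() + " [" + s.begin + "," + s.end + ")");
        }
    }
}
```

For the sample text this leaves three entries, ordered by covered text, with the first "cat" kept and the later one dropped. If the original index order matters, the HashMap approach suggested earlier (or a LinkedHashMap keyed on covered text) may be the better fit.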