Hello,

I am trying to remove duplicate annotations from the FSIndex. My idea was
to copy them into a Java Set, which does not allow duplicates. This is a
very standard Java idiom:

ArrayList al = new ArrayList();
// add elements to al, including duplicates
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);

This list will contain no duplicates.
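
For reference, here is a self-contained version of that idiom that can be run on its own. The names DedupDemo and dedupe are only mine for illustration, and I use LinkedHashSet rather than HashSet purely so that the original order is kept. Note that a Set decides what counts as a duplicate by calling equals() and hashCode() on the elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupDemo {

    // Returns a new list with duplicates dropped, keeping first-seen order.
    // "Duplicate" here means equal according to equals()/hashCode().
    static <T> List<T> dedupe(List<T> input) {
        Set<T> seen = new LinkedHashSet<>(input);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> al = new ArrayList<>(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println(dedupe(al)); // prints [a, b, c]
    }
}
```

With Strings this works because String defines equals() by content.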

However, I am not getting this to work in my UIMA code:


AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();

System.out.println("Index size is: "+idx.size());

ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());

FSIterator<Annotation> it = idx.iterator();

// load the Annotations into a temporary list. includes duplicates

while (it.hasNext()) {
        tempList.add(it.next());
}

Iterator<Annotation> tempIt = tempList.iterator();

// remove all Annotations from the index. this works fine

while (tempIt.hasNext()) {
        tempIt.next().removeFromIndexes(aJCas);
}

// push tempList into HashSet

        HashSet<Annotation> hs = new HashSet<Annotation>();

        hs.addAll(tempList);

// this should not allow duplicates

// The size should be less than the size of the FSIndex by the number of
// duplicates. It is not. This is the main problem.
System.out.println("HS length: "+hs.size());

tempList.clear();

        tempList.addAll(hs);

        System.out.println("templist length: "+tempList.size());


// this should now be the clean list
Iterator<Annotation> it2 = tempList.iterator();


                while(it2.hasNext()){
                        it2.next().addToIndexes(aJCas);
                }
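
One thing I have not ruled out: a HashSet only drops elements that are equal according to equals() and hashCode(). The sketch below assumes (I have not confirmed this for Annotation) that two annotations with the same type and offsets are separate objects that do not compare equal. Span is a hypothetical stand-in class, not part of UIMA, and the (type, begin, end) key is just my guess at what "duplicate" should mean here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyDedupDemo {

    // Hypothetical stand-in for an annotation.
    // It deliberately does NOT override equals()/hashCode().
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Keeps the first Span seen for each (type, begin, end) key.
    static List<Span> dedupeByKey(List<Span> spans) {
        Map<String, Span> firstByKey = new LinkedHashMap<>();
        for (Span s : spans) {
            String key = s.type + "|" + s.begin + "|" + s.end;
            firstByKey.putIfAbsent(key, s);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("Token", 0, 5));
        spans.add(new Span("Token", 0, 5)); // same content, different object
        spans.add(new Span("Token", 6, 9));

        // Identity-based equals(): the HashSet keeps all three objects.
        System.out.println("HashSet size: " + new HashSet<>(spans).size());
        // Explicit key: the two identical spans collapse into one.
        System.out.println("Keyed size: " + dedupeByKey(spans).size());
    }
}
```

If this assumption holds for Annotation, it would explain why hs.size() matches the index size in my code, and the keyed list (rather than the HashSet) would be the one to add back to the indexes.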
