> I want to avoid duplicate values in one multivalued field.
>
> i am using dataimport handler to import data, the particular multivalued
> field are being filled from xml source. now that xml has duplicate values,
> but i want to have unique valued in this multivalued field.
>
> e.g. xml
> <data>
> a1
> b1
> a1
> a1
> </data>
>
> i have added RemoveDuplicatesTokenFilterFactory in data type of the field,
> in index analyzer.
> still it gives below o/p.
>
> <arr name="field">
> <str>a1</str>
> <str>b1</str>
> <str>a1</str>
> <str>a1</str>
> </arr>
>
> i am using solr 3.5.
>
> how can i avoid importing duplicate values in the field?
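
RemoveDuplicatesTokenFilterFactory (RDTF) only removes duplicate tokens that sit at the same position in the token stream:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory

It also works on the analyzed (indexed) tokens of each value, not on the stored values you see in the response, so it will not remove repeated values from a multivalued field.

An elegant solution would be to subclass FieldValueSubsetUpdateProcessorFactory (the link is to the 4.0.0-ALPHA API):
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/update/processor/FieldValueSubsetUpdateProcessorFactory.html

and create a DistinctFieldValueUpdateProcessorFactory or something like that. MinFieldValueUpdateProcessorFactory can be used as an example.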
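
As a rough sketch of the logic such a subclass would need, the snippet below keeps the first occurrence of each value and preserves the original order. It is plain Java with no Solr dependency; the class name DistinctValuesSketch and the method distinct are hypothetical, and how this gets wired into the subset-picking method of FieldValueSubsetUpdateProcessorFactory should be checked against the javadoc and the MinFieldValueUpdateProcessorFactory source.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical, Solr-independent sketch of the de-duplication step.
public class DistinctValuesSketch {

    // Returns the values with duplicates dropped, keeping first-seen order.
    // LinkedHashSet relies on equals()/hashCode(), so "a1" and "A1" stay distinct.
    public static Collection<Object> distinct(Collection<Object> values) {
        return new ArrayList<Object>(new LinkedHashSet<Object>(values));
    }

    public static void main(String[] args) {
        // Same values as the <data> example in the question.
        List<Object> values = new ArrayList<Object>();
        values.add("a1");
        values.add("b1");
        values.add("a1");
        values.add("a1");

        System.out.println(distinct(values)); // prints [a1, b1]
    }
}
```

A LinkedHashSet is used rather than a HashSet so that the remaining values come out in the order they appeared in the source document.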