> I want to avoid duplicate values in
> one multivalued field.
> 
> I am using the DataImportHandler to import data. The particular multivalued
> field is filled from an XML source. That XML has duplicate values,
> but I want only unique values in this multivalued field.
> 
> e.g. xml
> <data>
>      a1 
>      b1 
>      a1 
>      a1 
> </data>
> 
> I have added RemoveDuplicatesTokenFilterFactory to the field's data type,
> in the index analyzer.
> It still gives the output below.
> 
> <arr name="field">
>   <str>a1</str>
>   <str>b1</str>
>   <str>a1</str>
>   <str>a1</str>
> </arr>
> 
> I am using Solr 3.5.
> 
> How can I avoid importing duplicate values into the field?
> 

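RemoveDuplicatesTokenFilterFactory (RDTF) only removes duplicate tokens 
that share the same position in the token stream. It also works on the 
indexed tokens at analysis time, not on the stored values returned in the 
response, so it will not remove the repeated values you are seeing. 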
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory

An elegant solution would be to subclass FieldValueSubsetUpdateProcessorFactory 
(the link is to the 4.0.0-ALPHA javadoc): 
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/update/processor/FieldValueSubsetUpdateProcessorFactory.html

and create a DistinctFieldValueUpdateProcessorFactory or something like that. 
MinFieldValueUpdateProcessorFactory can be used as an example.
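
To make the idea concrete, here is a minimal sketch of just the de-duplication step such a processor would need. It is plain Java with no Solr dependency: the class name DistinctValues and the method pickDistinct are hypothetical, and wiring the logic into a FieldValueSubsetUpdateProcessorFactory subclass is not shown, since the exact method to override should be checked against the javadoc for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Solr: only the de-duplication logic.
final class DistinctValues {

    // Drops duplicate values, keeping the first occurrence of each one
    // and preserving the original order. Equality is decided by equals(),
    // so " a1 " and "a1" count as different values unless trimmed first.
    static <T> Collection<T> pickDistinct(Collection<T> values) {
        return new ArrayList<T>(new LinkedHashSet<T>(values));
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a1", "b1", "a1", "a1");
        System.out.println(pickDistinct(values)); // prints [a1, b1]
    }
}
```

A LinkedHashSet is used rather than a HashSet so that the surviving values keep the order in which they appeared in the source document.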
