Have a look at the DedupUpdateProcessorFactory, which may help you, although I'm not sure whether it works with multivalued fields.
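
If a custom processor turns out to be the way to go, the check described in the quoted message (add a value only when it is not already present) comes down to an order-preserving distinct merge. Below is a minimal sketch in plain Java with no Solr dependency. The class name DistinctValueMerge and the method mergeDistinct are hypothetical, and the sketch assumes the processor has already obtained the document's existing values somehow; that lookup is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Solr: merges field values without duplicates. */
public class DistinctValueMerge {

    /**
     * Returns the existing values followed by those incoming values that are
     * not already present. First-seen order is kept. Equality is Object.equals,
     * so "Foo" and "foo" count as different values.
     */
    public static List<Object> mergeDistinct(Collection<?> existing, Collection<?> incoming) {
        Set<Object> merged = new LinkedHashSet<>();
        if (existing != null) {
            merged.addAll(existing);   // keep what the field already holds
        }
        if (incoming != null) {
            merged.addAll(incoming);   // duplicates are silently skipped by the set
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<Object> tags = mergeDistinct(Arrays.asList("foo", "bar"), Arrays.asList("foo", "baz"));
        System.out.println(tags); // prints [foo, bar, baz]
    }
}
```

Inside processAdd, the merged list could then be written back with doc.setField(fieldName, merged) before handing the command to the next processor. As far as I know, for an atomic update the incoming field value is the {"add": ...} map rather than the bare value, so it would need unwrapping first.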

Upayavira

On Mon, Jul 1, 2013, at 02:34 PM, tuedel wrote:
> Hello everybody,
>
> I have tried to make use of the UniqFieldsUpdateProcessorFactory in
> order to achieve distinct values in multivalued fields. Example below:
>
> <updateRequestProcessorChain name="uniq_fields">
>   <processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
>     <lst name="fields">
>       <str>title</str>
>       <str>tag_type</str>
>     </lst>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">uniq_fields</str>
>   </lst>
> </requestHandler>
>
> However, the data is being indexed one by one, since a document may
> get an additional tag in a future update. To make sure there are no
> duplicate tags, I was hoping the UpdateProcessorFactory would do what
> I want to achieve. To actually add a tag, I am sending
>
>   "tag_type": {"add": "foo"},
>
> which still adds the tag without checking whether it is already part
> of the field. How can I achieve distinct values on the Solr side?
>
> I suspect that writing my own processor might be a solution, but I am
> not sure how to do it or whether it is the proper way. Imagine an
> incoming update, e.g. an update of an existing document that has
> several multivalued fields, without specifying "add" or "set". This
> would cause the corresponding document to be dropped and re-indexed
> without keeping any previously added values in the multivalued field.
> So if a field is being updated and the value is not yet part of the
> index, the processor should add it; otherwise it should ignore it.
> The processor needs to decide whether a value gets added to the index
> depending on what is already in the index. Is that achievable on the
> Solr side?
>
> Below is my current, pretty empty processor class:
>
> import java.io.IOException;
> import java.util.Collection;
>
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.solr.request.SolrQueryRequest;
> import org.apache.solr.response.SolrQueryResponse;
> import org.apache.solr.update.AddUpdateCommand;
> import org.apache.solr.update.processor.UpdateRequestProcessor;
> import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
>
> public class ConditionalSolrUniqFieldValuesProcessorFactory
>         extends UpdateRequestProcessorFactory {
>
>     @Override
>     public UpdateRequestProcessor getInstance(SolrQueryRequest sqr,
>             SolrQueryResponse sqr1, UpdateRequestProcessor urp) {
>         return new ConditionalUniqFieldValuesProcessor(urp);
>     }
>
>     class ConditionalUniqFieldValuesProcessor extends UpdateRequestProcessor {
>
>         public ConditionalUniqFieldValuesProcessor(UpdateRequestProcessor next) {
>             super(next);
>         }
>
>         @Override
>         public void processAdd(AddUpdateCommand cmd) throws IOException {
>             SolrInputDocument doc = cmd.getSolrInputDocument();
>
>             Collection<String> incomingFieldNames = doc.getFieldNames();
>             for (String t : incomingFieldNames) {
>                 /*
>                 is multivalued
>                 if (doc.getField(t).) {
>                     If multivalued and already part of index, drop from
>                     index. Otherwise add to multivalued field.
>                 }
>                 */
>             }
>
>             // Pass the document on to the next processor in the chain;
>             // without this call the update never reaches the index.
>             super.processAdd(cmd);
>         }
>     }
> }
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Distinct-values-in-multivalued-fields-tp4074337.html
> Sent from the Solr - User mailing list archive at Nabble.com.