Unfortunately, update processors only "see" the new, fresh, incoming data, not any existing document data.

This is a case where your best bet may be to read the document first and then merge your new value into the existing list of values.
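A minimal sketch of that merge step, written in plain Java so it does not depend on any client library. It assumes the existing values of the multivalued field have already been read back from the stored document (the read itself is not shown). The class and method names are hypothetical. The merged list would then be sent with "set" rather than "add", so the field is replaced by the de-duplicated list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class MergeDistinct {

    // Merge new values into the values read back from the stored document,
    // skipping anything already present. Order is preserved: stored values
    // first, then the genuinely new ones.
    static List<String> mergeDistinct(List<String> existing, List<String> incoming) {
        LinkedHashSet<String> merged = new LinkedHashSet<>(existing);
        merged.addAll(incoming);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical values as read from the existing document.
        List<String> stored = Arrays.asList("foo", "bar");
        System.out.println(mergeDistinct(stored, Arrays.asList("foo"))); // [foo, bar]
        System.out.println(mergeDistinct(stored, Arrays.asList("baz"))); // [foo, bar, baz]
    }
}
```

Because the merge happens on the client, the update that goes back to Solr already contains each value only once.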


-- Jack Krupansky
-----Original Message-----
From: tuedel
Sent: Monday, July 01, 2013 9:34 AM
To: solr-user@lucene.apache.org
Subject: Distinct values in multivalued fields

Hello everybody,

I have tried to use the UniqFieldsUpdateProcessorFactory to get distinct
values in multivalued fields. Example below:

<updateRequestProcessorChain name="uniq_fields">
  <processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
    <lst name="fields">
      <str>title</str>
      <str>tag_type</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uniq_fields</str>
  </lst>
</requestHandler>
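A rough model of why this chain does not help here (this is not Solr's actual code, and the class name is hypothetical): a uniq-style processor compares only the values present in the incoming document. With an atomic "add" the incoming document carries just the new value, so there is nothing to compare it against, even when that value is already indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class UniqWithinDocument {

    // Rough model of per-document de-duplication: only the values in the
    // incoming document are compared; stored values are never consulted.
    static List<String> uniq(List<String> incomingValues) {
        // LinkedHashSet drops repeats while keeping first-seen order.
        return new ArrayList<>(new LinkedHashSet<>(incomingValues));
    }

    public static void main(String[] args) {
        // A full document carrying a repeated tag: the repeat is removed.
        System.out.println(uniq(Arrays.asList("foo", "bar", "foo"))); // [foo, bar]
        // An atomic update carries only the new value, so no duplicate is
        // detected even if "foo" is already part of the indexed field.
        System.out.println(uniq(Arrays.asList("foo")));               // [foo]
    }
}
```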

However, the data is indexed one update at a time, because a document may
get an additional tag in a later update. To make sure no duplicate tags end
up in the field, I was hoping the UniqFieldsUpdateProcessorFactory would do
what I want. To add a tag, I send an atomic update such as

"tag_type": {"add": "foo"}, which still adds the tag without checking
whether it is already part of the field. How can I get distinct values
on the Solr side?

I suspect that writing my own processor might be a solution. However, I am
unsure how to do it and whether it is the proper approach. Imagine an
incoming update of an existing document that has several multivalued fields,
sent without specifying "add" or "set". Such an update replaces the whole
document, so it is re-indexed without any of the values previously added to
the multivalued fields. What I want instead: when a field is updated, a value
that is not yet part of the indexed field should be added; otherwise it
should be ignored. The processor therefore has to decide whether to add a
value based on what is already in the index. Is that achievable on the Solr
side? Below is my current, fairly empty processor class:
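The rule described above can be sketched as a per-field merge of a full update into the stored document, again in plain Java and outside of Solr. Assumptions, all hypothetical: documents are modeled as maps from field name to values, the stored document has already been read, the set of multivalued field names is known, fields missing from the update keep their stored values, and "category" stands in for some single-valued field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConditionalFieldMerge {

    // Multivalued fields: keep stored values and append only incoming values
    // not already present. Other fields: the incoming value replaces the stored one.
    // Fields absent from the update keep their stored values (an assumption).
    static Map<String, List<String>> merge(Map<String, List<String>> stored,
                                           Map<String, List<String>> incoming,
                                           Set<String> multiValuedFields) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : stored.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (Map.Entry<String, List<String>> e : incoming.entrySet()) {
            String field = e.getKey();
            if (multiValuedFields.contains(field)) {
                List<String> values = result.computeIfAbsent(field, k -> new ArrayList<>());
                for (String v : e.getValue()) {
                    if (!values.contains(v)) { // add only if not yet present
                        values.add(v);
                    }
                }
            } else {
                result.put(field, new ArrayList<>(e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> stored = new LinkedHashMap<>();
        stored.put("category", Arrays.asList("draft"));       // hypothetical field
        stored.put("tag_type", Arrays.asList("foo", "bar"));

        Map<String, List<String>> update = new LinkedHashMap<>();
        update.put("category", Arrays.asList("final"));
        update.put("tag_type", Arrays.asList("bar", "baz"));

        Set<String> multi = Collections.singleton("tag_type");
        // {category=[final], tag_type=[foo, bar, baz]}
        System.out.println(merge(stored, update, multi));
    }
}
```

Doing this on the client, as suggested in the reply above, sidesteps the fact that an update processor only sees the incoming document.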

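import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConditionalSolrUniqFieldValuesProcessorFactory extends
        UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        // Pass the schema along so the processor can tell which fields are multivalued.
        return new ConditionalUniqFieldValuesProcessor(req.getSchema(), next);
    }

    static class ConditionalUniqFieldValuesProcessor extends UpdateRequestProcessor {

        private final IndexSchema schema;

        ConditionalUniqFieldValuesProcessor(IndexSchema schema,
                UpdateRequestProcessor next) {
            super(next);
            this.schema = schema;
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();

            for (String name : doc.getFieldNames()) {
                SchemaField schemaField = schema.getFieldOrNull(name);
                if (schemaField != null && schemaField.multiValued()) {
                    // TODO: if a value is already part of the indexed field,
                    // drop it; otherwise add it to the multivalued field.
                    // Note: doc holds only the incoming data, not the
                    // values already stored in the index.
                }
            }

            // Hand the document to the rest of the chain; without this
            // call the document is never indexed.
            super.processAdd(cmd);
        }
    }
}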

--
View this message in context: http://lucene.472066.n3.nabble.com/Distinct-values-in-multivalued-fields-tp4074337.html
Sent from the Solr - User mailing list archive at Nabble.com.