Hi,

I am trying to use the dedupe feature to detect and mark near-duplicate content in my collections. I don't want to prevent duplicate content; I would like to detect it and keep it for further processing. That's why I'm writing the signature to an extra field and not to the document's unique key field.

Here is how I added it to solrconfig.xml:

     <requestHandler name="/update" class="solr.UpdateRequestHandler">
          <lst name="defaults">
               <str name="update.chain">fill_signature</str>
          </lst>
     </requestHandler>

     <updateRequestProcessorChain name="fill_signature" processor="signature">
          <processor class="solr.RunUpdateProcessorFactory" />
     </updateRequestProcessorChain>

     <updateProcessor class="solr.processor.SignatureUpdateProcessorFactory" name="signature">
          <bool name="enabled">true</bool>
          <str name="signatureField">signature</str>
          <bool name="overwriteDupes">false</bool>
          <str name="fields">content</str>
          <str name="signatureClass">solr.processor.TextProfileSignature</str>
          <str name="quantRate">.2</str>
          <str name="minTokenLen">3</str>
     </updateProcessor>

When I initially add the documents to the cloud, everything works as expected: the documents are added and the signature is created and stored. Perfect :) The problem occurs when I want to update an existing document. In that case the update.chain=fill_signature parameter is of course set too (it is a default on the /update handler), and I get a bad request error.

I found this solr issue: https://issues.apache.org/jira/browse/SOLR-3473

Is that the problem I am running into?
Is it somehow possible to add parameters or select a specific update handler when I'm adding documents to the cloud using SolrJ? In that case I could either set update.chain manually and remove it from the request handler, or write a second request handler that I only use when I want to set the signature field. I know I can do that manually when I'm using e.g. curl, but is it also possible with SolrJ? :)
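
To make the second option concrete, this is roughly what I have in mind. It is an untested sketch: the handler name /update-signature is just a placeholder I made up, and I am assuming that a plain /update without the update.chain default falls back to the default chain.

     <!-- Plain updates: no update.chain default, so the signature processor is not run -->
     <requestHandler name="/update" class="solr.UpdateRequestHandler" />

     <!-- Hypothetical second handler, used only when the signature field should be filled -->
     <requestHandler name="/update-signature" class="solr.UpdateRequestHandler">
          <lst name="defaults">
               <str name="update.chain">fill_signature</str>
          </lst>
     </requestHandler>

The updateRequestProcessorChain and updateProcessor definitions would stay exactly as above.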


Thanks,
Markus



