This issue doesn't describe your exact problem, but rather the more general problem of
distributed deduplication:
https://issues.apache.org/jira/browse/SOLR-3473
 
 
-----Original message-----
> From:Daniel Brügge <daniel.brue...@googlemail.com>
> Sent: Fri 27-Jul-2012 17:38
> To: solr-user@lucene.apache.org
> Subject: Deduplication in SolrCloud
> 
> Hi,
> 
> in my old Solr setup I used the deduplication feature in the update chain
> with a couple of fields.
> 
> <updateRequestProcessorChain name="dedupe">
>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <str name="signatureField">signature</str>
>     <bool name="overwriteDupes">false</bool>
>     <str name="fields">uuid,type,url,content_hash</str>
>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> This worked fine. Now that I use it in my two-shard SolrCloud setup, I always
> get an error when inserting 150,000 documents:
> 
> INFO: end_commit_flush
> Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log
> SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
>     at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
> 
> I am inserting the documents via CSV import with a curl command, and I also
> split them into 50k chunks.
> 
> Without the dedupe chain, the import finishes after 40 seconds.
> 
> The curl command writes to one of my shards.
> 
> 
> Do you have an idea why this happens? Should I reduce the fields to one? I
> have read that not using the id as the dedupe field could be an issue.
> 
> 
> I have searched for deduplication with SolrCloud, and I am wondering whether it
> already works correctly. See e.g.
> http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
> 
> Thanks & regards
> 
> Daniel
> 
