Should the old Signature code be removed? Given that the goal is to
have everyone use SolrCloud, maybe this kind of landmine should be
removed?

On Fri, Jul 27, 2012 at 8:43 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> This issue doesn't describe your exact problem, but rather a more general
> problem with distributed deduplication:
> https://issues.apache.org/jira/browse/SOLR-3473
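A toy model may make the general problem concrete. It rests on assumptions made only for illustration: each document is assigned to a shard by a hash of its `id` (an MD5-based stand-in here, not SolrCloud's real router), each document already carries a `signature`, and each shard drops a document only if it has already stored that signature itself. Under those assumptions, two documents with identical signatures but different ids can land on different shards, and both survive.

```python
import hashlib
from collections import defaultdict

def shard_for(doc_id, num_shards=2):
    """Toy router: pick a shard from an MD5 hash of the document id.
    Assumption: a stand-in for SolrCloud's real routing, which differs."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

def index_with_per_shard_dedupe(docs, num_shards=2):
    """Each shard skips a document only if it has already stored that
    signature itself; it never consults the other shards."""
    seen = defaultdict(set)     # shard number -> signatures stored there
    stored = defaultdict(list)  # shard number -> ids stored there
    for doc in docs:
        shard = shard_for(doc["id"], num_shards)
        if doc["signature"] not in seen[shard]:
            seen[shard].add(doc["signature"])
            stored[shard].append(doc["id"])
    return dict(stored)

# Find an id that the toy router sends to a different shard than "doc0".
ids = [f"doc{i}" for i in range(20)]
other = next(i for i in ids[1:] if shard_for(i) != shard_for(ids[0]))
docs = [{"id": ids[0], "signature": "abc"}, {"id": other, "signature": "abc"}]
result = index_with_per_shard_dedupe(docs)
# Both copies are kept, one per shard, even though they share a signature.
print(sum(len(v) for v in result.values()))
```

If both ids had hashed to the same shard, the second copy would have been skipped; the duplicate slips through only because the check is local to each shard.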
>
>
> -----Original message-----
>> From:Daniel Brügge <daniel.brue...@googlemail.com>
>> Sent: Fri 27-Jul-2012 17:38
>> To: solr-user@lucene.apache.org
>> Subject: Deduplication in SolrCloud
>>
>> Hi,
>>
>> In my old Solr setup I used the deduplication feature in the update
>> chain with a couple of fields.
>>
>> <updateRequestProcessorChain name="dedupe">
>>   <processor class="solr.processor.SignatureUpdateProcessorFactory">
>>     <bool name="enabled">true</bool>
>>     <str name="signatureField">signature</str>
>>     <bool name="overwriteDupes">false</bool>
>>     <str name="fields">uuid,type,url,content_hash</str>
>>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>>   </processor>
>>   <processor class="solr.LogUpdateProcessorFactory" />
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
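For readers unfamiliar with the chain above, here is a rough Python model of what the signature processor is configured to do: join the values of the `fields` list, hash them, and store the result in `signatureField`. It is a sketch only. It uses MD5 from `hashlib` instead of `Lookup3Signature`, and the separator and field handling are assumptions, not Solr's exact byte-level algorithm. With `overwriteDupes` set to `false`, as in this config, the signature is only stored and every document sharing it is kept.

```python
import hashlib

def add_signature(doc, fields, signature_field="signature"):
    """Return a copy of doc with a hash of the listed fields added.
    Assumption: MD5 over the values joined with a separator; Solr's
    Lookup3Signature hashes differently."""
    joined = "\x1f".join(str(doc.get(f, "")) for f in fields)
    signed = dict(doc)
    signed[signature_field] = hashlib.md5(joined.encode("utf-8")).hexdigest()
    return signed

def index(docs, fields, overwrite_dupes=False):
    """Model of the stored documents after adding docs through the chain.
    overwrite_dupes=True: a later document replaces an earlier one with the
    same signature. overwrite_dupes=False: every document is kept."""
    if overwrite_dupes:
        by_sig = {}
        for doc in docs:
            signed = add_signature(doc, fields)
            by_sig[signed["signature"]] = signed
        return list(by_sig.values())
    return [add_signature(doc, fields) for doc in docs]

fields = ["uuid", "type", "url", "content_hash"]
a = {"id": "1", "uuid": "u1", "type": "page",
     "url": "http://example.com/", "content_hash": "h1"}
b = dict(a, id="2")  # same signature fields, different id
print(len(index([a, b], fields)), len(index([a, b], fields, overwrite_dupes=True)))
# prints: 2 1
```

Because every field in `fields` feeds the hash, two documents are treated as duplicates only if all of those values match.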
>>
>> This worked fine. Now, when I use it in my two-shard SolrCloud setup to
>> insert 150,000 documents, I always get an error:
>>
>> INFO: end_commit_flush
>> Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log
>> SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
>>     at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
>>
>> I insert the documents via the CSV import using a curl command, split
>> into chunks of 50k documents.
>>
>> Without the dedupe chain, the import finishes in 40 seconds.
>>
>> The curl command writes to one of my shards.
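A minimal Python sketch of the import described above: split a CSV file into 50,000-row chunks, repeat the header in each, and POST every chunk to a single node. The URL (`http://localhost:8983/solr/collection1/update/csv`) and the `update.chain=dedupe` request parameter are assumptions about the setup, not details from the original message. `build_requests` only constructs the requests; `send` actually performs them.

```python
import csv
import io
import urllib.parse
import urllib.request

# Hypothetical endpoint: a single node, as in the setup described above.
SOLR_CSV_URL = "http://localhost:8983/solr/collection1/update/csv"

def _to_csv(header, rows):
    """Serialize a header plus data rows back into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

def csv_chunks(text, rows_per_chunk=50_000):
    """Yield CSV strings of at most rows_per_chunk data rows, each with the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to send
        return
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == rows_per_chunk:
            yield _to_csv(header, chunk)
            chunk = []
    if chunk:
        yield _to_csv(header, chunk)

def build_requests(text, rows_per_chunk=50_000, chain="dedupe"):
    """Build one POST per chunk; update.chain selects the chain (assumed usage)."""
    query = urllib.parse.urlencode({"update.chain": chain})
    return [
        urllib.request.Request(
            f"{SOLR_CSV_URL}?{query}",
            data=chunk.encode("utf-8"),
            headers={"Content-Type": "text/csv; charset=utf-8"},
            method="POST",
        )
        for chunk in csv_chunks(text, rows_per_chunk)
    ]

def send(reqs):
    """Send the requests one after another (needs a running Solr node)."""
    for req in reqs:
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

Usage would be `send(build_requests(open("docs.csv", encoding="utf-8").read()))`, where `docs.csv` is a hypothetical file name. Sending the chunks sequentially to one node mirrors the curl workflow; that node is then responsible for forwarding each document to the shard it belongs to.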
>>
>>
>> Do you have any idea why this happens? Should I reduce the signature
>> fields to one? I have read that not using the id as the dedupe field
>> could be an issue.
>>
>>
>> I have searched for deduplication with SolrCloud and am wondering whether
>> it already works correctly; see, e.g.,
>> http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
>>
>> Thanks & regards
>>
>> Daniel
>>



-- 
Lance Norskog
goks...@gmail.com
