[ 
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675770#action_12675770
 ] 

Shalin Shekhar Mangar commented on SOLR-799:
--------------------------------------------

bq. <field name="signatureField" type="signatureField" indexed="true" 
stored="false" signature="solr.TextProfileSignature" fields="product_name, 
model_t, *_s" />

I don't think signatureField is a separate type. It is just a string, right?

bq. The patch as committed moves the specification of one field out of 
schema.xml file to another file.

bq. That is, the design of the signature field should go in schema.xml, and 
each updateRequest section should only describe how it is used with that 
section's declared name. Also, there should be no default field, since every 
field in the schema should be described in schema.xml.

The signature field is already defined in schema.xml today. The wiki 
clearly states the following about signatureField:
{code}
The name of the field used to hold the fingerprint/signature. Be sure the field 
is defined in schema.xml. 
{code}

bq. <field name="signatureField" type="signatureField" indexed="true" 
stored="false" signature="solr.TextProfileSignature" fields="product_name, 
model_t, *_s" />

I don't agree with the above. How the contents of a field are computed 
should not be part of schema.xml. I do not understand your concern, though 
that may be because I'm not very familiar with this feature.
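
As a rough sketch of that split (an illustration, not the committed patch): 
schema.xml declares the field as a plain string, and the update processor 
chain in solrconfig.xml describes how its value is computed. The chain name 
"dedupe", the overwriteDupes value and the fully qualified class names are 
assumptions based on the Deduplication wiki page; the field list is taken 
from the quoted example, without the *_s pattern.
{code:xml}
<!-- schema.xml: only declares the field; the "string" type is assumed -->
<field name="signatureField" type="string" indexed="true" stored="false" />

<!-- solrconfig.xml: says how the field's value is computed (assumed layout) -->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- must match the field name declared in schema.xml -->
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">false</bool>
    <!-- source fields that are hashed into the signature -->
    <str name="fields">product_name,model_t</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}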

> Add support for hash based exact/near duplicate document handling
> -----------------------------------------------------------------
>
>                 Key: SOLR-799
>                 URL: https://issues.apache.org/jira/browse/SOLR-799
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Mark Miller
>            Assignee: Yonik Seeley
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-799.patch, SOLR-799.patch, SOLR-799.patch, 
> SOLR-799.patch
>
>
> Hash-based duplicate document detection is efficient and allows for blocking 
> as well as field collapsing. Let's put it into Solr. 
> http://wiki.apache.org/solr/Deduplication

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
