[ 
https://issues.apache.org/jira/browse/SOLR-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751692#action_12751692
 ] 

Igor Motov commented on SOLR-1401:
----------------------------------

It might be helpful to expand this to other non-trivial analyzers as well. Even 
if an analyzer produces a single token, duplicate removal and distributed 
search don't work properly for any ids that the analyzer modifies. To reproduce 
this, change the type of the id field to textTight and add a record with id 
"ID" twice. The textTight analyzer produces a single token for this value, and 
yet the record appears twice in the result list. At the same time, in 
distributed search (even with a single shard), these records disappear from the 
result list entirely.
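
The duplicate case can be sketched with a toy model. This is not Solr code: 
ToyIndex, lowercasing_analyzer, identity_analyzer and id_is_unchanged are 
hypothetical names, and the model simply assumes that the overwrite step looks 
up the raw id value while the index holds the analyzed token. It does not 
attempt to model the distributed search case.

```python
# Toy model only, not Solr code. Assumption: the overwrite step deletes by
# the raw id value, while the term written to the index is the analyzed token.

def lowercasing_analyzer(value):
    """Hypothetical analyzer that emits exactly one, lowercased, token."""
    return [value.lower()]


def identity_analyzer(value):
    """Hypothetical analyzer that leaves the value untouched (like a string field)."""
    return [value]


class ToyIndex:
    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.entries = []  # list of (indexed_term, stored_doc) pairs

    def add(self, doc):
        raw_id = doc["id"]
        # Overwrite step: drop existing entries whose indexed term equals the raw id.
        self.entries = [(t, d) for t, d in self.entries if t != raw_id]
        # Index step: store the doc under its (single) analyzed token.
        term = self.analyzer(raw_id)[0]
        self.entries.append((term, doc))

    def stored_ids(self):
        return [d["id"] for _, d in self.entries]


def id_is_unchanged(analyzer, value):
    """True only if the analyzer yields the original value as its single token."""
    return analyzer(value) == [value]


index = ToyIndex(lowercasing_analyzer)
index.add({"id": "ID"})
index.add({"id": "ID"})
print(index.stored_ids())  # ['ID', 'ID']: the indexed term "id" never matches "ID"

plain = ToyIndex(identity_analyzer)
plain.add({"id": "ID"})
plain.add({"id": "ID"})
print(plain.stored_ids())  # ['ID']: the second add replaces the first
```

A check along the lines of id_is_unchanged would catch this case even though 
the analyzer emits only one token, which a multiple-token check alone would 
miss.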

This problem, combined with the recommendation in schema.xml to use textTight 
for SKUs, trips up some novice users. An SKU is frequently a natural id, so 
changing the type of id from "string" to "textTight" is one of the first 
schema modifications some users make, and it then takes them days to figure 
out the problem:

http://www.nabble.com/uniqueKey-gives-duplicate-values-td15341288.html
http://www.nabble.com/Adding-new-docs%2C-but-duplicating-instead-of-updating-td25241444.html
http://www.nabble.com/Solr-Shard---Strange-results-td23561201.html
http://www.nabble.com/Shard-Query-Problem-td22110121.html


> solr should error on document add/update if uniqueKey field has multiple 
> tokens.
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-1401
>                 URL: https://issues.apache.org/jira/browse/SOLR-1401
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> over the years, have seen more than a few solr-user posts noticing odd 
> behavior when using a uniqueKey field configured to use TextField with a non 
> trivial analyzer ... we shouldn't error on TextField (KeywordTokenizer is 
> perfectly legitimate) but we should error if that analyzer produces multiple 
> tokens.  
> Likewise we should verify that good error messages are produced if the 
> uniqueKey field is configured such that multiValued=true.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
