[ 
https://issues.apache.org/jira/browse/SOLR-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854700#action_12854700
 ] 

Uwe Schindler commented on SOLR-1869:
-------------------------------------

bq. the RemoveDuplicatesTokenFilter seems broken as it initializes its map and 
attributes at the class level and not within its constructor

The filter is correct. Those are final instance fields, and the ctor that javac 
generates initializes them in exactly the same way, so there is no need to move 
them into the ctor. In Lucene/Solr all TokenStreams are written this way; that's 
our code style for TokenStreams.
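
To illustrate the point about field initializers, here is a minimal sketch in 
plain Java. It is not the actual filter code: the class names and the "seen" 
field are made up for the example. Both variants give every instance its own 
freshly created set, because javac places instance field initializers into 
each constructor, including the default one.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo classes; none of these names come from Lucene/Solr.
class FieldInitDemo {

    // Style used for TokenStreams: final fields initialized at declaration.
    static final class WithInitializers {
        private final Set<String> seen = new HashSet<>();

        Set<String> seen() {
            return seen;
        }
    }

    // Equivalent variant: the same field assigned in an explicit constructor.
    static final class WithConstructor {
        private final Set<String> seen;

        WithConstructor() {
            this.seen = new HashSet<>();
        }

        Set<String> seen() {
            return seen;
        }
    }
}
```

In both cases the set is per-instance state, not shared between filter 
instances, so moving the initialization into the ctor changes nothing.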

The CharArrayMap is faster on lookup, but you are right, we may need 
posincr. In general the Map should really just be a CharArraySet or 
HashSet<String>, and the check should use contains().
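
A rough sketch of the set-based check, written in plain Java without the 
Lucene attribute API so that it runs standalone. The Token class and the 
removeDuplicates method are hypothetical stand-ins for illustration, and the 
sketch assumes that "duplicate" means the same term text at the same position 
(position increment 0).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a token stream; not the Lucene/Solr API.
class DedupSketch {

    static final class Token {
        final String term;
        final int posInc; // position increment relative to the previous token

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Drops a token when the same term was already seen at the current position.
    static List<String> removeDuplicates(List<Token> tokens) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc > 0) {
                seen.clear(); // moved to a new position: forget earlier terms
            }
            boolean duplicate = t.posInc == 0 && seen.contains(t.term);
            seen.add(t.term);
            if (!duplicate) {
                kept.add(t.term);
            }
        }
        return kept;
    }
}
```

With a set there is no value to store per term, so the lookup reduces to a 
single contains() call; a CharArraySet would avoid creating a String per token.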

But I don't understand the rest of the patch.

> RemoveDuplicatesTokenFilter doest have expected behaviour
> ---------------------------------------------------------
>
>                 Key: SOLR-1869
>                 URL: https://issues.apache.org/jira/browse/SOLR-1869
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Joe Calderon
>            Priority: Minor
>         Attachments: SOLR-1869.patch
>
>
> the RemoveDuplicatesTokenFilter seems broken as it initializes its map and 
> attributes at the class level and not within its constructor
> In addition, I would think the expected behaviour would be to remove identical 
> terms with the same offset positions; instead it looks like it removes 
> duplicates based on position increment, which won't work when using it after 
> something like the EdgeNGram filter. When I posted this to the mailing list, 
> even Erik Hatcher seemed to think that's what this filter was supposed to do...
> Attaching a patch that has the expected behaviour and initializes variables 
> in the constructor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
